Kumar Et Al., 2024
Sandeep Kumar · Saroj Hiranwal · Ritu Garg · S. D. Purohit, Editors
Proceedings of International Conference on Communication and Computational Technologies
ICCCT 2024, Volume 1
Lecture Notes in Networks and Systems
Volume 1121
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH,
SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Editors
Sandeep Kumar, Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, Karnataka, India
Saroj Hiranwal, Victorian Institute of Technology, Adelaide, VIC, Australia
Ritu Garg, Department of Computer Engineering, National Institute of Technology, Kurukshetra, Haryana, India
S. D. Purohit, Rajasthan Technical University, Kota, Rajasthan, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
This book contains outstanding research papers as the proceedings of the 6th International Conference on Communication and Computational Technologies (ICCCT 2024). ICCCT 2024 was organized by the Rajasthan Institute of Engineering and Technology, Jaipur, India, and technically sponsored by the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging ideas, concepts, and results among researchers from academia and industry, to develop a comprehensive understanding of the challenges of advancements in communication and computational technologies and of innovative solutions to current challenges from engineering and technology viewpoints. This book will help strengthen amiable networking between academia and industry. The conference focused on intelligent systems: algorithms and applications, informatics and applications, and communication and control systems.
We have tried our best to enrich the quality of ICCCT 2024 through a stringent and careful peer-review process. ICCCT 2024 received many technical contributed articles from distinguished participants from home and abroad: 676 research submissions in total. After a very rigorous peer-review process, only 77 high-quality papers were finally accepted for presentation and the final proceedings. This book presents the first volume of 39 research papers related to communication and computational technologies and serves as reference material for advanced research.
1 Introduction
The Indian Space Research Organization (ISRO) has earned India a place among the countries rightfully recognized as space powers worldwide. The Indian Regional Navigation Satellite System (IRNSS) is an independent regional satellite navigation system, operationally named Navigation with Indian Constellation (NavIC). The system provides positioning, navigation, and timing (PNT) services over India and a region extending around 1500 km beyond it. Two kinds of services, restricted service (RS) and
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_1
2 K. Ansari et al.
standard positioning service (SPS), are available in the IRNSS system. Currently, the
IRNSS system contains a constellation of eight fully operational satellites, which
includes inclined geosynchronous orbit (IGSO) satellites at 55° E and 111.75° E
and geostationary equatorial orbit (GEO) satellites at longitudes 32.5° E, 83° E, and
131.5° E, roughly 35,786 km above the Earth's surface (Table 1). These IGSO and GEO satellites' visibility increases longitudinally and latitudinally over the Asia–Pacific region [1, 2]. After establishing the IRNSS constellation, India joined the club of a few selected countries with satellite navigation capability. This made India independent in terms of navigational abilities, especially during military operations. Ansari [3] selected a day (YYDOY 21135) and plotted the
number of satellite visibility (NSV) on a global scale as a contour plot, as displayed in Fig. 1. The contour scale runs from blue to red: the lowest NSV is shown in blue and the highest in red. Three NSV combinations with cutoff elevation angles of 0°, 5°, and 10° are used. The figure shows that the maximum NSV is 6 in the region surrounding India, and the area of this region shrinks as the elevation cutoff increases. On the eastern side, IRNSS satellite visibility extends to Japan, while on the western side it reaches Africa.
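The dependence of visibility on the elevation cutoff can be sketched with a simple spherical-Earth calculation. In the following Python snippet, the three GEO longitudes come from the constellation description above, while the ground station (approximately New Delhi) and the spherical-Earth simplification are illustrative assumptions, not taken from the cited study:

```python
import math

R_E = 6378.137          # mean equatorial Earth radius, km
R_GEO = R_E + 35786.0   # geostationary orbital radius, km

def ecef(lat_deg, lon_deg, r):
    """Spherical latitude/longitude/radius to ECEF coordinates (km)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def elevation_deg(sta_lat, sta_lon, sat_lat, sat_lon):
    """Elevation angle of a GEO-altitude satellite as seen from a ground station."""
    sta = ecef(sta_lat, sta_lon, R_E)
    sat = ecef(sat_lat, sat_lon, R_GEO)
    los = [s - p for s, p in zip(sat, sta)]   # line-of-sight vector
    up = [p / R_E for p in sta]               # local zenith (unit) vector
    dot = sum(l * u for l, u in zip(los, up))
    norm = math.sqrt(sum(l * l for l in los))
    return math.degrees(math.asin(dot / norm))

# GEO longitudes from the text; station coordinates (New Delhi) are an assumption.
geo_lons = [32.5, 83.0, 131.5]
for cutoff in (0.0, 5.0, 10.0):
    visible = sum(elevation_deg(28.6, 77.2, 0.0, lon) > cutoff for lon in geo_lons)
    print(f"cutoff {cutoff:>4}°: {visible} of {len(geo_lons)} GEO satellites visible")
```

Raising the cutoff shrinks the visible set exactly as the contour plot shrinks toward the Indian region; a full NSV map would repeat this test over a latitude/longitude grid and include the IGSO satellites as well.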
Like other constellation systems, IRNSS also faces various sources of error, which limit its accuracy. Among them, ionospheric irregularities are considered the most significant source of error, degrading receiver performance or even resulting in loss of lock. The trans-ionospheric IRNSS signals
often experience random fluctuations in the phase and amplitude of the received
signal. Dey et al. [4] studied night-time plasma bubbles and total electron content
(TEC) variation on the IRNSS positioning at low-latitude stations from March 2019
to December 2019. They reported the scintillation characteristics observed on IRNSS
(L5 and S-band) recorded signals and investigated TEC’s daily and seasonal vari-
ations during the low solar activity period of 2019. It is observed that the position
error was maximum in the afternoon hours when the TEC was higher than at other
times. Moderate scintillation was observed during equinoctial months. However, the
position error during scintillation nights was not statistically higher than during other
quiet nights. Desai and Shah [5] noticed that geomagnetic storms influence the ionospheric delay and positioning accuracy of IRNSS (L5 band) signals. They compared IRNSS L5 and S-band signal performance during the storm that occurred on 8 September 2017. It was found that both the L5 and S bands of IRNSS signals face positioning accuracy challenges like other constellations. They found that IRNSS L5 band signals suffer around 30–40% more ionospheric delay than IRNSS S-band signals, but there is a loss of signals in the S-band compared to L5 band signals [5].
Several studies have been carried out based on simulation; some use real data [6–12]. Kumari et al. [13] tested positioning accuracy during solar radiation pressure using IRNSS-1A and IRNSS-1B measurements. Chandrasekhar et al. [14] employed real data from IRNSS-1A, 1B, and 1C and validated the accuracy of their orbits. The quality of IRNSS-1A and 1B has been investigated by Montenbruck et al. [15]. IRNSS works
The IRNSS system is very fruitful for Indian-made equipment such as drones, fighter
jets, submarines, and other weapons. The system ensures that the Indian defense
forces can collect exact information on enemy positions and track the precise move-
ments of their troops. The navigation system can be used under the control of India
and will ensure the availability of accurate signals under critical military and political
situations. In India, natural disasters such as tsunamis, floods, earthquakes, cyclones,
landslides, and manmade disasters like the breaking and collapsing of dams occur frequently.
Economically Growth and Impact of Indian Regional Navigation … 5
The transport and communication system breaks down during these kinds
of extreme events. Disaster management teams can use IRNSS signals easily in affected areas to save lives and support the disaster response. The oil and mining fields can be supervised for possible land subsidence, and suitable action can be taken during abnormal conditions to prevent accidents.
The Ministry of Road Transport and Highways of India has made it mandatory that IRNSS trackers be fitted in all commercial vehicles. This will allow law enforcement agencies to track commercial vehicles through an autonomous, government-controlled system. The system can be used to map the Indian region and surrounding terrain through geodetic surveys. IRNSS receivers can be integrated easily into mobile phones; this integration will give drivers visual and voice navigation and terrain mapping similar to Google Maps. The signals can be utilized for terrestrial navigation by travelers and hikers without fear of getting lost.
Dan et al. [20] studied the effect of PDOP on the accuracy of the IRNSS solution using long-term L5 and S-band variation and its impact on positioning error. They observed that the 3-dimensional positioning error lies below 6 m. The satellite visibility performance of the IRNSS constellation under constrained and open-sky conditions in its service regions has been presented in detail by Dan et al. [20]. Using a simulation tool, they studied the accuracy of single-point positioning in single- and dual-frequency modes. The results showed the potential of IRNSS for India and neighboring countries as an alternative solution in South and Southeast Asia and along major sea routes, which are economically very important. Several
researchers in India and at the international level have discussed the capabilities of
the IRNSS system. IRNSS and IRNSS-GPS geometry of satellites and improvements
have been explained by Dutt et al. [21], Rajasekhar et al. [22], and Sekhar et al. [23].
Geosynchronous satellite advantages in terms of satellite visibility in IRNSS-GPS operations have been predicted for India [24, 25]. Using simulated results, Rao et al. [26]
analyzed potential, geometry, reliability, and availability with GPS and GLONASS.
Odijk et al. [27] used a novel approach, estimating the differential inter-system biases of L5-frequency signals across multiple constellations (GPS, Galileo, QZSS, and IRNSS). They found a higher ambiguity resolution rate (~67%) for the combined constellation compared to individual constellations.
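The PDOP figures discussed above follow directly from satellite geometry. As a rough illustration of how dilution of precision is obtained from the geometry (design) matrix of the linearized pseudorange model, the sketch below uses invented azimuth/elevation values, not IRNSS data; the formulation itself is the standard one:

```python
import numpy as np

def dops(unit_los):
    """Return (PDOP, GDOP) from receiver-to-satellite unit vectors in ENU.

    Each row of the design matrix G is [-e_E, -e_N, -e_U, 1]; the DOPs are
    read off the diagonal of the cofactor matrix (G^T G)^-1.
    """
    G = np.hstack([-np.asarray(unit_los, dtype=float),
                   np.ones((len(unit_los), 1))])   # columns: E, N, U, clock
    Q = np.linalg.inv(G.T @ G)
    d = np.diag(Q)
    pdop = np.sqrt(d[0] + d[1] + d[2])             # position terms only
    gdop = np.sqrt(d.sum())                        # position + clock
    return pdop, gdop

def unit_vector(az_deg, el_deg):
    """Unit line-of-sight vector from azimuth/elevation in degrees."""
    az, el = np.radians([az_deg, el_deg])
    return [np.cos(el) * np.sin(az), np.cos(el) * np.cos(az), np.sin(el)]

# Illustrative azimuth/elevation pairs for a handful of visible satellites.
sats = [(45, 30), (135, 40), (225, 35), (315, 50), (180, 70)]
pdop, gdop = dops([unit_vector(az, el) for az, el in sats])
print(f"PDOP = {pdop:.2f}, GDOP = {gdop:.2f}")
```

Spreading the satellites widely in azimuth and elevation lowers the DOPs; clustering them (as under constrained sky conditions) inflates the DOPs and hence the positioning error.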
The IRNSS system works like other constellations worldwide and reduces the dependency on GPS for accurate targeting and positioning, with a resolution of 20 m. India has now joined the select group of countries operating their own navigation systems: GPS for the USA, Galileo for Europe, GLONASS for Russia, BeiDou for China, and QZSS for Japan. India leverages its navigational system to strengthen ties with neighbors in forums like SAARC and by giving small nations access to navigational services. It is useful for sailors and fishermen, who often lose their way in the uncertainty of the oceans and reach Sri Lanka or Pakistan by mistake. IRNSS increases
India's respect in many fields in the global community. Its coverage extends 1500 km around India, thus including parts of the Indian Ocean, the Himalayan region, parts of Kazakhstan, and the Middle East. It provides location services in the Persian Gulf and the Indian Ocean.
Moreover, Indians are no longer dependent on others, which would allow India to
stand with confidence globally. The IRNSS system helps create a friendly relationship
with neighboring countries by providing them with real-time information during
disasters and calamities to mitigate their aftereffects and make earlier plans. The
IRNSS system enhances the value of India in the South Asia region by providing
information where currently China has influenced and become independent from
other countries, which may create suspicion for sharing information during Wartime,
e.g., during the Kargil war. In the future, IRNSS service can be provided to the
neighboring countries for commercial purposes or freely as part of the geostrategic
movement. Additionally, more satellites will soon be added to reach the IRNSS
signals in more and more areas until it attains global coverage.
The IRNSS is now fully operational. Understanding the economic impact and potential of navigation systems for residents is very important. These are some fields where it can be used; once testing and verification are done, it can be applied in many more areas and become a great source of income. It has both commercial and strategic applications. ISRO has delivered many projects for military service and for India's social and economic growth. ISRO has demonstrated that social and economic growth can be significantly increased by combining ground and Earth observation, satellite communication, and navigation. The Mangalyaan and Chandrayaan space missions are the most notable ones. These missions are not just examples of technology but also an expansion of knowledge in space science. This step will promote manufacturing, startups, and workforce skills.
GNSS systems provide time stamps for commercial transactions, wireless network communications, astronomical measurements, power plants for grid synchronization, and other applications because they can deliver precise time [29, 30]. The global space economy is currently valued at about USD 360 billion. In the United States, it is estimated that GPS has produced approximately USD 1.4 trillion in economic benefits since 1980, when it was made available for commercial and civilian use. Most of the benefits have accrued in the last ten years, following quick gains in information technology, the availability of robust wireless services, and the cost reduction and commoditization of powerful devices. Despite the impressive capabilities of ISRO, India's share is estimated at USD 7 billion, which is only about 2% of the space economy (Table 2). ISRO has made wonderful progress in space-based applications with social and economic growth, which is also a part of its mission. The private sector has played an important and steadily growing role in globalizing the space economies of other space-faring countries. Although the private space
industry in India is limited to being a supplier and vendor, the government is trying to provide good scope for non-governmental organizations to participate in and enhance the space program. The government encourages industries to play a key role in boosting India's share in the global space economy. Above all, the space sector can raise a vibrant ecosystem of private industries and startups, replicating what has been seen in the information technology (IT) sector in contributing to the Indian economy's growth story. This will undoubtedly increase India's significant share in the global space market [28]. Jagiwala and Shah [31] studied the impact of Wi-Fi interference on IRNSS signals. They noticed that electronic and telecommunication systems, such as ultra-wideband radar, personal electronics, mobile satellite networks, etc., could interfere with the reception of the IRNSS L5 and S bands. IRNSS S-band applications in agriculture, forestry, aeronautics, marine, etc., are very beneficial because the S-band is much less affected by ionospheric error than L-band signals, and it shows reduced multipath error and phase noise. However, IRNSS reception on the S-band frequency is severely affected by Wi-Fi transmission. These interfering signals present a threat to the performance of IRNSS signals, and it will be challenging to mitigate such errors when equipping future cell phones with IRNSS facilities.
Several other kinds of studies have already been carried out for other constellations, such as crustal deformation [32–35], positioning coordinates [36, 37], troposphere [38, 39], and ionosphere [40–42]; such studies are now also possible with IRNSS. In the future, we plan to carry out similar studies for IRNSS.
5 Conclusion
Acknowledgements The Warsaw University of Technology funded the research within the
Excellence Initiative: Research University (IDUB) program.
References
1. Rao VG, Lachapelle G, VijayKumar SB (2011) Analysis of IRNSS over Indian subcontinent.
In: Proceedings of the 2011 international technical meeting of the institute of navigation, pp
1150–1162
2. Zaminpardaz S, Teunissen PJ, Nadarajah N (2017) IRNSS/NavIC single-point positioning: a
service area precision analysis. Mar Geodesy 40(4):259–274
3. Ansari K (2023) Investigation of the standalone and combined performance of IRNSS and
QZSS constellations over the Asia-Pacific region. Wirel Pers Commun 130(4):2887–2901.
https://doi.org/10.1007/s11277-023-10408-1
4. Dey A, Joshi LM, Chhibba R, Sharma N (2021) A study of ionospheric effects on IRNSS/
NavIC positioning at equatorial latitudes. Adv Space Res 68(12):4872–4883
5. Desai MV, Shah SN (2021) Case study: performance observation of NavIC ionodelay and
positioning accuracy. IETE Tech Rev 38(2):256–266
6. García AM, Píriz R, Samper MDL, Merino MMR (2010) Multisystem real time precise-point-
positioning, today with GPS+ GLONASS in the near future also with QZSS, Galileo, compass,
IRNSS. In: The international symposium on GPS/GNSS, Taiwan
7. Sarma AD, Sultana Q, Srinivas VS (2010) Augmentation of Indian regional navigation satellite
system to improve dilution of precision. J Navigat 63(2):313–321
8. Sekar SB, Sengupta S, Bandyopadhyay K (2012) Spectral compatibility of BOC (5, 2) modu-
lation with existing GNSS signals. In: Proceedings of the 2012 IEEE/ION position, location
and navigation symposium. IEEE, pp 886–890
9. Rethika T, Mishra S, Nirmala S, Rathnakara SC, Ganeshan AS (2013) Single frequency ionospheric error correction using coefficients generated from regional ionospheric data for IRNSS
10. Rao VG (2013) Proposed LOS fast TTFF signal design for IRNSS. PhD dissertation, University
of Calgary, Calgary, Canada
11. Su XL, Zhan X, Niu M, Zhang Y (2012) Performance comparison for combined navigation
satellite systems in asia-pacific region. J Aeronaut Astronaut Aviat 44(4):249–257
12. Thoelert S, Montenbruck O, Meurer M (2014) IRNSS-1A: signal and clock characterization
of the Indian regional navigation system. GPS Solut 18:147–152
13. Kumari A, Samal K, Rajarajan D, Swami U, Babu R, Kartik A, Rathnakara SC, Ganeshan
AS (2015) Precise modeling of solar radiation pressure for IRNSS satellite. J Nat Sci Res
5(3):35–43
14. Chandrasekhar MV, Rajarajan D, Satyanarayana G, Tirmal N, Rathnakara SC, Ganeshan AS
(2015) Modernized IRNSS broadcast ephemeris parameters. Control Theory Inform 5(2):1–9
15. Montenbruck O, Steigenberger P, Riley S (2015) IRNSS orbit determination and broadcast
ephemeris assessment. Paper presented at International technical meeting of the institute of
navigation, Dana Point, CA, January 26–28, pp 185–193
16. Sharma KP, Poonia RC (2018) Review study of navigation systems for Indian regional navi-
gation satellite system (IRNSS). In: Soft computing: theories and applications: proceedings of
SoCTA 2016, vol 1. Springer, Singapore, pp 735–742
17. Saikiran B, Vikram V (2013) IRNSS architecture and applications. KIET Int J Commun
Electron 1(3):21–27
18. Ansari K (2023) Review on role of multi-constellation global navigation satellite system-
reflectometry (GNSS-R) for real-time sea-level measurements. In: Structural geology and
tectonics field guidebook, vol 2. Springer Geology, Springer, pp 333–358. https://doi.org/10.
1007/978-3-031-19576-1_13
19. Koyuncu H, Yang SH (2010) A survey of indoor positioning and object locating systems.
IJCSNS Int J Comput Sci Netw Secur 10(5):121–128
20. Dan S, Santra A, Mahato S, Bose A (2020) NavIC performance over the service region:
availability and solution quality. Sādhanā 45:1–7
21. Dutt VSI, Rao GSB, Rani SS, Babu SR, Goswami R, Kumari CU (2009) Investigation of
GDOP for precise user position computation with all satellites in view and optimum four
satellite configurations. J Ind Geophys Union 13(3):139–148
22. Rajasekhar C, Srilatha Indira Dutt VBS, Sasibhushana Rao G (2016) Investigation of the best
satellite–receiver geometry to improve positioning accuracy using GPS and IRNSS combined
constellation over Hyderabad region. Wirel Pers Commun 88:385–393
23. Sekhar CR, Dutt VSI, Rao GS (2016) GDoP estimation using simulated annealing for GPS
and IRNSS combined constellation. Eng Sci Technol Int J 19(4):1881–1886
24. Kiran B, Raghu N, Manjunatha KN, Raghavendra Kumar M (2016) Tracking and analysis of
three IRNSS satellites by using satellite tool kit. IJARIIE 1(5):90–95p
25. Raghu N, Kiran B, Manjunatha KN (2016) Tracking of IRNSS, GPS and hybrid satellites by
using IRNSS receiver in STK simulation. In: 2016 international conference on communication
and signal processing (ICCSP). IEEE, pp 0891–0896
26. Rao VG, Lachapelle G, VijayKumar SB (2011) Analysis of IRNSS over Indian subcontinent.
In: Proceedings of the 2011 international technical meeting of the institute of navigation, pp
1150–1162
27. Odijk D, Nadarajah N, Zaminpardaz S, Teunissen PJ (2017) GPS, Galileo, QZSS and IRNSS
differential ISBs: estimation and application. GPS Solut 21:439–450
28. ISRO (2023) Report on channelized efforts to India on a track to serve global needs ensure
level-playing grounds enhancing the private participation in space activities
29. Mumford PJ, Parkinson K, Dempster A (2006) The namuru open GNSS research receiver. In:
Proceedings of the 19th international technical meeting of the satellite division of the institute
of navigation (ION GNSS 2006), pp 2847–2855
30. Cantelmo C, Zanello R, Blanchi M, Capetti P, Scarda S (2009) Galileo timing applications and
ACTS prototyping. In: 2009 IEEE international frequency control symposium joint with the
22nd European frequency and time forum, pp 405–410
31. Jagiwala DD, Shah SN (2018) Impact of Wi-Fi interference on NavIC signals. Curr Sci
114(11):2273–2280
32. Ansari K, Park KD (2019) Contemporary deformation and seismicity analysis in Southwest
Japan during 2010–2018 based on GNSS measurements. Int J Earth Sci 108:2373–2390. https://
doi.org/10.1007/s00531-019-01768-w
33. Ansari K, Corumluoglu O, Sharma SK (2017) Numerical simulation of crustal strain in Turkey
from continuous GNSS measurements in the interval 2009–2017. J Geodetic Sci 7(1):113–129.
https://doi.org/10.1007/s10509-017-3043-x
34. Ansari K (2018) Crustal deformation and strain analysis in Nepal from GPS time-series
measurement and modeling by ARMA method. Int J Earth Sci 107(8):2895–2905. https://
doi.org/10.1007/s00531-018-1633-7
35. Ansari K, Bae TS (2020) Contemporary deformation and strain analysis in South Korea based
on long-term (2000–2018) GNSS measurements. Int J Earth Sci 109(1):391–405. https://doi.
org/10.1007/s00531-019-01809-4
36. Ansari K, Corumluoglu O, Verma P (2018) The triangulated affine transformation parameters
and barycentric coordinates of the Turkish permanent GPS network. Surv Rev 50(362):412–
415. https://doi.org/10.1080/00396265.2017.1297016
37. Ansari K, Gyawali P, Pradhan PM, Park KD (2019) Coordinate transformation parameters in
Nepal by using neural network and SVD methods. J Geodetic Sci 9(1):22–28. https://doi.org/
10.1515/jogs-2019-0003
38. Ansari K, Althuwaynee OF, Corumluoglu O (2016) Monitoring and prediction of precipitable
water vapor using GPS data in Turkey. J Appl Geodesy 10(4):233–245. https://doi.org/10.1515/
jag-2016-0037
39. Ansari K, Corumluoglu O, Panda SK, Verma P (2018) Spatiotemporal variability of water
vapor over Turkey from GNSS observations during 2009–2017 and predictability of ERA-
Interim and ARMA model. J Glob Positioning Syst 16:1–23. https://doi.org/10.1186/s41445-
018-0017-4
40. Jamjareegulgarn P, Ansari K, Ameer A (2020) Empirical orthogonal function modeling of total
electron content over Nepal and comparison with global ionospheric models. Acta Astronaut
177:497–507. https://doi.org/10.1016/j.actaastro.2020.07.038
41. Sharma SK, Singh AK, Panda SK, Ansari K (2020) GPS derived ionospheric TEC variability
with different solar indices over the Saudi Arab region. Acta Astronaut 174:320–333. https://
doi.org/10.1016/j.actaastro.2020.05.024
42. Timoçin E, Inyurt S, Temuçin H, Ansari K, Jamjareegulgarn P (2020) Investigation of equatorial
plasma bubble irregularities under different geomagnetic conditions during the equinoxes and
the occurrence of plasma bubble suppression. Acta Astronaut 177:341–35. https://doi.org/10.
1016/j.actaastro.2020.08.007
Predictive Tomato Leaf Disease Detection
and Classification: A Hybrid Deep
Learning Framework
Abstract Tomatoes are a widely grown crop in India, and their significance to agriculture is considerable. Tomatoes are an essential food for humans. Many illnesses can have detrimental effects on a plant's health and inhibit its growth, and farmers fail to prevent damaged yields because they assess them too late. The development of intelligent systems with highly effective plant disease detection capabilities has garnered increased attention recently. The goal of this study is to identify the most accurate and efficient algorithm by reviewing a range of existing approaches. This review also discusses the benefits and drawbacks of the recommended tactics. In the proposed work, a hybrid CNN and BiLSTM model classifies tomato leaf disease with 99% accuracy on the PlantVillage dataset. The hybrid deep learning disease-detection strategy suggested by this review for tomato leaf ailments produced better results.
1 Introduction
Plant leaf illness is a significant issue in agriculture that affects the growth of
plants and costs a considerable amount. Utilizing computer-based image processing
methods to automatically detect and categorize plant illnesses is known as computer
vision in plant disease detection and classification. Researchers have created tech-
niques to identify and categorize different plant diseases by analyzing leaf images,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_2
12 V. Jayanthi and M. Kanchana
which helps with early management and identification. This technology can help
non-experts, including farmers, identify and treat plant illnesses more successfully.
A pre-trained deep learning model is one that has learned a given task from a large dataset before being adjusted for, or applied to, other related tasks. The idea of pre-training is particularly common in natural language processing (NLP) and computer vision, where sharing and reusing models is effective because of the vast quantity of data and computational resources needed for deep neural network training.
Hence, the primary goal is to identify and categorize various diseases affecting tomato leaves [6]. Detecting and classifying diseases in tomato leaves at an early stage can help agriculturalists avoid costly pesticides and contribute to increased food production, thereby preventing intensive production loss [4]. Most diseases can be observed directly with the naked eye, but accurate identification is crucial for preventing their spread and saving time. Various ML approaches, including SVM, DT, Gaussian frameworks, and k-NN, are employed in disease inspection. Additionally, several DL approaches, including CNN, F-RCNN, and LSTM, are employed for plant leaf disease detection and classification.
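To make the hybrid CNN-BiLSTM idea concrete, the following NumPy sketch traces only the data flow: a convolutional feature map whose rows are treated as a sequence, scanned forward and backward, with the concatenated states fed to a classifier. All sizes and weights are illustrative stand-ins, and a plain tanh recurrence replaces a true LSTM cell; this is not the authors' actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_relu(x, k):
    """Valid 2-D convolution of a single-channel image followed by ReLU."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0)

def simple_rnn(seq, W_x, W_h):
    """Minimal tanh recurrence standing in for one LSTM direction."""
    h = np.zeros(W_h.shape[0])
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
    return h

# Toy leaf image and random weights (illustrative sizes only).
img = rng.random((32, 32))
feat = conv_relu(img, rng.standard_normal((3, 3)))   # CNN feature map, 30x30
seq = list(feat)                                     # each row = one timestep

hidden = 16
W_x = rng.standard_normal((hidden, 30))
W_h = rng.standard_normal((hidden, hidden)) * 0.1
h_fwd = simple_rnn(seq, W_x, W_h)                    # forward scan over rows
h_bwd = simple_rnn(seq[::-1], W_x, W_h)              # backward scan (the "Bi" part)
h = np.concatenate([h_fwd, h_bwd])                   # 32-dim bidirectional feature

n_classes = 10                                       # e.g., tomato disease classes
logits = rng.standard_normal((n_classes, h.size)) @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                 # softmax over disease classes
print("class probabilities shape:", probs.shape)
```

In a real implementation the convolutional stack, LSTM gates, and classifier weights would all be learned jointly; the point here is only how convolutional features become a sequence that a bidirectional recurrent layer can summarize.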
2 Dataset
The most commonly used datasets in tomato plant leaf disease prediction are the following.
2.1 PlantVillage
The PlantVillage dataset contains 54,305 images of healthy and diseased leaves under controlled conditions. It includes images of 14 different crops, such as apple, tomato, potato, and grape, and healthy-leaf images of 12 crop species.
2.2 PlantDoc

PlantDoc is a dataset for the visual detection of plant diseases. It contains 2,598 images
across 13 plant species and 17 classes of diseases. The images were annotated by
pathologists, who spent about 300 hours on the annotation.
Predictive Tomato Leaf Disease Detection and Classification: A Hybrid … 13
3.1 Faster R-CNN

Faster R-CNN (region-based CNN) [6, 12] is widely used for object detection. It
combines a region proposal network (RPN) with a CNN, and its accuracy and efficiency
make it suitable for various applications.
3.2 YOLO
YOLO [4], or You Only Look Once, is widely used in computer vision and machine learning.
Its real-time object detection makes YOLO suitable for applications in surveillance,
autonomous cars, and image and video analysis.
3.3 SSD
SSD [10] predicts bounding boxes and class probabilities across multiple scales in a
single pass, effectively balancing speed and accuracy.
4 Pre-trained Models

Using a pre-trained image classification model saves considerable time and money,
because these models have already been trained on large datasets and have learned
useful features that can be adapted to the specific task. Some popular pre-trained
models for image classification are described below.
4.1 VGG

The VGG models [2, 3] are known for their simple and uniform structure. They come
in versions of varying depth (e.g., VGG 16 and VGG 19). In VGG models, multiple
convolutional layers with small 3 × 3 filters are stacked before each max-pooling layer.
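The appeal of stacking small 3 × 3 filters can be seen with a quick parameter count: two stacked 3 × 3 layers cover the same 5 × 5 receptive field as one 5 × 5 layer but need fewer weights. The sketch below illustrates this under the simplifying assumption of equal input and output channel widths (the helper name `conv_params` is ours, not VGG's):

```python
# Parameter-count comparison behind VGG's design choice: two stacked
# 3x3 conv layers see a 5x5 receptive field but hold fewer weights
# than a single 5x5 layer (biases ignored for simplicity).

def conv_params(kernel, in_ch, out_ch):
    """Number of weights in one conv layer with a kernel x kernel filter."""
    return kernel * kernel * in_ch * out_ch

C = 64  # illustrative channel width, similar to early VGG stages
stacked_3x3 = 2 * conv_params(3, C, C)  # two stacked 3x3 layers
single_5x5 = conv_params(5, C, C)       # one 5x5 layer, same receptive field

print(stacked_3x3, single_5x5)  # 73728 102400
```

The stacked variant also inserts an extra non-linearity between the two layers, which is a second reason the VGG authors preferred it.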
4.2 ResNet

ResNet [12] introduced residual blocks, which help mitigate the vanishing gradient
problem in very deep networks. ResNet architectures can be quite deep (e.g.,
ResNet50, ResNet101) and have been highly successful in image classification tasks.
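The core of a residual block is simply adding the input back to the learned transformation, so the identity path keeps signal (and gradients) flowing even through many layers. A minimal pure-Python sketch, with an illustrative toy transform of our own choosing:

```python
# Minimal sketch of a residual block: y = ReLU(F(x) + x).
# The "+ x" skip connection means that even when the learned transform F
# contributes little, the block still passes its input through, which is
# what makes very deep stacks trainable.

def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, transform):
    """Apply the basic ResNet building block to a 1-D feature vector."""
    fx = transform(x)
    return relu([a + b for a, b in zip(fx, x)])

# Toy transform (illustrative only): when F(x) is a small perturbation,
# the block behaves close to the identity mapping.
half = lambda v: [0.5 * x for x in v]
print(residual_block([1.0, -2.0, 3.0], half))  # [1.5, 0.0, 4.5]
```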
4.4 MobileNet

MobileNet is a lightweight architecture built on depth-wise separable convolutions,
which reduce computation and model size enough for the model to run on mobile and
embedded devices.
4.5 Xception
Xception [5] is inspired by the Inception architecture but uses depth-wise separable
convolutions in a different arrangement. It aims to capture fine-grained features
efficiently.
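The saving from depth-wise separable convolutions (used by both Xception and MobileNet) comes from splitting one K × K channel-mixing filter into a per-channel K × K spatial filter plus a 1 × 1 point-wise filter. A parameter-count sketch, with helper names of our own and illustrative channel widths:

```python
# Why depth-wise separable convolutions are cheap: a KxK spatial filter
# is applied once per input channel (depthwise), and a 1x1 convolution
# then mixes channels (pointwise), instead of one big KxK filter per
# (input channel, output channel) pair.

def standard_conv_params(k, in_ch, out_ch):
    return k * k * in_ch * out_ch

def separable_conv_params(k, in_ch, out_ch):
    depthwise = k * k * in_ch   # one KxK filter per input channel
    pointwise = in_ch * out_ch  # 1x1 channel-mixing filters
    return depthwise + pointwise

k, cin, cout = 3, 128, 128  # illustrative layer shape
print(standard_conv_params(k, cin, cout))   # 147456
print(separable_conv_params(k, cin, cout))  # 17536
```

For a 3 × 3 kernel the separable form needs roughly an eighth of the weights, which is the efficiency the text refers to.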
5 Literature Survey

The authors of [1] utilize a PCA-based model to classify plant leaf diseases and an
F-RCNN model to detect them. Ten groups of diseases were identified using the
PlantVillage dataset in this paper, and the approach is extremely quick and highly
accurate (99.60%). To raise the image resolution and train the model, two distinct
kinds of Generative Adversarial Network (GAN) are employed in [2]: healthy leaves
were trained using a Wasserstein GAN (WGAN), whereas diseased leaves were trained
using a Super-resolution GAN, and the images are then produced using a Deep
Convolutional GAN (DCGAN). For image classification, deep network architectures
including VGG16, ResNet50, and DenseNet121 were employed; the accuracies reported
in this work were 97.83%, 97.83%, and 98.98%, respectively. The Improved Crossover-
Based Monarch Butterfly Optimization (ICRMBO) technique was introduced in [3]
to reduce architecture complexity and optimize the CNN parameters for classifying
diseased plant leaves. The test accuracy for the Inception V3 and VGG16
architectures was 99.94% and 99.98%, respectively.
The authors of [4] introduced an upgraded Yolo V3 model for image identification
in a real natural environment. To enhance Yolo V3, the proposed model used
multiscale feature identification, multiscale training, and grouping of the
bounding-box dimensions. The accuracy provided by this model was 92.39%, and the
experimental results showed that it classified tomato leaf diseases automatically
and effectively. The model performance was evaluated by the authors of [5] using
RMSprop, stochastic gradient descent (SGD), and adaptive moment estimation (Adam).
The Adam optimizer's accuracy was higher than that of SGD and RMSprop; the model's
accuracy was 99.5%. The upgraded Faster RCNN model of [6] aims to recognize and
classify diseases in tomato leaves. In terms of detection accuracy and speed, the
authors' technique exceeds the original Faster RCNN model, based on their analysis;
the accuracy of the model was 98.54%. The proposed approach of [7] uses transfer
learning to identify tomato leaf disease: a hybrid model combining MobileNet V2
with a classifier network extracts leaf features. The authors compared the outcomes
of various deep learning models, including VGG19, ResNet50, ResNet152 V2, MobileNet,
MobileNet V2, DenseNet121, and DenseNet201; this model attained 99.30% accuracy.
16 V. Jayanthi and M. Kanchana
For the purpose of identifying leaf diseases, a compact CNN was developed in [8];
it is referred to as compact because only five layers were used. When the model's
output was compared against ImageNet-trained models, it produced an accuracy of
99.70%. A restructured residual dense network (RDN) was employed in the proposed
approach of [9] to identify leaf diseases. This hybrid model, built from dense and
residual networks, improved accuracy and decreased the number of training
parameters; with the given dataset it achieves 95% accuracy. The research in [10]
suggested localization and classification of diseases using images: three deep
learning meta-architectures were used to identify plant diseases, and the model
achieved a mean average precision of 73.07%.
Two CNNs are used in the method suggested in [11]: one identifies disease in leaves,
while the other learns characteristics obtained from a validation set. The accuracy
of this model is 98%. The research in [12] suggested a deep learning-based method,
ResNet-34-based Faster-RCNN, for localization and classification of tomato leaf
disease. Annotations are generated to identify suspect regions of the images, and a
Convolutional Block Attention Module (CBAM) is introduced with ResNet-34 to extract
deep features, which are then used to train the Faster-RCNN model to locate and
categorize the various tomato plant leaf abnormalities. The study [13] suggested a
CNN-based method for automatically removing the background from leaf images taken
with mobile applications, segmenting the leaves with unsupervised learning.
In the research work [14], a modified Mask RCNN was used for the identification and
segmentation of tomato leaf diseases; the suggested model's accuracy was 98%, and
its detection time is very low. The paper [15] suggested using a deep learning CNN
(DLCNN) composed of 8 layers to identify and categorize diseases in tomato plant
leaves; the CNN structure was created using a Matlab m-file. To properly describe
and categorize tomato infections, [16] used a Convolutional Neural Network (CNN).
First, the input images are pre-processed and segregated; in the second step,
various tuning parameters of the CNN model are used to process the images. The CNN
also extracts additional features from the images, such as colors, borders, and
textures. The proposed model's reported prediction accuracy was 98.49%. The study
[17] developed an intelligent, residual-neural-network-based method for identifying
nine prevalent tomato illnesses. The suggested network is built from the fundamental
layers of a standard convolutional neural network architecture, and its accuracy was
evaluated at five different network depths. The method beat earlier techniques in
identifying tomato leaf disease, achieving a high F1 score of 99.5%. The proposed
method [18] uses a decision tree classifier and a random forest classifier for image
classification; the models achieved accuracies of 90% and 94%, respectively.
Predictive Tomato Leaf Disease Detection and Classification: A Hybrid … 17
From the above set of papers, we observed a few research challenges: predicting the
contour information of the image is inaccurate owing to low contrast with the
background, weak edge elimination, overlapping, irregular shapes, and impurity
interference; performance in separating overlapped structures is quite limited; and
the manual process requires more time to differentiate between normal and abnormal
samples, which leads to wrong predictions. To overcome these research challenges,
we developed the proposed model as a solution.
6 Proposed Methodology
In this work, a hybrid model was used to classify tomato leaf diseases. The hybrid
model is composed of a CNN and a Bi-LSTM. The input image is fed into the
convolutional neural network, which has three kinds of layers: the convolution layer
applies filters to produce feature maps, the pooling layer halves the spatial size
of the feature maps, and the fully connected layer classifies the image from the
features extracted by the previous layers. Bidirectional long short-term memory
passes data in both directions, one pass forward and another backward. By averaging
the results of the two classifications, the hybrid model determines the classified
outcome as bacterial spot, early blight, healthy, late blight, leaf mold, Septoria
leaf spot, two-spotted spider mite, target spot, tomato mosaic virus, or tomato
yellow leaf curl virus. Figure 1 illustrates the architecture of the proposed work.
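The decision step described above, averaging the two classifiers' outputs over the ten classes, can be sketched in a few lines. The probability vectors below are illustrative placeholders, not real model outputs:

```python
# Sketch of the hybrid decision step: the CNN and the Bi-LSTM each
# produce per-class probabilities; the hybrid model averages them and
# the highest-scoring class is the final prediction.

CLASSES = [
    "bacterial spot", "early blight", "healthy", "late blight",
    "leaf mold", "septoria leaf spot", "two-spotted spider mite",
    "target spot", "tomato mosaic virus", "tomato yellow leaf curl virus",
]

def hybrid_predict(cnn_probs, bilstm_probs):
    """Average the two classifiers' probabilities and pick the arg-max class."""
    avg = [(c + b) / 2 for c, b in zip(cnn_probs, bilstm_probs)]
    return CLASSES[avg.index(max(avg))]

# Illustrative outputs only: both branches lean toward class index 1.
cnn    = [0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02]
bilstm = [0.10, 0.50, 0.05, 0.15, 0.05, 0.05, 0.02, 0.03, 0.03, 0.02]
print(hybrid_predict(cnn, bilstm))  # early blight
```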
7 Results and Discussion

The test dataset is used to assess the performance of the system by examining a
variety of distinct classifiers. To examine the effectiveness of the learning
algorithms, a range of classification methods was employed in the experiments,
involving several convolutional neural network layouts. These tests produced a wide
variety of results with different levels of categorization accuracy. The combined
CNN and Bi-LSTM model was proposed to achieve the highest accuracy and currently
reaches a classification score of 99%. The effectiveness of the classifier in the
proposed technique was assessed using evaluation measures, with a particular focus
on accuracy. Figure 2 below shows the performance of various deep learning models.
Even though PCA DeepNet with F-RCNN in Table 1 above provides 99.60% accuracy, it
does not satisfy flexibility and time-consumption demands as per user requirements.
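Accuracy, the evaluation measure emphasised above, is simply the fraction of test images whose predicted class matches the true label. A minimal sketch with illustrative labels (not the paper's actual test outputs):

```python
# Accuracy as used for the evaluation above: correct predictions
# divided by the total number of test samples.

def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Illustrative labels only.
labels      = ["healthy", "early blight", "leaf mold", "healthy"]
predictions = ["healthy", "early blight", "leaf mold", "late blight"]
print(accuracy(labels, predictions))  # 0.75
```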
Table 1 (continued)

References | Techniques | No. of images/dataset | No. of disease classes | Performance measures
[5] | Xception | 16,578 / PlantVillage | 10 classes of diseases | 99.5%
  Pros: it was utilized to decrease the development time and computational resources
  Cons: if the second task involves fine-tuning of the model, transfer learning may result in overfitting
[6] | Improved Faster RCNN (detection), ResNet101 (feature extraction) | 1 real image | 5 classes of diseases | 98.54%
  Pros: by using Faster RCNN, ResNet101's detection speed was enhanced
  Cons: only one real image was used as a sample
[7] | MobileNet V2 (detection) | 18,160 / PlantVillage | 10 classes of diseases | 99.30%
  Pros: low computational resource requirements
  Cons: based on fine-tuning, it produced a high variance
[8] | Compact CNN (ImageNet) | 18,160 / PlantVillage | 10 classes of diseases | 99.70%
  Cons: the proposed model had more weight than other models
[9] | Residual dense network | 13,185 / AI Challenger 2018 | 9 classes of diseases | 95%
  Pros: it was suggested as a solution to denoising and image super-resolution issues
  Cons: the implementation was not realistic
[10] | SSD, Faster RCNN, RFCN | Not mentioned / PlantVillage | 38 classes of diseases in various plants, like tomato, apple, blueberry, and many more | 73.03%
  Pros: a single framework is used to identify 12 healthy leaves and 26 illnesses
  Cons: training data quality and quantity have a significant impact on how well a deep learning model functions
[11] | CNN | 120,000 / PlantVillage | 4 classes of diseases | 98%
  Pros: it is accurate and simple to compute, which qualifies it for use in practical applications
  Cons: small-scale farmers and agricultural communities may find the proposed model difficult to use because it may require a lot of processing power
[12] | Faster RCNN, ResNet-34 | 54,306 / PlantVillage | 10 classes of diseases | 99.97%
  Pros: the proposed method is reliable and economical
  Cons: it may require an immense quantity of labeled data for training, which can be time-consuming and expensive
[13] | FCNN | 1,408 / camera images from Kenya | healthy and diseased leaves | Not mentioned
  Pros: it is substantially quicker than any of the other techniques
  Cons: not implemented in real time
[14] | Modified Mask R-CNN | 1,610 / PlantVillage | Not mentioned | 98%
  Pros: the proposed method was evaluated for credibility and robustness
  Cons: it may not be relevant to other types of plants
[15] | DLCNN | 6,202 / PlantVillage | 6 classes of diseases | 96.43%
  Pros: compared to conventional techniques for identifying plant diseases, the suggested approach saves time and resources
  Cons: the technique may need expertise and resources for implementation
8 Conclusion
Spotting leaf diseases precisely is essential in farming and requires high accuracy
in a real-time system. This study provided an overview of the most recent methods
for leaf disease detection, examined the efficacy of the existing approaches and
their shortcomings as disease detection tools, and summarized the effectiveness of
various pre-trained deep learning models for leaf disease diagnosis. The study
suggested a hybrid deep learning architecture, combining trained CNN and Bi-LSTM
models, to detect tomato leaf disease with greater accuracy. This research can be
extended by implementing the hybrid leaf detection model and comparing its results
with existing models.
References
1. Roy K, Chaudhuri SS, Frnda J, Bandopadhyay S, Ray IJ, Banerjee S, Nedoma J (2023) Detec-
tion of tomato leaf diseases for agro-based industries using novel PCA DeepNet. IEEE Access
11: 14986
2. Zhao Y, Chen Z, Gao X, Song W, Xiong Q, Hu J, Zhan Z (2021) Plant disease detection using
generated leaves based on DoubleGAN. IEEE/ACM Trans Comput Biol Bioinform 19(3)
3. Nandhini S, Ashokkumar K (2021) Improved crossover-based monarch butterfly optimization
for tomato leaf disease classification using convolutional neural network. Multimedia Tools
Appl 80:18583–18610
4. Liu J, Wang X (2020) Tomato diseases and pests’ detection based on improved Yolo V3
convolutional neural network. Front Plant Sci 11:898
5. Thangaraj R, Anandamurugan S, Kaliappan VK (2020) Automated tomato leaf disease clas-
sification using transfer learning-based deep convolution neural network. J Plant Dis Prot
128:73–86
6. Zhang Y, Song C, Zhang D (2020) Deep learning-based object detection improvement for
tomato disease. IEEE Access 8:56607–56614
7. Ahmed S, Hasan MB, Ahmed T, Sony MRK, Kabir MH (2022) Less is more: lighter and faster
deep neural architecture for tomato leaf disease classification. IEEE Access 10:68868–68884
8. Ozbilge E, Ulukok MK, Toygar O, Ozbilge E (2022) Tomato disease recognition using a
compact convolutional neural network. IEEE Access 10:77213–77224
9. Zhou C, Zhou S, Xing J, Song J (2021) Tomato leaf disease identification by restructured deep
residual dense network. IEEE Access 9:28822–28831
10. Saleem MH, Khanchi S, Potgieter J, Arif KM (2020) Image-based plant disease identification
by deep learning meta-architectures. Plants 9(11):1451
11. Karthik R, Hariharan M, Anand S, Mathikshara P, Johnson A, Menaka R (2020) Attention
embedded residual CNN for disease detection in tomato leaves. Appl Soft Comput 86:105933
12. Alvaro F, Sook Y, Sang K, Dong P (2017) A robust deep-learning based detector for real-time
tomato plant diseases and pests’ recognition. Sensors 17(9):2022
13. Ngugi LC, Abdelwahab M, Abo-Zahhad M (2020) Tomato leaf segmentation algorithms for
mobile phone applications using deep learning. Comput Electron Agricult 178, Art. no. 105788
14. Kaur P, Harnal S, Gautam V, Singh MP, Singh SP (2022) An approach for characterization of
infected areas in tomato leaf disease based on deep learning and object detection technique.
Eng Appl Artif Intell 115:105210. https://doi.org/10.1016/j.engappai.105210
15. Salih TA (2020) Deep learning convolution neural network to detect and classify tomato plant
leaf diseases. Open Access Libr J 7(05):1
16. Trivedi NK, Gautam V, Anand A, Aljahdali HM, Villar SG, Anand D, Goyal N, Kadry S (2021)
Early detection and classification of tomato leaf disease using high-performance deep neural
network. Sensors 21:7987. https://doi.org/10.3390/s21237987
17. Kanda PS, Xia K, Kyslytysna A, Owoola EO (2022) Tomato leaf disease recognition on leaf
images based on fine-tuned residual neural networks. Plants 11(21):2935. https://doi.org/10.
3390/plants11212935
18. Basavaiah J, Anthony AA (2021) Tomato leaf disease classification using multiple feature
extraction techniques. Wirel Pers Commun 115(1):633–665
Conceptual Framework for Risk
Mitigation and Monitoring in Software
Organizations Based on Artificial
Immune System
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 25
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_3
26 N. Hasib et al.
1 Introduction
The Artificial Immune System (AIS), as shown in the pyramid, is a new field of
artificial intelligence that was created as a result of increased understanding and
research into immune system concepts (see Fig. 1) [12]. Among the many attrac-
tive features of the biological immune system is its ability to remember, classify,
and lessen the impact. In AIS, the basic idea of the biological immune system is
modeled in great detail. Many components of the immune system are used in the
Artificial Immune System. The detector combines the features of T cells, B cells,
and antibodies. A positive correlation exists between the connection likelihood and
the receptor-epitope affinity [34–36]. Memory and plasma cells are produced due
to the biological immune system’s negative (self) and positive (non-self) selection
processes. Clonal selection is adaptive and benefits from a dispersed population
of detectors. An artificial immune system is built upon the theories of immuno-
logical network, clonal selection, and affinity maturation. As part of adjusting the
network of antibodies to train the antigen patterns using the clonal selection theory,
an AIS’s memory cell network is designed to identify the existence of data clusters.
For cloning, the antibody with the highest affinities is selected. In the meantime, the
cloning process incorporates the mutation phase to enhance antigen recognition. The
antibody’s modified clone with the highest affinity is used to select the memory set [5,
12]. Throughout the project, risk is continuously monitored in the software devel-
opment process. To reach their full potential, corporations, agencies, and startups
alike should have a human factors risk monitoring framework for all their software
development projects. Risk management and business information systems seem to
work hand in hand [5]. The interplay between the system and its surroundings gives
rise to risk. The system is designed so that the risk is either eliminated before any
action is taken or, if it does occur, cannot result in an accident. By mitigating risk
during the project development process, project managers in an organization can
produce the highest-quality software product on schedule and within budget. Reduction
and mitigation will increase project success rates, provide an estimated time frame,
and improve the quality of the finished product. Various risk factors can arise
during software development in businesses or organizations. Numerous risk manage-
ment methodologies have been developed; however, some are incomplete or contain
specific failures [12].
Therefore, the research aims to determine how the immune system and risk miti-
gation monitoring are related to addressing these issues [33]. This will provide a fresh
perspective and a revised explanation of how organizations and enterprises manage
risks and develop projects [21–24]. Almost any human endeavor carries some risks,
but some are much riskier than others. In addition to being uncertain, risk can also
able to recognize any antigen. In our model, we aim to retain the single best indi-
vidual rather than a large clone for each candidate solution. The clonal selection
theory states that a clone will be made temporarily and that low-affinity progeny will
be eliminated. In this work, we present a novel framework for the risk mitigation
monitoring system, HRFMMF, based on the algorithmic approach of the Artificial
Immune System based on the Biological Immune System. Therefore, the new defi-
nition of risk management that applies to this study is risk mitigation monitoring
[12–19].
It is hard to coordinate risk identification and monitoring activities in large soft-
ware organizations where several projects are jointly working toward creating a
common value. Communication loops between the projects can be long, hindering
projects from being informed about the interrelated risks across the projects. This
creates an unpleasant situation where the same risk can be mitigated in several
projects, causing unnecessary costs to the development process. Hence, the soft-
ware development process needs an effective risk mitigation monitoring approach
to identify and monitor risk proactively. Since the immune system can learn, memo-
rize, lessen the impact, and self-regulate, it can solve people-related risk factors in
large, medium, and small software organizations during the software development
life cycle [10].
As a general framework for adaptive systems, artificial immune systems have the
potential to be used in numerous domains. Applications of Artificial Immune Systems
include classification problems, optimization tasks, and other fields. Like many
biologically inspired systems, the AIS is distributed, autonomous, and adaptive. The
immune system is attractive because an adaptive pool of antibodies can generate
“intelligent” behavior: if we allow the concentrations of those antibodies that
yield a better match to rise over time, we should eventually obtain a subset of good
matches.
De Castro and Timmis [13] proposed the idea of a framework for AIS. AIS was
developed as a novel computational intelligence approach and can be defined as a
problem-solving technique that integrates immunology. The AIS algorithm is based
on the principles of the vertebrate immune system. Based on the principles proposed
by De Castro and Von Zuben [19], it is considered a basis for constructing the algo-
rithm [1, 2, 20]. The AIS platform involves clonal selection with an affinity maturation
mechanism to retrieve the immune response. The immune system’s robustness, flex-
ibility, learning capacity, and clonal selection are its effective activities, which make
AIS helpful in scheduling issues [4–7]. AIS is inspired by theoretical immunology,
its functions, and principles such as the Clonal Selection Principle, Learning and
Memory via Clonal Selection, Self/Non-self Discrimination, Negative Selection, and
Immune Network. These theories of AIS are based on Computational aspects of the
Fig. 2 Pictorial representation of antigen mitigation monitoring adaptive immune system [4]
link, the more closely an antibody matches a particular antigen. We refer to this
quality as affinity. Large quantities of a particular antibody produced by plasma cells
are directed against and destroy a particular antigen. Memory cells support a quick
secondary response while staying within the host. However, B cell clones are created
and go through somatic hypermutation prior to this procedure. As a result, the B cell
population becomes more diverse. Furthermore, selection pressure suggests that the
cells with higher affinity will survive [8].
Researchers have recently become interested in and inspired to develop algorithms
that evolve candidate solutions through selection, cloning, and mutation processes by
studying clonal selection theory in the immune system. Clonal Selection Algorithms
(CSAs) in their canonical form and variants are applied to a variety of problems
and have been shown to outperform other heuristics (e.g., neural networks, genetic
algorithms) in certain scenarios (e.g., function optimization and pattern recognition).
Despite the growing popularity of CSA studies, the generation- and evolutionary-
operator-based CSA algorithm differs from other evolutionary algorithms in the
following ways, to the best of our knowledge. First, the affinity of an individual
determines its cloning and mutation rates: the cloning rate is proportional to the
affinity, whereas the mutation rate is inversely proportional to it. Second, the
memory cell population gradually saves the best solution from each generation; when
the algorithm finishes, this is returned as the definitive answer. Third, the
population size is dynamically changeable [8, 14–18].
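The three properties above can be sketched as a minimal clonal selection loop: clone counts grow with affinity, mutation steps shrink with affinity, and a memory pool keeps each generation's best. The toy one-dimensional affinity function and all parameter values are illustrative assumptions, not the paper's implementation:

```python
# Minimal clonal-selection sketch: cloning rate proportional to affinity,
# mutation rate inversely proportional to affinity, and a memory pool
# that saves the best solution of each generation.
import random

random.seed(0)  # reproducible mutations

def affinity(x):
    return -abs(x)  # toy problem (our assumption): the optimum sits at x = 0

def clonal_selection(generations=30, clone_factor=5):
    population = [float(v) for v in range(-9, 10, 2)]  # deterministic start
    memory = []
    for _ in range(generations):
        ranked = sorted(population, key=affinity, reverse=True)
        next_pop = []
        for rank, parent in enumerate(ranked):
            n_clones = max(1, clone_factor // (rank + 1))  # high affinity -> more clones
            step = 0.1 * (rank + 1)                        # low affinity -> larger mutations
            clones = [parent + random.gauss(0, step) for _ in range(n_clones)]
            next_pop.append(max(clones + [parent], key=affinity))  # keep best of clones + parent
        memory.append(max(next_pop, key=affinity))  # save this generation's best
        population = next_pop
    return max(memory, key=affinity)  # returned as the definitive answer

best = clonal_selection()
print(round(abs(best), 3))  # distance to the optimum shrinks over generations
```

Because each parent competes against its own clones, the best individual's affinity never decreases from one generation to the next, which is why the memory pool converges toward the optimum.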
Our bodies produce populations of memory B cells, helper T cells, and cytotoxic
T cells in response to an infection. These cells have receptors specific to the antigens
linked to the infectious pathogen. Following the initial immune response, the blood
still contains antibodies that are particular to the antigen. Because of this, determining
whether or not a person has had a specific infection can be done effectively by
looking for antibodies to that antigen. These memory cells are ready to activate
and quickly combat the infection when it resurfaces. The memory helper T cells
start triggering the humoral and cell-mediated immune responses when they contact
their complementary antigen. Memory B cells quickly differentiate into plasma cells,
which secrete antibodies and more memory B cells when they come into contact with
their complementary antigen. This is known as the secondary immune response, and
it is a far stronger and faster reaction than the primary response (Figs. 2 and 3) [36].
3 Findings
Table 1 Comparison of the immune system process with the proposed system of risk mitigation
and monitoring

Process | Primary response | Secondary response
Immune system | Antigen initialization | Antigens and antibodies: similarity measures, selection, and cloning
Proposed risk mitigation monitoring system | Risk initialization | Risk identification, mitigation, and monitoring
The design of the novel human factor risk mitigation monitoring framework is based
on the theoretical concept of the Biological Immune System, which includes the
primary and secondary immune responses; the proposed framework applies AIS-based
algorithms. Risk identification, risk mitigation, and risk monitoring comprise the
three components of the suggested framework.

Based on the ideas of the artificial immune system clonal approach (similarity
measures, selection, and mutation of the initialized antigens), an organization's
managers use various frameworks and methods to maximize productivity. Research has
shown that the immune system and the risk mitigation monitoring approach behave
similarly, which makes this framework a proactive, methodical, and process-based
approach [32, 33]. Consequently, it is highly helpful to create a comprehensive
framework that managers of an organization can use to address the risk posed
by human factors. The outcomes of this study can be used as a guide to create an
effective framework for monitoring human risk factors, with reference to the
proposed risk mitigation monitoring system [4]. Environmental, organizational, and
managerial factors can all be considered human factors. After the project
development process starts, all risk populations are first gathered from previous
projects according to categories. The final algorithm incorporates inputs from
experts, clients, stakeholders, and team members, as well as the solution experience
of previous projects. A threshold value is used to rank the risks associated with
the pooled data as high, medium, or low: a risk valued above the threshold is
defined as critical, and medium- and low-valued risks are those at or below the
threshold (Figs. 4 and 5). Depending on the solutions found, a risk once considered
critical may be reclassified as medium or low and either avoided or accepted. Risks
without a history are compiled and analyzed into the same three categories: high,
medium, and low.
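The threshold-based ranking described above can be sketched in a few lines. The paper specifies only the critical threshold, so the boundary between medium and low (`medium_floor`) is our assumption, as are the risk names and scores:

```python
# Sketch of the threshold-based risk ranking: pooled risk scores above
# the threshold are critical (high); scores at or below it fall into
# medium or low (the medium/low split is an illustrative assumption).

def rank_risks(risks, threshold, medium_floor):
    """Map each risk score to 'high', 'medium', or 'low'."""
    ranked = {}
    for name, score in risks.items():
        if score > threshold:
            ranked[name] = "high"    # critical: strictly above the threshold
        elif score >= medium_floor:
            ranked[name] = "medium"
        else:
            ranked[name] = "low"
    return ranked

# Hypothetical pooled risks from previous projects.
pooled = {"staff turnover": 0.9, "schedule slip": 0.55, "tool mismatch": 0.2}
print(rank_risks(pooled, threshold=0.7, medium_floor=0.4))
# {'staff turnover': 'high', 'schedule slip': 'medium', 'tool mismatch': 'low'}
```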
Then, the latest inputs from experts, teams, clients, stakeholders, and managers
are combined with an extensive data pool of finished projects' solution experience
to document the critical risk mitigation strategies. Next, the immune clonal
strategy is initiated as selection, mutation, and similarity metrics are activated.
Throughout the development process, risk is determined by comparing the degree of
similarity between the documented risk and the solution and by gathering information
about possible solutions. High-match and moderate-match (high- and medium-affinity)
solutions can be resolved quickly because the clonal selection approach has made
many of the best solutions available. For moderate matches (medium-affinity
solutions for same-category risks), the solutions are slightly altered (mutated)
depending on the organization's current state. Once the appropriate answer is found,
the case is sent to primary measures overseen by experts (team, client, expert,
stakeholders, managers), from which solutions in the memorized pool can be searched
for significant matches (low-affinity solutions). Upon completion of the selection
and mutation process by the AIS algorithm, the last revised document of the risk
pool is committed to memory and utilized as the ultimate solution for any future
projects the organization runs (Fig. 4).

Fig. 5 An illustration using a graph shows how the adaptive immune system (CSA approach)
initiates a response once the pathogen crosses a specific threshold (the threshold used in our
framework)
This study could serve as the basis for future research on monitoring techniques for
risk mitigation. Although there are currently known risk management methodolo-
gies, they often fail for a variety of reasons, including executive support gaps, lack
of a specific function to recover or avoid risks, high implementation costs, delayed
responses, inadequate accountability, inability to measure the control environment
qualitatively, infrequent assessment, and inaccurate data. One way to conduct addi-
tional research is to validate and test the suggested risk mitigation monitoring proce-
dure against particular threats in an actual setting. The goal is to improve risk moni-
toring by outlining the principles of risk management, which could increase the
likelihood that the project will succeed. The outcomes are still pending confirmation
through testing in an authentic development context. In subsequent work, we plan to
analyze and validate the proposed framework and apply it in different organizations.
Since this study suggests applying AIS to a software engineering task, we anticipate
more research in this field, allowing it to develop into a mature discipline. The software
industry has made significant progress in human factor excellence by defining and
designing the human factor function to align with the organization’s mission. Human
Conceptual Framework for Risk Mitigation and Monitoring in Software … 35
factor excellence serves as a defining feature that inspires and prepares workers for
the organization’s recognized culture. The different aspects of human factors, such
as staffing, training and development, performance appraisal, and compensation, as
demonstrated and documented, will place the organization on the growth path toward
sustaining excellence [14]. Human factors practice configurations and systems that
are directly aligned with the organization’s strategy will require a paradigm shift
toward human factor excellence in the upcoming years [10, 12, 23–32]. Combining
Artificial Immune Systems with proven techniques to create hybrids is also a
promising way to improve overall performance.
The limitation in this area is that most of the applications of clonal selection
models deal with problems related to optimization. Even though AIS models have
demonstrated impressive performance across various application domains, further
research is still needed to address important theoretical issues like scalability, convergence,
and the development of unified frameworks. AIS models might also be investigated
further and applied to more challenging application areas and real-world
problems. From the human factors perspective, many issues remain. Plans for
future research should look into and describe the findings of this study, offer more
specific suggestions that may be implemented in the workplace, and indicate areas
for development.
References
1. Timmis J, Knight T, De Castro LN, Hart E (2004) An overview of artificial immune systems.
computation in cells and tissues. In: Natural computing series. Springer, pp 51–91. https://doi.
org/10.1007/978-3-662-06369-9_4
2. Dasgupta D (ed) (1999) Artificial immune systems and their applications. Springer
3. Costa Silva G, Dasgupta D (2015) A survey of recent works in artificial immune systems.
In: Handbook on computational intelligence. World Scientific, pp 547–586. https://doi.org/10.
1142/9789814675017_0015
4. Hasib N, Rizvi SWA, Katiyar V (2023) Biological immune system based risk mitigation monitoring system: an analogy. In: International conference on artificial intelligence, blockchain,
computing and security, vol 1, 1st edn. CRC Press. ISBN 9781003393580. https://doi.
org/10.1201/9781003393580-1
5. Joseph Dominic Vijayakumar S, Saravanan M (2016) Artificial immune system algorithm for
optimization of permutation flow shop scheduling problem. A thesis, Anna University. http://
hdl.handle.net, https://doi.org/10.1016/j.proeng.2014.12.436
6. Ulutas BH, Kulturel-Konak S. A review of clonal selection algorithm and its applications. Int
Sci Eng J 117–138. https://doi.org/10.1007/s10462-011-9206-1
7. Al-Enzi JR, Abbod MF, Alsharhan S (2010) Artificial immune systems-models, algorithms
and applications. Int J Res Rev Appl Sci (IRAS), 118–131
8. Autili M, Di Salle A, Gallo F, Perucci A, Tivoli M (2015) Biological immunity and software
resilience: two faces of the same coin? In: Fantechi A, Pelliccione P (eds) Software engineering
for resilient systems. SERENE 2015. Lecture notes in computer science, vol 9274. Springer,
Cham. https://doi.org/10.1007/978-3-319-23129-7_1
9. Catal C, Diri B (2005) Application and benchmarking of artificial immune system to classify
fault-prone modules for software development projects. In: International conference applied
computing, Salamanca, pp 1–5
36 N. Hasib et al.
10. Flouris TG, Yılmaz AK (2010) The risk management framework to strategic human resource
management. Int Res J Financ Econ. (36). ISSN 1450-2887
11. Hasib N, Rizvi SWA, Katiyar V (2023) Artificial immune system: a systematic literature review.
J Theor Appl Inform Technol. 101(4):1469–1486. Little Lion Scientific. ISSN: 1992-8645,
www.scopus.com
12. Hasib N, Rizvi SWA, Katiyar V (2023) Risk mitigation and monitoring challenges in soft-
ware organizations: a morphological analysis. Int J Recent Innov Trends Comput Commun
11(8):172–185. https://doi.org/10.17762/ijritcc.v11i8.7943
13. De Castro L, Timmis J (2002) Artificial immune systems: a new computational intelligence
approach. Springer
14. Reda A, Johanyák ZC (2021) Survey on five nature-inspired optimization algorithms, pp 173–
183. ISSN 2064-8014. https://doi.org/10.47833/2021.1.CSC.001
15. Brownlee J (2005) Artificial immune recognition system (AIRS)—a review and analysis.
Technical Report No. 1-02, pp 1–44
16. Brownlee J (2005) Clonal selection theory & clonalg the clonal selection classification
algorithm (CSCA). Technical report No. 2-02
17. Benhamini E, Coico R, Sunshine G (2000) Immunology—a short course. Wiley-Liss, Inc.,
USA
18. Kimball JW (1983) Introduction to immunology. Macmillan Publishing Co., New York, USA
19. De Castro L, Von Zuben F (2001) The clonal selection algorithm with engineering applications.
Artif Immune Syst 8
20. Aickelin U, Dasgupta D, Gu F (2013) Artificial immune systems. Search methodologies intro-
ductory tutorials in optimization and decision support techniques, pp 187–211. https://doi.org/
10.1007/978-1-4614-6940-7_7
21. Roy B, Dasgupta R (2015) A study on risk management strategies and mapping with SDLC.
In: 2nd international doctoral symposium on applied computation and security systems. https://
doi.org/10.1007/978-81-322-2653-6_9
22. Elzamly A, Hussin B (2016) Quantitative and intelligent risk models in risk management for
constructing software development projects: a review. Int J Softw Eng Its Appl 10:9–20. https://
doi.org/10.14257/ijseia.2016.10.2.02
23. Arunprasad P, Kamalanabhan T (2010) Human resource excellence in the software industry in
India: an exploratory study. Int J Logist Econ Glob 2:316–330. https://doi.org/10.1504/IJLEG.
2010.037519
24. Chiang H, Lin B (2020) A decision model for human resource allocation in project management
of software development. IEEE Access, p 1. https://doi.org/10.1109/ACCESS.2020.2975829
25. Boatman A. HR risk management: a practitioner’s guide
26. Kermani A, Beheshtifar M, Montazery M, Arabpour A (2021) Human resource risk manage-
ment framework and factors influencing it. Propósitosy Representaciones 9. https://doi.org/10.
20511/pyr2021.v9nSPE1.902
27. Mitrofanova A, Konovalova V, Mitrofanova E, Ashurbekov R, Konstantin T (2017) Human
resource risk management in an organization: methodological aspect. https://doi.org/10.2991/
ttiess-17.2017.114
28. Rodgers W, Murray J, Stefanidis A, Degbey WY, Tarba S (2022) An artificial intelligence
algorithmic approach to ethical decision-making in human resource management processes.
Human Resour Manag Rev 33:100925. https://doi.org/10.1016/j.hrmr.2022.100925
29. Popescu S, Santa R, Teleaba F, Ilesan H (2020) A structured framework for identifying risk
sources related to human resources in a 4.0 working environment perspective. Human Syst
Manag 39:511–527. https://doi.org/10.3233/HSM-20105
30. Zhu H (2021) Research on human resource recommendation algorithm based on machine
learning. Sci Program 1–10. https://doi.org/10.1155/2021/8387277
31. Charles J (2017) Analyzing the risk factors in human resource allocation for secure software
development. A thesis, Noorul Islam Centre for Higher Education
32. Aldhaheri S, Alghazzawi D (2020) Artificial Immune systems approaches to secure the Internet
of Things: a systematic review of the literature and recommendations for future research. J Netw
Comput Appl. 1084–8045. https://doi.org/10.1016/j.jnca.2020.102537
33. Sarkheyli A, Ithnin B (2011) Study of the immune system of the human body and its relationship
with risk management in organizations. In: 5th international symposium of advances on science
and technology, SASTech
34. Kuby. Immunology. W.H. Freeman
35. Novotny A. Fundamentals of immunology: innate immunity and B-cell function. A course
authorized by Rice University and offered through Coursera. Coursera.org/verify/8PVNX66M8R8J
36. Rich E, Knight K. Artificial intelligence. McGraw Hill
A Multilevel Home Fire Detection
and Alert System Using Internet
of Things (IoT)
Abstract A home fire alert system can safeguard lives while limiting damage to the
greatest extent possible. However, it will be easier to implement the necessary safety
measures if the fire alert system can assess the threat based on several fire hazard
levels. In this paper, we proposed a multilevel home fire detection and alert system
using the Internet of Things (IoT). This system's goals are to detect multiple
fire characteristic parameters, keep track of them, and send multilevel fire alerts to
the user(s) based on fire hazard levels so that they can take the necessary actions.
We have utilized two different sensors for fire detection. These sensors sense fire
characteristic parameters such as temperature, humidity, and gas levels and send
them to the connected NodeMCU. A system user can view the information through
the LCD, a Smartphone app, and a cloud server. When the temperature and humidity
or gas levels exceed the predetermined threshold values, a buzzer activates, an LED
light switches from green to red, and an alert shows on the Smartphone app and
cloud server. If all of the fire characteristic parameter levels go beyond the threshold
values, then in addition to the previously taken steps, an extreme fire alert will be
transmitted to the fire brigade's email address, and a red-filled circle alert will appear
on the cloud server.
We have designed a prototype to show the effectiveness of the proposed system.
The prototype has undergone planned testing, and the results demonstrate that the
functionalities of the proposed system are operating as expected within a reasonable
response time.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 39
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_4
40 S. A. Jarin et al.
1 Introduction
Fire is one of the most frequent and destructive disasters. It destabilizes the ecosystem,
puts lives in danger, and destroys property. As per a research report published by
the National Fire Protection Association (NFPA), local fire departments in the US
reported over 1.5 million fires in 2022, which led to 3790 civilian fire deaths, 13,250
civilian fire injuries, and an estimated $18 billion in property damage. Although residential
buildings, which include family homes, apartments, and other multifamily
housing, accounted for only 25% of all fires, these fires caused roughly 72% of injuries
and 75% of civilian deaths [1]. Home fire incidents are typically caused by
inappropriate power consumption, gas leaks, disconnected equipment, human error,
etc. Every fire process constantly generates heat and smoke, and a fire will cause
the temperature to increase. Flammable substances chemically react with oxygen to
start flames through combustion. High oxygen content will increase the likelihood
of a fire starting [2]. In light of this, a fire detection and alert system is a primary
safety system for residential buildings, supermalls, restaurants, and so on since it
provides a level of protection that stops an unintentional fire from spreading into an
uncontrollable outbreak.
When a fire event occurs at home and the homeowner is present, the traditional
home fire detection and alert system works perfectly. These systems are less beneficial
if the owner is away from home because, upon detection of a fire, the alert provided by the
system cannot reach the owner. Therefore, researchers have improved these systems
substantially by utilizing wired, wireless, and hybrid technology. The Internet of
Things (IoT), one of these technologies, has recently become more well-known
because of its low cost and ease of creation. IoT refers to actual physical objects
or collections of such objects that include sensors, computing power, software, and
other technological capabilities and are linked to and exchange data with multiple
devices and systems over the Internet or other communications networks [3]. It makes
life easier by automating the processes that are continuously expanding to improve
present protection requirements.
In this paper, we proposed a multilevel home fire detection and alert system
using the Internet of Things (IoT). Our primary goal is to use different sensors
along with their predefined threshold values to identify various fire danger levels
at home. Only if the fire danger level is correctly identified can the system choose
the appropriate alert from the multilevel fire alert notifications and send it
to the correct user(s) (homeowner/fire station) so that they can take the
necessary actions. To achieve these goals, we have utilized two different sensors to
detect the fire characteristics parameters such as temperature, humidity, and gas. The
detected information is received and processed by NodeMCU, and a user can access
this real-time information through the LCD, a Smartphone app, and a cloud server.
When the sensors cross the predefined threshold value limits, the system follows
multilevel alert options such as an LED light changing from green to red, a buzzer
sounds, an alert appearing on the Smartphone app and cloud server, and an email
sent to the fire brigade email address. We have divided these alerts by considering
the detection of temperature and humidity or gas or both crossing threshold values.
NodeMCU is essential for communication between the proposed system’s devices,
transmitting information and alerts over the Internet. We have designed a prototype
to show the effectiveness of the proposed system. The prototype testing and results
show that the proposed system operates effectively in the case of multilevel home fire
detection based on different fire characteristics and sends multilevel alert notifications
successfully to the designated places.
2 Literature Review
The rapid population development and widespread use of electrical equipment are
considered essential facts in the rise of potential fire hazards. It is especially crucial to
gather environmental data efficiently and make accurate predictions about possible
fire incidents to reduce the number of fatalities and property damage caused by fire
incidents [4]. Several researchers worked on IoT-based home fire detection and alert
systems to ensure home safety.
The design of an Arduino-based home fire alarm system with a Global System
for Mobile Communications (GSM) module is proposed [5]. In this system, the two
primary parts of the hardware design are the connections made by the Arduino UNO
with the GSM SIM900A and the Arduino UNO with the LM35 temperature sensor.
When a fire starts within the home, the LM35 will sense the heat. When the temperature
hits 40 °C, it will instantly alert the Arduino regarding the high temperature. Because
of the growing temperature, the Arduino will alert the user about this condition via
the GSM module. An SMS is sent to the user right away to inform the user of the
fire in the house. The presence of the fire will also be alerted on the LCD.
An IoT-based smart home fire detection and alarm system is proposed [6]. Early
fire detection, automatic alarm generation, notification of the fire breakout to the
remote user or fire control station, and fire prevention before the arrival of the fire
brigade are all features of this system. The system uses temperature and gas sensors
to let the Arduino Uno microcontroller sense the environment for the possibility
of fire. When a fire event is detected, the system sounds an alarm, uses a GSM
module to send SMS or phone alerts to mobile numbers recorded within the Arduino
application, and turns on a water sprayer to put out the fire.
An IoT-based fire alarm system has been proposed for home safety [7]. The sensor,
bridge, and center nodes are regarded as the primary nodes in the system. There are
two main functions used in this system. The first function explains fire detection
and alert transmission methods using the primary nodes. The second function in
the system allows the user to request measurements from sensors in real time. The
central node receives the user’s request via SMS and retrieves the real-time value
of the last sensors. It also has an additional function to monitor the detection nodes
and inform the user in case of failure. If a node does not reply, an SMS is sent
to the user to notify him. The authors have continued to deploy such a system for
smart-city applications [8]. In this system, an edge computing-based solution has
3 Proposed Work
In this paper, we proposed a multilevel home fire detection and alert system using the
Internet of Things (IoT). This system detects and monitors parameters related to the
fire, including temperature, humidity, and gas. When fire characteristic parameter
levels surpass the threshold limits, the system generates alerts and sends these alerts
to the users. It can send different alert messages to multiple designated areas based
on the discovered fire characteristics variables and the degree of the fire hazard.
The proposed system model is shown in Fig. 1. To detect the parameters of fire
characteristics such as temperature, humidity, and gas, the DHT11 sensor and MQ2
sensor are used. On the other hand, we have used several output devices such as LCD,
LED light, and buzzer. Both sensors and output devices have a wire connection to the
ESP8266 NodeMCU Wi-Fi module V3. The NodeMCU Wi-Fi module connects the
Smartphone app, the cloud server, and the fire brigade’s email address. NodeMCU
receives fire characteristics parameters from the sensors. It keeps monitoring these
data to determine if a fire event has occurred. The LCD, a Smartphone app, and a cloud
server display this real-time information. When the fire characteristics parameter
surpasses the threshold limits, NodeMCU recognizes the fire incident source and
decides which fire alert has to be triggered.
Both software and hardware requirements are part of the proposed system require-
ments. Figure 2 illustrates the different hardware elements needed for the proposed
system.
Figure 2A shows the DHT11 sensor. It is a simple and inexpensive digital temper-
ature and humidity sensor. It measures the ambient air using a capacitive humidity
sensor and a thermistor, and then a digital signal is output on the data pin. Figure 2B
shows the MQ2 sensor. This type of gas sensor can detect combustible gasses such
as LPG, butane, methane, propane, hydrogen, and smoke. It works by monitoring
variations in resistance across a sensor element in the presence of various gasses
and producing an electrical signal that a microcontroller or other control system can
read and understand. Figure 2C shows the NodeMCU, an open-source firmware and
development board that enables the rapid prototyping and development of IoT applications.
It consists of firmware running on the ESP8266 Wi-Fi SoC from Espressif
Systems and hardware based on the ESP-12 module. The core of it is the ESP8266
Wi-Fi module, which establishes connections between devices to enable Internet
communication. Figure 2D shows the breadboard. It is a rectangular plastic board
with many tiny holes. It can be used to make electronic circuits without soldering.
This medium-sized breadboard is perfect for experimenting with and developing the
Arduino Shield. Figure 2E shows an I2C Liquid Crystal Display (LCD). This LCD
can display 2 × 16 white characters on a blue background, arranged on two lines.
The light-emitting diode (LED), a semiconductor light source that emits light when
current runs through it, is depicted in Fig. 2F. We see a buzzer in Fig. 2G. It is an audio
device that converts an electrical signal into sound and is used as a prompt or
alarm. Figure 2H shows the connecting wire. Because power requires a medium to
flow through, connecting wires allow electricity to flow from one point on a circuit
to the next.
The proposed system utilizes the embedded C programming language and the
Arduino 1.8.15 IDE interface for software design. The Arduino IDE, a free integrated
The NodeMCU receives the detected fire characteristics parameter values from the
DHT11 and MQ2 sensors. The LCD, Smartphone app, and cloud server display the
received data in NodeMCU. The system generates alerts if the fire characteristics
parameter level exceeds the predefined threshold value. The proposed system has
two distinct alerts that we have set: fire alert and extreme fire alert.
(A) Fire Alert: When a fire characteristic parameter, such as temperature, humidity,
or gas level, exceeds the threshold value, an LED light switches from green
to red, and a buzzer sounds in the system. In the above-indicated cases, a fire
detection alert is transmitted to the user’s smartphone’s Blynk app and the
Blynk cloud server. A homeowner can use a fire escape or suppression strategy
to take control of the situation after receiving the fire alert. In the proposed
system, the crossing of a single sensor's threshold value is regarded as
a controllable alert since it indicates a minimal threat. Therefore, the fire
brigade's email account does not receive a fire alert in this case, which helps
avoid wasting the firefighters' time.
(B) Extreme Fire Alert: When temperature, humidity, and gas levels—all of the
parameters that make up a fire—exceed predetermined threshold values in the
system, the green LED light turns red, and the buzzer sounds. In addition, the
system sends a fire detection alert to the Smartphone’s Blynk app and the Blynk
cloud server. We have classified this fire threat level as extreme because all the
received fire characteristic values in NodeMCU exceed the system threshold
value. As a result, a red color-filled circle alert will appear on a Blynk cloud
server, and an extreme fire alert with the home address of the fire event is sent
automatically to the fire brigade’s email address via the Blynk cloud server. The
responsible fire brigade member will investigate the fire event address from the
email and take prompt action to put out the fire and reduce fire damage. Figure 3
shows the proposed work’s operational process.
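The two-level alert decision described above can be summarized in a short sketch. This is an illustrative simulation of the decision logic, not the authors' NodeMCU firmware; the threshold constants are assumptions chosen only to be consistent with the prototype readings reported for Fig. 5.

```python
# Illustrative two-level alert classification (assumed thresholds, not the
# authors' firmware). Humidity FALLS near a flame, so its crossing is a minimum.
TEMP_MAX = 40.0   # degrees C: temperature above this counts as crossed (assumed)
HUM_MIN = 20.0    # %: humidity below this counts as crossed (assumed)
GAS_MAX = 400     # ppm: gas level above this counts as crossed (assumed)

def classify(temp_c, humidity_pct, gas_ppm):
    temp_hum = temp_c > TEMP_MAX and humidity_pct < HUM_MIN
    gas = gas_ppm > GAS_MAX
    if temp_hum and gas:
        # all parameters crossed: notify homeowner AND fire brigade email
        return "EXTREME_FIRE_ALERT"
    if temp_hum or gas:
        # single source crossed: controllable, notify homeowner only
        return "FIRE_ALERT"
    return "NORMAL"

# Readings taken from the prototype trials (Fig. 5B-D):
print(classify(46.70, 11, 330))   # temperature/humidity only -> FIRE_ALERT
print(classify(31.50, 36, 577))   # gas only -> FIRE_ALERT
print(classify(47.40, 14, 581))   # all crossed -> EXTREME_FIRE_ALERT
```

In the real system, the "FIRE_ALERT" branch would drive the LED, buzzer, and Blynk notifications, while the "EXTREME_FIRE_ALERT" branch would additionally trigger the fire brigade email.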
Figure 5 shows the prototype testing scenarios that used the candle flame and a
burning incense stick. The obtained indoor temperature, humidity, and gas values are
displayed in Fig. 5A. When a candle flame is placed in front of the DHT11 sensor, the
temperature increases and the humidity falls, seen in Fig. 5B. A burning incense stick
placed closer to the MQ2 sensor causes the gas level to increase, shown in Fig. 5C.
The temperature and gas levels increase and the humidity level decreases when
the candle and incense stick are placed closer to the sensors at the same time, as shown
in Fig. 5D.
Upon receiving fire characteristics parameters such as temperature, humidity, and
gas levels from sensors, NodeMCU compares the detected values with predetermined
threshold values to determine whether the fire event has occurred. It determines and
applies the appropriate fire alert from the proposed multilevel fire alerts whenever
any one of the fire characteristics parameters, or all of the parameters, exceeds system
threshold values. Figure 6 shows the few received fire detection alerts during proto-
type testing that appeared in the Smartphone Blynk App, the Blynk cloud server, and
the assumed fire brigade’s email account.
Here, we discuss the prototype test results by observing the obtained
temperature, humidity, and gas values and the multiple fire alerts shown
in Figs. 5 and 6.
Fig. 5 (panels) C trial with a burning incense stick, D trial with a candle flame and a burning incense stick
Fig. 6 Fire-detected status and alerts in smartphones, cloud servers, and emails: C red color-filled circle alert, D email alert with the fire incident address
In Fig. 5B, Gas: 330 ppm, Temperature: 46.70 °C, and Humidity:
11% are displayed on the LCD, where only the temperature and humidity levels cross their
threshold values. In Fig. 5C, Gas: 577 ppm, Temperature: 31.50 °C, and Humidity:
36% are displayed on the LCD, where only the gas level crosses its threshold value. In
Fig. 5D, Gas: 581 ppm, Temperature: 47.40 °C, and Humidity: 14% are displayed on the
LCD, where the temperature, humidity, and gas levels all cross their threshold values. By
observing these situations, the proposed system provides the following alerts:
when the temperature and humidity levels or the gas level cross the threshold values, the
system's LED light switches from green to red, as depicted in Fig. 5A and B
or Fig. 5A and C, and the buzzer is activated. In any of these cases, a fire alert
appears on the Smartphone Blynk App and the Blynk cloud server. In this instance,
controlling the situation and putting out the fire is the homeowner’s responsibility
because it is a controllable alert where the threat is minimal. However, in contrast,
if all the fire characteristic parameters cross the threshold values, the LED light
turns from green to red in the system, as depicted in Fig. 5A and D, and the buzzer
gets activated. A fire detection alert appears on the Smartphone Blynk app and the
Blynk cloud server, shown in Fig. 6A and B, respectively. In this instance, it is an
extreme level of danger because all the fire characteristics parameters have crossed
the threshold limits. Therefore, the homeowner and the fire brigade are informed
about the extreme fire alerts in this case. A red color-filled circle alert appears in
the cloud server, as shown in Fig. 6C. Note that the circle remains white in all the
other cases. Moreover, an extreme fire alert is delivered to the fire brigade email
address automatically via the Blynk cloud server, as shown in Fig. 6D. After receiving
this email, the responsible member of the fire brigade can look up the address of the
fire event, enabling them to act swiftly to put out the fire and reduce
fire damage.
In the above-discussed prototype testing, we observed that the temperature and
humidity levels change gradually, the gas level changes in less time, and all the
fire characteristic parameters together take a significant amount of time to change
when both a fire flame and gas are applied at once. To verify this observation, besides
the prototype trial with a candle flame and a burning incense stick, we planned
more trials combining a gas lighter flame and a burning mosquito coil, as well as
a burning matchstick flame and gas from a gas lighter. These trials allow us to observe the
system's performance more closely because the objects used differ in the intensity
of fire and gas. Temperature and humidity tests, gas tests, and combined temperature,
humidity, and gas tests were performed separately using those objects.
We calculated the system response time by adding the time it takes for the sensor
reading to cross the threshold value from its indoor baseline, the processing time of the
acquired values, and the time to send the appropriate alert.
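The response-time bookkeeping just described can be expressed as a small helper. The per-trial component timings below are hypothetical placeholders, not measured values; only the 39 s gas-trial average comes from the text.

```python
# Sketch of the response-time calculation described above:
# total = threshold-crossing time + processing time + alert-sending time.
def response_time(threshold_crossing_s, processing_s, alert_sending_s):
    """Total response time for one trial, in seconds."""
    return threshold_crossing_s + processing_s + alert_sending_s

def average_response_time(trials):
    """Average the per-trial totals across a set of trials."""
    return sum(response_time(*t) for t in trials) / len(trials)

# Hypothetical per-trial components (seconds) for three gas-only trials,
# constructed to match the reported 39 s gas-trial average:
gas_trials = [(31, 2, 6), (32, 2, 5), (30, 3, 6)]
print(average_response_time(gas_trials))  # 39.0
```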
Figure 7 shows the response times of the different trials that used a candle flame and
a burning incense stick, a gas lighter flame and a burning mosquito coil, and a burning
matchstick flame and gas from a gas lighter. On average, the response time of the
temperature and humidity trials is 232 s, that of the gas trials is 39 s, and that of the
combined temperature, humidity, and gas trials is 180 s. This justifies the use of a
multilevel home fire detection and alert system.
5 Conclusion
In this paper, we proposed a multilevel home fire detection and alert system using
the Internet of Things (IoT). We have used DHT11 and MQ2 sensors to detect the
parameters of fire characteristics such as temperature, humidity, and gas. NodeMCU
serves as the system’s core component. It receives the indoor temperature, humidity,
and gas detection levels from the sensors, keeps track of those levels, and then takes
the appropriate action following those levels. Viewing the observed data in the system
is possible via the LCD, Smartphone Blynk app, and Blynk cloud server. When one
of these two sensors’ fire characteristics parameter crosses a threshold value, the
LED light switches from green to red color, the buzzer sounds, and a fire alert is
sent to the user’s Smartphone Blynk app and the Blynk cloud server. In the proposed
system, a fire event is considered an extreme danger level when both sensors’ fire
characteristics parameter levels cross the threshold value. In this case, the LED light
color switches from green to red, the buzzer sounds, and a fire alert is sent to the user’s
Smartphone Blynk app and the Blynk cloud server, similar to the previous situation.
In addition, the system generates an email for extreme fire alerts with a home location
address that transmits to the fire brigade email address, and a red-colored circular
alert appears on the cloud server. Through this email, the responsible fire brigade
member can track the home address where the fire event occurred and
can take the necessary steps to restrain the fire. We have designed a prototype to
show the effectiveness of the proposed system. Through the prototype testing, the
proposed system's functionalities are well justified. The testing demonstrated that the
system detects multilevel fire events from the received sensor values, examines them,
and provides multiple alerts according to the fire risk level, all within a reasonable
response time. Future studies will concentrate on integrating multi-criteria detection
and video image detection approaches to strengthen the proposed system even more.
References
1. Hall S. Fire loss in the United States. National Fire Protection Association
(NFPA) Research. https://www.nfpa.org/education-and-research/research/nfpa-research/fire-
statistical-reports/fire-loss-in-the-united-states. Accessed 28 Dec 2023
2. Piera PJY, Salva JKG (2019) A wireless sensor network for fire detection and alarm system. In:
7th international conference on information and communication technology (ICoICT). IEEE,
pp 1–5
3. Li S, Xu LD, Zhao S (2015) The Internet of things: a survey. Inf Syst Front 17:243–259
4. Ayala P, Cantizano A, Sánchez-Úbeda EF et al (2017) The use of fractional factorial design
for atrium fires prediction. Fire Technol 53:893–916
5. Mahzan NN, Enzai NIM, Zin NM, Noh KSSKM (2018) Design of an Arduino-based home
fire alarm system with GSM module. J Phys: Conf Ser 1019(1):12079
6. Yadav R, Rani P (2020) Sensor based smart fire detection and fire alarm system. In: International
conference on advances in chemical engineering (AdChE)
7. Mahgoub A, Tarrad N, Elsherif R, Al-Ali A, Ismail L (2019) IoT-based fire alarm system.
In: Third world conference on smart trends in systems security and sustainability (WorldS4).
IEEE, pp 162–166
8. Mahgoub A, Tarrad N, Elsherif R, Ismail L, Al-Ali A (2020) Fire alarm system for smart
cities using edge computing. In: International conference on informatics, IoT, and enabling
technologies (ICIoT). IEEE, pp 597–602
9. Gosrani S, Jadhav A, Lekhak K, Chheda D (2019) Fire detection, monitoring and alerting
system based on IoT. Int J Res Eng, Sci Manag 2(4):442–445
10. Saeed F, Paul A, Rehman A, Hong WH, Seo H (2018) IoT-based intelligent modeling of smart
home environment for fire prevention and safety. J Sens Actuator Netw 7(1):11
11. Durani H, Sheth M, Vaghasia M, Kotech S (2018) Smart automated home application using
IoT with Blynk app. In: Second international conference on inventive communication and
computational technologies (ICICCT). IEEE, pp 393–397
Smart Baby Warmer with Integrated
Weight Sensing
Abstract One of the most significant and delicate areas of treatment in the biomed-
ical profession is preterm newborn care. To acclimatize to their new world, preterm
infants need a setting that is identical to the womb. In addition to this, a preterm
newborn baby’s weight is also one of the most important health indicators. A premature infant in an incubator should begin gaining weight a few days after birth, because its average weight is around 1 kg lower than that of a full-term newborn. In this work, we have developed an Arduino-based On/Off control system that regulates the temperature distribution inside the incubator, keeping the baby in a stable and normal state at the target temperature of 36 °C. The incubator can regulate
the surrounding temperature and keep the infant’s body temperature within normal
ranges. The measured temperature will be transmitted through Global System for
Mobile Communication (GSM) technology to the nearest nurse station or caretaker.
Additionally, a load cell has been incorporated to monitor the weight of the baby in
the incubator which is under observation. The proposed system will be useful for the
preterm baby that needs continuous monitoring in the hospital Neonatal Intensive
Care Unit (NICU).
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 53
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_5
54 R. Khanal et al.
1 Introduction
Preterm babies are sensitive, and their development is greatly influenced by their envi-
ronment, which highlights the importance of this study. Babies born before 37 weeks
of pregnancy are known as preterm babies, and they frequently struggle with inad-
equate thermoregulation, making them especially sensitive to temperature changes.
The creation of an intelligent incubator is therefore crucial since it aims to mimic
the ideal conditions seen in the mother and create an atmosphere that is supportive
of these infants’ delicate demands [1].
Technological advances in the last few years have led to unparalleled progress
in the field of neonatal care, particularly in terms of reducing difficulties related
to environmental management. Notably, the environment of neonatal incubators
has changed due to the integration of complex control systems like GSM and the
Internet of Things (IoT) [2]. These technologies provide smooth communication and
intervention tactics in addition to exact monitoring of environmental indicators.
In recent research, an innovative approach to home-based baby care incubation
has been introduced, employing the PIC 16F877A microcontroller and IoT connec-
tivity through Thingspeak. The study incorporates temperature and humidity sensors,
utilizing a proportional–integral–derivative controller (PID) algorithm to regulate a
Peltier crystal assembly for precise temperature control. The system displays real-
time data on a liquid crystal display (LCD) and a web page, accessible via an
Android app, providing users with comprehensive monitoring capabilities. This work
contributes to the evolving landscape of smart incubator systems, emphasizing IoT
integration and advanced control strategies for enhanced infant care [3].
It has been found that, when a PID system is used as the controller, the heater is switched on whenever the measured temperature has not yet reached the set temperature, and off otherwise; once the temperature is stable at the set point, the process is complete [4].
The alarm mechanism comprising a buzzer and GSM module serves to notify
physicians promptly when monitored parameters exceed predefined set points,
emphasizing the system’s real-time responsiveness. This represents a significant advancement in ensuring timely medical interventions and shows the system’s potential to
enhance infant care in home-based environments [5].
In the care of preterm newborn babies, incubators are being used to provide
the infant with a comfortable environment while it remains observable for medical
treatment. An important issue for babies is insufficient thermoregulation. The incu-
bator has a translucent plastic interior and will completely enclose the baby to keep
it warm and maintain normal body temperature (37 °C) [6].
Infants’ bodies are unable to compensate for the thermal loss because of their poor
thermoregulation. This calls for the infant’s body to be in a warm, wet environment.
Smart Baby Warmer with Integrated Weight Sensing 55
As a result, one of the most crucial elements that must be maintained with little
volatility is temperature [7]. To prevent harm to the infant’s body, the temperature
should always be kept at the level the attending physician has prescribed.
Consequently, it is important to regularly check on the body weight of prematurely
born babies [8, 9]. Different body weight tracking devices have been created so far.
Low Birth Weight, <2500 g, and Very Low Birth Weight, <1500 g, are the weight
categories used to describe preterm neonates. Premature labor is the process of giving
birth before the pregnancy has progressed to 37 weeks or <259 days, as measured
from the beginning day of the last menstrual cycle [10].
The IoT is a global network of mechanical and digital objects, people, animals,
and computing equipment that may share data without the need for direct human or
computer involvement. IoT facilitates the automatic improvement
of service quality while decreasing the need for human involvement. It also helps
to improve communication between linked electronic devices, transmit data packets
over connected networks, and access information from anywhere at any time on any
device.
In IoT applications, GSM is now the most popular network technology due to its
accessibility, price, and simplicity. The General Packet Radio Service (GPRS) or
GSM module is a chip or circuit that establishes a connection between a mobile device
and a computer. A GSM modem may be a mobile phone with GSM modem capabil-
ities or it may be a standalone device connected through serial, USB, or Bluetooth.
The GSM module is a component that can be incorporated into the machinery. GSM
uses SIM cards to initiate contact with the network and IMEI numbers, much like
mobile phones, to identify devices.
2 Working Principle
The smart neonatal incubator’s core processing unit is an Arduino UNO. It is in charge
of processing weight data from the load cell via HX711, temperature data from the
LM35 sensor, and heating element management. For real-time communication, the
GSM module and Arduino are interfaced with one another.
2.1 Methodology
1. Start
2. Initialize System Components:
a. Set up an Arduino UNO board.
b. Connect the LM35 temperature sensor to analog pin A0 on Arduino.
c. Connect load cell and HX711 to appropriate pins on Arduino.
d. Integrate the GSM module and establish communication using the Software
Serial library.
e. Connect relay for controlling the heating element.
f. Ensure proper power supply and grounding for all components.
3. Calibration:
a. Calibrate HX711 with a known weight to establish accurate weight measure-
ments.
b. Verify the accuracy of the LM35 temperature sensor readings.
4. Define Constants and Thresholds:
a. Set the target temperature for the incubator (e.g., 36 °C).
b. Establish weight thresholds for normal and abnormal conditions.
c. Define GSM alert messages and recipient contacts.
5. Main Control Loop:
a. Read Temperature:
i. Analog-to-digital conversion of LM35 output using Arduino ADC.
ii. Calculate temperature in degrees Celsius using the provided formula.
b. Weight Monitoring:
i. Read weight data from load cells via HX711.
ii. Convert digital weight data to meaningful measurements.
iii. Compare weight with predefined thresholds for analysis.
c. Temperature Control:
i. Compare current temperature with the target temperature.
ii. If temperature > target temperature:
– Turn off the heating element (relay control).
iii. If temperature < target temperature:
– Turn on the heating element (relay control).
d. GSM Communication:
i. Check if temperature exceeds predefined thresholds.
ii. If yes, send an SMS alert with temperature information.
iii. If weight deviates from the normal range, include weight information in
the SMS.
(Fig. 1: system block diagram, showing the power supply, the LM35 temperature sensor and the load cell with HX711 feeding the Arduino UNO, the relay-driven heater on the AC supply, and the GSM modem sending SMS.)
e. Display Output:
i. Print temperature and weight information on the Arduino serial monitor.
f. Loop Delay:
i. Introduce a delay to control the frequency of sensor readings and actions.
ii. Adjust the delay based on the desired system responsiveness.
6. End.
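The main control loop above (steps 5a through 5d) can be sketched in plain Python. The 36 °C target comes from the paper; the weight thresholds, function name, and alert strings are illustrative assumptions, not the authors' Arduino firmware:

```python
# Sketch of one pass of the main control loop (steps 5a-5d).
TARGET_C = 36.0
WEIGHT_LOW_G, WEIGHT_HIGH_G = 1000.0, 2500.0  # hypothetical "normal" range


def control_step(temp_c, weight_g):
    """Return (heater_on, alerts) for one loop iteration."""
    heater_on = temp_c < TARGET_C          # On/Off control: heat below target
    alerts = []
    if temp_c > TARGET_C:                  # over-temperature -> SMS alert
        alerts.append(f"Temp high: {temp_c:.1f} C")
    if not (WEIGHT_LOW_G <= weight_g <= WEIGHT_HIGH_G):
        alerts.append(f"Weight out of range: {weight_g:.0f} g")
    return heater_on, alerts
```

In the real system the `heater_on` flag would drive the relay and each alert string would become an SMS via the GSM module.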
Figure 1 shows an input portion with a load cell and an LM35, a processing unit with
an Arduino UNO controlling GSM communication and temperature management,
and an output part that shows the state of the heating element, sends SMS alerts, and
displays data on a serial monitor.
Sensor calibration, component initialization, and constant definition are shown in the
flow chart in Fig. 2. After weight and temperature monitoring, heating management,
and GSM communication, the cycle is finished.
(Fig. 2: flow chart; the control decision is “Is T > 36?”.)
The LM35 temperature sensor is known for its high accuracy and provides a
linear output voltage directly proportional to the Celsius temperature, increasing by
10 mV per degree Celsius. Its ease of use, wide operating range (−55 to 150 °C),
low self-heating, and low cost make it a popular choice. With low power consumption
and a linear output, the LM35 is well suited to temperature-controlled systems.
The output voltage can be converted to temperature using the formula:

Vout = 10 mV/°C × T

so that T (°C) = (ADC reading × 5000 mV/1024)/10, where the input voltage is 5 V and the Arduino ADC is 10-bit, i.e., 2^10 = 1024 levels.
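As a minimal sketch in plain Python (rather than the Arduino C code; constant names are ours), the ADC-count-to-temperature conversion implied by the formula is:

```python
# LM35 conversion: 10-bit ADC over a 5 V reference, 10 mV per degree Celsius.
VREF_MV = 5000.0   # 5 V reference expressed in millivolts
ADC_LEVELS = 1024  # 10-bit ADC: 2**10 levels


def lm35_adc_to_celsius(adc_reading):
    """Convert a raw 10-bit ADC reading of the LM35 output to degrees Celsius."""
    vout_mv = adc_reading * VREF_MV / ADC_LEVELS  # ADC counts -> millivolts
    return vout_mv / 10.0                         # 10 mV per degree Celsius
```

For example, a raw reading of 74 corresponds to about 361 mV, i.e., roughly 36.1 °C.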
Load cells are sensors designed to convert force or weight into an electrical signal.
The HX711, on the other hand, is a precision ADC specifically designed to interface
with load cells. It amplifies the small signals generated by the load cell, making them
more suitable for digital processing by a microcontroller. The use of these compo-
nents together simplifies the integration of weight sensing into electronic systems
and provides a digital output for easy processing and display of weight data. Load
cells give high precision and fast response time; they are generally low maintenance
devices and are designed to withstand harsh environmental conditions.
The load cell output is received by the HX711, a 24-bit ADC designed for weigh-scale
applications that also amplifies the load cell signal. It communicates with the
Arduino over a two-wire interface (clock and data). The system will begin calibrating
automatically as soon as the user turns it on. Wait for the signal to place 100 g over
the load cell for calibration, which will be displayed on the serial monitor. Put the
100 g weight above the load cell when it says “Put 100 g,” then wait. The calibration
process will be completed in a few seconds. After calibration, the user can place any
weight over the load cell (up to 5 kg) and obtain the value.
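The calibration step above amounts to computing a scale factor from a tare reading and the known 100 g reference. A minimal sketch (function names are ours, not from the sketch's HX711 library):

```python
# Two-point load cell calibration: tare (zero) reading plus a known reference mass.
def calibration_factor(raw_at_known, raw_at_zero, known_g=100.0):
    """Scale factor in ADC counts per gram, from a tare and a known 100 g load."""
    return (raw_at_known - raw_at_zero) / known_g


def weight_g(raw, raw_at_zero, factor):
    """Convert a raw 24-bit HX711 reading to grams using the calibration factor."""
    return (raw - raw_at_zero) / factor
```

Once the factor is stored, every subsequent raw reading maps directly to grams.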
mySerial.read() reads the incoming data using the software serial port. Serial.write()
prints information to the Arduino serial monitor.
SendMessage() is the name of the function we created in our Arduino code to
send an SMS. We must first switch our GSM module to text mode, which is done by
sending the AT command “AT+CMGF=1” over the SoftwareSerial port.
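The AT exchange for one SMS can be sketched as the command sequence below. AT+CMGF and AT+CMGS are standard GSM text-mode commands; the helper function and recipient number are illustrative, not the authors' code:

```python
CTRL_Z = "\x1a"  # terminates the SMS body in text mode


def sms_command_sequence(number, text):
    """AT command sequence for sending one SMS in text mode."""
    return [
        "AT+CMGF=1",            # select SMS text mode
        f'AT+CMGS="{number}"',  # start a message to the recipient
        text + CTRL_Z,          # message body, terminated by Ctrl+Z
    ]
```

Each string would be written to the GSM module over the software serial port, waiting for the module's prompt between commands.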
In the suggested system, a light bulb is used to indicate the heater turning on and off:
if the bulb glows, the heater can be considered on, and if it does not glow, the heater
can be considered off. This figure shows the connection of the required components
to the Arduino UNO. The LM35 produces an analog output connected to an analog
pin of the Arduino. A common 5 V supply and ground are used for the LM35, HX711,
GSM module, and relay. The relay control pin is connected to a digital pin of the
Arduino, as are the HX711 clock and data lines and the GSM Tx and Rx pins.
Figure 3 shows the circuit connection of every component. Component names
are mentioned in the figure along with the wire connections.
(Fig. 3: circuit photograph showing the Arduino UNO on a breadboard connected to the HX711 and load cell, the relay driving the bulb used as the heater, and the GSM modem.)
The smart neonatal incubator that has been introduced works well. The received
messages on mobile phones, which are prompted by temperature changes, demon-
strate that the system delivers SMS notifications correctly. Real-time monitoring is
ensured by the unambiguous temperature and weight values provided by the serial
monitor output. The system’s capacity to regulate the environment within the intended
temperature range is further confirmed by visual indicators, such as the glowing
or non-glowing bulb that indicates the heater state. This accessible and affordable
option has the potential to enhance the care of preterm newborns, particularly in
environments with limited resources.
Figure 4 shows the SMS received on a mobile phone, sent through the interfaced
GSM modem. When the incubator’s temperature departs from the ideal range,
SMS warnings are triggered, giving medical personnel prompt notice. This function
guarantees timely action in urgent circumstances.
3.2 Output
Figure 5 shows the temperature and weight output in the serial monitor after
completion. Real-time weight and temperature data are displayed on the Arduino
serial monitor, so the conditions of the incubator can be continuously monitored.
Medical professionals can take preemptive action, as they are immediately aware of
any changes in weight or temperature.
3.3 Heater on
Figure 6 shows the light bulb glowing, which indicates the heater turning on under
relay control. It glows when the relay turns off, because the temperature is below
36 °C.
Figure 7 shows the light bulb not glowing, which indicates the heater turning off
under relay control. It does not glow when the relay turns on, because the temperature
is more than 36 °C.
The temperature within the incubator is kept quite near the desired 36 °C.
Accurate temperature management is essential for preterm infants because it supports
their thermoregulation by simulating the womb environment. Based on temperature
data, the device successfully turns on and off the heating element.
The load cell and HX711 are devices that precisely gauge the weight of premature
babies within the incubator. It is critical to regularly measure weight in order to
evaluate the general health and growth of preterm infants. Making educated medical
decisions is facilitated by precise weight measures.
Continuous operation of the system showed stability, with consistent and
trustworthy sensor readings.
Stability is essential to the newborn incubator’s efficient operation. Preterm
newborns’ well-being is guaranteed by the system’s capacity to sustain stable
surroundings.
The design has a strong emphasis on affordability, making it usable in isolated locations.
The system’s affordability increases its potential impact, particularly in areas where
purchasing pricey medical equipment would not be as practical. The capacity to
monitor remotely also improves accessibility to healthcare.
3.5 Analysis
and prognosis for preterm babies by using cutting-edge technologies like Arduino,
IoT, and GSM to build an intelligent incubator with precise temperature control and
weight monitoring.
The project’s contributions include the ability to precisely adjust temperature,
monitor weight, and use GSM technology for real-time communication. The project
successfully implements an On/Off control system using Arduino, ensuring that the
incubator maintains a consistent temperature of 36 °C and provides preterm infants
with a stable and nurturing environment. Weight is a key parameter to monitor for
infants, and the system’s load cell and HX711 help to monitor it. The system also
has potential for remote, real-time monitoring of infants, which is achieved using
GSM technology. This helps in areas where immediate access
to medical professionals may be limited. The noteworthy results confirm how well
these features work to improve preterm newborn care and outcomes, establishing the
smart neonatal incubator as a useful innovation in neonatal healthcare.
Our integrated temperature control method is in line with accepted procedures
when compared to other studies. But what sets this study apart from others is the
use of GSM for remote warnings, which adds another level of responsiveness. The
study’s use of HX711 load cells improves the accuracy of the weight measurements.
This sophisticated method enhances the precision of health evaluations.
The smart neonatal incubator is designed and the desired output is achieved
by using Arduino UNO as the computational medium where all the computation
to control the heater and send SMS is done via connecting different sensors and
components.
Weight is one of the important parameters that should be frequently measured
for preterm infants. So, integrating a weight measurement system in an incubator
makes it easier to treat preterm infants. The SMS is received successfully: an SMS
is sent only when the temperature rises above the given threshold or falls below the
normal range, and it includes the weight of the preterm baby. Incubators available
in the market are expensive, whereas this system is affordable for use in remote
areas, and the doctor can also save time by monitoring remotely.
The system has great scope in the medical field and a significant impact on saving
the lives of premature babies from life-threatening conditions, since the physician
can monitor it continuously to keep the temperature under control. The system can
be further improved by adding a voltage control system for the heater alongside the
On/Off system, after which it can be used in hospitals where less attendant time from
the doctor is needed. The SIM-based SMS modem also has various other applications,
such as security and sensor monitoring. The system is easy to implement because it
uses few components and they are compact in size.
4 Conclusion
The study concluded with the successful development of a smart newborn incu-
bator that includes Arduino-based temperature management using the LM35 sensor,
weight monitoring using a load cell with the HX711, and real-time communication
via GSM technology. The incubator’s temperature was effectively regulated by the system, providing
preterm babies with a steady and comfortable environment. By including weight
tracking, a vital health indicator was made available, which facilitated the prompt
evaluation of the baby’s development. By alerting interested parties in the event
of temperature variations, real-time SMS alerts significantly improved the system’s
usefulness. Beyond its technological accomplishments, the idea has ramifications
because the suggested system is affordable and can be installed in remote locations
where conventional incubators might not be financially feasible. Furthermore, the
ability to reduce the amount of time that medical staff members must spend on rounds
by using remote monitoring highlights how useful and effective the smart neonatal
incubator is. This research has the potential to save lives and improve the health of
premature infants by upgrading neonatal care practices, particularly in regions with
low resources. Subsequent improvements, like adding a voltage control system, can
improve the system even more and make it suitable for general usage in medical
facilities like hospitals.
References
1. Feki E, Zermani MA, Mami A (2017) GPC temperature control of a simulation model infant-
incubator and practice with Arduino board. Int J Adv Comput Sci Appl 8(6):46–59. https://
doi.org/10.14569/ijacsa.2017.080607
2. Kale AW, Raghuvanshi AH, Narule PS, Gawatre PS, Surwade SB (2018) Arduino based baby
incubator using GSM technology, 462–465
3. Kumar Singh A, Leela M, Jeevitha R, Mirudhularani R, Vigneswari S (2023) Incubator for
home-based baby care using IoT. J Biomed Eng Technol 10(1):1–7. https://doi.org/10.12691/
jbet-10-1-1
4. Maghfiroh AM, Amrinsani F, Firmansyah RM, Misra S (2022) Infant warmer with digital
scales for auto adjustment PID control parameters. J Teknokes 15(2):117–123. https://doi.org/
10.35882/jteknokes.v15i2.246
5. Nidhi M, Divyang YA, Prof DV, Bhensdadiya BS (2016) Embedded system for monitoring
and control of baby incubator and warmer with local and remote access features. Int J Sci Res
Dev 4(09):299–304
6. Kshirsgar P, More V, Hendre V, Chippalkatti P (2020) IOT based baby incubator for clinic. In:
ICCCE 2019: proceedings of the 2nd international conference on communications and cyber
physical engineering, pp 349–355
7. Tisa TA, Nisha ZA, Kiber MA (2013) Design of an enhanced temperature control system for
neonatal incubator. Bangladesh J Med Phys 5(1):53–61. https://doi.org/10.3329/bjmp.v5i1.
14668
8. Widianto A, Nurfitri I, Mahatidana P, Abuzairi T, Poespawati NR, Purnamaningsih RW (2018)
Weight monitoring system for newborn incubator application. AIP Conf Proc 1933. https://doi.
org/10.1063/1.5023983
9. Widianto A et al (2018) The effect of moving load on remote weight monitoring system
for simple infant incubator. In: 2017 international conference on broadband communication,
wireless sensors powering, BCWSP 2017, vol 2018-January, no. November, pp 1–4. https://
doi.org/10.1109/BCWSP.2017.8272572
10. Irmansyah M, Madona E, Nasution A (2019) Design and application of portable heart rate
and weight measuring tools for premature baby with microcontroller base. Int J Geomate
17(61):195–201. https://doi.org/10.21660/2019.61.ICEE12
A Robust Multi-head
Self-attention-Based Framework
for Melanoma Detection
Abstract Melanoma has the potential to spread to several body areas if it is not
found in time, which makes it one of the world’s most serious illnesses. Of all
skin tumors, melanoma is one of the most deadly and quickly spreading condi-
tions. Recently, a lot of research has been focused on convolutional neural networks
(CNNs), which comprise the majority of deep learning methods, for their ability
to detect skin malignancies in nearly identical images. With the development of
Artificial Intelligence (AI) systems with Deep Learning and Machine Learning, the
healthcare system now has impressive automation and cutting-edge options. AI-
driven automated diagnosis tools help the medical field identify the illness they are
treating. The suggested strategy is to detect melanoma early, at the image stage, in
order to stop the spread of the disease. The suggested technique uses a multi-head
self-attention-based transformer architecture to extract more pertinent information
from melanoma images. The model was made more robust and generalized in the
proposed study through data augmentation, enabling deployment in real-time appli-
cations. For the PH2 dataset, the proposed multi-head attention-based technique
obtained an outstanding accuracy of 99.11%.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 69
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_6
70 R. Patel et al.
1 Introduction
across several categories. These datasets have been crucial for training deep learning
models for image categorization applications. Large amounts of labeled data can
improve the generalization and development of representations in deep learning
models, which improves classification performance. Deep learning models, which
feature deep architectures, have a substantially larger model capacity than traditional
machine learning models. Deep learning may benefit from transfer learning, in which
knowledge from a large dataset is applied to a new, smaller dataset or specific image
classification task [7].
2 Literature Review
A real-time mobile phone application for recognizing skin cancer was presented
in a study by Taufiq et al. [8]. Area, border, and irregularity were
among the texture features extracted, and this texture information was
then fed into an SVM for classification. Alfred and Khelifi [9] went a step
further by extracting textural and other features to characterize skin lesions. They
concluded that a Histogram of Gradients (HOG) and a Histogram of Lines
(HL) are better suited for analyzing and classifying skin lesions in dermoscopic
images. Asymmetry, Border, Color, and Diameter (ABCD) were the criteria used by
Alquran et al. [10].
A principal component analysis (PCA) approach was used to select the most
discriminative of these features. Finally, an SVM classifier was used
to determine whether a lesion was malignant. There are four essential phases in
the skin cancer (SC) classification pipeline: (1) pre-processing, (2) segmentation, (3) feature extraction,
and (4) classification, as stated by Victor et al. [11]. Four distinct classifiers, namely
Decision Tree (DT), K-Nearest Neighbor (KNN), Boosted Tree (BT),
and SVM, were used for the performance evaluation. Javed et al. put forth a
statistical histogram-based strategy for SC classification [12]. CCV techniques
require optimal hand-crafted feature extraction and classification, a crucial step that
must be determined precisely. Furthermore, a computer-aided
diagnosis framework will not be readily deployed in practical circumstances
due to the limited performance (Accuracy, Precision, Recall, Sensitivity, and
Specificity) of the CCV-based approach.
A successful DL-based technique for the classification of SCs was recently
developed by Haug et al. [12]. They fine-tuned two
pre-trained models, Dense-Net and Efficient-Net. They used the HAM10000
dataset for the experimental assessment and reached an accuracy of 85.8%. A key
goal of this effort was making it possible to adapt the approach to inexpensive
devices, such as mobile phones. A CNN-based approach to multi-class SC
classification was suggested by Carcagn et al. [13]. They adapted the Dense-Net CNN
architecture to the problem, and an SVM classifier was used to make the final
classification. They used the HAM10000 dataset in their tests and achieved a 90% accuracy
rate. However, their approach seemed to perform significantly better on modified class datasets.
An ensemble of DL-based models for multi-class SC classification was put forward in
[14]. The authors used five pre-trained deep learning models, namely MobileNetV2,
Inception-ResNetV2, DenseNet201, InceptionV3, and GoogLeNet, and fine-tuned
them for the problem. They also used a hierarchy of classifiers alongside a basic
classifier to categorize the data. The experiments were conducted on HAM10000, and
they achieved a precision of 87.7%. Dense-Net models demonstrated excellent
performance in testing and can be useful on rebalanced datasets. Mohammed et al.
[15] presented a multi-class SC classification method based on DL. This study presented
a two-stage framework to train models at all strongly linked levels and
address the problem of an unbalanced dataset. The second stage used two pre-trained
classification models, MobileNet and DenseNet121. The well-prepared training data
allowed them to achieve 92.7% accuracy on the HAM10000 dataset. The
suggested paradigm may be used in mobile apps. Chaturvedi et al. [16] presented
a DL-based method for multi-class SC classification. The input images are first
normalized and resized to fit the DL models. Features are then extracted and
classified using a total of five different pre-trained models. On the well-known
HAM10000 dataset, the accuracy was calculated to be 92.83%. One of the main goals
of this work is to combine several DL models so that outcomes can be analyzed
better by fusing information from different DL models. Almaraz-Damian et al.
put forward a fused system for SC classification [17] based on dermoscopic images.
In the first stage, they combined notable clinical features, namely Asymmetry,
Border, Color, and Diameter (ABCD), to evaluate findings more accurately alongside
hand-crafted features. In the subsequent stage, the DL-based features were extracted
and fused with those of the first stage. Classification was carried out using a support
vector machine (SVM) classifier, which achieved 92.4% accuracy on the ISBI2018
dataset.
A Robust Multi-head Self-attention-Based Framework for Melanoma … 73
3 Methodology
Fig. 2 Multi head self-attention based transformer architecture for melanoma detection
Multi-head self-attention is the primary core mechanism for the vision transformer.
A multi-head self-attention network with various combinational properties learns the
positional embedding matrix. The model may concentrate on many aspects of the
image simultaneously, allowing each head to calculate each attribute independently.
In the proposed transformer’s eight layers, four self-attention models were incor-
porated. Vision transformers can extract pertinent characteristics from melanoma
pictures by using these attention heads, which can focus on various areas of the
image and generate multiple vector representations. Eq. (1) represents the computa-
tional matrix from each head in the proposed research, which uses n = 4 to represent
four self-attention modules with random initialization of Query (Q), Key (K), and
Value (V). The ultimate attention matrix is formulated as Eq. (2) [18].
HEADi = Att(Q·WiQ, K·WiK, V·WiV) (1)

MultiHead(Q, K, V) = Concat(HEAD1, . . ., HEADn)·WO (2)
In contrast to CNN, the self-attention layer captures all of the information and
traits from the whole input sequence. The fundamental principle underlying self-
attention is assessing how closely one thing in a chain of things ties to the other
things. Two fundamental parts, a feed-forward network and a self-attention module,
comprise a single transformer layer. An extra weight matrix is used to normalize
the output of the multi-head self-attention layer, which aids in normalizing it for the
feed-forward layer. The transformer encoder block’s ultimate output with soft-max
activation can be calculated as Eq. (3) [19].
Atti = softmax(QK^T/√dq) ∗ V. (3)
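Equations (1) and (3) can be illustrated with a minimal NumPy sketch; this is an assumed reference implementation of scaled dot-product attention, not the authors' TensorFlow/Keras code:

```python
import numpy as np


def attention(Q, K, V):
    """Scaled dot-product attention of Eq. (3): softmax(Q K^T / sqrt(d_q)) V."""
    d_q = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_q)
    # numerically stable row-wise softmax
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V


def head(Q, K, V, Wq, Wk, Wv):
    """One attention head of Eq. (1), using learned projections W^Q, W^K, W^V."""
    return attention(Q @ Wq, K @ Wk, V @ Wv)
```

In a multi-head layer, several such heads run with independently initialized projection matrices and their outputs are concatenated, as in Eq. (2).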
3.3 Classification
The MLP classifier receives the transformer module’s output. The multilayer percep-
tron network is one of the most often used artificial neural networks for classification.
The proposed method uses an MLP classifier with four hidden layers of variable sizes
to conduct classification after the multi-head transformer network. The suggested
technique leverages the MLP for its abilities: (i) it can learn nonlinear, complex
mappings; (ii) it can improve the generalization capacity of neural networks; and
(iii) it can learn autonomously regardless of the size of the input variable. In the
MLP, every hidden layer is fully connected to the one above it.
An input layer, hidden layers, and an output layer make up the MLP. The proposed
approach has [18] following FFNN for the classification purpose. Finally, the output
layer f with activation σ was calculated with Feed Forward Neural Network (FFNN)
as Eq. (4).
FFNN(f) = σ(w_l · a_l + b_l). (4)
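As a minimal sketch of Eq. (4), the forward pass of such an MLP head can be written in NumPy. The four hidden-layer sizes, ReLU activation, and weight scale are assumed for illustration; the paper does not specify the exact configuration:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(f, hidden_sizes=(256, 128, 64, 32), n_classes=2, seed=0):
    """Eq. (4) applied layer by layer: a_{l+1} = sigma(w_l a_l + b_l),
    over four hidden layers of variable size, then a softmax output."""
    rng = np.random.default_rng(seed)
    a = f
    for size in hidden_sizes:
        W = 0.02 * rng.standard_normal((a.shape[-1], size))
        b = np.zeros(size)
        a = relu(a @ W + b)
    W_out = 0.02 * rng.standard_normal((a.shape[-1], n_classes))
    return softmax(a @ W_out)

features = np.random.default_rng(1).standard_normal((1, 64))  # transformer output
probs = mlp_forward(features)  # (1, 2): melanoma vs. non-melanoma
```

The softmax row sums to 1, so the two outputs can be read directly as class probabilities for the binary lesion decision.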
4 Experimental Results
In this section, the authors describe the setup for the experiments and the augmented datasets used to simulate the proposed technique. Additionally, the authors compare the proposed design against state-of-the-art melanoma identification models.
4.1 Dataset
Figure 3 shows sample images from the ISIC 2017 dataset. The proposed methodology is simulated using the ISIC 2017 [19] and PH2 [20] standard datasets for binary classification of skin lesions into melanoma and non-melanoma. Figure 2 illustrates samples of melanoma and non-melanoma from a targeted dataset.
See Fig. 4.
76 R. Patel et al.
Melanoma detection was performed on a dataset of skin lesion images using the proposed multi-head transformer design. The proposed technique used an additional dataset for model training and testing to make the model more generic. The transformer model was created using Python 3.9, TensorFlow, and the Keras package. Simulation and comparative analysis were conducted on a GeForce RTX 3080 with a Core i7 CPU and 32 GB of RAM. Precision, Recall, F1-Score, and Accuracy, as defined in Eqs. (5) through (8) [21], were used to evaluate the proposed approach. With four self-attention heads spread across eight layers of a transformer encoder, the proposed multi-head attention-based technique achieved 99.11% validation accuracy. The accuracy and loss curves for the proposed multi-head attention learning are shown in Fig. 4.
Precision = TP / (TP + FP) (5)

Recall = TP / (TP + FN) (6)

F1-Score = 2 · (Precision · Recall) / (Precision + Recall) (7)

Accuracy = (TP + TN) / (TP + TN + FP + FN). (8)
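Eqs. (5)–(8) follow directly from the binary confusion-matrix counts. A small helper, with made-up counts purely for illustration, might look like:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Eqs. (5)-(8) from binary confusion-matrix counts."""
    precision = tp / (tp + fp)                           # Eq. (5)
    recall = tp / (tp + fn)                              # Eq. (6)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (7)
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Eq. (8)
    return precision, recall, f1, accuracy

# hypothetical counts, not the paper's results
p, r, f1, acc = classification_metrics(tp=95, tn=90, fp=5, fn=10)
```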
The authors also simulated the proposed approach with various train-test split ratios and found that a test fraction of 0.2 (an 80:20 split) yielded the highest accuracy. Table 1 shows the results for the various train-test split ratios. Additionally, the authors experimented with other classifiers, namely the Support Vector Machine (SVM) [22], Linear Regression (LR) [23], Decision Tree (DT) [24], and MLP; the results are demonstrated in
Table 2. The authors have also simulated other state-of-the-art convolutional models and compared them with the proposed attention-based architecture, analyzing it against standard deep learning models such as CNN [25], VGG16 [26], and ResNet34 [27]. The comparative analysis is represented in Fig. 5, while Fig. 6 demonstrates the prediction accuracy of the proposed model over unseen data with a confusion matrix. Figure 7 shows the confusion matrices for melanoma detection using different deep learning algorithms, namely CNN, VGG16, and the proposed methodology.
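The train-test split comparison above can be sketched with a simple shuffle-and-slice helper. This is an illustrative stand-in, not the split routine the authors actually used:

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle and split; test_fraction=0.2 gives the 80:20 ratio
    reported as most accurate in Table 1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(1000).reshape(500, 2)   # 500 dummy samples
y = np.arange(500) % 2
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_fraction=0.2)
```

Varying `test_fraction` (0.1, 0.2, 0.3, …) and retraining reproduces the kind of ratio comparison summarized in Table 1.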
Fig. 6 Comparative analysis of proposed study over state-of-the-art deep learning models
Fig. 7 Comparative analysis of Confusion matrix of proposed methodology over CNN & VGG16
4.5 Discussion
5 Conclusion
Skin cancer is one of the most common cancer types overall. Melanoma is one of its most dangerous forms, and it must be possible to identify and diagnose it at an early stage. Advances in deep learning, particularly in computer vision models, help to identify such illnesses early. The proposed study improves feature learning through an attention mechanism, using four self-attention modules to extract the most important characteristics for melanoma diagnosis. The authors applied data augmentation to increase the model's applicability in real-time scenarios, and simulated the ISIC 2017 and PH2 datasets with several machine learning classifiers. The authors analyzed the proposed model against state-of-the-art deep learning models for determining the presence of melanoma and achieved a remarkable 99.11% accuracy.
References
1. Bibi A et al (2021) Skin lesion segmentation and classification using conventional and deep
learning based framework. Comput Mater Contin 71(2):2477–2495. https://doi.org/10.32604/
cmc.2022.018917
2. Razzak I, Naz S (2022) Unit-Vise: deep shallow unit-vise residual neural networks with transi-
tion layer for expert level skin cancer classification. IEEE/ACM Trans Comput Biol Bioinform
19(2):1225–1234. https://doi.org/10.1109/TCBB.2020.3039358
3. Afza F, Sharif M, Khan MA, Tariq U, Yong H-S, Cha J (2022) Multi-class skin lesion classifi-
cation using hybrid deep features selection and extreme learning machine. Sensors 22(3), Art.
no. 3, January 2022. https://doi.org/10.3390/s22030799
4. Khan MA, Muhammad K, Sharif M, Akram T, de Albuquerque VHC (2021) Multi-class
skin lesion detection and classification via teledermatology. IEEE J Biomed Health Inform
25(12):4267–4275. https://doi.org/10.1109/JBHI.2021.3067789
5. Kothadiya D, Bhatt C, Soni D, Gadhe K, Patel S, Bruno A, Mazzeo PL (2023) Enhancing
fingerprint liveness detection accuracy using deep learning: a comprehensive study and novel
approach. J Imaging 9(8):158
6. Nayak DR, Dash R, Majhi B (2020) Automated diagnosis of multi-class brain abnormalities
using MRI images: a deep convolutional neural network based method. Pattern Recognit Lett
138:385–391. https://doi.org/10.1016/j.patrec.2020.04.018
7. Khan M, Akram T, Sharif M, Kadry S, Nam Y (2021) Computer decision support system for
skin cancer localization and classification. Comput Mater Contin 68(1):1041–1064. https://doi.
org/10.32604/cmc.2021.016307
8. Taufiq MA, Hameed N, Anjum A, Hameed F (2017) m-Skin doctor: a mobile enabled system
for early melanoma skin cancer detection using support vector machine. In: Giokas K, Bokor L,
Hopfgartner F (eds) eHealth 360°. Lecture notes of the institute for computer sciences, social
informatics and telecommunications engineering, vol 181. Springer International Publishing,
Cham, pp 468–475. https://doi.org/10.1007/978-3-319-49655-9_57
9. Alfed N, Khelifi F (2017) Bagged textural and color features for melanoma skin cancer detection
in dermoscopic and standard images. Expert Syst Appl 90:101–110. https://doi.org/10.1016/j.
eswa.2017.08.010
10. Alquran H et al (2017) The Melanoma skin cancer detection and classification using support
vector machine. https://doi.org/10.1109/AEECT.2017.8257738
11. Victor A, Ghalib M (2017) Automatic detection and classification of skin cancer. Int J Intell
Eng Syst 10:444–451. https://doi.org/10.22266/ijies2017.0630.50
12. Huang H-W, Hsu BW-Y, Lee C-H, Tseng VS (2021) Development of a light-weight deep
learning model for cloud applications and remote diagnosis of skin cancers. J Dermatol
48(3):310–316. https://doi.org/10.1111/1346-8138.15683
13. Carcagnì P et al (2019) Classification of skin lesions by combining multilevel learnings in a
DenseNet architecture, pp 335–344. https://doi.org/10.1007/978-3-030-30642-7_30
14. Thurnhofer-Hemsi K, Domínguez E (2021) A convolutional neural network framework for
accurate skin cancer detection. Neural Process Lett 53(5):3073–3093. https://doi.org/10.1007/
s11063-020-10364-y
15. Mohamed EH, El-Behaidy WH (2019) Enhanced skin lesions classification using deep convo-
lutional networks. In: 2019 ninth international conference on intelligent computing and infor-
mation systems (ICICIS), December 2019, pp 180–188. https://doi.org/10.1109/ICICIS46948.
2019.9014823
16. Chaturvedi SS, Tembhurne JV, Diwan T (2020) A multi-class skin cancer classification using
deep convolutional neural networks. Multimed Tools Appl 79(39):28477–28498. https://doi.
org/10.1007/s11042-020-09388-2
17. Almaraz-Damian J-A, Ponomaryov V, Sadovnychiy S, Castillejos-Fernandez H (2020)
Melanoma and nevus skin lesion classification using handcraft and deep learning feature fusion
via mutual information measures. Entropy 22(4), Art. no. 4, April 2020. https://doi.org/10.3390/
e22040484
18. Kothadiya D, Bhatt C, Saba T, Rehman A (2023) SIGNFORMER: deepvision transformer for
sign language recognition. In: IEEE access, vol PP, pp 1–1, January 2023. https://doi.org/10.
1109/ACCESS.2022.3231130
19. Vaswani A et al (2017) Attention is all you need. December 5, 2017. ArXiv: https://doi.org/
10.48550/arXiv.1706.03762
20. Gajera HK, Nayak DR, Zaveri MA (2023) A comprehensive analysis of dermoscopy images
for melanoma detection via deep CNN features. Biomed Signal Process Control 79:104186.
https://doi.org/10.1016/j.bspc.2022.104186
21. Berseth M (2017) ISIC 2017—Skin lesion analysis towards melanoma detection, March 1,
2017. ArXiv: https://doi.org/10.48550/arXiv.1703.00523
22. Mendonça T, Ferreira PM, Marques JS, Marcal ARS, Rozeira J (2013) PH2—a dermoscopic
image database for research and benchmarking. In: 2013 35th annual international conference
of the IEEE engineering in medicine and biology society (EMBC), July 2013, pp 5437–5440.
https://doi.org/10.1109/EMBC.2013.6610779
23. Kothadiya DR, Bhatt CM, Rehman A, Alamri FS, Saba T (2023) SignExplainer: an explainable
ai-enabled framework for sign language recognition with ensemble learning. IEEE Access
11:47410–47419. https://doi.org/10.1109/ACCESS.2023.3274851
24. Kothadiya D, Bhatt C, Sapariya K, Patel K, Gil-González A-B, Corchado JM (2022) Deepsign:
sign language detection and recognition using deep learning. Electronics 11(11), Art. no. 11,
January 2022. https://doi.org/10.3390/electronics11111780
25. Mahmood T, Li J, Pei Y, Akhtar F, Rehman MU, Wasti SH (2022) Breast lesions classifications
of mammographic images using a deep convolutional neural network-based approach. PLoS
One 17(1):e0263126. https://doi.org/10.1371/journal.pone.0263126
26. Alwakid G, Gouda W, Humayun M, Sama NU (2022) Melanoma detection using deep learning-
based classifications. Healthcare 10(12):2481. https://doi.org/10.3390/healthcare10122481
27. Kothadiya D, Rehman A, Abbas S, Alamri FS, Saba T (2023) Attention-based deep learning
framework to recognize diabetes disease from cellular retinal images. Biochem Cell Biol 101(6)
Domain Knowledge Based Multi-CNN
Approach for Dynamic and Personalized
Video Summarization
Abstract In this paper, we present the Multi-CNN approach for dynamic and
personalized Video Summarization. The proposed approach is grounded on Cricket
Sport domain knowledge to learn complex and domain features. The personalized
video summary is based on individual user preferences and is dynamic (dynamic
summary). The considerations of individual user preference, domain knowledge,
dynamic content, and Cricket sport make the work one of its kind. The proposed
Multi-CNN architecture entails two levels, CNN Level-1 and CNN Level-2. We
present domain activity-based video segmentation through CNN Level-1 to generate
dynamic video segments. The video segments are then forwarded to CNN Level-2,
which includes a stacked organization of two models (Umpire detection and umpire
pose recognition) to label the video segments. The individual user preference is
matched with labeled video segments for key segment identification. We also propose
two novel summary evaluation metrics based on individual user reactions. The results
indicate the promising performance of the proposed system and provide significant
insights for dynamic and personalized video summarization.
1 Introduction
Video summarization converts an original raw video into a compact and informative variant, referred to as a video summary. The perpetual growth of high-volume video data at exponentially high velocity brings time and space constraints. Video summarization addresses these constraints by generating a time- and space-efficient summary of the video that reflects high value (useful and requirement-specific content). Personalized video summarization focuses on the individual
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 81
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_7
82 P. Narwal et al.
user preferences to select the representative content for summary generation. Consid-
ering the subjective nature of individual preferences, the summary expectations may
be different for each individual. Thus, accounting for individual user preferences to
hold the subjective expectation towards the summary is pivotal for personalized video
summarization. The personalized video summarization process involves the following sequential procedure: (a) Input: the raw input video and user preferences are provided; (b) Video Segmentation: the input video is divided into video segments; (c) Key Segment Selection: the segments of interest, i.e., the video segments representing the user preferences, are selected; and (d) Personalized Video Summary: the selected video segments are combined in a non-overlapping sequential manner to create a personalized video summary based on user preferences. The acquisition of user
personalized video summary based on user preferences. The acquisition of user
preferences may include user preference profile history, click activity, manual user
selection, audio input, keyword input, query image, physiological responses, behav-
ioral responses, and biological responses (EEG, BP, FMRI, etc.). These user prefer-
ences establish the grounds for key segment selection. The selected key segments,
i.e., representing the user preferences, are then combined to generate a personalized
video summary.
This paper presents a personalized and dynamic video summarization approach for
the Cricket Sports Domain. We propose a Multi-Convolution Neural Network (Multi-
CNN) architecture to capture domain knowledge and generate user preference-based
video summaries. We also present two novel video summary evaluation metrics based
on user reactions to evaluate the performance of personalized video summarization.
The motivation for this work arises from gaps in existing video summarization research. The first is the non-availability of dynamic video summaries: most existing works in personalized video summarization generate a static summary (a collection of keyframes) and thus discard audio and motion features. This paper generates a dynamic video summary that captures salient information, including visual (frame), audio, motion, and continuity features. The second motivation targets domain knowledge adaptation
for video summarization. There exist some video summarization works for sports
domains, including football, basketball, baseball, soccer, fencing, table tennis, tennis,
and rugby. Despite cricket being the second most popular sport in the world, there
are no significant contributions to video summarization in the Cricket domain. In
this paper, we present cricket domain knowledge-based video segmentation and
video summarization approach, making the paper one of a kind. The significant
contributions of this paper are highlighted as follows:
1. User preference-based personalized and dynamic video summarization: This
paper presents the design and development of video summarization systems
based on individual user preferences to generate personalized and dynamic video
summaries.
2. Multi-CNN architecture for Domain knowledge-based video segmentation and
video summarization: This work proposes Multi-CNN architecture to target
Cricket domain knowledge for effective video segmentation and video summa-
rization of Cricket sport videos.
Domain Knowledge Based Multi-CNN Approach for Dynamic … 83
2 Related Works
Video Summarization has been explored across diverse domains related to commer-
cial, education, security, and entertainment applications. A detailed taxonomy and
understanding of video summarization across different criteria and applications is
given by Narwal et al. [1].
Dynamic Video Summarization considers video segments as the basic unit for summary generation. The dynamic summary generated in this way holds visual information (video frames), audio information, and continuity-motion information. Vivekraj et al. [2] present a comprehensive survey of video skimming (dynamic video summarization). Various research works have targeted the generation of dynamic video summaries [3–9].
Personalized Video Summarization is actively researched to generate user preference- or query-based video summaries. Several works, such as [10], use similarity scores to identify user preference-based content. User profiles are considered by [11, 17]; user attention by [13, 16, 30]; user query/preference by [14, 15, 18, 19]; and multi-modal user reactions by [12]. These works define preferences over the video content for summary generation.
Sports Video Summarization targets sports videos for summary/highlights gener-
ation. Various Video Summarization approaches for different sports (Basketball by
[20, 26], fencing by [21], Soccer by [22, 23, 25, 26, 29, 32], Tennis by [24], Baseball
by [27, 28]) have been proposed.
3 Proposed Approach
Fig. 1 Proposed multi-CNN architecture for dynamic and personalized video summarization
The proposed Multi-CNN architecture not only captures user preferences to create
a personalized video summary but also considers domain knowledge of the sport and
dynamic content for summary inclusion.
3.1 Input
The proposed Multi-CNN architecture selects the dynamic key segments to create a
personalized and dynamic video summary grounded exclusively on user preferences.
The approach considers an input video V (Cricket sport match) represented as a
continuous sequence of video frames (f ) and audio information (a) such that V =
(f1 .a1 , f2 .a2 , . . . , fn .an ). Along with the input raw video, the user provides input for
individual user preference (λ). The user preference will establish the grounds for key
segment selection.
The first level of the proposed Multi-CNN architecture performs Domain Activity Video Segmentation grounded on Cricket sport domain knowledge. This level is responsible for segmenting the input video into video segments, including video frames, audio, and continuity features. The proposed domain activity-based video segmentation uses a custom CNN trained on the Delivery-Play Cricket Sport (DPCS) dataset proposed in Narwal et al. [28].
The game of cricket is governed by standard rules that apply to all the formats of
cricket. The Cricket bowling activity is specified by the definition of legal and fair
delivery in Cricket sport, which states that when a bowler swings the arm over the
shoulder to release the ball towards batsmen, the elbow should not be straightened
further once the arm has reached the level of the shoulder (as shown in Fig. 3). Using
this standard definition, we propose an activity-based video segmentation strategy
that recognizes the bowling activity or delivery instance, as shown in Fig. 2.
The CNN model is trained to recognize bowling activity and mark the corresponding video frame (f_i) as a "Delivery" instance, i.e., M(f_i) = Delivery. All other video frames are marked as "Play" instances, i.e., M(f_i) = Play. The video segmentation relies on two consecutive "Delivery" frames and identifies the intermediate content as one video segment, V_segment. This strategy is applied until the entire Cricket match is divided into video segments S(V), where each video segment (V_segment) represents an individual delivery course. A delivery course starts from a "Delivery" frame (f_p), is followed by some action on the corresponding ball, leads to a result (boundary, out, runs, etc.), and ends with another "Delivery" frame (f_q), which also marks the onset of the next delivery course. The total number of bowling instances marked "Delivery" in a video is denoted by BI.
∀ f ∈ V, BI = Σ_{i=1}^{n} 1[M(f_i) = Delivery] (1)

S(V) = ⋃_{segment=1}^{BI} V_segment. (3)
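Given per-frame labels from CNN Level-1, the segmentation rule of Eqs. (1) and (3) amounts to splitting the match between consecutive "Delivery" frames. A minimal sketch (function and label names are assumptions, not the authors' code):

```python
def segment_by_delivery(frame_labels):
    """Split a labeled frame sequence into delivery courses: each segment
    runs from one 'Delivery' frame up to the next (Eqs. 1 and 3)."""
    delivery_idx = [i for i, m in enumerate(frame_labels) if m == "Delivery"]
    bi = len(delivery_idx)  # BI: total bowling instances
    bounds = delivery_idx + [len(frame_labels)]
    segments = [(bounds[k], bounds[k + 1]) for k in range(bi)]
    return bi, segments

labels = ["Play", "Delivery", "Play", "Play", "Delivery", "Play"]
bi, segs = segment_by_delivery(labels)  # bi == 2, segs == [(1, 4), (4, 6)]
```

Each `(start, end)` pair is the frame range of one V_segment, i.e., one delivery course.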
The second level, CNN Level-2, encapsulates a stacked organization of two models, Umpire detection and Umpire Pose Recognition, both trained over the SNOW dataset [33]. In cricket, the umpire declares the result of every delivery through specific, standard poses governed by the International Cricket Council (ICC). Thus, for each event occurring during a delivery, the result is perceived and understood with the umpire pose as a basis. We use this domain knowledge to classify the event contained within a video segment. First, each frame of the video segment is processed by the Umpire detection model to determine the presence of an umpire in the frame. The umpire detection model performs two-label classification over video frames, marking them M(f_i) = Umpire or M(f_i) = Non-Umpire. The model includes a sequence of four convolutional layers with ReLU activation and two fully connected layers. Second, for each positive "Umpire" label, the corresponding frame (f_Umpire) is now considered
for umpire pose recognition. The umpire pose recognition module performs five-label classification, marking five different event results of the preceding delivery course, given by: M(f_Umpire) = Six, M(f_Umpire) = No-ball, M(f_Umpire) = Out, M(f_Umpire) = Wide, and M(f_Umpire) = No action. Once the umpire-detected frames are labeled, the concerned video segments containing Umpire frames are marked with the corresponding Umpire pose marking. The video segment (V_segment) is marked with the label in accordance with the Umpire pose label, given by:

M(V_segment) = M(f_Umpire). (5)
The key segments represent the segments of interest for inclusion in the summary. We use the user preference (λ), given as input, to identify key segments and include them in the user preference-based personalized video summary. The user preference (λ) is matched against the label marking of the video segments to select the most representative content. A video segment whose label matches the user preference is considered a key segment, given by:

M(V_segment) = λ. (6)
The identified key segments contain the selective events and video content expected and required by the end user. The selection of video segments for summary inclusion is governed by individual user preference. These selected video segments, i.e., key segments, are combined under a union operation in a continuous and non-overlapping manner, with time synchronization to preserve the original order of events (as in the original video).
Personalized Video Summary = ⋃_{K=1}^{NKS} Key Segment_K (8)

NKS = Σ_segment 1[M(V_segment) = λ] (9)
The generated video summary conforms to the dynamic criteria through the inclusion of visual, audio, motion, and continuity information, and satisfies the personalized criteria through user preference-based content selection.
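The selection and assembly steps of Eqs. (6), (8), and (9) reduce to a label match followed by an order-preserving union. A minimal sketch with hypothetical segment labels (the function name and label strings are illustrative assumptions):

```python
def personalized_summary(segment_labels, preference):
    """Eq. (6): a segment is a key segment if M(V_segment) == lambda;
    Eq. (9): NKS counts the matching segments;
    Eq. (8): their union, in original time order, forms the summary."""
    key_segments = [i for i, label in enumerate(segment_labels)
                    if label == preference]
    nks = len(key_segments)
    return nks, key_segments

# hypothetical per-segment event labels produced by CNN Level-2
labels = ["Six", "No action", "Out", "Six", "Wide"]
nks, summary = personalized_summary(labels, preference="Six")
# nks == 2; summary == [0, 3]: those segments are concatenated in order
```

Because the matched indices stay sorted, concatenating the corresponding clips automatically satisfies the continuous, non-overlapping, time-synchronized requirement.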
4 Result Analysis
The model achieved a training and validation accuracy of 91.45% and 74.36%, respectively, after 100 epochs. The corresponding training and validation losses are 0.0312 and 0.0978. Figure 4 highlights the results of the Umpire Pose Recognition using our model. To calculate the accuracy of CNN Level-2, we average the Umpire Detection accuracy and the Umpire Pose Recognition accuracy; the combined accuracy of CNN Level-2 after averaging is 83.92%.
The overall accuracy of the proposed Multi-CNN architecture is calculated by averaging the accuracies of CNN Level-1 and CNN Level-2, i.e., Overall Accuracy = (Accuracy_Level-1 + Accuracy_Level-2) / 2.
5 Conclusion
In this work, we proposed a Multi-CNN architecture in which CNN Level-1 performs domain activity-based video segmentation to generate dynamic video segments; these are then forwarded to CNN Level-2 for key segment identification. CNN Level-2 includes a stacked organization of two models, i.e., Umpire detection and Umpire pose recognition, both trained over the SNOW dataset. Key segment identification requires the initial detection of the umpire in the video frame; umpire-detected frames are then forwarded for pose recognition. The output of the umpire pose recognition module labels the video segment with the corresponding event contained within the segment (umpire pose recognition gives the result of the
event). Now, the personalized parameter, i.e., user preferences, forms the basis for
identifying key segments. The user preference is matched with label marking of video
segments to select the most representative (User preference-based) video content. The
video segments matching the user preference are selected as key segments. These key segments are combined with a union operation in a continuous, non-overlapping, and time-ordered fashion to generate a dynamic and personalized video summary. In this work, we also propose two novel summary evaluation metrics, i.e., the User Rating Score and the Composite Summary Score (CS-Score), based on user reactions. Our proposed Multi-CNN architecture achieved an overall accuracy of 90.21%. The evaluation of the generated summary over the proposed metrics, i.e., the User Rating Score (overall average 9.1 on a scale of 10) and the CS-Score, indicates that our proposed approach outperforms related works in the field.
Moreover, the proposed User Rating Score evaluation metric provides a qualita-
tive, subjective, and individual evaluation of the generated summary. The CS-Score
evaluation metrics serve as a standalone metric representing both the qualitative and
quantitative performance of the personalized summary. The experiments reveal that
the proposed approach provides promising results.
Future work may include other application domains and the embedding of their domain knowledge for effective video summarization. Moreover, more effective strategies to capture user preferences would contribute to the generation of more precise and personalized video summaries.
References
1. Narwal P, Duhan N, Kumar Bhatia K (2022) A comprehensive survey and mathematical insights
towards video summarization. J Vis Commun Image Represent 89:103670. https://doi.org/10.
1016/j.jvcir.2022.103670
2. Vivekraj VK, Sen D, Raman B (2019) Video skimming. ACM Comput Surv 52(5):1–38. https://
doi.org/10.1145/3347712
3. Fei M, Jiang W, Mao W (2018) Creating memorable video summaries that satisfy the user’s
intention for taking the videos. Neurocomputing 275:1911–1920. https://doi.org/10.1016/j.neu
com.2017.10.030
4. Chu W-S, Song Y, Jaimes A (2015) Video co-summarization: video summarization by visual
co-occurrence. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR).
https://doi.org/10.1109/cvpr.2015.7298981
5. Gygli M, Grabner H, Riemenschneider H, Van Gool L (2014) Creating summaries from user
videos. In: Computer vision—ECCV 2014, pp 505–520. https://doi.org/10.1007/978-3-319-
10584-0_33
6. Panda R, Das A, Wu Z, Ernst J, Roy-Chowdhury AK (2017) Weakly supervised summarization
of web videos. In: 2017 IEEE international conference on computer vision (ICCV). https://doi.
org/10.1109/iccv.2017.395
7. Kannan R, Ghinea G, Swaminathan S (2015) What do you wish to see? A summarization
system for movies based on user preferences. Inf Process Manage 51(3):286–305. https://doi.
org/10.1016/j.ipm.2014.12.001
8. Tsai C-M, Kang L-W, Lin C-W, Lin W (2013) Scene-based movie summarization via role-
community networks. IEEE Trans Circuits Syst Video Technol 23(11):1927–1940. https://doi.
org/10.1109/tcsvt.2013.2269186
9. Zhang S, Zhu Y, Roy-Chowdhury AK (2016) Context-aware surveillance video summarization.
IEEE Trans Image Process 25(11):5469–5478. https://doi.org/10.1109/tip.2016.2601493
10. Panagiotakis C, Papadakis H, Fragopoulou P (2020) Personalized video summarization based
exclusively on user preferences. Lect Notes Comput Sci 305–311. https://doi.org/10.1007/978-
3-030-45442-5_38
11. Darabi K, Ghinea G (2016) User-centered personalized video abstraction approach adopting
sift features. Multimedia Tools Appl 76(2):2353–2378. https://doi.org/10.1007/s11042-015-
3210-4
12. Peng W-T, Chu W-T, Chang C-H, Chou C-N, Huang W-J, Chang W-Y, Hung Y-P (2011) Editing
by viewing: automatic home video summarization by viewing behavior analysis. IEEE Trans
Multimedia 13(3):539–550. https://doi.org/10.1109/tmm.2011.2131638
13. Mehmood I, Sajjad M, Rho S, Baik SW (2016) Divide-and-conquer based summarization
framework for extracting effective video content. Neurocomputing 174:393–403. https://doi.
org/10.1016/j.neucom.2015.05.126
14. Fei M, Jiang W, Mao W (2021) Learning user interest with improved triplet deep ranking and
web-image priors for topic-related video summarization. Expert Syst Appl 166:114036. https://
doi.org/10.1016/j.eswa.2020.114036
15. Varini P, Serra G, Cucchiara R (2017) Personalized egocentric video summarization of cultural
tours on user preferences input. IEEE Trans Multimedia 19(12):2832–2845. https://doi.org/10.
1109/tmm.2017.2705915
16. Qayyum H, Majid M, ul Haq E, Anwar SM (2019) Generation of personalized video summaries
by detecting viewer’s emotion using electroencephalography. J Vis Commun Image Represent
65:102672. https://doi.org/10.1016/j.jvcir.2019.102672
17. Yin Y, Thapliya R, Zimmermann R (2018) Encoded semantic tree for automatic user profiling
applied to personalized video summarization. IEEE Trans Circ Syst Video Technol 28(1):181–
192. https://doi.org/10.1109/tcsvt.2016.2602832
18. Zhang L, Jing P, Su Y, Zhang C, Shaoz L (2017) SnapVideo: personalized video generation for
a sightseeing trip. IEEE Trans Cybern 47(11):3866–3878. https://doi.org/10.1109/tcyb.2016.
2585764
19. Rathore A, Nagar P, Arora C, Jawahar CV (2019) Generating 1 minute summaries of day long
egocentric videos. In: Proceedings of the 27th ACM international conference on multimedia.
https://doi.org/10.1145/3343031.3350880
20. Liu Z (2019) 3DSportNet: 3D sport reconstruction by quality-aware deep multi-video
summation. J Vis Commun Image Represent 65:102651. https://doi.org/10.1016/j.jvcir.2019.
102651
21. Tejero-de-Pablos A, Nakashima Y, Sato T, Yokoya N, Linna M, Rahtu E (2018) Summariza-
tion of user-generated sports video by using deep action recognition features. IEEE Trans
Multimedia 20(8):2000–2011. https://doi.org/10.1109/tmm.2018.2794265
22. Sen A, Deb K (2022) Categorization of actions in soccer videos using a combination of transfer
learning and gated recurrent unit. ICT Express 8(1):65–71. https://doi.org/10.1016/j.icte.2021.
03.004
23. Sheng B, Li P, Zhang Y, Mao L, Chen CL (2021) Greensea: visual soccer analysis using a broad
learning system. IEEE Trans Cybern 51(3):1463–1477. https://doi.org/10.1109/tcyb.2020.298
8792
24. Boukadida H, Berrani S-A, Gros P (2017) Automatically creating adaptive video summaries
using constraint satisfaction programming: application to sport content. IEEE Trans Circ Syst
Video Technol 27(4):920–934. https://doi.org/10.1109/tcsvt.2015.2513678
25. Sanabria M, Precioso F, Menguy T (2021) Hierarchical multimodal attention for deep video
summarization. In: 2020 25th international conference on pattern recognition (ICPR). https://
doi.org/10.1109/icpr48806.2021.9413097
26. Shen J, Cheng Z (2010) Personalized video similarity measure. Multimedia Syst 17(5):421–
433. https://doi.org/10.1007/s00530-010-0223-8
27. Nitta N, Takahashi Y, Babaguchi N (2008) Automatic personalized video abstraction for sports
videos using metadata. Multimedia Tools Appl 41(1):1–25. https://doi.org/10.1007/s11042-
008-0217-0
28. Narwal P, Duhan N, Bhatia KK (2023) A novel multimodal neural network approach for
dynamic and generic sports video summarization. Eng Appl Artif Intell 126:106964. https://
doi.org/10.1016/j.engappai.2023.106964
29. Fei M, Jiang W, Mao W (2018) Creating personalized video summaries via semantic event
detection. J Amb Intell Humaniz Comput. https://doi.org/10.1007/s12652-018-0797-0
30. Han J, Li K, Shao L, Hu X, He S, Guo L, Han J, Liu T (2014) Video abstraction based on
fmri-driven visual attention model. Inf Sci 281:781–796. https://doi.org/10.1016/j.ins.2013.
12.039
31. Ji Z, Zhang Y, Pang Y, Li X (2018) Hypergraph dominant set based multi-video summarization.
Signal Process 148:114–123. https://doi.org/10.1016/j.sigpro.2018.01.028
32. Ouyang J, Liu R (2013) Ontology reasoning scheme for constructing meaningful sports video
summarisation. IET Image Proc 7(4):324–334. https://doi.org/10.1049/iet-ipr.2012.0495
33. Ravi A, Venugopal H, Paul S, Tizhoosh (2018) HRA dataset and preliminary results for umpire
pose detection using SVM classification of deep features. In: 2018 IEEE symposium series on
computational intelligence (SSCI).https://doi.org/10.1109/ssci.2018.8628877
Efficient Information Retrieval: AWS Textract in Action
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_8
R. Nancy Deborah et al.
1 Introduction
The capacity of optical character recognition (OCR) to propel research in the social
sciences and humanities holds substantial promise. This technology enables the
automated extraction of text from digital images, thereby unlocking extensive
volumes of historical documents that have not received sufficient scholarly attention
[14]. The presence of uneven lighting did not significantly impede the OCR process.
However,
noise had a more pronounced negative impact. Nevertheless, the accuracy remained
acceptable even with a 10% noise level and a high resolution of 600 dpi [1]. The
primary purpose of Optical Character Recognition (OCR) is to transform an image
containing text into editable text. Consequently, it is essential to employ techniques
such as information extraction, natural language processing, and corpus analysis to
further process the extracted text [3]. When weighed against the significant expenses
involved in rectifying OCR errors, the quality of OCR outputs is deemed adequate
for human readability and document exploration [20]. Organizations employ Optical
Character Recognition (OCR) to transform document images into text that machines
can interpret. This approach offers a practical means of efficiently exploring extensive
document collections through automated tools like indexing for text-based searches
and machine translation [6]. Document images possess intriguing and unique charac-
teristics. They exhibit recurring character patterns and contain strokes/glyphs that are
common across various characters, languages, and styles. We argue that these char-
acteristics make the preprocessing stage we’ve created suitable for various situations
[5]. To consistently achieve character accuracy rates (CAR) surpassing 98% for older
historical printed materials, it is usually imperative to develop a specialized model
customized for a particular book [4]. The ability to perform computational searches
within historical textual archives is revolutionizing the approaches employed by
researchers. It enables them to discover and explore documents of relevance [13]
efficiently. Despite the inevitability of OCR errors with each scan, these inaccuracies
vary with each iteration. This strategy leverages this diversity by employing multiple
renditions of the same book to eliminate OCR errors, provided a given error doesn’t
replicate across multiple scans [7]. Examining the quality of document images is
essential during the advancement of algorithms to enhance and restore them. Such
investigations enable us to gain insights into the various forms of deterioration that
can impact document images and facilitate the creation of dependable techniques for
assessing the extent of these deteriorations [8].
2 Related Work
Kumar [12] proposed an efficient text extraction algorithm that performed better
than other existing methods on complex images without significantly increasing the
computational cost. The algorithm is based on line detection and is, therefore, very
efficient, robust, and capable of extracting text from video frames and images. It is
also able to represent the intrinsic characteristics of text better.
Talukder and Mallick [11] proposed a new method for extracting text from images
that is more accurate than existing methods. The process was tested on various
images, including caption and scene text. The authors found that their method can
extract text quickly and accurately.
Saudagar et al. [9] have introduced an exact method for extracting and recognizing
text in Arabic, which can also be adapted for languages sharing similar script styles
like Chinese, Japanese, Korean, Persian, Urdu, and Hindi. The primary constraint of
this proposed technique lies in its reliance on the specific font employed, as various
fonts may exhibit varying calligraphic renditions of the same characters. Neverthe-
less, with some minor adjustments, this approach can be enhanced to accommodate
a broader spectrum of fonts.
Li and Zhao [19] developed a new feature extraction method called CILIN. As the
number of dimensions increases, the calculation becomes increasingly more complex
and less reliable. Therefore, the authors used their method to reduce the number of
dimensions to thousands.
Sundaresan and Ranjini [15] engineered a system to extract text from digital
English comic images automatically. In their experiments, the system could correctly
detect text bubbles and extract the text using a median filter with an accuracy of
94.82%.
Ahmed et al. [16] introduced a novel text graphics component extraction method,
employing a part-based strategy. Their technique involves retrieving all SURF key
points within an unknown image, followed by comparing key points present in refer-
ence templates encompassing characters and non-character elements. Their empirical
investigations on authentic floor plan images noted a remarkable accuracy rate of over
95% in character identification.
Mahajan and Rani [2] determined that extensive research in scenic text extraction
has been conducted for Indian and non-Indian scripts. However, opportunities for
enhancement within the Indian context remain. The focus has been on employing
neural network classifiers for text feature extraction. Yet, utilizing deep learning
models is emerging as a pivotal avenue for improved recognition rates.
Devi and Sumathi [10] assessed various text extraction techniques and devised
a means to gauge their outcomes. They employed precision, recall, and F-measure
to appraise each method’s efficacy. The Gamma Correction approach secured the
top spot with an average precision rate of 78% and a recall rate of 96%, leading the
authors to assert its superiority over alternative text extraction methods.
Cheng et al. [18] developed a novel text feature extraction technique named
TFERs. It comprises four stages: text preparation, text feature vector generation,
attribute significance computation, and attribute reduction. TFERs exhibited supe-
rior performance in diverse tasks like text clustering, classification, and retrieval,
surpassing contemporary methods.
Chang [17] devised a novel method for detecting and extracting text within
natural scene images. Initially, it transforms the images into grayscale and employs a
machine-learning approach to recognize the text. Through rigorous testing on diverse
images, the algorithm successfully achieves a commendable 94.65% accuracy in text
identification and extraction, demonstrating its proficiency in this task.
3 Proposed Methodology
A suitable input for Amazon Textract operations can encompass single-page and
multi-page documents, including diverse document types such as legal papers,
forms, identification records, or correspondence. Forms often consist of questions
or prompts intended for individuals to provide answers, such as patient registration
forms, tax documents, or insurance claims. These documents may be presented in
formats such as JPEG, PNG, PDF, or TIFF. It’s worth noting that PDF and TIFF
formats are well-suited for handling multi-page documents.
Document analysis in AWS Textract refers to the initial process where the service
examines an input document, identifies its type (e.g., form, table, plain text), and
performs layout analysis. This analysis sets the stage for Textract to accurately detect
and extract text and data from the document’s structure, ensuring it processes the
content appropriately based on its nature.
Textract initiates its analysis by examining the given document. It possesses the
ability to autonomously identify the document’s nature (e.g., form, table, or plain
text) and subsequently employ distinct processing methods tailored to its type.
Confidence scores in AWS Textract serve as numerical metrics that reflect the
service’s confidence level or certainty regarding the precision of the text and data it
retrieves from a document. Each extracted element, such as words, lines, or tables,
is assigned these scores. Elevated confidence scores signify a heightened level of
certainty in the extraction’s accuracy, whereas lower scores imply some degree of
doubt.
These confidence scores are valuable tools for users to gauge the caliber and
dependability of the extracted content. For instance, they can establish confidence
thresholds to sift out less reliable outcomes or prioritize data with superior confidence
scores within their applications or workflows. This approach ensures the utilization
of only precise and dependable data for subsequent processes and analyses.
Text Confidence Score: Textract supplies a confidence score for every identified
word or line of text within a document, signaling its certainty regarding the precision
of that extraction.
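A confidence threshold of this kind can be applied with a few lines of Python. The sketch below works over Textract's block structure ("BlockType", "Text", "Confidence"); the helper name and the 90% default threshold are illustrative, not part of the service.

```python
# Sketch: filtering Textract output by confidence score.
# filter_by_confidence is a hypothetical helper; the dicts mirror the
# shape of the "Blocks" entries in a Textract JSON response.

def filter_by_confidence(blocks, threshold=90.0):
    """Keep only LINE blocks whose confidence meets the threshold."""
    return [
        b["Text"]
        for b in blocks
        if b.get("BlockType") == "LINE" and b.get("Confidence", 0.0) >= threshold
    ]

if __name__ == "__main__":
    sample = [
        {"BlockType": "LINE", "Text": "Invoice #123", "Confidence": 99.2},
        {"BlockType": "LINE", "Text": "Smudged line", "Confidence": 61.5},
    ]
    # Only the high-confidence line survives the filter.
    print(filter_by_confidence(sample))
```

In an application, the threshold would typically be tuned per document type: higher for data that feeds automated workflows, lower when a human will review the output anyway.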
3.5 Deployment
Deploying Amazon Textract with Python and Flask entails creating a web application
capable of receiving user files, sending them to Amazon Textract for text extraction,
and presenting the extracted text to users. This process significantly enhances the
user interface for a seamless experience.
AWS Textract can be integrated into a Python Flask application using the AWS
SDK for Python (Boto3).
Boto3 is the official software development kit (SDK) provided by Amazon Web
Services (AWS) for Python developers. It allows developers to interact with AWS
services and resources programmatically, making building applications that leverage
AWS cloud services easier.
The mechanism involved is as follows:
User Uploads Document: Create a form in your Flask app that allows users to upload
documents. Typically, this involves creating an HTML form with an input field for
file uploads.
Handle Document Upload: In your Flask route, handle the uploaded document by
processing it on the server. You can access the uploaded file using the request.files
object in Flask.
Invoke AWS Textract: Once you have the uploaded document, use the Boto3 library
to interact with AWS Textract.
Retrieve Textract Results: Once the Textract job is finished, you can retrieve the
extracted text and structured data from the response. AWS Textract provides struc-
tured JSON output that contains information about detected text, tables, forms, and
more.
Present or Store Results: You can then present the extracted data to the user through
your Flask application or store it in a database for further processing or analysis.
Response to User: Finally, respond to the user’s request, either displaying the
extracted information or providing a download link for the processed document.
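The steps above can be sketched in a few lines, assuming AWS credentials are configured in the environment; the route name, form field, and helper names are illustrative, not prescribed by either library.

```python
# A minimal sketch of the Flask + Boto3 flow described above.

def extract_lines(textract_response):
    """Pull detected LINE text out of a Textract JSON response."""
    return [b["Text"] for b in textract_response.get("Blocks", [])
            if b.get("BlockType") == "LINE"]

def make_app():
    # Imported lazily so the pure helper above has no dependencies.
    import boto3
    from flask import Flask, request

    app = Flask(__name__)
    client = boto3.client("textract")

    @app.route("/upload", methods=["POST"])
    def upload():
        # Handle Document Upload: the file arrives via request.files.
        data = request.files["document"].read()
        # Invoke AWS Textract (synchronous, single-image API).
        response = client.detect_document_text(Document={"Bytes": data})
        # Present Results: return the extracted lines as JSON.
        return {"lines": extract_lines(response)}

    return app
```

Keeping the response-parsing logic in a separate pure function such as `extract_lines` also makes the Textract integration testable without network access.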
Considering the cost associated with AWS Textract, which can lead to substantial
expenses, mainly when dealing with extensive document processing, and which may
present financial constraints for organizations, we have implemented a text extraction
solution using AWS services, granting complimentary access to all users. We have
also improved the user experience by developing an intuitive interface using the
Flask framework, allowing users to harness AWS's robust text extraction capabilities
without incurring charges, with AWS Textract incorporated at the core of this
workflow.
When an image is provided as input to Amazon Textract, the service scans the
image to identify and extract text and data. It can distinguish between printed text,
handwriting, and various types of content, such as tables and forms within the image.
Textract accurately extracts this information using machine learning algorithms, even
from intricate or distorted images. The extracted text and data can be utilized for addi-
tional processing or integration with other applications. This functionality facilitates
the quick and efficient extraction of valuable information from images, leading to
automation and enhanced efficiency in document processing workflows.
When using Amazon Textract with a PDF document, you upload the file to the service,
which then analyzes its content, extracting text, tables, and forms. Textract can differ-
entiate between content types like headers, footers, and body text, maintaining the
original layout. This feature makes it efficient to extract information from PDFs, such
as invoices or reports, without manual entry. Textract analyzes each page individually
when processing multi-page PDFs, preserving the document’s structure and layout.
This capability enables the extraction of valuable information from multi-page PDFs,
such as books or contracts, without manual page-by-page processing.
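For multi-page documents, the blocks Textract returns carry a "Page" number, so the page structure described above can be preserved with a small helper; this is a sketch, and the helper name is an assumption.

```python
# Sketch: grouping Textract LINE blocks by their page number,
# preserving the per-page structure of a multi-page PDF.
from collections import defaultdict

def group_lines_by_page(blocks):
    """Map page number -> list of detected LINE texts, in order."""
    pages = defaultdict(list)
    for b in blocks:
        if b.get("BlockType") == "LINE":
            pages[b.get("Page", 1)].append(b["Text"])
    return dict(pages)
```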
When you upload an image containing tabular data to Amazon Textract, the service
analyzes the image. It extracts the tabular information into a structured format that
can be used for further processing or analysis. Textract can accurately identify the
rows and columns of the table within the image and the text within each cell. This
extracted tabular data can then populate databases, create spreadsheets, or integrate
with other applications, streamlining the process of extracting and utilizing data
from images.
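Rebuilding rows and columns from Textract's CELL blocks can be sketched as below. In real responses the cell text is resolved through Relationships to child WORD blocks; here the cells are assumed to carry their text directly, which is a simplification for illustration.

```python
# Sketch: arranging Textract CELL blocks (RowIndex/ColumnIndex are
# 1-based, as in the real API) into a 2-D list of strings.

def cells_to_rows(cells):
    """Place each cell's text at its (row, column) position."""
    n_rows = max(c["RowIndex"] for c in cells)
    n_cols = max(c["ColumnIndex"] for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["RowIndex"] - 1][c["ColumnIndex"] - 1] = c.get("Text", "")
    return grid
```

A grid in this form can be written straight to CSV or loaded into a spreadsheet or database, which is the downstream use the text describes.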
Table 1 displays a sample image in tabular format, processed to illustrate the
extraction of valuable tabular data.
The research primarily centers on extracting valuable data using AWS Textract. To
conclude, we successfully implemented text extraction using AWS Textract in a
Python Flask application. We could extract text from various types of documents,
including images and PDFs, demonstrating the power and versatility of Textract.
Our application provides an efficient and user-friendly way to extract text from
documents, making it a valuable tool for businesses and organizations. Future work
hinges on exploring strategies to boost text extraction accuracy, which may involve
refining Textract models or incorporating supplementary preprocessing methods,
and on streamlining the application for scalability so that it can efficiently manage
larger document volumes and concurrent users, potentially leveraging AWS Elastic
Beanstalk or Lambda.
Further goals include integration with various AWS services or external tools to
facilitate document storage, analysis, or additional processing, and an elevated user
interface that ensures a more intuitive and fluid user experience, including batch
processing.
References
15. Sundaresan M, Ranjini S (2012) Text extraction from digital English comic image using two
blobs extraction method. In: International conference on pattern recognition, informatics and
medical engineering (PRIME 2012), Salem, India, pp 449–452. https://doi.org/10.1109/ICP
RIME.2012.6208388
16. Ahmed S, Liwicki M, Dengel A (2012) Extraction of text touching graphics using SURF. In:
2012 10th IAPR international workshop on document analysis systems, Gold Coast, QLD,
Australia, pp 349–353. https://doi.org/10.1109/DAS.2012.39
17. Chang R-C (2011) Intelligent text detection and extraction from natural scene images. In: The
16th North-East Asia symposium on nano, information technology and reliability, Macao, pp
23–28. https://doi.org/10.1109/NASNIT.2011.6111115
18. Cheng Y, Zhang R, Wang X, Chen Q (2008) Text feature extraction based on rough set. In:
2008 fifth international conference on fuzzy systems and knowledge discovery, Jinan, China,
pp 310–314. https://doi.org/10.1109/FSKD.2008.521
19. Li X-F, Zhao L-l (2008) A multilayer method of text feature extraction based on CILIN. In:
2008 international conference on computer science and information technology, Singapore, pp
48–52. https://doi.org/10.1109/ICCSIT.2008.57
20. Bieniecki W, Grabowski S, Rozenberg W (2007) Image preprocessing for improving OCR
accuracy. In: 2007 international conference on perspective technologies and methods in MEMS
design, Lviv, Ukraine, pp 75–80. https://doi.org/10.1109/MEMSTECH.2007.4283429
Text Summarization Techniques
for Kannada Language
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_9
D. Yogish et al.
1 Introduction
Time is significant in anyone's life. People are interested in arts, culture, literature,
and other entertainment, but with limited time they may be unable to read and
understand a complete text, news item, or story in any medium. Still, they want a
glimpse of what is happening, and a message conveyed in summarized form is a
great help; this works when a tool to summarize the content is available. There is a
need to create automatic text summarization tools that make it simple for people to
gain insights from the vast amount of data currently circulating in the digital sphere,
most of which is unstructured textual data. At present, there is instant access to
enormous amounts of knowledge, much of it unnecessary and trivial, and it might
not be understood as it was meant to be. For instance,
if someone wants to find specific information in an online news article, they may
have to sift through its content and spend a lot of time eliminating unimportant
information before finding what they are looking for. Because of this, it is essential
to use automatic text summarizers, which can extract meaningful information while
excluding irrelevant and unnecessary data. Automatic text summarization can make
documents easier to read, reduce the time spent looking for information, and make
it possible to fit more information into a given space, as discussed by Shilpa and
Kumar [1].
Summarization is also called text-to-text transformation. There are two types of
text summarization: Extractive Summarization and Abstractive Summarization. In
Extractive Summarization, a subset of words that best capture the text’s key ideas
are selected and combined to create a summary. Imagine it as a highlighter that
only picks out the key details from a source text. Some of the sentences from the
document are extracted based on features such as sentence position and length, and
only those sentences appear in the final summary [2]. In Abstractive Summarization,
advanced deep learning techniques are applied to paraphrase and condense the orig-
inal document, just like humans do. Imagine it as a pen that writes original sentences
that might not be found in the source document. The suggested model condenses
a lengthy text document into a brief text in the same language by linguistic inter-
pretation of the text, which is quite challenging. The proposed model can take any
text and reduce it to any number of sentences the user desires by using encoders,
decoders, and sequence-to-sequence modeling. The TF-IDF model and
Seq2Seq model are used in the proposed model to produce extractive and abstractive
summaries.
Indian languages are classified [3, 4] as Indo-Aryan and Dravidian languages.
Some Indo-Aryan languages include Hindi, Marathi, Konkani, Gujarati, Punjabi,
Bengali, Odia, and Sindhi. Dravidian languages include Kannada, Malayalam, Tamil,
Telugu, Tulu, and many others. Kannada has a rich cultural heritage that spans more
than 2000 years. It has over 50 million speakers worldwide and over 12,000 articles
on Wikipedia and other content-related platforms. There has been a massive increase
in internet usage in the Kannada language in recent years. There is a lot of scope to
work on making the language available in the digital world [5]. Many researchers
translate and identify characters, words, and other related activities. It has become
essential to implement summarization for the Kannada language. Unlike in English,
the word structure changes as symbols are written with parts of speech added, making
it a challenging task to identify the words and figure out their meaning. The research
focuses on developing different techniques and comparing them to
summarize the content of the language.
2 Literature Survey
Several studies on automatic text summarization have been conducted for languages
like English and Hindi for several years. However, more work must be done on
abstractive and extractive summarization in Dravidian languages like Kannada. The
section discusses some of the existing works carried out by different researchers.
Geetha and Deepamala [6] produced a summary using the Singular Value
Decomposition (SVD) method. The paper focuses on extractive summarization, the
most widely used approach, whose broadest strategy is to assess how closely the
texts resemble one another. This technique has the highest accuracy compared to
other categories as per the literature survey carried out in the paper. Evaluation is
conducted using
as per the literature survey carried out in the paper. Evaluation is conducted using
the intrinsic evaluation method. Here, the score ranges from 0 to 1. The authors have
achieved 94% accuracy and 80% precision.
Batra et al. [7] discussed the fundamental ideas and methods of automatic
text summarization, and the methodology used is abstractive method summariza-
tion with NLP. The paper discusses various abstractive text summarization tech-
niques, including recurrent neural networks, extended short-term memory networks,
encoder-decoder models, and pointer generator mechanisms. Here, the issue of
lengthy input text and dependencies is resolved by LSTM. Less than 230,000 training
iterations were completed in 3 days and 4 h of training. The METEOR package was
used to evaluate in both complete and exact match modes.
Dedhia et al. [8] discussed text summarization using RNN models with an attention
mechanism. It also provides a brief overview of the features that should be chosen
during the process. The attention model, the pointer mechanism, and how these
elements combine to produce an abstractive text summary have been briefly focused
on. An abstract method is being used. Both the GRU and the LSTM resolve the
gradient problems. The model could use the pointer mechanism to check the issue
of uncommon words that other modern models have.
Etemad et al. [9] have reviewed deep learning-based abstractive text summarization
in detail, creating abstract summaries using deep learning. The two main challenges
of text summarization, syntactic and semantic, are addressed in the paper. The
decoder uses the last encoder's input, performs encoder-decoder attention, and
generates the resulting outputs. Using the Bayes rule, traditional summarization was
divided into two parts, and a sequence-to-sequence model is used to solve the issue.
In this case, it was assumed that
each word in the summary was determined solely by its predecessors and the input
text which was discussed by author Kallimani et al. [10].
Jayashree et al. [11] used three methods for summary generation: crawling,
indexing, and summarization. Creating a Kannada dataset is the first step. The tech-
nique used in this case is keyword extraction. Python language is used to implement
the idea. The HTML markup is removed during indexing. All words in the
documents have their GSS coefficient and IDF calculated. The method described in
this work can be used to eliminate stop words. An algorithm was developed to elim-
inate stop words; it takes a stop word as input and searches for structurally related
words, adding them to the stop word list.
Embar et al. [12] discussed Information Retrieval for an Indian Regional Language
by Text Summarization. Their focus is to identify the Kannada language in infor-
mation retrieval. In this study, the idea of AutoSum was used to create a Kannada
lexical database in XML format. AutoSum will summarize the text or will highlight
its key points. The summary is produced through the extraction method. The program
uses the UTF-8 encoding. The final summary is based on the score that received the
highest overall. A summary produced by a machine has a higher word count overall
than one made by a human. The conclusion reached is that as the percentage of
summary sentences increases, so does the percentage of common sentences.
Nallapati et al. [13] discussed producing a summary using an abstractive method-
ology. The attentional encoder-decoder is used to perform the task of abstractive
summarization. The source vocabulary is limited to 150K words and the target
vocabulary to 60K words. With one significant exception, the full-length ROUGE
F1 metric used for the Gigaword corpus was used to evaluate the model. A single
Tesla K40 GPU was used to train the model. A new dataset
was suggested for summarizing multiple sentences, and established benchmarks were
used.
Kallimani et al. [14] implemented a text summarization tool named AutoSum for
the Kannada language. The authors have researched the topic in detail and identified
the proper technique to carry out the research. The tool reads a text article in UTF-
8 format. Keywords are extracted by tagging in the keyword extraction phase and
parsing a text using a lexicon. Sentences are scored by using parameters such as
first line, position, numerical values, and keywords. The short summary is produced
based on the selection of ranked sentences.
3 Proposed Methodology
As specified in the introduction section of the paper, text summarization is the process
of producing the outline of the text read or processed, which is very much required
in the present online world. Summarization may be carried out in many ways, such
as manually with pen and paper, where the person who reads the document
summarizes the text based on individual choice and observations, following accepted
standards. However, there is not much
focus or many methods for text summarization from an algorithmic perspective with
respect to Indian languages. The paper focuses on text summarization techniques.
Text summarization is carried out in two methods. The two methods are Extractive
Summarization and Abstractive Summarization [15].
Natural Language Processing based research on Indian languages started many
years back. But it is not widely spread because of many reasons. Among the important
reasons is the scarcity of data in any form or format on an online platform [5]. In
other words, the data set for the research is not widely available to the researchers.
It applies to all the Indian languages. It is also applicable to the Kannada language.
Hence, collecting and generating the labeled dataset was challenging, and it was
used to fit the deep learning model for an abstract summary generation [16]. The
research has been carried out on the available dataset for text summarization. The
dataset used for this research has around 5,000 rows and focuses on major categories
such as cinema data. The data set is read from the given file and passed through the
various phases. Data is then preprocessed by removing unnecessary characters using
stop-words and other irrelevant data. The data set will be processed with the specified
algorithms, and the results will be analyzed in the paper in the further sections.
The first method is Extractive Summarization, also called the extraction technique.
It focuses on the important keywords in the given text based on the weights assigned.
Extractive Summarization extracts essential terms from Kannada text documents using a
combination of GSS (Galavotti, Sebastiani, and Simi) coefficients and IDF (Inverse
Document Frequency) along with TF (Term Frequency) approaches for the extraction
of keywords [17]. The extracted keywords are then used for summary generation.
This helps to build the summary text appropriately. The underlying technique assigns
a certain weight for each word in the sentence depending on the occurrence of the
words. The weight of the sentence is calculated by combining or adding the weights
of each word in a specific sentence. The top “m” sentences will be selected depending
on the sentence rating.
Initially, the data is loaded for recognition. After the text preprocessing phase,
the weights of each word in the given text are added, and the values are calculated
using the TF-IDF technique [18]. Each sentence's score and a threshold value are
then calculated from the word weights, and sentences whose scores exceed the fixed
threshold are extracted. The process of extractive summarization is explained in Fig. 1.
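The weighting-and-ranking scheme can be illustrated with a minimal sketch. It uses plain whitespace tokenization and TF-IDF only, whereas the actual system combines GSS coefficients with IDF and Kannada-specific preprocessing; the function name and the splitting rule are assumptions for illustration.

```python
# Sketch: score sentences by summed TF-IDF word weights, keep the top m.
import math
import re

def extractive_summary(text, m=2):
    """Return the m highest-weighted sentences, in original order."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    docs = [s.split() for s in sentences]
    n = len(docs)
    # Inverse document frequency, with each sentence treated as a document.
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    idf = {w: math.log(n / c) + 1.0 for w, c in df.items()}
    # Sentence weight = sum over its distinct words of TF * IDF.
    scores = []
    for doc in docs:
        tf = {w: doc.count(w) / len(doc) for w in doc}
        scores.append(sum(tf[w] * idf[w] for w in set(doc)))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:m]
    return ". ".join(sentences[i] for i in sorted(top)) + "."
```

Sentences made of rare words score highest, which matches the intuition that frequent function words contribute little to a summary.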
Decoder Model: In the decoder stage, the target sentences are word-by-word decoded
or predicted using the decoder model. The decoder's output indicates the subsequent
word, which is then passed into the following layer for the next prediction.
The two terms ‘<start>’ (start of target sentence) and ‘<end>’ (end of target sentence)
provide the model with information about the beginning variable that will be used to
forecast the following word and the finishing variable that will be used to determine
the conclusion of the sentence. Initially, the word ‘<start>’ is given to the model
during training, and it then predicts the subsequent word, which is the target data for
the decoder. The decoded word is then fed back to get the next word prediction. In
the same way, a word-by-word output summary will be generated with the help of a
reference summary, which will be passed on to the decoder.
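The word-by-word loop between '&lt;start&gt;' and '&lt;end&gt;' can be sketched with a stub standing in for the trained LSTM decoder; `predict_next` and `toy_predictor` are hypothetical placeholders, not part of the actual model.

```python
# Sketch: greedy decoding loop driven by <start>/<end> tokens.

def greedy_decode(predict_next, max_len=20):
    """Generate a summary word by word until '<end>' or max_len."""
    summary, prev = [], "<start>"
    for _ in range(max_len):
        word = predict_next(prev, summary)  # trained decoder stands here
        if word == "<end>":
            break
        summary.append(word)
        prev = word  # the decoded word is fed back for the next step
    return summary

# Toy stand-in for the trained LSTM decoder: emits a fixed sequence.
def toy_predictor(prev, so_far):
    script = ["upendra", "new", "movie", "<end>"]
    return script[len(so_far)] if len(so_far) < len(script) else "<end>"
```

Feeding each decoded word back as the next input is exactly the mechanism the paragraph above describes for both training (with the reference summary) and inference.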
The working of Abstractive Summarization is shown in Fig. 3. The process has
been divided into two phases: the training phase and the testing phase. Text written in
Kannada is given as input to the model during the training phase. After preprocessing,
training data will go through the encoder unit of the system and identify the key or
important words in the input sentence, and the system’s decoder will be used in the
summarization process. The LSTM model will work on the input, and encoded data
will be stored. During the testing phase, the detailed input is read and encoded with the
LSTM model available, and a summary with the newer words is generated as mapped
or identified by the model. This is carried out during the inference phase based on the
encoded values generated, and it will be mapped with the decoding values generated.
Matching with the threshold value will help summarize the content given.
The application displays the result on the screen and is named ‘Saramsha,’ which means
summary in Kannada.
Table 1 gives the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores for the implemented techniques, computed from recall and precision. The ROUGE metric measures the similarity between the reference summary and the summary generated by an algorithm, quantifying how machine-generated summaries differ from human-written ones. The algorithms were run on different texts to check the efficiency of the techniques discussed. The table shows that the similarity scores fall in a good range, indicating that the implemented methods are appropriate and work properly for the given summarization texts. Both summaries were compared, and the ROUGE values for the comparison are shown. For the test data, the ROUGE-1 value was 39%, the ROUGE-2 value was 37%, the ROUGE-L value was 36%, and the overall average ROUGE value was 36%. These values are acceptable, as they lie around the commonly reported ranges for this task.
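As a hedged illustration of how such scores are computed, a minimal ROUGE-1 sketch follows; whitespace tokenization and the toy sentences are assumptions, and published evaluations use dedicated ROUGE toolkits rather than this snippet.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1: unigram overlap between a generated summary and a
    reference summary, reported as (recall, precision, F1)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())            # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * recall * precision / max(recall + precision, 1e-9)
    return recall, precision, f1

r, p, f = rouge_1("the cat sat", "the cat sat on the mat")
print(round(r, 2), round(p, 2), round(f, 2))  # 0.5 1.0 0.67
```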
Figures 4 and 5 show the working (input and output) of Extractive Summarization
and Abstractive summarization techniques. The extractive summary was generated
for the text given by the user in the application GUI. The model took less than a
minute to generate the summary after giving the input in Kannada. The text is on
film-related information. It is about Kannada film actors Upendra and Ramgopal
Verma joining together for their new movie on the life story of Muthappa Rai. The
same input text is given to both summarization techniques. As explained earlier, weights are assigned to every word in a sentence, so nouns and other important keywords receive higher weights. Words with higher weights are retained during summarization, and the text is summarized accordingly. This is borne out in the results obtained: the summarized text conveys the required and valid information. The extractive summarization result is shown in Fig. 4.
The abstractive summary was generated from the Kannada text the user entered in the application GUI; the input text is the same for both techniques. The model identified noun words as important keywords according to the logic built, and the word encodings produced as part of that logic helped identify these words appropriately. Once the keywords are fed in, the subsequent words that are relevant to the summary are predicted. This extracts the important words and removes those that are not needed to explain the context in the summarized text. The input text and the summarized text for the abstractive summary are shown in Fig. 5.
5 Conclusion
References
1. Shilpa GV, Shashi Kumar DR (Aug 2019) Abs-Sum-Kan: an abstractive text summarization
technique for an Indian regional language by induction of tagging rules. Int J Recent Technol
Eng (IJRTE) 8(2S8). ISSN: 2277-3878
2. Allahyari M, Pouriyeh S, Assefi M, Safaei S, Trippe ED, Gutierrez JB, Kochut K (2017) Text
summarization techniques: a brief survey
3. Dhanya PM, Jathavedan M (2013) Comparative study of text summarization in Indian
languages. Int J Comput Appl 75(6)
4. Sharma A, Mithun BN (2023) Deep learning character recognition of handwritten Devanagari
script: a complete survey. In: 2023 IEEE international conference on contemporary computing
and communications (InC4), Bangalore, India, pp 1–6. https://doi.org/10.1109/InC457730.
2023.10263251
5. Yogish D, Manjunath TN, Hegadi RS (2019) Review on natural language processing trends
and techniques using NLTK. In: Santosh K, Hegadi R (eds) Recent trends in image processing
and pattern recognition. RTIP2R 2018. Communications in computer and information science,
vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_53
6. Geetha JK, Deepamala N (2015) Kannada text summarization using latent semantic analysis. IEEE
7. Batra P, Chaudhary S, Bhatt K, Varshney S (2020) A review: abstractive text summarization techniques using NLP. In: 2020 international conference on advances in computing, communication & materials (ICACCM). IEEE, pp 23–28
8. Dedhia PR, Pachgade HP, Malani AP, Raul N, Naik M (2020) Study on abstractive text summarization techniques. In: 2020 international conference on emerging trends in information technology and engineering (ic-ETITE). IEEE, pp 1–8
9. Etemad AG, Abidi AI, Chhabra M (2021) A review on abstractive text summarization using deep learning. IEEE
10. Kallimani JS, Srinivasa KG, Eswara Reddy B (2010) Information retrieval by text summarization for an Indian regional language. IEEE
11. Jayashree R, Srikanta M, Sunny K (2011) Document summarization in Kannada using keyword extraction. https://doi.org/10.5121/csit.2011.1311
12. Embar VR, Deshpande SR, Vaishnavi AK, Jain V, Kallimani JS (2013) sArAmsha—a Kannada abstractive summarizer. IEEE
13. Nallapati R, Zhou B, dos Santos C (2016) Abstractive text summarization using sequence-to-sequence RNNs and beyond. arXiv:1602.06023v5 [cs.CL]
14. Kallimani JS, Srinivasa KG, Eswara Reddy B (2010) Information retrieval by text summariza-
tion for an Indian regional language. IEEE
15. Andhale N, Bewoor LA (2016) An overview of text summarization techniques. In: 2016 inter-
national conference on computing communication control and automation (ICCUBEA). IEEE,
pp 1–7
16. Etemad AG, Abidi AI, Chhabra M (2021) A review on abstractive text summarization using
deep learning. In: 2021, the 9th international conference on reliability, infocom technologies
and optimization (trends and future directions) (ICRITO). IEEE, pp 1–6
17. Swamy A, Srinath S (2019) Automated Kannada text summarization using sentence features.
Int J Recent Technol Eng (IJRTE) 8(2). ISSN: 2277-3878
18. Yogish D, Manjunath TN, Hegadi RS (Sep/Oct 2020) Ranking top similar documents for
user query based on normalized vector cosine similarity model. J Comput Theor Nanosci
17(9–10):4468–4472(5)
Parkinson’s Detection From Gait Time
Series Classification Using LSTM Tuned
by Modified RSA Algorithm
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 119
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_10
120 M. Zivkovic et al.
1 Introduction
2 Related Works
f_t = σ(W_f x_t + U_f h_{t−1} + b_f)   (1)

Within this framework, f_t stands for the forget gate, x_t signifies the input at time step t, h_{t−1} represents the previous hidden state, and W_f and U_f refer to the weight coefficients linked to these inputs, with b_f representing the bias vector.
The selection of which incoming data should be kept in the cell state is facilitated by a distinct sigmoid function referred to as the input gate (i_t), as outlined in Eq. (2).

i_t = σ(W_i x_t + U_i h_{t−1} + b_i)   (2)

The weight coefficients associated with this process are represented by W_i and U_i, with the bias denoted as b_i.
A group of candidate values is generated using a hyperbolic tangent (tanh) layer, as per Eq. (3), and the cell state is then updated according to Eq. (4):

C̃_t = tanh(W_C x_t + U_C h_{t−1} + b_C)   (3)

C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t   (4)

Finally, the output gate o_t and the new hidden state h_t are given by Eqs. (5) and (6):

o_t = σ(W_o x_t + U_o h_{t−1} + b_o)   (5)

h_t = o_t ⊙ tanh(C_t)   (6)
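The gate equations above can be combined into a single forward step. The NumPy sketch below assumes the standard candidate and cell-state updates between the input and output gates; the random weights and dimensions are purely illustrative.

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: forget gate, input gate, candidate state,
    cell update, output gate, and the new hidden state."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    f_t = sig(W["f"] @ x_t + U["f"] @ h_prev + b["f"])        # forget gate
    i_t = sig(W["i"] @ x_t + U["i"] @ h_prev + b["i"])        # input gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidates
    c_t = f_t * c_prev + i_t * c_hat                          # cell state
    o_t = sig(W["o"] @ x_t + U["o"] @ h_prev + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                                  # hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
d, n = 3, 4                                      # input / hidden sizes
W = {k: rng.normal(size=(n, d)) for k in "fico"}
U = {k: rng.normal(size=(n, n)) for k in "fico"}
b = {k: np.zeros(n) for k in "fico"}
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
print(h.shape)  # (4,)
```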
Computer science has experienced a notable surge of interest in the realm of model optimization in the past few years. The growing complexity of models and
the increasing number of hyperparameters in modern algorithms have underscored
the necessity for the development of automated techniques. While model optimiza-
tion was traditionally approached through empirical methods, it has now become
essential to address this challenge systematically. However, this poses a significant
challenge because the pursuit of optimal parameters often involves navigating a com-
plex landscape of both discrete and continuous values, resulting in a mixed NP-hard
problem that profoundly impacts model performance.
Metaheuristic optimization algorithms belong to a powerful category of tech-
niques that excel at solving NP-hard problems while doing so in practical time
periods and using acceptable computational resources. These methods are capable of
enhancing the performance by considering the parameter selection process as an opti-
mization problem. One notable sub-group of metaheuristics is swarm intelligence,
inspired by the cooperative behaviors observed in natural groups and employing
these principles to efficiently carry out optimization challenges. The most popu-
lar swarm intelligence algorithms include Harris Hawks optimization (HHO) [10],
genetic algorithm (GA) [22], and particle swarm optimizer (PSO) [19], among many
others.
These approaches, along with methods built upon their principles, have found
application in diverse domains and have demonstrated encouraging outcomes. Promi-
nent examples of utilizing metaheuristics for addressing optimization challenges
include their applications such as predictions of crude oil prices [15], gold prices
[14], energy generation and consumption [2, 4, 29], cryptocurrencies trends pre-
diction [23, 28], industry 4.0 [8, 16], medicine [6, 35], computer systems security
[17, 25, 37, 38], cloud and edge computing [5, 36], and environment monitoring
task [3, 18].
3 Methods
The inspiration for the RSA metaheuristic comes from the crocodile, and the algorithm was first proposed by Abualigah et al. [1]. These reptiles hunt in two stages, first encircling the prey by walking and then coordinating the attack, which translate into the algorithm's exploration and exploitation phases, respectively.
The random solution population x_{i,j} is outlined by the matrix X in Eq. (7), while Eq. (8) describes how this population is initialized.

      | x_{1,1}   ...  x_{1,j}   ...  x_{1,n}   |
      | x_{2,1}   ...  x_{2,j}   ...  x_{2,n}   |
  X = |   ...     ...  x_{i,j}   ...    ...     |   (7)
      | x_{N−1,1} ...  x_{N−1,j} ...  x_{N−1,n} |
      | x_{N,1}   ...  x_{N,j}   ...  x_{N,n}   |

x_{i,j} = rand × (UB − LB) + LB,   j = 1, 2, ..., n   (8)

where i is the solution's index, j the current position, N the number of candidate solutions, and n the dimension size, while rand is a random value in the range [0, 1] and LB and UB are, as usual, the lower and upper limits.
Equation (9) describes two exploration strategies exhibiting different walking techniques: elevated walking when t ≤ T/4 and belly walking when T/4 < t ≤ 2T/4.

x_{i,j}(t+1) = Best_j(t) × (−η_{i,j}(t)) × β − R_{i,j}(t) × rand,   t ≤ T/4
x_{i,j}(t+1) = Best_j(t) × x_{r1,j} × ES(t) × rand,   T/4 < t ≤ 2T/4   (9)

η_{i,j} = Best_j(t) × P_{i,j}   (10)

Here Best_j denotes position j of the best reptile found so far, t is the ongoing round, and T the maximal count of rounds. The hunting operator η_{i,j} is described by Eq. (10), in which P_{i,j}, the percentage difference between position j of the current solution and position j of the best-calculated solution, is given by Eq. (13), while β, a sensitivity parameter controlling the exactness of exploration, is set to 0.1.
The search space is contracted by the reduce function R_{i,j} in Eq. (11), in which r1 is a random integer in the range [1, N], x_{r1,j} is position j of a randomly chosen solution, and ε denotes a small value.

R_{i,j} = (Best_j(t) − x_{r1,j}) / (Best_j(t) + ε)   (11)
ES(t) = 2 × r2 × (1 − 1/T)   (12)
P_{i,j} = α + (x_{i,j} − M(x_i)) / (Best_j(t) × (UB_j − LB_j) + ε)   (13)

where α = 0.1 is responsible for controlling the fluctuation of cooperation during the hunt, while LB_j and UB_j are position j of the lower and upper boundaries, respectively.
The mean of the i-th solution, M(x_i), is described by Eq. (14).

M(x_i) = (1/n) Σ_{j=1}^{n} x_{i,j}   (14)
Although the original RSA is a comparatively recent algorithm that draws inspiration from natural processes, empirical evaluations on the recognized CEC benchmark functions [21] reveal a flaw: the algorithm tends to persist in local minima, so its exploration capabilities are less than desired.
To address this shortcoming of the fundamental RSA, various modifications have
been introduced to enhance its exploration capabilities. These modifications involve
the incorporation of genetic operators from genetic algorithms (GA) [22], ultimately
resulting in an enhancement of the overall algorithm’s performance.
Following each round of execution in the modified algorithm, a fresh individual is
produced by merging the current best solution with a solution selected at random from
the population. This procedure involves a control parameter referred to as pc, which signifies the uniform crossover rate. The value of this parameter is established empirically and is fixed at pc = 0.1.
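The crossover step described above might look as follows, under the assumption that solutions are flat NumPy vectors; the function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def uniform_crossover(best, other, pc=0.1, rng=None):
    """Build a fresh individual: each parameter is taken from the
    randomly selected solution with probability pc (the uniform
    crossover rate), and from the current best otherwise."""
    rng = rng or np.random.default_rng()
    mask = rng.uniform(size=best.shape) < pc
    return np.where(mask, other, best)

rng = np.random.default_rng(42)
child = uniform_crossover(np.zeros(6), np.ones(6), pc=0.5, rng=rng)
print(child)  # a mix of genes from both parents
```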
Each parameter of this new solution is then mutated, steered by the mutation parameter mp, whose value was obtained empirically as mp = 0.1. Mutation is executed by producing a random value from a specified range, defined as [LB/2, UB/2]. The direction of the mutation, determining whether the chosen value is to be added or subtracted from the
4 Experiments
4.1 Dataset
1 https://physionet.org/content/gaitpdb/1.0.0/.
Several notable metaheuristic algorithms are assigned the task of enhancing the
performance of LSTM neural networks. This optimization process takes into account
both architectural and training parameters. The algorithms used for comparison,
alongside the proposed RSAGO, were original RSA, particle swarm optimization
(PSO) [19], firefly algorithm (FA) [33], Harris Hawks optimization (HHO) [10],
brain storm optimization (BSO) [26], and crayfish optimization algorithm (COA)
[13]. The contending algorithms were implemented with the recommended values
of their control parameters, as defined by their respective creators.
To guarantee a fair assessment, all the optimization methods were executed under uniform conditions. The population size is limited to 5 individuals, and each algorithm is granted just 6 rounds for improving the population's quality, as the experiments are computationally demanding. To ensure the statistical rigor of the results, the simulations were executed across 30 independent runs, accounting for the inherent randomness associated with heuristic methods. The LSTM hyperparameters put through the tuning procedure, with their search limits, are as follows:
4.3 Metrics
where c_o and c_e denote the arrays of observed and expected classification values.
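Cohen's kappa over such observed and expected label arrays can be computed as follows; this is a plain-Python sketch of the standard formula, not the code used in the experiments.

```python
from collections import Counter

def cohens_kappa(c_o, c_e):
    """Agreement between observed (c_o) and expected (c_e) labels,
    corrected for the agreement expected purely by chance."""
    n = len(c_o)
    p_o = sum(a == b for a, b in zip(c_o, c_e)) / n        # observed
    obs, exp = Counter(c_o), Counter(c_e)
    p_e = sum(obs[k] * exp[k] for k in obs) / n ** 2       # by chance
    return (p_o - p_e) / (1 - p_e)

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0]
print(round(cohens_kappa(y_pred, y_true), 3))  # 0.667
```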
Fig. 1 Distribution of results for objective function (box plot diagrams) and Cohen’s kappa indicator
(violin plot) over 30 independent runs
Similar results are observed concerning the indicator function, specifically Cohen’s
Kappa in this study, as presented in Table 2. The suggested RSAGO algorithm dis-
plays the top performance when considering the single best run, as well as mean
and median values. However, in terms of the best worst-case value over the course of 30 runs, the PSO algorithm stands out once again. The COA metaheuristic obtained the steadiest
outcomes, indicated by the smallest values of standard deviation and variance. This
pattern of high stability is once again reinforced through the visualization presented
in Fig. 1.
Comprehensive comparisons of the top-performing LSTM structures produced by
every optimization algorithm are presented in Table 3. These comparisons encompass
detailed metrics that include precision, recall, and F1-score. The proposed LSTM-
RSAGO obtained the best scores for almost all metrics observed and achieved supe-
rior accuracy of 89.41%.
To guarantee the reproducibility of experiments, the chosen parameters for the top-
performing LSTM using each metaheuristics algorithm are documented in Table 4.
This information is valuable for potential independent replications of the research.
Finally, Fig. 1 shows the box and violin plots, and Fig. 2 presents convergence dia-
grams of all regarded algorithms for objective and indicator, respectively. It can be
noted that some methods tend to concentrate their efforts on suboptimal areas of
the search space for extended durations, while the proposed RSAGO excels by tran-
scending a relatively unfavorable starting position and surpassing all other algorithms
in the optimal execution scenario.
A comprehensive analysis of the best produced model is presented through ROC
and PR curves, as illustrated in Fig. 3. Comparisons of the classification confusion
matrix and joint plots of the indicator objective function for the top-performing
LSTM-RSAGO structure are also depicted in Fig. 4.
Table 4 Parameters
Model        Learning rate  Dropout   Epochs  Layers  L1 neurons  L2 neurons
LSTM-RSAGO   0.010000       0.200000  60      2       15          15
LSTM-RSA     0.010000       0.161472  58      1       15          N/a
LSTM-PSO     0.008677       0.067101  60      2       15          15
LSTM-FA      0.010000       0.200000  58      2       13          10
LSTM-HHO     0.010000       0.200000  59      2       15          5
LSTM-BSO     0.008986       0.106216  51      1       15          N/a
LSTM-COA     0.005511       0.052002  60      1       15          N/a
Fig. 2 Convergence graphs of all methods included in comparative analysis for objective function
and Cohen’s kappa indicator over 6 iterations
Fig. 3 Macro-micro receiver operating characteristics (ROC) and precision-recall (PR) curves of
best performing model obtained in simulations
Fig. 4 Confusion matrix and objective-Cohen’s kappa joint plot diagram for best generated LSTM
model
5 Conclusion
This study delved into the potential of the recently developed RSA metaheuristic for optimizing the hyperparameters of neural networks. The algorithm was employed to fine-tune LSTM parameters, aiming at optimal performance in the early detection of Parkinson's disease from patients' gait. Several cutting-edge metaheuristics underwent a comparative analysis conducted under uniform conditions. To enhance the original algorithm, a modified variant called RSAGO was proposed, combining genetic operators taken from GA. The suggested method yielded the best results: the highest-performing model achieved an accuracy of 89.41%, demonstrating promising potential for early identification of neurodegenerative disease.
There are specific constraints within this study. It involves a limited comparison of
optimization algorithms and exclusively explores the potential of LSTM. Moreover,
regarding the computational resource constraints, smaller populations and a restricted
count of iterations were employed. Future research will prioritize the refinement of
the proposed approach, assessing alternative methods for classifying sequential data.
References
1. Abualigah L, Abd Elaziz M, Sumari P, Geem ZW, Gandomi AH (2022) Reptile search algorithm
(RSA): a nature-inspired meta-heuristic optimizer. Expert Syst Appl 191:116158
2. Bacanin N, Jovanovic L, Zivkovic M, Kandasamy V, Antonijevic M, Deveci M, Strumberger
I (2023) Multivariate energy forecasting via metaheuristic tuned long-short term memory and
gated recurrent unit neural networks. Inf Sci, 119122
3. Bacanin N, Sarac M, Budimirovic N, Zivkovic M, AlZubi AA, Bashir AK (2022) Smart wireless
health care system using graph LSTM pollution prediction and dragonfly node localization.
Sustain Comput: Inform Syst 35:100711
4. Bacanin N, Stoean C, Zivkovic M, Rakic M, Strulak-Wójcikiewicz R, Stoean R (2023) On
the benefits of using metaheuristics in the hyperparameter tuning of deep learning models for
energy load forecasting. Energies 16(3):1434
5. Bacanin N, Zivkovic M, Bezdan T, Venkatachalam K, Abouhawwash M (2022) Modified
firefly algorithm for workflow scheduling in cloud-edge environment. Neural Comput Appl
34(11):9043–9068
6. Bezdan T, Zivkovic M, Bacanin N, Chhabra A, Suresh M (2022) Feature selection by hybrid
brain storm optimization algorithm for covid-19 classification. J Comput Biol 29(6):515–529
7. Checkoway H, Lundin JI, Kelada SN (2011) Neurodegenerative diseases, no 163. IARC sci-
entific publications, pp 407–419
8. Dobrojevic M, Zivkovic M, Chhabra A, Sani NS, Bacanin N, Amin MM (2023) Address-
ing internet of things security by enhanced sine cosine metaheuristics tuned hybrid machine
learning model and results interpretation based on shap approach. PeerJ Comput Sci 9:e1405
9. Godkin FE, Turner E, Demnati Y, Vert A, Roberts A, Swartz RH, McLaughlin PM, Weber
KS, Thai V, Beyer KB et al (2022) Feasibility of a continuous, multi-sensor remote health
monitoring approach in persons living with neurodegenerative disease. J Neurol, 1–14
10. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H (2019) Harris hawks optimiza-
tion: algorithm and applications. Futur Gener Comput Syst 97:849–872
11. Hochreiter S (1991) Studies on dynamic neural networks. Master’s thesis, Institute for Com-
puter Science, Technical University, Munich, vol 1, pp 1–150
Parkinson’s Detection From Gait Time Series Classification Using LSTM … 133
31. Von Coelln R, Gruber-Baldini A, Reich S, Armstrong M, Savitt J, Shulman L (2021) The
inconsistency and instability of Parkinson’s disease motor subtypes. Park Relat Disord 88:13–
18
32. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol
Comput 1(1):67–82
33. Yang XS, Slowik A (2020) Firefly algorithm. In: Swarm intelligence algorithms. CRC Press,
pp 163–174
34. Yogev G, Giladi N, Peretz C, Springer S, Simon ES, Hausdorff JM (2005) Dual tasking, gait
rhythmicity, and Parkinson’s disease: which aspects of gait are attention demanding? Eur J
Neurosci 22(5):1248–1256
35. Zivkovic M, Bacanin N, Antonijevic M, Nikolic B, Kvascev G, Marjanovic M, Savanovic N
(2022) Hybrid CNN and xgboost model tuned by modified arithmetic optimization algorithm
for covid-19 early diagnostics from x-ray images. Electronics 11(22):3798
36. Zivkovic M, Bezdan T, Strumberger I, Bacanin N, Venkatachalam K (2021) Improved Harris
Hawks optimization algorithm for workflow scheduling challenge in cloud–edge environment.
In: Computer networks, big data and IoT: proceedings of ICCBI 2020. Springer, pp 87–102
37. Zivkovic M, Petrovic A, Venkatachalam K, Strumberger I, Jassim HS, Bacanin N (2022) Novel
chaotic best firefly algorithm: Covid-19 fake news detection application. In: Advances in swarm
intelligence: variations and adaptations for optimization problems. Springer, pp 285–305
38. Zivkovic M, Tair M, Venkatachalam K, Bacanin N, Hubálovskỳ Š, Trojovskỳ P (2022) Novel
hybrid firefly algorithm: an application to enhance xgboost tuning for intrusion detection clas-
sification. PeerJ Comput Sci 8:e956
39. Zivkovic T, Nikolic B, Simic V, Pamucar D, Bacanin N (2023) Software defects prediction
by metaheuristics tuned extreme gradient boosting and analysis based on Shapley additive
explanations. Appl Soft Comput 146:110659
Human Action Recognition Using Depth
Motion Images and Deep Learning
1 Introduction
M. Gupta · A. Jalan
DST-Centre for Interdisciplinary Mathematical Sciences, Institute of Science, Banaras Hindu
University, Varanasi, India
M. Gupta (B)
Department of Computer Science, Institute of Science, Banaras Hindu University, Varanasi, India
e-mail: manjari@bhu.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 135
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_11
136 M. Gupta and A. Jalan
lighting, and the complexity of backgrounds [2]. Identifying precise motion or human
body locations in images in complex scenarios is particularly difficult. Moreover,
color images often lack crucial depth cues for accurate action recognition, mainly
when actions occur directly in front of the camera [1].
To tackle these obstacles, research in Human Activity Recognition (HAR) has
increasingly focused on data obtained from depth sensors, with the Kinect device
[3] being a notable example. These sensors provide various features derived from
either depth data or skeletal information. A depth map can be described as a two-
dimensional grid where the horizontal (x) and vertical (y) dimensions align with
the rows and columns of a standard image. However, instead of representing pixel
intensities, each element (pixel) of the array contains depth readings (z values),
offering different information. It is like a grayscale image, except that the z information (a 32-bit float) replaces the intensity information [4].
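A toy example of this layout, with made-up sizes and distances, assuming a 240 × 320 Kinect-style depth map:

```python
import numpy as np

# Same layout as a grayscale image (rows x columns), but each cell
# holds a z distance in metres instead of an intensity.
depth = np.full((240, 320), 4.0, dtype=np.float32)  # wall ~4 m away
depth[100:140, 150:200] = 1.8                       # person ~1.8 m away

print(depth.shape, depth.dtype)        # (240, 320) float32
print(round(float(depth[120, 170]), 2))  # 1.8
```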
In this paper, we propose a novel method for addressing the challenges of Human Action Recognition. Our methodology is centered on Depth Motion Image (DMI) descriptors extracted from the MSRAction3D dataset [5]; a DMI records variations in the depth of moving body parts [6]. The DMI imparts unique characteristics to each action, thus simplifying feature extraction for the Convolutional Neural Networks (CNNs) [7] model. Our method strives to narrow
the gap between traditional and depth-based Human Action Recognition (HAR)
techniques, offering a flexible and resilient framework that can effectively enhance
existing human action recognition capabilities.
2 Related Work
Human action recognition has been widely used in virtual surveillance, human–
computer interaction, and robotics. The work in [8] focused on solving the problem
of capturing the complex joint shape motion cues at the pixel level. They also used
histogram-described depth sequences to capture the distribution of the surface’s
normal orientation in the 4D space of time, depth, and spatial coordinates. In [9], they
used an action graph to formally model the dynamics of the actions and a bag of 3D
points to characterize a set of prominent postures that correspond to the nodes in the
action graph using depth maps. The author in [6] presented a method (Action-Fusion)
for human action recognition from depth maps and posture data using convolutional
neural networks (CNNs). They designed Depth Motion Images and Moving Joint
Descriptor descriptors using depth maps and skeleton joint data, respectively. They
created three separate channels to feed these descriptors and finally fused all of them
to get classification results.
Our proposed work used the Depth Motion Images (DMI) generated from the
sequence of depth maps. The descriptor is then fed into our proposed neural network
architecture based on CNNs to get classification results.
3 Methodology
Table 1 demonstrates the pseudo-code designed to read and display depth maps
from the MSRAction3D dataset. It consists of several functions, including ‘load-
DepthMap’ for reading depth maps from binary files, ‘readHeader’ to interpret
the file’s header data, ‘showDepthMap’ to display individual depth maps, and
In Eq. (1), I(i, j, t) represents the pixel position (i, j) in the frame I at time t. DMI is
depicted as an 8-bit grayscale image, signifying the depth difference between frame
number k and k + N − 1, where N is the total number of frames. The pixel value in
the DMI image is the minimum value for the corresponding pixel position across the
depth map sequence. The resulting image is normalized by dividing each pixel value
by the maximum value among all pixels. Additionally, uninformative black pixels
are removed by cropping the Region of Interest (ROI) as illustrated in Fig. 3.
Table 2 demonstrates the pseudo-code for generating a Depth Motion Image (DMI) from an array of depth maps. It starts by determining the total number of depth maps in the input ‘DepthSequence’ and initializes an empty ‘ydmi’ matrix to store the DMI. It then iterates pixel by pixel over a 240 × 320 grid (240 rows and 320 columns are commonly used dimensions for depth maps, balancing information content and computational efficiency), calculating the minimum depth value at each position across all depth maps. The resulting DMI is inverted, normalized, and visualized as an image after setting the conventional y-axis orientation, defining axis limits, applying a grayscale colormap, and adding a color bar. This algorithm effectively creates a DMI image of changing depth information over time, which is helpful in various computer vision applications. Figure 4 displays depth motion images for action classes such as “High arm wave”, “Horizontal arm wave”, “Hand clap”, “Two hand wave”, “Forward kick”, and “Golf swing”.
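The pseudo-code described above can be condensed into a short NumPy sketch. The function and variable names and the synthetic moving-region frames are illustrative assumptions; only the per-pixel minimum, inversion, and normalization steps follow the text.

```python
import numpy as np

def depth_motion_image(depth_sequence):
    """DMI: per-pixel minimum depth over the sequence, inverted so
    that near (moving) regions are bright, then scaled to 8 bits."""
    stack = np.stack(depth_sequence)               # (frames, rows, cols)
    dmi = stack.min(axis=0)                        # min depth per pixel
    dmi = dmi.max() - dmi                          # invert
    return (255 * dmi / max(dmi.max(), 1e-9)).astype(np.uint8)

# Synthetic sequence: a near region sweeping across a 4 m background.
frames = [np.full((240, 320), 4.0) for _ in range(10)]
for k, f in enumerate(frames):
    f[100:140, 100 + 5 * k:150 + 5 * k] = 1.5      # body part at 1.5 m

dmi = depth_motion_image(frames)
print(dmi.shape, dmi.dtype, int(dmi.max()))  # (240, 320) uint8 255
```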
without an activation function, as a soft activation is applied during both training and
inference stages. Overall, our CNNs architecture provides a robust framework for
human action recognition, capable of spotting fine details and facilitating accurate
classification.
We chose the MSRAction3D [5] dataset to evaluate the performance of our proposed
method. The dataset provides depth data suitable for constructing Depth Motion
Image Descriptors [6]. Then, our model is trained with the descriptors to obtain
classification results.
Table 3 reports the classification accuracy of our proposed Convolutional Neural Networks (CNNs) model at different training epochs. The highest training accuracy of 93.42% was achieved at the 100th epoch, indicating the model's proficiency in learning from the training data. The highest validation accuracy, 82.81%, was reached at the 50th epoch, showing how well the model generalizes. These results demonstrate the efficiency of our proposed CNNs-based model.
Table 4 displays various evaluation metrics we used to evaluate our proposed
Convolutional Neural Networks (CNNs) model across different training epochs on
the MSRAction3D dataset. These metrics include Precision [10], Recall [10], and F1-Score [10], which offer valuable insight into the model's performance and its ability to classify actions accurately. The highest precision, reaching 84.19%, was
observed at the 100th training epoch, highlighting the model’s capability to make
precise predictions. Similarly, the highest recall, 82.47%, and F1-Score, 81.96%,
were achieved during the 50th epoch.
Table 5 shows the comparison results with existing state-of-the-art methods based
on Depth Map methods only.
To improve Human Action Recognition (HAR), we closely examined how our
model improved over time through different training stages, known as epochs. We
used various plots to visualize the results. Figure 6 shows how well the model learned
Table 4 Performance metrics for CNNs model for various epochs on MSRAction3D dataset
Epochs  Precision  Recall  F1-score
10      46.06      48.67   45.21
15      53.22      47.79   42.40
20      52.15      49.56   45.71
30      69.76      69.03   66.63
50      83.50      82.47   81.96
100     84.19      80.01   80.45
Max     84.19      82.47   81.96
Table 5 Comparison of our proposed method with existing depth-based methods on MSRAction3D dataset

Authors | Datasets | Feature extraction algorithm | Classification | Accuracy (%)
Kamel et al. [6] | MSRAction3D, UTD-MHAD (University of Texas at Dallas multimodal human action dataset), MAD (multimodal action dataset); public datasets | 3-D CNNs | CNNs | MSRAction3D = 94.51, UTD-MHAD = 88.14, MAD = 91.86
Li et al. [9] | Action set | Silhouette-based action recognition: the external contour of the silhouettes and holistic motion are used as features to characterize the posture | Non-Euclidean Relational Fuzzy (NERF) C-Means; the dissimilarity between two depth maps is calculated as the Hausdorff distance between the two sets of sampled 3D points | 74.70
Oreifej et al. [8] | MSR-Daily Activity, MSRAction3D, MSR Gesture 3D | Histogram of oriented 4D surface normals (HON4D) | SVM | MSR-Daily Activity = 96.67, MSRAction3D = 88.89, MSR Gesture 3D = 92.45
Proposed method | MSRAction3D | CNNs | CNNs | 82.81
144 M. Gupta and A. Jalan
from the training data, peaking at an accuracy of 93.42% and demonstrating its skill
in recognizing actions. The validation accuracy of 82.81% highlighted the model's
ability to make predictions on new, unseen data. We also
investigated how the model refined its predictions via training and expressed this in
training and validation loss graphs. These graphs assist us in comprehending how the
model adjusts its performance as it progresses through the learning process. We also
employed performance indicators such as precision, recall, and the F1-Score [10]
to assess how well the model recognizes actions correctly and thoroughly. Figure 7
depicts a confusion matrix that gave additional insight into how the model classified
activities. The Receiver Operating Characteristic (ROC) curve and Area Under the
Curve (AUC) [11] value in Fig. 8 are also significant because they demonstrate the
model’s performance by demonstrating the trade-off between the true positive rate
and the false positive rate. With these visual representations, we demonstrated how
our model improved over time and its ability to recognize human actions accurately.
These graphs demonstrate the value of our approach in the human action recognition
field.
Our study used MATLAB [12] to generate Depth Motion Image Descriptors
(DMIs) from our dataset. Furthermore, the proposed action recognition model was
trained and evaluated using Google Colab [13], a cloud-based platform, to efficiently
harness the computational resources necessary for complex deep learning models.
The experimental results show the effectiveness of our proposed model. The
highest training accuracy of 93.42% achieved during the 100th training epoch demon-
strates the model’s proficiency in learning from the training data. The validation
accuracy reached 82.81% during the 50th epoch, highlighting how well the model
generalizes to unseen data. These results indicate that our model can effectively
classify human actions based on depth motion images and performs well in training
and validation phases.
We also evaluated our model’s performance using various evaluation metrics,
including Precision, Recall, and F1-Score. The model demonstrated its ability to
make precise predictions with a Precision of 84.19% and effectively recall actions
with an 82.47% Recall, resulting in an 81.96% F1-Score during the 50th training
epoch. These metrics infer the model’s performance, focusing on accuracy.
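For reference, these metrics can be computed directly from the per-class prediction counts; the sketch below uses made-up counts, not the paper's experimental values.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from prediction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts (true positives, false positives, false negatives)
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 0.8 0.8
```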
When comparing our proposed method, with an accuracy of 82.81%, against existing
state-of-the-art methods based on Depth Map techniques, it may not outperform
some methods like "Action-fusion," which has an accuracy of 94.51%; still, our
approach provides a robust alternative for action recognition. It is important
to note that the action-fusion method feeds multiple information channels into
a CNN architecture for efficient classification. In contrast, our method uses only
depth motion images, which makes it a considerable choice when accounting for
parameters like computational efficiency and resource constraints.
Human Action Recognition Using Depth Motion Images and Deep … 145
Fig. 6 Plot of training accuracy and validation accuracy (Left) and plot of training loss and
validation loss (Right) for 50th and 100th epoch
Fig. 7 Confusion matrix [14] of our proposed method for the MSRAction3D dataset for 50th and
100th epoch
Fig. 8 Receiver operating characteristic (ROC) [11] curve for CNN model’s action recognition
performance on MSRAction3D dataset
5 Conclusion
In conclusion, our research provides an efficient solution for addressing the limi-
tations of traditional Human Action Recognition (HAR) methods. By harnessing
depth data from the MSRAction3D dataset and introducing the innovative Depth
Motion Image Descriptor (DMI) in conjunction with Convolutional Neural Networks
(CNNs), we achieved impressive training and validation accuracies of 93.42% and
82.81%, respectively. Our proposed model addresses the challenges imposed by
variations in lighting, clothing colors, and complex backgrounds. It paves the
way for developing more robust and reliable systems with broad applications in video
surveillance, healthcare, and human–computer interaction by enhancing the accuracy
and versatility of HAR in real-world scenarios. In the future, we will focus on refining
the model's accuracy and extending its capabilities to handle a broader range of actions
and diverse environmental conditions. We will also explore real-time applications
and address scalability to larger datasets in future research.
References
1. Zhang S, Wei Z, Nie J, Huang L, Wang S, Li Z (2017) A review on human activity recognition
using vision-based methods. J Healthc Eng
2. Aggarwal JK, Xia L (2014) Human activity recognition from 3d data: a review. Pattern Recogn
Lett 48:70–80
3. Han J, Shao L, Xu D, Shotton J (2013) Enhanced computer vision with microsoft kinect sensor:
a review. IEEE Trans Cybern 43(5):1318–1334
4. The 3D image—depth maps: https://users.cs.cf.ac.uk/dave/Vision_lecture/node9.html.
Accessed 31 Oct 2023
5. Dr. Wanqing Li Profile site. https://uowmailedumy.sharepoint.com/personal/wanqing_uow_
edu_au/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fwanqing%5Fuow%5Fedu%5Fau%
2FDocuments%2FResearchDatasets%2FMSRAction3D&ga=1. Accessed 31 Oct 2023
6. Kamel A, Sheng B, Yang P, Li P, Shen R, Feng DD (2018) Deep convolutional neural networks
for human action recognition using depth maps and postures. IEEE Trans Syst Man Cybern
Syst 49(9):1806–1819
7. Wu J (2017) Introduction to convolutional neural networks. National Key Lab for Novel
Software Technology, Nanjing University, China, vol 5(23), p 495
8. Oreifej O, Liu Z (2013) Hon4d: histogram of oriented 4d normals for activity recognition
from depth sequences. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 716–723
9. Li W, Zhang Z, Liu Z (Jun 2010) Action recognition based on a bag of 3d points. In: 2010 IEEE
computer society conference on computer vision and pattern recognition-workshops. IEEE, pp
9–14
10. Goutte C, Gaussier E (Mar 2005) A probabilistic interpretation of precision, recall and F-score,
with implication for evaluation. In: The European conference on information retrieval. Springer
Berlin Heidelberg, Berlin, Heidelberg, pp 345–359
11. Krstinić D, Braović M, Šerić L, Božić-Štulić D (2020) Multi-label classifier performance
evaluation with confusion matrix. Comput Sci Inf Technol, 1
12. The MathWorks Inc (2022) MATLAB version: 9.13.0 (R2022b), Natick, Massachusetts: The
MathWorks Inc. https://www.mathworks.com
13. Google Colaboratory homepage. https://colab.research.google.com/?utm_source=scs-index.
Accessed 31 Oct 2023
14. Narkhede S (2018) Understanding auc-roc curve. Towards Data Sci 26(1):220–227
Maximizing Portfolio Returns in Stock
Market Using Deep Reinforcement
Techniques
Abstract Stock markets have become attractive investments due to the potential
for high returns. However, investing in the stock market also comes with inherent
risks, and making informed decisions is essential to minimize losses. Accurately
predicting stock prices is key to reducing risk and maximizing returns. While there
are various investment opportunities in the stock market, ranging from listed stocks
to derivatives, predicting the most likely direction of stock prices can be challenging.
In this study, we aim to design a predictive machine learning model using deep rein-
forcement learning, a technique that leverages reward functions to optimize future
rewards. This approach differs significantly from classical machine learning and
regression algorithms. It offers several advantages, including the ability to evaluate
potential trades and select those that are most likely to provide optimal returns. By
using deep reinforcement learning, historical data can be better analyzed to predict
future stock prices. This technique helps us identify potential trading strategies
by leveraging reward functions to accurately predict which trades will most likely
provide the best returns. The model will be evaluated by comparing the performance
of three agents using the Sharpe Ratio, a mathematical evaluation of returns that
considers factors such as expected and risk-free returns. By analyzing the perfor-
mance of different agents, the optimal trading strategy can be identified to provide
more accurate predictions and better results for investors.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 149
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_12
150 P. Baby Maruthi et al.
1 Introduction
Investing in the stock market is challenging due to its complexity and volatility.
Portfolio optimization of investment is a crucial process in investment management
that aims to maximize returns and minimize risk by selecting the optimal set of
assets to invest in. Traditional portfolio optimization methods rely on statistical and
mathematical models with limitations in capturing financial market dynamics and
non-linearities.
Deep reinforcement learning (DRL) has shown great promise in solving complex
decision-making problems, including portfolio optimization in the stock market.
DRL is a type of machine learning that combines deep neural networks with rein-
forcement learning, allowing agents to learn from trial and error and make optimal
decisions in complex environments.
This paper’s proposed framework is agent-based, allowing agents to interact with
the financial markets and learn optimal trading strategies by modeling deep neural
networks, posing complex relationships between different assets, and learning the
optimal actions to take in different market conditions. The DRL algorithms use
historical stock market data for training and evaluate their performance in terms of
risk-adjusted returns, the Sharpe ratio, and other financial performance indicators.
This paper explores the proposed approach on different stock markets and periods.
The proposed approach provides more accurate and efficient portfolio optimiza-
tion methods, leading to better risk management and higher returns. The approach
provides insights into the applicability and confines of DRL methods in finance and
is useful for more research and practical applications. Capital allocation and invest-
ment performance parameters are used as metrics to evaluate the effectiveness of the
proposed methods.
2 Review of Literature
This literature review provides a comprehensive overview of the existing techniques
for using DRL for portfolio optimization in the stock market. The review discusses the key
challenges and limitations of the existing methods.
• Portfolio optimization—Mean–Variance analysis
The basic idea is to identify a portfolio of assets that provides the highest expected
return for a given level of risk or the lowest risk for a given level of expected return.
The expected return and variance of returns for each asset in the portfolio are esti-
mated based on historical data. These estimates calculate the expected return and
variance of the entire portfolio by considering the weights of each asset. The port-
folio’s expected return is the weighted average of the expected returns of the indi-
vidual assets. The variance of the portfolio is affected by the single variances of the
assets and the correlations among them.
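The mean-variance quantities described above can be sketched as follows; the expected returns, covariance matrix, and weights are illustrative assumptions, not data from the study.

```python
import numpy as np

# Illustrative three-asset example (all numbers are assumptions)
mu = np.array([0.08, 0.12, 0.05])          # expected annual returns per asset
cov = np.array([[0.04, 0.006, 0.002],
                [0.006, 0.09, 0.003],
                [0.002, 0.003, 0.01]])     # covariance of asset returns
w = np.array([0.4, 0.4, 0.2])              # portfolio weights (sum to 1)

port_return = w @ mu        # weighted average of the expected returns
port_variance = w @ cov @ w # w' Sigma w: single variances plus correlation terms
print(port_return, port_variance)
```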
Maximizing Portfolio Returns in Stock Market Using Deep … 151
3 Proposed Methodology
The main goal is to get an optimal trading strategy combining features and
characteristics from three chosen DRL agents: PPO, DDPG, and A2C.
To design the optimal strategy for portfolio optimization using DRL, the following
steps have been used for the Machine Learning pipeline, as shown in Fig. 1:
A. Data Collection:
NASDAQ (National Association of Securities Dealers Automated Quotations), an
American stock exchange in New York City, is chosen as the stock exchange. Out of
its total of 3554 listings, a sample of 30 stocks is selected, and historical daily
data for these 30 stocks from 01/01/2010 to 01/03/2023 are used for training and
for analyzing the performance of the agents.
B. Data pre-processing:
The data collected in the above step is checked for null values, erroneous data, and
outliers, which involves cleaning and pre-processing the data. Later, the data is split
into training, validation, and test sets.
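A chronological split (no shuffling, so that training never sees future prices) might be sketched like this; the 70/15/15 proportions are an assumption, as the paper does not state its split ratios.

```python
import numpy as np

# Hypothetical chronological train/validation/test split for daily price data.
# Shuffling is avoided so no future information leaks into training.
def time_split(n_rows, train=0.7, val=0.15):
    i_train = int(n_rows * train)
    i_val = int(n_rows * (train + val))
    idx = np.arange(n_rows)
    return idx[:i_train], idx[i_train:i_val], idx[i_val:]

tr, va, te = time_split(1000)
print(len(tr), len(va), len(te))  # 700 150 150
```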
Table 1 (continued)

Li, S., Li, H., & Li, B. (2019) [7]
  Title: A deep reinforcement learning framework for the financial portfolio management problem with a linear transaction cost function
  Source: Journal of Intelligent & Fuzzy Systems, 37(5), 6359–6369
  Findings: The proposed method is tested on various datasets, and the results show its effectiveness in improving portfolio performance

Chiang, M. H., & Chiu, C. C. (2020) [8]
  Title: Using deep reinforcement learning to optimize financial portfolios
  Source: Journal of Risk and Financial Management, 13(1), 11
  Findings: DRL-based portfolio optimization approach, which integrates a DQN with a mean–variance optimization algorithm

Hong, T., Liu, B., & An, H. (2019) [9]
  Title: Portfolio optimization with reinforcement learning
  Source: Neurocomputing, 357, 185–193
  Findings: DRL-based portfolio optimization approach, which employs a DQN agent to learn an optimal portfolio allocation policy

Xiong, L., Xia, Y., & Zhang, Y. (2020) [10]
  Title: Portfolio selection with deep reinforcement learning
  Source: Journal of Forecasting, 39(8), 1193–1206
  Findings: The model combines deep learning techniques and reinforcement learning to select assets for the portfolio

Saini, S., & Singhal, A. (2019) [11]
  Title: Portfolio optimization using deep reinforcement learning
  Source: 3rd International Conference on Computing Methodologies and Communication (ICCMC) (pp. 212–216). IEEE
  Findings: The model aims to maximize the portfolio's returns by selecting the appropriate set of assets

Huang, X., Wang, Y., & Shan, X. (2020) [12]
  Title: Deep reinforcement learning for mean–variance portfolio selection with constraints
  Source: IEEE Access, 8, 122013–122024
  Findings: The model aims to optimize the expected return
C. Modelling:
This involves selecting the appropriate reinforcement learning algorithm for the task,
such as A2C, PPO, or DDPG, and defining the neural network architecture that
represents the policy and value functions. The pre-processed data is used for
training the model, which involves tuning the model's hyperparameters, such as the
learning rate and batch size, to optimize the model's performance.
D. Evaluation:
[Fig. 1 (excerpt), data pre-processing stage of the pipeline: cleaning of data; splitting of data into train, validation, and test sets]
a direction that maximizes the expected cumulative reward. It learns directly from
the rewards gained during the interactions with the environment. This approach
suits environments with continuous action spaces where computing the value
function is expensive. It is used for solving control problems where the optimal
action is a function of the state, such as robot control or portfolio optimization.
c Actor-critic approach: This hybrid DRL technique combines the benefits of
the actor-only and critic-only methods. In this approach, the agent learns a policy
(actor) and a value function (critic) simultaneously. The actor is responsible for
learning the optimal action-selection policy, while the critic evaluates the policy's
quality and provides feedback to the actor to update the policy. The critic estimates
the value function, which measures the likely future rewards for each state-action
pair. The value function estimates the future rewards the agent can expect from
following the policy, and the actor uses it to select actions that lead to higher
expected rewards. This approach is more stable because it incorporates the critic's
feedback into the policy updates, and it has better convergence properties because
the actor learns directly from the critic's feedback. It also handles environments
with continuous action spaces better than critic-only methods.
Algorithms:
The following DRL algorithms are used to train agents to make decisions in complex
environments.
(a) A2C (Advantage Actor-Critic): The A2C trains agents to interact with an
environment and learn the optimal policy. It is a variant of the Actor-critic
approach that uses two separate neural networks to estimate the policy (Actor)
and the value function (critic). This algorithm simultaneously updates the actor
and critic networks during training. The actor network takes the current
state of the environment as input and outputs a probability distribution over the
possible actions. The critic network takes the current state as input and outputs
an estimate of the expected future reward. During each training episode, the
agent interacts with the environment to collect experiences and calculates the
advantage of each action taken. The advantage is the difference between the
actual reward received and the estimated value of that state. The advantage is
used to update the policy and the value function. The A2C uses a gradient-
based optimization to update the actor and critic networks. The policy gradient
is calculated using the advantage estimate to update the actor network. The critic
network is updated using the mean-squared error between the predicted value
and the reward received.
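The advantage computation described above can be sketched as follows; the rewards and value estimates are made-up numbers, and the networks themselves are omitted.

```python
import numpy as np

# Advantage = discounted return minus the critic's value estimate, as described above.
def discounted_returns(rewards, gamma=0.99):
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

rewards = np.array([1.0, 0.0, 2.0])   # illustrative episode rewards
values = np.array([1.5, 1.0, 1.8])    # illustrative critic estimates V(s_t)
advantages = discounted_returns(rewards) - values
# The actor loss would use -log pi(a|s) * advantage; the critic minimizes the
# mean-squared error between the returns and its value estimates.
critic_loss = np.mean((discounted_returns(rewards) - values) ** 2)
print(np.round(advantages, 4))
```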
(b) DDPG (Deep Deterministic Policy Gradient): This combines the ideas of deep
neural networks with deterministic policy gradients to enable the learning of
complex continuous control policies. The algorithm uses an actor-critic archi-
tecture where the actor is a deep neural network that maps states to actions,
and the critic is a separate deep neural network that estimates the Q-values of
state-action pairs. During training, the actor learns to maximize the Q-value
predicted by the critic. The critic learns to minimize based on the difference
Maximizing Portfolio Returns in Stock Market Using Deep … 157
between its estimated Q-values and the actual rewards the agent receives. The
actor’s policy is updated using the deterministic policy gradient derived from
the critic’s Q-value estimates. To improve stability and reduce variance, the
DDPG uses experience replay. Thus, the agent can learn from past experiences
by storing them in a replay buffer and sampling them randomly during training
and target networks.
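The target networks mentioned above are typically refreshed with a soft update; a minimal sketch, with the rate tau chosen arbitrarily:

```python
import numpy as np

# Soft target update commonly used with DDPG:
# theta_target <- tau * theta_online + (1 - tau) * theta_target, small tau for stability.
def soft_update(target_params, online_params, tau=0.005):
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

target = [np.zeros(3)]  # toy "network parameters"
online = [np.ones(3)]
target = soft_update(target, online)
print(target[0])  # [0.005 0.005 0.005]
```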
(c) PPO (Proximal Policy Optimization): The policy is updated in the same envi-
ronment used to collect data. PPO is designed to strike a balance between explo-
ration and exploitation by improving the robustness of the policy updates while
limiting the size of the changes to the policy parameters. In PPO, a clipped
surrogate objective function is used to update the policy, limiting the change
to the policy by clipping the objective whenever the new policy deviates too far
from the previous one. This constraint encourages small policy updates and
improves the stability of the training process. The components of the algorithm
are a policy network and a value network. The policy
network takes the current state of the environment and outputs a probability
distribution over actions. The value network takes the current state of the envi-
ronment and outputs an estimate of the expected return of the current state. The
policy is updated using the clipped surrogate objective function, whereas the
value network is updated using the mean squared error loss. Hence, updates
to the policy and value networks are made by stochastic gradient descent. An
ensemble method is used to select the best agent among PPO, A2C and DDPG to
trade based on the Sharpe Ratio.
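The clipped surrogate objective can be sketched as follows; the probabilities and advantages are illustrative, and epsilon = 0.2 is the commonly used default rather than a value stated in the paper.

```python
import numpy as np

# PPO clipped surrogate: r_t = pi_new / pi_old, objective takes
# min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), averaged over samples.
def ppo_clip_objective(new_probs, old_probs, advantages, eps=0.2):
    ratio = new_probs / old_probs
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

obj = ppo_clip_objective(
    new_probs=np.array([0.5, 0.9]),
    old_probs=np.array([0.25, 0.3]),
    advantages=np.array([1.0, -1.0]),
)
print(round(obj, 3))  # -0.9
```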
4 Result Analysis
The performance of the three agents (A2C, DDPG, and PPO) and of the ensemble
method is measured using the following metrics.
Annual Return: It refers to the percentage change in the value of an investment over
one year. It provides a measure of the investment’s performance on an annual basis
and helps investors assess the rate of return they have earned during that specific
year.
Annual Volatility: It refers to the measure of how much the price of a stock or
the overall market index fluctuates over one year. It indicates the variability or risk
associated with the investment.
Sharpe Ratio quantifies the excess return an investment generates per unit of risk
taken.
Sharpe Ratio = (Average Return of the Investment − Risk-Free Rate) / Standard Deviation of the Investment
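In code, the ratio can be estimated from a series of periodic returns; the return series and risk-free rate below are illustrative assumptions.

```python
import numpy as np

# Sharpe Ratio = (average return - risk-free rate) / standard deviation of returns.
def sharpe_ratio(returns, risk_free=0.0):
    excess = np.mean(returns) - risk_free
    return excess / np.std(returns)

returns = np.array([0.01, 0.02, -0.005, 0.015])  # illustrative periodic returns
print(round(sharpe_ratio(returns, risk_free=0.001), 3))  # 0.962
```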
Table 2 Comparison between the various strategies chosen for the study
Ensemble PPO A2C DDPG
Cumulative return (%) 65.10 78.37 61.20 54.60
Annual return (%) 12.30 15.80 11.10 10.40
Annual volatility (%) 10.40 13.40 10.90 12.67
Sharpe Ratio 1.3 1.15 1.17 0.82
Max drawdown (%) −9.90 −24.56 −10.30 −14.50
[Figure: bar chart titled "Performance Metrics" comparing cumulative return, annual return, annual volatility, Sharpe Ratio, and max drawdown across the four strategies]
5 Backtest Results
Fig. 3 Back test plot comparing the Ensemble strategy and the Dow Jones Index (DJI)
The ensemble strategy is backtested against the Dow Jones Index (DJI) in order to
compare the performance of the two within the same time period.
It was observed that, on average, the ensemble strategy gives higher returns than
the DJI.
Our findings demonstrate that the Ensemble strategy outperforms the Dow Jones
Index considerably throughout the chosen duration shown in Fig. 3. Further, the
following inferences can be drawn based on the findings of the study:
1. PPO agent outperforms the other three in terms of cumulative returns and annual
returns but also has greater volatility and maximum drawdown. Thus, for a risk-
averse investor, the cumulative and annual returns may not compensate enough
for the associated risks.
2. The ensemble strategy has the lowest maximum drawdown and annual volatility
among the four strategies, thus providing an enticing option for the risk-averse
investor. Whilst the cumulative and annual returns of the Ensemble strategy might
not be the best among all the strategies evaluated, its considerably lower risk
makes it an attractive proposition.
6 Conclusion
factors like interest rates or inflation. By evaluating the impact of different reward
functions on portfolio performance, researchers can identify the most effective ones
and their impact on long-term portfolio growth.
Further research could implement the model in a real-world setting to evaluate its
performance and applicability to real-world investment scenarios. This can help to
determine the practicality of using reinforcement learning algorithms for portfolio
management.
References
1. Jiang B, Li Q, Tan KC (2017) A deep reinforcement learning framework for the financial
portfolio management problem. IEEE Trans Neural Netw Learn Syst 29(9):1–13
2. Li J, Li B, Lu J (2019) Deep reinforcement learning-based portfolio management with policy
gradient methods. J Intell Fuzzy Syst 37(5):6345–6357
3. Wang H, Chen X (2020) A deep reinforcement learning approach for portfolio optimization.
IEEE Trans Neural Netw Learn Syst 31(3):698–710
4. Liu Y, Wang Y, Zhang Y (2019) Deep reinforcement learning for portfolio management. J
Intell Fuzzy Syst 37(5):6333–6343
5. Chen Z, Zhang X, Jiang Z (2020) Deep reinforcement learning for multi-asset portfolio
management. Appl Soft Comput 96:106597
6. Gu Y, Zhang Y (2019) Reinforcement learning for portfolio management with regret control.
J Intell Fuzzy Syst 37(5):6323–6331
7. Li S, Li H, Li B (2019) A deep reinforcement learning framework for the financial portfolio
management problem with a linear transaction cost function. J Intell Fuzzy Syst 37(5):6359–
6369
8. Chiang MH, Chiu CC (2020) Using deep reinforcement learning to optimize financial
portfolios. J Risk Financ Manag 13(1):11
9. Hong T, Liu B, An H (2019) Portfolio optimization with reinforcement learning. Neurocom-
puting 357:185–193
10. Xiong L, Xia Y, Zhang Y (2020) Portfolio selection with deep reinforcement learning. J Forecast
39(8):1193–1206
11. Saini S, Singhal A (2019) Portfolio optimization using deep reinforcement learning. In: 2019
3rd international conference on computing methodologies and communication (ICCMC).
IEEE, pp 212–216
12. Huang X, Wang Y, Shan X (2020) Deep reinforcement learning for mean-variance portfolio
selection with constraints. IEEE Access 8:122013–122024
13. Kim KJ, Kim Y, Kang SJ (2018) Deep learning for stock selection based on financial statements.
J Inf Process Syst 14(1):1–11
14. Zhang H, Shen W, Guo L (2019) Stock selection using deep learning techniques. In: 2019 IEEE
4th international conference on cloud computing and big data analysis (ICCCBDA). IEEE, pp
176–181
15. Cui J, Zhai X (2019) Deep learning models for predicting stock prices using financial news
articles. Appl Intell 49(6):1816–1828
16. Hsieh TY, Lin TC (2019) Stock price prediction using a hybrid deep learning model. Expert
Syst Appl 132:256–269
Detecting AI Generated Content:
A Study of Methods and Applications
Abstract This paper includes an extensive study into the identification of AI-
generated content, a vital field of research in the context of maintaining academic
integrity and preventing plagiarism. The study categorizes detection approaches into
three primary categories: linguistic-based, statistical-based, and learning-based methods.
Each method is extensively reviewed regarding its approach, strengths, and potential
limits. The paper contains a comparative examination of the three types of detection
algorithms, stressing their various use cases and performance measures. The paper
also discusses the challenges and scenarios for detecting AI-generated content in
various domains and contexts.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 161
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_13
162 S. Tiwari et al.
• retrieval-based systems, or
• generative models.
Among them, generative models such as ChatGPT, Bing AI, Bard, etc., have
shown astonishing abilities to produce responses that could mimic human conversations.
However, generative models also have some limitations:
• loss of factual consistency,
• ethical issues,
• vulnerability to adversarial attacks [1].
Therefore, detecting AI-generated content material is crucial to maintaining the
quality and reliability of online records and preventing harm to users and society.
AI-generated content detection is a binary categorization problem in which the input
is textual content, and the output is a label indicating whether the text is
human-written or machine-generated [1]. However, this task is not trivial, as AI-
generated texts can be very much like human-written texts regarding linguistic capa-
bilities, style, and content material. Moreover, different generative models might also
have different characteristics and behaviors that require unique detection strategies.
The detection approaches can be broadly categorized as follows:
• Linguistic-based methods
• Statistical-based methods
• Learning-based methods [2].
Linguistic-based methods analyze linguistic patterns to distinguish human-written
texts from AI-generated ones, but they may not generalize to different types of
texts or generative models. Statistical-based strategies use distributional properties
of the text to differentiate between human-written and machine-generated texts, but
they may not be reliable for short or noisy texts. Learning-based approaches use
machine learning (ML) techniques to learn discriminative features, but they may not
be scalable or adaptable to new text types or generative models.
2 Background
The rapid advancement of artificial intelligence (AI) in the field of natural language
generation (NLG) has led to the creation of texts that are increasingly fluent, coherent,
and human-like [3, 4]. Large language models (LLMs) based tools like ChatGPT,
BERT, XLNet, and others have demonstrated remarkable capabilities in generating
texts for various domains and applications, including chatbots, reviews, news arti-
cles, and academic papers. However, the proliferation of AI-generated content poses
significant challenges and risks, particularly in academic integrity and plagiarism
prevention [5]. AI-generated content can be used to deceive, manipulate, or misin-
form readers and bypass plagiarism detection systems. Therefore, it is crucial to
Detecting AI Generated Content: A Study of Methods and Applications 163
develop methods and tools to recognize and differentiate AI-generated content from
human-written content [6].
Detecting AI-generated content is a binary categorization problem, where the
input is text, and the output is a label indicating whether the content is human-
written or machine-generated [7]. This task is not trivial, as AI-generated texts can be
very similar to human-written texts regarding linguistic features, style, and content
[8]. Moreover, different generative models may have different characteristics and
behaviors that require specific detection strategies [9]. Previous studies have proposed
various approaches to detect AI-generated content, which can be broadly categorized
into three types: watermarking, classification, and statistical analysis.
2.1 Watermarking
The method proposed in "A Watermark for Large Language Models" by John Kirchenbauer
et al. [8] from the University of Maryland embeds a digital watermark into the
generated text that is invisible to humans but detectable by an algorithm. It
can help identify the source and authenticity of the text. However, it requires access
to the generation process, which may not always be available or feasible [8].
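A highly simplified sketch of the green-list idea, assuming a toy vocabulary and a hash-based split (the actual method operates on LLM logits at generation time):

```python
import hashlib

# Toy illustration: the previous token seeds a pseudo-random split of the
# vocabulary into "green" and "red" halves; generation favors green tokens,
# and a detector counts the fraction of green tokens in a text.
def is_green(prev_token, token):
    h = hashlib.sha256((prev_token + token).encode()).digest()
    return h[0] % 2 == 0  # pseudo-random half of the vocabulary

def green_fraction(tokens):
    hits = [is_green(a, b) for a, b in zip(tokens, tokens[1:])]
    return sum(hits) / len(hits)

# A watermarked generator would pick green continuations far more often than
# the ~50% expected by chance, so a high green fraction flags machine text.
print(green_fraction(["the", "cat", "sat", "on", "the", "mat"]))
```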
2.2 Classification
3 Generative Models
Generative models are a huge step forward in chatbot technology because they utilize
advanced machine-learning algorithms or techniques to generate text that sounds
more like a human. They work on generating answers from scratch based on the input
they receive, allowing them to create unique and contextually appropriate replies [8].
Some particular generative models are:
• OpenAI’s ChatGPT
• Google’s Bard
• Microsoft’s Bing AI.
This section explores the approaches used to distinguish texts generated by machines
from those written by humans. In general, there are three types of detection methods.
a machine might use adjectives or adverbs more frequently than a human writer
[18, 20].
• Feature Extraction for Machine Learning: POS tags can serve as valuable
features in machine learning models used for text classification. For example,
the frequency of certain POS tags (like nouns, verbs, adjectives, etc.) can help
differentiate between human-written and machine-generated text [18, 19].
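Such POS-tag features might be extracted as follows; the (word, tag) pairs are hard-coded here for illustration, where a real pipeline would obtain them from a tagger such as nltk.pos_tag.

```python
from collections import Counter

# Hand-tagged toy sentence (a real pipeline would use a POS tagger)
tagged = [("the", "DT"), ("quick", "JJ"), ("fox", "NN"),
          ("runs", "VBZ"), ("very", "RB"), ("quickly", "RB")]

def pos_feature_vector(tagged_tokens, tagset=("NN", "JJ", "RB", "VBZ", "DT")):
    """Relative frequency of each POS tag, usable as ML features."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    return [counts[t] / total for t in tagset]

print(pos_feature_vector(tagged))  # adverb-heavy text scores high on "RB"
```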
This involves parsing the sentence structure to analyze the relationships between
words. Mathematically, this process can be represented using formal grammars and
parse trees.
Syntax tree parsing can help discriminate between human-written and AI-generated
text by analyzing the following features:
• Depth of the Tree: The syntax tree’s depth can indicate the sentence’s complexity.
AI-generated text might have a different average tree depth than human-written
text [21].
• Branching Factor: The average number of children of a node (branching factor)
in the syntax tree can also be a distinguishing feature. It might differ between
human-written and AI-generated text [21].
• Sentence Length: The length of the sentence can be inferred from the syntax tree,
which might vary between human-written and AI-generated text [21].
• Grammar Patterns: Certain grammar patterns, represented as paths in the syntax
tree, might be used more frequently in AI-generated text [21].
These features can train an ML-based model to distinguish between human-
written and AI-generated text.
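A minimal sketch of the first two features, assuming parse trees are represented as simple `(label, children)` pairs; the toy constituency tree below is invented for illustration.

```python
def tree_depth(node):
    """Depth of a (label, children) tree; a leaf has depth 1."""
    label, children = node
    if not children:
        return 1
    return 1 + max(tree_depth(c) for c in children)

def branching_factor(node):
    """Return (total children, internal-node count) for averaging."""
    label, children = node
    if not children:
        return (0, 0)
    n_children, n_internal = len(children), 1
    for c in children:
        cc, ci = branching_factor(c)
        n_children += cc
        n_internal += ci
    return (n_children, n_internal)

# "The cat sleeps" as a toy constituency tree.
leaf = lambda w: (w, [])
tree = ("S", [("NP", [leaf("The"), leaf("cat")]), ("VP", [leaf("sleeps")])])
total, internal = branching_factor(tree)
print(tree_depth(tree), total / internal)  # depth and average branching factor
```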
d. Lexical Diversity Metrics
Lexical Diversity Metrics like Type-Token Ratio (TTR) and Measure of Textual
Lexical Diversity (MTLD) are used to analyze the richness and variety of vocabulary
in a text [22, 23]. Here’s how they can help identify AI-generated content:
Type-Token Ratio (TTR): TTR is the ratio of unique words (types) to the total
number of words (tokens) in a text [22]. AI-generated content might demonstrate a
different TTR compared to human-written text. For instance, if an AI model overuses
certain phrases or lacks creativity in its language use, it might have a lower TTR [22].
Measure of Textual Lexical Diversity (MTLD): MTLD calculates the mean length
of word strings that sustain a criterion level of lexical variation [22]. It is designed
to be more resistant to text length than TTR [22]. If an AI model generates text with
repetitive or predictable language, it might have a lower MTLD [22].
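A minimal sketch of both metrics. The MTLD variant below is a simplified one-directional version; the full measure averages forward and backward passes and handles the final partial factor.

```python
def ttr(tokens):
    """Type-Token Ratio: unique words (types) / total words (tokens)."""
    return len(set(tokens)) / len(tokens)

def mtld_one_pass(tokens, threshold=0.72):
    """Simplified one-directional MTLD: mean length of token runs whose
    running TTR stays above the threshold."""
    factors, start = 0, 0
    for i in range(len(tokens)):
        seg = tokens[start:i + 1]
        if len(set(seg)) / len(seg) <= threshold:
            factors += 1
            start = i + 1
    return len(tokens) / factors if factors else float(len(tokens))

repetitive = "good good good very good very good product".split()
varied = "this compact camera produces sharp vivid photos outdoors".split()
print(ttr(repetitive), ttr(varied))  # repetitive text scores lower
```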
These features (TTR and MTLD) can thus be used to train an ML-based model to
discriminate between human-written and AI-generated text.
e. Sentiment Analysis
Sentiment analysis algorithms often employ mathematical models, such as Naive
Bayes classifiers, to infer the sentiment expressed in a text [12, 13]. Sentiment
Analysis algorithms output a sentiment score, which typically ranges from negative
to positive, indicating the overall sentiment of the text [24, 25]. This score can be
used in several ways to differentiate content as human-written or AI-generated text:
• Emotion Consistency: Human-written text usually maintains a consistent senti-
ment throughout, especially in short texts. In contrast, AI-generated text might
exhibit sudden shifts in sentiment, which can be a telltale sign of machine
generation [25].
• Sentiment Intensity: AI models might overuse positive or negative sentiments,
leading to unnaturally strong sentiment scores. For example, if an AI model is
trained mostly on positive reviews, it might generate text with an overly positive
sentiment [25].
• Feature Extraction for Machine Learning: The sentiment scores calculated by
Sentiment Analysis algorithms can serve as valuable features in ML models used
for text classification [25].
For example, consider two pieces of text:
– AI-generated: “This product is fantastic! It’s incredibly amazing! I love it so
much!”
– Human-written: “I like this product. It works well and meets my needs.”
The AI-generated text might have a higher sentiment score due to the overuse of
positive words, while the human-written text might have a more moderate sentiment
score [25]. These differences in sentiment scores can be used to train a machine
learning model to classify content as human-written or AI-generated [25].
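A toy lexicon-based scorer illustrates how such sentiment scores can be computed. The word lists below are invented for illustration; production systems use trained classifiers or resources such as VADER.

```python
POSITIVE = {"fantastic", "amazing", "love", "like", "well", "good"}
NEGATIVE = {"terrible", "awful", "hate", "bad", "poor"}

def sentiment_score(text):
    """Toy lexicon score in [-1, 1]: (positive - negative) / total tokens."""
    tokens = [w.strip("!.,'").lower() for w in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / max(len(tokens), 1)

ai_like = "This product is fantastic! It's incredibly amazing! I love it so much!"
human_like = "I like this product. It works well and meets my needs."
print(sentiment_score(ai_like), sentiment_score(human_like))
```

On these two examples the scorer reproduces the intuition in the text: the effusive sample scores higher than the moderate one, and that gap is exactly the kind of feature a downstream classifier can use.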
Strengths:
• Sensitivity to Linguistic Nuances: Linguistic-based methods capture fine-
grained linguistic irregularities, making them adept at detecting AI-generated
text [12, 13].
• Granular Analysis: They facilitate a detailed examination of linguistic structures,
allowing for detecting subtle deviations [12, 13].
Limitations:
• Dependency on Specific Features: The choice of appropriate linguistic features
is critical in determining the effectiveness of linguistic-based approaches. This
could restrict their use to particular genres, languages, or domains [12, 13].
168 S. Tiwari et al.
PP(W) = P(W)^(-1/N) (7)
w · x + b = 0 (10)
that maximizes the margin between the two classes. This is achieved by solving the
optimization problem:
min_{w,b} (1/2)||w||^2 + C Σ_{i=1}^{N} max(0, 1 − y_i(w · x_i + b)) (11)
where:
w is the weight vector.
b is the bias term.
C is the regularization parameter.
N is the number of training examples.
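Eq. (11) can be evaluated directly; the sketch below computes the soft-margin objective for a toy dataset (the data points and parameters are invented for illustration):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Soft-margin SVM objective of Eq. (11):
    (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))."""
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Two toy points, one on each side of the hyperplane w . x + b = 0.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0
print(svm_objective(w, b, X, y, C=1.0))  # both margins >= 1, so the loss term is 0
```

In practice a solver (e.g., scikit-learn's LinearSVC) minimizes this objective over w and b rather than evaluating it at fixed parameters.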
Additionally, deep learning models, particularly recurrent neural networks
(RNNs) and transformers, have demonstrated significant success in learning-based
methods for text classification. These models can capture complex relationships and
dependencies in text data, making them highly effective in discerning AI-generated
content [14, 15].
Strengths:
• Learning-based methods can adapt to different types of texts and generative
models, making them versatile [14, 15].
• They can automatically extract intricate features from the data, potentially
uncovering subtle differences between human-written and AI-generated content
[14, 15].
Limitations:
• Learning-based methods require substantial labeled training data, which may not
always be readily available [14, 15].
• They may be computationally intensive, especially for complex models like deep
neural networks [14, 15].
• The performance of these methods can heavily depend on the quality of the
features extracted and the chosen algorithm [14, 15].
By employing learning-based methods, researchers and developers can harness
the power of machine learning to build robust detectors for AI-generated content
[14, 15].
Table 1 compares the different methodologies, their advantages, limitations, etc.
Though each approach has its own strengths and weaknesses, optimal results are
obtained when these approaches are implemented in combination.
7 Conclusion
This paper provided an overview of the existing techniques and applications for
detecting AI-generated content, focusing on conversational AI and chatbots. The
paper compares and contrasts three detection methods: linguistic-based, statistical-
based, and learning-based, highlighting their advantages and disadvantages. The
paper also identifies the key challenges and scenarios for detecting AI-generated
content in different industries and contexts. The paper suggests that combining
these methods can achieve better results than individual approaches and that further
research and development are needed to keep up with the advances in generative
models.
References
23. Brglez M, Vintar Š (2022) Lexical diversity in statistical and neural machine translation. Infor-
mation 13(2):93
24. Text & sentiment analysis: key differences & real-world examples. https://qualaroo.com/blog/
text-analysis-vs-sentiment-analysis-understanding-the-difference/. Accessed 5 Dec 2023
25. Yang D, Zhou Y, Zhang Z, Li TJ-J, Ray LC (2022) AI as an active writer: interaction strategies
with generated text in human-AI collaborative fiction writing. In: Joint proceedings of the ACM
IUI workshops, vol 10. CEUR-WS Team
A Systemic Review of Machine Learning
Approaches for Malicious URL Detection
Abstract New websites are emerging every day due to the growing popularity of
the internet. No matter where you live, your occupation, or your age, web browsing
has become an everyday activity for everyone. However, due to the growing internet
use, website attacks have become common. A URL that contains hidden links can be
exploited by intruders for phishing, spam, DoS, DDoS, and other attacks. Identifying
and combating such malicious websites has been quite challenging due to the
difficulty of separating benign from harmful websites. In this survey paper, various
techniques used by researchers to detect malicious URLs are analyzed. Methods like
the Blacklist method, Heuristic approach, and various research articles for malicious
URL detection are discussed here. This paper presents malicious URL detection as
a machine-learning task and categorizes and reviews literature studies that address
the different aspects of the problem. Several well-known classifiers are discussed in
this paper, including Naive Bayes, Support Vector Machines, Decision
Trees, and Random Forests, for detecting malicious URLs as a binary classification
problem.
1 Introduction
Web security has become an increasingly important issue in recent years as Internet
connectivity has spread around the globe. Though global connectivity is excellent
for accessible communication, there is also a risk that more people will be exposed
to malicious websites containing malware, viruses, and other agents that can cause
S. Kothari (B)
Symbiosis International (Deemed University), Symbiosis Institute of Technology, Lavale, Pune,
India
e-mail: sonali.kothari@sitpune.edu.in
I. Tidke
Bharti Vidyapeeth College of Engineering, Lavale, Pune, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 177
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_14
178 S. Kothari and I. Tidke
them harm. Consequently, identifying and dealing with such websites before a user
can access them becomes more crucial than ever. The ways in which this problem is
currently addressed are inadequate in both effectiveness and efficiency.
Malicious URLs are used to perform unlawful activities such as sending unsolicited
messages, committing financial fraud, performing man-in-the-middle attacks,
distributing fake download links, injecting viruses into users' systems, and
running online scams, XSS attacks, phishing attacks, etc. A malicious URL is made
with the intent to distribute malware such as ransomware. The number of malicious
URLs has increased drastically as the size and popularity of the internet have
grown. Malicious URLs also affect IoT devices.
Malicious URLs pose a severe threat to cyber security. In [1], a diffusion distance
measurement technique based on color histogram similarity and motion cues from
segmented video objects creates an object-tracking framework based on particle
filters with probability functions. When a set of markers or rules is applied to
detecting malicious URLs, known malicious URLs can be detected appropriately.
However, this method cannot detect new malicious URLs that do not fit the
predefined rules or signs. In this paper, we compare traditional approaches and
machine-learning approaches to detecting malicious URLs. The paper focuses on
classifying URLs using different machine-learning algorithms, considering various
attributes of the URLs.
introduces a new complexity. The blacklist method to identify and save the list of
malicious URLs is suitable but cannot be generalized. It cannot predict if any URL
is malicious or benign. The Blacklist method cannot find a new type of malicious
URL, and the number of malicious URLs is too large to be used in a simple database.
SVM, Naïve Bayes, Decision Tree, Random Forest, and ensemble learning are machine
learning techniques used to detect malicious URLs [3]. The general steps of a
machine learning approach are:
1. Get the dataset of Malicious URL
2. Extract the URL features and obtain the appropriate feature representation.
3. Split the data into train, test, and validation.
4. Implement a machine learning algorithm.
5. Check the accuracy of the result [4].
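Step 2 of the pipeline above can be sketched as a simple lexical feature extractor; the specific features chosen below are illustrative choices, not a fixed standard, and the example URL is invented.

```python
from urllib.parse import urlparse

def url_features(url):
    """Hand-crafted lexical features for one URL (illustrative choices)."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_digits": sum(c.isdigit() for c in url),
        "num_special": sum(c in "-_@%?=&" for c in url),
        "num_subdomains": max(host.count(".") - 1, 0),
        "uses_https": int(parsed.scheme == "https"),
    }

print(url_features("http://login-update.example123.com/verify?id=77"))
```

The resulting dictionaries can be vectorized (e.g., with scikit-learn's DictVectorizer) before the train/test split and classifier training in steps 3 and 4.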
In deep learning methods, natural language processing techniques are used to
process the text data in the URL and classify it as malicious or benign. A URL
has two main parts: the application-layer protocol (e.g., HTTP, HTTPS, IPFS) and
the hostname, which is the website's name, such as google.com.
Lexical Features: Lexical features are the name of the URL, its length, and the
average length of words. These linguistic characteristics are combined with addi-
tional features (for example, host-based features) to improve model performance. The
traditional methods use URL String, its length, and the length of every component
such as hostname, protocol, name of the website, etc.
i. The word embedding is created using pre-trained vectors such as word2vec and
Glove.
ii. Machine learning models like random forest and support vector machines are
used. Deep Learning methods such as transformers, LSTM, and CNN extract
features from the URL.
iii. The Kolmogorov Complexity method is also used. Kolmogorov complexity
measures the complexity of a string as the length of the shortest program that
produces it.
Host-based Features: The host-based features include WHOIS information, IP address,
and geolocation (city, state, and country). WHOIS information can be hidden using a
WHOIS guard, limiting the usefulness of the host-based features. The physical geographical
location—for example, the country or city—is included in the location information.
Malicious URLs are detected using Application Layer and Network Layer charac-
teristics. “Time-to-live values”, the presence of relevant terms such as “client” or
“server,” whether the IP address appears in the domain name, and whether the PTR
record matches one of the hosts’ IP addresses are listed among the Domain Name
properties.
As many attributes are identity-related, they are stored in a numerical vector using
a bag-of-words technique, where each word corresponds to a particular identity.
Adopting such a method yields many characteristics similar to lexical features.
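The bag-of-words encoding described above can be sketched as follows; the vocabulary and tokens are invented for illustration.

```python
def bag_of_words_vector(tokens, vocabulary):
    """Binary bag-of-words: 1 if the identity token is present, else 0."""
    present = set(tokens)
    return [int(word in present) for word in vocabulary]

# Tokens extracted from a URL's host and WHOIS fields (hypothetical example).
vocab = ["com", "server", "client", "login", "us", "paypal"]
tokens = ["login", "server", "example", "com"]
print(bag_of_words_vector(tokens, vocab))  # -> [1, 1, 0, 1, 0, 0]
```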
Other Features-Based Techniques: Features such as the Google safe browsing
list and the website’s Alexa rank can also be used to identify whether a website is
malicious or benign. The website’s popularity can be used to determine if the URL
is phishing or benign [5].
The following work discusses the most effective machine learning and deep learning
methods of classifying malicious URLs from benign ones.
The quantity and severity of network information security threats are increasing.
Nowadays, hackers seek to target end-to-end technology and exploit human flaws.
Approaches such as social engineering and phishing are examples [6]. Using
malicious Uniform Resource Locators (URLs) to deceive consumers is one step in
these cyberattacks.
An improved learning-based technique is utilized in this paper [7] to categorize
blacklisted websites into three categories: benign, spam, and malware. The
classification is performed without accessing the content of the websites,
removing the run-time latency. The model is trained using the shortened URL
dataset and associated characteristics, resulting in a projected accuracy of
96.29%.
An evaluation of machine learning algorithms for classifying malicious URLs used
a controlled and robust set of criteria based on the constraints of previous
research, integrating various existing datasets with URLs of four different
types: “benign”, “spam”, “phishing”, and “malware”. A random forest algorithm was
then utilized to forecast whether a URL was malicious or benign and which type of
attack could be mounted through the URL.
Random forest obtained the maximum accuracy of 98.6%. The findings reveal that
detecting harmful websites by their URLs alone and categorizing them as spam URLs
without depending on page content saves considerable resources while providing a
secure surfing experience for the user. Random forest generates a probabilistic
output and can handle a large number of features, which is particularly valuable
here because the task involves multiple classes.
Hackers have begun to use social media sites such as Twitter, Facebook, Google+,
and MySpace for illegal operations. These are well-known social networking sites
that many individuals use to interact with one another and share common
interests. Twitter is famous for microblogging, in which users send brief
messages known as tweets. Because the accessible material is broad and scattered,
hackers and attackers have begun to use Twitter to spread viruses. It is also
very easy to disseminate and post URLs on Twitter.
PhishAri quickly recognizes phishing on Twitter. To determine if a tweet
containing a URL is phishing, we blend attributes specific to Twitter with those
unique to URLs. We use elements peculiar to Twitter, such as tweet content and its
characteristics, including length, hashtags, and mentions. Other Twitter data includes
the account’s age, the number of tweets, and the follower-to-followee ratio of the
Twitter user who posted the tweet. These unique characteristics of Twitter work well
with URL-based features to identify phishing tweets.
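A sketch of how such Twitter-specific and URL-based signals might be combined into one feature vector; the feature names and example values are illustrative, not PhishAri's actual implementation.

```python
import re

def tweet_features(tweet_text, account_age_days, n_tweets, followers, followees):
    """Combine Twitter-specific signals with simple URL counts, in the
    spirit of PhishAri (feature names here are illustrative)."""
    return {
        "tweet_length": len(tweet_text),
        "num_hashtags": tweet_text.count("#"),
        "num_mentions": tweet_text.count("@"),
        "num_urls": len(re.findall(r"https?://\S+", tweet_text)),
        "account_age_days": account_age_days,
        "tweets_per_day": n_tweets / max(account_age_days, 1),
        "follower_followee_ratio": followers / max(followees, 1),
    }

f = tweet_features("Win a free phone! http://bit.ly/x #prize @user", 30, 600, 10, 500)
print(f["num_hashtags"], f["follower_followee_ratio"])
```

A very low follower-to-followee ratio combined with a shortened URL, as in this toy example, is the kind of joint signal the classifier can exploit.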
To efficiently detect phishing tweets, PhishAri examines many criteria, such as
the qualities of the suspicious URL, the tweet’s text, the attributes of the Twitter
user submitting the tweet, and facts about the phishing websites. A Chrome browser
plugin has also been created to provide real-time phishing protection. Twitter
users are alerted by a red indicator attached to phishing tweets, and the browser
plugin stops users from falling prey to phishing attacks [8].
SVM has been a popular method for detecting the malicious URLs present on the
internet today. This method has been used in many research papers, mainly to
differentiate between malignant and benign (harmful and harmless) links. This
summary explores how scholars have used this particular algorithm. An SVM model
classifies the given data by finding a separating hyperplane; the training points
closest to this hyperplane are called support vectors.
Malicious URL detection is a recurring problem on the internet today. To tackle
it, researchers are turning to machine learning to find solutions for detecting
such links and avoiding significant damage when users click on them. So far, many
papers have been based on the use of SVM algorithms in various ways, and their
results show SVM to be among the most effective algorithms for this problem.
The paper [9] developed the Kozinec-SVM algorithm, which aims to reduce the
complexity of detecting malicious URLs. SVM was also used to consistently reduce
the false positive rate, which helped efficiently separate malicious URLs from
benign ones. Much recent research has since focused on this problem.
The paper [10] explored various ways of using the SVM algorithm in combination
with other techniques. SVM was used with a polynomial kernel together with
logistic regression to achieve an impressive accuracy of 98%.
In the same year, the paper [11] performed feature engineering on roughly thirty
thousand URLs with sixty-three features, obtaining significant performance; the
SVM algorithm produced a further improvement after these steps. The paper [12]
from the same year focused on big data processing: SVM was used to realize
autonomous learning and build the classification, and K-Means was used to reduce
the dimensionality of the data while maintaining accuracy [13]. They focused on
detecting the anomalous behavior of suspicious URLs by exploiting big data
technology, and experiments with machine learning models such as RF and SVM
achieved good accuracy. Their aim was to examine how machine learning can help
detect such URLs.
The following year, 2021, the paper [14] implemented an interesting data mining
approach using association-based classification. It uses a training dataset with
a history of past malicious links to build association rules and create a good
model, which helped reduce both false positive and false negative rates.
A few more methods were explored in [15], with 117 candidate features available
for building the model. Two approaches were used for this initial
feature-selection step, and a sufficient number of features were extracted to
train models such as k-NN, SVM, and ANN. The k-NN model produced a satisfactory
result.
A simpler approach was also proposed to obtain the desired output: [16] proposed
a malicious URL detection model in which various features of links were studied
for classification. The models used were Naïve Bayes, SVM, and Logistic
Regression, which produced a good model. The authors of [17] have also researched
this problem, using various classification methods to classify each URL type;
their analysis shows that their classification models can separate malicious code
from benign. The work in [18] predicts whether a URL is malicious, focusing
mainly on identifying phishing links with an in-depth understanding of machine
learning methods, while [19] addresses the detection of harmful URLs as a binary
classification problem and evaluates various machine-learning models, including
SVM.
In recent times, various deep learning models have been developed. They are
prevalent because deep learning methods can learn with little or no supervision
and are flexible with respect to changes in the input data and the environment.
They perform at or above human level in areas such as medical diagnosis, image
and language processing, and forecasting and prediction.
Blacklisting, regular expressions, and signature matching are the most promi-
nent methods for detecting malicious URLs. However, with more robust methods of
creating new URLs and URL variants from existing malicious URLs, these methods
have become ineffective. Deep learning has been of great help here. Deep learning
methods started with autoencoders, traditional multi-layer perceptrons, Deep Belief
Networks, and a mix of regular expressions to extract features and apply feedforward
neural networks.
After Convolutional Neural Networks and NLP became efficient, they were
applied for the task. NLP-based methods like RNN, LSTM, and GRU are now being
perfected to detect malicious URLs. Even though deep learning models are more
accurate, they require substantial computational resources such as GPUs. This
intensifies when new data arrives and the model must be retrained. Deep learning
paradigms such as lifelong learning and online deep learning have been developed
to optimize resources. For deep learning methods, character-level models have
proven effective.
The models can be summarized as follows:
• Input: character strings.
• Feature extraction.
• Text classification—malicious/safe.
Below are a few novel deep-learning methods developed for Character Level
Malicious URL detection.
a. “NYU Architecture”: Here, a combination of CNN and LSTM is used for feature
extraction, and a sequential model is created. Text representation for the CNN is
done using pre-trained embedding, embedding, and lookup tables. For the LSTM,
only the pre-trained embedding is used.
b. “Invincea Architecture”: Here, the CNN network consists of a Keras embedding
layer, a parallel CNN layer, and three fully connected layers [20]. The ReLU
activation function is used in these layers. Batch normalization and dropout are
used to prevent overfitting. The output layer has a sigmoid activation function
to perform binary classification of the URL as malicious or safe.
c. “MIT Architecture”: Here, the NYU architecture is used, and another LSTM layer
is added, giving a CNN-LSTM and LSTM layer in sequence. This, however, caused
overfitting.
d. “DeepURLDetect (DUD) Model”: This model tries to overcome the overfitting and
low accuracy of the above deep learning models when classifying URLs.
Two datasets are used here, one collected from public sources like Alexa.com,
DMOZ directory, and MalwareDomainlist.com. The second dataset is from Sophos
research. The model works in three steps:
i. Pre-processing: URLs are converted to lowercase, zero-padded so that all URLs
have the same length, and embedded at the character level (Keras embeddings) to
form feature vectors.
ii. Optimal features are extracted using the models from the above methods, such
as Invincea, NYU, MIT, CMU, and Endgame.
iii. Classification: CNN and LSTM layers are applied, and the URL is classified
using a sigmoid activation.
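The character-level pre-processing described in these steps (lowercasing, zero-padding, integer character ids) can be sketched as follows; the alphabet and maximum length below are illustrative choices, not the DUD model's actual settings.

```python
def encode_url(url, max_len=20,
               alphabet="abcdefghijklmnopqrstuvwxyz0123456789./:-_?="):
    """Character-level encoding: lowercase, map chars to integer ids
    (0 is reserved for padding/unknown), zero-pad to a fixed length."""
    char_to_id = {c: i + 1 for i, c in enumerate(alphabet)}
    ids = [char_to_id.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

encoded = encode_url("HTTP://EXAMPLE.com/a")
print(encoded)  # fixed-length integer sequence, ready for an embedding layer
```

The resulting integer sequences are what a Keras `Embedding` layer would consume before the CNN/LSTM stages.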
The performance measures are accuracy, precision, recall, and F1 score. All the
models had accuracy between 93 and 99%.
The researchers showed that while the transformer-based model did not perform
the best, its training time was shorter than that of the other methods [24].
5 Conclusion
References
15. AlTalhi R, Saqib MN, Saeed U, Alghamdi A (2021) Malicious URL detection using streaming
feature selection. In: The 5th international conference on future networks & distributed systems,
December 2021
16. Wejinya G, Bhatia S (2021) Machine learning for malicious URL detection. ICT systems and
sustainability. Springer, Singapore, pp 463–472
17. Singh A, Kumar A, Bharti AK, Singh V (2021) Detection of malicious web contents using
machine and deep learning approaches. Int J Appl Innov Eng Manag (IJAIEM), 10(6), 104–109.
ISSN 2319-4847
18. Tang L, Mahmoud QH (Aug 2021) A survey of machine learning-based solutions for phishing
website detection. Mach Learn Knowl Extr 3(3):672–694
19. Shantanu BJ, Kumar RJA (2021) Malicious URL detection: a comparative study. In: 2021
international conference on artificial intelligence and smart systems (ICAIS), Coimbatore,
India, pp 1147–1151. https://doi.org/10.1109/ICAIS50930.2021.9396014
20. Srinivasan S, Vinayakumar R, Arunachalam A, Alazab M, Soman KP (2020) DURLD: mali-
cious URL detection using deep learning-based character level representations. In: Malware
analysis using artificial intelligence and deep learning. https://doi.org/10.1007/978-3-030-
62582-5_21
21. Lee Y, Saxe J, Harang R (2023) CATBERT: context-aware tiny BERT for detecting social
engineering emails. https://arxiv.org/abs/2010.03484. Accessed 10 Jun 2023
22. Xu P (2023) A transformer-based model to detect phishing URLs. https://arxiv.org/abs/2109.
02138. Accessed 5 Aug 2023
23. Rudd EM, Abdallah A (2023) Training transformers for information security tasks: a case
study on malicious URL prediction. https://arxiv.org/abs/2011.03040. Accessed 5 Jun 2023
24. Haynes K, Shirazi H, Ray I (2021) Lightweight URL-based phishing detection using natural
language processing transformers for mobile devices. FNC/MobiSPC 191:127–134
Digital Image Forgery Detection Based
on Convolutional Neural Networks
Abstract Image authentication has become a hot topic with the currently available
technology for manipulating and distributing images. Image authentication aims to
ensure the authenticity of digital images and automatically detect forged images
that have been tampered with after they have been captured. This paper presents
a convolutional neural network (CNN) model for detecting forged images. The
proposed model comprises three main stages. First, the input image is
preprocessed by adopting logarithmic mapping to refine the quality of the
extracted features, especially in dark regions. Second, a novel CNN architecture
is trained to classify arbitrary images into two categories, “original” and
“forged”; the CNN model locates descriptors as a descriptor map. Finally, the
model finds the similarities and dependencies between these features. The
proposed model was tested and evaluated on
three datasets under various copy-move conditions. Experimental results reveal that
the model can detect forged images with high accuracy, reaching up to 97.11%.
1 Introduction
N. M. Saleh
Informatics Institute for Postgraduate Studies, Iraqi Commission for Computers and Informatics,
Baghdad, Iraq
S. A. Naji (B)
University of Information Technology and Communications, Baghdad, Iraq
e-mail: dr.sinannaji@uoitc.edu.iq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 189
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_15
images have become an integral part of our daily lives. The digital images on
television news broadcasts, websites, and newspapers are perceived as true facts
[4]. However, some images are subjected to malevolent manipulation, primarily
because the proliferation of image-editing software makes creating fake images
relatively easy [5]. By altering the visual contents of an image, the
new image is called a “forged” image. In many instances, image forgery can be
problematic. For example, a lot of fake photos have been released in online media
in an attempt to deceive readers. In some situations, photos used as evidence may
contain duplicates or leave out important objects. Furthermore, certain diseases may
be concealed or fabricated in medical images for insurance purposes [1]. For casual
users, it isn’t easy to visually distinguish between an original and a forged version
of a given image.
Image forgery detection can be defined as follows: “Given an arbitrary image, the
goal of image forgery detection is to decide the image’s authenticity and whether the
image had been tampered with in some way after it had been captured. In other words,
determine whether or not any object or region in the image has been tampered with
and, if present, return the location and extent of each in the image”. Image forgery
can be categorized into four main categories, as follows [6–9]:
1. Image Splicing Forgery: a new forged image is created by splicing two or more
images [6, 7, 10].
2. Image Retouching Forgery: manipulating some regions or objects in an image
to differentiate or enhance certain details. Generally, it does not significantly
modify the visual contents of an image. This may include adjusting intensities,
removing defects, false colors, enhancement, visual effects, eliminating wrinkles,
etc. [8].
3. Transformation Forgery: scaling, rotating, or translating one or more objects
in the image [11].
4. Copy-Move Image Forgery: in which certain objects are replicated in the image
in order to duplicate these objects or to overlap (i.e., hide) some regions in the
source image.
Generally, image forgery detection can be divided into two categories: passive
and active techniques. In active methods, some authentication data is embedded in
the source image before it is distributed. The authentication data might be
subsequently utilized to confirm whether or not the image has been altered during a
forensic examination. The main issue with this technique is that it requires special-
ized cameras or a post-processing step after the image is captured. Watermarking,
steganography, and digital signatures are widely used for this type.
On the other hand, passive techniques detect if an arbitrary image has been
tampered with without any previous embedded authentication information [12]. To
detect a forgery, it is necessary to seek particular characteristics, such as statistical
properties, that are considered homogeneous to the original image [13]. Copy-move
forgery is the most prevalent form of image forgery due to its simplicity, and it is
usually used for altering an image’s contents with illegal intent.
2 Related Work
Image forensics is an active research area concerned with developing techniques
to blindly determine the authenticity of digital images [19]. These techniques
are based on the assumption that an image forgery can be identified even if there
is no prior information concerning the image's contents. Recently, many
techniques have been proposed in the literature. Copy-move forgery detection
(CMFD) techniques can be categorized as follows: block-based approaches, key
point-based approaches, and machine learning-based approaches [20, 21].
The input image is initially divided into either overlapping or non-overlapping blocks.
In most cases, this is followed by the extraction of block features. Various feature
extraction techniques involving frequency transformations, filters, and region texture
are used in block-based algorithms. The matching phase is applied to each block
to determine similar blocks based on their features using an appropriate matching
mechanism. Gani and Qadir used the Discrete Cosine Transform (DCT) to extract
features from each block [20]. Cellular Automata was used to create the feature
vector of the DCT coefficients. The KD-tree matches the feature vector to locate
similar regions in the image. Ahmed et al. proposed a CMFD algorithm that comprises
five stages [21]: preprocessing the input image; segmenting the image into blocks;
calculating certain statistical properties for each block to create the feature vectors;
sorting the feature vectors lexicographically; and finally, feeding them to the Support
Vector Machine (SVM) classifier to decide whether the image is authentic or forged.
192 N. M. Saleh and S. A. Naji
Zimba and Xingming proposed an algorithm that used Discrete Wavelet Transform
(DWT) for feature extraction [22]. Principal Component Analysis (PCA) was used
for classification. The algorithm can detect forged images with high performance.
Jaiprakash et al. proposed a low-dimensional feature-based model [23]. Features
are extracted through image statistics and pixel correlation in the DCT and DWT
domains. A classification ensemble has been chosen for training and testing. The
classifier determines whether the provided images are forgeries or genuine. Parveen
et al. proposed a five-step block-based system [24]: converting the source image to
grayscale and dividing it into equal-sized blocks. DCT is used for feature extraction,
clustering is made using the K-means algorithm, and finally, feature vector matching
is carried out with the radix sort technique. Wang et al. proposed dividing the input
image into circular blocks [25]. The original image was normalized with a Gaussian
low-pass filter. Then, overlapping circular blocks are generated, and the invariant
Quaternion Exponent Moments (QEMs) technique is used to extract features as a
feature vector. Finally, block matching is performed based on exact Euclidean
locality-sensitive hashing. Niyishaka and Bhagvati employed blobs in contrast to
image blocks and used the RANSAC algorithm to eliminate spurious matches [26].
The authors claimed that the technique combines DoG (Difference of Gaussian) and
ORB (Oriented Fast and Rotated Brief) and successfully handles several scenarios,
such as geometric transformations and repeated copy-move forgeries. Zernike
moments have also been used to extract image features [27–29].
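The generic block-based pipeline described above can be sketched as follows. The choice of simple block statistics as features is illustrative (the cited works use DCT, DWT, or moment features instead), and the block size, step, threshold, and minimum offset are assumptions:

```python
import numpy as np

def block_cmfd(img, block=8, thresh=1e-3, min_offset=16):
    """Minimal block-based copy-move detection sketch: slide a window
    over a grayscale image, compute a small feature vector per block,
    sort the vectors lexicographically, and flag similar adjacent pairs
    whose spatial offset is large enough to rule out natural overlap."""
    H, W = img.shape
    feats, coords = [], []
    for y in range(0, H - block + 1, 2):          # step 2 to cut cost
        for x in range(0, W - block + 1, 2):
            b = img[y:y + block, x:x + block].astype(float)
            h = block // 2
            # illustrative features: overall mean plus quadrant means
            q = [b[:h, :h].mean(), b[:h, h:].mean(),
                 b[h:, :h].mean(), b[h:, h:].mean()]
            feats.append([b.mean()] + q)
            coords.append((y, x))
    feats = np.asarray(feats)
    coords = np.asarray(coords)
    order = np.lexsort(feats.T[::-1])             # lexicographic sort
    matches = []
    for i, j in zip(order[:-1], order[1:]):       # compare sorted neighbors
        if np.abs(feats[i] - feats[j]).max() < thresh:
            dy, dx = coords[i] - coords[j]
            if dy * dy + dx * dx >= min_offset ** 2:
                matches.append((tuple(coords[i]), tuple(coords[j])))
    return matches
```

Note that flat, textureless regions naturally produce identical feature vectors and would be falsely flagged by such a simple scheme; the cited works add more discriminative features and filtering steps for exactly this reason.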
Key Point-Based Approach

These techniques extract local features from the input image and represent them as a
set of descriptors. The descriptors improve the dependability of the characteristics.
Then, descriptors are matched to locate forgery regions [30, 31]. SIFT, SURF, ASIFT,
BRIEF, ORB, and LBP are among the most popular techniques for extracting key
points that are robust to scale and rotation transformations. Furthermore, these tech-
niques show significant performance in terms of speed and accuracy. Li and Zhou
developed an interesting hierarchical matching strategy and an iterative localization
technique to reduce false alarm rates [32]. SIFT had been applied at multiple image
scales using a scale-space representation. To enhance accuracy, the color information
of each key point was used within the iterative localization technique. Fatima et al.
combined two feature extraction methods, SIFT and BRIEF, in which the former is
applied to smooth regions while the latter is applied to noisy regions [33]. The
key point matching step used the second-nearest-neighbor test.
Yang et al. developed an algorithm for distributing key points in forensic scenarios
[34]. First, the RGB image is converted to a grayscale image. Key-point detection is
performed using adaptive SIFT. Then, the AHC algorithm is used for the matching
stage. Dhivya et al. used the SURF technique for keypoint extraction along with
SVM for classification [35]. The preprocessing stage comprises grayscale conversion,
Wiener filtering, contrast stretching, and binary image conversion.
In general, the weakness of the key point-based approach appears when certain
objects have few details, making key points very hard to detect.
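The second-nearest-neighbor (ratio) test used in descriptor matching can be sketched as follows; the toy 2-D descriptors stand in for real SIFT/BRIEF vectors, and the ratio value is the conventional one, not one taken from the cited works:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Lowe-style second-nearest-neighbor ratio test: keep a match only
    when the best candidate in desc_b is clearly closer to the query
    descriptor than the runner-up is."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every candidate
        j1, j2 = np.argsort(dists)[:2]              # two nearest neighbors
        if dists[j1] < ratio * dists[j2]:           # unambiguous best match
            matches.append((i, int(j1)))
    return matches
```

In CMFD, matched key point pairs are then filtered geometrically (e.g., with RANSAC, as in [26]) to separate genuine duplicated regions from spurious matches.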
Machine Learning-Based Approach

Recent research has demonstrated that CNNs can identify image forgeries
with high accuracy. Elaskily et al. proposed an innovative CMFD model based on deep
learning [36]. A CNN model was specially proposed to create some kind of repre-
sentation of categorized descriptors. After the CNN training phase, the system can
test and classify images to detect copy-move forgeries.
Yao presented a deep learning-based model for detecting video forgeries [37].
The proposed model is based on CNNs for extracting features. The frames of the
video were preprocessed in three stages. They include an absolute difference layer
for frames to reduce temporal redundancy between video frames. Furthermore, data
augmentation was applied to prepare image patches for the training phase.
Wu et al. introduced a deep learning-based model named BusterNet for CMFD
[38]. BusterNet implies two CNN architectures followed by a fusion model.
BusterNet is capable of localizing potential manipulated regions via feature simi-
larities. The author stated that BusterNet outperforms state-of-the-art models.
Convolutional Neural Networks (CNNs)
CNNs are sophisticated artificial neural networks that utilize convolutional kernels
for successful pattern recognition and image processing tasks [1, 39]. They consist
of neurons that optimize themselves through learning, each receiving input and
performing an operation [36, 40]. The network expresses a single differentiable score
function, parameterized by learnable weights, that maps the raw input image pixels
to output class scores. A CNN comprises an input layer, multiple convolutional
(CONV) and pooling layers, and one or more Fully Connected (FC) layers. The architecture consists of
four primary components: a filter, a convolution layer, an activation function, and a
pooling or subsampling layer [41].
CNNs rely on the convolutional layer, which generates a 2D activation map of
learned responses. In image classification, the convolution operation filters the
input data with 2D kernels whose coefficients are learned during training using a
gradient descent algorithm. Parameter sharing constrains all positions of an
activation map to the same weights and bias, so that backpropagation updates a
single set of weights rather than a separate set for each neuron. Activation
functions transform input signals into output signals by applying a nonlinear
operation to each element of the convolution output and feeding the result as
input to the next layer. The pooling layer, which is responsible for invariance to
data variation and perturbation, is obtained through the pooling operation:
scanning the feature maps and aggregating the data within local regions. Typical
pooling strategies include maximum pooling and average pooling.
Finally, the extracted features (i.e., descriptors) are fed to the Fully
Connected (FC) layers [42]. The FC layer is feed-forward and is used as a
classifier, connecting each neuron to all neurons in the previous layer. It follows the
basic method of multiple-layer perceptron neural networks and inputs a vector from
the convolutional layer [43]. The FC layer performs matrix multiplication, adds a
bias vector, applies an activation function, and produces an output vector. The output
layer produces model predictions, with sigmoid activation functions for binary clas-
sification and Softmax for multi-class problems. These layers can build deep learning
models for detecting image copy-move forgery, identifying manipulated regions, and
accurately distinguishing original and forged images.
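The four components above (convolution, nonlinear activation, pooling, and fully connected classification) can be illustrated with a toy NumPy forward pass; all sizes, kernels, and weights here are made up for demonstration and are not taken from the paper:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution on a single channel -- the CONV building block."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def max_pool(fm, size=2):
    """Max pooling: aggregate local regions for shift invariance."""
    H, W = fm.shape
    fm = fm[:H // size * size, :W // size * size]
    return fm.reshape(H // size, size, W // size, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# toy forward pass: conv -> ReLU -> pool -> flatten -> FC -> softmax
img = np.arange(36, dtype=float).reshape(6, 6)
k = np.array([[1.0, 0.0], [0.0, 1.0]])                     # toy 2x2 kernel
fm = np.maximum(conv2d(img, k), 0)                         # ReLU activation
pooled = max_pool(fm)                                      # pooling layer
v = pooled.ravel()                                         # flatten
W_fc = np.ones((2, v.size)) * np.array([[0.01], [-0.01]])  # toy FC weights
probs = softmax(W_fc @ v)                                  # class probabilities
```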
3.1 Image Enhancement

This step aims at enhancing image quality. Practically, adjusting the image inten-
sity levels constantly improves the quality of the input images, which contributes
to enhanced detection accuracy. Logarithmic mapping was used in this work. The
logarithmic mapping technique is a fundamental concept in image processing. It is
widely used in various applications to enhance the features in the darker areas of the
image at the expense of those in the brighter areas.
It is a pixel-based mapping in which the mapping function is a logarithmic curve,
as shown in Fig. 2, defined as:

g(x, y) = c · log(1 + |f(x, y)|)    (1)

Fig. 2 The logarithmic transformation used to enhance the features in dark regions

where f(x, y) is the input image, g(x, y) is the output image, and c is the scaling
constant defined as follows:

c = 255 / log(1 + |R|)    (2)

where R denotes the maximum intensity value of the input image.
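A direct NumPy implementation of this mapping; taking R as the peak intensity of the input image is an assumption consistent with Eq. (2), though the text does not define R explicitly:

```python
import numpy as np

def log_enhance(img):
    """Logarithmic intensity mapping: expands dark intensities and
    compresses bright ones; c rescales the output into [0, 255]."""
    f = img.astype(np.float64)
    R = f.max()                    # assumed: peak intensity of the input
    c = 255.0 / np.log1p(abs(R))   # c = 255 / log(1 + |R|), Eq. (2)
    g = c * np.log1p(np.abs(f))    # g(x, y) = c * log(1 + |f(x, y)|), Eq. (1)
    return np.clip(g, 0, 255).astype(np.uint8)
```

Because the curve rises steeply near zero, mid and dark intensities are pushed upward while the maximum intensity maps back to 255.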
The system then resizes the input images to a standard fixed size of 224 × 224
pixels before passing them to the CNN model.
Our model’s primary network structure draws inspiration from the VGG-16 architecture
[15], but with fewer convolutional layers, to minimize the computational cost and
make the system lighter and more convenient for real-time applications. The model
has been trained for image forgery detection with supervised training. The features
are extracted through a series of four convolutional layers, each followed by an
activation function. After every two convolutional layers, a max pooling layer is
applied; hence, the network has two max pooling layers. A SoftMax activation layer
is applied at the output, normalizing the outputs into a probability distribution
over the classes. Table 1 summarizes the CNN architecture of convolutional and
pooling layers. The convolutional layers numbered 1, 2, 4, and 5 use 64, 64, 128,
and 128 filters, respectively. The CNN weights were initialized from a pre-trained
CNN created for a classification task.
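The layer arrangement described above can be traced as a shape calculation; the 3 × 3 kernels, 'same' padding, and 2 × 2/stride-2 pooling are assumed from the VGG-16 conventions, since Table 1 is not reproduced here:

```python
def trace_shapes(h=224, w=224):
    """Trace feature-map shapes through the described architecture:
    conv(64) -> conv(64) -> max-pool -> conv(128) -> conv(128) -> max-pool.
    Assumes VGG-style 3x3 'same' convolutions and 2x2/stride-2 pooling."""
    shapes = [("input", h, w, 3)]
    for name, filters in [("conv1", 64), ("conv2", 64)]:
        shapes.append((name, h, w, filters))   # 'same' padding keeps H x W
    h, w = h // 2, w // 2                      # 2x2 max pool halves H and W
    shapes.append(("pool1", h, w, 64))
    for name, filters in [("conv3", 128), ("conv4", 128)]:
        shapes.append((name, h, w, filters))
    h, w = h // 2, w // 2
    shapes.append(("pool2", h, w, 128))
    return shapes
```

Under these assumptions, the final feature maps are 56 × 56 × 128 before flattening into the fully connected classifier.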
3.4 Classification
The output layer produces a probability for each of the two classes. Accordingly,
the image is most likely a “forged” image if, for example, the model produces a
value of 0.71 for the “forged” class and 0.29 for the “original” class.
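The decision rule is simply an argmax over the two output probabilities; the ordering of the class labels here is an assumption:

```python
import numpy as np

CLASSES = ("forged", "original")  # assumed label order

def classify(probs):
    """Assign the class with the highest output probability."""
    return CLASSES[int(np.argmax(probs))]
```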
4 Experimental Results
4.1 Datasets
This work used three well-known benchmark datasets for testing and evaluation.
These are the MICC-F220, MICC-F600, and MICC-F2000 datasets of the Images
and Communication Lab (ICL), University of Florence [44]. These datasets contain
a diversity of forgery types and sizes and generalize to many scenarios. They are
publicly available to researchers in image forensics and related fields.
log loss = −(1/N) Σ_{i=1..N} Σ_{j=1..M} X_ij · log(P_ij)    (4)
where Xij indicates whether the ith sample belongs to a class ( j); Pij indicates the
probability of sample (i) belonging to class ( j). The model aims at minimizing the
loss function [36].
This section presents the evaluation results for the proposed CNN architecture in
detail. The proposed model is tested and trained using MICC-F2000, MICCF220,
and MICC-F600 datasets.
Several tests (i.e., trial and error) were necessary to determine the optimal CNN
architecture. Over 600 distinct CNN structures were tested in this study, with the
number of layers and other parameters gradually adjusted. The “original” and
“forged” images are continually presented as input with the corresponding desired
targets. The system’s output is compared with the desired target, followed by CNN
adjustment until the highest accuracy is achieved, along with the minimum log loss.
After training the proposed model on the datasets, the parameters and weights of the
model are saved for use later in the testing phase.
The proposed CNN model’s results using the datasets mentioned earlier are
presented in Tables 2, 3, and 4, respectively. Table 2 shows the results of the MICC-
F2000 dataset in two sections. The first section shows the results using the source
images with no preprocessing image enhancement step, in which the highest accu-
racy reached is 91.93%. The results of applying the logarithmic mapping technique
(refer to Sect. 3.1) to enhance the source images are presented in the second section.
The table shows the results for 50 epochs. The accuracy generally increases as the
number of epochs increases, peaking at 96.74% after 50 epochs.
Table 3 shows the results applied to the MICC-F600 dataset; whereas Table 4
presents the results applied to the MICC-F220 dataset. However, the positive effect
of the image enhancement preprocessing step is still obvious in improving accuracy.
Figures 3, 4, and 5 show the accuracy progress for the training phase of the
proposed model using the three datasets. These figures also show other parameters
Table 2 The results of the proposed CNN applied to the MICC-F2000 dataset

Without image enhancement (50 epochs):
Accuracy 91.93% | Log loss 0.10464 | TPR 95.55% | FPR 17.64% | FNR 4.45% | TNR 82.36% | TT 4.79 s

With the image enhancement preprocessing step (50 epochs):
Accuracy 96.74% | Log loss 0.10808 | TPR 98.29% | FPR 1.74% | FNR 1.71% | TNR 98.26% | TT 5.07 s
Table 3 The results of the proposed CNN applied to the MICC-F600 dataset

Without image enhancement (50 epochs):
Accuracy 95.85% | Log loss 0.43025 | TPR 93.06% | FPR 5.11% | FNR 6.94% | TNR 94.89% | TT 2.88 s

With the image enhancement preprocessing step (50 epochs):
Accuracy 97.11% | Log loss 0.24605 | TPR 98.32% | FPR 0.04% | FNR 1.68% | TNR 99.96% | TT 4.26 s
Table 4 The results of the proposed CNN applied to the MICC-F220 dataset

Without image enhancement (50 epochs):
Accuracy 94.36% | Log loss 0.42454 | TPR 92.27% | FPR 0.30% | FNR 7.73% | TNR 99.70% | TT 2.49 s

With the image enhancement preprocessing step (50 epochs):
Accuracy 96.78% | Log loss 0.25004 | TPR 97.79% | FPR 5.41% | FNR 2.21% | TNR 94.59% | TT 4.28 s
used to adjust the training process, such as the number of epochs, iterations per
epoch, learning rate, and elapsed time.
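The rates reported in Tables 2–4 follow the standard confusion-matrix definitions, which can be computed from raw counts as:

```python
def rates(tp, fp, fn, tn):
    """Standard detection rates from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "TPR": tp / (tp + fn),   # true positive rate (sensitivity)
        "FPR": fp / (fp + tn),   # false positive rate
        "FNR": fn / (tp + fn),   # false negative rate
        "TNR": tn / (fp + tn),   # true negative rate (specificity)
    }
```

Note that TPR + FNR = 1 and FPR + TNR = 1, which is why each table row's paired columns sum to 100%.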
4.4 Discussion
Table 5 presents a quantitative comparison of the proposed CNN model with several
well-known models. The best results from various models are displayed in the table.
While these models employ different image databases for training and testing, the
table provides a general overview of the performance of these techniques. The
comparison shows that the detection rates attained by the proposed system are
comparable to the highest published results.
5 Concluding Remarks
This research paper presents a novel computational methodology for CMFD utilizing
machine learning. Although many models were proposed in the literature, many
of them accepted and processed the source images directly. The first step of this
work is to refine the quality of the extracted features, especially in dark regions,
by adopting logarithmic mapping. Then, a novel CNN architecture is proposed to
classify arbitrary images into two categories: “original” and “forged”. The CNN
model extracts image features and generates feature maps. Then, the model finds the
similarities and dependencies between these features. Once the CNN is trained,
the system is ready to test and classify various input images.
The proposed model was tested using three publicly available datasets with various
copy-move scenarios, including one or more duplicates with distinct clone regions.
The number of training epochs is a crucial factor that must be considered;
different numbers of epochs were therefore used in several experiments.
Experimental results reveal that the model can detect images tampered with by
various transformations, such as scaling, rotation, and translation. As shown in
Table 3, the highest achieved accuracy was 97.11%. The model is fast, light, and
reliable, and thus shows potential for various applications in image forensics.
Our future work focuses on different directions, such as applying
parallel processing, better preprocessing methods, improving accuracy, and detecting
video forgery.
References
1. Abhishek, Jindal N (2021) Copy move and splicing forgery detection using deep convolutional
neural network, and semantic segmentation. Multimed Tools Appl 80:3571–3599
2. Wang C, Zhang Z, Li Q, Zhou X (2019) An image copy-move forgery detection method based
on SURF and PCET. IEEE Access 7:170032–170047
3. Malathi J, Nagamani TS, Lakshmi KVV (2019) Survey: image forgery and its detection
techniques. In: Journal of physics: conference series, vol 1228, no 1. IOP Publishing, p 012036
4. Mahmood T, Mehmood Z, Shah M, Saba T (2018) A robust technique for copy-move forgery
detection and localization in digital images via stationary wavelet and discrete cosine transform.
J Vis Commun Image Represent 53:202–214
5. Khudhair ZN, Mohamed F, Rehman A, Saba T (2023) Detection of copy-move forgery in
digital images using singular value decomposition. Comput Mater Contin 74(2)
6. Elaskily MA, Aslan HK, Elshakankiry OA, Faragallah OS, Abd El-Samie FE, Dessouky MM
(2017) Comparative study of copy-move forgery detection techniques. In: 2017 International
Conference on advanced control circuits systems (ACCS) Systems & 2017 Intl conf on new
paradigms in electronics & information technology (PEIT). IEEE, pp 193–203
7. Thakur T, Singh K, Yadav A (2018) Blind approach for digital image forgery detection. Int J
Comput Appl 975:8887
8. Shah H, Shinde P, Kukreja J (2013) Retouching detection and steganalysis. Int J Eng Innov
Res 2(6):487
9. Ahmad M, Khursheed F (2021) Digital image forgery detection approaches: a review. In:
Applications of artificial intelligence in engineering: proceedings of first global conference on
artificial intelligence and applications (GCAIA 2020). Springer, pp 863–882
10. Meena KB, Tyagi V (2021) Image splicing forgery detection techniques: a review. In: Advances
in computing and data sciences: 5th international conference, ICACDS 2021, Nashik, India,
April 23–24, 2021. Springer, pp 364–388
11. Li J et al (2023) Learning steerable function for efficient image resampling. In: Proceedings of
the IEEE/CVF conference on computer vision and pattern recognition, pp. 5866–5875
12. Zedan IA, Soliman MM, Elsayed KM, Onsi HM (2021) Copy move forgery detection tech-
niques: a comprehensive survey of challenges and future directions. Int J Adv Comput Sci Appl
12(7)
13. Lin X, Li J-H, Wang S-L, Cheng F, Huang X-S (2018) Recent advances in passive digital image
security forensics: a brief review. Engineering 4(1):29–39
14. Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference
on computer vision and pattern recognition, pp 1–9
15. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv:1409.1556
16. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional
neural networks. Adv Neural Inf Process Syst 25
17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
18. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Garcia-Rodriguez J (2017)
A review on deep learning techniques applied to semantic segmentation. arXiv:06857
19. Yang P, Baracchi D, Ni R, Zhao Y, Argenti F, Piva A (2020) A survey of deep learning-based
source image forensics. J Imaging 6(3):9
20. Gani G, Qadir F (2020) A robust copy-move forgery detection technique based on discrete
cosine transform and cellular automata. J Inf Secur Appl 54:102510
21. Ahmed IT, Hammad BT, Jamil N (2021) Image copy-move forgery detection algorithms based
on spatial feature domain. In: 2021 IEEE 17th international colloquium on signal processing &
its applications (CSPA). IEEE, pp 92–96
22. Zimba M, Xingming S (2011) DWT-PCA (EVD) based copy-move image forgery detection.
Int J Digit Content Technol Its Appl 5(1):251–258
23. Jaiprakash SP, Desai MB, Prakash CS, Mistry VH, Radadiya KL (2020) Low dimensional
DCT and DWT feature based model for detection of image splicing and copy-move forgery.
Multimed Tools Appl 79:29977–30005
24. Parveen A, Khan ZH, Ahmad SN (2019) Block-based copy–move image forgery detection
using DCT. Iran J Comput Sci 2:89–99
25. Wang X-Y, Liu Y-N, Xu H, Wang P, Yang H-Y (2018) Robust copy–move forgery detection
using quaternion exponent moments. Pattern Anal Appl 21:451–467
26. Niyishaka P, Bhagvati C (2018) Digital image forensics technique for copy-move forgery detec-
tion using dog and orb. In: Computer vision and graphics: international conference, ICCVG
2018, Warsaw, Poland, September 17–19, 2018, proceedings. Springer, pp 472–483
27. Ouyang J, Liu Y, Liao M (2019) Robust copy-move forgery detection method using pyramid
model and Zernike moments. Multimed Tools Appl 78:10207–10225
28. Chen B, Yu M, Su Q, Shim HJ, Shi Y-Q (2018) Fractional quaternion Zernike moments for
robust color image copy-move forgery detection. IEEE Access 6:56637–56646
29. Mahmoud K, Husien A (2016) Copy-move forgery detection using Zernike and pseudo-Zernike
moments. Int Arab J Inf Technol 13(6A):930–937
30. Alberry HA, Hegazy AA, Salama GI (2018) A fast SIFT based method for copy move forgery
detection. Futur Comput Inform J 3(2):159–165
31. Mahmoud K, Al-Rukab AHA (2016) Moment based copy move forgery detection methods.
Int J Comput Sci Inf Secur 14(7)
32. Li Y, Zhou J (2018) Fast and effective image copy-move forgery detection via hierarchical
feature point matching. IEEE Trans Inf Forensics Secur 14(5):1307–1322
33. Fatima B, Ghafoor A, Ali SS, Riaz MM (2022) FAST, BRIEF and SIFT based image copy-move
forgery detection technique. Multimed Tools Appl 81(30):43805–43819
34. Yang B, Sun X, Guo H, Xia Z, Chen X (2018) A copy-move forgery detection method based
on CMFD-SIFT. Multimed Tools Appl 77:837–855
35. Dhivya S, Sangeetha J, Sudhakar B (2020) Copy-move forgery detection using SURF feature
extraction and SVM supervised learning technique. Soft Comput 24:14429–14440
36. Elaskily MA et al (2020) A novel deep learning framework for copy-move forgery detection
in images. Multimed Tools Appl 79:19167–19192
37. Yao Y, Shi Y, Weng S, Guan B (2017) Deep learning for detection of object-based forgery in
advanced video. Symmetry 10(1):3
38. Wu Y, Abd-Almageed W, Natarajan P (2018) Busternet: detecting copy-move image forgery
with source/target localization. In: Proceedings of the European conference on computer vision
(ECCV), pp 168–184
39. Tran DT, Iosifidis A, Gabbouj M (2018) Improving efficiency in convolutional neural networks
with multilinear filters. Neural Netw 105:328–339
40. Chen J, Zhu D, Hui B, Li RYM, Yue XG (2022) Mu-Net: multi-path upsampling convolution
network for medical image segmentation, 131(1)
41. Lei T, Li RYM, Jotikastira N, Fu H, Wang CJC (2023) Prediction for the inventory management
chaotic complexity system based on the deep neural network algorithm, vol 2023
42. Matsumura N, Ito Y, Nakano K, Kasagi A, Tabaru T (2023) A novel structured sparse fully
connected layer in convolutional neural networks. Concurr Comput: Pract Exp 35(11):e6213
43. Alzubaidi L et al (2021) Review of deep learning: concepts, CNN architectures, challenges,
applications, future directions. J Big Data 8:1–74
44. Amerini I, Ballan L, Caldelli R, Del Bimbo A, Serra G (2011) A sift-based forensic method
for copy–move attack detection and transformation recovery. IEEE Trans Inf Forensics Secur
6(3):1099–1110
45. Elaskily MA, Alkinani MH, Sedik A, Dessouky MM (2021) Deep learning based algorithm
(ConvLSTM) for copy move forgery detection. J Intell Fuzzy Syst 40(3):4385–4405
46. Goel N, Kaur S, Bala R (2021) Dual branch convolutional neural network for copy move
forgery detection. IET Image Proc 15(3):656–665
47. Elaskily MA, Elnemr HA, Dessouky MM, Faragallah OS (2019) Two stages object recognition
based copy-move forgery detection algorithm. Multimed Tools Appl 78:15353–15373
48. Doegar A, Dutta M, Gaurav K (2019) CNN based image forgery detection using pre-trained
alexnet model. Int J Comput Intell 2(1)
49. Agarwal R, Verma OP (2020) An efficient copy move forgery detection using deep learning
feature extraction and matching algorithm. Multimed Tools Appl 79(11–12):7355–7376
Banana Freshness Classification: A Deep
Learning Approach with VGG16
Abstract In the food sector, ensuring the safety and quality of banana products is
crucial. The classification of bananas into “fresh banana” and “rotten banana” cate-
gories is the main objective of this study. The studied banana varieties are cavendish,
lady fingers, and red bananas. We use the VGG16 deep convolutional neural network
with a dataset of high-resolution banana images. Our method includes careful
training–testing splits, image enhancement, and rigorous data preprocessing. The results indicate
how well the VGG16 model performs in classifying freshness, with good recall,
accuracy, precision, and F1-score. Additionally, the model successfully differenti-
ates between cavendish, lady finger, and red bananas, highlighting its capacity to
handle minute variations. This work expands the application of image classification
to other fruit varieties by offering a dependable technique for quality control and
automated evaluation of banana freshness.
1 Introduction
A significant section of the global population depends on bananas for essential nutri-
ents and energy because they are a fruit consumed widely worldwide. The degree to
which a banana is ripe greatly impacts its flavor, nutritional value, and overall appeal.
The accuracy with which fresh and damaged bananas may be distinguished is cru-
cial at every stage of the banana supply chain, from production and transportation
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 205
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_16
206 F. V. Patre et al.
2 Literature Review
In July 2023, Amin and associates [1] published an automated method for classi-
fying fruit freshness that focused on fresh and spoiled oranges, bananas, and apples.
Their research yielded high accuracy rates for these fruit groups of 98.2%, 99.8%,
and 99.3%, thanks to a well-organized dataset structure. However, the study had some
drawbacks, such as the small range of fruits it examined and the lack of an intuitive
smartphone application for real-world use. The authors recommended more research
on hyperparameters to strengthen the validity of their findings, as these variables may
offer insightful information for enhancing fruit freshness categorization algorithms.
In May 2023, Knott et al. [5] investigated machine learning methods for evaluating
fruit quality, particularly assessing apple flaws and estimating banana ripeness. When
compared to conventional CNN models, they showed competitive accuracy using pre-
trained Vision Transformers (ViTs). Their ViT-based method, which needed fewer
training samples to achieve 90% accuracy, was impressive. The study did identify
some possible downsides, though, such as worries about how well the ViTs would
function in challenging situations and the potential for preprocessing-induced image
distortion or loss of information.
Liang and associates’ [7] research aimed to forecast banana maturity by ana-
lyzing color and sweetness characteristics along three banana finger segments as
they ripened. They examined gene expression and enzyme activity in addition to
non-destructive analyses of sweetness and color. For classification, they used prin-
cipal component analysis and cluster plots; for prediction, they used support vector
machines, random forests, and artificial neural networks. With an emphasis on sweet-
ness and color, the study distinguished six unique maturation stages and created two
efficient maturity prediction algorithms.
Mamidi and associates [10] explore the critical field of automated fruit freshness
classification in the food industry. Recognizing the shortcomings of conventional
visual observation techniques, the research uses a varied dataset that includes fresh
and rotting oranges, bananas, and apples. Utilizing both traditional machine learning
techniques and sophisticated deep learning models such as Inception and Xception,
the study highlights the enhanced efficacy of deep learning in precisely categorizing
the freshness of the fruit. These models regularly outperform conventional methods
in comparison results, demonstrating the promise of convolutional neural networks
for accurate and effective automated fruit quality assessment.
Deep learning techniques were used in May 2022 by Raghavendra et al. [11] to
develop a dual-channel banana grading system for Taiwanese varieties. The system’s
98.4% accuracy and 0.97 F1-score for banana categorization, obtained by merging
RGB and hyperspectral sensors, proved the effectiveness of Convolutional Neural
Networks (CNN) and Multi-Layer Perceptrons (MLP). With RGB and hyperspec-
tral photos, the model 99% accurately predicted bananas’ size and perspective. The
dataset is limited to Taiwanese bananas, which may affect generalizability.
Fu et al. [3] concentrated on fruit freshness grading techniques, addressing six
fruit classes. The dataset was divided into training and validation sets, totaling about
4,000 photos. In addition to YOLO for the region of interest extraction, deep learning
models, including ResNet, VGG, GoogLeNet, and AlexNet, were used for freshness
grading. It was decided to combine CNN models with real-time object detection
hierarchically. In evaluating the freshness of fruit, the study found that deep learn-
ing algorithms performed exceptionally well. Nevertheless, it did not offer precise
numerical outcomes or measurements.
With an emphasis on apples, bananas, and oranges, Kazi et al. [4] used image
classification to evaluate the freshness of fruit in the food business. ResNet50 fared
better than AlexNet and VGG16 using transfer learning with CNN architectures,
while requiring fewer computations. The research identified several types of fruit
rot using a six-class dataset. The absence of generalization to other fruits with different ripen-
ing characteristics and a brief discussion of practical applications beyond banana
classification are among the limitations.
For the purpose of classifying banana ripeness stages, Saranya and Venkateswaran
[13] employed deep learning, more precisely a Convolutional Neural Network
(CNN). They trained their model using original and enhanced photos and compared it
to other models. Their method yielded 96.14% accuracy, allowing for faster training
with fewer parameters. Accuracy was further enhanced by adding more data.
Pardede et al. [9] created a fruit ripeness detection system utilizing VGG16 and
transfer learning in April 2021. They considerably outperformed conventional fea-
ture extraction techniques, achieving 90% accuracy. Nevertheless, the study did not
address the wider issues associated with implementing deep learning in practical
applications, nor did it investigate its applicability in other domains outside of fruit
ripeness detection.
A June 2023 study published in the “Academic Journal of Science and Technology”
determined fruit freshness using the YOLOv8 model with an SE attention mechanism
[15]. The model’s average accuracy of 87.8% on the 401 photos in the
“fruit_bad_dataset” indicates that it may be used for real-time processing in
the agricultural industry. However, the study admits several shortcomings, such as
a limited dataset and possible class disparities. Future studies might resolve these
problems to improve the model’s applicability.
Using Random Forest, Airej et al. [2] achieved 99% accuracy in classifying fruit
diseases with machine learning on a dataset of 13,587 photos of healthy and rotten
apples, bananas, and oranges. The study noted several difficulties, particularly when
using Hu Moments to identify bad oranges, and it disregarded environmental factors
that are essential for practical smart farming. Table 1 compares VGG16 with existing
models, listing the methodology and the freshness-classification accuracy obtained
by each model.
3 Methodology
The study used the Kaggle dataset “Fruits fresh and rotten for classification,” which
contains apples and oranges in addition to bananas, and concentrated on bananas.
Ladyfinger and red bananas were added to increase diversity, giving three banana
varieties for classification.
Banana Freshness Classification: A Deep Learning Approach with VGG16 209
An organized methodology split the main dataset into training and testing sets. Each
subset was then sorted into labeled folders so that images of fresh and rotten bananas
could be distinguished.
The dataset for the banana cultivars Cavendish, Red, and Ladyfinger is summarized
in Table 2, along with the number of fresh and rotten cases in the training and
testing sets. Cavendish bananas, for example, have 1,217 fresh and 1,678 rotten
examples in the training set, out of 3,967 Cavendish photos in total, some of which
are shown in Fig. 1.
The following preprocessing procedures were applied to every image in the training
and testing datasets to guarantee uniformity and consistency in our analysis:
1. Resizing: All images were standardized to 224 × 224 pixels, ensuring consistent
dimensions so the deep learning models could process and analyze them
successfully.
2. Color Channels: The images were kept in the standard RGB format (224,
224, 3) to help deep learning models analyze and extract information more
effectively [1, 4].
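As an illustration of these two steps, a hedged sketch (not the authors' pipeline; a real implementation would use a library resampler, but the 224 × 224 target matches VGG16's expected input):

```python
def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize of an image stored as a list of rows,
    each row a list of (R, G, B) tuples — shape (out_h, out_w, 3)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A toy 2x2 RGB "image" standardized to 224x224 while keeping RGB channels
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 0)]]
resized = resize_nearest(tiny)
```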
Image augmentation is used to boost the dataset's size and diversity and the model's
resilience; specific values are used for each transformation to improve the quality of
our banana freshness classification algorithm.
1. Rotation (Rot): Random rotation augmentation enhances the model's adaptability
to changes in banana orientation, with rotation angles (θ) sampled between −20°
and 20° for increased robustness.
3. Shear: Shearing transformations simulate the deformation that might take place
in real-world situations. To provide controlled variation, shear factors (s) were
sampled from −0.1 to 0.1.
4. Zoom: Random zooming allowed the bananas' scale to fluctuate, with zoom
factors (z) sampled between 0.9 and 1.1. This augmentation simulates differences
in the bananas' closeness and framing within the photos.

Zoomed Image = Zoom(Original Image, z)    (4)
5. Horizontal Flip (H_Flip) and Vertical Flip (V_Flip): Both vertical and horizontal
flips are used in augmentation to produce mirror images of the bananas, making
the dataset more diverse.
Applying exact augmentation parameters, informed by domain expertise and features
unique to the dataset, improved the model's capacity to assess banana freshness
reliably and increased the training dataset's effective size [1, 9, 13, 14].
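Sampling one set of augmentation parameters in the ranges listed above can be sketched as follows (the function name and the 50% flip probability are illustrative assumptions, not the authors' code):

```python
import random

def sample_augmentation(rng=random):
    """Draw one parameter set for the augmentations described above:
    rotation angle, shear factor, zoom factor, and the two flips."""
    return {
        "rotation_deg": rng.uniform(-20.0, 20.0),  # theta in [-20, 20] degrees
        "shear": rng.uniform(-0.1, 0.1),           # shear factor s
        "zoom": rng.uniform(0.9, 1.1),             # zoom factor z
        "h_flip": rng.random() < 0.5,              # assumed 50% flip chance
        "v_flip": rng.random() < 0.5,
    }

params = sample_augmentation()
```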
4 Experimental Work
In our research, we use RMSProp, a dynamic learning rate optimizer that adapts the
learning rate for each parameter. This enhances convergence speed and ensures
appropriate updates by scaling the learning rate with a moving average of squared
gradients, preventing the step size from diminishing too rapidly.
At the end of each training epoch, a ModelCheckpoint stores the model weights
whenever the validation accuracy surpasses the previous best. This protects the
best-performing model against data loss and interrupted training [9, 11].
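These two mechanisms can be sketched in plain Python (a single-parameter RMSProp update and a minimal best-weights checkpoint; the names and default hyperparameters are illustrative, not taken from the paper):

```python
import math

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSProp update for a scalar parameter: `cache` is the moving
    average of squared gradients, so the effective step shrinks for
    parameters whose recent gradients have been large."""
    cache = rho * cache + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache

class BestCheckpoint:
    """Keep the weights from the epoch with the highest validation accuracy."""
    def __init__(self):
        self.best_acc, self.best_weights = -1.0, None

    def update(self, weights, val_acc):
        if val_acc > self.best_acc:
            self.best_acc, self.best_weights = val_acc, weights

w, cache = rmsprop_step(w=1.0, grad=2.0, cache=0.0)

ckpt = BestCheckpoint()
for epoch_weights, acc in [("w_ep1", 0.90), ("w_ep2", 0.96), ("w_ep3", 0.94)]:
    ckpt.update(epoch_weights, acc)
```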
This section outlines the detailed results of our VGG16 model-based research
on banana freshness categorization, including accomplishments, insights, and
performance data.
Accuracy and Precision: The VGG16 model performed remarkably well, achieving
99% classification accuracy in differentiating between “fresh bananas” and “rotten
bananas.” Its exceptional precision underscores its dependability for real-world
applications, especially for detecting positive cases with few false positives.
Recall and F1-Score: The VGG16 model consistently performs well in minimiz-
ing false positives and false negatives, ensuring the correct classification of banana
freshness. Its strong recall scores highlight its accuracy in identifying positive occur-
rences, especially fresh bananas. Meanwhile, the balanced F1-score highlights its
effectiveness.
• True Positives (TP): the number of “fresh banana” photos correctly classified.
• False Positives (FP): the number of “rotten banana” photos wrongly labeled as
“fresh banana.”
• False Negatives (FN): the number of “fresh banana” photos wrongly labeled as
“rotten banana.”
• True Negatives (TN): the number of “rotten banana” photos correctly classified.
The VGG16 model has remarkable accuracy, convergence, and stability, as seen in
the graph presented in Fig. 3. To avoid overfitting, it is advisable to stop training
early, after about 10 epochs. Accounting for the dataset's characteristics and
correcting class imbalances would further improve model evaluation.
Despite some overfitting, evidenced by near-perfect training accuracy and lower
validation accuracy, the VGG16 model generalizes well, with a noteworthy test
accuracy of 99.6%.
Figure 4 shows the learning process of the VGG16 model for image categorization.
The training loss (blue) drops dramatically at first, indicating rapid learning, but
plateaus around epoch 5, suggesting little further learning. The validation loss
(orange) exhibits a similar pattern but plateaus later and stays higher, indicating an
overfitting problem: the model memorizes the training data while struggling to
generalize. Despite this, the 99.6% test accuracy suggests successful real-world
adaptation.
Fig. 3 The graph demonstrates the training and validation accuracy of our VGG16 model over
epochs
Fig. 4 The graph illustrates the training and validation loss of our VGG16 model over epochs
In Fig. 5, the “fresh banana” and “rotten banana” confusion matrix illustrates the
classifier's performance. The model is very good at recognizing healthy bananas: it
correctly identified 384 “fresh banana” cases and misclassified only 2 as “rotten
banana.” Similarly, it misclassified only 1 banana as “fresh banana” while accurately
classifying 533 rotten ones. The model performs well overall, with an accuracy of
917/920 (about 99.7%).
We present the classification report, which thoroughly evaluates our model's
performance. The binary classification model performs exceptionally well in
differentiating between “fresh banana” and “rotten banana,” attaining an F1-score,
recall, and precision that all round to 1.00. Its overall accuracy of 1.00, shown in
Table 3, reflects its remarkable effectiveness in correctly identifying occurrences of
both classes.
This study uses the VGG16 deep learning model, which achieves a 99.6% accuracy
rate in classifying the freshness of bananas, including Cavendish, Lady Finger, and
Red Banana types. The model works well, consistently achieving precision, recall,
and accuracy above 99%. The VGG16 model is a robust tool for automated banana
freshness evaluation, effectively differentiating between pictures of “fresh bananas”
and “rotten bananas.” This study offers a practical solution to ensure consumers have
a consistent supply of fresh fruit, with implications for reduced waste and greater
output in the food business.
Future research could examine the scalability of real-time applications, handle larger
datasets, and diversify the dataset with additional fruit species, opening up new areas
of investigation for this study. Investigating cutting-edge techniques like object
detection may reveal more about differences in fruit maturity. By offering more
accurate tools for food processing and quality inspection, these future projects have
the potential to advance deep learning applications in agriculture.
References
1. Amin U, Shahzad MI, Shahzad A, Shahzad M, Khan U, Mahmood Z (2023) Automatic fruits
freshness classification using CNN and transfer learning. Appl Sci 13(14):8087
2. Airej AE, Hasnaoui ML, Benlachmi Y (2022) Fruits disease classification using machine
learning techniques. Indones J Electr Eng Inform 12(3)
3. Fu Y, Nguyen M, Yan WQ (2022) Grading methods for fruit freshness based on deep learning.
SN Comput Sci 3(4)
4. Kazi A, Panda SP (2022) Determining the freshness of fruits in the food industry by image
classification using transfer learning. Multimed Tools Appl 81(6):7611–7624
5. Knott M, Perez-Cruz F, Defraeye T (2023) Facilitated machine learning for image-based fruit
quality assessment. J Food Eng 345:111401
6. Lee Y, Kim J (2023) Psi analysis of adversarial-attacked DCNN models 13(17)
7. Liang C, Cui Y, Du H, Liu H, Ma L, Zhu L, Yu Y, Lu C, Benjakul S, Brennan C, Brennan
MA (2022) Prediction of banana maturity based on the sweetness and color values of different
segments during ripening 5(17)
8. Nikhitha M, Roopa Sri S, Uma Maheswari B (2019) Fruit recognition and grade of disease
detection using inception v3 model, pp 1040–1043
9. Pardede J, Sitohang B, Akbar S, Khodra ML (2021) Implementation of transfer learning using
VGG16 on fruit ripeness detection. Int J Intell Syst Appl 13(2):52–61
10. Mamidi SSR, Munaganuri CA, Gollapalli T, Aditya ATVS, Rajesh CB (2022) Implementation
of machine learning algorithms to identify freshness of fruits, pp 1395–1399
11. Raghavendra S, Ganguli S, Selvan PT, Nayak MM, Chaudhury S, Espina RU, Ofori I (2022)
Deep learning based dual channel banana grading system using convolution neural network 5
12. Saranya G, Venkateswaran H (2022) Detection and classification of brain tumor on MR imaging
using deep neural network based VGG-19 architecture. Periodico di Mineralogia 19:672–683
13. Saranya N, Srinivasan K, Kumar SKP (2021) Banana ripeness stage identification: a deep
learning approach. J Ambient Intell Humaniz Comput 13(8):4033–4039
14. Valentino F, Wawan T, Cenggoro G, Elwirehardja N, Pardamean B (2023) Energy-efficient
deep learning model for fruit freshness detection. IAES Intl J Artif Intell (IJ-AI) 12(3):1386
15. Wei Z, Chang M, Zhong Y (2023) Fruit freshness detection based on YOLOv8 and SE attention
mechanism. Acad J Sci Technol 6:195–197
GreenHarvest: Data-Driven Crop Yield
Prediction and Eco-Friendly Fertilizer
Guidance for Sustainable Agriculture
1 Introduction
Modern agriculture relies on crop yield forecasts and fertilizer advice to maximize
crop output, resource allocation, and environmental sustainability. Crop yield
prediction estimates a particular crop's expected yield from variables such as weather
patterns, soil properties, crop genetics, and previous yields. Farmers can
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 219
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_17
220 S. P. Shetty and M. Shetty
forecast the performance of their crops and take proactive measures for irrigation,
pest control, harvesting, and marketing by knowing the anticipated output. Farmers
can maximize resource allocation and prepare for market demands with the help of
accurate yield forecasts. An essential component of crop nutrition management is
fertilizer advice. It entails choosing the right type, quantity, and timing of fertilizer
application to meet the crop’s nutrient needs.
The system forecasts crop yield and suggests fertilizers based on location, soil type,
season, and region. This advice depends on several variables, including soil nutrient
content, crop nutrient uptake, costs, fertilizer imbalances, and environmental
concerns. With customized fertilizer recommendations, farmers can increase nutrient
efficiency, limit nutrient losses, and lessen their environmental impact. In light of
this, we propose an intelligent system that evaluates soil attributes such as soil type,
irrigation, yields, humidity, and nutrient concentration, as well as weather factors
such as temperature and rainfall, before advising the user on the best crop to plant.
In addition, a fertilizer recommendation based on the ideal nutrients for the cultivated
crop is also made.
The primary goal is to provide comprehensive knowledge of crops from the time of
cultivation, including how to employ various fertilizers at various phases of crop
development to guard against illness. In this study, crop prediction is done using
machine learning algorithms based on soil type, rainfall, and meteorological
conditions, and fertilizer for good crop yield is likewise forecast using ML
algorithms. This paper presents a mechanism that suggests the best crops and the
fertilizer needed to enrich soil nutrients and hence increase crop production. It also
helps future researchers choose an efficient algorithm to obtain accurate results.
2 Literature Survey
of rice and wheat. If the farmers are interested in another crop, they can’t use this
model.
In Islam et al. [6] and Shah et al. [11], the authors have developed the model for
Bangladesh and the United States, respectively. Hence, the model is not suited for
Indian farmers’ crop yield prediction.
The XCYPF framework is proposed by Manjula et al. [8]. This model supports
precision agriculture by considering rainfall data and surface temperature, and it
helps predict the yield of crops such as sugarcane and rice.
Paul et al. [10] predict the crop based on a soil dataset but do not offer fertilizer
recommendations.
In Varghese et al. [12], the authors use sensors to collect ground data and machine
learning models for real-time analysis to predict the future condition of crops. Like
the other models, it does not focus on recommendations.
In all the above papers, the data considered for the experiments come from online
models or other countries. Hence, Indian farmers cannot depend on these models for
prediction, as soil conditions and crop yields change from one location to another.
Therefore, this experiment uses real-time data from our country for crop yield
prediction, and the model also gives suggestions for agriculture, which makes the
work unique.
3 Problem Definition
Green Harvest addresses the problem of low agricultural yields caused by applying
the wrong fertilizers or the wrong amounts of fertilizer. Farmers frequently struggle
to select the best fertilizers and to decide how much to apply to their crops. A crop
yield forecast and fertilizer recommendation system built with machine learning
algorithms lets farmers decide what fertilizers to use, and in what quantity, based on
criteria such as location, soil type, season, and area. Ultimately, this may result in
greater agricultural yields and financial gain for farmers.
Many people work in agriculture but lack the knowledge to know which crops
would grow best in their soil. This means that some crops can only grow in moist soil,
while others need soil with a medium humidity level to flourish. However, neither
farmers nor those just getting into farming are likely to be aware of this.
A system for “crop yield prediction and fertilizer recommendation” seeks to forecast
the productivity of a particular crop and recommend the right fertilizer application
rates to maximize production. Solving this problem requires analyzing the many
elements that affect crop output, such as weather patterns, soil properties, historical
crop data, and agronomic methods. Crop production forecasting frequently makes
use of statistical models and machine learning methods.
Agriculture is one of the most significant occupations in India, yet farmers and
newcomers interested in farming currently have relatively few tools and technologies
at their disposal that can assist them in improving quality.
4 Objectives
A crop yield prediction and fertilizer recommendation system aims to give farmers
and other agricultural stakeholders accurate and timely information to maximize crop
production. The method aims to predict a crop’s expected yield based on several
factors, such as weather, soil conditions, historical data, and agronomic practices.
1. Enhanced Decision-Making:
The technology helps farmers decide on crop management tactics, resource alloca-
tion, and market planning by offering precise production projections and fertilizer
suggestions.
2. Increased Crop Productivity:
The method optimizes fertilizer application depending on crop nutrient requirements
and soil conditions to maximize crop yields. This makes it possible to guarantee that
crops receive the necessary nutrients for strong growth and development.
3. Resource Efficiency:
The method aids farmers in maximizing the use of fertilizers by advising exact
fertilizer applications, preventing excessive use, and cutting expenses. Additionally,
it encourages environmentally friendly farming methods by reducing fertilizer runoff
and pollution.
4. Risk Mitigation:
By allowing farmers to foresee future yield losses or surpluses, accurate crop yield
estimates help them manage production risks. Farmers can take preventive actions
like changing planting dates or diversifying their crop portfolio with this knowledge.
5. Economic Viability:
The system tries to improve the economic viability of farming operations by maxi-
mizing crop yields and resource utilization. Lowering input costs, eliminating crop
losses, and spotting chances for market optimization help farmers increase their
profitability.
5 Methodology
A crucial component of every machine learning system is data. Data collected at the
district level were important since local climates vary. Historical information about
the crops and climate of a certain area was required to deploy the system.
The training dataset’s correctness and parameter count influence any machine
learning algorithm’s accuracy. In this study, the datasets gathered from the statis-
tical department in Mangalore were analyzed, and the factors that would produce
the best results were carefully chosen. Several studies in this sector have used yield
as a significant component, along with location, soil type, season, and area, to fore-
cast agricultural sustainability using environmental data. The data flow diagram is
depicted in Fig. 1. In the data flow diagram, the model accepts input of location, soil
type, season, and area of agriculture and performs three tasks. They are:
1. It classifies soil using the Random Forest, Support Vector Machine, and
K-nearest neighbor algorithms.
2. It predicts the crop yield.
3. It recommends fertilizer.
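The data flow of these three tasks can be sketched as a single function; every name below (the stub models, the fertilizer lookup table, the yield numbers) is an illustrative assumption, not the paper's implementation:

```python
def green_harvest(location, soil_type, season, area,
                  soil_clf, yield_model, fert_table):
    """Sketch of the three tasks: classify soil, predict yield, recommend
    fertilizer. The three model arguments are hypothetical stand-ins for
    the trained components (e.g. KNN/SVM/Random Forest for soil)."""
    features = {"location": location, "soil_type": soil_type,
                "season": season, "area": area}
    soil_class = soil_clf(features)                 # task 1: classify soil
    crop_yield = yield_model(features, soil_class)  # task 2: predict yield
    fertilizer = fert_table.get((soil_class, season), "soil test advised")
    return soil_class, crop_yield, fertilizer       # task 3: recommend

# Toy stand-ins for the trained components
result = green_harvest(
    "Mangalore", "loamy", "kharif", 2.0,
    soil_clf=lambda f: "loamy",
    yield_model=lambda f, s: 3.2 * f["area"],   # tonnes, illustrative
    fert_table={("loamy", "kharif"): "NPK 10-26-26"},
)
```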
These steps are involved in the crop yield prediction and fertilizer recommendation
system.
The results of crop yield prediction and fertilizer recommendation depend on the
specific machine learning models and techniques used and on the quality of the input
data. The results demonstrate that machine learning can be an effective tool for
predicting crop yield and recommending fertilizers, with high accuracy achievable
given the right techniques and data. It is important to remember that the system's
accuracy can vary with the crop type, the area, and other factors affecting crop
productivity.
Three popular algorithms are used for classification in Green Harvest: K-nearest
neighbor, Support Vector Machine, and Random Forest.
K-Nearest Neighbor (KNN)
It is one of the most widely used supervised learning classifiers. KNN works by
identifying the nearest neighbors, with the distance between points calculated using
Euclidean distance (E.D.). This algorithm helps classify the crop based on season,
location, and soil type.
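A minimal KNN along these lines might look as follows; the numeric encoding of season and soil and the crop labels are hypothetical stand-ins for the paper's actual features:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k nearest training samples, with nearness
    measured by Euclidean distance. `train` holds (features, label) pairs."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical encoded samples: (season_code, soil_code) -> crop label
train = [((0.0, 0.0), "rice"), ((0.0, 1.0), "rice"),
         ((5.0, 5.0), "wheat"), ((5.0, 6.0), "wheat")]
pred = knn_predict(train, (0.5, 0.5), k=3)
```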
Support Vector Machine (SVM)
It is another widely used supervised machine learning algorithm. The SVM algorithm
helps to find the optimal hyperplane in an N-dimensional space.
Random Forest
The random forest classifier considers only a random subset of the features at each
split. Trees can additionally be built using random thresholds for each feature, rather
than searching for the best possible thresholds (as a normal decision tree does).
6 Results and Accuracy
All the algorithms are based on supervised learning. Our overall system is divided
into two modules:
• Crop Recommender
• Fertilizer Recommender/Suggestion
Table 1 gives the accuracy of Green Harvest for different algorithms. KNN gives
72% accuracy, which is lower than the Support Vector Machine and Random Forest.
Random Forest has the best accuracy, 97%, compared with KNN and SVM. The
accuracy graph is depicted in Fig. 2.
From the results, it is noted that Random Forest is the most suitable of these
algorithms for prediction, and farmers are recommended to use the Green Harvest
model to increase their crop production.
7 Conclusion
Green Harvest is a useful model that can assist farmers in maximizing crop output
and in making better use of fertilizer. In this study, crop production is predicted and
fertilizer recommendations are made for certain crops using machine learning models
and techniques, including Random Forest, KNN, and Support Vector Machines
(SVM). The system's accuracy is determined by the particular machine learning
models and methodologies employed, as well as the quality of the input data. The
results identify Random Forest as giving better accuracy than KNN and SVM.
Moreover, the high accuracy rates noted in numerous studies imply that machine
learning can be useful for estimating crop production and suggesting fertilizers.
Overall, the model has the potential to significantly improve crop yield and reduce
the environmental impact of farming by minimizing the use of fertilizers. This system
helps the farmer choose the right crop by providing insights that ordinary farmers
don’t keep track of, thereby decreasing the chances of crop failure and increasing
productivity. It also prevents them from incurring losses. The system can be extended
to the web and accessed by millions of farmers nationwide.
References
1. Radhika, Narendiran (2018) Kind of crops and small plants prediction using IoT with machine
learning. Int J Comput & Math Sci
2. Anguraj K, Thiyaneswaran B, Megashree G, Preetha Shri JG, Navya S, Jayanthi J (2020)
Crop recommendation on analyzing soil using machine learning
3. Bang S, Bishnoi R, Chauhan AS, Dixit AK, Chawla I (2019) Fuzzy logic based crop yield
prediction using temperature and rainfall parameters
4. Gandge Y (2017) A study on various data mining techniques for crop yield prediction. In:
2017 International Conference on Electrical, Electronics, Communication, Computer, and
Optimization Techniques (ICEECCOT), pp 420–423. IEEE
5. Gandhi N, Petkar O, Armstrong LJ (2016) Rice crop yield prediction using artificial
neural networks. In 2016 IEEE Technological Innovations in ICT for Agriculture and Rural
Development (TIAR) (pp.105–110). IEEE
6. Islam T, Chisty TA, Chakrabarty A (2018) A deep neural network approach for crop selection
and yield prediction in Bangladesh, In 2018 IEEE
7. Kadir MKA, Ayob MZ, Miniappan N (2014) Wheat yield prediction: Artificial neural
network- based approach. In 2014 4th International Conference on Engineering Technology
and Technopreneurs (ICE2T) (pp. 161–165). IEEE
8. Manjula A, Narsimha G (2015) XCYPF: A flexible and extensible framework for agricultural
Crop Yield Prediction. In 2015 IEEE 9th International Conference on Intelligent Systems and
Control (ISCO) (pp. 1–5). IEEE
9. Mariappan AK, Das JAB (2017) A paradigm for rice yield prediction in Tamil Nadu. In 2017
IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR) (pp. 18-
21). IEEE
10. Paul M, Vishwakarma SK, Verma A (2015) Analysis of soil behavior and prediction of crop
yield using the data mining approach. In 2015 International Conference on Computational
Intelligence and Communication Networks (CICN) (pp. 766–771). IEEE
11. Shah A, Dubey A, Hemnani V, Gala D, Kalbande R (2018) Smart farming system: crop yield
prediction using regression
12. Varghese R, Sharma S (2018) In affordable smart farming using IoT and machine learning.
In: 2018 Second International Conference on Intelligent Computing and Control Systems
(ICICCS)
Real-Time Deep Learning Based Image
Compression Techniques: Review
Abstract The emergence of deep learning techniques has solved many image
processing problems that traditional methods could not. It has provided pioneering
solutions, especially in image compression, for the urgent needs of storage and
transmission, making image compression extremely important. This paper reviews
modern image compression techniques based on neural networks and deep learning
methods. These networks have shown promising results in complex cognitive tasks,
providing high compression ratios while maintaining visual image quality. However,
this field still lacks exploration and testing to evaluate the effectiveness of deep
learning across different types of images, especially medical images, which have
their own challenges and requirements. In this article, we begin with an overview of
the basics of image compression and a brief introduction to the types of networks
based on deep learning, then give a comprehensive summary of previous literature,
and finally discuss prospects for deep-learning-based image compression methods.
A. A. Abdulredah (B)
National School of Electronics and Telecoms of Sfax, University of Sfax, Sfax, Tunisia
e-mail: aliatshan981969@gmail.com
M. Kherallah · F. Charfi
Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
A. A. Abdulredah
College of Computer Science and Information Technology, University of Sumer, Thi Qar, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 229
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_18
230 A. A. Abdulredah et al.
1 Introduction
2 Image Compression
Compression plays a significant role in digitally storing vast numbers of images as
the use of cloud computing grows [8]. Improving image processing quality while
minimizing the space needed to store images has become a critical everyday
necessity [9, 10].
Real-Time Deep Learning Based Image Compression Techniques: Review 231
Three distinct forms of redundancy exist in a graphical image because of the human
eye's unequal sensitivity to different visual information. Most image compression
models adopt the scheme shown in Fig. 1, where the original image is denoted
I(m, n) and the compressed one I'(m, n) [11].
Lossy compression focuses primarily on the image file size, which is significantly
reduced after compression while the image quality diminishes relative to the original.
Lossy compression trades some fidelity for the requirements of a specific
transmission and storage setting [12].
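The quality loss of a lossy codec is commonly quantified by the peak signal-to-noise ratio (PSNR) between the original I(m, n) and the compressed I'(m, n); a minimal sketch for 8-bit images:

```python
import math

def psnr(original, compressed, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 8-bit
    images given as flat lists of pixel values: 10*log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Every pixel off by exactly 1 -> MSE = 1 -> PSNR = 10*log10(255^2) dB
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```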
3 Deep Learning
In ML research, deep learning (DL) uses theoretical models such as deep neural
networks with multiple layers of nonlinear units [16]. DL efficiently handles
multi-level data and performs well. DL models, especially those with convolutional
architectures and attention mechanisms, compress images better at lower bit rates
[17]. Further DL model improvements should increase image compression efficiency
and quality [18, 19].
3.1 Auto-Encoder
The auto-encoder (AE) is a type of artificial neural network that learns the link
between input data and its bottleneck representation in a lower-dimensional latent
space. The network's feature extraction reduces data dimensionality and eliminates
unnecessary information. Afterwards, performance metrics such as PSNR and SSIM
are used to evaluate the reconstructed image [20, 21].
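The bottleneck idea can be demonstrated with a toy linear auto-encoder trained by plain gradient descent (a sketch of the mechanism only; the auto-encoders surveyed here are deep, convolutional, and far more capable):

```python
import random

random.seed(0)
# Toy 2-D "images": the second coordinate nearly duplicates the first,
# the kind of redundancy a bottleneck representation can exploit.
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
data = [(x, x + random.gauss(0.0, 0.05)) for x in xs]

w1 = w2 = 0.1       # encoder: z = w1*a + w2*b  (1-D latent code)
v1 = v2 = 0.1       # decoder: reconstruction (v1*z, v2*z)
lr, losses = 0.05, []

for _ in range(300):
    gw1 = gw2 = gv1 = gv2 = loss = 0.0
    for a, b in data:
        z = w1 * a + w2 * b                 # encode to the bottleneck
        e1, e2 = v1 * z - a, v2 * z - b     # reconstruction errors
        loss += e1 * e1 + e2 * e2
        gv1 += 2 * e1 * z                   # decoder gradients
        gv2 += 2 * e2 * z
        dz = 2 * (e1 * v1 + e2 * v2)        # back-propagate through decoder
        gw1 += dz * a                       # encoder gradients
        gw2 += dz * b
    n = len(data)
    w1 -= lr * gw1 / n; w2 -= lr * gw2 / n
    v1 -= lr * gv1 / n; v2 -= lr * gv2 / n
    losses.append(loss / n)
```

After training, the mean squared reconstruction error drops far below its initial value, because the 1-D code captures the shared direction of the two coordinates.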
3.2 Recurrent Neural Networks
Recurrent neural networks (RNNs) are networks of neurons that remember the past
and use that information to predict the future. They have found use in many fields,
including machine translation, image compression, and facial recognition [22].
4 Literature Review
Over the past few years, DL has been widely acknowledged as one of the most
effective technologies for managing enormous datasets. The idea of deep learning
can be applied with complex ANNs. This section summarizes previous literature on
implementing DL to achieve the best image compression with high quality and
reduced storage requirements. Table 1 summarizes the literature survey presented in
this section, including the dataset, methodology, PSNR value, SSIM value, and the
findings of previous work.
Toderici et al. [25] showed the use of an RNN with entropy coding for lossy image
compression, achieving a PSNR of 33.59 dB, an SSIM of 0.8933, and a multiscale
structural similarity (MS-SSIM) of 0.9877. Furthermore, Chen et al. [26] proposed
DPW-SDNet, a dual pixel-wavelet domain deep convolutional neural network
(CNN), to improve the visual fidelity of photographs compressed with the JPEG
algorithm. The pixel-domain network reduces blocking and ringing artifacts,
whereas the wavelet-domain network restores high-frequency
Table 1 (continued)

Ref. | Dataset | Methodology | PSNR | SSIM | Result
Li et al. [37] | Kodak and Tecnick | DNN and entropy encoding | 31.01 | 0.978 | The proposed CCNs demonstrated their entropy modeling ability in lossless and lossy image compression
Guo et al. [36] | Kodak | CNNs and an objective function that combines a PatchGAN adversarial discriminator | – | 0.985 | Achieved a compression ratio of 80%
Zheng et al. [35] | DIV2K, LIVE1 | Hybrid DNNs with DCT | 34.51 | 0.922 | Potential for practical applications in compression artifact reduction
Mishra et al. [34] | ImageNet/Kodak | Wavelet transform-based compression-decompression algorithm | 28.8 | 0.82 | Improved artifact reduction method for low bit rate compression
Li et al. [33] | ImageNet | Hybrid SPIHT-like algorithm and arithmetic coding with DNN | 28.01 | – | Neural networks enhanced the likelihood of the adaptive arithmetic coding estimates
Akyazi et al. [32] | CLIC2019 | Hybrid Haar wavelet technique with DNN | 31.25 | 0.983 | Reduced blurring and blocking artifacts, and preserved various details of the images at low bit rates
Cheng et al. [31] | Kodak | Hybrid principal component analysis (PCA) with CAE (quantization and entropy coder) | 42.45 | 0.98 | Achieves a BD-rate reduction of 13.7% compared to JPEG2000
Peixoto et al. [30] | Kodak, CLIC | Block-based multi-mode intra-prediction with deep learning (DL) | 33.4 | 0.92 | Bit rate reduced by 28% compared to the baseline codec
Mentzer et al. [29] | ImageNet, Kodak, Urban100 | Combined entropy coding with DNN for training the AE | – | 0.982 | Outperformed BPG, JPEG, and JPEG2000; also reduced the rate by 10% for the context model
Hu et al. [28] | CLIC2018, BSDS200 | SVAC2 encoder and CNN to encode the YUV-420 image | 30.84 | – | SVAC2 with CNN outperforms JPEG, JPEG2000, and WebP; applicable at low data rates
Minnen et al. [27] | Kodak | Hybrid DNN with quality-sensitive bit rate adaptation | 30.418 | – | The bit rate for both quantitative and subjective image quality improved
Chen et al. [26] | BSDS500/LIVE1 | Network comprising pixel-domain and wavelet-domain networks | 29.53 | 0.821 | Soft decoding of JPEG images using wavelet transform and DNN; superior to JPEG at low bit rates
236 A. A. Abdulredah et al.
Table 1 (continued)

Ref. | Dataset | Methodology | PSNR | SSIM | Result
Toderici et al. [25] | 32 × 32 Kodak | Hybrid RNN with entropy coding | 33.59 | 0.893 | For lossy image compression; MS-SSIM of 0.9877
Nagarsenker et al. [43] | 50 images randomly selected online | Compact CNN (ComCNN) and reconstruction CNN (RecCNN) with MS-ROI (Multi-Structure Region of Interest) | 38.45 | 0.960 | Performance at quality factor (QF) = 5
Kamisli et al. [42] | CLIC and Mobile datasets | Auto-encoder neural networks and CNNs | 28.59 | – | A bit rate of 0.223 bpp
Krishnaraj et al. [41] | Underwater Wireless Sensor Network (UWSN) | DWT used as an image codec after the CNN process to minimize the input image size | 53.961 | – | The average space saving is 79.7038%
Sujitha et al. [40] | SIPI image dataset | CNN with sequential LZMA to encode the compressed representation | 49.90 | – | Outcomes compared in terms of compression performance, reconstructed image quality, and structural similarity at 89.38%
Liu et al. [39] | MCL-JCI | CNN and JPEG coder combined | 35.41 | 0.82 | The model achieved a lossy/lossless classification accuracy of 92%
Hoang et al. [38] | ADE20K, Kodak | Hybrid semantic segmentation network; a CNN was also used for the semantic segment extractor | 33.57 | 0.977 | Achieved 35.31% BD-rate reduction over the HEVC-based (BPG) codec, 5% bit rate, and 24% encoding time saving
Mentzer et al. [29] addressed the trade-off between the rate and distortion of image
compression autoencoders (AEs), building upon other studies conducted in this
domain. The idea of this approach is to utilize a three-dimensional convolutional
neural network (3D-CNN) structure to train a conditional probability model of the
latent distribution of the autoencoder (AE). This model aims to measure the entropy
of the latent representation accurately. The AE utilized the context model to assess
its entropy throughout the training procedure. Additionally, the context model was
adjusted in parallel to learn the dependencies among the latent symbols.
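The quantity such a context model estimates is the expected code length of the latent symbols; the following toy sketch (hypothetical symbol alphabet and a fixed probability table standing in for the learned 3D-CNN model) shows the cross-entropy computation:

```python
import numpy as np

def code_length_bits(symbols, probs):
    """Total ideal code length: -sum log2 P(symbol), the quantity an
    entropy model such as a learned context model estimates."""
    p = np.array([probs[s] for s in symbols])
    return float(-np.sum(np.log2(p)))

# Hypothetical latent alphabet with a skewed distribution
probs = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
latent = [0, 0, 1, 2, 0, 3, 1, 0]
print(code_length_bits(latent, probs))  # 14.0 bits for 8 symbols
```

Minimizing this code length jointly with a distortion term is exactly the rate-distortion trade-off the paragraph above describes.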
Peixoto et al. [30] utilized two prediction modes based on convolutional neural
networks (CNNs), together with all intra-prediction modes from a well-established
video coding standard.
Their objective was to develop a novel intra-image prediction model. The study
also considered bitstream allocation schemes that were applied only when the
reconstruction error was significantly reduced. A lossy image compression scheme
was developed by Cheng et al. [31]. They addressed lossy image compression,
intending to design an architecture that achieves high coding efficiency through
convolutional autoencoders (CAEs). The proposed methodology replaces the
conventional transforms with a CAE framework, which is subsequently trained using
a rate-distortion loss function. Moreover, the proposed approach uses principal
component analysis (PCA) to produce an energy-compact representation of the
feature maps, which leads to enhanced coding efficiency. Experimental results show
that the proposed approach performs better than traditional image coding
algorithms. Specifically, it achieves a BD-rate reduction of 13.7% compared to
JPEG2000 on images from the Kodak database.
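The role PCA plays here, concentrating feature-map energy into a few components so they are cheaper to entropy-code, can be illustrated with a generic NumPy sketch on synthetic low-rank "feature maps" (not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical correlated "feature maps": 64 vectors of dimension 16,
# built from a rank-4 signal plus a little noise
base = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 16))
fmaps = base + 0.01 * rng.normal(size=(64, 16))

# PCA via SVD of the centered data
centered = fmaps - fmaps.mean(axis=0)
_, s, _ = np.linalg.svd(centered, full_matrices=False)
energy = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component

# Energy is compacted into the leading components, which is what makes
# the PCA-rotated representation cheaper to quantize and entropy-code
print(np.sum(energy[:4]) > 0.99)  # True: 4 components carry >99%
```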
Akyazi et al. [32] proposed two end-to-end CNN-based image compression
architectures in which a two-dimensional wavelet transform is applied as a
preprocessing step before training, so that the networks compress wavelet
coefficients rather than pixels. Training is carried out using regularization in the
loss function, and several models with different operating rates are produced. Li
et al. [33] presented a two-stage sub-band coding system for wavelet coefficients
obtained from CNN-based filter-bank analysis, using a set partitioning in
hierarchical trees (SPIHT)-like algorithm followed by primitive adaptive arithmetic
coding (AAC). The SPIHT-like algorithm extended the spatial orientation tree to
exploit the inter-sub-band dependence across sub-bands and directions of various
sizes. For information-theoretic analysis, mutual information was computed to
characterize these dependencies. Different primitives were designed to encode the
generated bit stream by adapting its multiple lists and passes. Neural networks
enhanced the likelihood estimates of the AAC, where nonlinear estimates were based
on scale, orientation, position, and coefficient-significance contexts.
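The adaptive part of AAC, re-estimating symbol probabilities from running counts while coding, can be sketched with a simple Laplace (add-one) estimator; this toy version uses a hypothetical alphabet and omits the context modeling and the arithmetic coder itself:

```python
from collections import Counter
from math import log2

def adaptive_code_length(symbols, alphabet_size):
    """Ideal code length of a stream under a Laplace (add-one) adaptive
    model, the basic mechanism behind adaptive arithmetic coding."""
    counts = Counter()
    total_bits = 0.0
    for i, s in enumerate(symbols):
        p = (counts[s] + 1) / (i + alphabet_size)  # adaptive estimate
        total_bits += -log2(p)
        counts[s] += 1  # update the model after coding the symbol
    return total_bits

skewed = [0] * 30 + [1] * 2  # a heavily skewed binary stream
print(adaptive_code_length(skewed, 2) < len(skewed))  # True: <1 bit/symbol
```

The learned estimates in [33] play the same role as `p` above, but are produced by a neural network conditioned on context rather than raw counts.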
Recently, Mishra et al. [34] proposed a wavelet transform-based compression-
decompression algorithm incorporating high- and low-frequency components. The
inclusion of high frequencies helped preserve fine details of the image, such
as boundaries and edges, and significantly reduced blocking artifacts. Based on the
results, the algorithm's performance exceeded JPEG, JPEG2000, and other advanced
artifact reduction techniques. Zheng et al. [35] proposed an implicit dual-domain
convolutional network (IDCN) that can handle color images with discrete cosine
transform (DCT)-domain priors, together with a flexible version (IDCN-f) that
handles a wide range of compression qualities, in order to reduce the effects of
image compression in color images. IDCN uses a new dual-domain correction unit
(DCU) built on an extractor framework, while IDCN-f performs well across various
compression qualities. The proposed models show great potential for practical
applications in compression artifact reduction.
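Several of the surveyed pipelines ([32]-[34], [41]) place a wavelet transform in front of the network; a one-level 2-D Haar decomposition, the simplest such transform in its unnormalized average/difference form, can be sketched in NumPy (even-sized hypothetical input assumed):

```python
import numpy as np

def haar2d_level1(img):
    """One-level 2-D Haar decomposition into LL, LH, HL, HH sub-bands.
    Assumes even height and width; uses average/difference filters."""
    a = img.astype(np.float64)
    # rows: average / difference of horizontal pixel pairs
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # columns: the same filtering applied to the row-filtered outputs
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=np.float64).reshape(4, 4)  # hypothetical 4x4 image
ll, lh, hl, hh = haar2d_level1(img)
print(ll.shape)         # (2, 2): quarter-size approximation band
print(float(ll[0, 0]))  # 2.5: mean of the top-left 2x2 block
```

The LL band carries most of the energy (the coarse image), while LH/HL/HH carry the high-frequency detail that methods like [34] inject back to preserve edges.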
In a separate report, Guo et al. [36] presented a medical image compression system
for retinal optical coherence tomography (OCT). The goal is to achieve high
compression ratios while maintaining precise structural characteristics. The
proposed framework uses convolutional neural networks (CNNs) with data
preprocessing, skip connections, and an objective function that combines a
PatchGAN adversarial discriminator with an MS-SSIM penalty.
The results showed that the proposed method is superior to other compression
schemes in terms of similarity index and visual inspection. Li et al. [37] introduced
context-based convolutional networks (CCNs) to estimate the probabilistic structure
of natural images. Binary masks are incorporated into the CCN convolution filters,
and parallel entropy decoding is enabled through 3D code dividing and zigzag
scanning. The CCN entropy model and the analysis/synthesis transformations were
jointly optimized for rate-distortion efficiency. To quantify entropy, the CCN
directly computes the Bernoulli distribution for each binary symbol, so CCNs can
model the entropy of both lossless and lossy image compression. Hoang et al. [38]
developed a novel multilayer image compression scheme that uses the same semantic
segmentation for the encoder and the decoder. Both applied the semantic
segmentation network to the sampled image. To improve quality, the nonlinear map
of the extracted portion was refined with a CNN structure based on its original
distribution.
Nevertheless, the semantic segmentation obtained from the sampled image did not
precisely match that of the original image. Furthermore, Liu et al. [39] introduced a
DL-based image-wise just noticeable distortion (JND) model for image coding. The
problem was first formulated as a classification task, and a framework addressing
it with only one binary classifier was proposed. Subsequently, a DL-based classifier
model was constructed to determine whether a compressed image is perceptually
lossy or lossless. The study then suggested a sliding window-based search technique
to predict the image-wise JND.
In a recent study, Sujitha et al. [40] proposed a compression technique that uses
convolutional neural networks (CNNs) for remote sensing images in the framework
of the Internet of Things (IoT). Their method learns a compact representation of
the original image and its structured data, which is then encoded with the
Lempel-Ziv-Markov chain algorithm (LZMA). They aim to achieve better compression
efficiency and reconstructed image quality than other techniques such as binary
tree, optimal truncation, JPEG, and JPEG2000.
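The LZMA stage of this pipeline is a standard lossless coder; Python's stdlib `lzma` module demonstrates the idea on a hypothetical stand-in for the learned compact representation:

```python
import lzma

# Hypothetical stand-in for a learned compact representation: a highly
# repetitive byte stream, which a dictionary coder compresses very well
representation = bytes([1, 2, 3, 4] * 256)  # 1024 bytes
packed = lzma.compress(representation, preset=9)

print(len(packed) < len(representation))          # True: smaller payload
print(lzma.decompress(packed) == representation)  # True: lossless stage
```

Because this stage is lossless, all of the method's quality loss comes from the learned representation itself, not from the LZMA back end.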
Krishnaraj and colleagues [41] proposed a discrete wavelet transform (DWT) and
deep learning-based image compression method for the Internet of Underwater
Things (IoUT). The deep learning-based DWT-CNN model combines the DWT and CNNs to
encode and decode images efficiently in IoUT, and has been employed in subaqueous
environments. The space-saving (SS) rate was 79.7038%, and the average peak
signal-to-noise ratio (PSNR), a measure of image quality, was 53.961 for the
DWT-CNN model. The compression efficiency and picture reconstruction quality were
better than SRCNN, JPEG, and JPEG2000. Kamisli et al. [42]
used auto-encoder architectures for large image blocks and neural networks for
intra/spatial prediction and post-processing. The authors introduced block-based
picture compression, eliminating the need for intra-prediction and deblocking neural
networks. Instead, a single auto-encoder neural network uses block-level masked
convolutions with 8 × 8 blocks. Masked convolutions allow each block to encode and
decode information from adjacent blocks. The proposed compression method uses
mutual information between adjacent blocks to reconstruct each block. This elimi-
nates intra-prediction and neural network deblocking. The closed-loop system uses
an asymptotic closed-loop design and stochastic gradient descent for training. The
experimental results show that the proposed picture compression method performs
similarly to established procedures. Nagarsenker and colleagues [43] introduced a
novel compression architecture that utilizes Convolutional Neural Networks (CNN).
The framework comprises two distinct components: the Compact Convolutional Neural
Network (ComCNN) and the Reconstruction Convolutional Neural Network (RecCNN).
The researchers attained a Peak Signal-to-Noise Ratio (PSNR) of 38.45 dB and a
Structural Similarity Index (SSIM) of 0.9602.
5 Discussion
Traditional methods, like JPEG, use lossy compression techniques that reduce
file size, storage requirements, and latency while maintaining reasonable image
quality. This enables image sharing across networks to meet specific needs.
However, these techniques are unsuitable for medical imaging applications because
potentially essential information is ignored, reducing crucial image accuracy. Deep
learning techniques can achieve superior compression ratios while maintaining better
image quality by learning complex patterns and representations from data. This
ensures that the quality of applications that require careful data saving does not
deteriorate. Hence, further research should be pursued to advance image
compression, especially in the medical field. Among the modern techniques used
in image compression, CNN-based models currently perform best. This study
explored research papers that use deep learning to compress images for better
compression ratios, higher quality, and greater storage savings. Table 1 shows
the datasets and methods used in previous works. According to this table, [41]
presents a deep learning model for image compression in the Internet of
Underwater Things (IoUT) based on the discrete wavelet transform (DWT).
Convolutional neural networks (CNNs) were used to compress and improve the
quality of reconstructed images during encoding and decoding. It obtained a PSNR
of 53.961 and a space saving of 79.70%. The results showed that the hybrid
DWT-CNN model
(Fig. 2: PSNR (dB) of the surveyed deep learning-based compression methods, plotted against reference number [25]-[41])
is better than other methods, such as SRCNN, JPEG, and JPEG2000. Figure 2 shows
the PSNR values achieved by the deep learning-based image compression approaches
reported in previous research.
6 Conclusion
This study compared deep learning (DL) image compression methods. Despite recent
progress, these methods still struggle to handle increasingly large datasets. Deep
learning, particularly feature extraction, has improved efficiency and compression
over traditional methods. DL-based designs are more adaptable and effective than
handcrafted image compression systems.
However, several research obstacles prevent image compression from reaching
its full potential. Considering hyperparameters and resource constraints, it is crucial
to find a balance between effectiveness and computational cost. Compact real-time
networks must also be prioritized.
Pooling layers and other architectural elements in convolutional neural networks
(CNNs) help extract relevant features, which simplifies modeling and supports
experimentation. Deep learning (DL) works well in image classification, object
recognition, segmentation, and compression, and DL techniques have advanced all
of these domains.
References
27. Minnen D et al (2018) Spatially adaptive image compression using a tiled deep network. Proc Int Conf Image Process (ICIP), pp 2796–2800
28. Hu J, Li M, Xia C, Zhang Y (2018) Combine traditional compression method with convolutional neural networks. Proc IEEE Conf Comput Vis Pattern Recognit (CVPR) Workshops
29. Mentzer F, Agustsson E, Tschannen M, Timofte R, Van Gool L (2018) Conditional probability models for deep image compression. Proc IEEE Conf Comput Vis Pattern Recognit, pp 4394–4402
30. Peixoto E, Hung EM, De Campos T (2018) Multi-mode intra prediction for learning-based image compression. Proc IEEE Int Conf Image Process (ICIP), pp 1296–1300
31. Cheng Z, Sun H, Takeuchi M, Katto J (2018) Deep convolutional autoencoder-based lossy image compression. Picture Coding Symposium (PCS), pp 253–257
32. Akyazi P, Ebrahimi T (2019) A new end-to-end image compression system based on convolutional neural networks. 22
33. Li S, Zheng Z, Dai W, Xiong H (2019) Lossy image compression with filter bank based convolutional networks. Data Compression Conf Proc, pp 32–23
34. Mishra D, Singh SK, Singh RK (2021) Wavelet-based deep auto encoder-decoder (wdaed)-
based image compression. IEEE Trans Circuits Syst Video Technol 31(4):1452–1462. https://
doi.org/10.1109/TCSVT.2020.3010627
35. Zheng B, Chen Y, Tian X, Zhou F, Liu X (2020) Implicit dual-domain convolutional network
for robust color image compression artifact reduction. IEEE Trans Circuits Syst Video Technol
30(11):3982–3994
36. Guo Li D, Li X (2020) Deep OCT image compression with convolutional neural networks.
Biomed Opt Express 11(7):3543
37. Li M, Ma K, You J, Zhang D, Zuo W (2020) Efficient and effective context-based convolutional entropy modeling for image compression. IEEE Trans Image Process 29(1):5900–5911
38. Hoang TM, Zhou J, Fan Y (2020) Image compression with encoder-decoder matched semantic segmentation. Proc IEEE Conf Comput Vis Pattern Recognit Workshops, pp 619–623
39. Liu H et al (2020) Deep learning-based image-wise just noticeable distortion prediction model for image compression. IEEE Trans Image Process 29:641–656
40. Sujitha B et al (2020) Optimal deep learning based image compression technique for data transmission on industrial Internet of things applications. Trans Emerg Telecommun Technol 32:e3976
41. Krishnaraj N, Elhoseny M, Thenmozhi M, Selim MM, Shankar K (2020) Deep learning model for real-time image compression in Internet of Underwater Things (IoUT). J Real-Time Image Process 17(6):2097–2111
42. Kamisli F (2022) End-to-end learned block-based image compression with block-level masked convolutions and asymptotic closed loop training. https://doi.org/10.48550/arXiv.2203.11686
43. Nagarsenker A, Khandekar P, Deshmukh M (2023) JPEG2000-based semantic image compression using CNN. Int J Electr Comput Eng Syst 14(5):527–534. https://doi.org/10.32985/ijeces.14.5.4
Fog-Cloud Enabled Human Falls
Prediction System Using a Hybrid
Feature Selection Approach
Abstract Elderly people’s human fall prediction is identified as one of the more
challenging factors in real-time healthcare monitoring systems. This type of health-
care system creates huge traffic and delays due to continuous data transmission from
the sensing device to a cloud-based processing system. So, a novel fog-cloud-enabled
human fall prediction system is proposed to minimize traffic and quick response time
due to closer decision-making in the fog-level computing environment. Then, the
amount of data sensed through the accelerometer sensors deployed over humans can
be transferred from fog to the cloud computing layer. To minimize the data transfer
from fog to cloud layer, the fall prediction system incorporates a hybrid feature
selection approach using Particle Swarm Optimization and Grey Wolf Optimization
(PSO-GWO). This novel feature can significantly minimize the bandwidth usage and
latency between the fog and cloud computing nodes. As a result, the proposed fall
prediction system significantly improves prediction time and accuracy compared to
the existing fall prediction systems.
1 Introduction
Human fall prevention is one of the key challenges hospitals face in providing a
safe and secure healthcare environment. In particular, many older persons fall
during hospitalization; this has become a common issue and is considered one of
the serious complications of a hospital stay. Nearly 6% of patients face such
issues, which can lead to severe head injuries, increased medical expenses among
older persons, and an economic burden on families and society [1]. Usually, a
human fall refers to a person suddenly coming to rest on
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 245
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_19
246 R. Ganesan and Y. Bevish Jinila
the ground due to accidental changes in body position and loss of consciousness.
The major causes of human falls are risk factors involving skeletal muscle motor
function and postural control ability. Fall prediction accuracy can be improved by
combining a composite equilibrium score and a timed up-and-go test [2]. The
incidence of human falls increases dramatically day by day. Countries such as
China have experienced 79% growth in the last three decades. Most of the research
studies have identified the major risk factors of human falls, such as chronic illness,
physical decline, balance, and gait disorders. Another major risk factor is vestibular
dysfunction, which affects the vestibular system in perceiving human body move-
ments and postures. Dizziness will cause unwanted tension and hesitation among
older adults, leading to fear of falling and affecting their mobility. This further
undermines patients' confidence, thereby minimizing their interactions, restricting
daily activities, and increasing depression among older adults. Therefore, there is
growing demand for developing computerized clinical support systems
to effectively predict falls and recurrent falls among older adults [3]. Moreover, the
performance of the fall prediction can be assessed in terms of prediction accuracy,
discrimination, calibration, and other clinical parameters.
The main objective of the proposed research study is to design and develop an
effective feature selection approach to improve human fall prediction in the proposed
healthcare system. As a result, the proposed approach improves the efficiency of the
fog-cloud-enabled human fall prediction system, which can help minimize treatment
expenses and maximize the likelihood of patient recovery. Most research studies
use three-dimensional accelerometer sensors integrated into wearable devices to
support fall prediction. Wearable sensor-based technology and cameras are a
common solution for accurate fall prediction, regardless of whether the patient is
in an outdoor or indoor environment [4]. One of the challenges these research studies face is the lack of
better solutions due to device resource limitations in power and storage capacity.
Many potential technologies, such as wireless sensor technology, the Internet of
Things, cloud computing, and fog computing, can strongly support the implementation
of the fall prediction system, and there is increasing demand for hybrid
combinations of these technologies to provide high-quality service features to the
healthcare system.
The proposed research study’s key contributions include a) designing and devel-
oping the fog-cloud-enabled human fall prediction system and b) developing an
effective hybrid feature selection approach to improve the quality of service and effi-
ciency of the fall prediction system. Therefore, the proposed research study improves
the fall prediction time without degrading the prediction accuracy. The remainder of
the paper is organized into five sections: related works, proposed human fall predic-
tion system, hybrid feature selection approach, experimental results, conclusion, and
future enhancement, respectively.
2 Related Works
The hybrid feature selection approach is introduced using PSO and GWO techniques.
The proposed hybrid approach optimizes the classification algorithms explored in
the research study. Many research problems can be solved using the PSO algorithm;
however, the probability of PSO becoming trapped in local minima needs to be
minimized through appropriate measures. To this end, the PSO algorithm directs
certain particles to random positions, although this carries the risk of moving
away from the global minimum. This risk can be reduced by utilizing the GWO
algorithm, which directs a particle to an appropriate position determined by the
GWO enhancement. To solve the feature selection optimization problem, a PSO
traversal problem is defined based on the position p and velocity v of each
particle, computed with the discrete operations given in the following Eqs. (1)
and (2).
v_k^(t+1) = w * v_k^t + c1^t * rand * (pbest_k^t − p_k^t) + c2^t * rand * (qbest − p_k^t)   (2)

c1^t = 1.2 − f(v_k^t) / f(qbest)   (3)

c2^t = 0.5 − f(v_k^t) / f(qbest)   (4)
Let f(v_k^t) denote the fitness of particle k, c1^t and c2^t the acceleration
coefficients observed at time t, and f(qbest) the global best fitness of the swarm.
Then, the inertia formula, sigmoidal function, and particle position update can be
defined as given in Eqs. (5), (6), and (7), respectively.
p_ij(t + 1) = 1 if r_ij < sig(v_ij(t + 1)), and 0 otherwise   (7)

Here, r_ij denotes a random number generated between 0 and 1. Then, the
GWO is applied to improve the PSO by reducing the probability of falling into a
local minimum. The PSO approach directs some particles to random positions, which
carries the risk of moving a particle away from the global minimum. This risk is
mitigated by directing some of the particles to new positions based on the
enhancement performed by the GWO approach, which requires only a small number of
GWO iterations within the main loop of PSO. Here, the mutation probability is set
to 0.1.
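Putting Eqs. (2)-(4) and (7) together, a binary PSO feature-selection loop can be sketched as follows; the fitness function is a hypothetical stand-in for classifier accuracy, and the GWO refinement step and 0.1 mutation are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(mask):
    """Toy stand-in for classifier accuracy: reward selecting the first
    five (assumed informative) features, penalize large subsets. The 0.5
    smoothing keeps the fitness strictly positive so Eqs. (3)-(4) are
    well defined."""
    return (0.5 + mask[:5].sum()) / (1.0 + mask.sum())

n_particles, n_features, iters, w = 10, 12, 30, 0.7
p = rng.integers(0, 2, size=(n_particles, n_features)).astype(float)
v = np.zeros((n_particles, n_features))
pbest = p.copy()
pbest_fit = np.array([fitness(m) for m in p])
qbest = pbest[pbest_fit.argmax()].copy()
init_best = pbest_fit.max()

for t in range(iters):
    fq = fitness(qbest)
    for k in range(n_particles):
        fv = fitness(p[k])
        c1 = 1.2 - fv / fq                                   # Eq. (3)
        c2 = 0.5 - fv / fq                                   # Eq. (4)
        v[k] = (w * v[k]
                + c1 * rng.random() * (pbest[k] - p[k])
                + c2 * rng.random() * (qbest - p[k]))        # Eq. (2)
        sig = 1.0 / (1.0 + np.exp(-v[k]))                    # sigmoid
        p[k] = (rng.random(n_features) < sig).astype(float)  # Eq. (7)
        # (The GWO step that relocates weak particles is omitted here.)
        fk = fitness(p[k])
        if fk > pbest_fit[k]:
            pbest_fit[k], pbest[k] = fk, p[k].copy()
    qbest = pbest[pbest_fit.argmax()].copy()

print(fitness(qbest) >= init_best)  # True: the swarm's best never degrades
```

Note how c2 turns negative as a particle's fitness approaches the global best, pushing it away from qbest for extra exploration, which is the behavior the PSO-GWO hybrid then corrects.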
5 Experimental Results
As per the observations, it is evident that the proposed hybrid feature selection
classifier model provides higher prediction accuracy and F1 score compared to non-
feature-selection approaches. Further analysis is based on the F1 score, as shown
in Table 1. Here, the performance of Linear Regression, K-Nearest Neighbour, and
Support Vector Machine is worse with the non-feature-selection approach than with
the hybrid feature selection approach. The K-Nearest Neighbour classifier model
outperforms the Linear Regression and Support Vector Machine classifier models
both with and without feature selection. In the future, the research study can be
enhanced with an edge-cloud integrated platform for improving healthcare
monitoring and rehabilitation in smart home and hospital environments [14].
The proposed research study evaluates the fog-cloud-enabled human fall prediction
system using accelerometer and gyroscope sensors. The performance of the proposed
healthcare system is then compared with existing healthcare systems in the context
of feature selection and classifier models. Here, the proposed hybrid
feature selection (PSO-GWO) approach outperforms the existing CF and FCF
feature selection approaches exploited in the existing research studies. Moreover,
the proposed linear regression-based classifier model outperforms the existing arti-
ficial neural network, K-nearest neighbour, and support vector machine classifier
models regarding prediction time and accuracy. This low-cost fall prediction system
will help the healthcare industry overcome the human fall risk during hospitalization.
In the future, the feature selection parameter can be further optimized to improve
human fall prediction in real-time clinical practice.
References
1. Dormosh N, Damoiseaux-Volman BA, van der Velde N, Medlock S, Romijn JA, Abu-Hanna A (2023) Development and internal validation of a prediction model for falls using electronic health records in a hospital setting. J Am Med Dir Assoc 24:964–970. https://doi.org/10.1016/j.jamda.2023.03.006
2. Zhou J, Liu B, Ye H, Duan J-P (2023) A prospective cohort study on the association between new falls and balancing ability among older adults over 80 years who are independent. Exp Gerontol 180:112259. https://doi.org/10.1016/j.exger.2023.112259
3. van de Loo B, Heymans MW, Medlock S, Boyé NDA, van der Cammen TJM, Hartholt KA, Emmelot-Vonk MH, Mattace-Raso FUS, Abu-Hanna A, van der Velde N, Natasja M van (2023) Validation of the ADFICE_IT models for predicting falls and recurrent falls in geriatric outpatients. J Am Med Dir Assoc, in press. https://doi.org/10.1016/j.jamda.2023.04.021
4. Kulurkar P, Dixit CK, Bharathi VC, Monikavishnuvarthini A, Dhakne A, Preethi P (2023) AI based elderly fall prediction system using wearable sensors: a smart
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 253
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_20
254 A. Chopde et al.
1 Introduction
2 Literature Review
Y.-H. Seo, S.-H. Park, and D.-W. Kim present a digital comparator design capable
of processing multiple inputs. Developed with a focus on speed and area efficiency
using a hardware description language, the design outperforms existing models in
both speed and hardware resource usage [1]. W. Alexander and C.M. Williams delve
into digital signal processing in their book, providing principles, algorithms, and
system design insights. Geared towards signal processing enthusiasts, the book
offers comprehensive coverage through practical examples and exercises [2]. Another
article proposes a novel approach for designing multi-mode floating-point adders,
A 4-Input 8-Bit Comparator with Enhanced Binary Subtraction 255
emphasizing speed and area efficiency for various arithmetic operations, with effi-
cient results suggesting diverse applications [3]. An author suggests a hybrid method
combining partial tag comparison techniques and search methods to enhance cache
performance. Experimental results validate the approach, demonstrating reduced
memory accesses and increased cache hit rates [4]. An analysis of optimal comparator
numbers for sorting nine and ten inputs concludes with recommendations of 25
comparators for nine and 29 for ten inputs, offering insights into efficient sorting [5].
Introducing a hybrid digital comparator technique with configurable comparison
operations based on input operands, an article demonstrates substantial reductions in
delay and power consumption, making it suitable for graphical and media processing
applications [6]. Addressing the challenges of dynamic comparators with low
supply-voltage requirements, modifications aim to reduce the latch drive current
and enhance control over power flow, marking advancements in comparator technology
[7]. A new circuit design for a comparison circuit using a compact and efficient unique
cell shows superiority over conventional methods, requiring fewer transistors for a
4-bit cascade comparator with low delay time and power dissipation [8]. A two-stage
dynamic comparator optimized for low power consumption and high-speed opera-
tion is presented for analog-to-digital converters, offering promising performance
in handling overlapped control signals with low offset voltage [9]. The focus shifts
to a 64-bit binary comparator development using different logic styles—Modified
Pass Transistor Logic Style (MPTL), Complementary Metal–Oxide–Semiconductor
(CMOS) logic style, and Gate Diffusion Input (GDI) logic style, each with its advan-
tages and considerations [10]. Clocked digital comparators incorporating sleep tran-
sistors for enhanced speed and power efficiency are explored across different CMOS
technologies, showcasing reduced power dissipation and delay time [11]. Bitwise
Competition Logic (BCL) offers an algorithm for comparing integer numbers without
arithmetic computations, emphasizing pre-encoding to prevent logic failures [12].
A transmission gate logic utilizing two NMOS and PMOS transistors in parallel
exhibits low power consumption and improved packing densities compared to a
multiplexer-based single 12-bit comparator [13]. The literature also explores multi-
valued non-volatile memory devices, opening possibilities for multi-bit comparators,
and presents the Manchester Chain Processor as an efficient digital circuit with low
power consumption and noise resilience [14, 15].
A study introduces a reversible binary comparator structured on a binary tree and
utilizing 2-bit reversible comparators. The design proves more efficient regarding
quantum cost, delay, and garbage outputs compared to existing models, offering
promising advancements in quantum circuit complexity [16]. The integration of
dynamic and static circuits in CMOS technology is investigated, focusing on the
CMOS Domino circuit. The circuit type combines the advantages of dynamic circuits
with the stability of static circuits, resulting in a significant speed improvement over
traditional circuits in arithmetic units [17]. Research explores a comparator architec-
ture based on standard CMOS cells, resembling a parallel prefix tree with repeated
cells. The design exhibits efficiency in logic synthesis, ease of predicting comparator
characteristics, and practical implications for developing more efficient comparators
[18]. Another study introduces the utilization of a multi-input current Max circuit and
256 A. Chopde et al.
3 Methodology
The full adder module is the basic building block for the other operations in the design. It takes three inputs (Ain, Bin, and Cin) and produces two outputs (Sum and Cout). The Sum output represents the least significant bit of the addition result, and Cout is the carry-out.
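A single stage of this module can be modelled in software. The sketch below (Python, illustrative only; signal names mirror the text) checks the standard sum/carry logic against integer addition:

```python
def full_adder(a_in, b_in, c_in):
    """One-bit full adder: returns (sum, cout)."""
    s = a_in ^ b_in ^ c_in                          # sum = XOR of the three inputs
    cout = (a_in & b_in) | (c_in & (a_in ^ b_in))   # carry-out (majority function)
    return s, cout

# Exhaustive check of the truth table against integer addition.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            assert a + b + c == (cout << 1) | s
```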
The 8-bit adder module aggregates eight instances of the full adder module to perform 8-bit addition/subtraction. Each bit of the second operand (b) is XORed with Cin to enable subtraction when Cin is set. The carry-out from each full adder stage is fed into the next, forming a ripple-carry chain for efficient propagation.
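A behavioural Python model of this 8-bit ripple-carry adder/subtractor may look as follows (a sketch, not the authors' HDL; note that in this standard two's-complement model the final carry is 1 when a − b produces no borrow, while the paper reports Cout = 1 for a negative difference; either convention works if used consistently):

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (c & (a ^ b))

def add_sub_8bit(a, b, cin):
    """cin = 0 -> a + b; cin = 1 -> a - b in two's complement.
    Returns (8-bit result, carry out of the MSB stage)."""
    carry, result = cin, 0
    for i in range(8):                        # eight chained full-adder stages
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ cin          # XOR with Cin enables subtraction
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry

print(add_sub_8bit(228, 145, 1))  # (83, 1): 228 - 145 = 83, no borrow
print(add_sub_8bit(145, 228, 1))  # (173, 0): negative difference, 173 = 256 - 83
```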
The comparator module utilizes multiple instances of the 8-bit adder module to
compare four 8-bit numbers (A, B, C, and D) based on a common control input Cin.
The Cin input acts as a switch, determining whether the operation is addition or subtraction (here Cin is set to 1, so that each pair of numbers is subtracted and the sign of each difference drives the subsequent comparisons). For each pair of numbers, the module computes both the sum and the carry-out, which indicates the sign of the result.
The minimum and maximum values are determined based on the carry-out signals
of the 8-bit adder stages.
In order to enhance the clarity of the processes involved in the design and imple-
mentation of the 4-input 8-bit comparator, a comprehensive flowchart illustrating the
entire procedure is presented in Fig. 2.
To find the minimum value, we first consider input 'a' as the potential minimum and check whether cout_ab, cout_ac, and cout_ad are all equal to 1. If this condition is not met, we move on to input 'b' as the potential minimum and check whether cout_bc and cout_bd are equal to 1. The process continues for input 'c'; if none of these conditions are met, we consider 'd' to be the minimum.
Similarly, to find the maximum value, we check each input to see whether its respective cout values are equal to 0. Temporary variables min_val and max_val store values at specific points in the evaluation process, and the final minimum and maximum values are assigned to the variables min_output and max_output to obtain the desired output.
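The cascaded selection described above can be sketched as follows (Python; `borrow` models cout_xy under the paper's apparent convention that the carry-out is 1 when x − y is negative, an assumption inferred from the text):

```python
def borrow(x, y):
    """Model of cout_xy: 1 when x - y is negative, i.e. x < y."""
    return 1 if x < y else 0

def min_max_4(a, b, c, d):
    # Minimum: priority cascade a -> b -> c -> d, as described in the text.
    if borrow(a, b) and borrow(a, c) and borrow(a, d):
        min_val = a
    elif borrow(b, c) and borrow(b, d):
        min_val = b
    elif borrow(c, d):
        min_val = c
    else:
        min_val = d
    # Maximum: the first input whose relevant couts are all 0.
    if not (borrow(a, b) or borrow(a, c) or borrow(a, d)):
        max_val = a
    elif not (borrow(b, c) or borrow(b, d)):
        max_val = b
    elif not borrow(c, d):
        max_val = c
    else:
        max_val = d
    return min_val, max_val

print(min_max_4(218, 228, 145, 182))  # -> (145, 228)
```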
To sum up, the main focus is to address the significant role of comparators
in various industrial applications by designing a 4-input 8-bit comparator with a
common Cin input, which outputs the minimum and maximum values among the
inputs. Integrating full adders and an 8-bit adder module allows for a systematic and
structured development process, enhancing the comparator’s flexibility and utility.
4 Novelty
Each pairwise subtraction generates a "carry-out" (Cout) bit, which signifies whether the result is positive (Cout = 0) or negative (Cout = 1) and also serves to detect overflow in subtraction operations. Overflow occurs when a carry out of the MSB (most significant bit) indicates that the result is too large to be represented in the given bit width. This Cout bit is calculated separately for each subtraction operation, and by comparing these Cout
values, the minimum and maximum values can be determined as the results. This
approach ensures proper handling of signed binary arithmetic in the project.
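A quick numeric illustration of this sign convention (a sketch; Cout is modelled as the borrow out of the 8-bit subtraction, so Cout = 1 flags a negative difference as stated above):

```python
def sub_cout(x, y, bits=8):
    """x - y modulo 2**bits, plus cout = 1 if the true difference is negative."""
    diff = (x - y) % (1 << bits)
    cout = 1 if x < y else 0
    return diff, cout

print(sub_cout(228, 145))  # (83, 0): positive difference, Cout = 0
print(sub_cout(145, 228))  # (173, 1): negative difference, Cout = 1 (173 = 256 - 83)
```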
5 Analysis
The comprehensive analysis of the critical components underlying the design and performance of the project aims to reveal the complexities of the system's structure, offering insight into how key elements contribute to its function and overall influence.
The power aspects of the project are discussed in Table 2, which provides valuable
insights into the operational dynamics. The on-chip power consumption is recorded at 0.241 W, ensuring efficient utilization of resources. The junction temperature stands at 25.3 °C, indicative of the thermal conditions during operation. The effective thermal resistance, a crucial parameter for thermal management, is measured at 1.4 °C per watt, underlining the system's thermal performance and its ability to dissipate heat effectively.
Power estimates derived from the synthesized netlist (using constraint files, simulation files, or vectorless analysis) may change after actual hardware implementation.
A 4-Input 8-Bit Comparator with Enhanced Binary Subtraction 261
6 Results
The Register-Transfer Level (RTL) schematic visually represents the digital circuit designed for the project. Developed using a Hardware Description Language (HDL) and synthesized with dedicated tools, the RTL schematic snippet illustrated in Fig. 3 represents the designed 4-input 8-bit comparator at the register-transfer abstraction level. The schematic portrays the data flow between registers, the logic operations performed, and the data paths within the circuit. The clear visualization facilitates a deeper understanding of the circuit's architecture, enabling effective analysis and verification.
As depicted in Fig. 4, the subsequent simulation results confirm the successful
implementation of the designed logic for determining the minimum and maximum
values among four 8-bit inputs.
In the simulation scenario with input numbers 218, 228, 145, and 182, the 4-input
8-bit comparator accurately determined the minimum and maximum values as 145
and 228, respectively. This outcome showcases the efficacy of the designed logic in
handling diverse input scenarios and producing reliable results.
The systematic functioning of the 4-input 8-bit comparator rests on its combination of full adders and an 8-bit adder module. The circuit's ability to process inputs and execute the required comparisons aligns with the intended design, demonstrating its suitability for practical applications.
The successful simulation results affirm the reliability of the designed circuit in
performing the specified logic operations. The use of RTL schematic visualization
aids in comprehending the circuit’s intricacies, fostering a clearer understanding of
its architecture. The outcomes underscore the practicality and effectiveness of the
4-input 8-bit comparator for determining minimum and maximum values in diverse
scenarios.
References
1. Seo Y-H, Park S-H, Kim D-W (2019) High-level hardware design of digital comparator with
multiple inputs. Integration 68:157–165
2. Alexander W, Williams CM (2016) Digital signal processing: principles, algorithms and system design. Academic Press
3. Jaiswal MK, Varma BSC, So HK-H, Balakrishnan M, Paul K, Cheung RCC (2015) Configurable
architectures for multi-mode floating point adders. IEEE Trans Circuits Syst I Regul Pap
62(8):2079–2090
4. Abed S, Al-Shayeji M, Sultan S, Mohammad N (2016) Hybrid approach based on partial tag comparison technique and search methods to improve cache performance. IET Comput Digital Tech 10(2):69–76
5. Codish M, Cruz-Filipe L, Frank M, Schneider-Kamp P (2014) Twenty-five comparators is optimal when sorting nine inputs (and twenty-nine for ten). In: 2014 IEEE 26th International Conference on Tools with Artificial Intelligence. IEEE, pp 186–193
6. Ahmed SE, Srinivas SS, Srinivas M (2016) A hybrid energy efficient digital comparator. In: 2016 29th International Conference on VLSI Design and 2016 15th International Conference on Embedded Systems (VLSID). IEEE, pp 567–568
7. Basak D, Huq SMI, Biswas SN (2019) Design and analysis of high-speed dynamic comparator for area minimization. In: 2019 2nd International Conference on Innovation in Engineering and Technology (ICIET). IEEE
8. Hsia S-C (2005) High-speed multi-input comparator. IEE Proceedings Circuits, Devices and Systems 152(3):210–214
9. Khorami A, Dastjerdi MB, Ahmadi AF (2016) A low-power high-speed comparator for analog to digital converters. In: 2016 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, pp 2010–2013
10. Vanitha R, Thenmozhi S (2015) Low power CMOS comparator using bipolar CMOS technology for signal processing applications. In: 2015 2nd International Conference on Electronics and Communication Systems (ICECS). IEEE, pp 1241–1243
11. Anjuli, Anand S (2013) High-speed 64-bit CMOS binary comparator. Int J Innovative Syst Des Eng 4(2):45–58
12. Sorwar A, Sojib EA, Dipto MAZ, Rangon MMT, Chowdhury MSA, Siddique AH (2020) Design of a high-performance 2-bit magnitude comparator using hybrid logic style. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE, pp 1–5
13. Shekhawat V, Sharma T, Sharma KG (2014) 2-bit magnitude comparator using GDI technique. In: International Conference on Recent Advances and Innovations in Engineering (ICRAIE-2014). IEEE, pp 1–5
14. Lubaba S, Faisal KM, Islam MS, Hasan M (2020) Design of a two-bit magnitude comparator based on a pass transistor, transmission gate and conventional static CMOS logic. In: 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT). IEEE, pp 1–5
15. Guangjie W, Shimin S, Lijiu J (1996) New efficient design of digital comparator. In: 2nd International Conference on ASIC. IEEE, pp 263–266
16. Sharma G, Arora H, Chawla J, Ramzai J (2015) Comparative analysis of a 2-bit magnitude comparator using various high performance techniques. In: 2015 International Conference on Communications and Signal Processing (ICCSP). IEEE, pp 0079–0083
17. Hidalgo-López JA, Tejero JC, Fernández J, Gago A (1995) New types of digital comparators. In: 1995 IEEE International Symposium on Circuits and Systems (ISCAS), vol 1. IEEE, pp 29–32
18. Nandhasri K, Ngarmnil J (2001) Designs of analog and digital comparators with FGMOS. In: ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No. 01CH37196), vol 1. IEEE, pp 25–28
19. Rout KC, Rath S, Ali B (2015) Design of high performance magnitude comparator. Int J Eng Res Technol (IJERT), Special Issue 2015
20. Vaidya M, Pati SR (2018) To design 2-bit magnitude comparator using CMOS. JETIR 5(6)
Multivalued Dependency in Neutrosophic
Database System
1 Introduction
The traditional data model introduced by Codd [1] in 1970 processes only unam-
biguous data. In 1993, Gau and Buehrer [2] introduced a vague set theory for
ambiguous database information.
In 2001, Smarandache [3] first introduced neutrosophic set theory, which deals with ambiguous information more efficiently than vague set theory.
Research is now increasingly focused on designing a neutrosophic database model that uses neutrosophic set theory to deal with uncertain data. However, very few research studies have been reported in this field. Dependency constraints are
essential in designing anomaly-free databases. The studies of different dependency
constraints generate a major research area in database designing of neutrosophic
data. Functional dependency using vague set theory, known as vague functional
S. De (B) · J. Mishra
Computer Science and Engineering Department, College of Engineering and Management, Purba
Medinipur, Kolaghat, West Bengal 721171, India
e-mail: soumitra@cemk.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 267
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_21
dependency (α-vfd), has been explained in [4–6]. In 2019, De and Mishra [7] developed a more promising and effective neutrosophic functional dependency (α-nfd). The literature [8–14] reports that a neutrosophic database model is
more effective in processing imprecise data than a vague data model.
In the present work, the authors propose a new concept of neutrosophic multivalued dependency (α-nmvd) for designing neutrosophic databases, aiming to resolve redundancy and inconsistency in such databases. The concept of α-nmvd
is explained based on the α-equality similarity measure of tuples, as reported in [7].
The implication problem of α-nmvd has been examined, and a set of sound and
complete inference axioms has been proposed.
In Sect. 2, the definition of neutrosophic set theory, similarity measure of neutro-
sophic data, and neutrosophic functional dependency have been revisited. In Sect. 3,
the authors have proposed a new definition of neutrosophic multivalued dependency
(α-nmvd). In the same section, a set of inference rules for the proposed α-nmvd have
also been defined and proved. The concluding remark appears in the final Sect. 4.
2 Basic Definitions
Let x1 and y1 be any two neutrosophic values such that x1 = [tx1 , ix1 , fx1 ] and y1 =
[ty1 , iy1 , fy1 ] where 0 ≤ tx1 ≤ 1, 0 ≤ ix1 ≤ 1, 0 ≤ fx1 ≤ 1 and 0 ≤ ty1 ≤ 1, 0 ≤ iy1 ≤
1, 0 ≤ fy1 ≤ 1 with 0 ≤ tx1 + fx1 ≤ 1, 0 ≤ ty1 + fy1 ≤ 1, 0 ≤ tx1 + ix1 + fx1 ≤ 2, 0 ≤
ty1 + iy1 + fy1 ≤ 2.
Now the similarity measure SE(x1, y1) between two neutrosophic data is defined as follows:

SE(x1, y1) = 1 − |(tx1 − ty1) − (ix1 − iy1) − (fx1 − fy1)| / 3

These two neutrosophic data x1 = [tx1, ix1, fx1] and y1 = [ty1, iy1, fy1] are said to be α-equal, written x1 (NE)α y1, if SE(x1, y1) ≥ α.
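Assuming the similarity formula as printed above, SE and the α-equality test can be sketched in Python (names are illustrative, not from the paper):

```python
def se(x, y):
    """Similarity of two neutrosophic values x = (t, i, f) and y = (t, i, f)."""
    (tx, ix, fx), (ty, iy, fy) = x, y
    return 1 - abs((tx - ty) - (ix - iy) - (fx - fy)) / 3

def alpha_equal(x, y, alpha):
    """x and y are alpha-equal when SE(x, y) >= alpha."""
    return se(x, y) >= alpha

x1, y1 = (0.8, 0.1, 0.1), (0.7, 0.2, 0.1)
print(se(x1, x1))                 # identical values give SE = 1.0
print(alpha_equal(x1, y1, 0.9))   # True: SE(x1, y1) ~ 0.933 >= 0.9
```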
Using the concept of α-equality of two neutrosophic data, α-nfd is defined in Ref. [4] as follows.

P 2.3.3: If 0 ≤ α2 ≤ α1 ≤ 1, then X1 →α1 Y1 (α-nfd) ⇒ X1 →α2 Y1 (α-nfd).
The classical view of multivalued dependency doesn’t resolve the problem when
databases contain uncertain data. It may be resolved efficiently by the neutrosophic
multivalued dependency (α-nmvd) defined using the notion of a neutrosophic set.
Below, authors have defined neutrosophic multivalued dependency using a new
notion of neutrosophic set.
Definition 3.2 Let R1(A1, A2, ..., An) be a relational schema and let X1, Y1 be two different sets of attributes of R1, i.e., X1, Y1 ⊂ R1. Then X1 neutrosophically multivalued-determines Y1 at α level of tolerance (α-nmvd), denoted X1 →→α Y1, if whenever t1[X1] (NE)α t2[X1] holds for two tuples t1, t2 in R1, there exists a third tuple t3 in R1 such that
t1[X1] (NE)α t2[X1] (NE)α t3[X1], t1[Y1] (NE)α t3[Y1], and
t2[R1 − X1 − Y1] (NE)α t3[R1 − X1 − Y1] also hold.
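Definition 3.2 can be turned into a brute-force check on a small relation (an illustrative sketch, not part of the paper; tuples are dicts mapping attributes to neutrosophic triples, the similarity measure SE of Sect. 2 is assumed, and all names are hypothetical):

```python
def se(x, y):
    """Similarity measure SE on neutrosophic triples (t, i, f), as in Sect. 2."""
    (tx, ix, fx), (ty, iy, fy) = x, y
    return 1 - abs((tx - ty) - (ix - iy) - (fx - fy)) / 3

def ne_alpha(t1, t2, attrs, alpha):
    """(NE)_alpha on a projection: every listed attribute must be alpha-equal."""
    return all(se(t1[a], t2[a]) >= alpha for a in attrs)

def holds_nmvd(rel, X, Y, alpha):
    """Brute-force test of X ->-> Y at tolerance alpha over relation `rel`."""
    rest = set(rel[0]) - set(X) - set(Y)
    for t1 in rel:
        for t2 in rel:
            if not ne_alpha(t1, t2, X, alpha):
                continue
            # Definition 3.2: a witness t3 must alpha-agree with t1 and t2 on X,
            # with t1 on Y, and with t2 on R1 - X1 - Y1.
            if not any(ne_alpha(t1, t3, X, alpha) and ne_alpha(t2, t3, X, alpha)
                       and ne_alpha(t1, t3, Y, alpha)
                       and ne_alpha(t2, t3, rest, alpha) for t3 in rel):
                return False
    return True

# Crisp triples: "true" and "false" as degenerate neutrosophic values.
v1, v0 = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
rel = [{"A": v1, "B": v1, "C": v1},
       {"A": v1, "B": v0, "C": v0},
       {"A": v1, "B": v1, "C": v0},
       {"A": v1, "B": v0, "C": v1}]
print(holds_nmvd(rel, {"A"}, {"B"}, 1.0))      # True: all B/C combinations present
print(holds_nmvd(rel[:3], {"A"}, {"B"}, 1.0))  # False: a witness tuple is missing
```

At α = 1 this reduces to the classical multivalued-dependency check, in line with Lemma 3.1.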
Lemma 3.1 The definition of α-nmvd is consistent, i.e., the α-nmvd X1 →→α=1 Y1 ⇒ classical multivalued dependency X1 →→ Y1.
Proof:
Let X1 →→α=1 Y1 hold in R1. The definition of α-nmvd states that for any two tuples t1 and t2 in R1, if t1[X1] (NE)α=1 t2[X1] is true, (1)
Now, using Definition 2.2 and Definition 2.3.1, for α = 1 it can be shown
Using the above implication (5) in relations (1), (2), (3) and (4), the α-nmvd reduces to: if t1[X1] = t2[X1], then there exists a tuple t3 in R1 satisfying
t1[X1] = t2[X1] = t3[X1],
t1[Y1] = t3[Y1],
and t2[R1 − X1 − Y1] = t3[R1 − X1 − Y1],
which implies X1 →→ Y1. Hence the definition of α-nmvd is consistent.
Proof:
Given X1 →→α Y1. Then, from the definition of α-nmvd, if t1[X1] (NE)α t2[X1], there exists a tuple t3 such that
Now, from Definition 3.2 of α-nmvd, the relations (6), (8) and (9) imply X1 →→α R1 − X1 − Y1.
Hence the α-nmvd complementation rule is proved.
(IR3.2.3) α-nmvd inclusion rule: X1 →→α1 Y1 and α1 ≥ α2 ⇒ X1 →→α2 Y1.
Proof:
This rule follows directly from P 2.3.1.
(IR3.2.4) α-nmvd reflexivity rule: Y1 ⊆ X1 ⇒ X1 →→α Y1.
Proof:
The α-nfd reflexivity rule in [1] says that if Y1 ⊆ X1, then X1 →α Y1 (α-nfd) always holds, and the α-nmvd replication rule says that if X1 →α Y1 (α-nfd), then X1 →→α Y1.
Hence, the α-nmvd reflexivity rule is verified.
(IR3.2.5) α-nmvd augmentation rule: X1 →→α Y1 and W1 ⊆ U1 always imply U1X1 →→α Y1W1.
Proof:
Given X1 →→α Y1: if t1[X1] (NE)α t2[X1], then there exists a tuple t3 such that
Again, since W1 ⊆ U1, by the α-nmvd reflexivity rule, U1 →→α W1.
Hence, from the definition of α-nmvd,
Again, since R1 − U1X1 − Y1W1 ⊆ R1 − X1 − Y1 or R1 − U1X1 − Y1W1 ⊆ R1 − U1 − W1, either from (12) or (15), by using P 2.3.2, we have
Therefore, relations (16), (17) and (18) ⇒ U1X1 →→α Y1W1. This proves the augmentation rule.
(IR3.2.6) α-nmvd transitivity rule: If X1 →→α1 Y1 and Y1 →→α2 Z1, then X1 →→min(α1,α2) Z1 − Y1.
Proof:
Case when α1 ≥ α2:
Given X1 →→α1 Y1 and Y1 →→α2 Z1.
By the α-nmvd inclusion rule, X1 →→α1 Y1 ⇒ X1 →→α2 Y1.
Then, from the definition of X1 →→α2 Y1, for tuples t1, t2 there exists a tuple t3 such that
and from Y1 →→α2 Z1, for tuples t1, t3 there exists a tuple t4 such that
Then, from (19) and [(26) for disjoint/(27) for non-disjoint sets],
which implies X1 →→α2 Y1Z1.
Hence, if X1 →→α1 Y1 and Y1 →→α2 Z1 hold, then X1 →→α2 Y1Z1 also holds for α1 ≥ α2.
Similarly, the α-nmvd transitivity rule can be proved for α2 ≥ α1 .
(IR3.2.7) α-nmvd union rule: If X1 →→α1 Y1 and X1 →→α2 Z1, then X1 →→min(α1,α2) Y1Z1.
Proof:
Case I: α1 ≥ α2.
Given X1 →→α1 Y1 and X1 →→α2 Z1.
Now X1 →→α1 Y1 ⇒ X1 →→α2 Y1 by the α-nmvd inclusion rule.
Further, from the definition of X1 →→α2 Y1, it may be written as
and again from X1 →→α2 Z1,
Now, combining the relations (33) and (39), one can get
Also, for non-disjoint sets X1, Y1 and Z1, there exists a subset Y1′ of Y1 such that Y1′ = Y1 − (Y1 ∩ X1) − (Y1 ∩ Z1).
Now, since Y1′ ⊆ R1 − X1 − Z1, (37) using P 2.3.2 gives t3[Y1′] (NE)α2 t4[Y1′]. (41)
Again, since Y1′ ⊆ Y1, from (33) using P 2.3.2 one can get t1[Y1′] (NE)α2 t3[Y1′]. (42)
The relations (41) and (42) then provide t1[Y1′] (NE)α2 t4[Y1′]. (43)
The relations (35) and (36) give t1[X1] (NE)α2 t4[X1] and t1[Z1] (NE)α2 t4[Z1]. Combining the above two relations, it may be written that t1[X1Z1] (NE)α2 t4[X1Z1]. (44)
Then, for disjoint sets, combining the relations (42) and (43) implies t1[Y1] (NE)α2 t4[Y1]. (46)
From equations (36) and (40)/(35) we have t1[Y1Z1] (NE)α2 t4[Y1Z1]. (47)
Hence, using the relations (38), (46) and (47), for any two tuples t1 and t2 if
t1 [X1 ](NE)α2 t2 [X1 ] then there exists a tuple t4 for which
which implies X1 →→α2 Y1Z1 [∵ R1 − X1 − Y1 − Z1 = R1 − X1 − Y1Z1].
Proof:
Case I: α1 ≥ α2.
Let X1 →→α1 Y1.
By the α-nmvd inclusion rule, this implies X1 →→α2 Y1.
Now, from the definition of X1 →→α2 Y1, one has
Also, from X1 →→α2 Z1, one has
Again, from the α-nmvd reflexivity rule, one has X1 →→α2 X1 [∵ X1 ⊆ X1], which implies
or
Using relations (A1), (A2), (A3), one gets X1 →→α2 Y1 − Z1 (proved).
Again, from (49), one has
Using relations (61), (62), (63), it may be written that X1 →→α2 Z1 − Y1 (proved).
Next, from relation (54), one has
Using relations (64), (65), (66), we get X1 →→α2 Z1 ∩ Y1 (proved).
Hence, for the given X1 →→α1 Y1 and X1 →→α2 Z1, it is proved that X1 →→α2 Y1 − Z1, X1 →→α2 Z1 − Y1 and X1 →→α2 Z1 ∩ Y1, where α1 ≥ α2.
Case II: α2 ≥ α1.
Similarly, for the given X1 →→α1 Y1 and X1 →→α2 Z1, we can prove X1 →→α1 Y1 − Z1, X1 →→α1 Z1 − Y1 and X1 →→α1 Z1 ∩ Y1.
Thus, combining the above two cases, we can say that if X1 →→α1 Y1 and X1 →→α2 Z1, then X1 →→min(α1,α2) Y1 − Z1, X1 →→min(α1,α2) Z1 − Y1, and X1 →→min(α1,α2) Z1 ∩ Y1.
Hence proved.
4 Conclusion
Like functional dependency, the multivalued dependency also constitutes a key data
dependency constraint in the database model. Measuring dependencies helps iden-
tify the key of the relation and normalize the relation into higher normal forms.
Data dependency constraints for ambiguous data may be determined using fuzzy,
vague, or neutrosophic sets. In the present work, the authors have introduced a definition of neutrosophic multivalued dependency (α-nmvd) using α-equality similarity
measures to handle imprecise data. Finally, an important set of inference rules for
α-nmvd have been proposed and proved.
References
1. Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13(6):377–387
2. Gau WL, Buehrer DJ (1993) Vague sets. IEEE Trans Syst Man Cybern 23(2):610–614
3. Smarandache F (2001) First International Conference on Neutrosophy, Neutrosophic Proba-
bility, Set and Logic. University of New Mexico. 1(3)
4. Mishra J, Ghosh S (2012) A new functional dependency in a vague relational database model.
Int J Comput Appl 8:29–36
5. Zhao F, Ma ZM, Yan L (2007) A vague relational model and algebra. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), vol 1, pp 81–85
6. Zhao F, Ma ZM (2009) Vague query based on vague relational model. In: AISC 61. Springer-Verlag, Berlin Heidelberg, pp 229–238
7. De S, Mishra J (2019) A new approach of functional dependency in a neutrosophic relational
database model. Asian J Comput Sci Technol 8(2):44–48
8. Broumi S (2013) Generalized Neutrosophic soft set. Int J Comput Sci, Eng Inf Technol 3(2):17–
30
9. Deli I, Broumi S (2015) Neutrosophic soft relations and some properties. Ann Fuzzy Math
Inform 9(1):169–182
10. De S, Mishra J (2016) Compare different similarity measure formula based imprecise query
on neutrosophic data. Int Conf Adv Comput Intell Eng 5(12):330–334
11. De S, Mishra J (2017) Handle inconsistent data using new method of neutrosophic logic for
proper outcome. Int J Sci, Technol Manag 6(2):383–387
12. De S, Mishra J (2017) Processing of inconsistent neutrosophic data and selecting primary key
from the relation. International Conference on Inventive Computing and Informatics 6(7):245–
250
13. Fagin R (1977) Multivalued dependencies and a new normal form for relational databases.
ACM Trans Database Syst 2(3):262–278
14. De S, Mishra J, Chatterjee S (2020) Decomposing lossless join with dependency preservation
in neutrosophic database. Solid State Technology 63(1):908–916
Traffic Sign Recognition Framework
Using Zero-Shot Learning
Abstract Traffic signs play a crucial role in preventing accidents and bottlenecks in
traffic. Traffic symbols are visual representations of various information that drivers
must be able to understand and obey. Traffic signs are essential for controlling traffic, enforcing driving behaviour, and preventing accidents, injuries, and fatalities. Recognizing traffic signs in a real-time environment is essential for automated driving cars. We propose a Zero-Shot Learning-based traffic sign recognition model to address this
challenge. The proposed framework uses the self-supervised model to recognize and
detect traffic signs without prior training data. The proposed study enhances the zero-shot learning method to recognize traffic signs in lower-brightness situations. Using semantic links, zero-shot learning allows the model to generalize to previously undiscovered classes. This increases its efficiency and adaptability in dynamic contexts
and guarantees it can recognize changing traffic laws without requiring continuous
retraining—a critical benefit for autonomous vehicles and intelligent transportation
networks. The proposed methodology has used the standard German Traffic Sign
Recognition Benchmark (GTSRB) dataset. The simulation of the proposed method-
ology demonstrates the accurate recognition of traffic signs in various scenarios,
and the proposed architecture has achieved 99.36% validation accuracy on the GTSRB dataset. The authors have also compared the proposed methodology with
other self-supervised learning models.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 281
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_22
282 P. Shah et al.
1 Introduction
2 Literature Review
They provide a collection of real-world benchmark data for detecting traffic signs, supplied with several carefully selected assessment indicators, baseline results, and a web interface for contrasting methods. To classify traffic signs, a convolutional neural network (CNN) is combined with an image processing-based approach for traffic sign detection and identification. A deep learning-based traffic sign recognition method is suggested, focusing on circular traffic signs. This approach recognizes and identifies traffic signs through picture pre-processing, traffic sign detection, recognition, and classification [2].
Triki et al. [8] proposed a novel attention-based deep convolutional neural network. The obtained testing accuracy and F1-measure rates reach 99.91% and 99%, so the reported findings are superior to those of prior studies on traffic sign classification. The built TSR system is assessed and verified on a Raspberry Pi 4 board.
The traffic sign’s color or shape information is frequently considered by methods
used for detection and recognition. However, it is well acknowledged that the image
quality in real-world traffic situations is frequently poor, due to various factors including low resolution, bad weather, poor illumination, motion blur, occlusion, scale, and rotation. Therefore, efficient integration of color and shape in both the detection and classification stages is a very intriguing and exciting
topic that needs much more research. The most popular techniques for tracking are
the Kalman filter and its variants [10].
Traditional visual object recognition mostly relies on extracting visual features, such as color and edge, which has drawbacks. Deep learning formed the foundation for creating the convolutional neural network (CNN), which was developed for visual object recognition [11]. Indian road traffic signs were detected and recognized using a
deep learning method, which performed well under various circumstances, including
changes in scale, orientation, and illumination [12].
Each traffic sign is distinct due to its color and shape. Red, blue, green, and yellow
are the colors most frequently associated with traffic signs, while circular, rectangular,
or triangular shapes are the most popular. To identify the red and blue objects in the
image, HSV color space was used. These objects are also separated from the rest
of the background using masking. A neural network is frequently referred to as a
“black box.” Additionally, new techniques for data augmentation can be used to
strengthen the classifier [13]. It accurately extracts traffic sign candidates from the real road view area, including the signs' size, texture, and color. One of the biggest challenges in this field of study is obtaining a useful dataset containing real-world images of different traffic signs in different conditions. Since most traffic sign photographs were taken in ideal circumstances, it is harder to build a system that performs well in less ideal situations, such as under various lighting conditions, environmental
variations, viewing angles, transparency, etc. Most earlier studies on detecting and
identifying these traffic indicators are not particularly helpful in scenarios when they
occur in real-time [14]. They successfully taught a model with only five known classes
to distinguish two novel types of plastic using zero-shot learning. The system can
identify and categorize new classes not present during training because all metrics
for unknown classes are more than 56%. The system successfully identified and
classified two new kinds of plastic with an overall accuracy of 98% [15].
Traffic sign detection and recognition work has further value and possible applications. Conventional detection and identification techniques distinguish and detect traffic signs picture by picture; in this situation, the relationship between the image sequences is ignored, and only the information from the current image is used. In their conclusion, the authors suggest a novel model that can
swiftly and reliably detect and recognize traffic signs in a driving video series by
utilizing the interaction between several images. The study proposes a fusion model
based on the YOLO-V3 and VGG19 networks. The results show that this proposed
model works better than the baseline approach for all types of traffic signs in a
range of scenarios, obtaining an accuracy of more than 90% when tested on a public
dataset and compared to the baseline methodology. Consequently, we can say that
the suggested model is accurate and effective [16].
Currently, traffic sign detection and identification research has focused on raising
these systems’ efficiency by utilizing deep learning techniques like CNN and
YOLOv3, as demonstrated in Table 1. Additionally, some research has worked
on real-time implementation, lightweight models, and transfer learning to improve
performance and reduce computational costs.
A proposed approach for identifying and detecting traffic signs based on image
processing is paired with a convolutional neural network (CNN) to recognize the
traffic signs. CNN can be used to perform a variety of computer vision tasks due
to its high recognition rate. A deep learning-based traffic sign recognition method, aimed primarily at circular traffic signs, detects and identifies signs through image pre-processing, traffic sign detection, recognition, and classification; its accuracy is 98.2%, according to the test results [7].
Refined Mask R-CNN (RMR-CNN), a deep learning-based model, achieves 97.08%
accuracy using the Customized Indian Traffic sign dataset of 6480 pictures [12]. The
YOLOv3 traffic-sign detection and recognition method was created in Python using
OpenCV to solve issues, including how readily the environment may alter standard
traffic sign detection. The total methodology shows 92.2% accuracy in real-time item
identification and classification at a frame rate of 45 frames per second [14].
3 Methodology
The authors propose a zero-shot learning model that can identify and categorize
novel traffic signs without any prior training data for those particular signs.
This is achieved by training the model on source traffic sign classes and using
a separate set of class-level information to recognize new, unseen traffic
signs [9]. Zero-shot learning mainly aims to identify new traffic signs by
their semantic similarity to known traffic signs rather than by memorizing
specific trained signs. The proposed methodology uses transfer learning to
train a model on traffic sign recognition tasks and then applies the learned
features to unseen traffic signs in real time. Figure 1 illustrates the
functional diagram of the proposed study.
The proposed methodology can recognize new traffic signs not present during
training by leveraging knowledge about the relationships between different sign
classes. These semantic representations are used to train a model that maps the
visual features of an image to the semantic space of the traffic sign classes.
The proposed zero-shot learning model is applied on a training set S = {(x_n, y_n), n =
1…N}, where y_n ∈ Y^tr belongs to the training traffic sign classes. With L as the loss
function and Ω(w) as the regularization term, zero-shot learning can be formulated as Eq. 1 [26],
Fig. 1 Architecture of the proposed zero-shot learning model for traffic sign recognition
286 P. Shah et al.
where x_n and y_n are the input and output feature maps and w are the model weights:

(1/N) Σ_{n=1}^{N} L(y_n, f(x_n, w)) + Ω(w)    (1)
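As an illustration, the objective in Eq. 1 can be evaluated numerically. The linear scorer f(x, w), the softmax cross-entropy loss L, and the L2 regularizer Ω below are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def zsl_objective(X, y, W, reg=1e-3):
    """Evaluate Eq. 1: (1/N) * sum_n L(y_n, f(x_n, w)) + Omega(w).

    Illustrative choices: a linear scorer f(x, W) = x @ W, softmax
    cross-entropy as L, and an L2 penalty reg * ||W||^2 as Omega."""
    scores = X @ W                                        # (N, C) class scores
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    data_term = -log_probs[np.arange(len(y)), y].mean()   # (1/N) sum of losses
    return data_term + reg * np.sum(W ** 2)               # plus Omega(w)

# toy data: 8 samples, 4 features, 3 training classes
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)
W = rng.normal(size=(4, 3))
J = zsl_objective(X, y, W)
```

With W = 0 and no regularization, the objective reduces to log C for C classes, which gives a quick sanity check on an implementation like this.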
The model uses the learned features to classify new, unseen classes without any
additional training data; this is also known as "transductive learning". A common
approach to ZSL is to use a semantic embedding space, where a high-dimensional
vector represents each class. The embeddings are learned during training; at
inference, the model maps unseen classes into the embedding space and predicts
the class whose embedding is closest to that of the input. The equation for
zero-shot learning can be expressed as Eq. 2.
Output = f(X, Y, Z)    (2)
Output: The predicted output or class label for a given input. X: The input data
or features of the sample being classified. Y: The training data with labeled samples
from seen classes. Z: The auxiliary information or semantic attributes associated
with seen and unseen classes.
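A minimal sketch of the inference step behind Eq. 2, assuming hypothetical class-level attribute vectors Z for unseen sign classes — the attribute names and values below are invented for illustration only:

```python
import numpy as np

# Hypothetical class-level attributes Z for sign classes unseen in training
# (columns: round, triangular, red, blue) -- invented for illustration only.
Z_unseen = {
    "stop":      np.array([1.0, 0.0, 1.0, 0.0]),
    "yield":     np.array([0.0, 1.0, 1.0, 0.0]),
    "mandatory": np.array([1.0, 0.0, 0.0, 1.0]),
}

def predict_unseen(attr_pred, Z):
    """Inference step of Eq. 2: the attribute vector predicted for input X by a
    model trained on seen classes Y is matched, by cosine similarity, to the
    closest class-level attribute vector in Z."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(Z, key=lambda c: cos(attr_pred, Z[c]))

# Predicted attributes reading "triangular and red" map to the yield sign.
print(predict_unseen(np.array([0.1, 0.9, 0.8, 0.1]), Z_unseen))  # -> yield
```

The key point is that "yield" never needs training images: only its attribute description Z is required at inference time.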
The proposed model associates the input data (X) with the corresponding semantic
attributes (Z) of the seen classes (Y). This learning process allows the model to
understand the relationships between the input features and the auxiliary information.
It is challenging because it requires the model to generalize to unseen classes
using only limited auxiliary information. ZSL has applications in several
disciplines, including speech recognition, natural language processing, and
image classification.
In zero-shot machine learning, visual embeddings represent both the seen and
unseen classes. Let the visual embedding of an image x from class i be denoted
v_i^x. The similarity between a semantic embedding s and a visual embedding v
is quantified using a compatibility function; a linear compatibility function
is one popular option:
f(v, s) = v^T W s    (3)
The classifier's probability output can then be written as Eq. 4,

x = softmax(W1 h1 + B1)    (4)

where x is the probability output vector over the potential classes, W1 is the
weight matrix of the last fully connected layer, h1 is the output vector of the
previous fully connected layer, and B1 is the bias vector of the last fully
connected layer.
Traffic Sign Recognition Framework Using Zero-Shot Learning 287
The weights of the base model are frozen, so they won’t change while training.
The network’s input shape is set to (32, 32, 3). The input photos are then randomly
horizontally flipped and rotated by a maximum of 0.1 radians using the model’s two
data augmentation layers, Random Flip and Random Rotation. The network builds
on the original model by adding dense layers, each with a dropout layer to avoid
overfitting and a ReLU activation function. The network's top layer is a
300-unit dense layer with no activation function, as in Eq. 5,

f(y) = W2 · ReLU(W1 y + b1) + b2    (5)

where y is the input vector of size n, W1 is the weight matrix for the first
layer, b1 is the bias vector for the first layer (size m), W2 is the weight
matrix for the second (output) layer, and b2 is the bias vector for the second
layer (size k).
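A numpy sketch of this dense head; the hidden width m = 128 is chosen arbitrarily for illustration, and only the 300-unit, activation-free output layer is taken from the text:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def embedding_head(y, W1, b1, W2, b2):
    """Two dense layers: a ReLU hidden layer, then a linear (no activation)
    output layer, matching the 300-unit top layer described in the text."""
    h = relu(W1 @ y + b1)      # first layer, hidden size m
    return W2 @ h + b2         # second (output) layer, size k, no activation

rng = np.random.default_rng(0)
n, m, k = 32 * 32 * 3, 128, 300          # k = 300 output units, per the text
W1, b1 = 0.01 * rng.normal(size=(m, n)), np.zeros(m)
W2, b2 = 0.01 * rng.normal(size=(k, m)), np.zeros(k)
out = embedding_head(rng.normal(size=n), W1, b1, W2, b2)
print(out.shape)   # -> (300,)
```

Leaving the final layer linear is what lets its output serve as an embedding to compare against semantic class vectors, rather than as a fixed-size probability distribution.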
4.1 Dataset
The “German Traffic Sign Recognition Benchmark” competition was held at IJCNN
in 2011. The images are supplemented by various pre-computed feature sets, allowing
machine learning algorithms to be applied without prior image processing knowledge
[18]. We used the GTSRB dataset, which stands for German Traffic Sign Recognition
Benchmark (sample illustrated in Fig. 2). The German Traffic Sign Benchmark is an
image classification test with multiple classes. The simulation used a
one-directory-per-class structure, with 11 classes averaging 2000 images each.
The average image dimension is around 30 × 30 pixels.
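The one-directory-per-class layout can be indexed with a few lines of standard-library Python; the directory and file names below are synthetic stand-ins for the real GTSRB folders:

```python
import tempfile
from pathlib import Path

def index_dataset(root, ext="*.ppm"):
    """Map a one-directory-per-class layout (as used for GTSRB here) to
    (image_path, class_label) pairs; directory names become class labels.
    GTSRB ships .ppm images, hence the default extension."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(classes)}
    samples = [(img, label_of[d.name])
               for d in sorted(root.iterdir()) if d.is_dir()
               for img in sorted(d.glob(ext))]
    return samples, label_of

# Tiny demo on a synthetic directory tree (stand-in for the real dataset root).
with tempfile.TemporaryDirectory() as tmp:
    for cls in ("00_speed_limit", "01_stop"):
        d = Path(tmp) / cls
        d.mkdir()
        (d / "0001.ppm").touch()
    samples, label_of = index_dataset(tmp)
    print(len(samples), label_of)   # -> 2 {'00_speed_limit': 0, '01_stop': 1}
```

Sorting the class directories before assigning integer labels keeps the label mapping deterministic across runs, which matters when train and test splits are indexed separately.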
4.2 Result
The research study on zero-shot learning for traffic sign identification offers a novel
solution to the issue of understanding traffic signs in practical situations. The model
can recognize novel signs without the need for explicit training on them, thanks to a
technique the researchers suggest that uses zero-shot learning capabilities. The exper-
imental outcomes show the usefulness of the suggested approach, delivering cutting-
edge results on a benchmark dataset for identifying traffic signs. The paper also thor-
oughly investigates the model’s performance in various scenarios, emphasizing its
adaptability to changes in image quality, occlusions, and lighting. Depending on the
classification job, the outcome of zero-shot learning can be assessed using a variety
of performance metrics, including accuracy, precision, recall, and F1 score.
Performance is usually evaluated on a separate assessment dataset containing
images from both seen and unseen classes. Accuracy and loss graphs are illustrated
in Figs. 3 and 4.
The quality and amount of the semantic information, the complexity of the image
dataset, and the zero-shot learning technique or algorithm employed can all affect
how well zero-shot learning performs in image classification. The simulation of the
proposed method used 20 epochs with a batch size of 16. The dropout rate and
learning rate for the simulation are 0.2 and 0.001, respectively, and all images
were resized to 32 × 32 × 3 RGB bitmaps.
Fig. 3 The plot of training and validation accuracy concerning each epoch
Fig. 4 The plot of training and validation loss concerning each epoch
These results underscore the promise of zero-shot learning for more general
applications in the face of evolving computer vision challenges.
5 Conclusion
References
21. Hu W, Zhang Y, Li L (2019) Study of the application of deep convolutional neural
networks (CNNs) in processing sensor data and biomedical images. Sensors 19:3584.
https://doi.org/10.3390/s19163584
22. Kothadiya D, Rehman A, Abbas S, Alamri FS, Saba T (2023) Attention-based deep learning
framework to recognize diabetes disease from cellular retinal images. Biochemistry and Cell
Biology
23. Zhang Y (2012) Support vector machine classification algorithm and its application.
In: Information computing and applications: third international conference, ICICA 2012,
Chengde, China, September 14–16, 2012, proceedings, part II. Springer, Berlin,
Heidelberg, pp 179–186
24. Uliyan DM, Sadeghi S, Jalab HA (2020) Anti-spoofing method for fingerprint recognition
using patch-based deep learning machine. Eng Sci Technol, Int J 23(2):264–273
25. Kothadiya DR, Bhatt CM, Saba T, Rehman A, Bahaj SA (2023) SIGNFORMER: deepvision
transformer for sign language recognition. IEEE Access 11:4730–4739
26. Xian Y, Lampert CH, Schiele B, Akata Z (2018) Zero-shot learning—a comprehensive
evaluation of the good, the bad, and the ugly. IEEE Trans Pattern Anal Mach Intell
41(9):2251–2265
27. Bansal M, Kumar M, Sachdeva M, Mittal A (2021) Transfer learning for image
classification using VGG19: Caltech-101 image data set. J Ambient Intell Humaniz
Comput, pp 1–12
Machine Learning Techniques
to Categorize the Sentiment Analysis
of Amazon Customer Reviews
1 Introduction
The growth of Internet marketplaces in recent decades has led sellers and merchants
to collect client feedback. Millions of reviews of various goods, services, and loca-
tions are posted daily. As a result, the Internet has become the go-to location for
getting feedback on a service or product. As more individuals evaluate a product, it
becomes more difficult for potential consumers to make an informed decision. When
confronted with inconsistent evaluations and varied opinions on the same product,
the customer’s ability to make an informed buying decision is further impaired. This
content looks to be critical for all e-commerce businesses to assess.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 293
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_23
294 R. V. Prakash et al.
2 Related Work
Sangeetha et al. [1] used a Pearson correlation coefficient-based Harris Hawks Opti-
mization based Recurrent Neural Network-Long Short-Term Memory (PCCHHO-
RNN-LSTM) algorithm to select features from reviews given by users for classifying
their sentiments according to their appropriate polarity. In this proposed PCCHHO-
RNNLSTM, the correlation coefficients of features are used to conduct an initial
dimensionality reduction to achieve greater accuracy. The HHO algorithm is then
used to choose a small group of non-redundant features, and RNN-LSTM is used to
classify sentiments to their suitable polarity.
Sultan Naveed [2] used two feature-extraction methods, TF-IDF and Count
Vectorizer, with several models, including Naive Bayes, SVM, KNN, Decision Tree,
Logistic Regression, and ensemble classification.
Mohamad Faris et al. [3] used Multinomial Naive Bayes (MNB), Random Forest (RF),
Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) models for
sentiment analysis on Amazon product reviews. The feature extraction techniques
Term Frequency-Inverse Document Frequency Transformer (TF-IDF(T)) and TF-IDF
Vectorizer (TF-IDF(V)) were used for the ML models, MNB and RF.
Feilong Tang et al. [4] retrieved fine-grained opinions and aspects from internet
reviews using two generative models, JABST and MaxEnt-JABST. The JABST model
gathered polar opinions and traits at specific, general, and emotional levels.
The MaxEnt-JABST architecture additionally utilized a maximum entropy classifier to
distinguish between aspect and opinion words. These designs were evaluated
quantitatively and qualitatively on restaurant and electronic gadget reviews.
The trial results demonstrated that the designs exceeded the baselines in both
numerical and qualitative terms and accurately recognized fine-grained traits
and opinions.
Rajkumar et al. [5] applied Naive Bayes (NB) and SVM techniques for sentiment
analysis on product reviews. Amazon customer reviews of PCs, cameras, mobile
phones, tablets, security cameras, and televisions were employed. A bag of words
was generated by stemming, deleting stop words, and removing punctuation marks.
Sentence sentiment scores were computed by comparison against 4783 negative and
2006 positive terms from opinion lexicons, and these scores and variables were
used to calculate NB and SVM accuracy. On camera reviews, SVM reached 93.54%
accuracy, whereas NB reached 98.17%. The SVM approach requires fine-tuning
various parameters to achieve good classification results, and its
classification precision was lower.
Sumbal Riaz et al. [6] used text mining to assess customer attitudes on a vast
dataset of product reviews from various customers. This approach applied
sentiment analysis to determine the Sentiment Polarity (SP) of each term rather
than of each document. SP strength was calculated by extracting keywords from a
key graph based on document phrase frequency, and the data was clustered by
emotion intensity using k-means. The cluster scores were compared with the star
ratings to assess positive and neutral product sentiment.
The work of Haddi et al. [7] is essential for any discussion of sentiment analysis
algorithms. Using their results as a benchmark, SVMs can be employed effectively
for sentiment categorization. This information can assist researchers
and practitioners in analyzing Amazon product reviews and reaching algorithmic
conclusions.
A study by Zeenia Singla et al. [8] found that classical machine learning methods
like Logistic Regression and Naive Bayes remain prominent. This
finding demonstrates the algorithms’ continued usefulness in practical e-commerce
applications and is especially helpful for decision-makers dealing with limited
resources.
Feature engineering is crucial for improving sentiment analysis performance,
according to Bi Jian-Wu [9]. Their research contributes to the existing literature on
sentiment analysis algorithms used to assess Amazon product reviews, which could
be more accurate and effective.
The research of Choudhary et al. [10] focuses on the problems and complications
of writing Amazon product evaluations. Important suggestions for further research
and improved models are provided. Researchers and enterprises must first understand
these difficulties to harvest customer feedback for useful insights appropriately.
Guo et al. [11] greatly impacted how academics analyze the effectiveness of
sentiment analysis algorithms because of their recommendations for correct evalu-
ation measures (ROC-AUC and F1-score). This research is particularly relevant to
the academic and corporate sectors due to the requirement to specify appropriate
assessment standards to acquire trustworthy results properly.
Kokab et al. [12] work on improving Transformer-based sentiment analysis
algorithms. Their contributions are crucial for keeping up with advances in
sentiment analysis that require new approaches.
3 Data Preprocessing
The comments left on the online goods used to compile the dataset were subjected
to several pre-processing operations, including tokenizing, stemming, removal of
stop words, and deletion of usernames and hashtags [13, 14]. In the first
pre-processing step, tokenization, the text is split into a collection of
meaningful pieces known as tokens; for example, a lengthy passage of text may be
separated into its parts, such as words or phrases. The next stage is to remove
"stop words", words that recur frequently but add little to the meaning of the
text. The document is checked against a list of stop words, and typical words
like "the" and "and" are removed [15].
The next phase in the process is stemming, as described in Fig. 1. Words are
condensed to their roots through stemming [16, 17], which involves eliminating
the inflection (often a suffix) from the word and deleting any extraneous
characters. Removing URLs, hashtags, and usernames is also significant, as
hashtags are commonly employed to designate terms in social media interactions,
thereby facilitating their identification [18, 19]. In this phase, hashtags
(words prefaced by a number sign, #) are removed from the text; any URLs or
usernames found are then removed, and finally, overly long words are truncated.
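A minimal, self-contained sketch of this preprocessing chain; the stop-word list and suffix rules are deliberately tiny stand-ins for a real stop-word lexicon and a Porter-style stemmer:

```python
import re

STOP_WORDS = {"the", "and", "a", "is", "to", "of"}   # tiny illustrative list

def strip_stem(word):
    """Crude suffix stripping, standing in for a real (e.g. Porter) stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, max_len=15):
    # 1. delete URLs, @usernames and #hashtags
    text = re.sub(r"https?://\S+|@\w+|#\w+", " ", text.lower())
    # 2. tokenize into word tokens
    tokens = re.findall(r"[a-z']+", text)
    # 3. drop stop words, truncate overly long tokens, then stem
    return [strip_stem(t[:max_len]) for t in tokens if t not in STOP_WORDS]

print(preprocess("The charger stopped working! @seller http://x.co #fail"))
# -> ['charger', 'stopp', 'work']
```

Note that naive suffix stripping yields non-words like "stopp"; that is acceptable here because stemming only needs to map inflected variants to a common key, not to produce dictionary words.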
4 Methodology
Support Vector Machines (SVM), logistic regression, and Naive Bayes are machine
learning methods that have demonstrated effectiveness in prior investigations
[23–26]. The preliminary inquiry gathered an exhaustive dataset of Amazon
products and reviews. The review material must then be preprocessed to
facilitate comprehension by the trained model, and finally, the effects of the
study's operational framework must be evaluated. The approaches employed in the
current investigation are examined and analyzed in this section. Figure 2 is a
comprehensive depiction of the full methodology.
The dataset used to analyze the sentiment of Amazon product reviews and compare
classifiers was provided by Kaggle, a well-known website containing datasets and
machine learning tools [27–29]. This dataset includes reviews on various products
sold by Amazon, as shown in Fig. 3. Each review includes text and labels indicating
whether the reviewer had a positive, negative, or neutral opinion regarding the product
in question. Because so many product categories and types of reviews are included,
it works wonderfully for sentiment analysis. The terms and conditions of Kaggle
Fig. 3 Count of reviews per sentiment label (Neutral, Negative, Positive)
and any other applicable license or usage agreements must be followed in order
to access and use the dataset [30].
The objective of this study is to analyse data using a machine-learning approach
that integrates both supervised and unsupervised learning techniques. The
proposed methodology for conducting sentiment analysis and classifier
comparison on Amazon product reviews involves the acquisition of a varied dataset
from platforms such as Kaggle. Subsequently, a comprehensive preprocessing stage
is undertaken, which encompasses removing HTML tags, eliminating punctuation
and stop-words, and addressing missing data. Finally, sentiment labels are assigned
to the reviews, and features are extracted using TF-IDF and word embeddings. Three
classification algorithms, namely Support Vector Machine (SVM), Naive Bayes, and
Logistic Regression, are employed and trained using the annotated dataset.
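As a rough illustration of the TF-IDF features mentioned above, here is a minimal pure-Python computation, using the common tf × log(N/df) variant; library implementations such as scikit-learn apply additional smoothing and normalization:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for tokenized documents: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))      # document frequencies
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: (tf[t] / len(d)) * math.log(n / df[t]) for t in tf})
    return out

docs = [["great", "battery", "great"], ["poor", "battery"], ["great", "value"]]
weights = tfidf(docs)
# "battery" occurs in 2 of 3 reviews (low idf); "poor" is rarer (higher idf)
print(round(weights[1]["poor"], 3), round(weights[1]["battery"], 3))
# -> 0.549 0.203
```

The resulting per-review weight vectors are exactly the kind of sparse features the three classifiers above would be trained on.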
The reviews of numerous products on the Amazon platform were subjected to anal-
ysis utilizing three widely employed classifiers: the Support Vector Machine (SVM),
Logistic Regression, and Naive Bayes. The primary goal is to assess the efficacy of
their sentiment analysis methodology. Based on comprehensive testing, the following
findings were obtained concerning accuracy:
The use of support vector machines in this study was deemed significant due
to its ability to establish an optimal decision boundary that effectively maximizes
the separation between distinct sentiment classifications. This process guarantees
that Amazon product reviews are categorized accurately based on emotion. The
study found that the Support Vector Machine (SVM) classifier demonstrated a high
level of accuracy, specifically 90.842%, making it the most successful classifier in
this research. The demonstrated efficacy of opinion mining underscores its critical
importance in obtaining dependable results.
Model comparison plot: accuracy (%) of the three classifiers (axis range 82–92)
6 Conclusion
It’s important to pick the right classifier when mining Amazon reviews for senti-
ment. An analysis of three widely used classifiers—Support Vector Machine (SVM),
Logistic Regression, and Naive Bayes—reveals that each possesses distinct merits
that render it suitable for various tasks. SVM’s accuracy rating of 90.842% qualifies
it as the highest benchmark for applications where precision is crucial. However,
Logistic Regression is a trustworthy approach that finds a happy medium between
power and usability, boasting an impressive 90.615% accuracy. Naive Bayes, with
an accuracy of 85.737%, is a suitable choice when efficiency and simplicity of
implementation are preferred.
The classification efficiency of the SVM and Logistic Regression architectures
could be further improved by utilizing more advanced deep learning models and
representations such as BERT, GloVe, and word2vec embeddings.
References
1. Sangeetha J, Kumaran U (2023) Sentiment analysis of amazon user reviews using a hybrid
approach. Meas Sens 27:100790. https://doi.org/10.1016/j.measen.2023.100790
2. Sultan N (2023) Sentiment analysis of amazon product reviews using supervised machine
learning techniques. Knowl Eng Data Sci 5:101–108. https://doi.org/10.17977/um018v5i1
2022p101-108
3. Harunasir MFB, Palanichamy N, Haw S-C, Ng K-W (2023) Sentiment analysis of amazon
product reviews by supervised machine learning models. J Adv Inf Technol 14(4):857–862
4. Tang F, Fu L, Yao B, Xu W (2019) Aspect based fine-grained sentiment analysis for
online reviews. Inf Sci 488:190–204, ISSN 0020-0255. https://doi.org/10.1016/j.ins.
2019.02.064
5. Jagdale, Rajkumar S, Vishal S Shirsat, Sachin N Deshmukh (2018) Sentiment analysis on
product reviews using machine learning techniques. Cogn Inform Soft Comput
6. Riaz, Sumbal, Mehvish Fatima, Muhammad Kamran and Muhammad Wasif Nisar (2019)
Opinion mining on large scale data using sentiment analysis and k-means clustering. Clust
Comput 22: 7149–7164
7. Haddi E, Liu X, Shi Y (2013) The role of text pre-processing in sentiment analysis.
Procedia Comput Sci 17:26–32, ISSN 1877-0509. https://doi.org/10.1016/j.procs.2013.
05.005
8. Singla Z, Randhawa S, Jain S (2017) Statistical and sentiment analysis of consumer product
reviews. In: 2017 8th International Conference on Computing, Communication and Networking
Technologies (ICCCNT), Delhi, India, pp 1–6, https://doi.org/10.1109/ICCCNT.2017.8203960
9. Jian-Wu Bi, Yang Liu, Zhi-Ping Fan (2019) Representing sentiment analysis results of online
reviews using interval type-2 fuzzy numbers and its application to product ranking. Inf Sci,
504, pp 293–307, ISSN 0020–0255, https://doi.org/10.1016/j.ins.2019.07.025
10. Choudhary M, Choudhary PK (2018) Sentiment analysis of text reviewing algorithm using data
mining, In International Conference on Smart Systems and Inventive Technology (ICSSIT),
pp 532–538
11. Guo Chonghui, Zhonglian Du, Kou Xinyue (2018) Products ranking through aspect-based
sentiment analysis of online heterogeneous reviews. J Syst Sci Syst Eng. 27(5):542–58
12. Sayyida Tabinda Kokab, Sohail Asghar, Shehneela Naz (2022) Transformer-based deep
learning models for the sentiment analysis of social media data, Array. 14, 100157, ISSN
2590–0056, https://doi.org/10.1016/j.array.2022.100157
13. Sajib Dasgupta and Vincent Ng (2009) Topic-wise, sentiment-wise, or otherwise?: Identi-
fying the hidden dimension for unsupervised text classification. In Proceedings of the 2009
Conference on Empirical Methods in Natural Language Processing: Volume 2-Volume 2,
pages 580–589. Association for Computational Linguistics
14. Cunningham P, Cord M, Delany SJ (2008) Supervised learning. In: Machine learning
techniques for multimedia. Springer, pp 21–49
15. Minqing Hu and Bing Liu (2004) Mining and summarizing customer reviews. In: Proceedings
of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pages 168–177. ACM
16. Thorsten Joachims (1998) Text categorization with support vector machines: Learning with
many relevant features. In: the European Conference on Machine Learning, pages 137–142.
Springer
17. Khairnar J, Kinikar M (2013) Machine learning algorithms for opinion mining and sentiment
classification. Int J Sci Res Publ 3(6):1–6
18. Ye Q, Zhang Z, Law R (2009) Sentiment classification of online reviews to travel destinations
by supervised machine learning approaches. Expert Syst Appl 36(3):6527–6535
19. Bing Liu (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol,
5(1):1–167, 20
20. Bingwei Liu, Erik Blasch, Yu Chen, Dan Shen, Genshe Chen (2013) Scalable sentiment classi-
fication for big data analysis using naive bayes classifier. In: Big Data, 2013 IEEE International
Conference on, pages 99–104. IEEE
21. Jingjing Liu, Yunbo Cao, Chin-Yew Lin, Yalou Huang, and Ming Zhou (2007) Low-quality
product review detection in opinion summarization. In: Proceedings of the 2007 Joint Confer-
ence on Empirical Methods in Natural Language Processing and Computational Natural
Language Learning (EMNLP-CoNLL)
22. Priyank Pandey, Manoj Kumar, Prakhar Srivastava (2016) Classification techniques for big
data: A survey. In: Computing for Sustainable Global Development (INDIACom), 2016 3rd
International Conference on, pages 3625–3629. IEEE
23. Bo Pang, Lillian Lee, Shivakumar Vaithyanathan (2002) Thumbs up?: sentiment classification
using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical
Methods in Natural Language Processing, 10, pages 79–86. Association for Computational
Linguistics
24. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Pretten-
hofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M,
Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
25. Irina Rish (2001) An empirical study of the naive bayes classifier. In: IJCAI 2001 Workshop
on Empirical Methods in Artificial Intelligence, 3, pages 41–46. IBM
26. Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede (2011) Lexicon-
based methods for sentiment analysis. Comput Linguist, 37(2):267–307, 21
27. Ni J, Li J, McAuley J (2019) Justifying recommendations using distantly labeled
reviews and fine-grained aspects. In: Proceedings of EMNLP-IJCNLP 2019, pp 188–197.
https://doi.org/10.18653/v1/D19-1018
28. P. Chaovalit, Zhou L (2005) Movie review mining: a comparison between supervised and unsu-
pervised classification approaches. In: Proceedings of the 38th Annual Hawaii International
Conference on System Sciences, Big Island, HI, USA, pp 112c-112c, https://doi.org/10.1109/
HICSS.2005.445
29. Cristianini N, Shawe-Taylor J (2000) An Introduction to Support Vector Machines and Other
Kernel-based Learning Methods. Cambridge University Press, Cambridge. https://doi.org/10.
1017/CBO9780511801389
30. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other
kernel-based learning methods. Cambridge University Press, Cambridge
Alzheimer’s Disease Diagnosis Using
Machine Learning and Deep Learning
Techniques
Abstract A major global problem is dementia, a disorder that causes the loss of
cognitive abilities. Effective therapy and management depend on early detection.
By examining multiple sorts of data, including brain scans, speech, and gait, deep
learning (DL) and machine learning (ML) algorithms have shown promising results
in detecting dementia. Using DL and ML approaches, this survey study thoroughly
summarizes recent improvements and discoveries in the dementia diagnosis sector.
The most recent methods for dementia detection, including supervised, unsupervised,
and reinforcement learning approaches, are reviewed in this work. This paper’s main
goal is to explore various traditional machine learning approaches widely applied
for identifying and forecasting Alzheimer’s disease (AD) using MRI and linguistic
datasets. The merits and demerits of the various data formats employed in DL and
ML models for dementia detection are also covered. There is also a discussion of
the difficulties of applying DL and ML methods for dementia detection, such as
data imbalance, interpretability, and generalization. Using linguistic data, including
speech and text, has become increasingly popular lately to aid in detecting dementia.
An exhaustive review of the existing research on various linguistic markers used
to diagnose dementia is provided. The paper concludes by discussing the future
directions of DL and ML techniques in dementia detection and their potential impact
on early detection and treatment. With treatment, early detection of dementia can
enhance the patient’s quality of life, and DL/ML techniques can aid in identifying
the disease. The paper aims to provide researchers and practitioners in dementia
detection with a comprehensive comprehension of the present cutting-edge problems
and opportunities connected with DL and ML approaches.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 303
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_24
304 M. Karnik et al.
1 Introduction
2 Literature Survey
The detection of AD using ML, DL, and other innovative methods has garnered signif-
icant attention among researchers. To conduct our study, we extensively reviewed the
works of various authors who have made significant contributions to this field. This
literature survey is organized into three distinct sections, each focusing on different
aspects of AD detection. The first section comprises surveys conducted by authors
who utilized machine-learning techniques in their research. The second section
machine learning classifiers to identify AD accurately. The test results show that
logistic regression performed exceptionally well in accuracy. For this experiment,
they used the Alzheimer’s Disease Neuroimaging Initiative dataset. Zhen Zhao et al.
[18] studied widely used conventional machine learning methods for classifying and
predicting AD using MRI. Some of the methods explored include the support vector
machine (SVM), random forest (RF), convolutional neural network (CNN), autoen-
coder, deep learning, and transformer. Additionally, they review widely used feature
extractors and several convolutional neural network input formats. Input type selec-
tion, deep learning, traditional machine learning approaches, innovative approaches,
and trade-offs related to these topics are also covered.
Nori et al. [21] proposed that machine learning systems using administrative
claims data may predict Alzheimer's and related diseases. A 50-variable model
exhibiting an AUC of 0.693 was selected using the Lasso technique from a
national de-identified dataset. The claims-based model outperformed previous
models derived directly from clinical data. The top predictive features included
neurological testing, neurological diseases, signs and symptoms, altered mental
status, and psychosis diagnoses. Castellazzi et al. [25] studied the
potential of ML algorithms, combined with advanced MRI features, to enhance the
diagnosis of Alzheimer's and vascular dementia. Three algorithms were used:
support vector machines (SVM), adaptive neuro-fuzzy inference systems (ANFIS),
and artificial neural networks (ANN). ANFIS was found to be the most efficient
algorithm in discriminating between AD and vascular dementia. Zhu et al. [28]
used three different feature selection methods to build and validate a machine
learning-driven method for detecting initial normal, VMD, and MCI cases. Naive
Bayes was the best-performing classification model. Kim et al. [22]
used linear discriminant analysis and principal component analysis (PCA) to build classifiers from cortical thickness data. The classifiers were cognitively normal vs. frontotemporal dementia (FTD) + AD, FTD vs. AD, behavioral variant FTD vs. primary progressive aphasia, and semantic variant primary progressive aphasia vs. nonfluent/agrammatic variant primary progressive aphasia, and classification performance was assessed using tenfold cross-validation. The automatic classifier and the hierarchical classification tree correctly classified FTD clinical subtypes with good to excellent accuracy. The paper provides an overview of machine learning methods in health informatics for dementia care. Alroobaea et al. [38] mention that the OASIS dataset has been utilized to support ML models. The naïve Bayes classifier gave the lowest accuracy, while the k-nearest neighbors classifier achieved better performance than both the random forest and the decision tree.
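The tenfold cross-validation protocol used to assess several of the classifiers above can be sketched as follows, with a simple k-nearest-neighbors model on synthetic two-class features standing in for the actual cortical-thickness data (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cortical-thickness-like features for two classes (illustrative only).
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(1.5, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

def knn_predict(X_train, y_train, X_test, k=5):
    # Squared Euclidean distances from each test point to all training points.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Majority vote among the k nearest training labels.
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Tenfold cross-validation: each fold serves once as the held-out test set.
idx = rng.permutation(len(y))
folds = np.array_split(idx, 10)
accs = []
for i in range(10):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != i])
    pred = knn_predict(X[train_idx], y[train_idx], X[test_idx])
    accs.append(np.mean(pred == y[test_idx]))

print(f"mean 10-fold accuracy: {np.mean(accs):.2f}")
```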
B. Deep Learning Approaches
Puente-Castro et al. [19] automatically predicted the presence of AD in sagittal magnetic resonance images (MRI) using DL techniques. An SVM classifier with a ResNet feature extractor was employed. The two primary findings of this study were that sagittal MRI can discriminate between AD-related damage and its phases, and that DL models applied to sagittal MRI produced results comparable to those of state-of-the-art models employed on horizontal-plane MRI. The authors Bi Xiaojun et al. [37] conducted electroencephalogram (EEG) spectral image classification directly by using a label layer, which resulted in a discriminative version of the Contractive Slab and Spike Convolutional Deep Boltzmann Machine (CssCDBM). Because the
designed model bridges the connection between feature extraction and classification,
compared to other generative models, it produces superior outcomes. An increase in inter-subject variation and a reduction in intra-subject variation are observed, both of which are important for the early diagnosis of AD. Taeho Jo et al. [30] employed a
model that combines a stacked auto-encoder (SAE) for feature selection with conven-
tional machine learning for classification. This led to an increase in accuracy rates
for AD classification and the prediction of progression from the prodromal stage of AD, mild cognitive impairment (MCI). The combination of fluid biomarkers
and multimodal neuroimaging produced the best categorization results. Developing
2D CNN into 3D CNN is crucial when working with multimodal neuroimages,
particularly in AD research.
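The autoencoder-based feature reduction idea behind such pipelines can be illustrated with a one-hidden-layer linear autoencoder trained by gradient descent; the compressed code Z would then feed a conventional classifier. The data, layer sizes, and learning rate below are illustrative assumptions, not the authors' stacked auto-encoder:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative high-dimensional "neuroimaging" vectors with low-rank structure.
n, d, h = 300, 20, 3
latent = rng.normal(size=(n, h))
mix = rng.normal(size=(h, d))
X = latent @ mix + 0.05 * rng.normal(size=(n, d))

# One-hidden-layer linear autoencoder: encode to h dims, decode back to d.
W_enc = rng.normal(scale=0.1, size=(d, h))
W_dec = rng.normal(scale=0.1, size=(h, d))
lr = 0.01
for _ in range(2000):
    Z = X @ W_enc            # compressed features (would feed a classifier)
    X_hat = Z @ W_dec        # reconstruction of the input
    err = X_hat - X
    W_dec -= lr * Z.T @ err / n
    W_enc -= lr * X.T @ (err @ W_dec.T) / n

mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
var = np.mean(X ** 2)
print(f"reconstruction MSE / data variance: {mse / var:.3f}")
```

A low ratio indicates the 3-dimensional code retains most of the information in the 20-dimensional input, which is the property the classification stage relies on.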
Gao and Lima [1] introduced AD-related biomarkers, feature extraction approaches, preprocessing methods, and deep models for AD diagnosis. Among classification techniques, CNN is the most frequently utilized and performs better than other deep models in this field. The overfitting caused by small datasets still needs to be resolved, although unsupervised and self-supervised learning have advanced medical imaging research despite the scarcity of medical data. An AD detection model employing
convolutional neural networks (CNN) has been developed using typical MRI images
as inputs. Transfer Learning (TL), with information recorded from diverse datasets,
is used to improve the finetuning of hyperparameters, thus increasing detection accu-
racy. It is thought that utilizing a machine learning model to identify AD automatically
will reduce the strain on medical professionals and improve the precision of medical
conclusions. The Generative Adversarial Network-Convolutional Neural Network-Transfer Learning (GAN-CNN-TL) technique proposed in this research by Chui et al. [2] offers the benefits of increased data generation, a less biased detection model, automated feature extraction, and improved hyperparameter tuning.
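The transfer-learning idea recurring in these works (freeze pretrained layers, fine-tune only a small head on the target data) can be sketched in miniature. Here a fixed random projection stands in for pretrained convolutional layers; everything below is an illustrative assumption, not the GAN-CNN-TL implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend "pretrained" feature extractor: a fixed random projection + ReLU.
# In real transfer learning these would be convolutional layers trained on a
# large source dataset; here they are frozen and only the head is trained.
d_in, d_feat = 10, 16
W_frozen = rng.normal(size=(d_in, d_feat))

def features(X):
    return np.maximum(X @ W_frozen, 0.0)  # frozen layers: never updated

# Small "target" dataset (illustrative stand-in for MRI-derived inputs).
X = rng.normal(size=(120, d_in))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Fine-tune only the classification head (logistic regression on features).
F = features(X)
w = np.zeros(d_feat)
b = 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
    w -= 0.05 * F.T @ (p - y) / len(y)
    b -= 0.05 * np.mean(p - y)

acc = np.mean(((F @ w + b) > 0) == y.astype(bool))
print(f"head-only training accuracy: {acc:.2f}")
```

Training only the head needs far less target data than training the whole network, which is why the surveyed papers use it when medical images are scarce.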
The absence of image patterns in the data can lead to overfitting and reduce
the performance of deep learning models. Versatile learning techniques like deep
learning were developed to address this issue. Orouskhani et al. [3] analyzed brain MRI data and identified AD using a deep triplet network (DTN) with deep metric learning. Because the available data are insufficient, the proposed deep triplet network adds a conditional loss function to increase the model's accuracy; the conditional triplet (CT) loss used in the model incorporates the best and worst triplets. The model addresses a four-class classification problem to identify and diagnose Alzheimer's disease from brain MRI. Ghazal et al. [4] proposed a method for diagnosing Alzheimer's disease using adaptive learning for multiple classes from magnetic resonance imaging (MRI) of the brain. The proposed AD detection approach empowered with the
Transfer Learning model is fast and can process small images without manual tools.
Numerous datasets, such as ADNI, are available to work with to achieve performance that is equivalent to or better than existing approaches. Finally, multiclass Alzheimer's disease may be detected using the Clinical Dementia
308 M. Karnik et al.
Rating (CDR) and compared based on the efficacy of deep features and handcrafted features in detecting AD stages using different classifiers. DenseNet achieved a spectacular classification accuracy for all three classes on augmented images, and a spiking neural network (SNN) also achieved impressive classification accuracy. According to the
authors Abida Ashraf et al. [32], their work evaluated 13 deep NN architectures
such as Spiking neural networks, DenseNet, MobileNet, Squeezenet, ResNet, VGG,
GoogLeNet, and others using various kinds of input samples to enhance prediction
and categorization rate.
DenseNet achieved the highest classification accuracy for all three classes on augmented images. Naz et al. [33] compared their approach with the most recent methods. Among the suggested networks, VGG19-SVM (Visual Geometry Group) with fc6-layer features was the most effective for comparing the MCI and AD classes; similarly, VGG16-SVM with frozen fc6 layers was used for the CN vs. AD classes and for CN vs. MCI. According to the study's authors, Raghavendra
Pappagari et al. [34] investigated the application of acoustic and linguistic techniques for automatic AD detection and Mini-Mental State Examination (MMSE) score prediction in a low-resource setting. Among acoustic approaches, the x-vector model and encoder-decoder automatic speech recognition embeddings provided the best results, while among linguistic approaches, Bidirectional Encoder Representations from Transformers (BERT) fine-tuned on automatic transcriptions from a commercial Automatic Speech Recognition (ASR) system performed best. They suggested evaluating several iterations of language model (LM) interpolation and multimodal classifiers when comparing deep feature models and handcrafted feature extraction models.
The authors Emre Altinkaya et al. [36] used Deep Neural Network (DNN), Deep Boltzmann Machine (DBM), Convolutional Neural Network (CNN), and Deep Autoencoder (DA) models for the diagnosis of Alzheimer's disease and dementia. Feature extraction was accomplished through the networks' hidden layers. The extracted features accurately represented AD-related structural measures such as ventricular size, hippocampus shape, cortical thickness, and brain volume.
According to Amir Ebrahimi et al. [39], a Temporal Convolutional Network (TCN) model was used to capture the relationships between the feature vectors extracted from an MRI scan sequence. Padding the feature vectors with zeros and using four and five residual blocks led to increased AD detection accuracy.
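The building block behind such TCNs is the dilated causal convolution with left zero-padding, so that the output at time t depends only on the current and past inputs. A minimal sketch, with toy kernels rather than learned ones:

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1-D convolution: output at time t sees only x[<= t].

    Zero-padding on the left, as in TCN residual blocks."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Identity kernel [1, 0]: the causal conv reproduces the input unchanged.
x = np.arange(5, dtype=float)
out = causal_conv1d(x, np.array([1.0, 0.0]), dilation=2)
print(out)          # [0. 1. 2. 3. 4.]

# Difference kernel [1, -1] with dilation 1: first-order temporal change,
# with the zero-padding supplying the missing value at t = 0.
diff = causal_conv1d(x, np.array([1.0, -1.0]), dilation=1)
print(diff)         # [0. 1. 1. 1. 1.]
```

Stacking such layers with growing dilation gives the long receptive field over the scan sequence that the residual blocks exploit.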
Furthermore, a TCN with four residual blocks outperformed the other models tested. Voxel pre-selection techniques may be used to handle the high feature dimensionality. According to Liu et al. [29], the CNN model built in
this study outperforms other models. To optimize the model, Depthwise Separable
Convolutions (DSCs) are employed to replace three ordinary convolutions. The use
of pre-trained models improves transfer learning. This research provides a unique
DSC network-based procedure for diagnosing AD that decreases model parameters
and computing costs while retaining classification accuracy. Tests using the OASIS
MRI dataset show prospects for AD detection. Future research should try to integrate
DSC with AlexNet or GoogleNet to improve accuracy. By contrasting multimodal
Computer-Aided Diagnosis (CAD) systems’ performance with that of systems that
only use one MRI modality, Lazli et al. [27] presented a study of the multimodal CAD
systems’ performance based on quantitative measurement parameters. In this specific
instance, improvements in information fusion techniques in medical images are high-
lighted, emphasizing both the benefits and drawbacks. Finally, the key discoveries
made in brain illness evaluation, the utility of hybrid designs, and the benefits of
combining different modes have been addressed.
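The parameter saving from the depthwise separable convolutions used by Liu et al. is easy to verify by counting weights for a standard k x k convolution versus a depthwise-plus-pointwise pair; the channel sizes below are made up for illustration:

```python
# Parameter-count comparison behind the DSC optimization described above:
# a standard k x k convolution with c_in inputs and c_out outputs versus a
# depthwise k x k convolution followed by a 1 x 1 pointwise convolution.

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in          # one k x k filter per input channel
    pointwise = c_in * c_out          # 1 x 1 conv mixes the channels
    return depthwise + pointwise

k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
dsc = depthwise_separable_params(k, c_in, c_out)
print(std, dsc, f"reduction: {std / dsc:.1f}x")   # 73728 8768 reduction: 8.4x
```

This roughly k*k-fold reduction is what lets the DSC-based model retain accuracy while cutting parameters and computing costs.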
C. Linguistic Approaches
The authors Aparna Balagopalan et al. [7] evaluated two popular approaches: explicit feature engineering, which utilizes domain knowledge, and transfer learning using fine-tuned BERT classification models. In interactive conversation, referential communication tasks (RCTs) are employed to evaluate a speaker's capacity to choose and verbally encode an object's features. The study by Liu et al. [15] analyzed manually transcribed voice transcripts from 28 older adults using contextualized word representations from NLP and machine learning approaches. The findings suggest that RCTs may be useful as a recognition tool for AD and can be applied to small samples without reducing diagnostic accuracy. According to [35], speech
fluency is a crucial characteristic for AD identification, and two approaches are
suggested. The first approach is a paralinguistic system with low-dimension feature
vectors, making training easier irrespective of the language and goal. The second
approach is built on analyzing spoken word mistakes and temporal patterns of silence.
The i-vector framework, a speaker modeling approach used for speaker recognition, language identification, speaker diarization, and speech-related health tasks, and a traditional term-counting algorithm, which processes speech transcriptions, are used to compare the performance of these two approaches. In [23], various classifiers are used
to compare a large number of linguistic and audio data for the detection of AD.
The authors used a variety of classifiers to explore numerous speech and linguistic
factors that helped identify AD. The features the authors looked at were extracted
from speech, manual, or ASR transcripts, such as the X-vectors, ComParE set, the
TF-IDF, Linguistics set, and BERT embeddings. They found that classification performance can benefit from applying PCA and Pearson's correlation tests. They also found that verbs, mean lengths of utterance, and type/token ratios are the linguistic characteristics that most closely correlate with the presence or absence of AD, while the most relevant acoustic characteristics include segment length, the RASTA-style (Relative Spectral filtering-style) filtered auditory spectrum, and the zero-crossing rate.
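Several of the transcript features above, such as TF-IDF, are straightforward to compute from scratch. A toy sketch on invented transcripts (not the actual study data):

```python
import math
from collections import Counter

# Toy transcripts standing in for manually or ASR-transcribed speech.
docs = [
    "the boy is taking a cookie from the jar",
    "the the um the boy um cookie",
    "a woman is washing dishes at the sink",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(docs)

# Document frequency: in how many transcripts each word appears.
df = {w: sum(w in doc for doc in tokenized) for w in vocab}

def tfidf(doc):
    counts = Counter(doc)
    return {w: (counts[w] / len(doc)) * math.log(n_docs / df[w])
            for w in counts}

weights = tfidf(tokenized[1])
# Filler words frequent in one transcript but absent elsewhere ("um") get
# high weight; words shared by all transcripts ("the") get zero weight.
print(weights["the"], round(weights["um"], 3))
```

Vectors of such weights, concatenated with acoustic measures like the zero-crossing rate, are what the classifiers in these studies consume.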
3 Datasets
During our survey, we found that numerous authors in the field of AD detection
utilized diverse datasets in their studies. While several datasets were employed, the
ADNI (Alzheimer’s Disease Neuroimaging Initiative) and OASIS (Open Access
Series of Imaging Studies) datasets emerged as the most commonly used data sources.
The ADNI dataset is a collaborative effort involving multiple institutions. It provides
a rich collection of clinical, neuroimaging, and genetic data from individuals with
AD, mild cognitive impairment (MCI), and healthy controls. On the other hand,
the OASIS dataset comprises MRI scans, demographics, and clinical evaluations
of individuals diagnosed with AD and normal controls. Both datasets have been
extensively utilized in AD research and have contributed significantly to advancing
our understanding of the disease. In the following sections, we provide detailed
information about the ADNI and OASIS datasets, including their characteristics,
sample sizes, and relevant features, as these datasets have played a prominent role
in the literature on AD detection.
A. ADNI Dataset
The Alzheimer's Disease Neuroimaging Initiative (ADNI) database collects clinical, genetic, and brain imaging data on healthy adults, individuals with MCI, and patients with Alzheimer's disease. The ADNI project seeks to develop biomarkers for the early detection and monitoring of AD progression. It enrolled more than 1,500 participants from over 50 sites in the US and Canada and included their data in the ADNI dataset. The dataset consists of
genetic information, results from cognitive tests, and data from MRI and positron
emission tomography (PET) brain scans. Scientists often use this information to
develop new methods for early detection and monitoring of Alzheimer’s disease and
to investigate the causes of the disease. The ADNI dataset is publicly available and
accessible through the ADNI website. It is useful for academic research on AD and its
associated disorders. It is a potent tool for creating novel biomarkers, understanding disease processes, and enhancing patient outcomes due to the variety of data types and the size of the sample.
B. OASIS Dataset
The Open Access Series of Imaging Studies (OASIS) dataset is another one that is
frequently used in studies on Alzheimer’s disease. It contains clinical information
and brain imaging from elderly patients with Alzheimer’s disease, MCI, and healthy
aging. MRI images, results from cognitive tests, and other clinical assessments are
all included in the collection. Numerous other datasets, each with unique benefits and drawbacks, are available for Alzheimer's research in addition to the ADNI database. The research objectives and available resources must be carefully considered when selecting an appropriate dataset for a specific study.
The OASIS dataset has been employed for a broad range of research purposes,
including investigating alterations in the brain’s function and structure in AD and
advancing novel techniques for evaluating brain imaging data. It has also been used
to create models that anticipate how a disease will advance and how well a therapy
will work. Table 2 serves as a comprehensive summary of research undertaken in
Machine Learning. Table 3 provides the overall examination of studies conducted
in linguistics. Lastly, Table 4 has insights into research efforts within the domain of
Deep Learning. These tables detail the authors, methodologies employed, datasets
utilized, and the respective accuracy achieved in the referenced works shown in
Table 1.
Table 1 Details of the OASIS and ADNI datasets, average accuracy, and models implemented by different authors

Model name | Used by author | Accuracy (average) | Dataset
ResNet | [1] | 98% | ADNI, OASIS
CNN | [1, 2, 4, 6, 8, 10, 36] | 95% | ADNI
SVM | [31, 33] | 97% | ADNI
DNN | [9, 36] | 90% | ADNI
KNN | [31, 36] | 97% | ADNI
ANN | [27, 32] | 92% | OASIS, ADNI
AlexNet | [6] | 95% | OASIS
Autoencoder | [13] | 94% | ADNI
Random Forest | [14, 31] | 97% | CP13
Transfer learning | [2, 4] | 92% | OASIS
RNN | [39] | 93% | ImageNet dataset
YOLOv3 | [14] | 97% | CP13
TQWT | [7] | 96% | SBS
BERT | [34] | 84% | ADReSSo
Table 2 (continued)

Author | Method | Metrics used | Dataset
Javier Mar et al. [24] | Random forest | ROC for psychotic cluster model (0.80) and for depressive cluster model (0.74) | Basque Health Service's institutional database
Gloria Castellazzi, Maria Giovanna Cuzzoni et al. [25] | Adaptive neuro-fuzzy inference system (ANFIS) | Discriminating AD from VD (>84%), prediction rate of 77.33% | DTI + fMRI GT
Fubao Zhu et al. [28] | Information Gain, Random Forest, Naive Bayes | Naive Bayes performed the best (accuracy = 0.81, precision = 0.82, recall = 0.81, and F1-score = 0.81) | Chwan Health System dataset
Roobaea Alroobaea et al. [38] | Logistic regression and SVM | ML model accuracies of 99.43% and 99.10%; ADNI 84.33% and OASIS 83.92% | ADNI dataset and OASIS
Muhammad Shahbaz et al. [20] | KNN, decision tree (DT), rule induction, Naive Bayes, generalized linear model (GLM), and DL algorithm | Accuracy, precision, recall, confusion matrix; 79–99% | ADNI database
Jianping Li et al. [11] | Logistic regression, decision tree, and SVM | Accuracy, specificity, sensitivity; 98.12% | Neuroimaging Initiative data set
4 Experimental Results
Figure 1 visually depicts the project flow, with each block symbolizing a pivotal
stage or component in the overall process.
We conducted a comparative study involving machine learning (ML) and deep
learning (DL) algorithms on two distinct datasets—ADNI and OASIS. The algo-
rithms under consideration include Random Forest, Logistic Regression, VGG16,
VGG19, and InceptionV3. The results of this comparison are presented in Table 5,
along with their accuracy.
Figure 2 shows the performance of various machine learning and deep learning
algorithms on two distinct datasets, ADNI and OASIS. In machine learning, Random
Forest demonstrated an accuracy of 81% on ADNI and 91% on OASIS. Logistic
regression, on the other hand, exhibited a higher accuracy with 85% on ADNI and
[Fig. 2 is a bar chart of accuracy percentages (y-axis 70–100%) for Random Forest, Logistic Regression, VGG16, VGG19, and InceptionV3 on the ADNI and OASIS datasets; plotted values include 95.21%, 93.18%, 91%, 90.33%, 86.96%, 85%, 82.52%, 82.38%, 82.17%, and 81%.]
Fig. 2 Experimental results
5 Conclusion
References
1. Gaoa S, Lima D (2021) A review of the application of deep learning in the detection of
Alzheimer’s disease. Int J Cogn Comput Eng. https://doi.org/10.1016/j.ijcce.2021.12.002
2. Kwok Tai Chui , Brij B. Gupta , Wadee Alhalabi and Fatma Salih Alzahrani (2022) An
MRI Scans-Based Alzheimer’s disease detection via convolutional neural network and transfer
learning. Diagnostics 12(7): 1531, https://doi.org/10.3390/diagnostics12071531
3. Maysam Orouskhani, Chengcheng Zhu , Sahar Rostamian , Firoozeh Shomal Zadeh , Mehrzad
Shafiei , Yasin Orouskhani (2022) Alzheimer’s disease detection from structural MRI using
conditional deep triplet network. Neuroscience Informatics SAS, 2, https://doi.org/10.1016/j.
neuri.2022.100066
4. Taher M Ghazal, Sagheer Abbas, Sundus Munir, Khan MA, Munir Ahmad, Ghassan F. Issa,
Syeda Binish Zahra, Muhammad Adnan Khan, Mohammad Kamrul Hasan (2021) Alzheimer
disease detection empowered with transfer learning, computers. Mater & Contin, 70(3), https://
doi.org/10.32604/cmc.2022.020866
5. El-Sappagh S, Saleh H, Ali F, Amer E, Abuhmed T (2022) Two-stage deep learning model for
Alzheimer’s disease detection and prediction of the mild cognitive impairment time. Neural
Comput Appl. https://doi.org/10.1007/s00521-022-07263-9
6. Sathish Kumar L, Hariharasitaraman S, Kanagaraj Narayanasamy, Thinakaran K, Mahalakshmi
J, Pandimurugan V (2022) AlexNet approach for early stage Alzheimer’s disease detection
from MRI brain images. Materials Today: Proceedings, p 58–65, https://doi.org/10.1016/j.
matpr.2021.04.415
7. Aparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova (2020) To BERT or
Not To BERT: Comparing Speech and Language-based Approaches for Alzheimers Disease
Detection. INTERSPEECH 2020, https://doi.org/10.48550/arXiv.2008.01551
8. Golrokh Mirzaei Hojjat Adeli (2022) Machine learning techniques for diagnosis of alzheimer
disease, mild cognitive disorder, and other types of dementia. Biomed Signal Process Control,
72, https://doi.org/10.1016/j.bspc.2021.103293
9. Lee E, Choi J-S, Kim M, Suk H-I (2019) The Alzheimer’s disease neuroimaging initiative,
toward an interpretable Alzheimer’s disease diagnostic model with regional abnormality repre-
sentation via deep learning. Elsevier NeuroImage 202:116113. https://doi.org/10.1016/j.neuroi
mage.2019.116113
10. Serkan Savas (2022) Detecting the stages of alzheimer’s disease with pre-trained deep learning
architectures. Arab J Sci Eng, https://doi.org/10.1007/s13369-021-06131-3
11. Muhammad Hammad Memon, Jianping Li, Amin Ul Haq And Muhammad Hunain Memon
(2020) Early Stage Alzheimer’s Disease Diagnosis Method. IEEE Explore
12. Kruthika KR, Rajeswari, Maheshappa HD (2019) Multistage classifier-based approach for
Alzheimer’s disease prediction and retrieval. Inform Med Unlocked, 14, pp 34–42 , https://doi.
org/10.1016/j.imu.2018.12.003
13. Haibing Guo and Yongjin Zhang (2020) Resting State fMRI and Improved Deep Learning
Algorithm for Earlier Detection of Alzheimer’s Disease. IEEE Acce. Special Section On Deep
Learning Algorithms For Internet Of Medical Things, 8
14. Koga S, Ikeda A, Dickson DW (2021) Deep learning-based model for diagnosing Alzheimer’s
disease and tauopathies. Neuropathol Appl Neurobiol. https://doi.org/10.1111/nan.12759
15. Mr. Ziming Liu, Dr. Eun Jin Paek, Dr. Si On Yoon, Dr. Devin Casenhiser, Dr. Wenjun Zhou and
Dr. Xiaopeng Zhao (2022) Detecting Alzheimer’s disease using natural language processing of
referential communication task transcripts (Referential Communication in AD). J Alzheimer’s
Dis, 86, https://doi.org/10.3233/jad-215137
16. Reem Bin-Hezam, Tomas E. Ward (2019) A machine learning approach towards detecting
dementia based on its modifiable risk factors. Int J Adv Comput Sci Appl, 10
17. Pai-Yi Chiu, Haipeng Tang, Cheng-Yu Wei, Chaoyang Zhang, Guang-Uei Hung, Weihua
Zhou (2019) A new machine-learning derived screening instrument to detect mild cognitive
impairment and dementia. PLOS ONE
318 M. Karnik et al.
18. Zhen Zhao, Joon Huang Chuah, Khin Wee Lai, Chee-Onn Chow, Munkhjargal Gochoo, Sami-
appan Dhanalakshmi, Na Wang, Wei Bao and Xiang Wu (2023) Conventional machine learning
and deep learning in Alzheimer’s disease diagnosis using neuroimaging: A review. Front
Comput Neurosci
19. Alejandro Puente-Castro, Enrique Fernandez-Blanco, Alejandro Pazos, Cristian R. Munteanu (2020) Automatic assessment of Alzheimer's disease diagnosis based on deep learning techniques. Elsevier, Comput Biol Med 120
20. Muhammad Shahbaz, Shahzad Ali, Aziz Guergachi, Aneeta Niazi and Amina Umer (2019)
Classification of Alzheimer’s disease using machine learning technique. In Proceedings of the
8th International Conference on Data Science, Technology and Applications (DATA 2019),
pages 296–303, https://doi.org/10.5220/0007949902960303
21. Vijay S Nori, Christopher A Hane, David C Martin, Alexander D Kravetz, Darshak M Sanghavi (2019) Identifying incident dementia by applying machine learning to a very large
administrative claims dataset. PLOS ONE
22. Jun Pyo Kim, Jeonghun Kim, Yu Hyun Park, Seong Beom Park, Jin San Lee, Sole Yoo, Eun-Joo Kim, Hee Jin Kim, Duk L Na, Jesse A Brown, Samuel N Lockhart, Sang Won Seo, Joon-Kyung Seong (2019) Machine learning based hierarchical classification of frontotemporal dementia and Alzheimer's disease. NeuroImage: Clinical. https://doi.org/10.1016/j.nicl.2019.101811
23. Jinchao Li, Jianwei Yu, Zi Ye, Simon Wong, Manwai Mak, Brian Mak, Xunying Liu, Helen
Meng (2021) A comparative study of acoustic and linguistic features classification For
Alzheimer's disease detection. IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), https://doi.org/10.1109/ICASSP39728.2021.9414147
24. Mar J, Gorostizaa A, Cernudae OIC, Arrospidea A, Iruinc A, Larranagaa I, Taintab M, Ezpeletae
E, Alberdie A (2019) Validation of random forest machine learning models to predict Dementia-
Related neuropsychiatric symptoms in Real-World data. J Alzheimers Dis. https://doi.org/10.
3233/JAD-200345
25. Gloria Castellazzi, Maria Giovanna Cuzzoni, Matteo Cotta Ramusino, Daniele Martinelli,
Federica Denaro, Antonio Ricciardi, Paolo Vitali, Nicoletta Anzalone, Sara Bernini, Fulvia
Palesi, Elena Sinforiani, Alfredo Costa Giuseppe Micieli, Egidio D’Angelo,Giovanni Magenes
and Claudia A. M. Gandini Wheeler-Kingshott (2020) A machine learning approach for the
differential diagnosis of Alzheimer and Vascular dementia Fed by MRI selected features.
Frontiers in Neuroinformatics
26. Jabason E, Ahmad MO, Swamy MNS (2019) Classification of Alzheimer’s disease from MRI
data using an ensemble of hybrid deep convolutional neural networks. IEEE 62nd International
Midwest Symposium on Circuits and Systems (MWSCAS), Dallas, TX. USA. https://doi.org/
10.1109/MWSCAS.2019.8884939
27. Lazli L, Boukadoum M, Mohamed OA (2020) A survey on computer-aided diagnosis of brain
disorders through MRI based on machine learning and data mining methodologies with an
emphasis on Alzheimer disease diagnosis and the contribution of the multimodal fusion. Appl
Sci. https://doi.org/10.3390/app10051894
28. Fubao Zhu, Xiaonan Li, Haipeng Tang, Zhuo He, Chaoyang Zhang, Guang-Uei Hung, Pai-
Yi Chiu, Weihua Zhou (2020) Machine Learning for the Preliminary Diagnosis of Dementia,
Hindawi Sci Program
29. Junxiu Liu, Mingxing Li (2021) Alzheimer’s disease detection using depthwise separable
convolutional neural networks. Comput Methods Program Biomed, 203, https://doi.org/10.
1016/j.cmpb.2021.106032
30. Taeho Jo, Kwangsik Nho and Andrew J Saykin (2019) Deep Learning in Alzheimer’s disease:
diagnostic classification and prognostic prediction using neuroimaging data. Front Aging
Neurosci
31. Hina Nawaz, Muazzam Maqsood (2021) A deep feature-based real-time system for Alzheimer
disease stage detection. Multimed Tools Appl, https://doi.org/10.1007/s11042-020-09087-y
32. Ashraf A, Naz S (2021) Deep transfer learning for alzheimer neurological disorder detection.
Multimed Tools Appl. https://doi.org/10.1007/s11042-020-10331-8
33. Naz S, Ashraf A (2022) Transfer learning using freeze features for Alzheimer neurological
disorder detection using ADNI dataset. Multimedia Syst. https://doi.org/10.1007/s00530-021-
00797-31
34. Raghavendra Pappagari, Jaejin Cho (2021) Automatic detection and assessment of Alzheimer
Disease using speech and language technologies in low-resource scenarios. Interspeech, https://
doi.org/10.21437/Interspeech.2021-1850
35. Edward L Campbell, Raul Yanez Mesía, Laura Docío-Fernandez, Carmen García-Mateo (2020) Paralinguistic and linguistic fluency features for Alzheimer's disease detection. Elsevier. https://doi.org/10.1016/j.csl.2021.101198
36. Altinkaya E, Polat K, Barakli B (2019) Detection of Alzheimer’s disease and dementia states
based on deep learning from MRI images: a comprehensive review. J Inst Electron Comput.
https://doi.org/10.33969/JIEC.2019.11005
37. Bi Xiaojun , Wang Haibo (2019) Early Alzheimer’s disease diagnosis based on EEG spectral
images using deep learning. Elsevier, Neural Networks 114
38. Roobaea Alroobaea, Seifeddine Mechti (2021) Alzheimer’s disease early detection using
machine learning techniques. ResearchSquare, https://doi.org/10.21203/rs.3.rs-624520/v1
39. Amir Ebrahimi, Suhuai Luo (2021) Deep sequence modeling for Alzheimer’s disease detection
using MRI. Comput Biol Med. 134, 2021, https://doi.org/10.1016/j.compbiomed.2021.104537
Sentinel Eyes Violence Detection System
1 Introduction
Concerns about crime and violence in urban areas have raised demands for more sophisticated surveillance frameworks. Deep learning has surfaced and shown great potential in various computer vision applications, including detecting violent scenes within recordings. This review paper aims to present a detailed overview of a state-of-the-art real-time violence detection framework that combines multi-person 2D pose estimation using OpenPose, fast person detection with YOLOv8, and violent action classification using a combination of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. The detection of violence
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 321
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_25
322 S. Deshmukh et al.
2 Literature Review
The rise in crime and violence is posing increasing difficulties to public safety in
urban areas worldwide. Researchers use deep learning techniques in their advanced
surveillance systems to successfully fight these urgent challenges. Law enforcement
organizations can respond more swiftly and efficiently if these systems are used to
identify and categorize violent incidents in real-time.
It is impossible to overestimate the importance of Dhruv Shindhe et al.'s fundamental work in presenting OpenPose as a real-time multi-person 2D pose estimation
technique [2]. A deep learning technique called OpenPose can be used to track and
identify various people’s body parts in real time within a scenario. This is an essen-
tial initial stage in detecting violence because it enables the system to recognize and
follow the parties involved in a fight. It has been demonstrated that OpenPose can
recognize body parts accurately even under difficult circumstances like dim lighting
or occlusion. Furthermore, it has demonstrated sufficient speed for real-time appli-
cations. OpenPose has thus gained popularity as a solution for violence detection
systems. Apart from its accuracy and speed, OpenPose has various other benefits; the work in [7] proposes a transformer-based approach combining a 3D CNN with OpenPose for detection. For instance, OpenPose may be used with a range of cameras and is reasonably simple to train and deploy. OpenPose is therefore a flexible and strong instrument that can be utilized to raise the efficacy of violence detection systems.
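Once OpenPose supplies per-person 2D keypoints, simple geometric features such as joint angles can be derived for downstream action classification. A sketch with hypothetical keypoint coordinates (OpenPose itself is not invoked here):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 2-D keypoints a-b-c."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical (x, y) keypoints for one arm: shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(joint_angle(shoulder, elbow, wrist))   # 90.0: arm bent at a right angle

# A fully extended arm, e.g. mid-punch, gives an angle near 180 degrees.
print(joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0)))   # 180.0
```

Sequences of such angle features per frame are the kind of input the temporal classifier described later can consume.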
In a similar spirit, the innovative object detection technique YOLO v5 was introduced in the work of B. Arthi et al. in 2022; Vanitha [5] proposes using the YOLOv5 algorithm to detect violent crime. A real-time object detection system, YOLO v5 is remarkably effective at spotting possible hazards within video frames. To do this, the input image is divided into a grid of cells, and each cell's bounding boxes and class labels are then predicted. It has been demonstrated that YOLO v5 can accurately identify a wide range of objects, including people, cars, and weapons. It has also been demonstrated to be successful in real-time violence detection. According to Arthi et al.'s study, YOLO v5 has a 90% accuracy rate in identifying
violent occurrences. These fundamental developments provide researchers with a
platform to investigate the combination of Long Short-Term Memory (LSTMs) and
Convolutional Neural Networks (CNNs) to classify violence. While LSTMs work
well for capturing temporal features, CNNs work well for extracting spatial features
from images. In a single stream, this kind of CNN is able to extract characteristics from both spatial and temporal data. Compared to the two-stream model, this makes it a more effective and efficient method of classifying violence (Almamon Rasool Abdali et al. [1]). These are only a handful of the several methods that have been put forth for CNN- and LSTM-based violence classification. More successful and efficient models will probably be created as long as this field of study is pursued.
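The grid-cell mechanism described above, where the cell containing an object's center is responsible for predicting it, can be sketched as follows; the image size and grid resolution are illustrative, not YOLO v5's actual configuration:

```python
# Grid-cell assignment used by YOLO-style detectors: the cell containing an
# object's box center is responsible for predicting that object. Sketch with
# made-up image size and grid; not the actual YOLO v5 implementation.

def responsible_cell(box_center, img_size, grid=13):
    """Return (row, col) of the grid cell containing the box center."""
    x, y = box_center
    w, h = img_size
    col = min(int(x / w * grid), grid - 1)
    row = min(int(y / h * grid), grid - 1)
    return row, col

# A person centered at (208, 104) in a 416x416 frame on a 13x13 grid:
print(responsible_cell((208, 104), (416, 416)))   # (3, 6)
```

Each cell then predicts box offsets and class probabilities relative to its own position, which is what makes the approach fast enough for real-time frames.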
Additionally, by recommending the application of Long Short-Term Memory
(LSTM), a variation of Recurrent Neural Networks (RNNs) made to capture the
subtleties of long-term temporal data, Anusha Jayasimhan et al. [4] have made
a significant contribution to the field of violence detection. Their groundbreaking
research shows how useful LSTM is for deciphering sequential patterns in human
behavior, particularly when it comes to violence detection. Souvik Kumar et al.
[6] have made their system with the combination of CNN and LSTM. Jain et al.
[8] have combined LSTM with CNN for their model implementation. Long-term
connections between events in a video can be learned by LSTMs, which is crucial
for correctly classifying violent content. For instance, an LSTM-based model might
discover that a person raising their fist frequently results in a punch or that two
arguing are more likely to use violence than two people conversing. Furthermore, a
thorough investigation on deep learning for human activity recognition was carried
out by Traoré et al. (2020) [12], highlighting the efficiency of Convolutional Neural
Networks (CNNs) in extracting spatial characteristics from video frames. Systems
are able to identify changing patterns and context over time because of this CNN
and LSTM integration, which improves classification accuracy and is a vital tool for
urban security. Although the application of CNNs and LSTMs for violence detection
is still in its infancy, the initial findings are encouraging. It is expected that even
more effective and efficient models will be produced as long as research in this field
is conducted. These models could significantly affect public safety by assisting in
preventing violence and shielding individuals from harm.
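The spatial feature extraction that CNNs contribute in these hybrid models rests on 2D convolution: sliding a small kernel over a frame to produce a feature map. The sketch below is a minimal NumPy illustration (a single valid-mode convolution with a Sobel kernel standing in for a full CNN layer), not the code of any cited system:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation of a grayscale frame with one
    kernel, producing a spatial feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# a vertical-edge kernel responds strongly at the brightness boundary
frame = np.zeros((6, 6))
frame[:, 3:] = 1.0                       # bright right half
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
fmap = conv2d(frame, sobel_x)
print(fmap.shape)  # (4, 4)
```

Stacking many such filters, interleaved with pooling, is what lets a CNN turn raw frames into the spatial features that the LSTM then tracks over time.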
The concept put forth focuses on analyzing surveillance footage for real-time
violence detection. The MobileNet CNN system is used for object detection by Himi
et al. [10]. Real-time applications are a good fit for MobileNet, a lightweight CNN.
It is quick and easy to use since it extracts features from images using a feed-forward
convolution method. The MobileNet CNN features are then processed using LSTM
layers for precise action detection. Temporal information is crucial for violence
detection, and LSTMs are ideally equipped to capture it. In addition, the suggested
methodology employs a single embodiment strategy to protect video footage from
many surveillance sources. This can help to minimize the quantity of data that needs
to be processed because the model just examines a single frame or shot from the
video. Furthermore, the abovementioned model is designed to provide children with
total protection. This is accomplished by adding the age factor to the LSTM layers.
This makes it possible for the model to discern between children’s aggressive and
non-violent conduct, which is crucial for preventing child abuse. Authors in [9]
propose using Resnet architecture along with IOT integration using keyframing.
Similar to the two-stream paradigm, B. Arthi et al. [11] also suggested using the
YOLO method to identify items in every frame. YOLO is a quick and precise object
detection system that identifies individuals, objects, and other items in a video. By
Sentinel Eyes Violence Detection System 325
merging the two-stream model’s output and YOLO’s output, Arthi et al. increased
the accuracy of the violence classification. When the dataset was split into 80% for
training and 20% for testing, the validation accuracy of their model stabilized at a
value between 80 and 90%. This implies that their approach can effectively generalize
to previously undiscovered data. B. Arthi et al. employed recurrent neural networks
(RNNs) in addition to the two-stream model and YOLO to extract the temporal
information from the video. Neural networks that can interpret sequential data, such
as video frames, are called RNNs. B. Arthi et al. increased the accuracy of classifying
violence by utilizing RNNs and accounting for the temporal correlations between
the frames. The work of B. Arthi et al. demonstrates the possibility of combining
various deep learning approaches to increase the accuracy of violence classification.
More successful and efficient models will probably be created as long as
this field of study is pursued.
Haque et al. [13] have presented a more inventive method for handling this. They
use Gated Recurrent Units (GRUs) with Convolutional Neural Networks (CNNs) to
categorize aggression in videos. While the GRU is used to capture temporal char-
acteristics, the CNN extracts spatial features from individual frames. After that, the
two streams are blended to determine the final classification. The model creates a
collection of 512 features for every frame by encoding the data from 4D to 2D. The
GRU layer then extracts the temporal aspect of the data as a 1D vector. Next, this
vector is classified to ascertain whether or not the frame is violent. A dropout layer
with a 0.25 dropping rate is added to prevent overfitting. The AVDC video dataset,
a sizable collection of violent videos, served as the model’s training set. The model
attained ninety percent test accuracy—a promising outcome. The problem of CNNs
processing one image at a time is resolved by introducing GRUs in this model. This
results from GRUs’ capacity to record the temporal correlations between frames.
Furthermore, GRUs are easier to train than LSTMs due to their lower complexity.
This method is a potentially useful advancement in the realm of violent crime detec-
tion. It is reasonably efficient and capable of achieving great accuracy. It is expected
that even more effective and efficient models will be produced as long as research
in this field is conducted. Haque et al.’s BrutNet model is a novel and promising
method for detecting aggression. It is reasonably efficient and capable of achieving
great accuracy. Because of this, it is a strong contender for practical uses.
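The lower complexity of GRUs mentioned above comes from their use of only two gates (update and reset) and a single hidden state. The following NumPy sketch of one GRU step is purely illustrative: the weights are random stand-ins for trained parameters, and the 512-dimensional input mirrors the per-frame feature count reported for BrutNet, but this is not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU time step: x is the input (e.g. 512 CNN features for
    one frame), h is the hidden state carried between frames."""
    z = sigmoid(W_z @ x + U_z @ h)              # update gate
    r = sigmoid(W_r @ x + U_r @ h)              # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde            # blend old and new state

rng = np.random.default_rng(0)
n_in, n_hid = 512, 128
params = [rng.normal(scale=0.1, size=s)
          for s in [(n_hid, n_in), (n_hid, n_hid)] * 3]
h = np.zeros(n_hid)
for _ in range(16):                             # 16 frames of a clip
    x = rng.normal(size=n_in)
    h = gru_step(x, h, *params)
print(h.shape)  # (128,)
```

An LSTM needs a third gate and a separate cell state for the same job, which is why GRUs train with fewer parameters.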
An attention network can be used to concentrate on the areas of the frame that
are most likely to contain violence in the context of violence detection. The 3D
light-weight attention network (LP3DAM) is one of the most promising attention
networks for violence detection [14]. A 3D CNN called LP3DAM employs attention
to concentrate on the most crucial areas of the frame. A collection of hazy and
indistinct photos—common in real-world surveillance footage—was used to train the
network. Keyframe extraction based on clustering is another potential method. Using
this method, a video’s frames are initially grouped into clusters of related frames.
Other methods for detecting aggression using attention networks have also been
developed besides those covered above. Gated recurrent units (GRUs), for instance,
have been employed by certain academics to discover long-term connections between
frames. Others have combined the predictions of numerous models using ensemble
methods.
A clustering-based keyframe extraction approach has been designed to maximize
the efficiency and accuracy of real-time monitoring. In order for this approach to
function, the video frames are first clustered into groups of related frames. Next, a
keyframe is chosen from each group based on which frame best represents that group.
By doing this, the number of frames that must be processed greatly decreases, which
can increase the system’s efficiency. Furthermore, the clustering-based keyframe
extraction technique can lessen false alarms during violence classification by elim-
inating duplicate frames. An analysis by [14], for instance, discovered that the
clustering-based keyframe extraction approach might cut the number of frames by up
to 90% without appreciably compromising the violence classification's accuracy.
This makes it a viable strategy for raising the effectiveness and precision of violence
detection systems that operate in real-time. Other methods have also been put forth
to increase the effectiveness and precision of real-time violence detection systems in
addition to the clustering-based keyframe extraction strategy.
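The clustering-based keyframe extraction described above can be sketched with a simple k-means over per-frame feature vectors. Everything here (the histogram-like features, the `extract_keyframes` helper, and the toy three-scene video) is an illustrative assumption, not the cited method's code:

```python
import numpy as np

def extract_keyframes(features, k, iters=20, seed=0):
    """Cluster per-frame feature vectors into k groups and return the
    index of the frame closest to each cluster centre (the keyframe).
    `features` has shape (n_frames, n_dims), e.g. colour histograms."""
    rng = np.random.default_rng(seed)
    centres = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign every frame to its nearest centre
        d = np.linalg.norm(features[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned frames
        for j in range(k):
            if np.any(labels == j):
                centres[j] = features[labels == j].mean(axis=0)
    d = np.linalg.norm(features[:, None] - centres[None], axis=2)
    return sorted(set(d.argmin(axis=0)))  # one representative per cluster

# toy "video": 300 frames drawn from three distinct scenes
rng = np.random.default_rng(1)
scenes = np.repeat(np.eye(3), 100, axis=0)          # (300, 3)
frames = scenes + rng.normal(scale=0.05, size=scenes.shape)
keys = extract_keyframes(frames, k=3)
print(len(keys), "keyframes from", len(frames), "frames")
```

Only the selected keyframes are forwarded to the classifier, which is where the large reduction in processed frames comes from.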
Bhaktram Jain et al. [3] have highlighted the potential of Long Short-Term
Memory (LSTM) in violence detection systems through temporal analysis. LSTMs,
a kind of recurrent neural network (RNN) designed to capture long-term temporal
dependencies, are ideal for this task. This makes them an important tool for
recognizing and categorizing violent acts, which are frequently dynamic and
ever-changing occurrences. LSTMs function by continuously updating an internal
state, which lets the network recall the past and utilize it to anticipate the
future. This is crucial for
violence detection because it enables the algorithm to recognize behavioral patterns
pointing to an approaching attack. While LSTMs are a promising new technology
for violence detection, they have drawbacks. One drawback is that training them can
be computationally costly.
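The internal-state mechanism described above can be made concrete with a single LSTM step in NumPy. The weights below are random stand-ins, and the gate layout is the standard LSTM formulation rather than any cited system's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. The cell state c is the 'internal state' that
    carries information across frames; gates decide what to forget,
    what to write, and what to expose as the output h."""
    z = W @ x + U @ h + b                 # all four gates in one matmul
    n = len(c)
    f = sigmoid(z[0 * n:1 * n])           # forget gate
    i = sigmoid(z[1 * n:2 * n])           # input gate
    o = sigmoid(z[2 * n:3 * n])           # output gate
    g = np.tanh(z[3 * n:4 * n])           # candidate memory
    c = f * c + i * g                     # update internal state
    h = o * np.tanh(c)                    # output for this time step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 64, 32
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for _ in range(10):                       # a 10-frame sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape, c.shape)  # (32,) (32,)
```

The training cost noted above comes from backpropagating through every such step across long sequences, four gate matrices at a time.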
In conclusion, the significant developments in real-time violence detection
systems driven by deep learning technology are explained by this literature study.
The combination of multi-person 2D posture estimation, fast individual detection,
and CNN with LSTM has shown great potential in tackling the intricate problems
caused by crime and violence in cities. The foundational contributions covered here
provide direction for future study and growth and offer insights into the discipline’s
current state. These developments could improve system performance, streamline
operations, and lead to new deep-learning uses in urban security.
3 Methodology
for accurate pose estimation, deriving rich data on the critical spots on the human body
to understand body positions, motions, and gestures that might suggest aggressive
conduct. Figure 2 illustrates the overall project flow.
We use deep learning techniques to extract significant insights from the data.
Specifically, we introduce a Convolutional Neural Network (CNN) architecture to
extract relevant features from the bounding boxes and highlight important locations.
A key component of our process is integrating data from OpenPose’s pose estimate
and YOLO’s object recognition to create a cohesive feature representation, allowing
for a deeper comprehension of the visual cues linked to violence. Since video data has
a temporal dimension, we also include a Long Short-Term Memory (LSTM) network.
With this LSTM component, we can decode the complex sequential dependencies
found in video sequences, enabling us to model and examine the patterns that change
over time and underpin violent acts.
Our research technique includes validation and evaluation as essential compo-
nents, wherein we thoroughly assess our violence detection system’s performance.
We use well-established evaluation criteria to measure the system’s effectiveness,
including accuracy, precision, recall, and the F1-score. We use cross-validation
methods and adjust hyperparameters to optimize the model to guarantee optimal
performance. We present a thorough analysis of our system's performance,
highlighting its strengths, weaknesses, and possible areas for improvement. To
demonstrate the benefits of our technology, we also draw comparisons with other violence
detection techniques now in use.
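The evaluation criteria listed above (accuracy, precision, recall, and F1-score) reduce to simple counts over the confusion-matrix cells. A self-contained sketch with made-up binary labels (1 = violent), not our actual evaluation code:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary
    violence / no-violence labelling (1 = violent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)  # 0.75 0.75 0.75 0.75
```

Precision penalizes false alarms while recall penalizes missed violence, which is why both are reported alongside plain accuracy.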
We extend our technology into real-time video processing environments, paving
the way for public safety, security, and surveillance applications, moving from
study to practical implementation. Our technique is profoundly ingrained with
ethical considerations as we tackle responsible use, privacy, and surveillance issues.
When necessary, we investigate privacy-preserving methods to balance security and
individual rights.
3.1 OpenPose
3.2 YOLOv8
The most recent and advanced YOLO model, YOLOv8, applies to tasks like instance
segmentation, object detection, and image classification. The company Ultralytics,
which also developed the well-known and industry-defining YOLOv5 model, is the
creator of YOLOv8. Compared to YOLOv5, YOLOv8 has a number of architectural
and developer experience enhancements.
The Predictions Vector: YOLO’s output encoding is the first thing to comprehend.
Cells in the input image are arranged in an S × S grid. For each object visible in
the picture, one grid cell is considered to be "in charge" of predicting it, namely
the cell into which the object's center falls.
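This cell-responsibility rule can be sketched directly by mapping an object's pixel centre to grid coordinates. The `responsible_cell` helper below is a hypothetical illustration, not YOLO's actual code:

```python
def responsible_cell(cx, cy, img_w, img_h, S):
    """Return (row, col) of the grid cell 'in charge' of an object
    whose centre is at pixel (cx, cy), for an S x S grid."""
    col = min(int(cx / img_w * S), S - 1)   # clamp centres on the edge
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# a 640x480 frame with a 7x7 grid: an object centred at (320, 240)
print(responsible_cell(320, 240, 640, 480, 7))  # (3, 3)
```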
Loss Function (localization term):

$$\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right]$$
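YOLO's coordinate (localization) loss penalizes squared error on predicted box centres only for the boxes responsible for an object. A NumPy sketch follows; the array shapes and the `coord_loss` helper are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def coord_loss(pred_xy, true_xy, obj_mask, lambda_coord=5.0):
    """Localization term of the YOLO loss.
    pred_xy, true_xy: (S*S, B, 2) arrays of (x, y) box centres;
    obj_mask: (S*S, B) indicator, 1 where box j of cell i is
    responsible for a ground-truth object."""
    sq = ((pred_xy - true_xy) ** 2).sum(axis=-1)   # (x-x^)^2 + (y-y^)^2
    return lambda_coord * (obj_mask * sq).sum()

S, B = 7, 2
pred = np.zeros((S * S, B, 2))
true = np.zeros((S * S, B, 2))
mask = np.zeros((S * S, B))
mask[24, 0] = 1.0                   # one responsible box
pred[24, 0] = (0.6, 0.5)            # predicted centre, slightly off
true[24, 0] = (0.5, 0.5)            # ground-truth centre
print(coord_loss(pred, true, mask))
```

Boxes with `obj_mask` equal to zero contribute nothing, which is exactly the role of the indicator in the equation above.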
3.3 CNN-LSTM
4 Results
4.2 Architectures
an astounding 94% accuracy rate. This combination allows the model to incorporate
temporal and spatial information that is essential for recognizing aggressive
behaviors. With an impressive accuracy score of 96%, CNN-LSTM, a combination
focused on spatio-temporal patterns in videos, performs better than the others.
This demonstrates the importance of capturing temporal and spatial dynamics to
detect violence accurately.
Conversely, the Artificial Neural Network (ANN), a fundamental machine
learning model, has a decent accuracy of 89%. Although ANNs have a simpler struc-
ture than deep learning models, they are nevertheless a good choice for problems
involving aggression detection. Table 2 illustrates the accuracy for the different archi-
tectures used. In summary, the model selection should be based on the application’s
particular needs, taking into account elements like precision, processing complexity,
and the significance of capturing temporal dynamics in video data. Figure 5 illustrates
the comparison between different architectures concerning their F-Measure.
5 Conclusion
CNN and LSTM work together to leverage spatiotemporal features, improving
accuracy by tracking patterns and context over time. Surveillance demands accuracy
and efficiency, and the clustering-based keyframe extraction method reduces false
warnings, maximizes processing, and decreases redundant frames. The temporal
analysis provided by LSTM strengthens the system's ability to distinguish between
benign and aggressive activity, providing law enforcement with a proactive tool
for safer city environments. As this thorough literature review outlines, the mix
of cutting-edge technologies shows significant promise in mitigating the problems
posed by urban violence and crime. This review also makes clear the course for
future investigations, which will focus on increasing efficiency, enhancing system
performance, and exploring innovative deep-learning applications within urban
security.
References
1. Abdali AR, Aggar AA (2022) DEVTrV2: enhanced data-efficient video transformer for
violence detection. In: Proceedings of the 2022 7th international conference on image, vision
and computing (ICIVC). Transformers, CNN
2. Dhruv Shindhe S, Sushant G, Omkar SN (2021) Real-time violence activity detection
using deep neural networks in a CCTV camera. In: 2021 IEEE international conference on
electronics, computing and communication technologies (CONECCT). YoloV3, OpenPose
3. Jain B, Paul A, Supraja P (2023) Violence detection in real life videos using deep learning. In:
2023 Third international conference on advances in electrical, computing, communication and
sustainable technologies (ICAECT). LSTM
4. Jayasimhan A, Pabitha P (2022) A hybrid model using 2D and 3D Convolutional Neural
Networks for violence detection in a video dataset. In: 2022 3rd international conference on
communication, computing and industry 4.0 (C2I4). CNN
5. Vanitha K, Ninoria S (2022) A detection of violence from CCTV cameras in real-time using
machine learning. In: 2022 fourth international conference on emerging research in electronics,
computer science and technology (ICERECT). CNN, Net-SSD
6. Parui SK, Biswas SK, Das S, Chakraborty M, Purkayastha B (2023) An efficient violence
detection system from video clips using ConvLSTM and keyframe extraction. In: 2023 11th
international conference on internet of everything, microwave engineering, communication
and networks (IEMECON). CNN + LSTM
7. Zhou L (2022) End-to-end video violence detection with transformer. In: 2022 5th international
conference on pattern recognition and artificial intelligence (PRAI). Transformers 3D CNN +
OpenPose
8. Jain A, Vishwakarma DK (2020) Deep neuralnet for violence detection using motion features
from dynamic images. In: 2020 third international conference on smart systems and inventive
technology (ICSSIT), Tirunelveli, India, 2020, pp 826–831. ConvLSTM
9. Bineeshia J, Chidambaram G (2023) Physical violence detection in videos using keyframing.
In: 2023 international conference on intelligent systems for communication, IoT and security
(ICISCoIS), Coimbatore, India, 2023, pp 275–280. Resnet, ConvLSTM
10. Himi ST, Gomasta SS, Monalisa NT, Islam ME (2020) A framework on deep learning-based
indoor child exploitation alert system. In: 2020 IEEE international symposium on technology
and society (ISTAS), Tempe, AZ, USA, 2020, pp 497–500. CNN + LSTM
11. Arthi B, PoornaPushkala K, Arya A, Rajasekhar D (2022) Wearable sensors and real-time
system for detecting violence using artificial intelligence. In: 2022 international conference on
Abstract Alzheimer’s disease (AD) ranks as the predominant factor leading to the
onset of dementia. AD leads to a steady decline in memory, reasoning, behavior, and
social abilities. These changes affect a person’s personality. Alzheimer’s disease is
most common in older adults. Conventional diagnostics methods for Alzheimer’s
disease face challenges like invasiveness, cost, imprecision, and patient discomfort.
The aim is to develop an innovative AI model offering a non-invasive, cost-effective
solution. In this suggested system, we use a convolutional neural network to classify
AD into four classes: Normal Phase, First Phase, Second Phase, and Third Phase. The
dataset size is 6000 MRI (Magnetic Resonance Imaging) images. AD causes the brain
regions to shrink, and connections between the network of neurons may break down.
Hence, accurate and timely Alzheimer’s disease classification is crucial for effective
treatment and better patient outcomes. The primary goal of the suggested model is
to classify AD with greater accuracy using a deep learning model. The work follows
several preprocessing steps for the input images, such as image resizing, pixel
rescaling, and batching images to enhance the input features, and for disease
classification, the Convolutional Neural Network model is used. The suggested model
was evaluated using Kaggle dataset images, and a maximum accuracy of 97.25%
was obtained using the Kaggle dataset. This research addresses the critical need for
precise AD classification, potentially revolutionizing patient outcomes through early
detection.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 335
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_26
336 N. Santosh et al.
1 Introduction
2 Related Work
adding four layers of 1000 neurons with a 0.6 dropout rate between each layer.
Notably, their method could differentiate delicate cases of no, very mild, and mild
AD, allowing for the possibility of a quick and accurate diagnosis of early-stage AD,
which is critical for early intervention.
Jinyu Wen, Yang Li et al. developed an innovative method of Fine-Grained and
Multiple Classification for Alzheimer’s Disease with Wavelet Convolution Unit
Network [10]. They used data from the Alzheimer's Disease Neuroimaging Initiative (ADNI)
collection, which included 902 samples with Alzheimer’s Disease (AD), Late Mild
Cognitive Impairment (LMCI), Early Mild Cognitive Impairment (EMCI), and
Normal Control (NC). They used a unique network utilizing diffusion tensor images
to achieve fine-grained classifications with up to 97.30%, 95.78%, 95.00%, 94.00%,
97.89%, 95.71%, 95.07%, and 93.79% accuracy for all eight types of fine-grained
classifications.
Manop Phankokkruad and Sirirat Wacharawichanant used Deep Transfer
Learning Models with Over-Sampling to classify Alzheimer’s disease [11]. Through
over-sampling methods, the researchers used deep transfer learning models to clas-
sify the stages of progression in Alzheimer’s disease. The research used the following
transfer learning models: VGG19, Xception, ResNet50, and MobileNetV2. The accu-
racy of these models differed, with Xception leading the way at 82.46%, followed by
MobileNetV2 at 79.29%, VGG19 at 77.73%, and ResNet50 at 76.28%. The study
used a dataset of 6,327 occurrences categorized into four groups: non-demented,
mild, moderate, and severe. Over-sampling approaches were used to correct class
imbalances in the data set; the findings of this study shed light on the effective-
ness of deep transfer learning models in the classification of Alzheimer’s disease
progression, with Xception appearing as the most accurate model.
Ruchika Das and Shobhanjana Kalita developed a robust classification system
for Alzheimer’s Disease (AD) [12] stages through volumetric MRI data analysis.
They achieved remarkable results with a segmentation train accuracy of 93% and a
test accuracy of 90% using a 3D-UNET architecture. Additionally, they employed a
unique volumetric analysis approach to categorize the four phases of AD based
on hippocampus volume, achieving an accuracy of 91% person-wise and 88%
hemisphere-wise after adjusting the threshold using the root mean square error
(RMSE). This study represents a significant contribution to the field, showcasing
the potential of deep learning models and volumetric analysis in early AD diagnosis,
which is crucial for improving patient outcomes.
The research that was discussed in this part has provided several aspects of this
crucial field of medical diagnostics. In conclusion, the application of deep learning
in Alzheimer’s Disease (AD) classification is essential. Archana and Kalirajan’s [8]
work achieved consistently high accuracy rates, highlighting the trend of improving
accuracy in this field. In contrast, Maximus Liu, Mikhail Y. Shalaginov et al.’s [9]
VGG16-based model stood out with an astonishing 99.68% accuracy for early AD
detection. Lei Zhu et al. [10] developed a new Wavelet Convolution Unit Network
(WCU-Net), which achieves exceptional accuracy in fine-grained AD classifica-
tions. Ruchika Das and Shobhanjana Kalita established the efficacy of volumetric
analysis for AD stage categorization using a 3D-UNET architecture. Finally, the
Detection of Alzheimer’s Disease from Brain MRI Images Using … 339
paper [11] investigated deep transfer learning models and over-sampling strategies,
naming Xception as a leading model for Alzheimer's disease progression
classification. Collectively, these various techniques show that achieving high
accuracy on one dataset does not guarantee similar performance on other datasets.
3 Implementation
3.1 Preprocessing
Our CNN architecture gives good results for Alzheimer’s Disease Classification. We
used convolutional layers, Maxpool layers, and dense layers. These are the founda-
tions of the model’s strength, not just structural features. Each layer serves a specific
role, extracting critical features that compose the model's basic predictive
capabilities. This layered technique transforms and refines MRI data at each level,
capturing even the tiniest information for Alzheimer's disease diagnosis. This
layered design establishes our CNN as a precise tool in medical imaging.
Here, Fig. 3 explains the model architecture. The initial Conv2D layer is the foun-
dation for feature extraction, with 32 size filters, kernel size (5, 5), and activation used
is relu. It goes deeply into MRI scans, finding edges and gradients and establishing
the framework for the model’s understanding of these complex medical images. Filter
sizes adapt to MRI scan features as we move through consecutive Conv2D layers.
After using the first Conv2D layer, “Batch Normalization” was used. This Batch
normalization will help to ensure stability and improve network training. The second
layer contains 64 filters (5, 5) that recognize more complex patterns. It enhances the
model’s ability to classify Alzheimer’s disease accurately.
Max-pooling layers of size (2, 2) are used in CNN because they reduce spatial
dimensions, promote translation invariance, choose the most prominent features,
and improve computing efficiency. Densely connected layers are essential to the
classification process in the suggested architecture. These layers are inserted into our
CNN as intellectual centers, capturing high-level features and shaping the model’s
predictions. Controlling model complexity to improve performance is a precise task
when determining the number of units in each dense layer.
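The effect of (2, 2) max-pooling on spatial dimensions can be shown concretely. This NumPy sketch assumes an even-sized feature map in (H, W, C) layout and is illustrative only, not part of our model code:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2: halves each spatial dimension,
    keeping the strongest activation in every window. `fmap` has
    shape (H, W, C) with H and W even."""
    h, w, c = fmap.shape
    # split H and W into (half, window-of-2) pairs, then max over windows
    windows = fmap.reshape(h // 2, 2, w // 2, 2, c)
    return windows.max(axis=(1, 3))

# a toy 4x4 feature map with 2 channels shrinks to 2x2
x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
pooled = max_pool_2x2(x)
print(pooled.shape)  # (2, 2, 2)
```

Because only the maximum in each window survives, small translations of a feature within a window leave the output unchanged, which is the translation invariance noted above.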
These layers protect the model’s cognitive abilities, ensuring it recognizes
complex patterns, retrieves relevant information, and offers accurate classifications.
Every unit functions like a neuron in a neural orchestra, collaborating to decode
brain MRI images. This orchestration of multiple layers provides a delicate combination
of accuracy, efficiency, and adaptability in medical imaging.
The selection and layout of pooling and dense layers are critical foundations
in building our CNN architecture and significantly impact its performance. We
achieve successful down-sampling by sparingly using max-pooling layers, which
optimizes computing efficiency and protects against the dangers of overfitting. These
pooling layers dance with the spatial data, keeping the most essential properties while
condensing information. The structure of our CNN model is shown in the archi-
tectural diagram (Fig. 2). It shows the sequential flow of layers, such as convolu-
tional, pooling, and dense layers, to provide a complete understanding of our model’s
architecture.
This section thoroughly examines the results acquired from the Alzheimer’s Disease
Classification model we constructed. The experiment was run on a GPU Tesla T4
using Python 3.10 and TensorFlow 2.13.3.
The Kaggle dataset, a rigorously selected collection showcasing a wide range of brain
MRI images, serves as the core of our research. This dataset presents 6000 images,
where each class has 1500 images. The harmonious symmetry in the distribution
of images in all classes demonstrates the repository’s robustness. This purposeful
balance guarantees that our model obtains fair and unbiased exposure to all classes
during the important training and evaluation phases. As a result, it considerably
improves the model’s potential, sharpening its accuracy in separating the subtle differ-
ences between Alzheimer’s Disease classes and MRI representations, delivering a
robust and trustworthy diagnostic tool.
For training, 60 epochs were used with a batch size of 32. Training accuracy reached
an impressive 100%, and for test accuracy, we achieved
around 97.25%, showing good results. The training loss rapidly decreased, and the
test loss was slightly higher at 0.1371. The discrepancy between training and test
loss shows that the model may have overfitted the training data, which regularization
approaches like dropout could remedy. Despite this, the model effectively reduces training and
test data classification errors.
Table 1 explains the performance of our model for every 10 epochs. The training
and testing log presented here shows the machine-learning model’s development
over 60 training epochs. In machine learning, each epoch represents a complete
cycle through the entire training dataset during neural network training. The model
processes the data in mini-batches during each epoch, performing forward and back-
ward passes, calculating gradients, and updating its parameters to minimize the loss
function. Typically, many epochs allow the model to learn and modify its parameters
to enhance predicting performance.
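The epoch/mini-batch cycle described here can be sketched with a toy model. The linear regressor below stands in for the CNN, but the structure is the same (shuffle each epoch, iterate mini-batches, forward pass, gradient update), with batch size 32 as in our setup; the data and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))                  # toy dataset
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3       # known target weights
w, b = np.zeros(3), 0.0
lr, batch_size, epochs = 0.1, 32, 60           # batch size 32, 60 epochs

for epoch in range(epochs):                    # one epoch = one full pass
    order = rng.permutation(len(X))            # reshuffle every epoch
    for start in range(0, len(X), batch_size): # iterate mini-batches
        idx = order[start:start + batch_size]
        err = X[idx] @ w + b - y[idx]          # forward pass (residuals)
        w -= lr * X[idx].T @ err / len(idx)    # gradient step on weights
        b -= lr * err.mean()                   # gradient step on bias
print(np.round(w, 3), round(b, 3))
```

After enough epochs the parameters converge to the generating values, the toy analogue of the loss minimization described above.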
In epoch 1 of our training, the model starts learning with an initial accuracy
of 0.5044 and a loss of 1.0182. It is normal to have tentative model performance
during the early stages of training. However, as training progresses, training
accuracy improves noticeably. This gradual improvement in accuracy indicates that the
model is learning to produce increasingly accurate predictions over time, indicating
consistency and success in the model’s learning process.
Figure 4 shows the confusion matrix. It indicates that our model performs well in
predicting all classes, with few false positives or false negatives.
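A confusion matrix such as the one in Fig. 4 is just a count table over (true, predicted) pairs. A minimal sketch for the four classes (0 = Normal Phase through 3 = Third Phase), using made-up labels rather than our actual predictions:

```python
def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows = true class, columns = predicted class; entry [t][p]
    counts samples of true class t predicted as class p."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3]   # one First-Phase sample misread
cm = confusion_matrix(y_true, y_pred)
for row in cm:
    print(row)
```

Off-diagonal entries are the misclassifications; a matrix concentrated on the diagonal is what "few false positives or false negatives" means concretely.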
Fig. 5 a Training loss versus Validation loss of CNN architecture, b Training accuracy versus
Validation accuracy of CNN architecture
From Fig. 5, we can see two learning curves: loss and accuracy. Here, in loss, we see
the performance of validation loss and training loss, and the same comparison was
made for accuracy. The learning dynamics and generalization ability of the proposed
model are interesting. In epoch 1, the model starts with a loss of 1.0182 and an
accuracy of 0.5044; this indicates initial learning difficulty, and the validation
accuracy and loss show the same pattern. From these comparisons, we see that our
model's performance improved exceptionally. The time per epoch
is 306 s, leaving room for efficiency. However, as training continued, improvement
was noticeable. By epoch 20, we can see a sudden decrease of the loss to 0.0019 and
an increase of the accuracy to 0.9996, and the validation accuracy also improves.
The time per epoch at epoch 20 is 279 s, a reduction of 27 s. This indicates an improvement
in efficiency.
In the following epochs, model performance continues to show improvement.
At the last epoch (60), the training accuracy is 1.0000, and validation accuracy is 0.9725. This
shows the intense learning of the model. From epoch 40 to 60, the consistency of the
model is shown. At epoch 60, the validation accuracy is 0.9725, and the validation
loss of 0.1371 is normal, demonstrating the model’s high adaptability to new MRI
data.
Here, Table 2 compares the performance metrics of other research papers with
our proposed model, labeled as “Proposed Model,” alongside advanced Alzheimer’s
Disease models from the literature. From Table 2, we can see Wu et al. [3] got an
accuracy of 90.1%, Joshi et al. [5] got an accuracy of 91.80%, Phankokkruad and
Wacharawichanant [7] got an accuracy of 82.46%, Archana and Kalirajan [8] got
an accuracy of 97% and our model outperformed them by getting an accuracy of
97.25%. It demonstrates its proficiency in accuracy. Our model outperforms some
existing models due to different datasets or architecture, which is better than some
Table 2 Comparison of performance metrics

Author name                                Accuracy (%)
Wu et al. [3]                              90.1
Joshi et al. [5]                           91.80
Phankokkruad and Wacharawichanant [7]      82.46
Archana and Kalirajan [8]                  97
Proposed model                             97.25
5 Conclusions
Our suggested model represents a vital improvement in medical imaging and diag-
nostic accuracy. Through the state-of-the-art Convolutional Neural Network (CNN)
and preprocessing techniques, we achieved an extraordinary 97.25% classification
accuracy. This accuracy has established a new benchmark in Alzheimer's Disease
Classification. The proposed model has laid a strong
foundation for Alzheimer’s Disease diagnosis. We focused on Multi-Class Clas-
sification to classify the mild demented, non-demented, very mild demented, and
moderate demented classes. This transition helps us to understand different classes
of Alzheimer’s Disease. In the future, we focus on advanced preprocessing tech-
niques, taking different metrics, and focusing on comparative analysis of different
models. The model may give different results if different datasets and imbalanced
data exist among different classes. In practical application, accurate classification
models can contribute to early detection of Alzheimer’s disease. Our goal continues
to produce a versatile tool that bridges AI with medical diagnostics and eventually
enhances patient care by utilizing advanced algorithms and preprocessing techniques.
References
Detection of Banana Plant Diseases Using Convolutional Neural Network

Nitin Pise
Abstract For ensuring global food security and sustainable agriculture, a major
challenge is to control plant diseases. There is a need to improve existing procedures
for early detection of plant diseases by using deep learning-based modern automatic
image recognition systems. The paper describes such methods based on convolutional
neural networks. Crop monitoring is the monitoring of crop growth and performance
during developmental stages, and it allows farmers to intervene at the right time to
ensure optimal yields at the end of the season. All over the world, banana produc-
tion is affected by numerous diseases. Most diseases include Panama wilt, leaf spot
diseases, yellow Sigatoka, black Sigatoka, bacterial wilt, bunchy top, banana bract
mosaic virus, and CMV (Cucumber Mosaic Virus). Innovative and quick methods for
detecting diseases will allow us to monitor more efficiently and create proper fertiga-
tion strategies, which will help farmers increase their yield. Combining aerial image
data from unmanned aerial vehicles (UAVs) with machine learning algorithms can
deliver an accurate and efficient technique for detecting crop diseases in real-world
conditions. The paper describes one approach for detecting banana plant diseases in
the Jalgaon District of India using various deep learning approaches, achieving an
accuracy of more than 98%.
1 Introduction
India is called an agricultural country because 65.07% of its people live in villages.
However, the contribution of agriculture to the Indian economy is less than 20%.
The current population of India is more than 1.35 billion, which will increase to
more than 1.8 billion in 2040. The agricultural produce should grow by 50% to
N. Pise (B)
Vishwanath Karad MIT World Peace University, Pune 411038, India
e-mail: nitin.pise@mitwpu.edu.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 347
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_27
feed this growing population. Many automation efforts are required in India, as it is
necessary to increase crop yield on the same land. Technologies such as the Internet
of Things, cloud computing, edge computing, drones, wireless sensor networks, and
machine learning can increase farmers' produce so that the entire population can be
supplied adequately.
Banana plantations are vulnerable to a range of adverse weather circumstances,
including a lack of air circulation during cultivation due to high planting density,
the planting of infected suckers, inadequate nutrient management, and faulty disease
identification expertise. Cucumber Mosaic Virus (CMV), Yellow and Black Siga-
toka, Banana Bunchy Top Virus (BBTV), and Banana Streak Virus (BSV) are the
predominant diseases in India. By 2050, per capita banana consumption is antici-
pated to climb threefold. The average national banana productivity is 37 MT/ha,
much lower than in states such as Maharashtra and Tamil Nadu (>60–70 MT/ha).
Even though India is one of the world's largest banana producers, its exports to the
world are nominal. Due to these factors, disease detection plays a vital role in making
banana plantations manageable and highly profitable. Most
banana diseases are identified manually, typically by farmers and fertilizer traders.
This procedure is inefficient and inaccurate in making the correct judgments and
recommending treatments. This paper presents deep learning methods for banana
leaf disease detection and identification, a significant step towards smart agriculture.
Section 2 describes a literature survey of the state-of-the-art techniques used for
plant disease detection. The proposed method is explained in detail in Sects. 3.1 to
3.6. The result analysis is presented in Sect. 4, along with the various performance
measures and the result graphs. The conclusion and discussion of results are given
in Sects. 5 and 6, respectively.
2 Literature Review
The issue of detecting leaf disease has long been a concern in agriculture for crop
quality management. The following are some already proposed systems in the area
[1]. The proposed research is on an indigenous, technology-based agriculture solu-
tion that provides important insights into crop health. This is achieved by extracting
complementary features from a multimodal dataset and minimizing the crop ground
survey effort on large-sized lands. Arnab Kumar [2] describes the machine-learning
techniques that can be used for plant disease detection. He suggests methods that can
be integrated into the drone using the Raspberry Pi 3B module. Surya Prabha [3] gives
ideas about disease classification in banana crops using image processing to identify,
analyze, and manage plant diseases. Major diseases occur in the leaf areas of banana
crops. Change of color is a major criterion used to classify the leaf disease. Different
models like Red Green Blue (RGB), Hue Saturation and Value (HSV), Hue Saturation
and Intensity (HSI), and CIE LAB are available to extract required features accurately
Detection of Banana Plant Diseases Using Convolutional Neural Network 349
for disease identification. Various feature extraction techniques, like Gray Level Co-
occurrence Matrix (GLCM), Support Vector Machine (SVM), and Neural Network
(NN) are available for disease classification. Rutu [4] proposes an image-based clas-
sification for plant disease identification. To augment the limited number of local
photos of Indian plants and diseases available, the model employs Generative Adver-
sarial Networks (GANs). A convolutional neural network (CNN) is used for classi-
fication and is deployed in a smartphone app. Two CNN architecture models have
been compared—Inception v3 and MobileNets. Both models were tested to see how
they were compared regarding accuracy, training speed, and model size. The future
scope of using drones that can navigate through fields and capture several pictures
using computer vision is given in the paper. Shruti [5] reviews learning classifica-
tion approaches for detecting plant diseases. Comparative analysis is conducted on
five machine-learning classification techniques for disease detection. The Artificial
Neural Network (ANN), Fuzzy C-Means Classifier, K-Nearest Neighbor Classification
Technique, Support Vector Machine (SVM), and Convolutional Neural Network
classification methods and their efficiencies are studied for detecting plant diseases.
The CNN classifier efficiently diagnoses a greater number of diseases. Omkar [6]
describes an application that uses leaf textual similarity to predict the type of crop
disease. The model is trained using a dataset including healthy and diseased crop
leaves. In the task of crop detection, the InceptionV3 model outperforms MobileNet.
The authors used the Singular Value Decomposition (SVD) to process the cropped
leaf images, extract the corresponding information, remove noise, compress the data,
and reduce the image size to solve the problem of identifying crop diseases in agri-
cultural activity [7]. Crop disease photos are utilized to train the neural network
model (feature extraction), and the trained model is used to identify crop diseases.
Deep learning is faster, easier to use, and has greater recognition accuracy than the
first two methods. As a result, they suggest the MDFC-ResNet model for the deep
learning system model, which can identify common and severe crop diseases and
is more informative for real agricultural production activities. Improving accuracy
at a lower level increases the system’s performance [8]. This research explains how
a convolutional neural network trained with transfer learning and fine-tuning may
be used to monitor the nutritional content of farmland by detecting nutrient deficits
using image recognition of banana leaves. The methodology used in this research
is dataset acquisition, data augmentation, image pre-processing, color space conver-
sion, VGG16, metrics comparison, selecting the best model, and uploading the model
to the platform prototype. The most effective pre-processing method was histogram
equalization, with a validation and training accuracy of 98.61% and 99.28% [9]. The
authors studied plant disease detection and its solution using image classification. It
developed an improved k-mean clustering approach to estimate the infected region
of the leaves. A color-based segmentation model segments and assigns the infected
zone to the appropriate classes. Image acquisition, picture pre-processing, image
segmentation, feature extraction, and classification are all processes in the disease
detection process [10], which uses a deep-learning approach to detect and classify
leaf diseases in bananas. In particular, the architecture is used as a CNN to clas-
sify data sets containing banana images. The main diseases discussed in this article
are Panama, Moko, Sigatoka, black spot, banana, infectious chlorosis, and banana
streak virus. The prediction time is almost negligible. Practical implementation on
many banana plants is impossible [11]; it examines various UAV platforms, their
limitations, advantages, cameras, sensors, and spectral requirements for capturing
images and acquiring data for plant disease monitoring and detection. RGB cameras
are inexpensive and widely available. RGB photos are less accurate than multispec-
tral or hyper-spectral images because they can only measure three electromagnetic
spectrum bands. UAVs with AI and deep learning are being used to improve crop
disease detection and monitoring precision [13]. They describe various studies of
early detection of plant diseases using deep learning-based automatic image
recognition systems. The problems faced in agricultural IoT are investigated, and
the future development of agricultural IoT is discussed in [14, 15], which presents
an algorithm for image segmentation used for the automatic detection and classifica-
tion of plant leaf diseases. It also surveys different disease classification techniques
used for plant leaf disease detection. Image segmentation is an important part of
disease detection in plant leaf disease. The authors used a genetic algorithm for
image segmentation. A comprehensive discussion on the detection of diseases using
image processing and classification performance is discussed, considering the work
in this domain proposed from 1997 to 2016 [16]. Also, the authors discussed the chal-
lenges and some prospects for future improvements in this domain. The authors in
[18] described the creation of a dataset of plant leaves of tomato, cauliflower, and
mango, and used CNN pre-trained models such as VGG-16 and Inception V3 for
multi-crop disease detection. The authors in [19] used deep belief networks for
autism spectrum disorder classification, and the authors in [20] used a CNN-based
model with three convolution and max-pooling layers, with the number of filters
varying in each layer. The Plant Village dataset is used for experimentation.
3 Proposed Method
This section describes the prototype drone assembly for collecting banana plant
images, dataset preparation using data pre-processing steps, and the selection of
machine learning models or algorithms for banana plant disease detection.
Several steps are followed for the drone assembly, including deciding the
thrust-to-weight ratio, the Navio2 hardware setup, setting up the drone hardware,
Raspberry Pi Wi-Fi configuration, and manual flight control.
3000 images of banana leaves are gathered from banana plants in Jalgaon District
of Maharashtra, India. Care is taken so that almost all of the major diseases found
in India are covered. The dataset has 3 classes: CMV, Sigatoka, and Healthy. The
Healthy and Sigatoka classes were taken from the Banana Leaf Dataset on the
Kaggle website, uploaded by Kaies Al Mahmud (2022) [12]. Some images were
collected from various websites, including Google, blogs, and social networking
software.
After deleting a few blurry photos and images with significant external environ-
mental contamination, the dataset was organized. The dataset was expanded to nearly
3000 images after using data augmentation techniques like horizontal flip, vertical
flip, brightness range, and rotation range.
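Two of the augmentation operations listed above (horizontal and vertical flip) can be sketched on a nested-list image using only the standard library; this is a minimal illustration, and in practice the paper presumably used a deep learning framework's augmentation utilities.

```python
# Minimal sketch of two augmentation operations on a 2-D list "image";
# brightness and rotation ranges would be handled analogously by a framework.

def horizontal_flip(img):
    """Mirror each pixel row left-to-right."""
    return [row[::-1] for row in img]

def vertical_flip(img):
    """Reverse the order of the pixel rows."""
    return img[::-1]

img = [[1, 2], [3, 4]]
print(horizontal_flip(img))  # [[2, 1], [4, 3]]
print(vertical_flip(img))    # [[3, 4], [1, 2]]
```

Applying several such transforms to each of the originally collected photos is what expands the cleaned set to the reported ~3000 images.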
To feed the model, the images had to be standardized and normalized to the required
pixel size, reducing all features to the same scale without altering the differences in
the range of values. Image resizing and rescaling were performed as part of the
pre-processing. Because not all our images are the exact size we require, it is critical
to understand how to resize an image correctly and how resizing works. The pixel
information of an image changes when it is resized. Scaling images is an important
aspect of image processing; high-resolution digital images are commonly downscaled
to fit different display screens or to save on storage and bandwidth costs. After
resizing and rescaling, the final image size is 113×150.
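The resize-and-rescale step can be sketched as follows, assuming a nearest-neighbour policy (the paper does not state which interpolation was used) and only standard-library Python; a framework such as TensorFlow would normally do this.

```python
# Minimal sketch of the pre-processing described above: resize to the target
# 113x150 shape, then rescale pixel intensities to [0, 1].

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel rows."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

def rescale(img, max_val=255.0):
    """Map pixel intensities to the [0, 1] range without changing their ratios."""
    return [[px / max_val for px in row] for row in img]

# Toy 4x4 grayscale "image" resized to the paper's 113x150 target shape.
toy = [[(r * 4 + c) * 17 for c in range(4)] for r in range(4)]
resized = resize_nearest(toy, 113, 150)
normalized = rescale(resized)
print(len(normalized), len(normalized[0]))  # 113 150
```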
Figure 2 shows the flow of banana plant disease detection using deep learning algo-
rithms. The dataset is prepared as explained in Sect. 3.2 and stored on the cloud. It
is split in an 80:20 ratio, so that 80% of the instances are used for training the deep
learning model and 20% for testing the learned model. Finally, the model classifies
unseen banana plant images into healthy and disease classes. Based on the detected
disease, fertigation is recommended to the farmer to take care of the plant. The flow
of the proposed method is explained below.
Algorithm for banana leaf disease detection
1. Start
2. Data collection, Image pre-processing, and dataset preparation for banana leaves
3. CNN Model Selection
4. Dataset splitting into 80:20 ratio for model training and testing
5. Validation and classification into healthy and disease classes by CNN
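Step 4 above (the 80:20 split) can be sketched as follows; the file names, class labels, and seed are illustrative assumptions, not the paper's actual values.

```python
# Minimal sketch of an 80:20 train-test split over the ~3000-image dataset.
import random

def split_dataset(samples, train_frac=0.8, seed=42):
    """Shuffle reproducibly and split into train/test lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# 3000 (image_path, label) pairs standing in for the prepared dataset.
labels = ["Healthy", "CMV", "Sigatoka"]
data = [(f"img_{i}.jpg", labels[i % 3]) for i in range(3000)]
train, test = split_dataset(data)
print(len(train), len(test))  # 2400 600
```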
Fig. 3 a Healthy banana plant image. b Banana plant images with CMV disease and Yellow Sigatoka
disease, respectively
Figure 3a and b show healthy banana plant images and banana plant images with
CMV disease and Yellow Sigatoka disease, respectively, collected from Jalgaon
District, India.
The various deep learning models used for classifying the healthy and disease
images of banana plants are discussed in Sect. 3.5.
To extract complex data from images, a CNN uses a variety of activation and loss functions.
VGG16 is a 16-layer deep convolutional neural network (CNN) architecture. The 16
in VGG16 layers stand for 16 weighted layers. It has thirteen convolutional layers,
five Max Pooling layers, and three dense layers, totaling 21 layers, but only sixteen
weight layers (the learnable parameters layer).
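The layer arithmetic stated above can be checked directly:

```python
# Checking the VGG16 layer counts stated above: 13 convolutional + 3 dense
# layers give the 16 weighted (learnable) layers; adding the 5 max-pooling
# layers, which have no learnable parameters, gives 21 layers in total.
conv_layers, dense_layers, pool_layers = 13, 3, 5
weighted = conv_layers + dense_layers
total = weighted + pool_layers
print(weighted, total)  # 16 21
```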
The size of the input tensor for VGG16 is (224,224,3), where 224×224 is the image
size and 3 is the number of RGB channels. ResNet50 is one of the best CNN
architectures for various computer vision tasks. The ResNet-50 model has five stages,
each with a convolution and identity block. Every convolution block and identity
block has three layers. ResNet-50 has over 23 million trainable parameters. It is
highly efficient for addressing
critical problems such as gradient explosion and vanishing gradient problems. The
strength of ResNet lies in introducing the concept of Skip Connection in the neural
network. ResNet was implemented using transfer learning from Keras. The pre-
trained "imagenet" weights were used with "average" pooling and an input shape
of (113,150,3).
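The skip-connection idea can be illustrated with a toy residual block; this is a numeric sketch under simplified assumptions, not the actual ResNet50 block.

```python
# Toy illustration of a skip (residual) connection: the block computes a
# residual f(x) and adds the input back, y = f(x) + x, so gradients can flow
# through the identity path and mitigate the vanishing-gradient problem.

def residual_block(x, weight=0.5):
    """y = f(x) + x, with f a trivial 'learned' transform for illustration."""
    fx = [weight * v for v in x]      # stand-in for the conv + BN + ReLU layers
    return [a + b for a, b in zip(fx, x)]

x = [2.0, -4.0, 6.0]
print(residual_block(x))  # [3.0, -6.0, 9.0]
```

Even if the residual transform contributes almost nothing, the identity path preserves the input, which is why very deep ResNets remain trainable.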
The model structure is as follows: the first layer is the pre-trained model, followed
by a flattening layer and two dense layers. The activation functions used in the two
dense layers are "ReLU" and "softmax", respectively. An activation function decides
whether a neuron should be activated or not, using simple mathematical operations
to determine whether the neuron's input is essential during prediction. The rectified
linear activation function (ReLU) is piecewise linear: it outputs the input directly if
it is positive, and 0 otherwise. The softmax function is used as the activation function
in the output layer of the neural network; it predicts the probability of each class
being present in an input. The batch size was kept at 32 with an image size of
(113,150). The train, test, and validation split was 8:1:1. Finally, the input shape
was (32,113,150,3). The model was compiled with the 'adam' optimizer,
'SparseCategoricalCrossentropy' loss, and 'accuracy' as the metric, and trained for
10 epochs. The final train, validation, and test accuracies were observed to be 1.0,
1.0, and 1.0, respectively. The model was then tested on a batch of images from the
test dataset, where the maximum confidence observed was nearly 100%. Finally, the
model was tested on unknown images taken with mobile phones, where the observed
accuracy was nearly 85–90%. After performing predictions on all the data provided
to it, the model generates a ".csv" (comma-separated values) file of all the predictions,
which is then processed to identify the predominant disease in the field. Identifying
the predominant disease allows farmers to save on the cost of fertilizers and increase
their profitability per bunch.
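The two activation functions described above can be written out explicitly; this standard-library sketch mirrors what Keras provides built in, and the class ordering in the comment is illustrative.

```python
# ReLU and softmax as described above, implemented with only the standard
# library for illustration.
import math

def relu(x):
    """Output the input directly if it is positive, else 0."""
    return x if x > 0 else 0.0

def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))             # 0.0 3.0
probs = softmax([2.0, 1.0, 0.1])         # one score per class, e.g. Healthy, CMV, Sigatoka
print(round(sum(probs), 6))              # 1.0
```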
The various experiments were carried out on the prepared data. The experimental
settings are described in Sect. 3.6.
The following software was used for the experimentation. The deep learning model
was built in the Google Colab environment using the Python programming language
(Python 3). The open-source neural network modeling framework used is TensorFlow
v2.4.1 [17], with Google Drive (100 GB) storage for the raw and labeled dataset,
PyCharm for building the FastAPI service, Postman for running the FastAPI, and
Mission Planner, Putty, QGroundControl, and VNC Viewer. The data was
collected in various lighting conditions of the day with the help of a Raspberry Pi
camera. Some of the images were collected with mobile phone cameras. The model
was trained on a total of 3000 images of shape (113,150,3). Then, the model was
cross-verified by testing it on images unknown to the model and collected from the
banana fields. The model predicts the images in this format (correct label, predicted
label) so that we can cross-check each result.
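The cross-checking of (correct label, predicted label) pairs and the identification of the predominant field disease can be sketched as follows; the pair values below are made up for illustration.

```python
# Cross-check each prediction against its true label and find the disease
# predicted most often across the field, as described above.
from collections import Counter

# (correct_label, predicted_label) pairs in the format the model emits;
# these values are illustrative, not the paper's actual predictions.
pairs = [
    ("CMV", "CMV"), ("CMV", "CMV"), ("Sigatoka", "CMV"),
    ("Healthy", "Healthy"), ("CMV", "CMV"), ("Sigatoka", "Sigatoka"),
]

accuracy = sum(c == p for c, p in pairs) / len(pairs)
predominant = Counter(p for _, p in pairs).most_common(1)[0][0]
print(f"{accuracy:.2f}", predominant)  # 0.83 CMV
```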
4 Results
Recall is the ratio of correctly predicted positive instances to all actual instances of
the given class.
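The precision, recall, and F1 score reported in Table 1 are standard quantities computed from confusion-matrix counts; the counts in this sketch are illustrative, not the paper's.

```python
# Precision, recall, and F1 from per-class confusion-matrix counts:
# tp = true positives, fp = false positives, fn = false negatives.

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # correctly predicted positives / all actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```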
Table 1 Comparison of evaluation metrics for the different models

Model name    Train Acc  Test Acc  Val Acc  Precision  Recall  F1 score
CNN           98.4       97.8      96.5     0.473      0.474   0.472
CNN_altered   98.6       100       100      0.354      0.361   0.345
ResNet50      100        100       100      0.328      0.334   0.325
VGG16         34.9       34.0      34.0     –          –       –
Fig. 4 Graph of RESNET50 for epoch versus training and validation accuracy and loss
Fig. 5 Graph of VGG16 for epoch versus training and validation accuracy and loss
Fig. 6 Graph of CNN for epoch versus training and validation accuracy and loss
Fig. 7 Graph of modified CNN for epoch versus training and validation accuracy and loss
5 Conclusion
Early detection of diseases has become essential to match the growing per capita
consumption of bananas. CMV (Cucumber Mosaic Virus) has been devastating in
India for the last 4–5 years. Farmers must throw away nearly 25–30% of their plan-
tations each year due to CMV. Keeping the banana plants healthy throughout the
growth season is crucial for increasing the average weight per bunch. We have built
the technology with the help of deep learning to address the problem of early detec-
tion of diseases in banana plants. We experimented with models such as ResNet50,
VGG16, self-designed CNN, and CNN with an altered train-test-validation ratio,
and we concluded that ResNet50 outperformed all the other models, giving training,
testing, and validation accuracies of nearly 1.0, 1.0, and 1.0. When tested on a batch
of images from the test dataset, all the images were predicted correctly, with nearly
100% confidence on most of them. To validate the model
for farmers, we have collected some images of banana leaves from banana fields.
ResNet50 gave an accuracy of nearly 88% on those images, too. With these excel-
lent accuracies, our model will be very beneficial for farmers in the early predic-
tion of diseases. Early disease detection increases the average weight per banana
bunch, eventually increasing profitability per bunch. No research has ever been able
to forecast the dominant disease in the entire banana field, but we can predict the
predominant diseases with a good accuracy of nearly 88%. Our vision is to increase
per capita banana production by 3–5 times by 2050, and we call it DREAM2050.
6 Discussion
From Table 1 in the results section, the proposed ResNet50 and altered-CNN models
perform better on the different classification parameters, such as accuracy, precision,
recall, and F1-score. We have implemented various
models, such as VGG16, ResNet50, self-designed CNN, and CNN with a different
train-test-validation split ratio. Epochs versus accuracy (training and validation) and
epochs versus loss have been plotted for each model. A confusion matrix of all the
models has been drawn. VGG16 gives the least accurate results, while ResNet50
and our self-designed CNN give almost the same and best accuracy. The CNN with
an altered train-test-validation split ratio also gives good classification accuracy.
After studying all the plots, we found that the plot of VGG16 was very skewed. In
contrast, the plot of CNN was considerably better, and its curve never became flat
over the epochs, meaning the model kept learning until the last epoch with no sign
of overfitting or underfitting. The plot of the CNN with an altered train-test split
ratio was skewed and hence appeared overfitted, resembling VGG16. ResNet gave
the best plot over an increasing number of epochs, although its learning curve became
flat after some epochs, reflecting that the model stopped extracting new features.
References
1. Shafi U, Mumtaz R, Iqbal N, Zaidi SMH, Zaidi SAR, Hussain I, Mahmood Z (2020) A multi-
modal approach for crop health mapping using low altitude remote sensing, internet of things
(IoT) and e learning. IEEE Access. https://doi.org/10.1109/ACCESS.2020.3002948
2. Saha AK, Saha J, Ray R, Sircar S, Dutta S, Chattopadhyay SP, Saha HN (2018) IOT-based drone
for improvement of crop quality in agricultural field. In: 2018 IEEE 8th annual computing and
communication workshop and conference (CCWC)
3. Surya Prabha D, Satheesh Kumar J (2014) Study on banana leaf disease identification using
image processing methods. Int J Res Comput Sci Inf Technol 2(2(A))
4. Gandhi R, Nimbalkar S, Yelamanchili N, Ponkshe S (2018) Plant disease detection using
CNNs and GANs as an augmentative approach. In: IEEE international conference on innovative
research and development
Insect Management in Crops Using Deep Learning

Abstract For efficient pest and crop protection management in agriculture, fast
and precise insect pest identification is essential. Manual examination, which can
be time-consuming and subject to human mistakes, is frequently used in traditional
procedures. This article proposes a novel method for automated insect pest detec-
tion in agricultural photography using deep learning techniques, notably Convolu-
tional Neural Networks (CNNs). Preprocessing was done on a varied data set that
included high-resolution pictures of both healthy and pest-infested plants. The care-
fully planned CNN architecture included multiple convolutional and pooling layers
to extract pertinent characteristics from the images. Data augmentation approaches
were used to improve the model’s ability to generalize across various environmental
situations. A rigorous cross-validation process was used to train and assess the model,
which resulted in an amazing classification accuracy of about 89.6%. Metrics like
precision, recall, and F1-score showed that the system performed better at iden-
tifying both true positives and negatives. A pre-trained model’s performance was
also enhanced by the incorporation of transfer learning, which sped up the training
process. The findings of this study demonstrate the potential of CNN-based strategies
to transform agricultural pest management techniques. The created model provides
a scalable and effective approach for early pest detection, which can eventually
reduce production loss and minimize environmental impact related to traditional
pest treatment methods.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 363
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_28
364 S. Anilkumar et al.
1 Introduction
The Dangerous Farm Insect data set comprises 15 distinct classes of insects that pose
significant threats to agricultural practices and crop production. Each class represents
a specific type of insect, meticulously labeled to facilitate accurate selection and
identification. The primary objective of this dataset is to enable the development
of a robust system capable of effectively detecting and classifying these insects.
Implementing such a system holds immense potential for pest control, benefiting
both residential areas and farms alike. By mitigating the detrimental effects caused
by these insects, the system safeguards crop quality, improves farmers’ earnings, and
positively impacts human health and the overall economy.
1. Africanized Honey Bees, also known as Killer Bees: These bees are extremely
aggressive and may be dangerous to people and animals. When people come into
contact with them, their protective nature frequently causes fatalities or extremely
severe allergic reactions.
2. Aphids: Tiny insects that feed on sap and infest various plants. They seriously
harm the plant, resulting in stunted growth, twisted leaves, and decreased crop yield
by draining its sap.
3. Army worms: Armyworms are omnivorous caterpillars primarily feeding on
grasses and cereal crops. Their fast feeding can destroy large tracts of crops, resulting
in a significant loss of yield.
4. Brown Marmorated Stink Bugs: By puncturing fruits, vegetables, and other
plant parts with their mouthparts, the invasive brown marmorated stink bugs harm
various crops, producing blemishes and rot; due to this feeding behavior, the produce
loses value in the market.
5. Cabbage Loopers: These pests cause havoc by feeding on the leaves of cruciferous
vegetables like broccoli, cabbage, and other varieties. Their feeding activity causes
defoliation, which lowers crop yield and quality.
6. Citrus Canker: A bacterial disease that causes lesions on leaves, fruit, and stems,
citrus canker is a problem that citrus trees face. Early fruit drop, lower yield, and
weakened tree health affect citrus growers and the industry.
Insect Management in Crops Using Deep Learning 365
2 Literature Survey
The impact of plant diseases and pests on agricultural productivity and quality is
substantial, necessitating the development of efficient diagnostic techniques. Tradi-
tional methods compare unfavorably with the advances made possible by deep
learning in digital image processing. The paper outlines the advantages and
disadvantages of each deep learning element, classifying new research into three
categories: segmentation networks, detection networks, and classification networks.
For comparison study, common datasets and performance measures are presented.
The article discusses the practical difficulties in using deep learning to detect pests
and plant diseases and suggests future research areas and possible solutions. Finally,
the analysis projects future developments in this dynamic industry [1].
Insect pest detection and monitoring system developments in precision agricul-
ture are reviewed in this study. For early identification, it highlights the usage of
acoustic sensors, infrared sensors, and image-based categorization. Current appli-
cations, methods, and advancements involving machine learning and the Internet of
Things are discussed. Future directions for pest control decision support systems and
automated traps are also considered [2].
For early treatment and reducing financial losses in agriculture, the study describes
a deep learning-based method for identifying pests and illnesses in tomato plants. The
accuracy of three feature extractors combined with deep learning meta-architectures
is assessed. Improved annotation and data augmentation techniques are presented in
the study, which improves training efficiency. After extensive testing on a large data
set, the system successfully identifies nine different plant diseases and pests, demon-
strating its capacity to handle challenging conditions in the environment around a
plant [3].
An Anchor-Free Region Convolutional Neural Network (AF-RCNN) for accurate
identification and categorization of 24 kinds of agricultural pests is presented in this
paper. Using a 20 k-image data set, the approach—which combines a feature fusion
module and an anchor-free region proposal network (AFRPN)—performs better than
conventional techniques, with 56.4% mean average precision and 85.1% mean recall.
Compared to Faster R-CNN and YOLO detectors, the AF-RCNN performs better,
offering higher accuracy and real-time detection capabilities (0.07 s per image). It has
been determined that the suggested approach is efficient and suitable for intelligent,
real-time agricultural pest detection [4].
The study presents an insect pest detection method that classifies nine and twenty-
four insect classes using machine learning approaches such as convolutional neural
networks and artificial neural networks. The approach gets good classification rates
(91.5% and 90%) when it is applied to datasets by Wang, Xie, Deng, and IP102. It
works better than conventional techniques, showing faster computation times and
increased accuracy. The suggested method shows potential for identifying insects at
an early stage, which will improve crop quality and productivity in agriculture [5].
The study examines current developments in plant disease detection, focusing on
the application of machine learning and image processing methods, especially with
RGB images for economy. There has been a noticeable shift towards deep learning,
as evidenced by the high recognition accuracy reported in controlled settings. The
paper uses various CNN architectures to present experimental results on leaf disease
recognition and provides recommendations for deployment in traditional and mobile
/embedded computing environments. The difficulties of increasing workable auto-
matic plant disease detection systems for field settings are examined, emphasizing
the need for more investigation and solutions [6].
The paper reviews recent advancements in plant disease detection, emphasizing
the use of image processing and machine learning techniques, particularly with
RGB images for cost-effectiveness. The shift towards deep learning is noted, with
high recognition accuracies reported in controlled environments. The paper presents
experimental results on leaf disease recognition using different CNN architectures
and offers recommendations for deployment in conventional and mobile/embedded
computing environments. Challenges in developing practical automatic plant disease
recognition systems for field conditions are discussed, highlighting the need for
further research and resolution [7].
In agriculture, pests reduce crop productivity. Pest identification is a difficult process that depends on professional judgment, and many efforts are now being made to detect pests automatically. The development of object detection architectures
in Deep Learning makes it feasible. This paper compares the accuracy performance of
image augmentation with a focus on small data sets and demonstrates multi-class pest
detection using Faster R-CNN architecture. To address the issue of class imbalance,
we have employed 90-degree rotation augmentation parameters and horizontal flip.
Using Faster R-CNN architecture, we discovered that a trained pest detection model
with augmentation options can outperform the others with an accuracy of 91.02%
[8].
The primary subjects of the study are the distribution of plant biophysical and biochemical parameters, spatial structure and variability in the UAV and Sentinel-2 imagery, and comparison of transect line information from UAVs and Sentinel-2 [9].
To produce distinct crop map outputs, WRASIS was utilized in the paper to describe
crop patterns, add crop spectral signatures to software libraries, and identify different
crop types. Six cutting-edge convolutional neural networks are merged and optimized
for the proposed models. After that, each model is evaluated independently and in
combination for the given task.
In the end, a support vector machine (SVM) classifier is used to evaluate how
well various combinations obtained from the recommended models perform. We
gathered the Turkey-Plant Dataset, a collection of unconstrained photos of 15 distinct
diseases and pest kinds that were documented in Turkey, to confirm the accuracy of
the suggested model. The highest accuracy score obtained is 97.56%, while the majority voting ensemble model and the early fusion ensemble model achieve 96.83%. The findings
show that the suggested models match or surpass state-of-the-art outcomes for this
problem-solving approach.
Insect Management in Crops Using Deep Learning 367
3 Proposed Methodology
Preparing the data is the foundational step in this research architecture. Thorough
cleaning and refining of the dataset is required to ensure no extraneous or unwanted
photographs are present. This process is crucial for several reasons. It initially elim-
inates noisy data to ensure the model is trained on a high-quality dataset, which is
required to obtain correct results. Second, data preparation standardizes the infor-
mation and prepares it for analysis. One may employ scaling, noise reduction,
and contrast enhancement methods. Moreover, data augmentation can increase the
dataset’s diversity, which is crucial for picture-based tasks since it exposes the model
to a greater range of image variants, improving its ability to generalize.
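The augmentation operations mentioned later in the paper (90-degree rotation and horizontal flip) can be illustrated at the pixel level in pure Python; this is a sketch of the transformations themselves, not the authors' pipeline.

```python
# Illustrative sketch (not the authors' code): two common augmentations,
# applied to an image stored as a nested list of pixel rows.

def rotate_90(image):
    """Rotate an H x W pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def horizontal_flip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]

img = [[1, 2],
       [3, 4]]
print(rotate_90(img))        # [[3, 1], [4, 2]]
print(horizontal_flip(img))  # [[2, 1], [4, 3]]
```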
After preprocessing has prepared the dataset, the next important step is to produce
an annotated dataset. Annotation is the process of meticulously classifying data,
essential to supervised learning. Labeling items or areas of interest within the photos
in the context of the image data is a common step in this procedure. Constructing
an annotated dataset demands accuracy and knowledge to ensure label accuracy.
These annotations serve as the model’s points of reference as it learns. Therefore,
it takes a lot of work to create a reliable annotated dataset, but it is necessary for
training machine learning models, particularly for tasks like image recognition and
classification.
We split the dataset into training, test, and validation subsets, with 70 percent of the data used for training, 20 percent for testing, and 10 percent for validation.
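The 70/20/10 split described above can be sketched as follows; the seeding and the use of index shuffling are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of a 70/20/10 train/test/validation split (illustrative, not the
# authors' code). Shuffling before slicing avoids ordering bias.
import random

def split_dataset(items, seed=42):
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(0.7 * n)
    n_test = int(0.2 * n)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]      # remaining ~10%
    return train, test, val

train, test, val = split_dataset(range(1000))
print(len(train), len(test), len(val))  # 700 200 100
```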
The architecture diagram emphasizes how machine learning models are developed iteratively. If the initial training fails to accomplish the desired result, like insect
pest detection, more training rounds are applied to the model.
Throughout these iterations, the annotated dataset may grow, the model archi-
tecture may be modified, or the hyperparameters may be tweaked. This iterative
approach recognizes that machine learning is a dynamic and adaptive process that
requires resilience and adaptation to yield optimal outcomes. It is necessary to be
flexible and open to revision when working to enhance a model.
An example of a deep learning model specifically created for processing struc-
tured grid data, like images, is the convolutional neural network (CNN) architecture
(Fig. 1). CNNs are multilayer neural networks that automatically and adaptively
learn hierarchical representations of input data through convolutional layers. The
basic building blocks are convolutional layers, pooling layers, and fully connected
layers. Convolutional layers utilize filters to identify spatial feature hierarchies within
the input data, whereas pooling layers minimize dimensionality while maintaining
crucial information. As a classifier, the fully connected layers combine the acquired
features to generate predictions. Due to their ability to automatically extract perti-
nent features from raw data, CNNs are widely used in image classification, object
detection, and other computer vision tasks. They are particularly good at capturing
spatial dependencies. The adaptive capability of the architecture is ideal for tasks
where local patterns contribute to a global understanding of the input because of its
hierarchical representations.
An architecture of a convolutional neural network (CNN) is shown in the image supplied. Conv2D layers are used for convolution, MaxPooling2D layers for downsampling, Flatten layers to flatten the output, Dense layers for fully connected layers, and Dropout layers for regularization to prevent overfitting. These layers make up the model's sequential structure.
The model has a size of 14.49 MB and 3,798,089 total trainable parameters. The design appears appropriate for image classification tasks, and the Dropout layers indicate efforts to improve the model's generalization performance.
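Parameter totals such as the 3,798,089 quoted above are the sum of per-layer counts; a minimal sketch of how Conv2D and Dense layer parameters are counted (the layer shapes here are illustrative assumptions, not the paper's actual configuration):

```python
# How trainable-parameter counts are computed for the layer types listed above.
# Layer shapes below are illustrative, NOT the paper's actual configuration.

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter holds a kernel_h x kernel_w x in_channels weight block plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # One weight per input-output pair plus one bias per output unit.
    return (in_units + 1) * out_units

print(conv2d_params(3, 3, 3, 32))  # 896
print(dense_params(128, 10))       # 1290
```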
An enhanced convolutional neural network architecture called EfficientNetB3
was created for effective and efficient image classification applications. It was
first presented as a member of the EfficientNet family and is a balanced, scal-
able model that performs exceptionally well on a range of computing resources
(Fig. 2). Compound scaling is the main innovation, which involves methodically
increasing the model's depth, width, and resolution to achieve the ideal balance.
have applications in numerous other domains where data-driven solutions are essen-
tial. By heeding these suggestions, scholars can increase the likelihood of creating
accurate and efficient machine-learning models.
There are 75,000 images and 27 pest types in the IP102 dataset, with over 500 photos for each pest type. We also investigated another dataset, known as the pest dataset, which has 3,000 photos covering 9 distinct types of pests, each with numerous photographs. We train the model with both the IP102 and pest datasets and test it with the test data to make it accurate and reliable.
On the other hand, validation accuracy assesses how well the model performs
on a distinct dataset that it hasn’t encountered during training, commonly called the
validation dataset. This dataset shows how well the model performs in real-world
applications. It’s critical to attain a high validation accuracy because it demonstrates
the model’s accuracy in predicting new, unobserved data. A sizable difference in
training and validation accuracy could mean that the model is overfitting the training
data and may have generalization problems (Fig. 4).
The graph in Fig. 4 compares the Convolutional Neural Network (CNN) and EfficientNetB3 algorithms for image classification. The
CNN model performs best, with an impressive accuracy of 89.64%. The CNN loss
graph that goes with it shows how well the model converges and how well it can
reduce prediction errors during training. By comparison, the EfficientNetB3 algo-
rithm, which aims to maximize resource utilization, attains a slightly higher loss
and a lower accuracy of 63.76%. The CNN model outperforms the EfficientNetB3
model in terms of accuracy because it is better at identifying complex patterns in the
dataset. Specific application requirements and the trade-off between computational
efficiency and accuracy should be considered when selecting one of these models.
When both the validation and training accuracy are close to high, the model has
successfully learned from the training set while retaining the capacity to generalize to
new, unseen data. A major obstacle to balancing these two metrics is the development
of machine learning models. It may be necessary to adjust the model’s complexity,
regularization techniques, and hyperparameter tuning to achieve this balance and
ensure the best outcomes of the model’s capacity.
In conclusion, comparing the accuracy of a machine learning model’s training
and validation is necessary to assess the model’s quality and generalizability. It
helps identify potential overfitting issues and guides the model’s fine-tuning to yield
accurate and dependable results with new data.
1. Accuracy: Accuracy is the proportion of correct predictions out of all predictions made:

Accuracy = (TP + TN) / (TP + FP + TN + FN)
2. Precision: Precision is the proportion of correct positive predictions among all positive predictions made by the model. It is computed by dividing the number of true positives by the sum of true positives and false positives.
Precision = TP / (TP + FP)
3. Recall: Recall is the proportion of actual positive cases that the model correctly identifies. It is computed by dividing the number of true positives by the sum of true positives and false negatives.
Recall = TP / (TP + FN)
4. F1-score: The F1-score is the harmonic mean of precision and recall. It is a balanced measure that accounts for both metrics.
F1 = (2 * Precision * Recall) / (Precision + Recall)
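The four metrics above can be computed directly from confusion-matrix counts; the counts in the example are made up for illustration.

```python
# The four evaluation metrics above, computed from confusion-matrix counts.

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Example counts (hypothetical): 90 TP, 10 FP, 85 TN, 15 FN.
print(accuracy(90, 10, 85, 15))  # 0.875
print(precision(90, 10))         # 0.9
```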
Another important metric is computational time efficiency. The goal of our method
is computational simplicity, which is apparent when contrasting it with other cutting-
edge methods. The computation time for our CNN model is shown in Table 2.
5 Conclusion
Finally, with an accuracy of 63.76% for EfficientNetB3 and 89.64% for CNN, combining the EfficientNetB3 algorithm and a Convolutional Neural Network (CNN) architecture has demonstrated promising results in identifying insect pests.
Given its higher accuracy, which suggests it can automatically extract complex infor-
mation from image data, CNN is a good choice for insect identification. Despite
its limitations in accuracy, EfficientNetB3's computational efficiency could make
it valuable in resource-constrained scenarios. However, further optimization and
research into different models and datasets is required to enhance and improve insect
pest detection systems.
References
1. Liu J, Wang X (2021) Plant diseases and pests detection based on deep learning: a review. Plant Methods 17(1):22
2. Jiang Q et al (2020) Automatic identification of insect pests on winter wheat using multispectral imagery. In: 2020 IEEE international geoscience and remote sensing symposium (IGARSS), Waikoloa
3. Fuentes A, Yoon S, Kim S, Park D (2017) A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 17(9):2022. https://doi.org/10.3390/s17092022
4. Jiao L, Dong S, Zhang S, Xie C, Wang H (2020) AF-RCNN: an anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput Electron Agricult 174:105522. https://doi.org/10.1016/j.compag.2020.105522
5. Kasinathan T, Singaraju D, Uyyala SR (2021) Insect classification and detection in field crops using modern machine learning techniques. Inf Process Agricult 8(3):446–457. https://doi.org/10.1016/j.inpa.2020.09.006
6. Li R, Jia X, Hu M, Zhou M, Li D, Liu W, Wang R, Zhang J, Xie C, Liu L, Wang F, Chen H, Chen T, Hu H (2019) An effective data augmentation strategy for CNN-based pest localization and recognition in the field. IEEE Access 7:160274–160283. https://doi.org/10.1109/access.2019.2949852
7. Ngugi LC, Abdel Wahab M, Abo-Zahhad M (2020) Recent advances in image processing techniques for automated leaf pest and disease recognition – a review. Inf Process Agricult. https://doi.org/10.1016/j.inpa.2020.04.004
Insect Management in Crops Using Deep Learning 375
8. Patel D, Bhatt N (2021) Improved accuracy of pest detection using augmentation approach with
Faster R-CNN. IOP Conf Ser: Mater Sci Engin 1042(1):012020. https://doi.org/10.1088/1757-
899x/1042/1/012020
9. Wang R, Chen P, Yang P (2023) Deep learning in crop diseases and insect pests
An Intra-Slice Security Approach
with Chaos-Based Stream Ciphers for 5G
Networks
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 377
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_29
378 V. Vijayan et al.
1 Introduction
2 Related Works
In 2018, the authors of [2] suggested a new security solution for the 5G intra-slice domain, where the proposed solution employs a lightweight pseudo-random number
generator to obtain the keystream. These key streams are used in stream ciphers
to protect the user’s sensitive data and also the meta-information which contains
data size, encryption scheme, etc. The results show that the proposed solution is
lightweight and can be implemented in 5G base stations and resource-constrained
devices. In [3], chaotic-based security techniques are proposed to tackle privacy
challenges in upcoming 5G systems. Specifically, the authors employ chaotic signals
from Lorenz dynamics to create three binary flows, which are then utilized to cipher
and mask private information. This strategy aims to safeguard private data exchanges
between personal devices and 5G base stations. To improve privacy, they combined
encryption techniques and Code Division Multiple Access (CDMA). Additionally,
the implementation showcases the practicality of the solution, leveraging reduced
resource microcontrollers and devices.
In [11], the authors propose a lightweight stream cipher for low-computational devices such as RFID and WSN. The proposed algorithm is a combination of
two generators, the SG and SSG, with the combinations of LFSR and FCSR. The
authors also provide examples of three cascade ideas that can be used with the SG
and SSG family of ciphers. The authors in [4] proposed a key scheme for network
slicing in systems that enable secure access for third-party monitoring applications,
subject to consent from the network devices. They follow the triple-way handshaking
process. It removes distortions due to time delays, multipath transmissions, and node
mobility of 5G systems.
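The shrinking generator (SG) construction from [11], in which one LFSR's output decides whether a second LFSR's bit is kept, can be sketched in a few lines; the register lengths, taps, and seeds below are toy values, not the cipher from the cited work.

```python
# Minimal shrinking-generator sketch (toy parameters, NOT the cipher in [11]):
# LFSR A decides whether the bit produced by LFSR S enters the keystream.

def lfsr(state, taps):
    """Infinite bit stream from a Fibonacci LFSR; state is a list of bits."""
    state = list(state)
    while True:
        out = state[-1]
        fb = 0
        for t in taps:
            fb ^= state[t]          # XOR the tapped positions to get feedback
        state = [fb] + state[:-1]   # shift right, insert feedback bit
        yield out

def shrinking_generator(a_bits, s_bits, n):
    out = []
    for a, s in zip(a_bits, s_bits):
        if a == 1:                  # keep the S bit only when the A bit is 1
            out.append(s)
        if len(out) == n:
            break
    return out

A = lfsr([1, 0, 1, 1], taps=[0, 1])   # toy 4-bit selector LFSR
S = lfsr([0, 1, 1, 1], taps=[0, 2])   # toy 4-bit source LFSR
keystream = shrinking_generator(A, S, 8)
print(keystream)
```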
The proposed scheme in [5] addresses the scenario of communication between
third-party applications and slices. It employs Shamir’s secret sharing for distribut-
ing and reconstructing private key shares and utilizes the ElGamal cryptosystem
for interval-key encryption and decryption. This approach leverages multiple secret
shares to reconstruct the key generated. It is adaptable for a group of users with
user consent, implicitly assuring privacy protection for monitored users. The authors
in [9] propose a study on network slicing technology. For that, they have collected
information from various online resources and investigated the security concerns
that were introduced in 5G networks. They then identify major challenges to network slicing and discuss mitigation strategies for those challenges, including isolation through slicing, authentication, and cryptography. These findings help to
understand the scope of security in 5G networks.
Very few existing works fall within the precise domain of encryption schemes for 5G intra-slice security; the closest is [2]. All the above-mentioned schemes are referenced in the context of 5G security and encryption methods for resource-constrained devices. Our work addresses the same problem as [2] but proposes a novel approach. The proposed work implements a novel encryption scheme for 5G intra-slice network security. It combines
the 2D-LSCM chaotic map for key stream generation with the lightweight stream
cipher ChaCha20 for encryption. This hybrid approach aims to enhance efficiency,
particularly in resource-constrained domains such as IoT.
3 Methodology
The proposed chaotic encryption scheme uses the 2D-LSCM for keystream generation and the ChaCha20 stream cipher for encryption. Images of different sizes are considered for encryption. The chaotic map and the stream cipher used for keystream generation and encryption are explained in the section below.
Here x_i and y_i are the values of sequences x and y at iteration i, x_{i+1} and y_{i+1} are the values of sequences x and y at iteration i + 1, and θ is the control parameter, θ ∈ [0, 1].
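Keystream generation with the 2D-LSCM can be sketched as below. The iteration follows the logistic–sine coupling form defined in the 2D-LSCM reference [6] (the exact coupling order, and the float-to-byte quantization used here, should be read as assumptions rather than the authors' exact scheme).

```python
# Sketch of 2D-LSCM keystream generation. The map form follows [6];
# the float-to-byte conversion is an illustrative assumption.
import math

def lscm_keystream(x, y, theta, n_bytes):
    out = bytearray()
    while len(out) < n_bytes:
        # Coupled logistic-sine iteration (2D-LSCM form, per [6]).
        x_new = math.sin(math.pi * (4 * theta * x * (1 - x) + (1 - theta) * math.sin(math.pi * y)))
        y = math.sin(math.pi * (4 * theta * y * (1 - y) + (1 - theta) * math.sin(math.pi * x)))
        x = x_new
        # Quantize the chaotic value into one byte (illustrative scheme).
        out.append(int(abs(x) * 10**8) % 256)
    return bytes(out)

key = lscm_keystream(x=0.3, y=0.7, theta=0.5, n_bytes=32)  # 256-bit key material
print(len(key))  # 32
```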
ChaCha20 Stream Cipher: ChaCha20 has gained recognition as one of the fastest
stream cipher algorithms employed for securing sensitive data [18]. The ChaCha20
block function employs a sequence of 10 "double rounds". Each double round consists of a "column round" and a "diagonal round", which operate on the columns and diagonals of the state, respectively. This alternation results in a total of 20 rounds, or equivalently, 80 individual quarter rounds. The ciphertext is obtained by XORing the pseudo-random keystream with the plaintext, so the output of the ChaCha20 stream cipher is the ciphertext or cipher image.
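The quarter round underlying those 80 operations is small enough to show in full; the sketch below follows RFC 8439 and is checked against the RFC's quarter-round test vector.

```python
# The ChaCha20 quarter round (per RFC 8439), the primitive behind the
# 20 rounds described above.

MASK = 0xFFFFFFFF

def rotl(v, n):
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) | (v >> (32 - n))) & MASK

def quarter_round(a, b, c, d):
    a = (a + b) & MASK; d = rotl(d ^ a, 16)
    c = (c + d) & MASK; b = rotl(b ^ c, 12)
    a = (a + b) & MASK; d = rotl(d ^ a, 8)
    c = (c + d) & MASK; b = rotl(b ^ c, 7)
    return a, b, c, d

# RFC 8439 Section 2.1.1 test vector:
print([hex(v) for v in quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)])
# ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```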
Figure 1 illustrates the block diagram of the proposed encryption scheme. Initially,
the 2D-LSCM generates a random keystream using predefined parameter values.
Subsequently, this randomly generated keystream is converted into bytes and utilized
as the key for the ChaCha20 stream cipher. Alongside a 16-byte nonce value, the
ChaCha20 stream cipher encrypts the plain image, employing the random keystream
generated earlier to obtain the cipher image.
Assessing image encryption algorithms is crucial due to large data sizes and process-
ing demands. Balancing high security with practical application needs is key [19].
Efficiency in execution time and memory usage, especially in resource-constrained
scenarios, is critical. Our performance analysis compared our proposed algorithm’s
Table 1 Comparison of end-to-end time for different image sizes (in seconds)
Image size 2D-LSCM Baker’s Map Arnold’s cat map
128 × 128 0.2524 0.3335 19.0264
256 × 256 0.2937 0.3554 19.0824
512 × 512 0.3414 0.4178 19.1575
1024 × 1024 0.5293 0.6650 19.4914
Table 2 Comparison of end-to-end memory usage for different image sizes (in MB)
Image size 2D-LSCM Baker’s map Arnold’s cat map
128 × 128 8.3086 38.5369 84.9809
256 × 256 18.3873 39.2517 113.0825
512 × 512 20.8947 62.1937 161.9692
1024 × 1024 47.2392 105.2595 176.2622
efficiency with Baker’s Map and Arnold’s Cat Map for key stream generation,
enabling direct comparisons.
Specifications for the analysis include Windows 11 OS with 16 GB RAM and a Core i5 11th Gen processor. The simulation tools and languages used are Python 3.12 and MATLAB R2023a.
Time Analysis: We analyze the proposed algorithm’s performance by calculating
its execution time across different image sizes, aiming to evaluate its effectiveness in
image encryption and decryption. The overall runtime is compared with two alter-
native encryption schemes using the chaotic maps—Baker’s map and Arnold’s cat
map for key stream generation as in Table 1, illustrating a notable advantage in speed
for our proposed method.
Memory Analysis: Memory analysis quantifies the proposed algorithm’s mem-
ory consumption during image encryption and decryption [17]. Table 2 juxtaposes
overall memory usage across different image sizes, showing our 2D-LSCM-based
scheme’s efficiency compared to alternative chaotic maps, even with increased
memory demands as image size grows.
Table 3 Percentage of matched pixels for different chaotic maps and image sizes
Chaotic maps Image size Percentage of matched pixels
2D-LSCM 128 × 128 35.19%
256 × 256 41.50%
512 × 512 38.19%
1024 × 1024 39.72%
Baker's map 128 × 128 42.93%
256 × 256 37.33%
512 × 512 39.16%
1024 × 1024 39.38%
Arnold's cat map 128 × 128 43.74%
256 × 256 39.88%
512 × 512 40.41%
1024 × 1024 39.79%
Table 4 Comparison of NPCR (Number of Pixel Changing Rate) for Different Chaotic Maps and
Image Pixel Sizes
Image pixel size Channel 2D-LSCM Baker’s map Arnold’s cat map
NPCR Avg. NPCR NPCR Avg. NPCR NPCR Avg. NPCR
128 × 128 R 99.60 99.65 99.60
G 99.57 99.60 99.56 99.61 99.58 99.61
B 99.64 99.62 99.64
256 × 256 R 99.58 99.60 99.58
G 99.61 99.62 99.58 99.60 99.58 99.57
B 99.66 99.62 99.54
512 × 512 R 99.61 99.62 99.62
G 99.58 99.60 99.63 99.62 99.62 99.62
B 99.61 99.61 99.62
1024 × 1024 R 99.61 99.60 99.61
G 99.61 99.61 99.62 99.61 99.60 99.61
B 99.61 99.62 99.61
Table 5 Comparison of UACI (Unified average changing intensity) for different chaotic maps and
image pixel sizes
Image size Channel 2D-LSCM Baker’s map Arnold’s cat map
UACI Avg. UACI UACI Avg. UACI UACI Avg. UACI
128 × 128 R 33.69 33.47 33.35
G 33.48 33.55 33.62 33.49 33.62 33.31
B 33.49 33.39 33.08
256 × 256 R 33.40 33.58 33.53
G 33.46 33.45 33.41 33.53 33.38 33.48
B 33.50 33.60 33.53
512 × 512 R 33.50 33.55 33.50
G 33.44 33.47 33.44 33.51 33.40 33.53
B 33.45 33.53 33.44
1024 × 1024 R 33.42 33.45 33.47
G 33.47 33.45 33.51 33.47 33.44 33.51
B 33.46 33.51 33.48
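The NPCR and UACI values in Tables 4 and 5 follow the standard definitions: NPCR is the percentage of pixel positions whose values differ between two cipher images, and UACI is the mean absolute intensity difference normalized by the 255 peak. A pure-Python sketch:

```python
# Standard NPCR and UACI definitions behind Tables 4 and 5; c1 and c2 are
# two cipher images (nested lists of 8-bit pixel values) of the same size.

def npcr(c1, c2):
    h, w = len(c1), len(c1[0])
    diff = sum(1 for i in range(h) for j in range(w) if c1[i][j] != c2[i][j])
    return 100.0 * diff / (h * w)

def uaci(c1, c2):
    h, w = len(c1), len(c1[0])
    total = sum(abs(c1[i][j] - c2[i][j]) for i in range(h) for j in range(w))
    return 100.0 * total / (255 * h * w)

a = [[0, 0], [0, 0]]
b = [[255, 0], [0, 0]]  # one of four pixels changed, by the maximum amount
print(npcr(a, b), uaci(a, b))  # 25.0 25.0
```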
particularly in 8-bit pixel images, enhances security. Table 6 reveals higher entropy
values for the three schemes, indicating a more uniform distribution in encrypted
images.
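The entropy values in Table 6 are Shannon entropies of the byte distribution in each channel; values approaching the 8-bit maximum indicate a near-uniform distribution. A minimal sketch:

```python
# Shannon entropy of one image channel, the quantity reported in Table 6.
# A value near 8 bits means the byte histogram is nearly uniform.
import math
from collections import Counter

def channel_entropy(pixels):
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A perfectly uniform 8-bit channel reaches the 8-bit maximum:
print(channel_entropy(list(range(256))))  # 8.0
```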
Fig. 2 Histogram; a. Plain image b. Encrypted image c. Histogram of plain image d. Histogram
of encrypted image (2D-LSCM) e. Histogram of encrypted image (Baker’s map) f. Histogram of
encrypted image (Arnold’s cat map)
Correlation Analysis: In plain images, neighboring pixels often show strong cor-
relations, indicating data redundancy [13]. Effective encryption disrupts these cor-
relations across horizontal (H), vertical (V), and diagonal (D) directions. Tables
7 and 8 present correlation coefficients for plain and cipher images, respectively.
The results, smaller and nearly identical, signify heightened security measures.
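The coefficients in Tables 7 and 8 are Pearson correlations over pairs of adjacent pixels taken along each direction; a pure-Python sketch (the pairing shown is for the horizontal direction):

```python
# Pearson correlation of adjacent-pixel pairs, as used for the H/V/D
# directions in Tables 7 and 8 (illustrative sketch).
import math

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Horizontal pairs: each pixel paired with its right-hand neighbour.
row = [10, 12, 14, 16, 18]
print(correlation(row[:-1], row[1:]))  # 1.0 (a perfectly correlated plain-image row)
```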
PSNR Analysis: The Peak Signal-to-Noise Ratio (PSNR) measures image distortion,
typically higher for compressed images resembling the original closely. In encryp-
tion, a lower PSNR signifies a distinct cipher image from the original, aligning
with encryption objectives. Formulas for PSNR and Mean Squared Error (MSE) are
Table 6 Entropy analysis of cipher image channels for different chaotic maps and image sizes
Chaotic maps Image sizes Red channel Green channel Blue channel Average
entropy
2D-LSCM 128 × 128 7.988313 7.987952 7.988665 7.98831
256 × 256 7.997152 7.996826 7.997158 7.997045
512 × 512 7.999357 7.999353 7.999332 7.999347
1024 × 1024 7.999813 7.999845 7.999835 7.999831
Baker's map 128 × 128 7.989432 7.986689 7.989445 7.988522
256 × 256 7.996996 7.997155 7.9973 7.997155
512 × 512 7.999283 7.999163 7.999335 7.99926
1024 × 1024 7.999797 7.999812 7.999848 7.999819
Arnold's cat map 128 × 128 7.988214 7.987721 7.99039 7.988775
256 × 256 7.996912 7.997304 7.997176 7.997131
512 × 512 7.999352 7.999356 7.999202 7.999303
1024 × 1024 7.999852 7.999812 7.999807 7.999824
outlined [16]. Table 9 indicates high MSE and low PSNR across all methods, signal-
ing robust encryption. The proposed 2D-LSCM scheme shows relatively improved
PSNR, hinting at potentially stronger security measures.
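The MSE and PSNR quantities reported in Table 9 follow the standard definitions; a single-channel pure-Python sketch:

```python
# Standard MSE and PSNR definitions (single channel, 8-bit peak of 255).
# Lower PSNR between plain and cipher image means stronger visual distortion.
import math

def mse(img1, img2):
    h, w = len(img1), len(img1[0])
    return sum((img1[i][j] - img2[i][j]) ** 2
               for i in range(h) for j in range(w)) / (h * w)

def psnr(img1, img2, peak=255):
    m = mse(img1, img2)
    return float("inf") if m == 0 else 10 * math.log10(peak ** 2 / m)

plain  = [[0, 0], [0, 0]]
cipher = [[10, 10], [10, 10]]
print(mse(plain, cipher))             # 100.0
print(round(psnr(plain, cipher), 2))  # 28.13
```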
Contrast Analysis: Contrast analysis assesses the variation in pixel intensity within
encrypted images, crucial in gauging vulnerability to attacks seeking key or image
extraction. It examines pixel value distribution, unveiling patterns or weaknesses in
encryption. Table 10 displays the results, highlighting the distribution in our proposed encrypted image, aligning closely with other algorithms' contrast values.
Table 10 Average contrast analysis of pixel channels for different chaotic maps and pixel sizes
Chaotic maps Image size Red channel Green channel Blue channel Average
contrast
2D-LSCM 128 × 128 116.6642 116.9024 116.1533 116.5733
256 × 256 117.3163 116.8444 115.9682 116.7086
512 × 512 116.3521 116.8146 116.8687 116.6783
1024 × 1024 117.2689 116.4676 116.6431 116.7932
Baker's map 128 × 128 116.3039 116.4566 118.0584 116.9396
256 × 256 116.3772 117.2193 116.6279 116.7415
512 × 512 116.9819 116.6074 117.1718 116.9204
1024 × 1024 116.6501 117.1546 116.4951 116.7666
Arnold's cat map 128 × 128 117.6140 118.7028 117.1643 117.8270
256 × 256 117.2129 117.0944 116.7294 117.0122
512 × 512 117.2042 116.5975 116.5692 116.7903
1024 × 1024 117.3213 116.4163 116.6449 116.7942
This paper addresses the critical security concerns prevalent within the 5G intra-slice
domain by introducing an innovative encryption scheme grounded in chaos theory
and stream ciphers. The urgency for enhanced security, especially in transmitting sen-
sitive user data, calls for innovative solutions in 5G. Our proposed encryption scheme
stands as a lightweight, efficient, and robust solution, strategically leveraging chaos
theory and the lightweight ChaCha20 stream cipher. Based on our performance and
security analysis, our proposed solution demonstrates clear superiority in effective-
ness and resilience. When compared to well-known chaotic maps like Baker’s map
and Arnold’s cat map, our approach proves superior in both security and efficiency
within the context of 5G.
For future works, the inclusion of additional chaotic maps such as the Piecewise
Linear Chaotic Map (PWLCM) and the Hénon Map stands as a potential avenue.
The incorporation of these maps into our encryption scheme could offer a broader
spectrum for key stream generation, enriching our understanding of their compara-
tive performance and enhancing the versatility of encryption methodologies. Also,
the implementation of our encryption scheme within a 5G test environment can be
a future scope. This endeavor aims to conduct real-time behavioral analysis, assess-
ing the system’s performance, robustness, and efficiency in a dynamic, high-speed
communication setting.
References
1. Zhang S (2019) An overview of network slicing for 5G. IEEE Wirel Commun 26(3):111–7.
https://doi.org/10.1109/MWC.2019.1800234
2. Bordel B, Orúe AB, Alcarria R, Sánchez-De-Rivera D (2018) An intra-slice security solution
for emerging 5G networks based on pseudo-random number generators. IEEE Access 6:16149–
16164. https://doi.org/10.1109/ACCESS.2018.2815567
3. Mareca P, Bordel B (2018) An intra-slice chaotic-based security solution for privacy preserva-
tion in future 5G systems. In: Trends and advances in information systems and technologies,
vol 2, 6 2018. Springer International Publishing, pp 144–154
4. Bordel Sánchez B, Alcarria Garrido RP (2017) Secure sensor data transmission in 5G networks
using pseudorandom number generators. In: Research briefs on information and communica-
tion technology evolution (ReBICTE), vol 3, pp 1–11. https://doi.org/10.22667/ReBiCTE.
2017.11.15.011
5. Porambage P (2019) Secure keying scheme for network slicing in 5G architecture. In: 2019
IEEE conference on standards for communications and networking (CSCN). IEEE. https://doi.
org/10.1109/CSCN.2019.8931330
6. Hua Z et al (2018) 2D logistic-sine-coupling map for image encryption. Signal Process
149:148–161. https://doi.org/10.1016/j.sigpro.2018.03.010
7. Guan ZH, Huang F, Guan W (2005) Chaos-based image encryption algorithm. Phys Lett A 346(1–3):153–157
8. Salleh M, Ibrahim S, Isnin IF (2003) Enhanced chaotic image encryption algorithm based on Baker's map. In: Proceedings of the 2003 international symposium on circuits and systems (ISCAS '03), vol 2. IEEE
9. Mathew A (2020) Network slicing in 5G and the security concerns. In: 2020 fourth international
conference on computing methodologies and communication (ICCMC). IEEE. https://doi.org/
10.1109/ICCMC48092.2020.ICCMC-00014
10. Balasundaram A et al (2023) Internet of things (IoT) based smart healthcare system for efficient
diagnostics of health parameters of patients in emergency care. IEEE Internet Things J
11. Shemaili MB et al (2012) A new lightweight hybrid cryptographic algorithm for the Internet
of things. In: 2012 international conference for internet technology and secured transactions.
IEEE. https://ieeexplore.ieee.org/abstract/document/6470990
12. Sarker VK et al (2020) Lightweight security algorithms for resource-constrained IoT-based
sensor nodes. In: ICC 2020-2020 IEEE international conference on communications (ICC).
IEEE. https://doi.org/10.1109/ICC40277.2020.9149359
13. Hua Z et al (2021) Color image encryption using orthogonal Latin squares and a new 2D chaotic system. Nonlinear Dyn 104:4505–4522
14. Kumar RR, Mathew J (2020) Image encryption: traditional methods vs alternative methods. In: 2020 fourth international conference on computing methodologies and communication (ICCMC). IEEE, pp 1–7
15. The image dataset used for the encryption process is taken from https://www.kaggle.com/datasets/ll01dm/set-5-14-super-resolution-dataset/
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 391
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_30
392 G. Puniya et al.
1 Introduction
2 Literature Survey
data. The authors report 77.60% and 74.53% accuracy for arousal and valence classification, respectively, suggesting scope for further improvement. They also report an approximate valence–arousal model classification accuracy of 71% and 75% for video and picture stimuli, respectively.
Heart rate variability (HRV) also serves as a significant physiological measure that mirrors the regulatory capacity of the cardiac autonomic nervous system. The paper
by Wang et al. [2] employed the amplitude level quantization (ALQ) technique for
feature extraction and proposed the HRV emotion recognition (HER) method for
emotion recognition. They achieved an accuracy rate of 84.3% while treating the
emotions as a varied spectrum instead of a binary classification.
Valentina et al. [3] utilized 59 participants’ GSR and ECG recordings from the
CLAS dataset. The signal combination is investigated using 39 extracted features
and then trained using a polynomial kernel-based support vector machine (SVM) and
the Sequential Minimal Optimization (SMO) algorithms. Luz et al. [4] conducted
a noteworthy study that employed a deep convolutional neural network (DCNN)
classifier on the ECG modality to achieve an arousal and valence accuracy of 81%
and 71%, respectively.
Sarkar et al. [5] compared the performance of the CNN model with and without a
self-supervised approach on the SWELL and AMIGOS emotion recognition datasets.
The obtained results suggest that for both the valence-arousal dimensions, the self-
supervised CNN model outperforms the CNN model without the self-supervised
method. The paper by Hasnul et al. [6] discusses various challenges associated with
designing emotion recognition systems using ECG signals and potential future direc-
tions. The significant difference in the studies was that they used either a multimodal
approach, EEG and ECG together, or binary classification for emotions instead of
treating them as a varied spectrum.
Though it has been demonstrated that ECG-based emotion recognition is possible,
the related work fell short in identifying a diverse set of emotions and gave poor
accuracy when working with a single modality. Hence, there is much room for
improvement in emotion recognition models based on ECG signals.
3 Methodology
The methodology encompasses data collection from the CLAS dataset, preprocessing
through windowing techniques, and model implementation for emotion prediction
using ECG data. It includes feature extraction, pooling, activation, regularization,
and categorical cross-entropy loss for arousal and valence prediction. Our focus is
on identifying all the emotions accurately without compromising on the intensity of
the emotion. While previous papers clubbed various ranges of valence and arousal
together, we averaged the values to the nearest integers. Thus, 9 classes (0–9) were
obtained for valence and 9 classes (0–9) for arousal, thus allowing a more detailed
study of emotions.
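The rounding described above can be sketched as follows; the helper name and the sample ratings are illustrative, not values from the paper:

```python
def to_class(rating: float) -> int:
    """Round a continuous self-assessment rating to the nearest integer,
    turning each valence or arousal value into a discrete class label."""
    return int(round(rating))

# illustrative ratings mapped to integer class labels
labels = [to_class(r) for r in (3.4, 6.8, 7.6)]
```

Treating each integer level as its own class preserves the intensity of the emotion instead of collapsing it into a binary or three-way split.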
Emotion Classification Using Triple Layer CNN with ECG Signals 395
3.1 Dataset
The Cognitive Load, Affect, and Stress (CLAS) recognition dataset [3] is a multi-
modal ECG, PPG, and EDA physiological signals database collected using wearable
sensors: the Shimmer3 GSR+ Unit for EDA and PPG and the Shimmer3 ECG Unit
for ECG. The Shimmer3 GSR+ Unit also recorded three-dimensional accelerometer
data. All signals are recorded at a 256 Hz sampling rate with 16-bit resolution. This
dataset contains synchronized recordings of these signals from 62 healthy partici-
pants engaged in interactive and perceptive tasks involving emotional stimuli. The
dataset facilitates the examination of negative emotions, cognitive effort, mental
strain, attention assessment, and emotion recognition. We considered only 51
participants, as their valence and arousal values were recorded along with ECG
data.
3.2 Preprocessing
Our window size and overlap selection balanced feature granularity and temporal
coherence.
Subsequently, these preprocessed segments are fed into a triple-layered CNN model,
which has been tailored to extract meaningful features from the ECG data. The output
of our model encompasses results from 9 classes, representing arousal and valence,
providing a comprehensive view of the emotional landscape.
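The windowing step can be sketched as follows, assuming the 256 Hz sampling rate of the dataset; the 10 s window length and 50% overlap are illustrative choices, not values reported in the paper:

```python
import numpy as np

FS = 256  # sampling rate of the CLAS recordings (Hz)

def segment(signal: np.ndarray, win_s: float = 10.0, overlap: float = 0.5) -> np.ndarray:
    """Split a 1-D ECG signal into fixed-length overlapping windows,
    balancing feature granularity against temporal coherence."""
    win = int(win_s * FS)
    step = int(win * (1 - overlap))
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

# e.g. one minute of synthetic signal -> overlapping 10 s segments
segments = segment(np.random.randn(60 * FS))
```

Each row of the resulting array is one segment that can be fed to the 1-D CNN.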
1. Model Description: We used a three-layer 1-D Convolutional Neural Network
(CNN), a type of machine-learning model suited to analyzing sequential data.
Each layer consists of convolutional operations, where filters are applied to the
sequential data to extract relevant features. These convolutional operations help
the model learn patterns and relationships in the data.
2. Optimization and Learning: The model’s refinement process is directed by
the Adamax optimizer. The model’s learning is steered by the categorical
cross-entropy loss function, which works towards minimizing the gap between
predicted and actual labels. In conclusion, the meticulous orchestration of these
components empowers our model to make precise and reliable predictions
concerning human emotional states.
3. Metrics Used: The proposed model evaluates performance using a confusion
matrix. It provides key metrics: accuracy for correctness, precision for positive
prediction accuracy, recall for pertinent instance identification, and F1-score
for overall performance assessment. These metrics are vital for evaluating
classification models in research and practical applications.
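These confusion-matrix metrics can be computed directly; the sketch below is generic, and the 3 × 3 matrix values are illustrative rather than results from the paper:

```python
import numpy as np

def macro_metrics(cm: np.ndarray):
    """Accuracy plus macro-averaged precision, recall and F1 from a
    confusion matrix with rows = true classes, cols = predicted classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # per-class, vs predicted
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # per-class, vs actual
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = cm.trace() / cm.sum()                # diagonal = correct samples
    return accuracy, precision.mean(), recall.mean(), f1.mean()

cm = np.array([[50, 2, 1],
               [3, 45, 2],
               [0, 4, 43]])
acc, p, r, f1 = macro_metrics(cm)
```

Macro-averaging gives every class equal weight, which matters when the integer valence/arousal classes are imbalanced.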
4 Experimental Work
The proposed emotion recognition system uses the ECG signals of all 51 subjects.
The ECG signals are preprocessed, and various classification techniques such as
ANN, CNN, LSTM, RNN, and Transformer are used to classify emotions. Among
the implemented techniques, CNN gave the best accuracy (as depicted in Table 1).
Our optimized CNN model for multi-class classification surpassed alternative
techniques substantially. This highlights its effectiveness for the classification task.
In addition, the performance of CNN depends on the tuning of hyperparameters.
We have used multiple optimization techniques (Grid Search, Random Search, and
Optuna using Bayesian Optimization) to optimize our accuracy for arousal and
valence. The outcomes are shown in tabular form in Table 2.
After a comparative study (as seen in Table 2), it became evident that Optuna
produced the best results for the current model and improved the accuracy by over
4% for arousal and valence. The final hyperparameters obtained using Optuna are
filters = 95, kernel size = 3, dense units = 102, and dropout rate = 0.1312.
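The search over these four hyperparameters can be illustrated with a simple random search; the search ranges and the stand-in scoring function are assumptions (Optuna's Bayesian sampler would instead propose trials adaptively based on past results):

```python
import random

# assumed search ranges for illustration; the paper reports only the
# final values (filters=95, kernel=3, dense=102, dropout=0.1312)
SPACE = {
    "filters": range(32, 129),
    "kernel_size": [3, 5, 7],
    "dense_units": range(64, 257),
    "dropout": (0.05, 0.5),
}

def sample_trial(rng: random.Random) -> dict:
    """Draw one hyperparameter configuration from the search space."""
    return {
        "filters": rng.choice(SPACE["filters"]),
        "kernel_size": rng.choice(SPACE["kernel_size"]),
        "dense_units": rng.choice(SPACE["dense_units"]),
        "dropout": rng.uniform(*SPACE["dropout"]),
    }

def random_search(score, n_trials: int = 50, seed: int = 0) -> dict:
    """Keep the configuration with the best validation score."""
    rng = random.Random(seed)
    trials = [sample_trial(rng) for _ in range(n_trials)]
    return max(trials, key=score)

# stand-in score: in practice this would train the CNN on the sampled
# configuration and return its validation accuracy
best = random_search(lambda cfg: -abs(cfg["dropout"] - 0.13))
```

Grid search enumerates the space exhaustively and random search samples it blindly, which is why an adaptive sampler such as Optuna's can reach a better configuration in the same trial budget.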
5 Result Analysis
The final accuracy obtained for Arousal and Valence was 93.31% and 92.65%, respec-
tively. Since the CNN model presented the most accurate results, confusion matrices
were constructed for arousal (Fig. 3) and valence (Fig. 4). The arousal confusion
matrix was consistent with expectations. The diagonal elements represent correctly
predicted samples. The model accurately classified 96,490 samples out of 103,408
(93.31%) for arousal, demonstrating the overall effectiveness.
As for valence, the confusion matrix (in Fig. 4) accurately classified 95,787
samples out of 103,386 samples (92.65%). This indicates that the model outper-
forms other studies that are currently present and can prove to be a pivotal point for
emotion recognition systems.
We used an 80/20 train-test split to evaluate the learned model. To review the
model's performance, Fig. 5 shows the loss
plotted against the number of epochs for the CNN model after optimization using
Optuna. This also clearly indicates that the model’s performance improves over time
by measuring the error or dissimilarity between its predicted output and true output.
As we used multi-class classification, the metrics were first calculated for each
class individually, and the macro-averaged metrics were then reported. The results
are listed in Table 3 for arousal and Table 4 for valence.
By using the macro-averaging method, precision and recall are calculated for
arousal and valence. The F1 Score can be calculated using those values. These
results are a quantitative measure for assessing the model’s overall effectiveness
and performance. Table 5 presents the precision, recall, and F1 scores for arousal
and valence after macro-averaging. The precision values indicate the proportion of
true positive predictions among all positive predictions, while recall represents the
proportion of true positive predictions among all actual positive instances. Addition-
ally, the F1 score balances precision and recall, measuring the model’s overall perfor-
mance. These metrics are essential for evaluating the accuracy and effectiveness of
the classification models in predicting arousal and valence in the given context.
The current models can be categorized into two major segments. First, while binary
categorization of the ECG signal (0–4.5 and 4.5–9) yields good accuracy (>85%), it
covers only a narrow range of emotions (just four primary classes: anger, joy,
boredom, and depression). The second segment comprises models using a three-class
distribution (0–3, 4–6, 7–9). These models can identify diverse emotions, but we can see a
significant drop in accuracy. While some articles report high accuracy, their
multimodal methodology utilizes EEG and other physiological markers as input in
addition to ECG signals. The proposed work shows a significant improvement in
accuracy using ECG signals only. Table 6 shows a comparative analysis of our
results against the state of the art.
Initial research relied on traditional methods and manual feature extraction,
achieving 57% to 82% accuracy. In 2022, a significant shift was observed with
6 Conclusion
This research used ECG signals to classify and analyze human emotions. The exper-
imental findings demonstrate an accuracy of 93% in recognizing human emotions
and classifying them as a varied spectrum of 9 × 9 classes instead of treating them as
binary distinctions, providing a more nuanced understanding of emotional states and
paving the way for more comprehensive emotion recognition systems. Moreover, the
data used was obtained from a single lead configuration, thus allowing ease of use
when incorporating the model into a smart device of everyday use.
Future research will use other modalities, such as GSR and PPG, which can
be easily used in everyday devices. The research can open new domains in the
health sector by focusing more on the mental state of patients and allowing the
doctors to have a better insight about the patient’s emotional state and, thus, guide
the formulation of treatment plans. The work can further benefit in psychological
counseling and self-monitoring of one’s emotional state.
References
1 Introduction
Epilepsy is a chronic neurological illness that affects 6 million people globally. Its
seizures can significantly impact patients' health and wellbeing [1]. Early seizure
detection and continued monitoring are critical for effective therapy. Recent break-
throughs in deep learning and the Internet of Things (IoT) provide intriguing methods
for identifying and monitoring epileptic seizures. Deep learning systems can evaluate
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 405
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_31
406 O. M. Assim and A. F. Mahmood
data patterns to detect seizures. IoT devices, such as wearable sensors and gadgets,
capture and send patients’ physiological data simultaneously [2]. Creating accurate
and reliable IoT-based seizure detection systems, which include signal collecting,
algorithm development, and data standards, is difficult.
There are several forms of epilepsy, including focal, generalized, unknown, and
unclassified. Focal seizures start from a single focal point [3]. Generalized seizures
are caused by synchronous stimulation in both cerebral hemispheres [4]. Seizures
with unclear onsets are classed as “unknown,” allowing for categorization even
when the onset is uncertain [5]. The unclassifiable category persists despite the
addition of “unknown” as an onset type [6]. Epilepsy manifests differently depending
on age, with incidence peaks at ages 5–9 and around age 80, and it affects both genders
equally [7]. Patients benefit from early detection of seizures, which improves their
quality of life and reduces hazards [8].
Smart technologies, encompassing mobile and electronic devices, have been increas-
ingly integrated into health care, revolutionizing disease detection, medical manage-
ment, and overall quality of life [31–34]. The concept of intelligent health care is
realized when Internet of Things (IoT) modules bolster the foundational functions of
the healthcare sector. While IoT has garnered global attention for several years, the
healthcare industry has recently embraced its vast potential and advantages, incor-
porating cutting-edge equipment, facilities, and interconnections across sectors [35,
36]. In epilepsy care, IoT is vital in enabling patient emergency response systems.
A robust monitoring approach is essential to ensure secure data transfer within the
network.
In the medical field, the Internet of Things connects all imaginable healthcare
resources to enable fast transfer of information via the Internet [41]. This inter-
connected network encompasses doctors, rehabilitation facilities, hospitals, medical
equipment, sensors, and patients, creating a seamless real-time data flow. Many IoT
devices, ranging from portable insulin syringes to stress monitors, blood pressure
monitors, weight trackers, fitness trackers, and ECG and EEG monitors, are currently in
development for the healthcare sector [42].
The convergence of IoT with deep learning heralds the dawn of a new age of intel-
ligent and adaptive systems. As these technologies advance, we might anticipate a
future in which our interconnected devices not only collect data but also understand,
learn, and respond intelligently to the ever-changing world around us. This symbiotic link between IoT
and DL holds the promise of unlocking creative solutions across multiple sectors,
ultimately transforming the world into a more intelligent and responsive one.
3 Discussion
4 Conclusion
The Internet of Things (IoT) promises a real-time early detection system that could save
the lives of thousands of epileptic patients. By enabling early detection of epileptic
seizures, IoT empowers individuals with this chronic disorder, their families, and
nearby healthcare providers. It alerts them during the preictal stage, potentially
averting life-threatening situations. In an era marked by the transition to electronic
health care, developing a precise, automated, computer-assisted seizure diagnosis
system is paramount in clinical practice.
However, several challenges stand in the way of realizing the goal of effective
early epilepsy detection. These challenges encompass the inherent limitations of the
EEG signal, a frequently used diagnostic tool known for its weakness, instability, and
susceptibility to noise. Additionally, the relatively small amount of available EEG
data poses a significant hurdle for researchers utilizing deep learning approaches.
Deep learning algorithms excel when trained on extensive datasets, but the field of
epilepsy detection currently lacks such comprehensive data.
Despite these challenges, significant progress in epileptic identification and moni-
toring has occurred over the past five years. Researchers have explored innova-
tive solutions, as evidenced by the studies highlighted in the preceding sections
of this review. Continued efforts to advance technology, data acquisition methods,
and machine learning hold the potential to overcome these obstacles, ultimately
improving the lives of individuals affected by epilepsy.
The convergence of IoT and health care continues to offer promising opportunities
to transform epilepsy management, provide timely interventions, improve patient
outcomes, and increase quality of life.
5 Future Directions
Looking ahead, the fusion of the Internet of Things (IoT) and deep learning (DL) algo-
rithms for identifying and monitoring epileptic seizures holds tremendous promise,
opening up exciting avenues for advancements in patient care and research. Here are
some future opportunities in this field:
1. Personalized seizure prediction: Future developments in IoT and DL could create
personalized seizure prediction models for individuals by leveraging continuous
data streams from connected wearable devices, providing more accurate and
timely predictions tailored to the unique characteristics of each person’s seizures.
2. Edge Computing for Real-Time Processing: Integrating edge computing with IoT devices
and DL algorithms to overcome the challenges of real-time processing is an
important field. This approach involves processing data locally on the device,
reducing latency, and enhancing the speed of seizure detection. Edge computing
can empower even resource-constrained devices to contribute effectively to real-
time monitoring.
3. Multimodal Data Integration: Integrating multiple data modalities, such as elec-
troencephalogram (EEG) data, heart rate variability, and patient behavior, holds
the potential for more comprehensive seizure detection. DL algorithms could
learn from diverse data sources, enabling a holistic understanding of the preictal
and ictal states. This multidimensional approach may significantly improve the
accuracy and reliability of seizure identification.
4. Explainable AI for Clinical Adoption: Future research may focus on developing
explainable AI models that enhance interpretability for clinicians and patients.
Understanding how a DL algorithm arrives at a seizure prediction or detection
can instill greater confidence among healthcare professionals. Explainable AI
fosters trust, a critical factor for successfully integrating these technologies into
clinical practice.
5. Continuous Monitoring Beyond Seizure Detection: Expanding the scope of IoT
and DL in epilepsy management, there’s an opportunity for continuous moni-
toring beyond seizure detection. These technologies could be harnessed to track
medication adherence, sleep patterns, and lifestyle factors, providing a more
comprehensive picture of a patient’s condition. This holistic approach may enable
personalized treatment plans and interventions.
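The edge-computing direction (point 2) can be sketched abstractly: the device scores incoming samples locally and transmits only alerts upstream. The windowed threshold score below is a purely illustrative stand-in for a trained DL detector, and `alert` for an IoT uplink; none of it is from the reviewed systems:

```python
from collections import deque
from typing import Callable, Deque

def edge_monitor(samples, window: int, score: Callable[[list], float],
                 threshold: float, alert: Callable[[float], None]) -> int:
    """Process a sample stream locally on the device and send only alerts
    upstream, reducing latency and bandwidth versus streaming raw data."""
    buf: Deque[float] = deque(maxlen=window)
    alerts = 0
    for x in samples:
        buf.append(x)
        if len(buf) == window:
            s = score(list(buf))
            if s >= threshold:  # suspected preictal/ictal activity
                alert(s)
                alerts += 1
    return alerts

# toy usage: mean absolute amplitude as a stand-in score
sent = []
n = edge_monitor([0.1] * 20 + [2.0] * 5, window=5,
                 score=lambda w: sum(abs(v) for v in w) / len(w),
                 threshold=1.0, alert=sent.append)
```

Because only alert events leave the device, even resource-constrained wearables can participate in real-time monitoring.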
References
1. Alharthi MK et al (2022) Epileptic disorder detection of seizures using EEG signals. Sensors
22(17):6592
2. Mohamad Jawad HH et al (2022) A systematic literature review of enabling IoT in healthcare:
motivations, challenges, and recommendations. Electronics 11(19):3223
3. Natu M et al (2022) Review on epileptic seizure prediction: machine learning and deep learning
approaches. In: Computational and mathematical methods in medicine, 2022
Evolving Approaches in Epilepsy Management: Harnessing Internet … 413
4. Fisher RS et al (2018) Instruction manual for the ILAE 2017 operational classification of
seizure types. Zeitschrift für Epileptologie 31:282–295
5. de Bruijn MA et al (2019) Evaluation of seizure treatment in anti-LGI1, anti-NMDAR, and
anti-GABABR encephalitis. Neurology 92(19):e2185–e2196
6. Neligan A, Hauser WA, Sander JW (2012) The epidemiology of the epilepsies. Handb Clin
Neurol 107:113–133
7. Shoeibi A et al (2021) Epileptic seizures detection using deep learning techniques: a review.
Int J Environ Res Public Health 18(11):5780
8. Omidvarnia A et al (2019) Towards fast and reliable simultaneous EEG-fMRI analysis of
epilepsy with automatic spike detection. Clin Neurophysiol 130(3):368–378
9. Louis EKS, Cascino GD (2016) Diagnosis of epilepsy and related episodic disorders.
CONTINUUM: Lifelong Learn Neurol 22(1):15–37
10. Seneviratne U, Cook M, D’Souza W (2012) The electroencephalogram of idiopathic general-
ized epilepsy. Epilepsia 53(2):234–248
11. Abhang PA, Gawali BW, Mehrotra SC (2016) Introduction to EEG-and speech based emotion
recognition. Academic Press
12. Beniczky S et al (2017) Standardized computer-based organized reporting of EEG: SCORE–
second version. Clin Neurophysiol 128(11):2334–2346
13. Roberson SW et al (2020) Electrocorticography reveals spatiotemporal neuronal activation
patterns of verbal fluency in patients with epilepsy. Neuropsychologia 141:107386
14. Assim OM, Mahmood AF (2023) Designing a wearable EEG device and its benefits for epilepsy
patients: a review. Al-Kitab J Pure Sci 20;7(1):69–82
15. Pacreu S et al (2018) Anaesthesia management in epilepsy surgery with intraoperative elec-
trocorticography. Revista Española de Anestesiología y Reanimación (English Edition)
65(2):108–111
16. Bandopadhyay R et al (2021) Recent developments in diagnosis of epilepsy: scope of
microRNA and technological advancements. Biology 10(11):1097
17. Raschka S, Patterson J, Nolet C (2020) Machine learning in python: Main developments and
technology trends in data science, machine learning, and artificial intelligence. Information
11(4):193
18. Sharma R, Pachori RB (2015) Classification of epileptic seizures in EEG signals based on
phase space representation of intrinsic mode functions. Expert Syst Appl 42(3):1106–1117
19. Mohammadpoor M, Shoeibi A, Shojaee H (2016) A hierarchical classification method for
breast tumor detection. Iranian J Med Phys 13(4)
20. Assi EB et al (2017) Towards accurate prediction of epileptic seizures: a review. Biomed Signal
Process Control 34:144–157
21. Romaine JB et al (2021) EEG—Single-channel envelope synchronization and classification
for seizure detection and prediction. Brain Sci 11(4):516
22. Khodatars M et al (2021) Deep learning for neuroimaging-based diagnosis and rehabilitation
of autism spectrum disorder: a review. Comput Biol Med 139:104949
23. Sadeghi D et al (2022) An overview of artificial intelligence techniques for diagnosis of
Schizophrenia based on magnetic resonance imaging modalities: methods, challenges, and
future works. Comput Biol Med 146:105554
24. Craik A, He Y, Contreras-Vidal JL (2019) Deep learning for electroencephalogram (EEG)
classification tasks: a review. J Neural Eng 16(3):031001
25. Subasi A, Kevric J, Abdullah Canbaz M (2019) Epileptic seizure detection using hybrid machine
learning methods. Neural Comput Appl 31:317–325
26. Pal KK, Sudeep K (2016) Preprocessing for image classification by convolutional neural
networks. In: 2016 IEEE international conference on recent trends in electronics, information
and communication technology (RTEICT). IEEE
27. Cao J et al (2019) Epileptic signal classification with deep EEG features by stacked CNNs.
IEEE Trans Cognit Develop Syst 12(4):709–722
28. Assim OM, Alkababji AM (2021) CNN and genetic algorithm for finger vein recognition. In:
2021 14th international conference on developments in eSystems engineering (DeSE). IEEE,
pp 503–508
29. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural
Comput 18(7):1527–1554
30. Chai R et al (2017) Improving EEG-based driver fatigue classification using sparse deep belief
networks. Front Neurosci 11:103
31. Vařeka L, Mautner P (2017) Stacked autoencoders for the P300 component detection. Front
Neurosci 11:302
32. Papa A et al (2020) E-health and wellbeing monitoring using smart healthcare devices: An
empirical investigation. Technol Forecast Soc Chang 153:119226
33. Dritsa D, Biloria N (2018) Towards a multi-scalar framework for smart healthcare. Smart
Sustain Built Environ 7(1):33–52
34. Alabdulatif A et al (2019) Secure edge of things for smart healthcare surveillance framework.
IEEE Access 7:31010–31021
35. Delgosha MS, Hajiheydari N, Talafidaryani M (2022) Discovering IoT implications in business
and management: a computational thematic analysis. Technovation 118:102236
36. Zaman S et al (2022) Thinking out of the blocks: Holochain for distributed security in IoT
healthcare. IEEE Access 10:37064–37081
37. Guan Z et al (2019) Achieving data utility-privacy tradeoff in Internet of medical things: a
machine learning approach. Futur Gener Comput Syst 98:60–68
38. Vilela PH et al (2019) Performance evaluation of a Fog-assisted IoT solution for eHealth
applications. Futur Gener Comput Syst 97:379–386
39. Anand A et al (2021) An efficient CNN-based deep learning model to detect malware attacks
(CNN-DMA) in 5G-IoT healthcare applications. Sensors 21(19):6346
40. Almaiah MA et al (2022) A novel hybrid trustworthy decentralized authentication and data
preservation model for digital healthcare IoT based CPS. Sensors 22(4):1448
41. Jabar MK, Al-Qurabat AKM (2021) Human activity diagnosis system based on the Internet of
things. In: Journal of physics: conference series, vol 1879, No 2, p 022079. IOP Publishing.
42. Bharadwaj HK, Agarwal A, Chamola V, Lakkaniga NR, Hassija V, Guizani M, Sikdar B (2021)
A review on the role of machine learning in enabling IoT based healthcare applications. IEEE
Access 9:38859–38890
43. Alhussein M et al (2018) Cognitive IoT-cloud integration for smart healthcare: case study for
epileptic seizure detection and monitoring. Mobile Netw Appl 23:1624–1635
44. Singh K, Malhotra J (2019) IoT and cloud computing based automatic epileptic seizure detec-
tion using HOS features based random forest classification. J Ambient Intell Humanized
Comput 1–16
45. Sayeed MA, Mohanty SP, Kougianos E, Zaveri HP (2019) ESeiz: an edge-device for accurate
seizure detection for smart healthcare. IEEE Trans Consum Electron 65(3):379–387
46. Sayeed MA et al (2019) Neuro-detect: a machine learning-based fast and accurate seizure
detection system in the IoMT. IEEE Trans Consum Electron 65(3):359–368
47. Daoud H, Williams P, Bayoumi M (2020) IoT based efficient epileptic seizure prediction system
using deep learning. In: 2020 IEEE 6th world forum on internet of things (WF-IoT). IEEE
48. Akashah PE, Shita AN (2020) An IoT platform for seizure alert wearable devices. In: IOP
conference series: materials science and engineering 2020, vol 767, No 1, p 012012. IOP
Publishing
49. Gupta S, Ranga V, Agrawal P (2021) Epilnet: a novel approach to IoT based epileptic seizure
prediction and diagnosis system using artificial intelligence. arXiv preprint arXiv:2111.03265
50. Hassan S, Mwangi E, Kihato PK (2022) IoT based monitoring system for epileptic patients.
Heliyon 8(6)
51. Zambrana-Vinaroz D et al (2022) Wearable epileptic seizure Prediction System based on
machine learning techniques using ECG, PPG and EEG signals. Sensors 22(23):9372
52. Lupión M et al (2022) Epilepsy seizure detection using low-cost IoT devices and a federated
machine learning algorithm. In: International symposium on ambient intelligence. Springer
53. Yedurkar DP et al (2023) An IoT based novel hybrid seizure detection approach for epileptic
monitoring. IEEE Trans Indust Inf
54. Al-Hajjar AL, Al-Qurabat AK (2023) An overview of machine learning methods in enabling
IoMT-based epileptic seizure detection. J Supercomput 24:1–48
Multipurpose Internet of Things-Based
Robot for Military Use
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 417
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_32
418 P. Linga Varshini et al.
1 Introduction
A multipurpose robot has many capabilities and is typically used in military settings.
These robots are equipped with cutting-edge technology and functions, allowing
them to carry out various duties while boosting the efficiency and safety of military
operations. Increased situational awareness, less danger to soldiers, increased mission
success, and the ability to perform jobs previously thought too dangerous for human
personnel are all major benefits. Multipurpose military robots are expected to play an
increasingly important role in today’s military operations as technology advances.
Jhanavi et al. [1] presented a fresh method that uses a wireless camera to spot
strangers or potential intruders. This defense robot has a gripper, weapon, sensors,
and a camera to perform several duties. The system can communicate data to a web
server via GSM. The major goal of this robotic system is to prevent harm to people
and the country as a whole.
Gavali et al. [2] designed a robot model to handle mundane and repetitive tasks for
individuals; robots have evolved significantly with advancing technology, blurring
the lines between reality and imagination. In our contemporary era, we coexist with
robots exhibiting heightened intelligence, even possessing human-like characteris-
tics. These robots demonstrate precise detection and responsive reactions to environ-
mental stimuli. Facilitated by algorithms executed through the Arduino Uno inter-
face, the robot achieves seamless movement along a designated path. The primary
objective of this project is to implement and fine-tune the algorithm, adjusting control
parameters to optimize the robot’s performance and movement regulation.
Features such as obstacles and human detection, wireless remote control,
accelerometer-based control, and more were prioritized by Nahidul Alam et al. [3],
making this robot flexible enough to meet the needs of a wide range of users and
settings. By allowing users to switch between several modes, it demonstrates its
flexibility and adaptability. An integrated system of many sensors that are useful for
intelligent service robots results from their smooth cooperation. A Dual-Tone Multi-
Frequency (DTMF) module can change and control the robot’s operational modes
from a distance. This research aims to construct a feasible module, integrating the
cost-effective approach outlined here with insights from diverse industries. This
paves the way for developing novel systems and future research endeavors.
Inderjeet Singh et al. [4] offered an experimental military robot prototype
constructed with a versatile, variable sensor array suited to its specialized purpose.
Among the most modern robotic technologies, it comprises grippers,
integrated systems, and live-view recording cameras. Because of its adaptability,
this robot can be used in various settings, including military operations, where it can
improve environmental assessments without compromising human safety.
Mohamed Ibrahim et al. [5] adopted the Hypertext Transfer Protocol (HTTP).
This novel strategy uses multipurpose robots to monitor far-flung frontier
areas, especially borders, when human troops are absent. This Internet-
connected robot car may be driven by a human following directions on a screen
or autonomously by reading environmental cues. Operating in several modes, the
robot is equipped to detect fire, metal, hazardous substances, and human presence in
distant and demanding environments. This approach uses an embedded Raspberry
Pi 3 board with Python programming to overcome wireless security robot limita-
tions. The Internet of Things lets consumers track and control military robots world-
wide. Solar panels are installed to make the system greener. An ultrasonic sensor is
used for hands-free functionality, while web-based arrow keys provide easy manual
control. The robot’s path can be fine-tuned in response to changes in its immediate
environment using live video input from a camera.
Multipurpose robots are crucial in identifying and rescuing trapped soldiers. Addi-
tionally, they help bridge the gap between individual military units and higher-ups
in the chain of command. The expanding use of multipurpose robots in conflict is in
step with technical development, enhancing military efficiency.
These robots offer novel approaches to complex problems, which boosts mission
performance overall. The potential uses and range of multifunctional robots in
warfare are growing with the success of continuing robotics and AI research and
development. Robots can perform many operations and duties, including mobility,
detecting dangerous compounds in the air, submerging underwater to rescue persons,
fire detection, and monitoring environmental elements, including temperature,
humidity, and metal presence [6, 7]. The Army Robot is one such machine; it has a
built-in camera module for spying on the enemy and doesn’t rely on Bluetooth for
remote data collection. Malfunctions caused by noisy communication between the robot and the control unit are common in current systems [8], as are the high expenses involved in connecting robots during rescue operations. Although Bluetooth's range can be extended, its use in certain applications remains limited. A wide variety of robotic systems are currently available,
each of which has some of the drawbacks described below.
Bluetooth-Based Voice-Controlled Robots
Built on Arduino microcontrollers, the voice-controlled robot integrates voice recog-
nition technology to operate with simple spoken commands [11]. As a result of the
combination of these two technologies, new possibilities for hands-free interaction
and automation have been opened up, with important implications for user experience
and machine-human interaction [12].
Disadvantage
The vocabulary size of such robots may be constrained, limiting the variety and
complexity of commands they can understand. In noisy environments, they may
have trouble interpreting commands precisely, leading to mistakes.
Obstacle Avoidance Robots
The Arduino-powered obstacle avoidance robot represents a significant milestone in robotics and microcontroller technology [9]. Advanced algorithms and
sensors allow the autonomous robot to identify obstacles and modify its trajectory
in real time. As the central processing unit, Arduino microcontrollers analyze sensor
data in real time for quick decision-making [10]. This work examines obstacle avoid-
ance robots for their importance, technological components, and potential to improve
automation in multiple sectors [13].
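The sense-and-react cycle such a robot runs each loop iteration can be sketched as follows; the three-sensor layout and the 40 cm clearance threshold are assumptions for illustration, not the design of the cited systems:

```python
# Minimal obstacle-avoidance decision logic, as an Arduino loop() might
# evaluate it every cycle from three ultrasonic range readings.

SAFE_CM = 40  # assumed minimum clearance before the robot must react


def avoidance_decision(left_cm, front_cm, right_cm):
    """Keep going if the path ahead is clear, otherwise turn toward the
    side with more room, or back up if boxed in on both sides."""
    if front_cm >= SAFE_CM:
        return "forward"
    if max(left_cm, right_cm) < SAFE_CM:
        return "reverse"
    return "turn_left" if left_cm > right_cm else "turn_right"
```

The real-time behavior described above comes from re-evaluating this decision on every sensor poll, so the trajectory adjusts continuously as readings change.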
Disadvantage
The sensing range of obstacle avoidance robots may impose constraints, perhaps
leading to delayed identification of obstacles and an increased risk. Variations in
illumination may affect obstacle detection sensors, affecting the robot’s performance
in varied surroundings [15].
Environment Surveillance Robots
This advanced robot has sensors, cameras, and communication modules for remote
surveillance [14]. Arduino microcontrollers, the robot’s cognitive core, process
and decide instantly. This brief introduction discusses the Arduino-driven military
surveillance robot, its importance, its technological features, and how it improves
military reconnaissance and security [16].
Disadvantage: Arduino microcontrollers may have processing limits that constrain the complexity of algorithms and image processing available for advanced surveillance functions. Arduino-based surveillance robots may also offer a narrow range of sensor options, reducing their effectiveness in challenging conditions.
Fire Fighting Robots
Combining robotics and microcontroller technologies, the Arduino-based Fire-
fighting Robot extinguishes fires. It uses sensors, actuators, and Arduino to auto-
matically detect, navigate to, and extinguish flames, increasing firefighting’s effi-
ciency. When the fire sensor identifies a threat, the Arduino system activates actu-
ators, enabling autonomous navigation and efficient fire extinguishing for a safer
response in emergency scenarios. This summary highlights its significance and vital
role in responding rapidly to fire crises to improve safety.
Multipurpose Internet of Things-Based Robot for Military Use 423
Disadvantage
Robots designed to fight fires may have trouble navigating complex or confined
spaces.
Proposed Block Diagram
The communication and interaction between the Arduino microcontroller and the rest of the system are depicted in this block diagram. The Arduino, which acts as the system's brain, manages fingerprint capture, matching, and access control. The fingerprint templates of approved users are stored in a database, and the access control mechanism restricts exam room access [9].
Figures 2, 3, and 4 show the block diagram for an Arduino-based multipurpose
robot system, explaining its components and connections. Here is a simplified block
diagram for such a system.
Workflow of the Proposed Work
The flow graphic shows the basic steps of multipurpose robot operation. After system
initialization, user input, robot control, perception, decision-making, and multifunc-
tional task execution, the user interface is updated. The robot’s capacity to understand
user orders, navigate via GPS, observe its surroundings using the ESP32 camera, and
interpret sensor data gives it several skills. The proposed paradigm process diagram
is shown in Fig. 5.
Also, by introducing an advanced Long Range (LoRa) module, the proposed robot
model excels in extended communication capabilities. By intelligently defining LoRa
parameters, the robot achieves unparalleled precision in long-range transmissions,
bolstering its adaptability across expansive war zones or challenging terrains. Its
objective is to operate in scenarios where human involvement is not feasible. Further-
more, the user is provided with prior information on prospective incursions onto their
property.
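The trade-off behind "intelligently defining LoRa parameters" can be illustrated with the standard LoRa symbol-time relation, T_sym = 2^SF / BW; the bandwidth and spreading-factor values below are common defaults, not values taken from this robot:

```python
# Sketch of LoRa's range/latency trade-off: each step up in spreading
# factor doubles the symbol duration, buying receiver sensitivity (range)
# at the cost of data rate.

def lora_symbol_time_ms(spreading_factor, bandwidth_hz=125_000):
    """Symbol duration in milliseconds: 2**SF / BW."""
    return (2 ** spreading_factor) / bandwidth_hz * 1000


for sf in (7, 9, 12):
    print(f"SF{sf}: {lora_symbol_time_ms(sf):.3f} ms/symbol")
```

Choosing a high spreading factor for an expansive war zone therefore trades throughput for link budget, which is the kind of parameter selection the text alludes to.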
Future Scopes
The prospective scope of military-oriented multifunctional robots is promising,
advancing with technological progress. These robots may deploy a robotic arm
for precise object manipulation, incorporate a water tank for firefighting capabil-
ities, and substitute standard cameras with night vision counterparts. Integrating
artificial intelligence, fostering learning and adaptation in dynamic environments,
will augment the robots’ adeptness in handling unpredictable scenarios. The future
References
1. Jhanavi V, Jahnavi AP, Ayeesha Ruman, Ramya KR (2022) IoT based multifunctional robot
for war assistance. Int Adv Res J Sci, Engin, Technol, 9(4). https://doi.org/10.17148/IARJSET.
2022.9412
2. Gavali R (2021) Multipurpose robot. Int Res J Engin Technol (IRJET) 08(08). ISSN: 2395–0056
3. Nahidul Alam Md, Saiam Md, Al Mamun A, Musfiqur Rahman Md, Hany U, A prototype of
multifunctional rescue robot using wireless communication. In: 5th international conference
on electrical engineering and information communication technology (ICEEICT). https://doi.
org/10.1109/ICEEICT53905.2021.9667872
4. Inderjeet Singh S, Mudigonda S, Mukkavalli S, Kotrika N (2022) Multipurpose security robot
using arduino microcontroller. Int J Sci, Engin, Manage (IJSEM) 9(7)
5. Mohamed Ibrahim A, Deepthi E, Bindiya M (2018) Solar powered wireless multifunctional robot. Int J Engin Res Technol (IJERT) 7(03). ISSN: 2278–0181
6. El-Said O, Al HS (2022) Are customers happy with robot service? Investigating satisfaction
with robot service restaurants during the COVID-19 pandemic. Heliyon 8(3):e08986. https://
doi.org/10.1016/j.heliyon.2022.e08986
7. Shimmura T, Ichikawa R, Okuma T, Ito H, Okada K, Nonaka T (2020) Service robot introduc-
tion to a restaurant enhances both labor productivity and service quality. Procedia CIRP, 2020,
vol 88, pp 589–594. https://doi.org/10.1016/j.procir.2020.05.103
8. Hutabarat D, Purwanto D, Hutomo H, Rivai M (2019) Lidar-based obstacle avoidance for the
autonomous mobile robot. In: International conference on information and communication
technology and system (ICTS). https://doi.org/10.1109/ICTS.2019.8850952
9. Ghaleb M (2018) Design of an obstacle-avoiding robot car based on arduino microcontroller.
Bachelor’s thesis, June 2018
10. Adegoke OM, Akinola SO (2018) Development of an Arduino-based obstacle avoidance robotic system for an unmanned vehicle. ARPN J Engin Appl Sci 13(3). ISSN: 1819–6608
11. Shifat AZ, Rahman MS, Fahim-Al-Fattah M, Rahman MA (2014) A practical approach to
microcontroller based smartphone operated robotic system at emergency rescue scheme. In:
2014 9th international forum on strategic technology (IFOST), Cox’s Bazar, Bangladesh. IEEE,
pp 414–417
12. Pavithra S, Siva Sankari SA (2013) 7TH sense-a multipurpose robot for the military. In: 2013
international conference on information communication and embedded systems (ICICES),
Chennai, India. IEEE pp 1224–1228
13. Jain K, Suluchana V (2013) Design and development of smart robot car for border security. Int
J Comput Appl 76(7)
14. Mohammad T (2009) Using ultrasonic and infrared sensors for distance measurement. World
Acad Sci, Engin Technol 51:293–299
15. Binoy BN, Keerthana T, Barani PR, Kaushik A, Sathees A, Aswathy SN (2010) A GSM-based
versatile unmanned ground vehicle. In: 2010 international conference on emerging trends in
robotics and communication technologies (INTERACT), Chennai, India. IEEE, pp 356–361
16. Brandao AS, Sasaki AS, Castelano CR Jr (2012) Autonomous navigation with obstacle avoid-
ance for a car-like robot. In: 2012 international conference on robotics symposium and latin
american robotics symposium (SBR-LARS), Fortaleza, Brazil. IEEE, pp 104–126
17. Harindravel L (2013) Mobile robot surveillance system with GPS tracking
A Comprehensive Review of Small
Building Detection in Collapsed Images:
Advancements and Applications
of Machine Learning Algorithms
1 Introduction
The precise identification and assessment of small buildings inside collapsed images
is crucial for successful disaster response, recovery, and urban planning following
catastrophes and urban emergencies, including explosions, landslides, and earth-
quakes. The review aims to comprehensively understand why this detection process
is crucial and how it can significantly impact various aspects of disaster response,
recovery, and urban development.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 429
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_33
430 I. Sajitha et al.
Identifying collapsed buildings helps prioritize search and rescue efforts during
earthquakes, hurricanes, or floods. Efficient deployment of resources to areas with the
most significant damage can save lives and minimize further casualties. Identifying
areas with collapsed small buildings aids in planning and delivering humanitarian
aid, ensuring that resources reach the affected population promptly [1]. Accurate
detection assists in evaluating the integrity of critical infrastructure, such as bridges
and roads, which may be affected by the collapse of nearby buildings. Identifying
and assessing the damage to small buildings is crucial for urban planners when
developing strategies for rebuilding and reconstructing affected areas and also aids
insurance companies in determining the extent of damage, processing claims, and
adjusting premium rates based on the risk associated with specific geographic loca-
tions. Remote sensing technologies, such as satellite and aerial imagery, play a crucial
role in accurately detecting and assessing the impact of disasters on small buildings.
This information can be rapidly acquired and analyzed for timely decision-making.
Accurate detection of small building collapses can contribute to training machine
learning algorithms, allowing for the development of automated tools that can quickly
analyze large datasets for disaster assessment.
The scope of the review extends to the application of advanced technologies such
as image processing, machine learning, and computer vision. It also emphasizes the
use of high-resolution satellite imagery, drone technology, and other remote sensing
tools. The overarching goal is to improve the accuracy and efficiency of detecting
small building collapses, aiming to enhance disaster response, recovery, and urban
planning processes.
In general, there are two approaches to satellite imagery-based building detection: threshold-based and object-based. In the object-based technique, segments are constructed and characterized using attributes such as shape, spectral content, and height. In the threshold-based approach, the normalized difference vegetation index, generated together with a digital surface model (DSM), is used for building detection.
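As a concrete illustration of the threshold-based step, the vegetation index can be computed directly from the red and near-infrared bands; the sample pixel values and the 0.2 threshold below are illustrative, not tied to a specific sensor:

```python
# NDVI = (NIR - Red) / (NIR + Red): vegetation scores near +1,
# bare ground and built-up surfaces score low.
import numpy as np


def ndvi(nir, red, eps=1e-9):
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero


# Pixels with low NDVI are candidate non-vegetation areas that can then
# be intersected with DSM heights to flag buildings.
nir = np.array([[0.8, 0.1], [0.7, 0.2]])
red = np.array([[0.2, 0.3], [0.1, 0.2]])
mask_non_vegetation = ndvi(nir, red) < 0.2
```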
The global influence of 2D and 3D building design is substantial. Therefore,
different methodologies and instruments are needed to extract and detect both.
Many methods and algorithms have been developed for 2D building extraction [2].
However, there aren’t many articles discussing their capabilities and restrictions.
Mayer provides an overview of the building detection methods developed until the
mid-1990s. The review includes a description of the models and strategies that were
employed in the developed approaches. However, there is an alternative: the 3D
building extraction approach. It can provide information on a specific area or city in
vertical and horizontal directions obtained by stereo-mapping-based satellites. Most
research focuses on the 2D level since access to 3D information is costly and limited.
In a review, Brenner emphasized the benefits of reconstruction methods utilizing
light detection and ranging (LIDAR) [3]. It encompasses a detailed examination of the
features of the semi-automatic (16–19) and automatic (20–29) rebuilding methodologies. Extending the review from Mayer, Unsalan and Boyer offer a comparative assessment of the proposed techniques up until 2003. Haala and Kada provide
an overview of the methods developed for building reconstruction using LIDAR
and aircraft elevation data. They assert that segmentation, DSM simplification, and
parametric forms serve as the foundation for the reconstruction of structures.
Artificial neural networks (ANNs), the most widely used machine learning technique for classification and regression problems, were motivated by the functioning of brain neurons. An ANN architecture comprises an input layer, one or more hidden layers, and an output layer. Depending on the complexity of the ANN, multiple hidden layers may be stacked between the input and output layers [8]. An ANN's power stems from its neuronal connections and the weights applied to each connection. To lower the error function, ANN learning approaches adjust the
network weights. The backpropagation algorithm (BP) is used by a well-known
ANN, BPANN, to train the network. Finite element (FE) structural models produce training datasets for ANN computations in a variety of structural health monitoring (SHM) systems. These datasets
offer both undamaged and damaged training instances. Damage to these kinds of
structures is correlated with natural frequencies and alterations in their attributes.
In order to identify and quantify structural damage in various damaged conditions, Tan et al. presented an approach that uses an ANN with modal strain energy as a damage-sensitive characteristic. They also described how to detect structural deterioration by combining the random forest regression (RFR) and principal component analysis (PCA) techniques.
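A minimal sketch of the backpropagation training loop these studies rely on, in NumPy; the synthetic frequency data, layer sizes, and learning rate are illustrative, not any cited system's configuration:

```python
# Tiny BPANN for SHM-style classification: natural-frequency features in,
# damage label out. "Damaged" samples show lowered natural frequencies.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.05, (50, 3)),   # healthy samples
               rng.normal(0.7, 0.05, (50, 3))])  # damaged samples
y = np.array([0] * 50 + [1] * 50).reshape(-1, 1)


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# One hidden layer (3 -> 8 -> 1), trained on mean-squared error.
W1 = rng.normal(0, 0.5, (3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

losses = []
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)                   # hidden layer
    p = sigmoid(h @ W2 + b2)                   # output layer
    losses.append(float(np.mean((p - y) ** 2)))
    dp = 2 * (p - y) / len(y) * p * (1 - p)    # error gradient at output
    dh = dp @ W2.T * h * (1 - h)               # backpropagated to hidden layer
    W2 -= 0.5 * (h.T @ dp); b2 -= 0.5 * dp.sum(0)
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(0)
```

The loop adjusts the network weights to lower the error function, which is exactly the BP mechanism the paragraph describes; real SHM datasets would come from FE models or measurements rather than this synthetic generator.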
Kourehli [9] built an ANN-based SHM system for harm identification and damage
severity estimation using just the initial two natural frequency bands and partial FEM
modal data. Using rigidity loss as a damage indication, three-story flat frames, an
eight-degree-of-freedom spring-mass system, and a supported beam all underwent
effective damage detection. Natural frequencies are frequently used as damage-
sensitive characteristics in SHM systems. The significance of removing or mini-
mizing temperature change interference was highlighted by Gu et al. [10]. Using a multilayer ANN, they were able to distinguish changes in natural frequencies caused by temperature effects from shifts caused by structural deterioration.
Goh et al. [11] developed a two-step process for identifying damage. Using an
ANN, the modal shape of the unknown structure was predicted in the first step. To evaluate the prediction's accuracy, the predicted modal shape was compared with a cubic spline interpolation. In the second step, several structure-specific response assessment sites were utilized to train an ANN to identify damage and estimate its severity.
Another approach was suggested by Shu et al. [12]. Acceleration measurements and statistical
displacement parameters are used as ANN training inputs in ANN-BP, which detects
bridge degradation. This concept showed that measurement noise should be reduced
or eliminated since it negatively affects the efficiency of damage identification. This
method was applied in scenarios with single and multiple damages.
Convolutional neural networks, also referred to as CNNs, are perhaps the most repre-
sentative deep technique; they have been used to solve many regression and classi-
fication issues involving data. Depending on how a CNN is meant to function, its
three main layer types are pooling, fully connected, and convolutional. Convolution
involves the provided data and a customized matrix called the filter, which is how
layers of convolution generate characteristics from the input data. Pooling layers
reduce the amount of data, while fully connected layers manage data categorization
tasks [13]. A network architecture may stack many such layers one after another, increasing network capacity but requiring more resources and longer training cycles.
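The three layer types can be shown directly in NumPy; the input image and the hand-picked vertical-edge filter are illustrative only, since a trained CNN learns its filter values:

```python
# Convolution, pooling, and the hand-off to a fully connected layer,
# written out explicitly for a single channel.
import numpy as np


def conv2d(image, kernel):
    """'Valid' convolution: slide the filter over the image, summing
    elementwise products at each position to build a feature map."""
    h, w = kernel.shape
    H, W = image.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out


def max_pool(x, size=2):
    """Downsample by keeping the maximum in each size x size tile."""
    H, W = x.shape
    return x[:H - H % size, :W - W % size] \
        .reshape(H // size, size, W // size, size).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)
edge_filter = np.array([[1, 0, -1]] * 3, dtype=float)  # vertical-edge detector
features = conv2d(image, edge_filter)   # 4x4 feature map
pooled = max_pool(features)             # 2x2 after pooling
flat = pooled.ravel()                   # input to a fully connected layer
```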
In SHM systems, some pertinent CNN application limitations are noted for struc-
tural evaluation. In CNN training, the model’s generalization property is typically
achieved with a large amount of training data, and the structural data of damage
states are often unavailable. Moreover, large-scale civil constructions rarely provide
structural data about deterioration states. Many SHM systems therefore use FE model approaches to produce data on the structural condition of damage [14]. Nonetheless, the generated data must come from accurate FE models, which in turn rely on the model's frame parameters being set correctly [15].
Since some of these characteristics are erratic or unknown, estimating or
computing them using experimental data is best. Wang and colleagues (2019) created a structural degradation diagnosis method from a time–frequency graph of the acceleration data. With the marginal spectrum as input, tuning the CNN's hyperparameters with particle swarm optimization (PSO) made its accuracy 10% higher than without tuning. Oh and Kim [16] examined two alternative objective function approaches for choosing the optimal hyperparameters for a CNN-based damage diagnosis system. Their findings indicated a forty percent reduction in computing costs. Using vibration data from bridges, Sony
et al. [6] created a 1D-CNN to do multiple-class damage identification.
One well-liked machine learning method for handling regression and classification issues is the support vector machine (SVM). The main goal of its operating concept is to maximize the distance between the support vectors and a separating hyperplane, so the maximization target in an SVM can be viewed as a constrained optimization problem. A noteworthy aspect of the resulting decision boundary is that, given a kernel function and a set of data, it effectively separates the data classes. Table 1 demonstrates the high accuracy of multiple SVM structural approaches in estimating damage [17]. SVM model training, however, is typically computationally costly for real-time SHM systems. Integration of PSO with SVM
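The margin-maximization idea can be sketched from scratch as a linear SVM trained by subgradient descent on the hinge loss; this is a simplified stand-in for the typically kernelized, solver-based SVMs of the surveyed papers, with made-up data:

```python
# Linear SVM via subgradient descent on lam*||w||^2 + mean hinge loss.
# Labels must be in {-1, +1}.
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1   # points inside or violating the margin
        # Regularizer pushes w toward a wide margin; active points pull
        # the hyperplane toward correctly separating them.
        w -= lr * (2 * lam * w - (y[active][:, None] * X[active]).sum(0) / len(y))
        b -= lr * (-y[active].sum() / len(y))
    return w, b


# Two separable clouds standing in for "damaged" vs "intact" feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```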
Dataset: The xBD dataset is a publicly available dataset designed for advancing
research in building damage assessment, particularly in the context of natural disas-
ters. The xBD dataset was created by the Defense Innovation Unit (DIU), a branch
of the United States Department of Defense, to support the development of machine
learning models for automatically detecting building damage.
Training images: 1000 post-disaster satellite images with labeled small buildings.
Testing images: 200 post-disaster satellite images for evaluation.
Table 1 (continued)

Methodology | Features | Advantages | Limitations
Deep learning generative models | Uses variational autoencoders (VAEs) or generative adversarial networks (GANs) | Able to produce realistic building layouts; able to deduce building shapes even from poor-quality photos | Prone to producing illusory forms if improperly trained
Rule-based systems | Employs predefined rules based on spectral and spatial properties | Can be computationally efficient | Restricted ability to adjust to different crisis conditions; difficulty in intricate scenes
5 Results
Here’s a hypothetical Table 2 showcasing the output given by each algorithm on the
testing dataset.
Table 2 Hypothetical table showcasing the performance of each algorithm on the testing dataset

Algorithm | Accuracy | Precision | Recall | F1-score | IoU
Support vector machines | 0.85 | 0.82 | 0.89 | 0.85 | 0.75
Random forest | 0.88 | 0.86 | 0.91 | 0.88 | 0.78
Convolutional neural networks | 0.92 | 0.91 | 0.93 | 0.92 | 0.83
K-nearest neighbors | 0.78 | 0.76 | 0.82 | 0.79 | 0.68
Gradient boosting machine | 0.89 | 0.87 | 0.92 | 0.89 | 0.79
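The scores in Table 2 follow from standard counts of true and false positives and negatives on binary building masks; the tiny masks below are made up for illustration:

```python
# Accuracy, precision, recall, F1, and IoU from a predicted mask and a
# ground-truth mask, as used to score building-detection algorithms.
import numpy as np


def detection_scores(pred, truth):
    tp = np.sum(pred & truth)     # building pixels correctly flagged
    fp = np.sum(pred & ~truth)    # false alarms
    fn = np.sum(~pred & truth)    # missed building pixels
    tn = np.sum(~pred & ~truth)   # correctly rejected background
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / pred.size,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "iou": tp / (tp + fp + fn),   # intersection over union
    }


truth = np.array([1, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
pred = np.array([1, 1, 0, 0, 0, 1, 1, 0], dtype=bool)
scores = detection_scores(pred, truth)
```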
6 Future Enhancement
The type of disaster, the area’s features, the desired degree of accuracy, and the
data’s accessibility influence the chosen approach. By utilizing the advantages of
each strategy, a hybrid technique or a combination of several methods can frequently produce the best results. To further hone these approaches, it is also critical to consider
the availability of labeled data and the continual advancements in machine learning
techniques.
7 Conclusion
References
1. Wang C, Zhang Y, Xie T, Guo L, Chen S, Li J, Shi F (2022) A detection method for
collapsed buildings combining post-earthquake high-resolution optical and synthetic aperture
radar images. Remote Sens 14(5):1100. https://doi.org/10.3390/rs14051100
2. Li L, Wu X (202) Deep learning-based object detection for earthquake-damaged buildings using convolutional neural networks. J Remote Sens
3. Zhu Y, El-Rayes K (2018) Object detection in unmanned aerial vehicle imagery for post-
earthquake building damage assessment. J Comput Civ Eng
4. Wu J, Zhu Y, Zhang L Detecting collapsed buildings using convolutional neural networks
in aerial images. In: International conference on artificial intelligence and computer science
(AICS)
5. Ma H, Liu Y, Ren Y, Wang D, Yu L, Yu J (2020) Improved CNN classification method for
groups of buildings damaged by earthquake, based on high resolution remote sensing images.
Remote Sens 12(2):260. https://doi.org/10.3390/rs12020260
6. Xiu H, Shinohara T, Matsuoka M, Inoguchi M, Kawabe K, Horie K (2020) Collapsed building
detection using 3D point clouds and deep learning. Remote Sens 12(24):4057. https://doi.org/
10.3390/rs12244057
7. Bosch M, Foster K, Christie G, Wang S, Hager GD, Brown M (2019) Semantic stereo for
incidental satellite images. In: Proceedings IEEE winter conference on applications of computer
vision, WACV. Beijing China, pp 1524–1532. https://doi.org/10.1109/WACV.2019.00167
8. Castrejón L, Kundu K, Urtasun R, Fidler S (2017) Annotating object instances with a polygon-
RNN. In: Proceedings-30th IEEE conference on computer vision and pattern recognition.
CVPR, Honolulu, HI, USA. pp 4485–4493. https://doi.org/10.1109/CVPR. 477
9. Hu X, Fan H (2019) Small object detection in post-disaster images using mask R-CNN. In:
IEEE conference on computer vision and pattern recognition workshops (CVPRW)
10. Zheng J, Zheng B, Liu L (2018) Remote sensing image analysis for natural disasters: advances and challenges. ISPRS J Photogramm Remote Sens
11. United Nations Office for the Coordination of Humanitarian Affairs (OCHA) (2017) Use of unmanned aerial vehicles in humanitarian crises: a case study of Nepal earthquake
12. Azimi M, Eslamlou AD, Pekcan G (2020) Data-driven structural health monitoring and
damage detection through deep learning: State-of-the-art review. Sensors (Basel, Switzerland)
20(10):2778. https://doi.org/10.3390/s20102778
13. Chattopadhyay S, Kak AC (2022) Uncertainty, edge, and reverse-attention guided generative
adversarial network for automatic building detection in remotely sensed images. IEEE J Sel
Top Appl Earth Obs Remote Sens 15:3146–3167. https://doi.org/10.1109/JSTARS.2022.316
6929
14. Chen LC, Teo TA, Wen JY, Rau JY (2007) Occlusion compensated true ortho rectification for
high-resolution satellite images. Photogrammetric Rec 22:39–52. https://doi.org/10.1111/j.
1477-9730.2007.00416.x
15. Wang C, Ji L, Shi F, Li J, Wang J, Enan IH, Wu T, Yang J (2023) Collapsed building detection
in high-resolution remote sensing images based on mutual attention and cost sensitive loss.
IEEE Geosci Remote Sens Lett: Publ IEEE Geosci Remote Sens Soc 20:1–5. https://doi.org/
10.1109/lgrs.2023.3268701
16. Mangalathu S, Burton HV (2019) Deep learning-based classification of earthquake-
impacted buildings using textual damage descriptions. Int J Disaster Risk Reduct: IJDRR
36(101111):101111. https://doi.org/10.1016/j.ijdrr.2019.101111
17. Chen Q, Wang L, Waslander SL, Liu X (2020) An end-to-end shape modeling framework for
vectorized building outline generation from aerial images. ISPRS J Photogramm Remote Sens
170:114–126. https://doi.org/10.1016/j.isprsjprs.2020.10.008
18. Bialas J, Oommen T, Havens TC (2019) Optimal segmentation of high spatial resolution images
for the classification of buildings using random forests. Int J Appl Earth Obs Geoinf 82:101895.
https://doi.org/10.1016/j.jag.2019.06.005
19. Bo H, Bei Z, Song Y (2018) Urban land-use mapping using a deep convolutional neural
network with high spatial resolution multispectral remote sensing imagery. Remote Sens
Environ 214:73–86. https://doi.org/10.1016/j.rse.2018.04.050
20. Brenner C (2005) Building reconstruction from images and laser scanning. Int J Appl Earth Obs Geo Inf 6:187–198. https://doi.org/10.1016/j.jag.2004.10.006; Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell PAMI-8:679–698. https://doi.org/10.1109/TPAMI.1986.4767851
21. Cao S, Weng Q, Du M, Li B, Zhong R, Mo Y (2020) Multi-scale three-dimensional detection of
urban buildings using aerial LiDAR data. GIScience & Remote Sens 57(8):1125–1143. https://
doi.org/10.1080/15481603.2020.1847453
22. Cao Y, Huang X (2021) A deep learning method for building height estimation using high-
resolution multi-view imagery over urban areas: a case study of 42 Chinese cities. Remote
Sens Environ 264:112590. https://doi.org/10.1016/j.rse.2021.112590
23. Chandra N, Ghosh JK (2018) A cognitive viewpoint on building detection from remotely sensed multispectral images. IETE J Res 64:165–175. https://doi.org/10.1080/03772063.2017.1351320; Chandra N (2022) A review of building detection methods from remotely sensed images. https://www.currentscience.ac.in/data/forthcoming/414.pdf
Data-Based Model of PEM Fuel Cell
Using Neural Network
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 439
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_34
440 R. Aruna et al.
1 Introduction
To meet the energy demand, many alternative energy sources are used; among them,
energy production from fuel cells has been given more attention in the last decades,
since the fuel cell has high energy efficiency and is an emission-free energy producer.
The rapid development of fuel cell technology has led to many applications in power
stations, automobiles, and electronic devices [1]. The fundamentals of PEM fuel cells
with stage-by-stage development in performance, durability, and cost reduction are
discussed [2]. The steady-state and dynamic model of the PEM fuel cell is developed
and compared with the experimental value in the literature [3]. A reduced-order
model for balancing the water through the membrane assembly in the electrode of
the PEM fuel cell is discussed [4]. This model makes a simulation platform for
the performance analysis of PEM fuel cells. In addition, the derivation of a zero-
dimensional thermodynamically consistent electrochemical model for PEM fuel cells
is elaborated [5]. A steady-state model and the experimental setup of a 500W PEM
fuel cell with a boost converter and resistive load are presented in the literature [6].
To enhance the performance of fuel cells, a clear understanding of evaluation
methods using machine learning techniques is required. A study on the application
of machine learning and artificial neural networks in fuel cell applications to meet
the loading requirement and optimize the operation based on real-time monitoring
is presented [2]. Various performance prediction and optimization algorithms for
fuel cells are reviewed [7]. A comparative study on the evaluation of fuel cells is
performed in [8]. The neural network, adaptive neuro-fuzzy inference system, and
particle filtering approach methods are applied, and the general framework required
for selecting the algorithm is stated. The literature [9] uses an ANN to investigate the effects of temperature and humidity on fuel cells based on the stack characteristics.
Similarly, using ANN, different sets of fuel cell data are analyzed, and a data-
based model is obtained [10, 11]. By varying the hidden neurons, the coefficients
and mean squared error using ANN for the PEM fuel cell are presented to obtain
the optimal performance [12, 13]. A dimension-reduced model is developed using
ANN to anticipate cell voltage distribution and consistency [14]. Using Machine
Learning, the two-phase flow pressure drop in the flow channel of the fuel cell is
discussed [15]. In [16], Support Vector Machine regression, Linear Regression, and
K-nearest neighbor for regression algorithm are used for the various humidity studies
in fuel cells.
The main contribution of this paper is an investigation of the experimental setup
of the 5W PEM fuel cell, the corresponding empirical equations, and obtaining the
data-based model of the PEM fuel cell. The second section elaborates on the workings
of fuel cells using the equations. In the third section, the experimental setup of the
fuel cell is discussed. The ANN technique is presented in the fourth section. The fifth
section includes the results and discussion. The conclusion of the research work is
presented in the last section.
In fuel cells, an electrochemical reaction occurs in which hydrogen and the oxidizing agent oxygen are converted into electricity using a pair of redox reactions [7].
Fuel cells require a constant supply of hydrogen and oxygen for their chemical reactions, whereas in a battery the reactions occur within stored substances; this is the main difference between batteries and fuel cells. A fuel cell can generate electricity as long as hydrogen and oxygen are continuously supplied, but a battery discharges once its reactants are consumed and must then be recharged. Thus, the fuel cell has become a suitable alternative energy carrier that does not pose environmental hazards.
A PEMFC consists of a membrane electrode assembly (MEA); it contains an
anode and a cathode, both of which are isolated by a proton conductive membrane
[2]. Figure 1 presents a diagram of the fuel cell. A reaction occurs when the constant
hydrogen gas within the anode electrode and oxygen enter the cathode. As a result
of an electrochemical reaction, the protons and electrons are produced due to an
oxidation reaction. The electrolyte exchange membrane creates the path for those
particles. The electrons travel toward the outside electric circuit. The protons combine
with the oxygen and produce water as output. Several PEMFC mathematical models
were drafted in recent years to understand the main phenomena that alter the device’s
performance and obtain an adequate system with good effectiveness.
On the anode side of the fuel cell, hydrogen gas diffuses to the anode catalyst, which
separates it into protons and electrons [2]. The protons are transported through the
proton-conducting membrane to the cathode, while the electrons pass through an external
circuit because the membrane is electrically insulating. On the cathode side, the oxygen
molecules combine with the arriving electrons and protons to produce water as a
by-product [5]. The cell voltage (Vcell) of the PEM fuel cell is
expressed as
E = E_o - 0.85 \times 10^{-3}\,(T - T_{\mathrm{ref}}) + 2.3\,\frac{RT}{4F}\,\log\!\left(\frac{P_{H_2} P_{O_2}}{P_{H_2O}}\right) \qquad (4)
The delayed reaction kinetics on the electrode surface cause an activation loss (Vact),
described by the Tafel equation:
V_{\mathrm{act}} = \frac{RT}{\alpha n F}\,\ln\!\left(\frac{J}{J_0}\right) \qquad (5)
where J is current density, R is the universal gas constant, and F is Faraday’s constant.
The membrane resistance (Rmem ) gives rise to the ohmic loss (Vohm ). In the
concentration-loss term, w is the mass transfer coefficient and n is the rate of product
growth during the electrochemical reaction in the catalytic layer [3].
The experiment used a 5W PEM fuel cell model, as shown in Fig. 2. It is a reversible
PEM fuel cell that can act both as an electrolyzer and as a fuel cell. When electricity is
applied, distilled water is split into hydrogen gas and oxygen gas; for power generation,
the reverse process is carried out.
Oxygen and hydrogen tanks of 12 ml and 24 ml are used for producing electrical
energy, as shown in Fig. 3. The membrane area is 6 cm2 . With a 10 kΩ resistor
connected as the load, the V-I characteristics of the PEM fuel cell are obtained
from the experimental results in Table 1 and are shown in Fig. 4.
An Artificial Neural Network (ANN) is an algorithm modeled on brain function [10].
It is used to model complex patterns and to predict outcomes. ANNs are inspired by
biological neural networks, but they are not the same. The combination of neurons
adjusts the computational activity according to the selected activation function and
the related inputs. The backpropagation algorithm is used to train the network and
model the regression more accurately; the flowchart is shown in
444 R. Aruna et al.
Fig. 5. In ANN, the activation functions introduce non-linearity, which depends
on the sensitivity of the input feature vectors [5].
Figure 6 shows the structure of the neural network for the fuel cell. The leftmost
layer is the input layer, and the neurons in this layer are named input neurons.
The rightmost layer is the output layer in which the forecast values are presented
as output. The hidden layer is placed in between those two layers. To develop a
data-based model of the PEM fuel cell, 200 data samples are collected from the fuel
cell. Three input parameters are considered (temperature, current density, and pressure
of hydrogen gas), and the output is cell voltage. Here, a hidden layer of size 10 is
initially taken for the data-based model development.
[Fig. 5: flowchart of backpropagation training (compute weight updates for the output and hidden layers; stop when trained). Fig. 6: network structure with input layer (temperature, current density, pressure), hidden layer, and output layer (cell voltage).]
In PEM fuel cells, the developed model is required to depict both linear and non-
linear behavior according to the operating conditions. Based on the physics-based
equation, the characteristics curve of the PEM fuel cell is obtained, as shown in
Fig. 7.
A validation performance graph shows the functioning of the PEM fuel cell for the
given validation data set during the training process. Figure 8 shows the best valida-
tion performance for the data set; it is observed that the best validation performance
occurred at epoch 37.
Figure 9 shows the training state, depicting the progress of training at a given instant.
The training state shows the values of the gradient, the momentum, and the validation
checks. A higher gradient means a steeper slope and a faster response [11]. The gradient
value obtained for this data set is 0.0033402 at epoch 37. Momentum is the technique
that speeds up network convergence toward a solution during training; the momentum
value in this training state is Mu = 0.001 at epoch 37. The validation set is a subset
of the data held out to deliver an unbiased estimate of the model; it exposes over-fitting,
where the model performs well only on the training data, and allows for model tuning
and optimization to attain better generalization. The validation set differs from both
the training and test sets and serves as an intermediate phase for selecting and
optimizing the best model, in which hyperparameter tuning occurs. The obtained
validation-check value is 6 at epoch 37.
The regression plots are examined to validate the performance of the trained model.
The training and validation regression plots for the fuel cell data set are shown in
Fig. 10, which indicates that the predicted model has high accuracy because most
data points lie along the 45-degree line; hence, the output closely matches the target.
The integrity of the model is analyzed using the R values, which should lie between
0 and 1. In our research, the accuracy of the acquired model is confirmed by the
following R values: training R = 0.99996, test R = 0.99997, and overall R = 0.99998.
To improve fuel cell performance further, a comparative analysis was carried out
between neural network architectures with ten hidden layers and with twenty hidden
layers. Note that the choice of the number of hidden layers and their sizes in a neural
network depends on the specific problem being solved, the data, and other factors.
Table 2 shows that a neural network with ten hidden layers is considered of moderate
depth: it suits tasks requiring some level of feature abstraction, but not extreme ones,
and it trains faster with fewer computational resources. With twenty hidden layers,
the network becomes a deep network, more suitable for tasks requiring a high level
of feature abstraction and complex hierarchical representations; deeper networks take
more time and are harder to optimize and train [12]. The obtained data-based models
are compared, as shown in Fig. 11.
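The depth comparison in Table 2 can be reproduced in outline. Both architectures below are illustrative stand-ins trained on synthetic data, so the scores only indicate relative behavior, not the paper's results:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] - 0.2 * X[:, 2] ** 2

scores = {}
for depth in (10, 20):
    # `depth` hidden layers of 10 neurons each (an illustrative stand-in)
    net = MLPRegressor(hidden_layer_sizes=(10,) * depth,
                       max_iter=500, random_state=1)
    # Cross-validated R^2 for each architecture
    scores[depth] = cross_val_score(net, X, y, cv=3).mean()
print(scores)
```

On a data set this small, the deeper network typically trains more slowly and gains little, which matches the trade-off described above.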
From Fig. 11 and Table 3, the developed data-based model of the PEM fuel
cell shows a better response. Hence, the developed model is crucial for designing
efficient systems and optimizing their performance in various applications.
6 Conclusion
There is a pressing need to develop sustainable and efficient energy sources, and
ANNs provide a powerful modeling tool. The obtained model is compared with the
real-time PEM fuel cell, and the data-based model is found suitable for designing
a controller to obtain a constant output voltage from the PEM fuel cell.
References
1. Parekh A (2022) Recent developments of proton exchange membranes for PEMFC: a review.
Front Energy Res 10:956132
2. Wang Y, Seo B, Wang B, Zamel N, Jiao K, Adroher XC (2020) Fundamentals, materials,
and machine learning of polymer electrolyte membrane fuel cell technology. Energy and AI
1:100014
3. Zhu L, Yu Q, Huang Y, Guan J, Wang Y, Yan Y (2020) Mathematical modeling and operation
parameters analysis of proton exchange membrane fuel cell. IOP Conf Ser: Earth Environ Sci
467(1):0–11
4. Goshtasbi A, Pence BL, Chen J, DeBolt MA, Wang C, Waldecker JR, Ersal T (2020) Erratum: a
mathematical model toward real-time monitoring of automotive PEM fuel cells. J Electrochem
Soc 167(4):049002
5. Kravos A, Ritzberger D, Tavčar G, Hametner C, Jakubek S, Katrašnik T (2020) Thermodynami-
cally consistent reduced dimensionality electrochemical model for proton exchange membrane
fuel cell performance modeling and control. J Power Sources 454:227930
6. Omran A, Lucchesi A, Smith D, Alaswad A, Amiri A, Wilberforce T, Olabi AG (2021)
Mathematical model of a proton-exchange membrane (PEM) fuel cell. Int J Thermofluids
11:100110
7. Su D, Zheng J, Ma J, Dong Z, Chen Z, Qin Y (2023) Application of machine learning in fuel
cell research. Energies 16(11):4390
8. Mao L, Jackson L (2016) Comparative study on prediction of fuel cell performance using
machine learning approaches. Lect Notes Eng Comput Sci 1:52–57
9. Derbeli M, Napole C, Barambones O (2021) Machine learning approach for modeling and
control of a commercial heliocentrisFC50 PEM fuel cell system. Mathematics 9(17):2068
10. Legala A, Zhao J, Li X (2022) Machine learning modeling for proton exchange membrane fuel
cell performance. Energy and AI 10(July):100183
11. Wilberforce T, Olabi AG (2021) Proton exchange membrane fuel cell performance prediction
using artificial neural network. Int J Hydrogen Energy 46(8):6037–6050
12. Wilberforce T, Biswas M, Omran A (2022) Power and voltage modelling of a proton-exchange
membrane fuel cell using artificial neural networks. Energies 15:5587
13. Wilberforce T, Biswas M (2022) A study into proton exchange membrane fuel cell power and
voltage prediction using artificial neural network. Energy Rep 8:12843–12852
14. Cao J, Yin C, Feng Y, Su Y, Lu P, Tang H (2022) A dimension-reduced artificial neural network
model for the cell voltage consistency prediction of a proton exchange membrane fuel cell
stack. Appl Sci (Switzerland) 12(22):11602
15. Chauhan V, Mortazavi M, Benner JZ, Santamaria AD (2020) Two-phase flow characterization
in PEM fuel cells using machine learning. Energy Rep 6:2713–2719
16. Saco A, Sundari PS, Karthikeyan J, Paul A (2022) An optimized data analysis on a real-
time application of PEM fuel cell design by using machine learning algorithms. Algorithms
15(10):1–19
Ensemble Technique to Detect Intrusion
in a Network Based
on the UNSW-NB15 Dataset
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 451
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_35
452 V. S. Badiger and G. K. Shyam
1 Introduction
Computer networks and their related devices are indeed vulnerable to cyberattacks.
Cybersecurity threats have become increasingly sophisticated and prevalent, posing
significant risks to the confidentiality, integrity, and availability of network informa-
tion and services. There is a constant evolution of new types of attacks, which creates
more dependable, flexible, and adaptable network intrusion detection systems. Sensi-
tive and important data are always the target of attackers. An intrusion is any unau-
thorized access or activity within a computer system, network, or application to steal,
modify, or corrupt user data. It is a process where the attacker sends packets to gain
access to the network system to perform mischievous activity with data. An attack can
be defined as a malicious attempt through network packets to exploit vulnerabilities in
a computer system, network, application, or other digital assets. Any existing vulner-
ability such as misconfiguration, software flaws, and weak authentication may permit
intrusion to occur in network systems, devices, or applications. Worldwide, many
industrial sectors depend on the network as the mode of their operations, resulting
in most cyber-attacks. As these attacks become more proficient, the network intru-
sion detection system (NIDS) is essential to security systems. As per 2023 cyber
security records, many organizations were impacted by cyber-attacks. DDoS attacks
were launched on banks, ransomware attacks on hospitals, and malware attacks
exposed the data of 1.5 million customers [1]. Mitigating such cyber-attacks requires
a strong security tool. Firewalls block illegal packets and safeguard network security
from unauthorized access and cyber threats, but their manually configured rules
identify only legitimate traffic and packets; they cannot detect internal attacks [2],
and manual configuration makes advanced attacks difficult to detect.
Based on the mechanism of detection and analysis, IDS is classified as anomaly-
based (AIDS) and signature-based (SIDS). AIDS detects abnormal behavior or devi-
ations from established computer systems or network baselines. It monitors network
traffic for unusual patterns or deviations from typical communication patterns. In
AIDS, establishing an accurate baseline is complex, and it can lead to false alarms
[3]. SIDS is designed to identify and block known threats by comparing observed
activities with a database of predefined signatures or patterns associated with known
malicious behavior. When an intrusion occurs, SIDS matches the database of signa-
tures or patterns. If a match occurs, an alarm is raised. This type of IDS is also known
as knowledge-based IDS or misuse-based IDS [3]. As SIDS models are trained to
detect threats based on existing patterns in the database, they are out of date for
detecting new threats. On the other hand, AIDS can detect new incoming threats
by comparing the incoming packets with the known trained baseline knowledge as
suspicious or not.
IDS can be deployed on the network or the host. According to the deployment
location, IDS is classified as a network-based intrusion detection system (NIDS)
and host-based intrusion detection system (HIDS). NIDS monitors network traffic
and identifies suspicious patterns or anomalies, whereas HIDS monitors activities
on individual hosts or devices, such as file operations, file modifications, application
access, and operating system activities, to record suspicious activities in a log file
[4].
Machine learning (ML) involves learning effectively from predefined data to infer
meaningful information, which helps in detection and prediction. When applied to
IDS, ML enhances its ability to detect and respond to threats. ML can analyze network
patterns and identify network traffic or system behavior anomalies, enabling more
adaptive and effective threat detection.
Ensemble learning strategically combines multiple machine learning algorithms
to solve a problem [5]. Ensemble learning techniques can be applied to enhance
the performance of IDS. The idea behind ensemble learning is that, by aggregating the
predictions of multiple models, the weaknesses of individual models can be mitigated
and overall performance improved. Ensemble learning starts with the creation
of multiple individual models, often referred to as base learners. These base learners
can be of the same or different types, such as decision trees, support vector machines,
and neural networks. They use an aggregation strategy to combine the predictions
of base learners. Averaging and voting are applied to combine the base learners
in an ensemble technique. This study's overall motivation is classifying malicious
network traffic using the ensemble learning technique on the UNSW-NB15 dataset.
A variety of algorithms, such as Decision Tree (DT), Random Forest (RF), K-Nearest
Neighbor (KNN), XGBoost (XGB), and Logistic Regression (LR), were used to
analyze the performance of the proposed model. The performance parameters used to
assess the model include accuracy, precision, recall, and F1-score. The performance
analysis showed that the proposed ensemble model has improved results compared
to the standalone machine learning algorithms with 95% accuracy.
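The voting form of aggregation mentioned above is simple enough to sketch directly. A minimal pure-Python majority vote over hypothetical base-learner outputs (the labels below are toy data, not the study's predictions):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one prediction per sample."""
    combined = []
    for sample_votes in zip(*predictions):          # votes for one sample
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined

# Three hypothetical base learners labelling five packets (1 = attack)
dt  = [1, 0, 1, 1, 0]
rf  = [1, 0, 0, 1, 0]
knn = [1, 1, 1, 0, 0]
print(majority_vote([dt, rf, knn]))   # -> [1, 0, 1, 1, 0]
```

Stacking, used later in this paper, replaces this fixed rule with a trained meta classifier that learns how to weigh the base learners.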
The overall contributions of the paper are summarized as follows:
• We proposed an ensemble machine learning approach and showed its reliability
for detecting network intrusion by interpreting the dependability in metrics.
• The Gini index for feature selection uniquely outperforms the state-of-the-art
models for network intrusion detection.
• Finally, we used a number of performance indicators to assess how well the model
can perform in terms of accuracy, precision, F1-score, and recall. The results show
that our ensemble model is superior to the existing model in detecting intrusions,
resulting in lower type-1 (False Positive) and type-2 (False Negative) rates.
The subsequent sections offer an overview of the existing work, the proposed
model, and the findings. Section 2 provides an overview of the related work. A
detailed description of the proposed methodology and dataset is given in Sect. 3.
Following these results and findings, the model analysis is explained in Sect. 4.
Finally, the conclusion of our proposed methodology with the future work description
is given in Sect. 5.
2 Literature Review
Intrusion detection systems are evolving quickly to detect the latest threats. In
recent years, a number of intrusion detection systems have been proposed to
improve IDS performance. Some techniques use standalone machine learning
algorithms, whereas others combine multiple machine learning algorithms to
improve the model's performance. This section reviews these approaches.
Patgiri et al. [6] experimented on the NSL-KDD dataset and evaluated machine
learning algorithms for detecting network intrusion. They reduced the feature set
with recursive feature elimination and then applied SVM and random forest
algorithms to the reduced features.
Chen et al. [7] experimented on basic security module (BSM) audit data from the
Defense Advanced Research Projects Agency (DARPA) intrusion detection dataset.
They showed that SVM performed better at detecting intrusions than artificial neural
networks.
Govindarajan et al. [8] proposed a hybrid architecture to detect network intrusion.
It’s interesting to find out about the study that demonstrated the enhanced perfor-
mance attained by merging a multilayer perceptron (MLP) and radial basis function
(RBF) ensemble in the context of network intrusion detection. Aburomman et al. [9]
developed an ensemble design in which particle swarm optimization generated the
weights for combining expert opinions. SVM and K-Nearest Neighbors were used
as base classifiers to create the intrusion detection system.
Hooshmand et al. [10] performed experiments to select features for training
ensemble models. Five feature selection techniques were applied to select optimal
features. They used set theory’s quorum and union combination techniques to
combine the outcomes of multiple methods. Using the best feature sets, they assessed
the effectiveness of various machine learning techniques, including RF. Gao et al.
[11] developed an adaptive ensemble voting algorithm using a decision tree, random
forest, KNN, and DNN as base classifiers; multiple decision trees, named Multi-Tree,
also served as a base classifier. Wang et al. [12] applied a core vector machine,
a data mining technique to develop an ensemble system for improving accuracy.
They suggested an ensemble method for intrusion detection based on Bayesian
networks and random trees. Zainal et al. [13] presented an ensemble classifier that
utilized RF, adaptive neural-fuzzy inference, and linear genetic programming. Naive
Bayes, C4.5 decision trees, VFI-voting feature intervals, and KNN clustering are
used in a meta-learning-based system. Jiang et al. [14] offer a convolutional neural
network (CNN) and bi-directional long short-term memory (BiLSTM)-based model.
Tahri et al. [15] employed an SVM algorithm to detect intrusions. Ahmed et al. [16]
provided a machine learning ensemble-based strategy for detecting intrusions. By
incorporating PCA for feature selection and using Random Forest for prediction, this
strategy enhances the efficiency and accuracy of intrusion detection while effectively
addressing class imbalance through SMOTE. Andrecut [17] proposed an anomaly-based
distance criterion, with a random forest classifier applied for splitting; two new
algorithms were proposed to overcome the class-overlap issue in the dataset.
Table 1 briefly summarizes the literature reviewed for this study. From the literature
review, we can infer that, with present feature selection strategies, current intrusion
detection models do not demonstrate detection of all intrusion types. Some works
rely on single-classifier machine learning models, and others on hybrid methods.
Creating an intrusion detection model that can effectively handle all types of
intrusions, on both current datasets and new incursions, is a complex task. Achieving
this requires a robust and adaptable approach that improves performance and
addresses the existing systems' issues. The suggested model can effectively identify
network attacks using a stacking ensemble machine learning technique for binary
(two-class) classification tasks.
3 Methodology
Numerous datasets for intrusion detection are available online for research purposes.
The most popular among them is KDDCUP99, created in 1999 and now more than
two decades old. It has 41 features categorized into four groups: base features,
content features, time-based features, and host-based features. With 4,898,430
records in total, it is larger than any other dataset, and it contains four attack
categories: U2R, DoS, R2L, and probe. Its biggest drawback is its many redundant
records, which degrade algorithm performance by introducing bias. NSL-KDD was
introduced to overcome the issues of KDDCUP99, but it does not help much in
detecting modern-day attack scenarios.
The UNSW-NB15 dataset was generated in 2015 [7] and is more recent than
NSL-KDD. It was created by the Australian Centre for Cyber Security's cyber range
lab and includes nine attack categories, whereas NSL-KDD has four. The nine attack
categories found in the dataset and their descriptions are given in Table 2. The
original dataset was generated in pcap format and converted to csv files. The csv
files contain more than 2.5 million records, of which 2,218,761 are normal records
and 321,283 are attack records. In our study, we did not use the full dataset; instead,
we used a refined dataset containing 175,341 records as a training set and 82,332
as a testing set.
The categorical features were converted to numeric form using a Label Encoder,
which assigns a unique number to each category starting from 0. The numerical
features span different ranges and were standardized to remove the resulting
inconsistency: the dataset contains extremely high and extremely low feature values,
which leads to bias. To overcome this bias and place all features on a common scale,
Z-score normalization, a standardization technique, is applied: each feature is
subtracted from its mean and divided by its standard deviation, as shown in Eq. (1):

X_a' = \frac{X - \mu}{M} \qquad (1)

where X is the original value of the feature, X_a' is the new value after standardization,
µ is the mean of the feature, and M is the standard deviation of the feature. Missing
values were checked: some features have 0 as their value and some have missing
values; whether 0 is meaningful depends on the feature and its type.
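These preprocessing steps (label encoding, then Z-score standardization per Eq. (1)) can be sketched with scikit-learn. The toy records and the selection of columns below are illustrative, not actual UNSW-NB15 rows:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy rows standing in for UNSW-NB15 traffic records
df = pd.DataFrame({
    "proto": ["tcp", "udp", "tcp", "arp"],
    "dur":   [0.12, 3.40, 0.02, 1.10],
    "sload": [5200.0, 13.5, 98000.0, 640.0],
})

# Categorical -> integer codes starting from 0
df["proto"] = LabelEncoder().fit_transform(df["proto"])

# Z-score standardization of the numeric columns: (x - mean) / std, Eq. (1)
num_cols = ["dur", "sload"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
print(df.round(3))
```

After the transform, each numeric column has zero mean and unit variance, so no single feature dominates distance-based learners such as KNN.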
UNSW-NB15 is a high-dimensional feature dataset, and not all features are significant
for model building. Feature reduction is required to avoid this predicament and select
optimal features. In our proposed work, the top 14 features (dur, proto, state, rate,
sttl, sload, sinpkt, ct_srv_src, ct_state_ttl, ct_dst_ltm, ct_dst_src_ltm, ct_src_ltm, ct_
src_dst, and is_sm_ips_ports) were selected from the UNSW-NB15 dataset according
to their importance values, calculated using the Gini index. The Gini index measures
inequality in the features, as shown in Eq. (2):

\mathrm{gini} = 1 - \sum_{i=1}^{n} P_i^2 \qquad (2)

where P_i is the probability of the ith feature. Gini scores range from 0.0 to 0.5;
features with a Gini score of about 0.3 are selected, since scores of 0.0 or 0.5 lead
to bias and generate an overfitted model.
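Gini-based importance ranking of this kind is commonly obtained from a random forest, whose `feature_importances_` average the Gini-impurity decrease each feature produces across the trees. A sketch on synthetic data (the dataset and the choice of keeping 14 features are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary-classification data standing in for UNSW-NB15
X, y = make_classification(n_samples=500, n_features=20, n_informative=6,
                           random_state=1)

# Fit a forest and rank features by mean Gini-impurity decrease
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)
ranked = sorted(enumerate(forest.feature_importances_),
                key=lambda t: t[1], reverse=True)

top14 = [idx for idx, _ in ranked[:14]]           # keep the 14 best, as above
print("top features:", top14)
```

The retained column indices then define the reduced training matrix fed to the ensemble.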
In this study, we used five predominant classifiers as base classifiers in the ensemble
technique (Decision Tree, Random Forest, XGBoost, K-Nearest Neighbor, and
Logistic Regression), with a Multilayer Perceptron as the meta classifier, to classify
network traffic into normal and attack categories. We trained the ensemble classifiers
on the selected features of the 175,341-record dataset, of which 80% was used for
training and 20% for testing. All parameters of the base and meta classifiers were
fixed beforehand: for the Decision Tree, Random Forest, and XGBoost, random
state = 1 was applied; the Decision Tree max depth was set to none; the number of
neighbors in K-Nearest Neighbors was set to 2; and the meta classifier's Multilayer
Perceptron parameters were alpha = 1 and maximum iterations = 100. Table 3
depicts the binary classification results on the UNSW-NB15 dataset for the normal
and attack classes.
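The stacking setup described above can be sketched with scikit-learn's `StackingClassifier`. To keep the sketch dependency-free, `GradientBoostingClassifier` stands in for XGBoost, and the data is synthetic rather than UNSW-NB15; the hyperparameters mirror the ones listed above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=14, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

base = [
    ("dt",  DecisionTreeClassifier(random_state=1)),       # max_depth=None
    ("rf",  RandomForestClassifier(random_state=1)),
    ("gb",  GradientBoostingClassifier(random_state=1)),   # XGBoost stand-in
    ("knn", KNeighborsClassifier(n_neighbors=2)),
    ("lr",  LogisticRegression(max_iter=1000)),
]
# MLP meta classifier with alpha=1 and 100 iterations, as described above
stack = StackingClassifier(estimators=base,
                           final_estimator=MLPClassifier(alpha=1, max_iter=100))
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print("test accuracy:", round(acc, 3))
```

`StackingClassifier` trains the meta learner on cross-validated base-learner predictions, which is what lets it correct for the individual classifiers' weaknesses.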
The confusion matrix of our model, which gives insight into the binary classification
report, is presented in Fig. 2.
The confusion matrix results indicate that our model exhibits high sensitivity
(recall) and relatively low false negatives. However, further analysis, incorporating
precision, accuracy, and F1-score, is necessary to evaluate the performance of the
model comprehensively. Figure 3 depicts the performance measures of all five base
classifiers and the ensemble classifier. The figure shows that the ensemble classifier's
accuracy is the highest, at 95%, compared to all the base classifiers; among the base
classifiers, XGBoost yields the lowest accuracy of 83.5% for binary classification.
Table 4 gives a comparative analysis of the five base classifiers and the ensemble
technique in terms of accuracy, F1-score, recall, and precision. The results show that
the ensemble classifier, with the multilayer perceptron as meta classifier, performed
better, with an accuracy of 95%, F1-score of 94, recall of 92.9, and precision of 96,
compared with all five base classifiers.
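The metrics in Tables 3 and 4 follow directly from the confusion-matrix counts. A small helper makes the relationship explicit; the counts below are hypothetical, not the model's actual matrix:

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # sensitivity
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only
acc, prec, rec, f1 = metrics_from_confusion(tp=920, fp=40, fn=70, tn=970)
print(f"acc={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Precision tracks false positives (type-1 errors) and recall tracks false negatives (type-2 errors), which is why both must stay high for the rates reported above to be low.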
The proposed model has been compared with existing intrusion detection systems
in the literature based on the UNSW-NB15 dataset, with the proposed model tested
on the same dataset. Most existing work covers both binary and multiclass
classification; this comparison considers binary classification only. Table 5 compares
the proposed work with the five most recent models in the literature. Yin et al.
[19] proposed a hybrid model for an intrusion detection system using an MLP
classifier and achieved an improved accuracy of 84.24% and F1-score of 82.85%;
the other performance metrics, recall and precision, were not reported. Sydney
Mambwe Kasongo [24] applied a Recurrent Neural Network, XGBoost, and LSTM
for intrusion detection and achieved 99.4% accuracy, but the model was not tested
on other performance metrics. Almomani et al. [26] developed a stacking ensemble
model with a logistic regression classifier and achieved good accuracy, precision,
recall, and F1-score. Ayantayo et al. [27] proposed a deep learning model for
intrusion detection and achieved an accuracy of 77.8%; the model was also tested
for precision and recall. Das et al. [28] proposed an ensemble model for intrusion
detection based on majority voting and achieved improved performance.
Table 5 Comparison of existing work with proposed model (NR indicates not reported)
Reference and year Accuracy Precision Recall F1-score
2023 [19] 84.24% NR NR 82.85%
2023 [24] Xgboost LSTM = 99.4% NR NR NR
Xgboost RNN = 87.07%
2023 [26] 97.9% 98.4 97.8 98.1
2023 [27] 77.8% 86.04 69.50 NR
2022 [28] 97.8 97.8 97.7 97.8
Proposed model 95 96 92.9 94
5 Conclusion
It is essential to create efficient IDS due to the rising risk of network attacks. In
the field of IDS, machine learning and deep learning have been widely used. IDS
development is hampered, nonetheless, by the problem of high-dimensional data.
The proposed work is an effective approach for detecting network intrusion with an
ensemble machine learning technique on the UNSW-NB15 dataset. This study
involves data preprocessing to convert categorical data to numerical, efficient feature
selection, training and testing of the model, and evaluation of the results. A combina-
tion of machine learning algorithms consisting of the Decision Tree, Random Forest,
XGBoost, K-Nearest Neighbor, and Logistic Regression was used as a base clas-
sifier and multilayer perceptron as meta classifier, yielding an improved prediction
compared to existing work in the literature. In this study, intrusion detection is carried
out as two-class classification, telling whether an intrusion is present. The model is
compared with existing works in the literature on accuracy, precision, recall, and
F1-score, and our ensemble model provides improved results. In the future, the work
can be extended to multiclass classification, where the model detects the type of
intrusion in the network so that an expert can take appropriate countermeasures.
Experiments can also be conducted to test model performance on multiple datasets.
References
Abstract This study demonstrates the expansive utility of Markov chains in statis-
tical modeling, emphasizing their role in simulating complex systems within diverse
fields, including engineering, economics, biology, and computer science. We present
an innovative integration of Markov chain theory to predict the future states of
dynamic systems and introduce ModerCMarkov, a novel web application designed in
R Studio using Shiny packages. This application leverages Markov chains for fore-
casting outcomes from varied database typologies. Our comprehensive evaluation of
ModerCMarkov assesses its processing speed and predictive accuracy across multiple
databases, varying in scope and complexity. The results highlight the application’s
robustness, evidenced by its rapid processing capabilities and precise predictions.
Furthermore, our research utilizes the Markov chain approach to identify critical
nodes within key variables, enhancing our understanding of these systems. Moder-
CMarkov emerges as a powerful tool for intricate analysis and modeling of complex
variable databases, offering significant contributions to multidisciplinary research
endeavors.
1 Introduction
The study at hand sits at the intersection of statistical modeling and computational
applications, focusing on Markov Chain Monte Carlo (MCMC) methods. These
methods are lauded for their versatility and effectiveness across a spectrum of scien-
tific inquiries, a theme echoed in the literature. For instance, research has shown that
MCMC is adept at generating classified textual data, underscoring its adaptability
for various applied science contexts [1]. Another study has made strides in bridging
the gap in parallel interactive MCMC algorithms, thereby enhancing the efficiency
of transitions between chains [2].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 465
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_36
466 F. Torres-Cruz et al.
A systematic literature review analyzing 480 studies on software test case prior-
itization with Markov chains has reaffirmed the tool’s robustness and adaptability,
demonstrating its validity through analytical and numerical means and suggesting
prospective research trajectories [3]. Similarly, a comparative study evaluating
MCMC software for chromosomal microsatellite data analysis in evolutionary
biology highlights different tools’ nuanced strengths and limitations concerning
parameter estimation, execution speed, and convergence patterns [4].
The burgeoning field of web applications has also embraced Markov chains. For
instance, a web application built on the Python Flask framework has been instru-
mental in leveraging machine learning for diabetes risk prediction based on clinical
data, offering an interactive platform for users [5]. Another novel application of the
hidden Markov model probes the subtleties of city clusters, elucidating the latent
connections between institutional digital economy support, green finance develop-
ment, and energy consumption while predicting stationary probabilities for cluster
transitions [6]. In health economics, web-based software employing MCMC analysis
has proven valuable in juxtaposing the cost-effectiveness of different treatments, an
innovation that enhances both education and practical application for students and
health professionals alike [7]. On a similar note, the estimation of Markov chain
transition matrices, particularly within small sample spaces, has been advanced
through Monte Carlo experiments, offering a comparison to traditional methods
and showcasing the potential for refinement in matrix approximation [7].
Furthermore, SpatialEpiApp has emerged as a comprehensive tool in public health
surveillance, merging disease mapping with cluster detection without necessitating
advanced programming knowledge. This web application facilitates the fitting of
Bayesian models to assess disease risks and delineate disease clusters using SaTScan,
further contributing to the field’s capacity for generating interactive data visualiza-
tions and reports [8]. Markov chains have been instrumental in the healthcare sector,
particularly in understanding the progression and treatment patterns of dementia.
Research has applied these chains to discern patterns in medical appointments,
stratifying dementia patients into subgroups to uncover the most prevalent clinical
pathways and transitions between medical specialties [9]. Similarly, in industrial
processes such as casting, Markov chain models offer predictive insight into failure
probabilities, streamlining the process by providing a transition probability matrix
that enhances the optimization of the casting process with industry data [10].
The reliability of software in complex systems has also been a focal point of
Markov chain applications. A study suggests that higher order Markov chains, which
consider deeper historical dependencies, significantly enhance software reliability
assessment. Such an approach has been validated in the flight software of CubeSat
nanosatellites, where introducing these chains improved failure rate predictions by a
considerable margin [11]. An interactive R-Shiny application presents a novel way to
visualize longitudinal data in clinical trial simulations. This tool distinguishes itself
by expediting the analysis of platform trials, demonstrating the relative impact of
input variables on outcomes without any current free, open-source equivalent in the
R ecosystem [12].
Enhancing Statistical Analysis with Markov Chain Models Using … 467
Further extending the use of Markov chains in software reliability, another study
builds upon previous methodologies by proposing an algorithm that simplifies high-
order processes into an equivalent first-order process. This simplification does not
compromise accuracy; it has been shown to significantly improve failure rate assess-
ments for complex software systems like those used in CubeSat nanosatellites [13].
Moreover, Markov chain theory has been applied to overcome the technical assump-
tions typically associated with large deviations in longer runs. An enhanced global
estimate for the distribution function has been presented, offering potentially broad
applications beyond the original scope of the research [14].
2 Methods
This study employs a methodology rooted in Markov chain theory [15], utilizing
the R programming language and its associated packages markovchain, shiny,
shinythemes, tidyverse, ggplot2, and ggcorrplot for implementation [16]. Our
approach is methodically structured into two primary phases. The first phase encom-
passes data collection and preparation, ensuring the integrity and relevance of the
dataset. The second phase involves the meticulous implementation and analysis of
the software, leveraging the computational power of R and the specialized capa-
bilities of the selected packages [16, 17]. This dual-phase methodology ensures a
comprehensive and robust analysis aligning with the study’s objectives.
Once the data sample has been collected, the next step is data preparation, which
consists of cleaning and transforming the data so that they can be used for subsequent
analysis.
In this research, we chose three datasets to test the developed software: Ceedata,
Diabetes, and Credit. Each dataset requires different preprocessing steps to address
issues such as missing values, outliers, and data normalization.
Each dataset is then analyzed to identify the relevant variables for constructing the
prediction model with Markov chains.
This is the central phase of this study; the markovchain package of R is used to build
the prediction model from the transition matrix generated from the input data. This
model is used to predict the transition probabilities from one state to another in the
future, providing valuable information for decision-making in various domains.
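This central step — fitting a transition matrix from observed data and using it to forecast the next state — is done in the paper with R's markovchain package. A minimal Python sketch of the same idea (the state sequence below is invented for illustration, not drawn from the study's datasets):

```python
from collections import defaultdict

def fit_transition_matrix(sequence):
    """Estimate first-order Markov transition probabilities by
    counting observed state-to-state transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    states = sorted(set(sequence))
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        matrix[s] = {t: (counts[s][t] / total if total else 0.0)
                     for t in states}
    return matrix

# Toy state sequence (illustrative only)
seq = ["low", "low", "high", "low", "high", "high", "low", "low"]
P = fit_transition_matrix(seq)
# P["low"] is the predicted distribution over the next state when
# the chain is currently in state "low"
```

Each row of the fitted matrix sums to one and directly gives the one-step transition probabilities used for prediction.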
The ability of the developed software to process different datasets and generate
accurate and reliable prediction models is evaluated. Rigorous tests are performed,
and the results are compared with real data to evaluate the application’s performance.
The results of this stage are presented in the form of tables and graphs to facilitate
their interpretation.
The developed application was compared with other similar data analysis tools or
methods in terms of accuracy, efficiency, scalability, etc. Relevant metrics are used to
measure and compare the performance of each method, and the results are presented
clearly and concisely for interpretation.
The prediction model with Markov chains is implemented in a web interface for
practical and accessible use by end users. The R package Shiny creates an intuitive
and easy-to-use user interface that allows users to enter data and visualize the results
interactively. The web application’s aesthetic and functional design are made to
ensure a satisfactory experience for the end user.
The Markov chain technique is a mathematical tool for modeling systems that evolve
probabilistically over time. These systems can represent a wide variety of situations,
from the behavior of a molecule in a chemical reaction to the prediction of stock
prices in the stock market; in other words, general-purpose predictive models can be
generated through this technique. Figure 1 shows the flow of our implementation.
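Multi-step forecasts of the kind the application produces follow from the Chapman–Kolmogorov equation: the n-step transition matrix is the n-th power of the one-step matrix. A small NumPy sketch with an invented two-state matrix:

```python
import numpy as np

# Illustrative one-step transition matrix for two states A, B
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def n_step(P, n):
    """n-step transition probabilities via matrix powers
    (Chapman-Kolmogorov)."""
    return np.linalg.matrix_power(P, n)

start = np.array([1.0, 0.0])         # system starts in state A
dist_after_3 = start @ n_step(P, 3)  # state distribution after 3 steps
```

Because each row of a stochastic matrix sums to one, every power of it is again a valid transition matrix.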
The application was implemented in R, taking advantage of the Shiny packages for
integration into the web framework. To evaluate the prediction model's performance,
tests were carried out on a computational system with high technical specifications:
an Intel Core i7-6700 processor running at 1.80 GHz, a 4 GB graphics card, 8 GB of
RAM, and a 64-bit operating system. To analyze the application's performance,
processing time was measured by applying the prediction model three times to four
databases.
3 Results
We describe the web implementation process and then demonstrate the application
with simple experiments.
3.1 Interface
The Shiny package for R is a powerful tool for building interactive web applications
accessible through any standard web browser. This package is a foundation for users
to craft graphical user interfaces (GUIs) that can bridge the gap between complex
statistical analysis and user-friendly interactivity. In this study, additional R pack-
ages complement Shiny’s capabilities to enhance the application’s functionality for
rendering tables and generating descriptive visualizations. These include ‘plyr’ for
data manipulation, ‘tidyverse’ for data science tasks, ‘ggplot2’ for creating high-
quality graphics, and ‘ggcorrplot’ for visually displaying correlation matrices. The
application’s workflow is visualized in Fig. 2 and encompasses the following steps.
Upon initial access, the user will receive a concise introduction to the appli-
cation. This overview explains the application’s purpose, functionality, and scope,
ensuring users understand the tools at their disposal (Fig. 2). Additionally, a sidebar
is incorporated to facilitate navigation and feature access within the application.
Data Section. In the data section (Fig. 3), the user loads the data to be analyzed,
displaying the loaded data.
Data Description. Once the dataset is loaded, the description section shows a
summary of the dataset headers and a descriptive inter-variable graph which, in
addition to the statistical values, graphically shows the distribution of the data
through the correlation diagram in Fig. 4.
Prediction Section. The user is shown the variables found in the data analysis, and
one of them is selected for prediction. The number of steps to predict can also be
set (defaulting to 1), producing the transition table and Markovian graph
(Fig. 5).
3.2 Predictions
4 Conclusions
References
1. Cerqueti R, Ficcadenti V, Dhesi G, Ausloos M (2022) Markov chain Monte Carlo for generating
ranked textual data. Inf Sci (N Y) 610:425–439. https://doi.org/10.1016/j.ins.2022.07.137
2. Rigat F, Mira A (2012) Parallel hierarchical sampling: a general-purpose interactive Markov
chain Monte Carlo algorithm. Comput Stat Data Anal. https://doi.org/10.1016/j.csda.
2011.11.020
3. Barbosa G, de Souza ÉF, dos Santos LBR, da Silva M, Balera JM, Vijaykumar NL (2022) A
systematic literature review on prioritizing software test cases using Markov chains. Inf Softw
Technol 147. https://doi.org/10.1016/j.infsof.2022.106902
4. Gundlach S, Junge O, Wienbrandt L, Krawczak M, Caliebe A (2019) Comparison of Markov
chain Monte Carlo software for the evolutionary analysis of y-chromosomal microsatellite
data. Comput Struct Biotechnol J 17:1082–1090. https://doi.org/10.1016/j.csbj.2019.07.014
5. Ahmed N et al (2021) Machine learning based diabetes prediction and development of smart
web application. Int J Cogn Comput Eng 2:229–241. https://doi.org/10.1016/j.ijcce.2021.
12.001
6. Huo D, Zhang X, Meng S, Wu G, Li J, Di R (2022) Green finance and energy efficiency:
dynamic study of the spatial externality of institutional support in a digital economy by using
hidden Markov chain. Energy Econ 116. https://doi.org/10.1016/j.eneco.2022.106431
7. McGhan WF, Khole T, Vichaichanakul K, Willey VJ (2012) PRM33 validating a web-based,
incremental cost-effectiveness software program that implements a Markov Chain Monte Carlo
(MCMC) analysis model. Value Health 15. https://doi.org/10.1016/j.jval.2012.03.889
8. Moraga P (2017) SpatialEpiApp: a shiny web application for the analysis of spatial and spatio-
temporal disease data. Spat Spatiotemporal Epidemiol 23:47–57. https://doi.org/10.1016/j.sste.
2017.08.001
9. Costa LM, Colaço J, Carvalho AM, Vinga S, Teixeira AS (2023) Using Markov chains and
temporal alignment to identify clinical patterns in Dementia. J Biomed Inform 140:104328.
https://doi.org/10.1016/j.jbi.2023.104328
10. Chaudhari A, Vasudevan H (2022) Reliability based design optimization of casting process
parameters using Markov chain model. Mater Today Proc 63:602–606. https://doi.org/10.1016/
j.matpr.2022.04.189
11. Yakovyna V, Symets I (2021) Reliability assessment of CubeSat nanosatellites flight software
by high-order Markov chains. In: Procedia computer science. Elsevier B.V. pp 447–456. https://
doi.org/10.1016/j.procs.2021.08.046
12. Meyer EL, Kumaus C, Majka M, Koenig F (2023) An interactive R-Shiny app for quickly
visualizing a tidy, long dataset with multiple dimensions with an application in clinical trial
simulations for platform trials. SoftwareX 22:101347. https://doi.org/10.1016/j.softx.2023.
101347
13. Reed S, Ziadé E (2023) On transient analysis of N-Markov chains. Methodol Comput Appl
Probab 25(1). https://doi.org/10.1007/s11009-023-10002-9
14. Liu Z, Mbokoma M (2023) An improvement on the large deviations for longest runs in Markov
chains. Stat Probab Lett 193. https://doi.org/10.1016/j.spl.2022.109737
15. Vieira SC, Fabro AT, Rodrigues RLP, da Silva MJ, Morales RE, Castro MS (2023) A two-state
Markov chain model for slug flow in horizontal ducts. Flow Meas Instrum 90. https://doi.org/
10.1016/j.flowmeasinst.2023.102335
16. Holland-Letz T, Kopp-Schneider A (2021) An R-shiny application to calculate optimal designs
for single substance and interaction trials in dose response experiments. Toxicol Lett 337:18–27.
https://doi.org/10.1016/j.toxlet.2020.11.018
17. Thiede RN, Fabris-Rotelli IN, Debba P, Cleghorn CW (2023) A Markov chain model for
geographical accessibility. Spat Stat 100748. https://doi.org/10.1016/j.spasta.2023.100748
18. Bilici A, Külahcı F, Bilici, Şen Z (2023) Markov chain transition probability modeling of radon
gas records and future projection possibility determination. J Atmos Sol Terr Phys 244. https://
doi.org/10.1016/j.jastp.2023.106027
19. Galeano J, Gómez MÁ, Rivas F, Buldú JM (2022) Using Markov chains to identify player’s
performance in badminton. Chaos Solitons Fractals 165. https://doi.org/10.1016/j.chaos.2022.
112828
20. Sakthivel K, Ganesan R (2023) ESTEEM–enhanced stability and throughput for energy effi-
cient multihop routing based on Markov chain model in wireless body area networks. Sustain
Energy Technol Assess 56. https://doi.org/10.1016/j.seta.2023.103100.
21. Rothe F, Lames M (2022) Simulation of Tennis behaviour using finite Markov chains. In:
IFAC-PapersOnLine. Elsevier B.V., pp 606–611. https://doi.org/10.1016/j.ifacol.2022.09.162
22. Zhang K, Su K, Yao Y, Li Q, Chen S (2022) Dynamic evaluation and analysis of the uncertainty
of roundness error measurement by Markov chain Monte Carlo method. Measurement (Lond)
201. https://doi.org/10.1016/j.measurement.2022.111771
23. Zhang Y et al (2023) Joint nonlinear-drift-driven Wiener process-Markov chain degradation
switching model for adaptive online predicting lithium-ion battery remaining useful life. Appl
Energy 341. https://doi.org/10.1016/j.apenergy.2023.121043
24. Ecker L, Schlacher K (2022) An approximation of the Bayesian state observer with Markov
chain Monte Carlo propagation stage. In: IFAC-PapersOnLine, Elsevier B.V., pp 301–306.
https://doi.org/10.1016/j.ifacol.2022.09.112
25. Bonilha CS (2022) BCyto: a shiny app for flow cytometry data analysis. Mol Cell Probes 65.
https://doi.org/10.1016/j.mcp.2022.101848
26. Li Y (2020) Towards fast prototyping of cloud-based environmental decision support systems
for environmental scientists using R Shiny and Docker. Environ Model Softw 132. https://doi.
org/10.1016/j.envsoft.2020.104797
27. Lye A, Cicirello A, Patelli E (2022) An efficient and robust sampler for Bayesian inference:
transitional ensemble Markov chain Monte Carlo. Mech Syst Signal Process 167. https://doi.
org/10.1016/j.ymssp.2021.108471
Securing the Digital Realm: Unmasking
Fraud in Online Transactions Using
Supervised Machine Learning
Techniques
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 477
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_37
478 G. Y. Reddy et al.
1 Introduction
2 Related Work
security intact, one must identify and halt suspicious transactions. It is now feasible
to identify unusual activity within the transactions, thanks to machine learning algo-
rithms and the availability of historical datasets. Balanced data from an existing
dataset must be constructed before using machine learning techniques like DT, kNN,
LR, SVM, RF, and XGBoost [1, 7–9] to detect fraud activities.
Before creating the customer data records, the fraud protection system gathers the
customer’s electronic identity, including their address, phone number, email address,
spending patterns, history of payments, and other details. Machine learning improves
the accuracy of financial projections and propels corporate expansion. Artificial
intelligence and machine learning [10] evaluate an investor's financial status, risk tolerance,
and investment goal before recommending a moderate, reasonable, or aggressive
portfolio based on the investor’s needs. Currently, the credit score—which considers
the quantity of active cards, payment history, and active loans—is used by banks
when processing credit card applications. These days, many insurance companies
are creating artificial intelligence apps for fraud detection, claim processing, and
underwriting.
The dataset, sourced from Kaggle [11], comprises 11 attributes, including 'step,'
'type,' 'amount,' 'nameOrig,' 'oldbalanceOrg,' 'newbalanceOrig,' 'nameDest,'
'oldbalanceDest,' 'newbalanceDest,' 'isFraud,' and 'isFlaggedFraud.' The data
types range from int64 to float64, with three object columns. The dataset provides
information on transaction steps, types, amounts, originator and destination names,
old and new balances, and fraud indicators. The memory usage for this dataset is
more than 534.0 MB.
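For illustration, a tiny synthetic frame with the same mixture of int64, float64, and object columns can be inspected the usual pandas way (this is not the study's code, and the values are invented):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Kaggle online-payment dataset
df = pd.DataFrame({
    "step": np.array([1, 1, 2], dtype="int64"),
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],  # object column
    "amount": np.array([9839.64, 181.0, 181.0], dtype="float64"),
    "isFraud": np.array([0, 1, 1], dtype="int64"),
})

# Column dtypes and memory footprint, the properties the text reports
# for the full 11-attribute dataset
print(df.dtypes)
print(df.memory_usage(deep=True).sum(), "bytes")
```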
The following gives the details of the attributes present in the dataset:
• Transaction Steps (step): It represents the chronological order of transactions,
offering a temporal dimension to the dataset.
• Transaction Type (type): It categorizes transactions into different types, providing
information on the nature of the financial activity (e.g., payment, transfer).
• Transaction Amount (amount): It specifies the monetary value involved in each
transaction, a crucial parameter for fraud detection.
• Originator Name (nameOrig): It identifies the entity initiating the transaction,
contributing to understanding transactional patterns.
Data preparation is essential in detecting online fraud and ensuring the data is reliable,
relevant, and fit for analysis. The data preprocessing processes for detecting online
fraud are as follows:
• Data Cleaning: Duplicate transactions are removed first to prevent the analysis
from being skewed; missing values are then handled; and, finally, outlier detection
is performed.
• Feature Selection: Relevant features are found by selecting those that substan-
tially improve the detection model or are more likely to be correlated with fraud.
Figure 2 shows the correlation matrix used for feature selection. A correlation
matrix is a statistical tool for assessing the association between pairs of features
in a dataset [12]. The matrix is a table in which each cell holds a correlation
coefficient, with +1 denoting a perfect positive correlation, 0 no linear
relationship, and −1 a perfect negative correlation between the variables.
• Data Normalization: The features are normalized to a standard range so that no
feature dominates the model solely because of its magnitude. Figure 3 shows the
normalization applied to the 'isFraud' feature.
• Data Balancing: In this research study, the classes are balanced using the SMOTE
and ENN techniques [6].
• Data Splitting: In this step, the dataset is divided into two sets, one for training
the models and one for testing and evaluating their performance. To help the
machine learning models discover patterns, relationships, and features from a wide
variety of samples, a significant portion of the data (80%) is used for training [13].
The remaining 20% is set aside for performance testing of the trained model; this
set serves as an independent evaluation dataset for assessing how well the model
generalizes to new, unseen data. More training data lowers the likelihood of
overfitting, a condition in which the model memorizes the training set rather than
learning the underlying patterns. The performance of the model is then accurately
evaluated using the testing set. A reasonably large testing set (20% of the data)
also makes the evaluation results more statistically reliable, supporting sounder
conclusions about the model's effectiveness.
This study uses multiple libraries and frameworks to develop an online fraud detec-
tion system in Python, including tools for data preprocessing, machine learning, and
evaluating models. Pandas and NumPy are used for data preprocessing, Scikit-learn
for machine learning classifiers and evaluation metrics such as accuracy, precision,
recall, F1-score, ROC AUC, and confusion matrix, SMOTE for handling class imbal-
ance in the dataset, matplotlib for data visualization, and the following machine
learning model libraries for comparative analysis of fraud detection system [14]:
• XGBClassifier(objective='multi:softprob', n_estimators=num_estimators)
• LogisticRegression(penalty='l2', solver='lbfgs', max_iter=100)
• KNeighborsClassifier(n_neighbors=5, metric='minkowski')
• DecisionTreeClassifier(criterion='gini')
• RandomForestClassifier(n_estimators=100, criterion='gini', min_samples_split=2)
• GaussianNB(priors=None, var_smoothing=1e-09)
• SVC(C=1.0, kernel='rbf')
When selecting a machine learning model for online fraud detection, factors such as
data volume, feature complexity, computational resources, interpretability, real-time
inference needs, and the balance of false positives and false negatives are considered.
Based on the literature review, seven machine learning models are considered in this
research study: Logistic Regression (LR), Support Vector Machine (SVM), Naïve
Bayes (NB), Decision Tree (DT), k-Nearest Neighbor (kNN), Random Forest (RF),
and XGBoost (XGB).
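A condensed version of this comparative setup — fitting several of the listed classifiers and collecting the four headline metrics on held-out data — might look as follows. The data is synthetic, and XGBoost and SVM are omitted to keep the sketch dependency-light and fast; the hyperparameters mirror the list above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Synthetic imbalanced binary data in place of the Kaggle dataset
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(penalty="l2", solver="lbfgs", max_iter=100),
    "kNN": KNeighborsClassifier(n_neighbors=5, metric="minkowski"),
    "DT": DecisionTreeClassifier(criterion="gini"),
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 min_samples_split=2),
    "NB": GaussianNB(),
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    results[name] = {
        "accuracy": accuracy_score(y_te, y_pred),
        "precision": precision_score(y_te, y_pred),
        "recall": recall_score(y_te, y_pred),
        "f1": f1_score(y_te, y_pred),
    }
```

On the real dataset, the same loop would simply consume the preprocessed, balanced training split instead of the synthetic data.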
Fig. 6 Undersampling using ENN and oversampling using SMOTE for handling imbalanced
datasets [11]
The fraud detection dataset [11] is used for experiments, and Fig. 7 and Table 2
show the findings of the evaluation of the seven machine learning (ML) algorithms
considered in this study.
In comparing the machine learning algorithms for fraud detection in online trans-
actions, the performance metrics revealed distinct characteristics that underscore the
strengths and weaknesses of each approach. Logistic Regression, with a creditable
accuracy of 92%, exhibits balanced precision, recall, and F1-score, making it a
reliable choice. k-Nearest Neighbors (kNN) follows with a good accuracy of 96%,
demonstrating robustness in identifying fraudulent transactions. Decision Tree
and Random Forest models achieve accuracy values of 94% and 97%, respectively.
These tree-based algorithms are well-suited for fraud detection, effectively
minimizing false positives and false negatives. XGBoost stands out as the optimal
algorithm, boasting an outstanding accuracy of 99% and excelling across all metrics
compared to other models. Its ability to strike a harmonious balance between preci-
sion and recall makes it well-suited for fraud detection tasks. The algorithm’s versa-
tility and effectiveness are evident in its high AUC-ROC score, affirming its status
as the frontrunner. In contrast, Naive Bayes exhibits limitations in accuracy and
overall performance, highlighting its challenges in handling the intricacies of fraud
detection. Support Vector Machine (SVM) achieves an accuracy of 96%.
5 Conclusion
To sum up, designing and implementing an online fraud detection system is essential
for contemporary financial risk management and cybersecurity. By utilizing algo-
rithms for machine learning, data preparation methods, and precise evaluation proce-
dures, organizations can improve their capacity to identify and address fraudulent
activity. Seven ML models are used in this research study. During the experimenta-
tion process, XGBoost yielded the highest accuracy of 99%. When dealing with tasks
where the fundamental structures are complicated for simpler models to comprehend,
XGBoost can effectively capture complex associations and non-linear properties in
the data. Regularization techniques like L1 (Lasso) and L2 (Ridge), which penalize
complex models, are included in XGBoost and help prevent overfitting. The gradient
boosting method, on which XGBoost is based, sequentially combines several decision
trees to create a powerful predictive model. These properties increase the accuracy
and robustness of the ensemble method. The Naive Bayes classifiers in the study
performed worse because they assigned zero probability to feature values in the
test data that were not present in the training data, which caused problems
during the classification process.
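The zero-probability behaviour described here is the classic zero-frequency problem of Naive Bayes. For categorical features it is commonly countered with Laplace (additive) smoothing; the toy sketch below (not the study's code) shows scikit-learn's CategoricalNB assigning a well-defined posterior to a feature value never seen in training:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Training data in which feature value 2 never appears
X_train = np.array([[0], [0], [1], [1]])
y_train = np.array([0, 0, 1, 1])

# alpha=1.0 applies Laplace smoothing; min_categories reserves room
# for the category absent from the training data
clf = CategoricalNB(alpha=1.0, min_categories=3)
clf.fit(X_train, y_train)

# Without smoothing, the unseen category would get zero probability
# under every class; with smoothing the posterior is well defined
# (and, by symmetry here, uniform).
probs = clf.predict_proba(np.array([[2]]))
```

GaussianNB's var_smoothing parameter plays an analogous stabilizing role for continuous features.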
In the future, Distributed Ledger Technologies (DLTs) and Blockchain may be
used to improve transaction security, traceability, and transparency. This will allow
for safe, unchangeable transaction records that can help with fraud identification
and prevention. By employing network analysis and graph analytics tools to find
intricate fraud networks, connections between malicious entities can be determined,
and coordinated attacks or multi-party fraud schemes can be found.
Acknowledgements The authors are indebted to the faculty members of the CSE department at
Christ University, Bangalore, India, for the invaluable infrastructure provided and their technical
support.
References
1. Hussain SKS, Reddy ESC, Akshay KG, Akanksha T (2021) Fraud detection in credit card
transactions using SVM and random forest algorithms. In: 2021 fifth international conference
on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). Palladam, India, pp
1013–1017. https://doi.org/10.1109/I-SMAC52330.2021.9640631
2. Alarfaj FK, Malik I, Khan HU, Almusallam N, Ramzan M, Ahmed M (2022) Credit card fraud
detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access
10:39700–39715. https://doi.org/10.1109/ACCESS.2022.3166891
3. Ghaleb FA, Saeed F, Al-Sarem M, Qasem SN, Al-Hadhrami T (2023) Ensemble synthesized
minority oversampling-based generative adversarial networks and random forest algorithm for
credit card fraud detection. IEEE Access 11:89694–89710. https://doi.org/10.1109/ACCESS.
2023.3306621
4. Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A neural network ensemble
with feature engineering for improved credit card fraud detection. IEEE Access 10:16400–
16407. https://doi.org/10.1109/ACCESS.2022.3148298
5. Mienye ID, Sun Y (2023) A deep learning ensemble with data resampling for credit card fraud
detection. IEEE Access 11:30628–30638. https://doi.org/10.1109/ACCESS.2023.3262020
6. Ileberi E, Sun Y, Wang Z (2021) Performance evaluation of machine learning methods for credit
card fraud detection using SMOTE and AdaBoost. IEEE Access 9:165286–165294. https://doi.
org/10.1109/ACCESS.2021.3134330
7. Panthakkan A, Valappil N, Appathil M, Verma S, Mansoor W, Al-Ahmad H (2022) Perfor-
mance comparison of credit card fraud detection system using machine learning. In: 2022
5th international conference on signal processing and information security (ICSPIS). Dubai,
United Arab Emirates, pp 17–21. https://doi.org/10.1109/ICSPIS57063.2022.10002517
8. Karkhile K, Raskar S, Patil R, Bhangare V, Sarode A (2023) Enhancing credit card security:
a machine learning approach for fraud detection. In: 2023 7th international conference on
computing, communication, control and automation (ICCUBEA). Pune, India, pp 1–6. https://
doi.org/10.1109/ICCUBEA58933.2023.10392165
9. Aladakatti D, G P, Kodipalli A, Kamal S (2022) Fraud detection in online payment transaction
using machine learning algorithms. In: 2022 international conference on smart and sustainable
technologies in energy and power sectors (SSTEPS). Mahendragarh, India, pp 223–228. https://
doi.org/10.1109/SSTEPS57475.2022.00063
10. Liu ACC, Law OMK, Law I (2022) Finance, in understanding artificial intelligence:
fundamentals and applications. IEEE, pp 77–88, https://doi.org/10.1002/9781119858393.ch8
11. https://www.kaggle.com/datasets/jainilcoder/online-payment-fraud-detection. Accessed 10
Dec 2023
12. Tekkali CG, Natarajan K, Bhuvanesh VM (2023) A novel classification approach for smart card
fraud detection. In: 2023 international conference on advances in computation, communication
and information technology (ICAICCIT). Faridabad, India, pp 169–173. https://doi.org/10.
1109/ICAICCIT60255.2023.10466027
13. Xu Y, Goodacre R (2018) On splitting training and validation set: a comparative study of cross-
validation, bootstrap and systematic sampling for estimating the generalization performance
of supervised learning. J Anal Test 2(3):249–262. https://doi.org/10.1007/s41664-018-0068-2
14. Mangal E, Shubham D, Gussain R (2023) Credit card fraud detection using python & machine
learning algorithms. Int J Res App Sci & Eng Tech 11(5):3120–3128
High-Speed Parity Number Detection
Algorithm in RNS Based on Akushsky
Core Function
Abstract The Residue Number System is widely used in cryptography, digital sig-
nal processing, image processing systems and other areas where high-performance
computation is required. One of the computationally expensive operations in the
Residue Number System is the parity detection of a number. This paper presents
a high-speed algorithm for parity detection of numbers in Residue Number Sys-
tem based on Akushsky core function. The proposed approach for parity detection
reduces the average time by 20.39% compared to the algorithm based on the Chinese
Remainder Theorem.
1 Introduction
In the modern world, where the speed of data processing plays a key role, the search
for efficient algorithms becomes an integral part of software development. One of
the tools to improve the speed of information systems is the Residue Number System
(RNS). RNS is used in areas such as blockchain [1], homomorphic encryption [2],
digital signal and image processing [3, 4], communication systems [4], highly reliable
cloud environments [5], neural networks [6].
Since RNS is a non-positional number system, there are a number of so-called
non-modular operations that are difficult to perform in RNS. These operations
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 491
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_38
492 V. Lutsenko et al.
include number division [7], number sign detection [8], number comparison [9],
base expansion [10], scaling [11] and parity detection.
Parity detection is directly required in the division [7, 12] and error correction [13] algorithms in RNS. In this paper, we present a high-speed algorithm for determining the parity of a number in the Residue Number System, based on the Akushsky core function (ACF), a mathematical function used to compute a positional characteristic of a number in RNS.
The paper has the following structure. Section 2 discusses the Residue Number System. Section 3 presents parity algorithms in RNS based on inverse conversion. Section 4 studies the Akushsky core function and the parity algorithm using it. Section 5 analyses the performance of the proposed method. Finally, the results obtained are summarized.
2 Residue Number System

RNS is based on the widely known Chinese Remainder Theorem (CRT) [4, 14]. It states that, knowing the smallest non-negative residues from dividing an integer $X$ by the integer moduli $p_1, p_2, \ldots, p_n$, it is possible to uniquely determine the residue from dividing $X$ by the product of these moduli, provided that the moduli are pairwise coprime. RNS, unlike classical $b$-ary number systems, is defined not by a single fixed base but by a set of moduli $\{p_1, p_2, \ldots, p_n\}$ such that $\gcd(p_i, p_j) = 1$ for all $i, j \in \{1, 2, \ldots, n\}$, $i \neq j$, where $\gcd(\cdot)$ is the greatest common divisor. The product of these moduli $P = \prod_{i=1}^{n} p_i$ determines the dynamic range of the RNS. An integer $X \in [0, P - 1]$ is represented as a vector composed of the smallest non-negative residues obtained by dividing $X$ by $p_i$:

$$X = (x_1, x_2, \ldots, x_n). \tag{1}$$
Table 1 Representation of numbers for RNS with the basis $\{5, 7\}$ on the interval $[0, 15]$

$0 \xrightarrow{RNS} (0, 0)$  $1 \xrightarrow{RNS} (1, 1)$  $2 \xrightarrow{RNS} (2, 2)$  $3 \xrightarrow{RNS} (3, 3)$
$4 \xrightarrow{RNS} (4, 4)$  $5 \xrightarrow{RNS} (0, 5)$  $6 \xrightarrow{RNS} (1, 6)$  $7 \xrightarrow{RNS} (2, 0)$
$8 \xrightarrow{RNS} (3, 1)$  $9 \xrightarrow{RNS} (4, 2)$  $10 \xrightarrow{RNS} (0, 3)$  $11 \xrightarrow{RNS} (1, 4)$
$12 \xrightarrow{RNS} (2, 5)$  $13 \xrightarrow{RNS} (3, 6)$  $14 \xrightarrow{RNS} (4, 0)$  $15 \xrightarrow{RNS} (0, 1)$
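The mapping in Table 1 can be cross-checked with a short sketch (Python; the helper name `to_rns` is ours, not from the paper):

```python
# Map an integer to its RNS representation: the vector of smallest
# non-negative residues modulo each modulus p_i.
def to_rns(x, moduli):
    return tuple(x % p for p in moduli)

moduli = (5, 7)
for x in range(16):
    print(x, "->", to_rns(x, moduli))
# e.g. 7 -> (2, 0) and 15 -> (0, 1), matching Table 1
```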
RNS defines basic operations on numbers, which are divided into two groups. The operations of the first group, sometimes called modular, include addition and subtraction of numbers (without the possibility of determining the sign of the result), as well as multiplication. Such operations are performed component-wise on the residues, i.e. without forming carries between them. Let the numbers $X$, $Y$ and $Z$ be represented as $(x_1, x_2, \ldots, x_n)$, $(y_1, y_2, \ldots, y_n)$ and $(z_1, z_2, \ldots, z_n)$, respectively. Then for any modular operation $\circ$ we have

$$Z = \left( |x_1 \circ y_1|_{p_1}, |x_2 \circ y_2|_{p_2}, \ldots, |x_n \circ y_n|_{p_n} \right). \tag{2}$$
That is, the $i$-th digit of the result in RNS, $z_i$, is defined only in terms of $|x_i \circ y_i|_{p_i}$ and does not depend on any other digit $z_j$. This allows the realization of carry-free, high-speed (parallel) computer arithmetic and makes RNS an attractive number system for use in resource-intensive applications, especially those involving the processing of large numbers. It also provides high computational reliability, since an error in the $i$-th digit has no effect on the other digits and can therefore be efficiently localized and eliminated [15]. In turn, for operations of the second group, often called non-modular, it is not enough to know the values of the individual residues; an estimate of the magnitude of the numbers is required. The result of such an operation is either not a number in RNS at all, or the value of each of its digits (residues) is not only a function of the corresponding digits of the operands but also depends on the magnitude of these operands. Unfortunately, the non-positional structure of RNS does not allow the value of a number to be estimated efficiently from its residues, and this circumstance is the main factor restraining the widespread use of RNS as an alternative to $b$-ary number systems.
Hence $Z = (0, 4)$. This is true, since $25 \xrightarrow{RNS} (0, 4)$.
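The surviving tail of this example is consistent with component-wise addition of $X = (3, 4)$ and $Y = (2, 0)$, i.e. 18 and 7. A minimal sketch of a modular operation per (2), with function names of our own choosing:

```python
# Component-wise modular operation (2): digit i of the result depends
# only on digit i of the operands -- no carries between digits.
def rns_op(x, y, moduli, op):
    return tuple(op(xi, yi) % p for xi, yi, p in zip(x, y, moduli))

moduli = (5, 7)
X = (3, 4)   # 18 in this RNS
Y = (2, 0)   # 7 in this RNS
Z = rns_op(X, Y, moduli, lambda a, b: a + b)
print(Z)     # (0, 4), i.e. 25 = 18 + 7
```

The same helper performs subtraction or multiplication by swapping the lambda, since all modular operations share the carry-free pattern of (2).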
$$P_1 = \frac{P}{p_1} = 7, \quad P_2 = \frac{P}{p_2} = 5.$$
Having these values, the value of the number $X$ can be calculated according to (3):

$$X = |7 \cdot 3 \cdot 3 + 5 \cdot 4 \cdot 3|_{35} = |123|_{35} = 18.$$
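This recovery can be sketched directly (a hedged illustration of CRT reconstruction as used in the example; `crt_recover` is our name, and Python's three-argument `pow` supplies the modular inverse):

```python
# CRT reconstruction: X = | sum_i P_i * |P_i^{-1}|_{p_i} * x_i |_P,
# matching 18 = |7*3*3 + 5*3*4|_35 for the moduli {5, 7}.
def crt_recover(x, moduli):
    P = 1
    for p in moduli:
        P *= p
    total = 0
    for xi, p in zip(x, moduli):
        Pi = P // p
        total += Pi * pow(Pi, -1, p) * xi   # pow(Pi, -1, p): modular inverse
    return total % P

X = crt_recover((3, 4), (5, 7))
print(X, X % 2)  # 18 0
```

The parity of $X$ then follows as $X \bmod 2$, which is the CRT-based baseline that the paper's proposed method is compared against.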
where $k_i = \frac{|P_i^{-1}|_{p_i}}{p_i}$ are constants of the chosen system, and (4) gives a result within the interval $[0, 1)$. In this context, the operation of taking a remainder with respect to a large modulus is replaced by simply discarding the integer part, which is a simple operation to implement. To obtain the exact value of $X$, the fractional part is multiplied by $P$.
To illustrate parity detection with the Approximate Method (AM), let us consider the following Example 3.
$$k_1 = \frac{3}{5}, \quad k_2 = \frac{3}{7}.$$
Then, using (4), it is easy to find

$$\frac{X}{P} = \left| 3 \cdot \frac{3}{5} + 4 \cdot \frac{3}{7} \right|_1 = \left| \frac{123}{35} \right|_1 = \left| 3\tfrac{18}{35} \right|_1 = \frac{18}{35},$$

then

$$X = \frac{18}{35} \cdot 35 = 18.$$

Thus $X = (3, 4)$ is an even number.
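Example 3 can be traced with exact rational arithmetic (a sketch only: a practical implementation of the approximate method uses truncated fixed-point constants $k_i$, not exact fractions):

```python
from fractions import Fraction

# Approximate method (4): X/P = | sum_i x_i * k_i |_1, where
# k_i = |P_i^{-1}|_{p_i} / p_i and |.|_1 discards the integer part.
def am_recover(x, moduli):
    P = 1
    for p in moduli:
        P *= p
    s = Fraction(0)
    for xi, p in zip(x, moduli):
        Pi = P // p
        s += xi * Fraction(pow(Pi, -1, p), p)   # x_i * k_i
    frac = s - (s.numerator // s.denominator)   # keep only the fractional part
    return int(frac * P)

X = am_recover((3, 4), (5, 7))
print(X, "even" if X % 2 == 0 else "odd")  # 18 even
```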
In [18], a combined method for conversion from RNS to the binary number system, the Mixed-Radix Chinese Remainder Theorem (MR CRT), is proposed, which combines the merits of the CRT and Mixed-Radix Conversion (MRC) [19] methods. According to MR CRT, the recovery of the integer value of $X$ is performed using the weights

$$W_1 = 1, \quad W_2 = p_1, \quad W_3 = p_1 p_2, \ \ldots, \ W_n = p_1 p_2 \cdots p_{n-1}, \tag{6}$$
and the mixed-radix digits

$$\bar{x}_1 = x_1, \quad \bar{x}_2 = |\tau_1 x_1 + \tau_2 x_2|_{p_2}, \quad \bar{x}_3 = |(\tau_1 x_1 + \tau_2 x_2 + \tau_3 x_3)/p_2|_{p_3}, \ \ldots, \ \bar{x}_n = |(\tau_1 x_1 + \tau_2 x_2 + \cdots + \tau_n x_n)/(p_2 p_3 \cdots p_{n-1})|_{p_n}. \tag{7}$$
$$W_1 = 1; \quad W_2 = 5,$$

$$\tau_1 = \frac{7 \cdot 3 - 1}{5} = 4; \quad \tau_2 = \frac{5 \cdot 3}{5} = 3.$$
Next, let us calculate the digits of the mixed representation
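Under our reading of (6) and (7), and assuming the standard mixed-radix recovery $X = \sum_i W_i \bar{x}_i$ (the recovery formula itself did not survive extraction here), the digits and the recovered value for this example can be sketched as:

```python
# MR CRT for the two-moduli example: weights W = (1, p1) per (6), and
# constants tau = (4, 3) as computed above for the moduli (5, 7).
def mrcrt_recover(x, moduli, tau):
    p1, p2 = moduli
    xb1 = x[0]                                   # x̄1 = x1, eq. (7)
    xb2 = (tau[0] * x[0] + tau[1] * x[1]) % p2   # x̄2 = |tau1*x1 + tau2*x2|_p2
    return 1 * xb1 + p1 * xb2                    # assumed recovery X = W1*x̄1 + W2*x̄2

print(mrcrt_recover((3, 4), (5, 7), (4, 3)))  # 18
```

For $X = (3, 4)$: $\bar{x}_1 = 3$, $\bar{x}_2 = |4 \cdot 3 + 3 \cdot 4|_7 = 3$, and $X = 3 + 5 \cdot 3 = 18$, matching the CRT and AM recoveries above.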
$$C(X) = C_X = \sum_{i=1}^{n} w_i \left\lfloor \frac{X}{p_i} \right\rfloor. \tag{9}$$

$$C(P) = C_P = \sum_{i=1}^{n} w_i \left\lfloor \frac{P}{p_i} \right\rfloor = \sum_{i=1}^{n} w_i P_i. \tag{10}$$
Since Equation (9) is unsuitable for use in practice, let us introduce the following way of calculating the core function:

$$C(X) = \left| \sum_{i=1}^{n} C(B_i) \cdot x_i \right|_{C(P)}. \tag{12}$$
The weights $w_i$ are independent of the number and are chosen together with the system. An algorithm for determining the optimal weights for the Akushsky core function is presented in [21].
Let

$$X = (x_1, x_2, \ldots, x_n), \quad Y = (y_1, y_2, \ldots, y_n),$$

$$C_X = \sum_{i=1}^{n} w_i \left\lfloor \frac{X}{p_i} \right\rfloor, \quad C_Y = \sum_{i=1}^{n} w_i \left\lfloor \frac{Y}{p_i} \right\rfloor.$$
And let

$$X + Y = (\delta_1, \delta_2, \ldots, \delta_n),$$

where

$$\delta_i = |x_i + y_i|_{p_i} = x_i + y_i - \varepsilon_i p_i, \tag{13}$$
thus we have

$$\varepsilon_i = \begin{cases} 0 & \text{if } x_i + y_i < p_i, \\ 1 & \text{if } x_i + y_i \ge p_i. \end{cases} \tag{14}$$

Then

$$C_{X+Y} = w_1 \left\lfloor \frac{X+Y}{p_1} \right\rfloor + w_2 \left\lfloor \frac{X+Y}{p_2} \right\rfloor + \cdots + w_n \left\lfloor \frac{X+Y}{p_n} \right\rfloor.$$

Since $\lfloor N / p_i \rfloor = (N - |N|_{p_i}) / p_i$,

$$\frac{X + Y - \delta_i}{p_i} - \frac{X - x_i}{p_i} - \frac{Y - y_i}{p_i} = \frac{x_i + y_i - \delta_i}{p_i} = \frac{x_i + y_i - x_i - y_i + \varepsilon_i p_i}{p_i} = \varepsilon_i.$$
Hence,

$$C_{X+Y} = C_X + C_Y + \sum_{i=1}^{n} w_i \varepsilon_i. \tag{15}$$

In particular, for $Y = X$,

$$C_{2X} = 2 C_X + \sum_{i=1}^{n} w_i \varepsilon_i.$$
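Identity (15) is easy to check numerically; a small sketch, where the moduli and weights are those of the running example and `core` implements (9) directly:

```python
# Numeric check of (15): C_{X+Y} = C_X + C_Y + sum_i w_i * eps_i.
moduli, weights = (5, 7), (0, 1)

def core(n):
    # Akushsky core (9): C(N) = sum_i w_i * floor(N / p_i)
    return sum(w * (n // p) for w, p in zip(weights, moduli))

X, Y = 18, 7
# eps_i = 1 exactly when adding the i-th residues overflows p_i, eq. (14)
eps = [1 if (X % p) + (Y % p) >= p else 0 for p in moduli]
lhs = core(X + Y)
rhs = core(X) + core(Y) + sum(w * e for w, e in zip(weights, eps))
print(lhs, rhs)  # 3 3
```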
Here $\varepsilon_i = 0$ if $2 x_i < p_i$ and $\varepsilon_i = 1$ if $2 x_i \ge p_i$. Suppose that all $p_i$ are odd numbers. Let the number $X$ be even; then $C_X = 2 C_{X/2} + \sum_{i=1}^{n} w_i \varepsilon_i$, whence

$$C_{X/2} = \frac{C_X - \sum_{i=1}^{n} w_i \varepsilon_i}{2}.$$
Now, let

$$\frac{X}{2} = (\beta_1, \beta_2, \ldots, \beta_n).$$

Then $x_i = 2 \beta_i - \varepsilon_i p_i$. But since $p_i$ is an odd number, $\varepsilon_i = 1$ if $x_i$ is odd, and $\varepsilon_i = 0$ if $x_i$ is even. Let $\psi_i$ denote the parity of $x_i$, i.e. $\psi_i = 0$ if $x_i$ is even and $\psi_i = 1$ if $x_i$ is odd.

Let us introduce the parity function of the number

$$E(X) = \sum_{i=1}^{n} \psi_i w_i. \tag{16}$$
Consider an example.
Example 5 (Parity detection using the Akushsky core function) Continuing to work with the system of moduli $\{5, 7\}$ and the number $X = (3, 4)$, let the weights of the system be $w_1 = 0$, $w_2 = 1$. Then

$$C(P) = 0 \cdot 7 + 1 \cdot 5 = 5,$$

$$B_1 = 7 \cdot 3 = 21, \quad B_2 = 5 \cdot 3 = 15,$$

$$C(B_1) = 0 \cdot \left\lfloor \frac{21}{5} \right\rfloor + 1 \cdot \left\lfloor \frac{21}{7} \right\rfloor = 3, \quad C(B_2) = 0 \cdot \left\lfloor \frac{15}{5} \right\rfloor + 1 \cdot \left\lfloor \frac{15}{7} \right\rfloor = 2,$$

$$C(X) = |3 \cdot 3 + 2 \cdot 4|_5 = 2.$$

For the given number $X$, the values $\psi_i$ and the parity function $E(X)$ are equal to

$$\psi_1 = 1, \quad \psi_2 = 0, \quad E(X) = 1 \cdot 0 + 0 \cdot 1 = 0.$$

Hence, $C(X)$ and $E(X)$ have the same parity, so $X = (3, 4)$ is even.
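Example 5 can be assembled end-to-end as follows. This is a sketch under our reading of the paper: $B_i = P_i |P_i^{-1}|_{p_i}$ as in the example (the defining equation did not survive extraction here), and $X$ is declared even when $C(X)$ and $E(X)$ have the same parity; the weights are inputs, with optimal choices discussed in [21].

```python
# Parity detection via the Akushsky core function, for odd moduli.
def acf_is_even(x, moduli, weights):
    P = 1
    for p in moduli:
        P *= p
    # Core of an integer N, eq. (9): C(N) = sum_i w_i * floor(N / p_i)
    core = lambda n: sum(w * (n // p) for w, p in zip(weights, moduli))
    CP = core(P)                                  # C(P), eq. (10)
    # C(X) from the residues alone, eq. (12), with B_i = P_i * |P_i^{-1}|_{p_i}
    CX = 0
    for xi, p in zip(x, moduli):
        Pi = P // p
        Bi = Pi * pow(Pi, -1, p)
        CX += core(Bi) * xi
    CX %= CP
    # Parity function E(X), eq. (16): psi_i is the parity of x_i
    EX = sum((xi % 2) * w for xi, w in zip(x, weights))
    return (CX - EX) % 2 == 0                     # same parity -> X is even

print(acf_is_even((3, 4), (5, 7), (0, 1)))  # True: X = 18 is even
```

For this moduli set and weight choice, $C(X)$ computed modulo $C(P)$ coincides with the true core value for every $X$ in the dynamic range, so the test is exact; other weight choices require the monotonicity conditions studied in [9, 21].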
5 Performance Evaluation
For $n = 10$, the moduli set $\{257, 263, 269, 271, 277, 281, 283, 293, 307, 337\}$ was used.
Table 4 shows that the algorithm based on the Akushsky core function is on average 19.16% faster than the CRT method. According to the results of stage B of the study, the algorithm based on the Akushsky core function is on average 21.62% faster than the CRT method.
It is worth noting that when using the Akushsky core function we do not have
to recover a number, which means that the algorithm can be effectively used within
RNS along with other operations related to the core function.
6 Conclusion
This work was aimed at investigating the Akushsky core function for parity detection of numbers in RNS. The proposed method reduces the time on average by
20.39% compared to the algorithm based on the Chinese Remainder Theorem. These
results will be useful in applying the algorithm to division and error correction in
RNS.
In view of these results, further research will be directed towards the implemen-
tation of this algorithm in fog computing working in the RNS.
Acknowledgements The research was supported by the Russian Science Foundation Grant No.
22-71-10046, https://rscf.ru/en/project/22-71-10046/.
References
1. Guo Z, Gao Z, Mei H, Zhao M, Yang J (2019) Design and optimization for storage mechanism of the public blockchain based on redundant residual number system. IEEE Access 7:98546–98554. https://doi.org/10.1109/ACCESS.2019.2930125
2. Al Badawi A, Polyakov Y, Aung KM, Veeravalli B, Rohloff K (2021) Implementation and
performance evaluation of RNS variants of the BFV homomorphic encryption scheme. IEEE
Trans Emerg Top Comput 9(2):941–956. https://doi.org/10.1109/TETC.2019.2902799
3. Isupov K (2020) Using floating-point intervals for non-modular computations in residue num-
ber system. IEEE Access 8:58603–58619. https://doi.org/10.1109/ACCESS.2020.2982365
4. Omondi AR, Premkumar AB (2007) Residue number systems: theory and implementation.
World Scientific
5. Chervyakov N, Babenko M, Tchernykh A, Kucherov N, Miranda-Lopez V, Cortes-Mendoza
JM (2019) AR-RRNS: Configurable reliable distributed data storage systems for Internet of
Things to ensure security. Future Gener Comput Syst 92:1080–1092. https://doi.org/10.1016/
j.future.2017.09.061
6. Valueva MV, Nagornov NN, Lyakhov PA, Valuev GV, Chervyakov NI (2020) Application
of the residue number system to reduce hardware costs of the convolutional neural network
implementation. Math Comput Simul 177:232–243. https://doi.org/10.1016/j.matcom.2020.
04.031
7. Lutsenko VV, Babenko MG, Tchernykh AN, Lapina MA (2024) Optimization of a number
division algorithm in the residue number system based on the Akushsky core function. In:
Proceedings of the institute for system programming of the RAS (Proceedings of ISP RAS),
vol 35, no 5, pp 157–168. https://doi.org/10.15514/ISPRAS-2022-35(5)-11
8. Sousa L, Martins P (2017) Sign detection and number comparison on RNS 3-moduli sets $\{2^n - 1, 2^{n+x}, 2^n + 1\}$. Circuits Syst Signal Process 36(3):1224–1246. https://doi.org/10.1007/s00034-016-0354-z
9. Babenko M, Piestrak SJ, Chervyakov N, Deryabin M (2021) The study of monotonic core
functions and their use to build RNS number comparators. Electronics 10(9), Art. no. 9. https://
doi.org/10.3390/electronics10091041
10. Lutsenko V, Bezuglova E (2023) An efficient implementation of the Montgomery algorithm using the Akushsky core function. In: International workshop on advanced information security management and applications. Springer, Cham, in press
11. Burgess N (2003) Scaling an RNS number using the core function. In: 16th IEEE symposium on computer arithmetic (ARITH-16), Santiago de Compostela, Spain. IEEE Comput Soc, pp 262–269. https://doi.org/10.1109/ARITH.2003.1207687
12. Lu M, Chiang J-S (1992) A novel division algorithm for the residue number system. IEEE
Trans Comput 41(8):1026–1032. https://doi.org/10.1109/12.156545
13. Armand A, Timarchi S, Mahdavi H (2019) Optimized parity-based error detection and correc-
tion methods for residue number system. J Circuits Syst Comput 28(01):1950002. https://doi.
org/10.1142/S0218126619500026
14. Shoup V (2009) A computational introduction to number theory and algebra. Cambridge
University Press
15. Goh VT, Siddiqi MU (2008) Multiple error detection and correction based on redundant residue
number systems. IEEE Trans Commun 56(3):325–330. https://doi.org/10.1109/TCOMM.
2008.050401
16. Chervyakov NI, Molahosseini AS, Lyakhov PA, Babenko MG, Deryabin MA (2017) Residue-
to-binary conversion for general moduli sets based on approximate Chinese remainder theorem.
Int J Comput Math 94(9):1833–1849. https://doi.org/10.1080/00207160.2016.1247439
17. Soderstrand M, Vernia C, Chang J-H (1983) An improved residue number system digital-
to-analog converter. IEEE Trans Circuits Syst 30(12):903–907. https://doi.org/10.1109/TCS.
1983.1085311
18. Bi S, Gross WJ (2008) The mixed-radix Chinese remainder theorem and its applications to residue comparison. IEEE Trans Comput 57(12):1624–1632. https://doi.org/10.1109/TC.2008.126
19. Szabo NS, Tanaka RI (1967) Residue arithmetic and its application to computer technology.
McGraw-Hill
20. Akushsky IY, Burtsev VM, Pak IT (1977) Calculation of the positional characteristic (core) of
the non-positional code. In: Theory of coding and optimization of complex systems, pp 17–25
21. Shiriaev E, Kucherov N, Babenko M, Lutsenko V, Al-Galda S (2023) Algorithm for determin-
ing the optimal weights for the Akushsky core function with an approximate rank. Appl Sci
13(18):10495
A Review: 5G Unleashed Pioneering
Leadership, Global Deployment,
and Future International Policies
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 505
S. Kumar et al. (eds.), Proceedings of International Conference on Communication and
Computational Technologies, Lecture Notes in Networks and Systems 1121,
https://doi.org/10.1007/978-981-97-7423-4_39
506 N. KrishanVyas et al.
1 Introduction
A mere few years have transpired since the inaugural deployment of the fifth-
generation (5G) mobile network in 2019. Evolving beyond its initial role of
connecting individuals, 5G has swiftly advanced to facilitate connectivity among
machines and objects, solidifying its status as a burgeoning global standard. While
the ongoing deployment and adoption of 5G persist, the landscape of mobile
network technology undergoes continuous refinement, with enhancements to 5G
underway and the inception of the sixth generation (6G) already in progress [1]. It is
crucial to recognize that 5G represents more than a mere incremental improvement
over previous mobile technologies, introducing novel features that promise radical
transformations. Beyond augmented speed and reduced latency in data transmis-
sion, 5G brings forth capabilities fostering enhanced machine-to-machine interac-
tions, even without direct human agency, and accentuates the significance of edge
computing [2]. Consequently, 5G’s influence is already accelerating digitalization
across diverse sectors, impacting areas such as the Internet of Things (IoT) and laying
the groundwork for the anticipated rise of the Metaverse [3].
Recent data reveals a burgeoning ecosystem of connected IoT devices, surpassing
11 billion by the close of 2021 and projected to reach nearly 20 billion by 2025 [4].
The economic ramifications of 5G and its subsequent developments are poised to be
monumental, with estimates suggesting revenue growth of up to $13.2 trillion and the
creation of as many as 22 million jobs by 2035 [5]. The transformative potential of 5G
extends far beyond the telecommunications industry, promising substantial impacts
on various sectors, albeit the specific industries most affected remain uncertain. While
certain sectors are already witnessing transformative shifts with the initial adoption of
5G, others are expected to unveil their transformative potential as utilization expands,
ultimately enveloping a broader array of industries than initially anticipated [6].
The transformative impact of 5G technology is already underway, notably within
the gaming industry and the burgeoning exploration of the Metaverse. While these
applications are gaining traction, additional use cases and opportunities are poised to
emerge, particularly in sectors where innovation deployment is robust. These encom-
pass a broad spectrum of applications, spanning vehicle automation, smart homes,
smart agriculture, industrial manufacturing, health care, and logistics automation [7].
The Metaverse, a concept gaining recognition, is structured into seven layers, with
the foundational layer reliant on robust technological infrastructure [8]. Connectivity
technologies, including telecommunications networks, such as fiber, 5G, and antic-
ipated 6G, are pivotal in this foundational layer, alongside cloud systems and semi-
conductors [9]. Crucially, the immersive experience within the Metaverse hinges on
addressing latency issues, wherein 5G and subsequent mobile technology evolutions
prove instrumental by offering the necessary speed and latency, particularly on mobile
devices. Given the diverse applications and anticipated significance across various
fields such as IoT, Metaverses, AI, cloud, and edge computing, 5G and its subse-
quent evolutions could potentially evolve into a general-purpose technology (GPT)
[10]. Similar to the revolutionary effects of personal computers and the Internet,
GPTs have historically proven disruptive because of their ubiquity, ability to support
innovation processes, and quick evolution. The adoption of 5G as a general-purpose technology (GPT) is largely contingent on its acceptance as the next global standard for mobile communication between humans and machines [11]. Institutions
that create standards will enable and encourage the global adoption of 5G [12].
The ongoing debate surrounding 5G extends beyond technological considerations
and delves into two main issues: industrial and/or geopolitical leadership in techno-
logical innovation and the challenges associated with global deployment. In light
of these worries, an important question about how a new 5G-focused EU indus-
trial policy may close the current disparity in technological leadership and adoption
emerges, underscoring the complex nature of the conversation about the direction
that 5G technology will take in the future.
2 Art of 5G Technology
Table 1 (continued)

Contribution: Parcu et al. [10, 45]. Data: patents from the USPTO and EPO, by country, for the years 2010–2019. Identification strategy: co-occurrence and literature analysis are used to find the technologies and keywords used in patent selection. Method: the definition of technological complexity given by Hausmann and Hidalgo. Main results: the United States leads, followed by China, Japan, Korea, and Europe; however, if one looks at regions rather than specific nations, Europe is close to Asia, and neither continent is really far from America.

A second contribution draws on patents analyzed by IPlytics using ETSI data, categorized by company and country, up until December 2019. Identification strategy: analyzed according to technical specifications adhering to 3GPP, or project descriptions related to "5G" or "new radio". Method: various metrics concerning patent families, including counts, normalized counts, and forward citations. Main results: leading Chinese and Korean businesses are closely followed by US and European businesses; however, US corporations come out on top when the weighting is based on the nations where the patents are submitted. When it comes to forward citations, the results are more varied.
and ensure the EU’s active and influential role in shaping the trajectory of emerging
telecommunications technologies on a global scale.
While positioning the European Union at the forefront of 5G technologies is
unquestionably imperative, developing a robust standardization system concurrently
presents a global opportunity for the widespread utilization and implementation of
5G. However, what emerges as equally critical and urgent for the EU in the coming
years is the expeditious deployment of new networks in tandem with the technolog-
ical evolution. The forthcoming section will discuss the deployment of 5G networks
in Europe, revealing that the region lags. The challenges stem from the complex-
ities associated with investments in what appears to be a considerably fragmented
landscape when compared to other areas across the globe [24]. Overcoming these
challenges and accelerating the deployment of 5G networks synchronized with tech-
nological advancements is paramount for the EU to fully capitalize on the transfor-
mative potential of 5G and maintain its competitive stance in the evolving global
telecommunications landscape.
3 International Deployment of 5G
minimum speed and maximum capacity) and fundamental information. For instance,
coverage details for major roads and railways are currently reported only by Finland.
Consequently, the declared 72% coverage for the entire EU does not inherently
guarantee a specific level of service quality. The European Commission has taken
a forward-looking approach by developing a common monitoring mechanism to
address this issue. This mechanism, outlined within the context of its 2030 Policy
Programme “Path to the Digital Decade,” aims to standardize reporting and offer
a more consistent and comprehensive evaluation of 5G deployment, addressing
concerns related to service quality and coverage uniformity across member states.
A crucial aspect in evaluating the performance of 5G deployment is the distinction
among its various frequency bands, including low-band, mid-band, and high-band
frequencies. Reaching speeds much faster than 4G is a prerequisite for 5G to reach
its full potential, and higher frequency bands like 26 GHz (high-band) and 3.4–
3.8 GHz (mid-band) are the only ones where this is possible. Sub-1 GHz bands, like
700 MHz, are essential for covering large regions and interior spaces, but they may
yield slower download speeds than 5G in the 3.6 GHz spectrum. On the other hand,
the 26 GHz band provides 5G at fast speeds, but because of its restricted propagation
characteristics, it works best in areas with extremely high densities. As a result, the
3.6 GHz frequency is believed to be essential for providing customers with 5G that
balances coverage and speed concerns [29].
The most often assigned frequency for 5G deployment in the European Union
is currently the 3.6 GHz spectrum band. The 25 out of 27 Member States that have finished assigning this band account for almost 84% of the available spectrum.
As the 5G Observatory Report highlights, Estonia and Lithuania have made signif-
icant strides in this direction, demonstrating improvements over the prior period.
The 26 GHz band, in contrast, presents a different situation: as a result of what is thought to be a lack of demand (as stated by Plum Consulting, 2021), just 8 countries have assigned it. The allocations in these countries account for less than thirty percent of the EU's available spectrum in the 26 GHz band. One major problem
for the 26 GHz band has been recognized as the fragmented approach followed by
EU members in spectrum assignment, exemplified by discrepancies between deci-
sions made in countries like Germany and Italy, as indicated in the 5G [29]. This
fragmentation raises concerns about the harmonization and efficient utilization of
spectrum resources across the EU, impacting the potential for consistent and optimal
5G deployment.
Observers frequently highlight the issue of very high prices as a significant chal-
lenge in spectrum auctions within the European Union [30]. The high auction costs
are a barrier to entry for new players in the market and, more importantly, as many
global mobile network operators (MNOs) have noted, lower the amount of resources
available for implementing 5G. As a result, there may be delays in the network’s
actual implementation. However, typical European prices, especially for the mid-band spectrum, do not appear much higher than those in the US and Canada.
Rather, the main cause for concern is the wide range of pricing found in EU member
states. For example, Italian operators pay eight times more per megahertz compared
to their Finnish counterparts [31]. This fragmented approach to spectrum policies
within the EU raises clear challenges for achieving a cohesive and uniform European
deployment, further emphasizing the need for greater spectrum pricing strategies
harmonized across member states [32].
Positively, the recently proposed Gigabit Infrastructure Act is expected to expedite and simplify the deployment of networks throughout the European Union. The Act seeks to accomplish this by lowering administrative costs and burdens, streamlining permitting procedures, enabling shared infrastructure use, and promoting the construction of fiber networks. The expected
advantages include cutting related expenses in addition to quickening the deployment
of networks. The European Commission has also unveiled a Gigabit Recommenda-
tion in tandem with this legislative effort. This forthcoming guidance is designed to
assist National Regulatory Authorities in leveraging their available tools to incen-
tivize and drive faster deployment of high-speed networks across the region. These
initiatives collectively underscore a commitment to fostering a more efficient and
robust digital infrastructure landscape within the European Union.
As of 2021, the diffusion of 5G technology in Europe, as per GSMA data,
accounted for only 4% of the market, with 4G still maintaining dominance at 75%
[25]. Additionally, 3G and 2G technologies held respective shares of 15% and 6%.
Intriguingly, GSMA’s forecast anticipates a significant shift, with 5G expected to
represent a quarter of total global mobile connections by 2025, surpassing three times
the figure recorded in 2021. This ongoing “tech migration” is marked by a decline in
4G adoption, particularly in developing markets like Sub-Saharan Africa, where 4G
is projected to continue growing. The pace of this shift varies, with pioneer markets
like China, South Korea, and the US seeing the biggest adoption of 5G. Another
important variable impacting the adoption trajectory of 5G is the implementation of
the architecture, which differentiates between standalone (SA) and non-standalone (NSA) setups. The former, SA 5G, is now being introduced internationally and is necessary to utilize 5G fully: it supports applications based on enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (URLLC), and various Internet of Things use cases.
While European operators launched most 5G commercial offers in 2020, data
gathered from the 5G Observatory shows that standalone launches did not acquire
considerable traction until 2022, accounting for 38% of the total (see Fig. 1). These
independent 5G commercial launches are still centered in a small number of nations,
with Germany prominently leading the way ahead of the EU (see Fig. 2). ETNO [31]
reports that with 15 active independent 5G services, the Asia–Pacific region leads
the world in this regard [31]. Europe has four active standalone services, indicating a
modest increase from the previous year, and North America follows closely with three
services. The trend suggests a gradual but increasing adoption of standalone 5G technology in Europe, with potential for further expansion as more operators announce plans for standalone launches in the coming years, as 5G evolves toward a general-purpose technology (GPT). The 5G ecosystem's growth and complexity pose challenges for businesses and policymakers. The increasing number of interdependent players, each with potentially conflicting or, at the very least, incompletely aligned interests, causes coordination costs to rise.
Fig. 1 Count of commercial launches for both NSA (Non-Standalone) and SA (Standalone)
versions of 5G annually in the European union
Fig. 2 Count of commercial launches for both NSA (Non-Standalone) and SA (Standalone)
versions of 5G per country in the European union
These actors must work together to create a tightly integrated service within a more
distinct value system. The EU and its member states must provide a supportive policy
climate and comparable regulatory frameworks to realize the full benefits of 5G.
These circumstances would make it easier for 5G infrastructures to be developed and
deployed quickly and effectively. The suitability of the EU’s long-standing approach
to competition law and regulatory frameworks in resolving the issues posed by 5G is a crucial question, given that technology improvements and regulatory
policies are frequently path-dependent. If the current approach proves inadequate,
identifying the areas requiring immediate attention becomes imperative to ensure the
recognizing the 5G rollout “as a strategic rather than merely a technological choice”
[36].
From our point of view, it is imperative to give special attention to revitalizing
an EU-wide industrial policy. In examining whether and how such a policy could
be essential in fostering the scale required to strengthen the European Union’s tech-
nological leadership and tackle the issue of underinvestment in 5G deployment,
this section goes deep. Given the complexity of the challenges, a comprehensive
and well-thought-out industrial policy might act as a catalyst, encouraging cooper-
ation, reducing fragmentation, and coordinating initiatives among member states.
By advocating for a cohesive approach, this strategy could foster an atmosphere that
encourages innovation and tackles the financial obstacles impeding the most efficient
implementation of 5G infrastructure. As the global telecommunications landscape
rapidly evolves, the EU’s commitment to a robust and coordinated industrial policy
emerges as a crucial factor in shaping its trajectory toward technological excellence
and leadership in the 5G era.
Globally, the nations leading the way in 5G deployments are those whose aspi-
rations, as outlined in detailed plans, have driven implementation ahead of and
beyond user demand. One prominent instance is South Korea, where the govern-
ment developed a deployment strategy allowing big telecoms to quickly build out
the 5G network while splitting the implementation expenses [30]. Similarly, China
specifically encouraged “national champions” to lead 5G projects, guaranteeing
telecom providers quickly switched to standalone 5G, enabling the extensive use
of IoT applications and breakthroughs in advanced manufacturing [37]. The Chinese
government’s focused guidance and substantial investments in technology research
and development have enabled the domestic industry to capitalize on economies of
scale, effectively shielding it from foreign competitors. This underscores the pivotal
role of government-driven strategies and support in propelling 5G deployment and
technological advancements globally.
The US took a major step forward in January 2021 when it adopted the much-anticipated National Strategy to Secure 5G Implementation Plan, a comprehensive
effort to aid the development and implementation of secure and resilient 5G infras-
tructure. This plan is noteworthy for its clarity since it expands on the Secure 5G and
Beyond Act, which President Trump signed into law in March 2020, and provides
concrete actions across four distinct “lines of effort.” The first, “Facilitate Domestic 5G Rollout,” demonstrates rising support and a bipartisan consensus for developing an industrial policy for 5G
planning. The fundamental idea is that the effective implementation of 5G, when
combined with ongoing innovation, has strategic and national importance [38]. The
timely and efficient implementation of 5G services is contingent upon the acces-
sibility of the spectrum and the creation of an investment-friendly atmosphere—
suboptimal deployment inside the EU results from the highly fragmented structure
of the European market in both respects. The timing of spectrum availability differs across the Member States, and spectrum licensing is expensive, with Germany and Italy being two prominent examples. These disparities increase the cost of
investment and produce an uneven environment throughout the EU. This difference
518 N. KrishanVyas et al.
highlights how crucial it is for the EU to reach a consensus that puts long-term soci-
etal advantages ahead of the short-term maximizing of state revenues. Harmonizing
strategies and fostering cooperation can contribute to realizing a robust and efficient
5G infrastructure throughout Europe.
The upward trajectory of investment costs observed across sectors gains particular significance in the telecommunications markets. Firstly, national fragmentation forces companies to navigate diverse regulations, varied application procedures, and permits across borders, exacerbating
deployment costs. The European telecommunications market, characterized by
intense competition with over 70 network operators, has yielded consumer bene-
fits, such as lower prices and innovative services [39]. However, this fragmentation
has translated into reduced revenues for EU telecom operators, posing challenges
in sustaining the escalating investment costs linked with 5G. Moreover, telecom
investment presents unique challenges involving expensive and protracted capital
expenditure (CAPEX) and cyclical infrastructure upgrades. In an uncertain environ-
ment, investors favor short-term returns and investments that are less sensitive to
demand risk, influencing their decisions on where to allocate funds [40].
The European Electronic Communications Code’s regulatory framework, its pro-investment instruments, and related legislation are examined in this regard. Telecom operators raise concerns about these instruments’ incapacity to unilaterally close the current investment gap and about the difficulty of realizing the benefits of consistent investment in mobile and fixed networks. As a result, telecom operators are pushing the big Internet companies that run Over-the-Top (OTT) services on their networks to contribute a “fair share” to the costs of future network deployments. The European
Commission has held a public consultation on “The future of the electronic commu-
nications sector and its infrastructure” in response to this appeal [41]. Although it
is outside the purview of this essay to delve into the various issues surrounding this
discussion, it is important to note that the EU’s new stance on OTTs’ involvement
in telecom operators’ challenges represents a significant departure from its previous
posture, which pushed operators to “adapt or perish.”
The fragmentation of the EU telecom market has seriously hampered the ability
of EU telecom operators to make investments. The Commission’s merger control
strategy has drawn criticism since it is thought to foster artificial competition by
subtly favoring a certain number of mobile players—ideally four in each national
market. Currently, there are three mobile network operators (MNOs) in thirteen EU
member states, four in thirteen, and five in Italy. The analysis in Table 2 reveals that, since 2007, DG Competition has evaluated eight four-to-three in-country mobile mergers: six were approved, frequently with significant remedies; one was blocked; and one was withdrawn during the review process. The case in Denmark, where parties to a
A Review: 5G Unleashed Pioneering Leadership, Global Deployment … 519
four-to-three merger abandoned the transaction after failing to submit adequate remedies addressing the Commission’s identified competition concerns, exemplifies the
challenges associated with market consolidation in the telecommunications sector.
Historically, mobile mergers in the European telecom sector have predominantly
centered on consolidating within individual countries, subject to thorough exami-
nation by the Commission to address apprehensions about potential anticompetitive
consequences arising from reducing the number of operators. Four-to-three mergers
are typically subject to detailed scrutiny, and their clearance often necessitates in-depth investigations and significant remedies. However, the decision by the EU General Court annulling the Commission’s prohibition of the H3G/Telefónica merger was expected to set a higher standard for blocking such mergers. Acknowledging the
potential consequences of increased concentration, both the EU General Court and
the US FCC and DoJ recognize that heightened consolidation may lead to elevated
prices but could also contribute to enhanced network investment, facilitating the
swift deployment of multiple high-quality 5G networks. Given that more than half
of the EU Member States continue to maintain four Mobile Network Operators
(MNOs), the eventual effects of mergers on the capability of MNOs to invest in
network deployment and innovative services remain uncertain [42]. The recent judg-
ment in Commission v CK Telecoms UK by the Court of Justice, overturning the
General Court’s decision, reaffirms the Commission’s discretion in merger control.
The forthcoming Orange/MasMovil proposed merger in the Spanish telecom market
is poised to be a litmus test, shedding light on whether the Commission is inclined to
embrace a more lenient approach. Such a shift would require a nuanced assessment
of economic evidence, particularly vital for evaluating investment and innovation
incentives, especially in mergers within oligopolistic markets.
Assessing mergers against a pan-European market presents an additional barrier. Though it has been debated, the idea of cross-border consolidation and the establishment of pan-European champions has not materialized significantly, raising concerns about the viability of current laws and about whether telecom operators lack incentives to engage in such transactions. The EU’s potential economies of scale and scope
may have been severely curtailed due to heterogeneous consumer behavior across
nations, differences in infrastructure and spectrum allocations (especially pertinent
in the mobile and 5G industries), and variances in tax and labor rules. According to
EU Commissioner, the EU currently seems to encourage cross-border consolidation.
According to Liberty Global CEO Mike Fries, the request from Thierry Breton
for a “serious discussion about possible existing obstacles.” However, due to the
limited potential cost savings from existing in various regions, cross-border mergers
are improbable. He contends that for cross-border mergers to be an appealing choice,
in-country consolidation must come before them. This calls into question the order
in which markets are being consolidated as well as the true mutual exclusivity of in-
country and cross-border mergers. More research is necessary to fully understand the
trade-offs between different kinds of mergers and the advantages and disadvantages
that affect the choice to pursue one over the other [43].
Additionally, mobile operators might use less extreme tactics, such as making
greater use of network-sharing agreements, to achieve scale and network efficiencies. Compared with four-to-three mergers, such deals may be subject to less stringent competition reviews. Mergers with companies running networks based on
various business models could be an alternate strategy for reaching scale. A rela-
tively new class of telecom companies, wholesale-only network providers, could
soon be involved in mergers that test the EU’s openness to cross-border consolidation.
According to reports, major infrastructure firms are considering consolidating a large
section of the infrastructure used by European telecoms to achieve critical economies
of scale and scope. Since wholesale-only carriers have no incentive to engage in the
anticompetitive foreclosures that vertically integrated telecom providers are known
for, these mergers may be less likely to cause competition issues. Pure wholesale
network operators may attract more investors if they detach their infrastructure assets
from customer-facing businesses. Their utility business model leases infrastructure
to service providers and produces consistent cash flows. If achieving economies of scale and attracting investors is crucial for the timely deployment of 5G, a more proactive policy that explicitly advocates for creating a pan-European mobile market may be required, or a shift in the approach to in-country consolidation may suffice. MNOs could be guaranteed a minimum amount of revenue through policies that
stimulate demand for 5G, such as public agencies committing to acquire a minimum
level of service through anchor tenant agreements. Demand may also be increased by tax laws that favor investment, comply with State aid rules, and support cross-sector investment and co-investment in network deployment [44]. However,
an evaluation of such regulations at the EU level is required to avoid differences
between Member States and further fragmentation of the EU telecom market.
5 Conclusion
In this research, we looked at whether the EU should be more proactive in its approach to 5G industrial policy, given how far it currently lags behind the rest of the world in network deployment and technological innovation. The key question is whether the
rollout of 5G presents a fresh set of opportunities and incentives for companies and
government policymakers alike. An overview of technological leadership in 5G was
provided in the first section, which looked at current studies using empirical methods
based on patents [45]. Building on the previous study’s findings, we discovered that
the dispersion of 5G research and development efforts limits Europe’s capacity to
compete on an equal footing with other developed regions globally. Nonetheless, by
pooling the wealth of rare and advanced technologies currently present in Europe to
research and innovate in the 5G space as a group, the EU and its member states may
be able to boost Europe’s competitiveness in this important area.
Continuing our exploration, the subsequent section delved into the ongoing
deployment of 5G networks in the EU. Inconsistent protocols and metrics make it difficult to evaluate deployment status consistently across member states. Still, a
clear pattern becomes apparent: the EU is developing more slowly than other parts of
the world. We found that underinvestment is one of the main causes of the slow adop-
tion of 5G technology. According to our findings, the same barrier impeding 5G’s
technological advancement might also prevent the network from being deployed
quickly. In particular, the fragmented structure of the European Union’s telecom
sector seems to deter investors from mobilizing the necessary capital to guarantee
the quick rollout of 5G networks to benefit European customers and citizens.
As a possible solution, we support a careful investigation of possible industry
agreements and international mergers that would more fully realize the concept of
a single market, even if the networks involved are mostly national. It is unclear
whether cross-border mergers and in-country consolidation are mutually exclusive
or if some level of in-country consolidation is required before cross-border mergers
may be a feasible alternative, even though the EU publicly supports cross-border
consolidation. In any event, a hypothetical industrial consolidation across the EU could incorporate and profit from the contributions of innovative EU SMEs across the value chain, without necessarily coming at the expense of competition in retail markets.
Addressing the multifaceted challenges of advancing 5G, it becomes evident
that effective public policy must extend beyond an industrial focus, permeating
various realms such as innovation policy, spectrum assignments, economic regula-
tion, competition enforcement, and security and resilience-related policies. Currently
tackled through separate legal instruments, this complex interplay of policies neces-
sitates a holistic approach to support 5G development in Europe. Research exam-
ining the optimal level at which these policies should be implemented—whether at
the national, regional, or EU level—can guide the design of a comprehensive policy
environment conducive to 5G advancement. An ecosystem or value chain perspective
is crucial for addressing deployment challenges, according to the recently released
consultation on the future of the electronic communications sector, which empha-
sizes that all market actors should fairly contribute to the costs of public goods,
services, and infrastructures in line with the broader digital transformation.
In the current geopolitical and economic landscape, characterized by heightened attention to industrial policy following COVID-19 and the war in Ukraine, major
global competitors are actively engaging in industrial interventions related to 5G.
In this context, Europe should actively participate in the conversation. Notably, the
discussion about challenges posed by 5G extends beyond the technology itself, antic-
ipating the advent of 6G. To retain global relevance and competitiveness, the EU must
treat the development and deployment of 5G, and eventually 6G, as genuine Single
Market issues. While the EU’s Digital Single Market Strategy ostensibly places 5G
at its core, the persistent fragmentation of the European telecom market hinders prac-
tical implementation. If, as suggested, the EU’s underperformance in technological
leadership and 5G deployments stems from this fragmentation, remediation can only
be achieved through robust policy actions executed at the European scale.
References
24. Blackman C, Forge S (2019) 5G deployment: state of play in Europe, USA and Asia. European Parliament, Luxembourg
25. GSMA, The mobile economy 2022. https://www.gsma.com/mobileeconomy/wp-content/upl
oads/2022/02/280222-The-Mobile-Economy-2022.pdf
26. European Commission (2021) Communication from the commission to the European parlia-
ment, the council, the European economic and social committee and the committee of the
regions 2030 digital compass: the European way for the digital decade. COM/2021/118 final
27. European Court of Auditors (2022) 5G rollout in the EU: delays in deployment of networks
with security issues remaining unresolved. Special Report 3/2022 https://www.eca.europa.eu/
Lists/ECADocuments/SR22_03/SR_Security-5G-networks_EN.pdf
28. Plum Consulting (2021) Stimulating demand for 26 GHz in Europe, report by Tony Lavender, Val Jervis, Aude Schoentgen, Laura Wilkinson. https://plumconsulting.co.uk/stimulating-demand-
for-26-ghz-in-europe/
29. European 5G Observatory (2022a) 5G observatory quarterly report 17. https://5gobservatory.
eu/wp-content/uploads/2022/10/QR-17-Final-v3-CLEAN.pdf
30. Kuś A, Massaro M (2022) Analysing the C-band spectrum auctions for 5G in Europe: achieving
efficiency and fair decisions in radio spectrum management. Telecommun Policy 46(4), Article
102286
31. ETNO (European Telecommunications Network Operators’ Association) (2023) The state of
digital communications 2023. https://etno.eu/library/reports/112-the-state-of-digital-commun
ications-2023.html
32. https://ec.europa.eu/commission/presscorner/detail/en/SPEECH_23_62
33. Timmers P (2022) Digital industrial policy for Europe, CERRE report
34. Erie MS, Streinz T (2021) The Beijing effect: China’s digital silk road as transnational data governance. New York University J Int Law Polit
35. Hoffman S, Bradshaw S, Taylor E (2020) Networks and geopolitics: how great power rivalries
infected 5G. Oxford Information Labs
36. Kaska K, Beckvard H, Minárik T (2019) Huawei, 5G and China as a security threat, vol 28. NATO Cooperative Cyber Defence Centre of Excellence (CCDCOE)
37. Triolo P (2020) China’s 5G strategy: Be first out of the gate and ready to innovate. In: Kennedy
S (ed) China’s uneven high-tech drive: implications for the United States. Center for Strategic
and International Studies (CSIS), Washington, DC, pp 21–28
38. Brake D (2020) A US national strategy for 5G and future wireless innovation. Inf Technol
Innov Found
39. ETNO (European Telecommunications Network Operators’ Association) (2022) The state of
digital communications 2022. https://etno.eu//downloads/reports/state_of_digi_2022.pdf
40. Williamson B, Howard S (2022) Thinking beyond the WACC: the investment hurdle rate and the seesaw effect
41. https://digital-strategy.ec.europa.eu/en/consultations/future-electronic-communications-sec
tor-and-its-infrastructure
42. https://www.justice.gov/opa/pr/justice-department-settles-t-mobile-and-sprint-their-pro
posed-merger-requiring-package
43. https://www.ft.com/content/ee262b71-4d26-42d9-a25d-6c9b6afc9dfc
44. Deloitte (2021) The open future of radio access networks. https://www2.deloitte.com/content/
dam/Deloitte/pt/Documents/technology-media-telecommunications/TEE/The-Open-Future-
of-Radio-Access-Networks.pdf
45. Parcu PL (2022) Policy options for 5G success in the EU. In: Bohlin E, Cappelletti F (eds)
Europe’s future connected: policies and challenges for 5G and 6G networks. ELF