Lecture Notes in Electrical Engineering

Proceedings of International Conference on Recent Innovations in Computing
ICRIC 2022, Volume 2

Lecture Notes in Electrical Engineering
Volume 1011
Series Editors
Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli
Federico II, Napoli, Italy
Marco Arteaga, Departamento de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico
Bijaya Ketan Panigrahi, Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi,
Delhi, India
Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany
Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China
Shanben Chen, School of Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore,
Singapore, Singapore
Rüdiger Dillmann, Humanoids and Intelligent Systems Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany
Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China
Gianluigi Ferrari, Dipartimento di Ingegneria dell’Informazione Palazzina 2, Università degli Studi di Parma, Parma,
Italy
Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid,
Madrid, Spain
Sandra Hirche, Department of Electrical Engineering and Information Systems, Technische Universität München,
München, Germany
Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA
Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Janusz Kacprzyk, Intelligent Systems Laboratory, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Alaa Khamis, Department of Mechatronics Engineering, German University in Egypt El Tagamoa El Khames,
New Cairo City, Egypt
Torsten Kroeger, Stanford University, Stanford, CA, USA
Yong Li, College of Electrical and Information Engineering, Hunan University, Changsha, Hunan, China
Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA
Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra,
Barcelona, Spain
Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore
Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany
Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA
Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany
Subhas Mukhopadhyay, School of Engineering and Advanced Technology, Massey University,
Palmerston North, Manawatu-Wanganui, New Zealand
Cun-Zheng Ning, Department of Electrical Engineering, Arizona State University, Tempe, AZ, USA
Toyoaki Nishida, Department of Intelligence Science and Technology, Kyoto University, Kyoto, Japan
Luca Oneto, Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genova,
Genova, Genova, Italy
Federica Pascucci, Department di Ingegneria, Università degli Studi Roma Tre, Roma, Italy
Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
Gan Woon Seng, School of Electrical and Electronic Engineering, Nanyang Technological University,
Singapore, Singapore
Joachim Speidel, Institute of Telecommunications, University of Stuttgart, Stuttgart, Germany
Germano Veiga, FEUP Campus, INESC Porto, Porto, Portugal
Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Haidian District Beijing, China
Walter Zamboni, Department of Computer Engineering, Electrical Engineering and Applied Mathematics,
DIEM—Università degli studi di Salerno, Fisciano, Salerno, Italy
Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the
latest developments in Electrical Engineering—quickly, informally and in high
quality. While original research reported in proceedings and monographs has
traditionally formed the core of LNEE, we also encourage authors to submit books
devoted to supporting student education and professional training in the various
fields and application areas of electrical engineering. The series covers classical and
emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please
contact leontina.dicecco@springer.com.
To submit a proposal or request further information, please contact the Publishing
Editor in your country:
China
Jasmine Dou, Editor (jasmine.dou@springer.com)
India, Japan, Rest of Asia
Swati Meherishi, Editorial Director (Swati.Meherishi@springer.com)
Southeast Asia, Australia, New Zealand
Ramesh Nath Premnath, Editor (ramesh.premnath@springernature.com)
USA, Canada
Michael Luby, Senior Editor (michael.luby@springer.com)
All other Countries
Leontina Di Cecco, Senior Editor (leontina.dicecco@springer.com)
** This series is indexed by EI Compendex and Scopus databases. **
Yashwant Singh · Chaman Verma · Illés Zoltán ·
Jitender Kumar Chhabra · Pradeep Kumar Singh
Editors
Proceedings of International
Conference on Recent
Innovations in Computing
ICRIC 2022, Volume 2
Editors

Yashwant Singh
Central University of Jammu
Jammu, Jammu and Kashmir, India

Chaman Verma
Department of Media and Educational Informatics, Faculty of Informatics
Eötvös Loránd University
Budapest, Hungary

Illés Zoltán
Department of Media and Educational Informatics, Faculty of Informatics
Eötvös Loránd University
Budapest, Hungary

Jitender Kumar Chhabra
Department of Computer Engineering
National Institute of Technology Kurukshetra
Kurukshetra, Haryana, India

Pradeep Kumar Singh
KIET Group of Institutions
Ghaziabad, Uttar Pradesh, India
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
This second volume covers the fifth edition of the international conference on the theme Recent Innovations in Computing (ICRIC-2022), hosted by Eötvös Loránd University (ELTE), Hungary, in association with Knowledge University, Erbil; WSG University in Bydgoszcz, Poland; the Cyber Security Research Lab (CSRL), India; and other academic associates and technical societies from India and abroad. The conference was held from August 13 to 14, 2022. The conference included the following tracks: cybersecurity and cyber-physical systems, Internet of Things, machine learning, deep learning, big data analytics, robotics, cloud computing, computer networks and Internet technologies, artificial intelligence, information security, database and distributed computing, and digital India.
Authors were invited to present their research papers at the second volume of the Fifth International Conference on Recent Innovations in Computing (ICRIC-2022) in six technical tracks. We express our thanks to the authors who submitted their papers to the conference.
The first keynote address was delivered by Dr. Zoltán Illés, Eötvös Loránd University (ELTE), Hungary, who spoke on real-time systems and their role in various applications, including education. The second keynote was delivered by Dr. Kayhan Zrar Ghafoor, Knowledge University, Erbil, who spoke on the importance of ultra-low latency in beyond-5G networks. The third keynote was delivered by Dr. Yashwant Singh, Central University of Jammu, J&K, India, who spoke on the Internet of Things (IoT) and an intelligent vulnerability assessment framework, discussing the importance of IoT and IoVT during his talk.
The organizing committee expresses its sincere thanks to all session chairs—Dr. Chaman Verma, Dr. Amit Sharma, Dr. Ashutosh Sharma, Dr. Mayank Agarwal, Dr. Aruna Malik, Dr. Samayveer Singh, Dr. Yugal Kumar, Dr. Vivek Sehgal, Dr. Anupam Singh, Dr. Rakesh Saini, Dr. Nagesh Kumar, Dr. Vandana, Dr. Kayhan Zrar Ghafoor, Dr. Zdzislaw Polkowski, and Dr. Pljonkin Anton—for extending their help during the technical sessions.
We are also grateful to the rectors, vice rectors, deans, and professors of ELTE, Hungary, for their kind support and advice from time to time. The organizing committee is thankful to all academic associates and the different universities for extending their support during the conference. Many senior researchers and professors across the world also deserve our gratitude for devoting their valuable time to listening to the paper presentations and giving feedback and suggestions. We extend our thanks to Springer, the LNEE series, and the editorial board for believing in us.
Dr. Yashwant Singh is a Professor and Head of the Department of Computer Science
and Information Technology at the Central University of Jammu where he has been
a faculty member since 2017. Prior to this, he was at the Jaypee University of Information Technology for 10 years. Yashwant completed his Ph.D. from Himachal Pradesh University, Shimla, his postgraduate study from Punjab Engineering College, Chandigarh, and his undergraduate studies from SLIET Longowal. His research
interests lie in the area of Internet of Things, Vulnerability Assessment of IoT and
Embedded Devices, Wireless Sensor Networks, Secure and Energy Efficient Routing,
ICS/SCADA Cyber Security, ranging from theory to design to implementation. He
has collaborated actively with researchers in several other disciplines of computer
science, particularly Machine Learning and Electrical Engineering. Yashwant has served
on 30 International Conference and Workshop Program Committees and served as
the General Chair for PDGC-2014, ICRIC-2018, ICRIC-2019, ICRIC-2020, and
ICRIC-2021. He currently serves as coordinator of Kalam Centre for Science and
Technology (KCST), Computational Systems Security Vertical at Central University
of Jammu established by DRDO. Yashwant has published more than 100 Research
Papers in International Journals, International Conferences, and Book Chapters of
repute that are indexed in SCI and Scopus. He has 860 citations, an i10-index of 27 and an h-index of 16. He has research projects worth Rs. 1040.9413 lakhs to his credit from DRDO and Rs. 12.19 lakhs from NCW. He has guided four Ph.D. and 24 M.Tech. students and is currently guiding four Ph.D. and five M.Tech. students. He has made academic visits to eight countries, e.g., the U.K., Germany, Poland, the Czech Republic, Hungary, Slovakia, Austria, and Romania. He is a Visiting Professor at Jan Wyzykowski University, Polkowice, Poland.
Dr. Chaman Verma is an Assistant Professor at the Department of Media and Educa-
tional Informatics, Faculty of Informatics, Eötvös Loránd University, Hungary. He
has completed his Ph.D. in Informatics from the Doctoral School of Informatics, Eötvös Loránd University, Hungary, with the Stipendium Hungaricum scholarship funded by the Tempus Public Foundation, Government of Hungary. During his Ph.D., he won the EFOP Scholarship, co-funded by the European Union Social Fund and the
Dr. Illés Zoltán, Ph.D., Habil. started his higher education studies in Mathematics and Physics at Eötvös Loránd University. He later took up the Computer Science supplementary course, which was newly started at that time. He received a Republic of Hungary scholarship based on his outstanding academic achievements during his university studies. He graduated in 1985, after which he started working at the Department of
Computer Science of Eötvös Loránd University. He completed his Ph.D. dissertation
entitled “Implementation of Real-Time Measurements for High-Energy Ion Radiations” in 2001. In 2004, at the request of Jedlik Publisher, he also wrote a textbook on the C# programming language, which had a second, expanded edition in 2008. In 2007, he was awarded a scholarship by the Slovak Academy of Sciences, under which he spent six months researching and teaching at the Constantine the Philosopher University in Nitra. The NJSZT awarded him the Rezső Tarján Prize in 2016 for the success of the joint work that has been going on ever since. He and his colleagues
also researched the issue of mobile devices and applications in the framework of a
TéT_SK tender won in 2014. Based on their research findings, he launched a pilot
project to support real-time, innovative performance management. The first results of
this research are an integral part of his habilitation dissertation. He has been an invited
speaker at several international conferences and a member of the Amity University
Advisory Board since 2020.
papers in reputed International and National Journals and conferences including more
than 40 publications from IEEE, ACM, Elsevier, and Springer, most of which are
SCI/Scopus indexed. His research interests include Software Metrics, Data Mining, Soft Computing, Machine Learning, Algorithms, and related areas. He is a reviewer for highly reputed journals from IEEE Transactions, ACM Transactions, Elsevier, Wiley, and Springer. He has 1122 Google Scholar citations, an h-index of 16, and an i10-index of 24.
Dr. Pradeep Kumar Singh is currently working as a Professor and Head of the Department of Computer Science at KIET Group of Institutions, Delhi-NCR, Ghaziabad, Uttar Pradesh, India. Dr. Singh is a senior member of the Computer Society of India (CSI), IEEE, and ACM, and a life member. He is an associate editor of the International Journal of Information System Modeling and Design (IJISMD), indexed by Scopus and Web of Science, and also an associate editor of the International Journal of Applied Evolutionary Computation (IJAEC), IGI Global, USA, and of Security and Privacy, Wiley. He has received three sponsored research project grants from the Government of India and the Government of Himachal Pradesh worth Rs 25 lakhs. He has edited a total of 12 books from Springer and Elsevier. He has 1600 Google Scholar citations, an h-index of 20, and an i10-index of 49. His recently published book, Handbook of Wireless Sensor Networks: Issues and Challenges in Current Scenarios from Springer, has reached more than 12,000 downloads in the last few months. Recently, Dr. Singh has been nominated as a section editor for Discover IoT, a Springer journal.
Artificial Intelligence, Machine Learning,
Deep Learning Technologies
A Three-Machine n-Job Flow Shop
Scheduling Problem with Setup
and Machine-Specific Halting Times
Abstract Generally, machines may require servicing for multiple reasons. The most common cause of machine repairs is allowing machines to function uninterruptedly. This can be avoided by introducing mandatory machine halts, which play a crucial role in preventing machine failures. Although preventive maintenance has been extensively addressed in the literature, halting times, which are indeed important in an effective manufacturing process, have received limited attention. Due to its practical importance, this paper considers a three-machine n-job flow shop scheduling problem (FSSP) with transportation time, setup and machine-specific halting times. Here, setup times are separated from the processing times; they occur due to several causes such as assembling, cleaning and evocation. The objective of this problem is to minimize the overall makespan and mean weighted flow time such that halts must be supplied to the machines whenever they operate continuously for specified times. To the best of the authors' knowledge, the present FSSP model has not been addressed before in the literature. Focusing on heuristic or metaheuristic algorithms is inevitable given that the FSSP with three or more machines is NP-hard.
Thus, an efficient heuristic algorithm is developed to tackle this problem. Finally,
the proposed algorithm is demonstrated with a numerical example through which
various performance measures are calculated. The current study is a modest attempt
to look into the implications of machine-specific halting, setup and conveyance times
for production.
1 Introduction
[21], binding method-based heuristic algorithm [22], iterative meta-heuristic [23] and
several other algorithms were developed for solving FSSP and its allied problems.
In FSSP, halting time is crucial for preventing machine failures. To achieve the optimal job sequence and reduce the makespan and mean weighted flow time, relaxation must be provided to
the machines while they are operating constantly. The setup time for each task on each
machine is also important because there may be instances where a machine requires
setup before performing jobs. By understanding the importance of various practical
constraints, this study addresses the FSSP combined with setup time, conveyance
time, the weight of jobs and machine-specific halting times. To the authors' best knowledge, the present model has not been addressed in the literature.
2 Problem Description
2.1 Assumptions
The n-job, three-machine FSSP with setup times and machine-specific halting times is defined as follows. Let R_ij (i = 1, 2, ..., n; j = 1, 2, 3) be the processing time of job i on machine j, and S_ij (i = 1, 2, ..., n; j = 1, 2, 3) be the corresponding setup time. Let h_i and l_i denote the conveyance times from Machine 1 to Machine 2 and from Machine 2 to Machine 3, respectively; conveyance time is the time taken to carry a job from one place to another. Let w_i be the weight of job i, which expresses its relative importance among the jobs. Let CF_j be the cumulative flow time of machine j. Depending on the cumulative flow time, each machine is sent for halting: a machine is halted only when its cumulative flow time is continuous and exceeds a specified time. Let HS_j and HE_j be the halt starting time and halting duration of machine j. The times at which Machine 1, Machine 2 and Machine 3 are allowed to halt are deterministic and are denoted by
HS_1, HS_2 and HS_3, respectively. The halting durations of the machines are represented by HE_1, HE_2 and HE_3, respectively. Here, the idle times of the machines are assumed to be utilized for halting in order to optimize the overall elapsed time. The objective is to find the optimal job sequence such that the total elapsed time is minimum. Due to the NP-hard nature of the present problem, an efficient heuristic algorithm is presented to minimize the total makespan. Since there are no existing studies on the present model, comparative studies are not performed. The outline of the described problem is given in Table 1, which is also shown graphically (see Fig. 1).
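A minimal sketch of this halting rule is given below, read as: once a machine's continuous running time reaches HS_j, a halt of duration HE_j is inserted and the continuity resets. The values follow Table 3; the paper's exact bookkeeping is not reproduced in this excerpt, so treat the function as an illustrative assumption.

```python
# Illustrative halting check: once machine j has run continuously for at
# least HS[j] hours, it halts for HE[j] hours. Values follow Table 3; the
# exact bookkeeping of the paper's algorithm is an assumption here.
HS = {1: 50, 2: 40, 3: 50}   # continuous run time (h) that triggers a halt
HE = {1: 10, 2: 6, 3: 8}     # halting duration (h)

def maybe_halt(j, clock, continuous_run):
    """Advance the clock by a halt if machine j has run long enough."""
    if continuous_run >= HS[j]:
        clock += HE[j]        # machine is unavailable during the halt
        continuous_run = 0    # continuity is broken by the halt
    return clock, continuous_run

print(maybe_halt(1, clock=55, continuous_run=55))   # -> (65, 0)
```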
3 Proposed Methodology
The systematic algorithm for solving the present problem is described as follows:
Step 1: Modify the provided three-machine problem into a two-machine problem by using the following structural conditions:

Min{S_i1 + R_i1} ≥ Max{h_i + S_i2 + R_i2}  or  Min{l_i + S_i3 + R_i3} ≥ Max{h_i + S_i2 + R_i2}    (1)

Min{S_i1 + R_i1 + h_i} ≥ Max{S_i2 + R_i2}  or  Min{l_i + S_i3 + R_i3} ≥ Max{S_i2 + R_i2}    (2)

Min{S_i1 + R_i1 + h_i} ≥ Max{S_i2 + R_i2 + l_i}  or  Min{S_i3 + R_i3} ≥ Max{S_i2 + R_i2 + l_i}    (3)

Min{S_i1 + R_i1} ≥ Max{h_i + S_i2 + R_i2 + l_i}  or  Min{S_i3 + R_i3} ≥ Max{h_i + S_i2 + R_i2 + l_i}    (4)
Step 8: Calculate the performance measures, such as total elapsed time, mean weighted flow time, flow time of each job, flow time of each machine and machine idle time, for the revised initial table.
4 Numerical Example
Consider a three-machine, five-job FSSP with setup times, transportation times and job weights, together with machine-specific halting times. The objective of the given problem is to optimize the makespan and mean weighted flow time. The processing and conveyance times (in hours) and halting times (in hours) are given in Tables 2 and 3:
Step 1: The given problem satisfies structural condition (2), so it can be reduced to the two-machine problem given in Table 4, where M_i = S_i1 + R_i1 + h_i + S_i2 + R_i2 and N_i = S_i2 + R_i2 + l_i + S_i3 + R_i3.
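A minimal sketch of this Step 1 reduction, reproducing the M_i and N_i values of Table 4 from the data of Table 2, is given below. The paper's Steps 2–4, which sequence the reduced two-machine problem and produce the sequence reported later, are not reproduced in this excerpt, so the sketch stops at the reduction.

```python
# Sketch of the Step 1 reduction from Table 2 to Table 4.
# Per-job data: (S_i1, R_i1, h_i, S_i2, R_i2, l_i, S_i3, R_i3, w_i).
jobs = {
    1: (5, 25, 10, 2, 12, 5, 3, 23, 2),
    2: (4, 36, 8, 4, 15, 6, 4, 26, 4),
    3: (3, 14, 5, 3, 10, 4, 2, 19, 3),
    4: (4, 18, 6, 2, 17, 5, 3, 18, 5),
    5: (3, 15, 5, 1, 11, 3, 4, 22, 2),
}

for i, (s1, r1, h, s2, r2, l, s3, r3, w) in jobs.items():
    M = s1 + r1 + h + s2 + r2   # fictitious Machine 1 time
    N = s2 + r2 + l + s3 + r3   # fictitious Machine 2 time
    print(i, M, N, w)           # matches Table 4: e.g. job 1 -> 54, 45, 2
```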
Steps 2–3: Applying Steps 2 and 3 to the revised table, we get Table 5.
Table 2 Numerical example with setup time, conveyance time and weight of jobs

Job (i)   S_i1   R_i1   h_i   S_i2   R_i2   l_i   S_i3   R_i3   w_i
1         5      25     10    2      12     5     3      23     2
2         4      36     8     4      15     6     4      26     4
3         3      14     5     3      10     4     2      19     3
4         4      18     6     2      17     5     3      18     5
5         3      15     5     1      11     3     4      22     2
Table 3 Machine-specific halting times

Machine (j = 1, 2, 3)   Time (in hours) to start halting HS_j   Time (in hours) taken for halting HE_j
Machine 1               50                                      10
Machine 2               40                                      6
Machine 3               50                                      8
Table 4 Two-machine problem obtained by adding setup and conveyance times

Job (i)   M_i   N_i   w_i
1         54    45    2
2         67    55    4
3         35    38    3
4         47    45    5
5         35    41    2
Step 4: The optimal job sequence obtained is 4–3–5–1–2.
Step 5: Using the optimal sequence, the total elapsed time and the halting-time-affected jobs for all the machines are computed and shown in Table 6. The total makespan for the given optimal sequence is 190 h.
Steps 6–7: Identify the halting-time-affected jobs using the cumulative flow time of each machine. The affected jobs are Job 1 for Machine 1 and Job 5 for Machine 3. The flow time of Machine 2 is not continuous, so the processing times for Machine 2 are kept the same as in the original table; the changes are highlighted in bold in Table 7.
Step 8: The summary results for total makespan, weighted mean flow time, flow time of each job, flow time of each machine and idle times of the machines are determined for the reduced problem and shown in Tables 8 and 9. The final solution is demonstrated through a Gantt chart (see Fig. 2).
From Table 6, the total makespan in the absence of halting times is 190 h, whereas Table 8 shows that, due to the effect of halting times, the makespan increases by 10 h to 200 h.
Table 7 Effect on processing times due to halting times

Job (i)   S_i1   R_i1   h_i   S_i2   R_i2   l_i   S_i3   R_i3   w_i
1         5      35     10    2      12     5     3      23     2
2         4      36     8     4      15     6     4      26     4
3         3      14     5     3      10     4     2      19     3
4         4      18     6     2      17     5     3      18     5
5         3      15     5     1      11     3     4      30     2
[Fig. 2 Gantt chart of the final schedule, showing for each job the idle, setup and processing times of Machines 1–3 and the transportation times (M-1 to M-2, M-2 to M-3)]
5 Conclusions
In this paper, we investigated the three-machine n-job FSSP with various constraints such as setup time, transportation time, job weights and machine-specific halting times. The problem is to minimize the overall makespan and mean weighted flow time such that halts must be supplied to the machines whenever they operate continuously for specified times. The present model has not been addressed before in the literature. Due to the NP-hard nature of the problem, an efficient heuristic algorithm is presented to minimize the total makespan. Since there are no existing studies on the considered model, comparative studies are not performed. However, the proposed algorithm is demonstrated with a numerical example through which various performance measures are computed. The current study is a modest attempt to look into the implications of machine-specific halting, setup and conveyance times for production.
References
1. Lian Z, Gu X, Jiao B (2008) A novel particle swarm optimization algorithm for permutation
flow-shop scheduling to minimize makespan. Chaos, Solitons Fractals 35(5):851–861
2. Ren T, Guo M, Lin L, Miao Y (2015) A local search algorithm for the flow shop scheduling
problem with release dates. Discret Dyn Nat Soc 2015:1–8
3. Erseven G, Akgün G, Karakaş A, Yarıkcan G, Yücel Ö, Öner A (2018) An application of
permutation flowshop scheduling problem in quality control processes. In: The international
symposium for production research. Springer, Cham, pp 849–860
4. Khurshid B, Maqsood S, Omair M, Sarkar B, Saad M, Asad U (2021) Fast evolutionary
algorithm for flow shop scheduling problems. IEEE Access 9:44825–44839
5. Ruiz R, Maroto C, Alcaraz J (2006) Two new robust genetic algorithms for the flowshop
scheduling problem. Omega 34(5):461–476
6. Johnson SM (1954) Optimal two-and three-stage production schedules with setup times
included. Naval Res Logist Q 1(1):61–68
7. Campbell HG, Dudek RA, Smith ML (1970) A heuristic algorithm for the n job, m machine
sequencing problem. Manage Sci 16(10):630–637
8. Nawaz M, Enscore E, Ham I (1983) A heuristic algorithm for the m-machine, n-job flow-shop
sequencing problem. Omega 11(1):91–95
9. Ruiz R, Maroto C, Alcaraz J (2005) Solving the flowshop scheduling problem with sequence
dependent setup times using advanced metaheuristics. Eur J Oper Res 165(1):34–54
10. Pandian P, Rajendran P (2010) Solving constrained flow-shop scheduling problems with three
machines. Int J Contemp Math Sci 5(19):921–929
11. Aydilek H, Allahverdi A (2013) A polynomial time heuristic for the two-machine flowshop
scheduling problem with setup times and random processing times. Appl Math Model 37(12–
13):7164–7173
12. Gupta D, Nailwal KK, Sharma S (2014) A heuristic for permutation flowshop scheduling to minimize makespan. In: Proceedings of the third international conference on soft computing for problem solving. Springer, New Delhi, pp 423–432
13. Thangaraj M, Rajendran P (2016) Solving constrained multi-stage machines flow-shop
scheduling problems in fuzzy environment. Int J Appl Eng Res 11(1):521–528
14. Nailwal K, Gupta D, Jeet K (2016) Heuristics for no-wait flow shop scheduling problem. Int J
Ind Eng Comput 7(4):671–680
15. Lee JY, Kim YD (2017) Minimizing total tardiness in a two-machine flowshop scheduling
problem with availability constraint on the first machine. Comput Ind Eng 114:22–30
16. Belabid J, Aqil S, Allali K (2020) Solving permutation flow shop scheduling problem with
sequence-independent setup time. J Appl Math 2020:1–11
17. Missaoui A, Boujelbene Y (2021) An effective iterated greedy algorithm for blocking hybrid
flow shop problem with due date window. RAIRO-Oper Res 55(3):1603–1616
18. Abbaszadeh N, Asadi-Gangraj E, Emami S (2021) Flexible flow shop scheduling problem to
minimize makespan with renewable resources. Sci Iranica 28(3):1853–1870
19. Ribas I, Companys R, Tort-Martorell X (2021) An iterated greedy algorithm for the parallel
blocking flow shop scheduling problem and sequence-dependent setup times. Expert Syst Appl
184:115535
20. Sun J, Zhang G, Lu J, Zhang W (2021) A hybrid many-objective evolutionary algorithm for
flexible job-shop scheduling problem with transportation and setup times. Comput Oper Res
132:105263
21. Shao W, Shao Z, Pi D (2021) Effective constructive heuristics for distributed no-wait flexible
flow shop scheduling problem. Comput Oper Res 136:105482
22. Thangaraj M, Kumar T, Nandan K (2022) A precedence constrained flow shop scheduling
problem with transportation time, breakdown times, and weighted jobs. J Project Manage
7(4):229–240
23. Liang Z, Zhong P, Liu M, Zhang C, Zhang Z (2022) A computational efficient optimization of
flow shop scheduling problems. Sci Rep 12(1):1–16
Clinical Named Entity Recognition Using
U-Net Classification Model
1 Introduction
Electronic medical records (EMRs) contain plentiful health data and vital medical signs supporting clinical decision making and disease surveillance. However, the great amount of unstructured medical text substantially limits knowledge retrieval and the application of EMRs. Automated information extraction techniques are needed to convert unstructured text into simple, machine-readable structured data [1]. As a fundamental step in natural language processing
(NLP), clinical named entity recognition (CNER) has been a well-known research topic aimed at deriving all types of expressive information from unstructured clinical text. The purpose of CNER is to discover and classify medical terminology in the EHR, for example, disease, symptoms, treatment, examination and body part [2]. However, this task is challenging for two reasons. One is the ambiguity of the EHR: the same word or sentence can represent multiple types of named entities, and entities with the same name can belong to different types. The second is the use of non-standard abbreviations or acronyms along with multiple forms of similar entities, owing to the large number of entities that occur rarely or not at all in the training set. The development of the CNER component follows a modular design with a certain processing sequence (Fig. 1). This allows the different modules belonging to the design sequence to be adapted and instantiated independently a number of times.
The component’s input contains clinical terms, with any images and tables present in a document considered beforehand. Here, the major objective is the identification and classification of sensitive clinical data, given its class. The processing chain depicted in Fig. 1 is followed for this. The three main steps are pre-processing, named entity recognition and post-processing, each performed in a different module. Each module solves a particular issue on the way to the output terminology [3]. The last module of the NER component chain is post-processing. This module processes the results obtained from the preceding NER module and returns the output text and discovered entities to the client in the required format. The user can choose to see the result in five different ways, as various forms of output are available for display based on priority.
Machine learning (ML), a form of data-driven artificial intelligence (AI), is able to learn about systems without explicit programming. ML-based algorithms formulate clinical NER as a sequence labeling problem that aims to discover the optimal label sequence (e.g., BIO tags) for a given input sequence (the words of the clinical text). Researchers have implemented several machine learning frameworks, including conditional random fields (CRFs), maximum entropy (ME), etc. Several prominent NER systems implemented the CRF model, which is one of the leading machine learning algorithms.
2 Literature Review
– Xing et al. [5] discussed the various issues in the Chinese clinical domain that make the recognition of clinical named entities (CNEs) challenging. A glyph-based enhanced information model was introduced, in which a convolutional neural network (CNN) and ALBERT were used to pre-train the language model so that an enhanced character information vector was attained. An attention mechanism was added to the BiLSTM-CNN-CRF (bidirectional long short-term memory network-CNN-conditional random field) structure to deal with the existing model's failure to consider the importance and location information of words when extracting attributes. The outcomes on the CCKS2018 dataset demonstrated the superiority of the introduced model over others.
– Li et al. [6] developed the BiLSTM-Att-CRF algorithm, in which a bidirectional long short-term memory (BiLSTM) network was integrated with an attention mechanism to enhance named entity recognition in Chinese EMRs. Five kinds of clinical entities were recognized from CCKS2018. The outcomes indicated that the developed algorithm performed well, yielding an F-score of up to 85.79% without any additional attributes and 86.35% in the presence of extra attributes.
– Qiu et al. [7] designed a residual dilated convolutional neural network with conditional random field (RD-CNN-CRF) algorithm in order to recognize CNEs. Initially, contextual information was captured using dilated convolutions, and residual connections were then used to combine semantic and low-level attributes. In the end, a CRF was implemented as the output interface to attain the optimal tag sequence over the entire sentence. The CCKS-2017 Task 2 dataset was applied in the experimentation. Compared with existing techniques, the designed algorithm obtained a precision of 90.63%, a recall of 92.02% and an F1-score of 91.32%.
– Li et al. [8] applied fine-tuned pre-trained contextual language models to recognize named entities (NEs) in clinical trial eligibility criteria. For this purpose, BioBERT, BlueBERT, PubMedBERT and SciBERT were implemented, and two systems were deployed for the open domain with regard to three clinical trial eligibility criteria corpora. The outcomes revealed that the PubMedBERT system performed more effectively than the others. Moreover, domain-specific transformer-based language models proved more suitable for recognizing NEs in clinical criteria.
– Li et al. [9] presented an approach in which a conditional random field (CRF) was integrated with BiLSTM to recognize and extract NEs in unstructured medical texts. For the asthma domain, 804 drug specifications were derived from the Internet. Thereafter, vectors quantizing the normalized fields of the drug specification words were used as input to the neural network (NN). The presented approach enhanced accuracy by 6.18%, recall by 5.2% and F1 value by 4.87% in contrast to existing algorithms. Additionally, the adaptability of this approach for extracting NE information from drug specifications was demonstrated.
– Yu et al. [10] focused on applying the BioBERT model, based on the Google BERT model, to identify clinical problems, treatments and tests automatically in EMRs. At first, the text was transformed into a numerical vector by pre-training the BioBERT model on a corpus from medical fields. After that, BiLSTM-CRF was adopted to train the processed vectors and accomplish the entity tagging task. The experiments were conducted on the I2B2 2010 dataset. The experimental results showed that the presented approach helped enhance named entity recognition for EMRs, providing an F1-score of up to 87.10%.
– Qiu et al. [11] proposed an RD-CNN-CRF system to recognize Chinese clinical named entities (CNEs). Initially, dense vector representations were employed to project the Chinese characters and dictionary attributes, which were then fed to the RD-CNN to capture contextual features. Eventually, the dependencies among neighboring tags were captured, and the optimal tag sequence for the whole sentence was acquired using a CRF. The CCKS-2017 Task 2 dataset was utilized to evaluate the intended system. The outcomes confirmed its effectiveness in terms of training time.
– Liu et al. [12] constructed a medical dictionary to recognize CNEs. Afterward, a CRF system was utilized to determine the impact of diverse kinds of attributes on the CNE recognition task. The experiments were conducted on 220 clinical texts selected at random from Peking Anzhen Hospital. The experimental results showed that these attributes contribute to CNE recognition in varying degrees.
– Luu et al. [13] investigated a new technique to recognize CNEs on the basis of deep learning (DL). This technique relied on two models, namely a feed-forward network (FFN) and a recurrent neural network (RNN), which assisted in increasing efficiency with respect to diverse parameters such as precision, recall and F-score. The CLEF 2016 Challenge Task 1 dataset was applied to evaluate the investigated technique. The results demonstrated that the technique achieved an F-score of 66% using the RNN, outperforming the other algorithm.
3 Research Methodology
This section of the research work is divided into two parts: the first describes the dataset, and the second describes the methodology used for CER.
The Annotated Corpus for NER, based on the Groningen Meaning Bank (GMB) corpus, is used for entity classification, with enhanced and popular NLP features applied to the dataset.
Essential info about the entity tags:
– geo = Geographical entity
– org = Organization
– per = Person
– gpe = Geopolitical entity
– tim = Time indicator
– art = Artifact
– eve = Event
– nat = Natural phenomenon
Total word count = 1,354,149. Target data column: “tag”. This dataset is of interest because more features were added in its recent version, which also helps create a broad view of feature engineering with respect to this dataset. While earlier versions may not sound so interesting, being able to pick intents and custom named entities from one's own sentences with more features makes the dataset more engaging and helps solve real business problems (Fig. 2).
This research work is based on CER. The CER techniques have three steps: feature extraction, boundary detection and classification (a sketch of the pre-processing involved follows this list).
– Feature extraction: In this phase, the input data is converted into a certain set of patterns. The patterns help classify the data into a certain set of categories. To perform feature extraction, the input data is tokenized, part-of-speech tags are applied, and, in the last step, the context words are extracted.
– Boundary detection: In this step, the noun phrases which are not required are removed. To perform boundary detection, inverse document frequency is applied.
– Entity classification: In this phase, the frequency of the words is checked, and the U-Net approach is applied for classification. U-Net introduces layer-hopping connections that fuse feature maps from the coding phase into the decoding phase, which helps to obtain the details of the segmentation process.
The U-Net method is able to segment the lesion part from a complicated background and still yields high accuracy. The network design has many similarities to that of a convolutional autoencoder, which consists of a contracting path (the “encoder”) and an expansive path (the “decoder”). The skip connection between the encoder and the decoder is the basic characteristic of U-Net that differentiates it from a typical autoencoder. Skip connections are responsible for recovering the spatial information lost in down-sampling, which is important for segmentation operations (Fig. 3).
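The sketch below illustrates the skip-connection idea in a minimal 1D U-Net-style block using Keras. It is illustrative of the architecture described above, not the authors' exact network; the layer sizes, sequence length, and tag count are arbitrary assumptions.

```python
# Minimal 1D U-Net-style encoder/decoder with one skip connection (Keras).
# Sizes are illustrative assumptions, not the paper's configuration.
from tensorflow.keras import layers, Model, Input

inp = Input(shape=(128, 16))                    # sequence length 128, 16 features
enc = layers.Conv1D(32, 3, padding="same", activation="relu")(inp)
down = layers.MaxPooling1D(2)(enc)              # contracting path ("encoder")
mid = layers.Conv1D(64, 3, padding="same", activation="relu")(down)
up = layers.UpSampling1D(2)(mid)                # expansive path ("decoder")
skip = layers.Concatenate()([up, enc])          # skip connection recovers detail
out = layers.Conv1D(9, 1, activation="softmax")(skip)  # e.g. 9 entity tags

model = Model(inp, out)
model.summary()
```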
This research work is based on CER. Entity recognition has various phases, which include pre-processing, boundary detection and classification. The dataset for entity recognition is collected from Kaggle. The pre-processing phase tokenizes the input data; next, boundary detection removes the unneeded nouns from the data. In the last phase, classification, the U-Net architecture is applied to recognize the clinical entities. The performance of the proposed model is analyzed in terms of accuracy, precision and recall.
As shown in Fig. 4, the proposed U-Net model is implemented for CER and used for entity recognition.
As shown in Fig. 5, the performance of the proposed model is compared with the LSTM and LSTM + CNN models. The proposed model achieves accuracy of up to 98%, approximately a 5% improvement in the results, as shown in Table 1.
5 Conclusion
Biomedical entity recognition models are generally derived for entity recognition; the entity recognition model is used to derive entities from the input data. Analysis of previous research shows that deep learning and pattern recognition algorithms are commonly used for entity recognition. It is observed that the latest efficiently designed algorithm is a combination of two deep learning models, which has high complexity. In this research work, a classification method
is designed that combines three steps: feature extraction, boundary detection and classification. The U-Net model is implemented in Python, and the results are analyzed in terms of accuracy, precision and recall. The LSTM + CNN model shows 92.67% accuracy, 91% precision and 91% recall, while the proposed model shows 98% accuracy, 99% precision and 99% recall, approximately a 5% improvement in the entity recognition results.
References
1. Li L, Zhao J, Hou L, Zhai Y, Shi J, Cui F (2019) An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records. In: The second international workshop on health natural language processing (HealthNLP 2019), pp 1–11
2. Bose P, Srinivasan S, Sleeman WC, Palta J, Kapoor R, Ghosh P (2021) A survey on recent
named entity recognition and relationship extraction techniques on clinical texts. Appl Sci
2021:1–30
3. Gong L, Zhang Z, Chen S (2020) Clinical named entity recognition from Chinese electronic
medical records based on deep learning pretraining. J Healthcare Eng 1–8
4. Ghiasvand O, Kate RJ (2015) Biomedical named entity recognition with less supervision. Int
Conf Healthcare Inf 495–495
5. Xing Z, Sun P, Xiaoqun L (2021) Chinese clinical entity recognition based on pre-trained models. In: 2021 international conference on big data analysis and computer science (BDACS)
6. Li L, Hou L (2019) Combined attention mechanism for named entity recognition in Chinese electronic medical records. In: 2019 IEEE international conference on healthcare informatics (ICHI)
7. Qiu J, Wang Q, Zhou Y, Ruan T, Gao J (2018) Fast and accurate recognition of Chinese clinical named entities with residual dilated convolutions. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM)
1 Introduction
Communication has been an integral part of human life ever since the beginning of time. As per the Census of India, 2011, Marathi is the third most frequently spoken language in India and ranks 15th in the world in terms of combined primary
Yash Jhaveri, Akshath Mahajan, Aditya Thaker, and Tanay Gandhi have contributed equally.
2 Literature Survey
In recent years, NMT has made significant progress in improving machine translation quality. Google Translate [9], Bing Translator [10] and Yandex Translator [11] are some of the most popular free online translators, with Google Translate [9] being one of the most widely used destinations for machine translation.
The state-of-the-art approaches for machine translation, which include rule-based machine translation and NMT, have been widely used [12–16]. Rule-based MT primarily maps the structure of given input sentences to the structure of desired output sentences, ensuring that their distinctive meaning is preserved.
Shirsath et al. [12] offer a system to translate simple Marathi phrases to English utilizing a rule-based method and an NMT approach, with a maximum BLEU score of roughly 62.3 on the testing set. Garje et al. [13] use a rule-based approach to develop a system for translating simple assertive and interrogative Marathi utterances into matching English sentences. Due to the lack of a large corpus for
translation, Govilkar et al. [14] used rule-based techniques to translate only the parts of speech of the sentence. Their proposed system uses a morphological analyzer to locate root words and then compares the root word to the corpus to assign an appropriate tag. If a word carries more than one tag, the ambiguity can be eliminated using grammatical rules. Garje et al. [15] present an online parts of speech (POS) tagger and a rule-based system for translating short Marathi utterances to English sentences. Garje et al. [16] primarily focus on the grammar structure of the target language in order to produce better and smoother translations and employ a rule-based approach to translate sentences, primarily for the English–Marathi pair, with a maximum BLEU score of 44.29. Banerjee et al. [17] specifically focus on the case of English–Marathi NMT and enhance parallel corpora with the help of transfer learning to ameliorate the low-resource challenge. Techniques such as phrase table injection (PTI), back-translation and mixing of language corpora are employed to augment parallel data, and pivoting and multilingual embeddings are used to leverage transfer learning.
Jadhav [18] has proposed a system in which a range of neural machine Marathi translators were trained and compared with BERT-tokenizer-trained English translators. The sequence-to-sequence library Fairseq, created by Facebook [19], was used to train the translation model and run inference with it.
Alongside the core NMT model, there has been a quite significant rise in other techniques that can be used together with state-of-the-art NMT models for MT. Vaswani et al. [8] have shown that, compared to the conventional recurrent neural network (RNN)-based techniques proposed by Bahdanau et al. [6], Cho et al. [20] and Sutskever et al. [5], the transformer model provides substantial enhancements in translation quality. Self-attention and the absence of recurrent layers, used alongside state-of-the-art NMT models, enable quicker training and better performance when a huge corpus for translation is absent.
3 Research Gap
Google Translate [9] mainly uses statistical MT models, whose parameters are obtained through the analysis of bilingual text corpora and which can produce poor-quality text translations. Furthermore, the BLEU score of the translations received is 55.1 for sentences of fewer than 15 words and 28.6 for sentences above 15 words.
The rule-based technique employed by [12–16] is now obsolete and is being replaced by transformers, deep learning models that employ the mechanism of self-attention. Furthermore, Shirsath et al. [12] achieved a maximum BLEU score of about 62.3 on the testing set using rule-based techniques, whereas this paper achieves a maximum BLEU score of about 65.29 using the proposed methodology. Govilkar et al. [14] translated only the parts of speech of the sentence using rule-based techniques; to increase that system's performance, extra meaningful rules must be added. Garje et al. [16] have also used rule-based
techniques for translation but achieved a maximum BLEU score of around 49, whereas this paper achieves a maximum BLEU score of about 65.29 using the proposed methodology. Moreover, the problem with rule-based learning lies in dealing with grammar that is hard to codify, which is, on the other hand, eliminated by the approach presented in this paper. Newer techniques such as phrase table injection (PTI), back-translation and mixing of language corpora have been applied by Banerjee et al. [17], yet failed to achieve an adequate BLEU score despite using a huge corpus of around 2.5 lakh sentences. From the results of the proposed system of Jadhav [18], it can be observed that the proposed transformer-based model can outperform Google Translation for sentence lengths up to 15 words but not beyond. This paper, on the other hand, focuses on sentences of more than 15 words in length and tries to model accurate predictions.
4 Methodology
Statistical MT [21] is one of the most widely used techniques, in which conditional probabilities calculated from a bilingual corpus are used to reach the most likely translation. As a baseline model, an SMT model has been employed to convert English sentences to Marathi. This was achieved through a word-based SMT model, trained by calculating the conditional probabilities of Marathi words given an English word, and using them to translate input sequences token by token. Most translation systems are based on this technique but do not achieve precise translations.
In order to tackle this, newer methods like rule-based MT and NMT were introduced, with the most accurate being NMT. This method employs NLP concepts and includes models like sequence-to-sequence, attention and transformers.
Sequence-to-sequence. RNNs [22] are a type of artificial neural network and were one of the first to be used to work with sequential or time series data. RNNs require that each timestep be provided with the current input as well as the output of the previous timestep. Although they store context from past data in the sequence, they are also prone to vanishing and exploding gradient problems. LSTMs were introduced to overcome this problem by maintaining forget, input and output gates within each cell, which control the amount of data that is stored and propagated through the cell.
Sequence-to-sequence (seq2seq) models [23] are a class of encoder–decoder models that are used to convert sentences in one domain into sentences in another domain. This encoder–decoder architecture comprises the encoder block, the decoder block and the context vector.
1. Encoder block: This block consists of a stacked RNN layer, preferably with LSTM cells. The outputs of the encoder block are discarded; the hidden states of the last LSTM cell are used as a context vector and sent to the decoder block.
2. Decoder block: This block has the same architecture as the encoder block. It is trained on a language modeling task in the target language, taking only the states of the encoder block as input (Fig. 1).
The image above describes the architecture of the encoder–decoder model. During the training phase of the decoder, teacher forcing is used, which feeds the model the ground truth instead of the output of the previous states. In the testing phase, a <START> token is provided as input to the first cell of the decoder block to mark the start of a sequence, along with the hidden states of the encoder block. The outputs of this cell are used as input to the next cell to make a prediction for the next word. This procedure continues until the <END> token is generated, which marks the end of the sequence. This token is used so that the model can be assured that the sentence translation procedure has finished.
A single RNN layer consisting of LSTM cells has been used for the encoder block, and a similar architecture for the decoder block. Embedding layers translate the sentences from words to word vectors before they can be used by the encoder. Another embedding layer is used to convert the outputs of the decoder block into words in the target language, after which a softmax function gives a probability distribution over the vocabulary.
Attention. In recent years, NMT problems have found major success using the encoder–decoder framework, which first encodes the source sentence and then generates the translation by selecting tokens from the target vocabulary one at a time [22, 23].
This paradigm, however, fails on long sentences, where the context required to correctly predict the next word might be present at a different position in the sentence and might be forgotten. An attention mechanism is used to refine translation results by focusing on important parts of the source sentences [25] (Fig. 2).
The proposed encoder network consists of three LSTM layers having 500 latent
dimensions. On the other hand, the decoder network first has an LSTM that has
its initial state set to the encoder state. The attention layer is then introduced that
takes the encoder outputs and the outputs from the decoder LSTM. Finally, the
outputs from the decoder LSTM and the attention layer are combined and passed
through a time-distributed dense layer.
The authors have used the “teacher forcing” method to train the network faster. The model was set to train for 40 epochs using the RMSProp optimizer along with sparse categorical cross-entropy loss, but early stopping occurred after just 22 epochs.
The trained weights are then saved, and an inference model is generated using
the encoder and decoder weights to predict and evaluate the translation results.
This is done by adding a fully connected softmax layer after the decoder in order
to generate a probability distribution over the target vocabulary.
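A sketch of this attention-augmented network in Keras is given below, assuming Luong-style dot-product attention via keras.layers.Attention; the paper does not state which attention variant or vocabulary sizes it uses, so those are assumptions.

```python
# Sketch of the attention model: 3-layer LSTM encoder (500 units), LSTM
# decoder initialized with the encoder state, and an attention layer over
# the encoder outputs. The dot-product Attention layer is an assumption.
from tensorflow.keras.layers import (Input, Embedding, LSTM, Dense,
                                     Attention, Concatenate, TimeDistributed)
from tensorflow.keras.models import Model

units, src_vocab, tgt_vocab = 500, 10000, 12000

enc_in = Input(shape=(None,))
x = Embedding(src_vocab, units)(enc_in)
x = LSTM(units, return_sequences=True)(x)
x = LSTM(units, return_sequences=True)(x)
enc_out, h, c = LSTM(units, return_sequences=True, return_state=True)(x)

dec_in = Input(shape=(None,))
y = Embedding(tgt_vocab, units)(dec_in)
dec_out = LSTM(units, return_sequences=True)(y, initial_state=[h, c])

ctx = Attention()([dec_out, enc_out])        # query = decoder, value = encoder
merged = Concatenate()([dec_out, ctx])       # combine decoder and attention outputs
probs = TimeDistributed(Dense(tgt_vocab, activation="softmax"))(merged)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
```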
Transformers. The work by Ashish Vaswani et al. [8] proposes a novel method that avoids recurrence and relies solely on the self-attention mechanism. This new architecture is more precise, parallelizable and faster to train (Fig. 3).
Fig. 3 Transformer architecture [8]
In the transformer model, a stack of six encoders and six decoders is used. The
input data is first embedded before it is passed to the encoder or decoder stacks.
Because the model lacks recurrence and convolution, the authors injected some
information about the relative or absolute positions of the tokens in the sequence
to allow the model to use the sequence’s order. Positional encoding was added to
the input embeddings to achieve this. The positional encodings and embeddings have the same dimension and can therefore be added together.
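The sinusoidal positional encoding of Vaswani et al. [8] referred to here uses the standard formula PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); a minimal NumPy version:

```python
# Sinusoidal positional encoding from Vaswani et al. [8]; added elementwise
# to the input embeddings, which share the same dimension d_model.
import numpy as np

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]                    # (max_len, 1)
    i = np.arange(d_model)[None, :]                      # (1, d_model)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                # even indices: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                # odd indices: cosine
    return pe

print(positional_encoding(50, 512).shape)                # (50, 512)
```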
Each encoder has two sublayers. The embeddings with their positional encodings first pass through the multi-head attention layer and are subsequently supplied to the feed-forward neural network. The self-attention mechanism uses each input vector in three different ways: as the query, the key, and the value. These are transmitted through the self-attention layer, which calculates the self-attention score by taking the dot product of the query and key vectors. To obtain more stable gradients, this is divided by the square root of the dimension of the key vectors and then passed through a softmax function to normalize the scores. Each softmax score is multiplied by the corresponding value vector, and the sum of all weighted value vectors is computed. These scores indicate how much attention should be paid to other parts of the input sequence of words in relation to a certain word. Because the self-attention layer is multi-headed, the word vectors are broken into a predefined number of chunks and transmitted through separate self-attention heads, each attending to distinct aspects of the words. To generate the final matrix, the outputs of all heads are concatenated and multiplied by a learned weight matrix. This is the final output of the self-attention layer, which is added to the embedding and normalized before being sent to the feed-forward neural network.
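The computation just described — scaled dot products, softmax normalization, and a weighted sum of the value vectors — amounts to the following few lines (a NumPy sketch; Q, K, and V are matrices of query, key, and value vectors):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # dot products, scaled for stable gradients
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                            # weighted sum of value vectors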
5 Results
After experimenting with the number of layers in the models and fine-tuning their hyperparameters, the paper compares the resulting translations using the BLEU score and the WER score.
The sacreBLEU score is a metric for assessing the quality of machine translation from one language to another, where quality is characterized as the correspondence between a machine’s output and that of a human. It was created to evaluate text generated by translation systems but can also be used to evaluate text generated for other natural language processing applications. Its output is a score between 0 and 100 indicating how close the hypothesis text is to the reference text; the higher the value, the better the translations.
Word error rate (WER) computes the minimum edit distance between the human-generated sentence and the machine-predicted sentence. It counts the discrepancies between the predicted output and the target transcript by comparing them word by word; the smaller the value, the better the translations.
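As a hedged illustration of how the two metrics can be computed — assuming the third-party sacrebleu and jiwer packages, not tools named by the authors — consider:

import sacrebleu
from jiwer import wer

references = ["the cat sat on the mat"]
hypotheses = ["the cat is on the mat"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"sacreBLEU: {bleu.score:.2f}")  # 0-100, higher is better

error = wer(references[0], hypotheses[0])
print(f"WER: {error:.2f}")             # edit-distance based, lower is better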
From Tables 2, 3, 4, 5, 6 and 7 and Fig. 4, it can be observed that the best performing model with respect to the sacreBLEU and WER metrics is the transformer model, while the worst performing model is SMT. This is because the transformer model keeps track of the word positions in the sentences and uses the attention mechanism, while SMT depends on the probability of the next word, which makes it less accurate and reliable.
Table 4 Translation result
Input: I've already finished reading this book
Required: हे पुस्तक माझं आधीच वाचून झालं आहे
SMT: वाचत आधीच पुस्तक या हे
Seq2Seq: पुस्तक माझं आधीच वाचून झालं
Attention: मी आधीच हे पुस्तक वाचून काढलं आहे
Transformer: माझं आधीच हे पाच वाजलं आहे
6 Conclusion
References
1. https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers. Last accessed 03 Sept 2022
2. https://censusindia.gov.in/2011Census/C-16_25062018_NEW.pdf. Last accessed 03 Sept 2022
Abstract Cardiovascular diseases (CVDs) are among the most common causes of death every year, driven by lifestyle factors such as stress, unhealthy food consumption, and many others. Several researchers have used machine learning (ML) models for the diagnosis of CVD, but it has been discovered that many of them still obtain low accuracy, AUC, precision, recall, and F1-score, and a high false positive rate (FPR). This study therefore aims to review previous research on CVD using ML techniques. The review intends to identify the lapses encountered in the state of the art and propose solutions to the aforementioned challenges. This study also examines existing technologies and provides a clear vision of this area, so as to be aware of the approaches utilized and the existing limitations in this line of research. To this end, an extensive search was conducted to find articles dealing with various diseases and the methods of solving them, comprehensively reviewing related applications. The SpringerLink, Taylor and Francis, and Scopus databases were checked for articles on CNNs and other machine learning (ML) algorithms for the diagnosis or prediction of cardiovascular diseases. A total of 2606 publications were obtained from the databases.
R. O. Ogundokun
Department of Multimedia Engineering, Kaunas University of Technology, 44249 Kaunas,
Lithuania
e-mail: rosogu@ktu.lt
S. Misra
Department of Computer Science and Communication, Østfold University College, 1757
Halden, Norway
e-mail: sanjay.misra@hiof.no
D. Umoru
Department of Computer Science, Landmark University Omu Aran, Omu Aran, Nigeria
A. Agrawal (*)
Amity University Haryana, Gurgaon, India
e-mail: akshatag20@gmail.com
The publications were filtered according to the type of methods used as well as the disease examined by the authors. A total of 27 publications were found to meet the inclusion criteria. In conclusion, the review of related works provides an insight into the future of cardiovascular disease diagnosis. The lapses of the ML models presently used for the diagnosis of the disease were identified, and recommendations to solve these challenges are presented in the study.
1 Introduction
Human disease is a disorder of the normal state of a human being that disrupts normal functions [1, 2]. The term covers the infections and disabilities that can affect any living organism. Diseases can affect people mentally, physically, and even socially [3]. There are four main types of diseases: deficiency diseases, inherited diseases (including both genetic and non-genetic hereditary diseases), infectious diseases, and physiological diseases [4]. Diseases can also be classified as communicable or non-communicable. The deadliest human diseases are cardiovascular diseases, followed by communicable diseases, nutritional deficiencies, cancer, and chronic respiratory infections [4]. Coronary heart disease, also known as ischemic heart disease, takes many forms; it is a cardiovascular disease and the leading cause of death globally [4]. Convolutional neural networks (CNNs) are used for prediction mainly in image classification [5]. The CNN is an effective algorithm for pattern identification and image processing [6], and it can be used to classify electrocardiogram (ECG) images of the heart into various types of heart conditions.
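For illustration only — this is not a model from any of the reviewed papers — a small CNN for classifying ECG images into heart-condition classes might be sketched in Keras as follows; the input shape and the four example classes are assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),              # grayscale ECG image (assumed size)
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(4, activation="softmax"),          # e.g., NSR, ARR, CHF, other
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])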
Logistic regression is a statistical method for evaluating a dataset in which one or more independent variables determine the outcome [7]. Like other regression analyses, logistic regression is used for prediction [8]. E-health services are a trend in today’s technology: health services and information delivered through IT such as the internet [9]. They are utilized by almost everyone in the world, through health-oriented products from large companies such as Apple, Google, and others, and they form an upcoming and trending field of medical and public health facilities [10].
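A minimal sketch of logistic regression for such a diagnosis task, using scikit-learn with synthetic data standing in for a real heart-disease dataset, might look like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 500 patients, 13 clinical attributes, binary CVD label.
X, y = make_classification(n_samples=500, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("Accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))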
This study, therefore, aims to conduct a systematic literature review on cardio-
vascular disease prediction based on the use of machine learning algorithms. The
lapses with the present ML models used for the diagnosis of CVD were also pre-
sented in this study.
The remaining part of the article is structured as follows: Sect. 2 presents the methods and materials used for the SLR. Section 3 presents the results from the SLR, while in Sect. 4 the results obtained are discussed and interpreted. The study is concluded in Sect. 5.
2 Methodology
Research Questions
The study formulated some research questions (RQ) for this SLR. They are as
follows:
RQ 1: What are the ML algorithms used for the diagnosis of CVD?
RQ 2: What are the cardiovascular diseases that are mostly examined?
RQ 3: What are the ML accuracies obtained in previous research? (Table 1).
2.1 Method
The method used in this study is the PRISMA SLR method, as shown in Fig. 1. A total of 2606 publications were obtained from three databases (SpringerLink, Taylor and Francis, and Scopus). A total of 204 duplicates were eliminated. The remaining 2402 articles were examined for further elimination using abstracts, keywords, and titles, on which basis 2336 publications were excluded, leaving 66 articles for full-text download. The full texts of the 66 articles were downloaded and assessed for eligibility, of which 11 were excluded because the full text was unavailable. Of the 55 articles remaining, 32 were excluded from the qualitative synthesis review because they did not focus on CVD diagnosis and ML techniques. A total of 27 articles were finally included in the synthesis review (23 articles plus an extra 4 papers located when other databases were manually searched).
Fig. 1 PRISMA flow diagram: duplicates excluded (n = 204); 66 full-text articles obtained for eligibility; 11 articles excluded due to unavailability of full text; 27 articles included for qualitative and quantitative synthesis
The study searched three databases (SpringerLink, Taylor and Francis, and Scopus) and obtained a total of 2606 publications, as seen in Table 2. The search string comprised cardiovascular, deep learning, machine learning, and diagnosis, as shown in Table 3. The search included publications on CNN algorithms and other disease-related algorithms. The retrieved articles were filtered according to the types of techniques applied between 2018 and 2022. Search phrases for convolutional neural networks were combined with those associated with cardiac disorders. For papers lacking abstracts, or without enough information in the abstract to make a choice, the complete text and, if necessary, extra materials (tables) were studied before a decision was made. The reference lists of the included studies were manually searched for any additional potentially suitable research.
Studies were required to be original to qualify for inclusion. The accuracy of the supplied indicators for evaluating predictive power was considered; these measures are now widely acknowledged and employed in the evaluation of incremental predictive capability. Only studies involving people were included. Using the qualifying criteria provided in related works, all of the authors assessed the citations and abstracts found through the electronic literature searches. The eligible publications, all published in English, assessed the ability of statistical models to forecast illness, since several predictive elements connected in some way to neural networks have been generated using algorithms. There was no restriction on studies by diagnosis among medical populations. Studies based on psychiatric, surgical, and pediatric populations were excluded because their datasets and results were not feasible.
Data were gathered on standard risk variables as well as research populations, locations, and methodologies. The studies extracted and concentrated on for the examination of discrimination quality showed an accuracy of over 70%. The researchers calculated the percentage of studies asserting an improvement in prediction when interpreting their results. Administrative data, primary data (such as surveys or chart reviews), or both were available. Other operational parameters, including sensitivity, specificity, and predictive values, were abstracted if accuracy was not given. First and foremost, systematic information on sampling, research design, outcome criterion formulation, and statistical analysis was taken from all of the chosen studies. Then, to guarantee correct study inclusion, the aforementioned inclusion criteria were applied once again.
A narrative synthesis of the available data was undertaken due to the vast variety of indicators used to evaluate the incremental predictive power of algorithms. Many different types of research were included, concentrating on the populations in which each model was tested, the kinds of variables used in each model, and model discrimination.
These reviews used certain criteria for the elimination and inclusion of articles in this SLR. The inclusion criteria were: (i) the publication is written in English, (ii) the publication is on cardiovascular diseases, (iii) the publication used machine learning algorithms for diagnosis or detection, and (iv) the publication is a journal article.
The exclusion criteria were: (i) the publication does not focus on cardiovascular disease, (ii) the publication did not use machine learning algorithms, (iii) the publication is not written in English, and (iv) the publication is a conference paper, preprint, book chapter, or magazine publication.
3 Results
The studies were chosen from citations found through searches in the ScienceDirect, ResearchGate, and Scopus databases. A total of 128 abstracts were chosen for further screening, and 74 full-text publications were examined. After all exclusions, 54 publications covering 50 studies were included because they satisfied the qualifying requirements. The table in the related works and its data summary include information from research that evaluated the precision of prediction models. The following factors were taken into consideration when choosing the datasets: physical activity, dietary habits, smoking status, alcohol use, body mass index, height, weight, waist circumference, hip circumference, family history of diabetes, ethnicity, blood pressure, and/or antihypertensive medication use and steroid use. Occasionally, biological components were present (blood glucose, triglyceride, lipid variables, or uric acid levels). The RQs are answered in the following sub-sections.
RQ 1: What are the ML Algorithms Used for the Diagnosis of CVD?
The ML methods used by the authors of the included publications are discussed
here as follows, and those methods can be seen in Table 4.
Kalaivani et al. [11] proposed a hybridized Ant Lion Crow Search Optimization Genetic Algorithm (ALCSOGA) to accomplish operative feature selection. This optimization combines the Ant Lion, Crow Search, and Genetic Algorithms. The elite position is decided by the Ant Lion algorithm, while the Crow Search Algorithm makes use of each crow's position and memory to assess the objective function.
Singhal et al. [12] proposed a new CNN-based heart disease prediction model and compared it to existing systems; the prediction accuracy of their system is 9% (relative) higher. They aimed at developing a medical diagnosis system that can predict heart diseases, using the backpropagation mechanism to train datasets in the CNN.
Harini and Natesh [13] gathered and examined an enormous volume of information for prediction since there had been no advances; using statistical knowledge, they determined the major chronic diseases and applied the CNN algorithm. They also compared the multimodal disease risk prediction (CNN-MDRP) algorithm, which uses structured and unstructured hospital data and reaches 94.8%, higher than the CNN-based unimodal disease risk prediction algorithm (CNN-UDRP).
Abdelsalam and Zahran [14] offered specific unique methods for the early detection of diabetic retinopathy (DR) based on multifractal geometry, to detect early non-proliferative diabetic retinopathy (NPDR). Macular optical coherence tomography angiography (OCTA) pictures are examined, the diagnosis process is automated with supervised machine learning techniques such as the support vector machine (SVM) algorithm, and the resulting accuracy is increased. The accuracy of the categorization method was 98.5%.
Cinar and Tuncer [15] proposed a method for classifying normal sinus rhythm (NSR), abnormal arrhythmia (ARR), and congestive heart failure (CHF) ECG data using a highly accurate and popular deep learning architecture. The suggested architecture is founded on a hybrid AlexNet-SVM (support vector machine). There are 192 ECG signals in all, including 96 arrhythmia, 30 CHF, and 36 NSR signals. To show the classification performance of deep learning architectures, the ARR, CHF, and NSR signals are initially categorized by the SVM and KNN methods, reaching 68.75% and 65.63% accuracy. The signals are then categorized with an accuracy of 90.67% using long short-term memory (LSTM). Finally, the spectrograms of the signals are collected and the hybrid AlexNet-SVM technique is applied to these images, achieving 96.77% accuracy. The findings demonstrate that the proposed deep learning architecture categorizes ECG data more accurately than traditional machine learning classifiers.
Muhammad et al. [16] used diagnostic CAD data gathered from two general hospitals in Kano State, Nigeria, to construct machine learning prediction models for CAD. The dataset was used to train predictive models with machine learning techniques such as support vector machines, K-nearest neighbors, random trees, Naive Bayes, gradient boosting, and logistic regression. The models were then assessed on performance metrics such as accuracy, specificity, and sensitivity using receiver operating characteristic (ROC) curve techniques.
Subhadra and Vikas [17] proposed a system implementing the concept of multilayered neural networks processed with clinical attributes, thus improving the accuracy of disease diagnosis and prediction; for effective prediction, a backpropagation algorithm was used to train the data.
Kadam Vinay et al. [18] built a new model for disease prediction using an artificial neural network (ANN) in terms of probabilistic modeling. Concerned with obtaining relevant health information for prediction from a large hospital information database, they computed the prediction accuracy and produced a confusion matrix for the data. The model gives 95%, 98%, and 72% accuracy for predicting heart, kidney, and diabetes diseases, respectively.
Wahyunggoro et al. [19] proposed an ANN-based model for forecasting diseases from datasets, a problem because not all diseases and rare conditions are covered. A case study of the occurrence of a number of diseases was used to compare different neural network techniques and then select the best outcome among them.
Sadek et al. [20] worked toward helping doctors determine what causes Parkinson’s disease through earlier detection. They implemented an ANN system with a backpropagation algorithm; it was found that the symptoms come on slowly and are difficult to diagnose.
Ogundokun et al. [21] proposed a PCA feature extraction technique with three classification ML methods, used to diagnose heart disease. It was discovered that the system had a low accuracy of 56.86%, although its detection rate of 98.7% was on the high side.
Krishnan and Kumar [22] proposed a big data approach over both structured and unstructured datasets using a CNN-based multimodal disease risk prediction (CNN-MDRP) model, since conventional machine learning techniques cannot handle the difficulties of small and incomplete data in disease prediction; the work focuses on both data prediction and big data analytics.
Thiyagarajan et al. [23] classified diabetes output data to improve accuracy, overcoming traditional algorithms by implementing the transductive extreme learning machine (TELM), evaluated in terms of testing time and accuracy. The precision and execution time of the proposed TELM show high performance, making it a good option for the diagnosis process of classifying diabetes data.
Durai et al. [24] aimed at predicting the occurrence of liver diseases based on
the unhealthy lifestyles of patients. By using the J48 algorithm to make decisions
on the prediction, they were able to get meaningful results from large datasets.
Gujar et al. [25] used the Naïve Bayes algorithm, which takes symptoms supplied by the user as input to predict diseases. They aimed at analyzing clinical documents about patients’ health to predict the possibility of occurrence of any disease.
Gawande and Barhatte [26] aimed at extracting reliable information from biomedical signals, given problems with inconsistencies and with locating the source or transmission of the cardiac electrical impulse. They proposed a system in which an ECG signal is given as input, segmentation is performed to analyze the signal, and the output is trained in a one-dimensional CNN in a particular format. The obtained accuracy is nearly 99.46%.
Dami and Yahaghizadeh [27] proposed using 5-min electrocardiogram (ECG) recordings and the extraction of time–frequency characteristics from ECG signals to predict arterial events a few weeks or months in advance of their occurrence with a deep learning technique. An LSTM neural network was utilized for its ability to learn long-term dependencies, so as to swiftly recognize and avoid these situations. A deep belief network (DBN) was also utilized to choose and represent the recorded dataset’s most useful and efficient characteristics. LSTM-DBN is the short name for this method.
Ogundokun et al. [28] proposed the use of two computational intelligence techniques, employing autoencoder feature extraction algorithms with decision trees and K-nearest neighbors (KNN). The algorithms’ performance was evaluated using a heart disease dataset from the National Health Service database. They obtained an accuracy of 56.19%.
Bhaskaru and Devi [29] proposed the accurate detection and prediction of heart diseases, since medical researchers have found most results inaccurate. A hybrid differential evolution fuzzy neural network (HDEFNN) was implemented by improving the initial weight updating of the neural networks; it can perform well without retraining.
Elsayed et al. [30] proposed a mobile application system that predicts coro-
nary heart diseases based on risk factors implemented using the logistic regression
model.
RQ 2: What are the Cardiovascular Diseases that are Mostly Examined?
There are several CVDs that have been examined in previous research, and a few
of them were investigated in the publications included in this review. They include
heart disease [12, 21, 28, 29, 30], diabetic cases [14], heart, kidney, and diabetes
diseases [18], Parkinson’s disease [20], liver diseases [24], and so on.
It can be deduced that many researchers worked on heart disease diagnosis. It
is suggested that more investigation should be carried out on the use of ML tech-
niques on other types of CVD to detect them early and reduce the death rate in the
world.
4 Discussions
CNN is the most used method, and accuracy depends on the number of layers and the algorithms supporting them. Only eight related works among the publications included in this review gave a straightforward accuracy. The methods consist of a series of neural networks and supporting algorithms, while some are unsupported. An average accuracy was computed from the various accuracies reported in each article. A finding of this review is that the neural networks used are supported and then combined in an intertwined manner in an effort to reach high accuracy. The strengths and limitations are discussed as follows:
One of this review’s strengths is the removal of studies that merely provided effect estimates for the independent associations of algorithms with the outcome, such as risk ratios or relative risk. The justification for predictive testing depends less on the size of the risk ratio and more on how helpful the test findings are in enhancing illness prognosis. The limitations stem from the fact that only English-language publications were included, making it impossible to access research conducted in other languages; the review also did not formally evaluate the potential for publication bias. Unpublished research, however, is more likely to have discouraging results; thus, it would likely reduce rather than exaggerate estimates of the predictive utility of algorithms (i.e., those reporting no improvement in predictive accuracy).
The number of publications per year can be seen in Fig. 2: 30% of the publications were published in 2018, 26% in 2019, 15% in 2020, 19% in 2021, and lastly 11% in 2022.
5 Conclusion
More implementation studies should be carried out to improve the uptake of existing risk prediction models and, consequently, to assess their effects on the implementation and outcomes of diabetes prevention and control programs, given that existing prediction tools based on established risk factors are already achieving acceptable-to-good discrimination. Algorithms such as ML and deep learning models can be hybridized. Various studies have attempted to combine techniques to boost accuracy in terms of:
a. Implementing various characteristics of CNNs in combination with other algorithms.
b. Executing some combinations individually and merging them at an appropriate point at the end of the implementation.
c. Tackling the accuracy of results based on the datasets used, thereby giving useful information about them.
d. Tackling execution time, in the hope that it can someday be improved.
e. Tackling raw datasets based on patients’ information, as most algorithms do.
A combination of algorithms could be the next move in improving the speed of result outputs today. The proposed study will also tackle the accuracy of combined methods and would therefore aid medical professionals in the prediction of various heart diseases. This study proposes efficiency in utility and also looks forward to reducing the rate of cardiovascular diseases in the near future.
References
21. Ogundokun RO, Misra S, Awotunde JB, Agrawal A, Ahuja R (2022) PCA-based feature
extraction for classification of heart disease. In: Advances in electrical and computer tech-
nologies. Springer, Singapore, pp 173–183
22. Krishnan D, Kumar SB (2018) A survey on disease prediction by machine learning over big data from healthcare. IOSR J Eng 08(10):2278–8719
23. Thiyagarajan C, Kumar KA, Bharathi A (2018) Diabetes mellitus diagnosis based on trans-
ductive extreme learning machine 15(6):412–416
24. Durai V, Ramesh S, Kalthireddy D (2019) Liver disease prediction using machine learning
5(2):1584–1588
25. Gujar D, Biyani R, Bramhane T, Bhosale S, Vaidya TP (2018) Disease prediction and doctor
recommendation system 3207–3209
26. Gawande N, Barhatte A (2018) Heart diseases classification using a convolutional neu-
ral network. In: Proceedings of the 2nd international conference on communication and
electronics systems, ICCES 2017, 2018-Janua (June), 17–20. https://doi.org/10.1109/
CESYS.2017.8321264
27. Dami S, Yahaghizadeh M (2021) Predicting cardiovascular events with a deep learning
approach in the context of the internet of things. Neural Comput Appl 33(13):7979–7996
28. Ogundokun RO, Misra S, Sadiku PO, Adeniyi JK (2021) Assessment of machine learning
classifiers for heart diseases discovery. In: European, mediterranean, and middle eastern
conference on information systems. Springer, Cham, pp 441–452
29. Bhaskaru O, Devi MS (2019) Accurate and fast diagnosis of heart disease using hybrid differential evolution fuzzy neural network algorithm 3:452–457
30. Elsayed HAG, Galal MA, Syed L (2018) HeartCare+: a smart heart care mobile application for Framingham-based early risk prediction of hard coronary heart diseases in the Middle East. Mobile Inf Syst 2017
Exploiting Parts of Speech in Bangla-To-English Machine Translation Evaluation
1 Introduction
When developing machine translation (MT) systems that automatically convert the
source to the target language, the correct evaluation of such automatic translation
is a crucial task. MT has gone through different stages such as dictionary-based,
rule-based, statistical, phrase-based, hybrid, and most recently neural-based [1–8].
The most popular method for MT evaluation is reference-based, which involves
comparing the system output to one or more human reference translations. The
majority of MT evaluation methods in use produce a computer-generated abso-
lute quality score. The simplest scenario is comparing the similarity of translations
suggested by machines (hypothesis sentence) to the human translation (reference
sentence). To do this, word n-gram matches between the translation (MT output)
and the reference are counted. This is the situation with BLEU [18], which has
long served as the benchmark for MT evaluation. In MT evaluation, human judgment is considered the best, but it is time-consuming. The features used by human evaluators are captured and applied to automatic evaluation metrics by training with pairs of reference translations (generated by human evaluators) and hypothesis translations (generated by MT). The way a human judge, given two hypotheses for a particular reference translation, chooses the better one is mirrored by ranking-based automatic evaluation metrics: such MT evaluation is done on the basis of ranking, i.e., without generating a direct score, when we have multiple MT engines whose translation quality needs to be measured [9]. Researchers have also focused on the syntactic evaluation of MT systems, where syntactic information is included in the evaluation [10]. Apart from BLEU, some of the widely used automatic evaluation metrics are METEOR [11], the National Institute of Standards and Technology (NIST) metric [12], Word Error Rate (WER) [13], etc. Evaluating the performance of an MT system is very important for developing a better system. Researchers have long been active in this area and have explored different techniques, such as precision-based, recall-based, parts-of-speech (POS)-based, and neural network-based methods, to enhance the evaluation accuracy of MT engines [14, 15]. In this paper, we attempt to evaluate the performance of two online neural-based translation systems, Google and Bing, with the most popular metric, BLEU, in conjunction with a syntactic evaluation. In the syntactic evaluation, we take the parts of speech of the sentences (reference and hypothesis) and evaluate them with the same approach that BLEU uses.
The remaining part of the paper is organized as follows: Sect. 2 highlights some previous work in MT evaluation; Sect. 3 describes the methodology; Sect. 4 elaborates on our experimentation; Sect. 5 includes results and discussion. Finally, Sect. 6 presents the conclusion and future directions.
2 Related Work
3 Methodology
of speech during matching. In that paper, the researchers used POS with BLEU for other language pairs [15]. We name the BLEU score computed over parts of speech “BLEU with POS”. We have tried to find the correlation of both scores (BLEU and BLEU + POS) with the human score. Further, we computed the Pearson correlation and found that BLEU with POS has a higher correlation with human scores than normal BLEU. Figure 1 is a diagrammatic representation of our methodology.
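The core idea can be sketched as follows — tag both sentences and score the tag sequences the same way BLEU scores words. This illustration uses NLTK's tagger (an assumption for the sketch; the paper's own tagging pipeline may differ) with example English sentences:

from nltk import pos_tag, word_tokenize              # needs the 'punkt' and
from nltk.translate.bleu_score import sentence_bleu  # 'averaged_perceptron_tagger' data

reference = "He has already finished reading this book"
hypothesis = "He already finished reading the book"

ref_tok, hyp_tok = word_tokenize(reference), word_tokenize(hypothesis)
ref_tags = [tag for _, tag in pos_tag(ref_tok)]   # Penn Treebank tags [25]
hyp_tags = [tag for _, tag in pos_tag(hyp_tok)]

plain_bleu = sentence_bleu([ref_tok], hyp_tok)    # match on surface words
pos_bleu = sentence_bleu([ref_tags], hyp_tags)    # match on parts of speech
print(plain_bleu, pos_bleu)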
The process of computing a human score is as follows: the source sentence is in Bangla and the translated sentence is in English; both are shown to five human evaluators who have expertise in both languages. The human evaluators are asked to evaluate on the basis of two important parameters: adequacy and fluency.
Adequacy means completeness, i.e., whether the translated sentence is complete is what is being evaluated. Fluency ensures the grammatical correctness of the translated text. Adequacy and fluency are both measured on a scale from 1 to 5. If all the meanings of the translated words are correct, then the full score of 5 is assigned to adequacy; if most of the meaning is correct, then 4; for much of the meaning, 3; for little meaning, 2; and for no meaning, 1. Fluency scores are decided in a similar fashion: an incomprehensible translation scores 1, a disfluent one 2, non-native language 3, good language 4, and a flawless translation 5. The average of the adequacy and fluency scores is then computed, and this average is the human score.
Next, the correlation is computed to figure out which automatic metric has a higher correlation with the human judgment score.
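This final step can be sketched with SciPy's Pearson correlation; the example lists below reuse the Google Translate scores from Table 1:

from scipy.stats import pearsonr

human = [0.98, 0.95, 0.95, 0.98]      # human scores (Table 1)
bleu = [0.66, 0.37, 0.29, 0.17]       # BLEU, n = 1..4 (Table 1)
pos_bleu = [0.88, 0.75, 0.73, 0.70]   # BLEU with POS (Table 1)

r_bleu, _ = pearsonr(bleu, human)
r_pos, _ = pearsonr(pos_bleu, human)
print(f"BLEU vs human: {r_bleu:.3f}, BLEU+POS vs human: {r_pos:.3f}")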
4 Experimentation
Table 1 BLEU score with n-gram where n = 1 to 4, POS-BLEU and human score of test sentence 1 (scores are for Google Translate)
n-gram   BLEU   POS-BLEU   Human score
n = 1    0.66   0.88       0.98
n = 2    0.37   0.75       0.95
n = 3    0.29   0.73       0.95
n = 4    0.17   0.70       0.98
Fig. 4 Graphical representation of automatic and human scores
As stated before, we used BLEU for multiple n-grams and accordingly noticed variation in the evaluation scores. Figure 2 is the graphical representation of all the metric scores. Pearson’s correlation score is reported graphically in Fig. 5. We can see that BLEU with POS has a higher correlation with the human score than normal BLEU. Pearson’s correlation coefficient varies from −1 to +1: −1 means a negative correlation between two variables, 0 means no correlation, and +1 means a positive correlation (Fig. 4).
In our second example sentence (generated by Microsoft’s Bing Translate), the reference text exactly matches the hypothesis text; hence, all the scores are the highest and the output is considered the best translation (Figs. 6 and 7). All the metric scores are shown in Table 4.
Table 4 BLEU (n = 1–4) scores, BLEU with POS and human score for test sentence 1 (translation generated by Bing)
n-gram   BLEU   BLEU with POS   Human score
n = 1    1      1               1
n = 2    1      1               1
n = 3    1      1               1
n = 4    1      1               1
Hypothesis text and reference text exactly match, and hence all metrics have the highest score
In the automatic score generation process of BLEU and BLEU with POS, all tokens for every number of n-grams exactly match, so by BLEU’s computation criteria the maximum score is generated. In this case, the human score is also the highest because the hypothesis and reference translations are exactly the same (Tables 2 and 3).
From our experimentation and the results reported in the tables in Sect. 4, we can see that since the BLEU metric mainly depends on a precision-based approach, in which it tries to match tokens between the reference text and the hypothesis text, the n-gram with the minimum n-value has the highest score. As the n-value increases, the score gradually decreases. The reason is that when the n-value is greater than 1, the metric tries to find that many consecutive tokens matching exactly between the reference and hypothesis texts. In reality this is often not possible, because words may have multiple meanings based on context, words may be arranged in a sentence in a different manner, one particular context or meaning can be expressed by different words, and so on. The probability of such cases is higher for morphologically rich languages; hence, the chances of an accurate score are also reduced for normal BLEU. On the other hand, when we use BLEU with POS, such exact word matching between reference and hypothesis can be avoided and matching is done on the basis of POS, thereby increasing the chances of accuracy. This is why, when we compute the correlation between the human score and both automatic metrics, BLEU with POS has the higher correlation with the human score.
We can conclude from the above work and discussion that correct evaluation is very important in MT design. Automatic evaluation metrics sometimes fail to produce a correct score because they depend primarily on either n-gram-based precision or a recall-based approach; hence, researchers have also focused on machine learning-based approaches to automatic evaluation. We have also seen that by integrating a syntax-based approach into the evaluation process with BLEU, we are able to generate a better score, with a higher correlation with the human score, in our Bangla-to-English translation task. In future work, recently introduced ML models can be exploited to capture semantic, syntactic, and other important information in the hypothesis and reference translations; this information can be embedded before training the model for our language pair, enhancing evaluation accuracy.
References
3. Xiong D, Meng F, Liu Q (2016) Topic-based term translation models for statistical machine
translation. Artif Intell 232:54–75
4. Koehn P et al (2007) Moses: open source toolkit for statistical machine translation. In:
Proceedings of the 45th annual meeting ACL interaction poster demonstration session—
ACL ’07 177. https://doi.org/10.3115/1557769.1557821
5. Vaswani A et al (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5999–6009
6. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
Adv Neural Inf Process Syst 4:3104–3112
7. Stahlberg F (2020) Neural machine translation: a review. J Artif Intell Res 69:343–418
8. Vathsala MK, Holi G (2020) RNN based machine translation and transliteration for Twitter
data. Int J Speech Technol 23:499–504
9. Duh K (2008) Ranking vs. regression in machine translation evaluation. In: Third workshop
on statistical machine translation WMT 2008 annual meeting association on computer lin-
guist ACL 2008 191–194. https://doi.org/10.3115/1626394.1626425
10. Liu D, Gildea D (2005) Syntactic features for evaluation of machine translation. In:
Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for
machine translation and/or summarization ACL 2005 25–32
11. Banerjee S, Lavie A (2005) METEOR: an automatic metric for mt evaluation with improved
correlation with human judgments. In: Proceedings of the acl workshop on intrinsic and
extrinsic evaluation measures for machine translation and/or summarization 2005 65–72
12. Doddington G (2002) Automatic evaluation of machine translation quality using n-gram
co-occurrence statistics 138. https://doi.org/10.3115/1289189.1289273
13. Park Y, Patwardhan S, Visweswariah K, Gates SC (2008) An empirical analysis of word
error rate and keyword error rate. In: Proceedings of annual conference on international
speech communication association INTERSPEECH 2070–2073. https://doi.org/10.21437/
interspeech.2008-537
14. Guzmán F, Joty S, Màrquez L, Nakov P (2017) Machine translation evaluation with neural
networks. Comput Speech Lang 45:180–200
15. Popovíc M, Ney H (2009) Syntax-oriented evaluation measures for machine transla-
tion output. In: EACL 2009—Proceedings of the Fourth Workshop on Statistical Machine
Translation 29–32. https://doi.org/10.3115/1626431.1626435
16. Duma M, Vertan C, Park VM, Menzel W (2013) A new syntactic metric for evaluation of
machine translation. ACL Student Res Work 130–135
17. Haque R, Hasanuzzaman M, Way A (2020) Analysing terminology translation errors in sta-
tistical and neural machine translation. Mach Transl 34:149–195
18. Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, pp 311–318. Association for Computational Linguistics. https://doi.org/10.3115/1073083.1073135
19. Agnihotri S (2019) Hyperparameter optimization on neural machine translation. Creat
Components 124
20. Lim R, Heafield K, Hoang H, Briers M, Malony A (2018) Exploring hyper-parameter opti-
mization for neural machine translation on GPU architectures 1–8
21. Tran N, Schneider J-G, Weber I, Qin AK (2020) Hyper-parameter optimization in classifica-
tion: to-do or not-to-do. Pattern Recognit 103:107245
22. Lankford S, Afli H, Way A (2022) Human evaluation of English–Irish transformer-Based
NMT 1–19
23. Newman B, Ang KS, Gong J, Hewitt J (2021) Refining targeted syntactic evaluation of lan-
guage models 3710–3723. https://doi.org/10.18653/v1/2021.naacl-main.290
24. Manning C et al (2015) The Stanford CoreNLP natural language processing toolkit 55–60.
https://doi.org/10.3115/v1/p14-5010
25. Marcus MP, Santorini B, Marcinkiewicz MA (1993) Building a large annotated corpus of
English: the Penn Treebank. Comput Linguist 19:313–330
Artificial Intelligence and Blockchain Technology in Insurance Business
Abstract With the use of cutting-edge technology like artificial intelligence, the
Internet of Things, and blockchain, the insurance industry is about to enter a new
era. Financial innovations will have a wide range of effects on economic activity.
The future of the AI and blockchain-based insurance industry appears bright. The
expansion of the insurance industry will be considerably aided by AI and block-
chain technology. The corona situation has recently brought to light some ineffi-
ciencies of the conventional system, including fewer client–insurer interaction
capabilities, challenges in targeting consumers, difficulties working from home, a
lack of IT support, a lack of transparency, and sparse management. This article
examines the impact of cutting-edge financial technology on the insurance indus-
try, including AI technology and blockchain, and describes the phenomena from
several angles. Insightful and scientific, ICT-based insurance makes considera-
ble use of electrical and IT equipment for technology resolution without the con-
sumer providing the insurance company with any natural resources. Digital insurer
seems to be growing more popular at the moment, but it’s essential to understand
the problems and challenges it faces, which result in poor penetration. Although
there are many advantages for the insured and insurer, internet insurance providers
are nevertheless subject to operational, regulatory, and company reputation, and
their customers worry about the safety of their transactions and identity theft. This
article aims to illustrate the fundamentals and significance of AI and blockchain
technology in the insurance business, looking at benefits that accrue to both the
provider and the customer. Although there are many studies already published, the
focus of this research is the insurance industry. By fusing cutting-edge technical
demands with established company competencies, this paper assists the manage-
ment in expanding their understanding of competing for advantage.
1 Introduction
2 Literature Review
corporations. Such participants usually give businesses with safety-net suppliers generous flexibility, but they might also emerge as direct rivals, restricting net income and testing backup plans, particularly at the client application level [11].
According to Mohammad Ali (2020), financial development is significantly impacted by insurance. Among the most critical matters in this industry, according to him, are ambiguity, a lack of accredited personnel harmonization, the lack of a branding and promotion strategy, a lack of business ethics, inadequate IT support, ineffective officials, wholly inadequate ROI, a loss of transparency and honesty, a lack of acknowledgement, and banal control. To overcome these challenges, he suggests using cutting-edge marketing and advertising tools, retaining and developing talent, enhancing recognition, utilizing IT, the Internet of Things (IoT), blockchain (BC), and artificial intelligence (AI), avoiding risky competition, adopting a progressive managerial style, and enforcing substantial insurance [12].
Both the insurance industry and how insurance services are provided are being transformed by digitalization. The insurance industry’s service delivery process is significantly impacted by the use of mobile devices, chatbots, big data, intelligent systems (AI), and IoT; this spans everything from product design to underwriting risks to policy pricing to advertising and sales to claim processing to ongoing customer interaction and management [13]. Digitalization means reassessing present tasks from fresh perspectives made possible by technological novelty, rather than just changing existing processes into digital ones. The transition might result in new ways of doing activities that are more durable or reasonable. Still, it can also be disruptive to a company’s ongoing operations, because digitalization fundamentally alters an organization’s commercial opportunities [14].
AI has the potential to change the marketing industry completely. This study compares the focus on AI by businesses in their financial statements with their gross and net operational efficiency. 10-K filings are an essential source of information for research in accounting and finance, but they are still primarily ignored in marketing. The researchers create a practical guideline to demonstrate how an organization’s AI focus may be tied to annual gross and operating efficiency by drawing on financial and marketing theory. The connection between AI intensity and operating efficiency is then empirically tested using a set of simultaneous equations. Their findings demonstrate the coming AI change that US-listed companies are currently experiencing. They show how an emphasis on AI may increase net revenue, net operating efficiency, and return on capital while lowering advertising costs and generating jobs [15]. AI emphasis and sales, as well as AI focus and personnel count, are positively correlated. However, it is expensive to retrain staff members and reorganize marketing roles to accommodate the mechanization of selling and marketing capabilities. It is therefore crucial to understand whether an AI focus has a positive or negative impact on sales per employee. Because the guiding framework is unable to anticipate the direction of the link between AI emphasis and sales, the authors hypothesize that AI focus is adversely connected to sales per employee. Furthermore, the guiding framework assumes a positive correlation between workforce size and AI focus; since hiring more staff incurs higher costs, a negative correlation between AI engagement and sales per employee is anticipated [16].
3 Methodology
decentralized digital ledger that is shared by many peers in a network and makes it easier to record transactions and track the ownership of both tangible and intangible goods. Through the use of cryptographic signatures, validated transactions take the form of blocks that are sequentially added to a chain of previously confirmed blocks. Because of the decentralized nature of the ledger, and because each new block is tagged sequentially and contains information that references the block before it, any attempt to falsify the blockchain would necessitate falsifying every block that has already been created [18] (Fig. 1).
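This chaining property — each block committing to the hash of its predecessor — can be illustrated with a minimal, purely didactic Python sketch (not any production blockchain):

import hashlib, json, time

def make_block(prev_hash, transactions):
    # Each block commits to its predecessor's hash via SHA-256.
    block = {"time": time.time(), "tx": transactions, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block("0" * 64, ["policy issued"])
block2 = make_block(genesis["hash"], ["claim filed", "claim settled"])

# Tampering with the genesis block changes its hash and breaks the
# reference stored in block2, so the forgery is detectable:
assert block2["prev_hash"] == genesis["hash"]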
It is evident from the graph and chart mentioned above that there is a strong correlation between technology investment and insurance performance. InsurTech investment is rising globally but slowed down around the period of the corona pandemic. The premium volume serves as a proxy for insurance performance, which is improving annually as a result of investments in technology. Due to technological investments made the year before, the premium growth rate increased during the pandemic by 2.9%. Like the rest of the world, India’s insurance industry is booming: even throughout the COVID-19 period, India’s premium volume continued to rise annually. The fact that India is already part of the technological revolution is encouraging. The majority of technology devices, or “InsurTech,” are already in use in India’s insurance sector. After the pandemic, we hope India will fare better than it is right now. Before the COVID-19 pandemic, insurance companies were on a steady growth path, according to the Swiss Re Institute’s sigma research article (no. 4/2020) on global insurance. With help from the non-life industry in developed economies, the total amount of direct premium written globally climbed by about 3% in 2019 compared to the previous year. Over 60% of all insurance markets globally saw total premium growth outpace real GDP growth. The total amount of insurance premiums written in 2019 increased from the pre-revision estimate of USD 5.4 trillion to USD 6.3 trillion, representing 7.2% of world GDP (IRDAI Annual Report) [20] (Figs. 2, 3 and 4).
Figs. 2, 3 and 4 [Charts: InsurTech investment and Indian insurance premium volume — Life, Non-Life, and Total (US$) — across the years 2015–2016 to 2020–2021]
5 Test of Hypothesis
From the above hypothetical and insightful discussion, we can gather that the new cutting-edge innovations (AI and BC) have a positive effect on the insurance business. In this way, our examination shows there is a meaningful role for AI and blockchain technology in the insurance business.
6 Conclusion
References
14. Parviainen P, Kaarianinen J, Tihinen M, Teppola S (2017) Tackling the digitalization chal-
lenge: how to benefit from digitalization in practice. J Acad Market Sci 5(1):63–77
15. Sagarika M, Michael T, Holly B (2022) Artificial intelligence focus and firm performance.
Int J Inf Syst Project Manage 5(1)
16. Barro S, Davenport TH (2019) People and machines: partners in innovation. MIT Sloan
Manage Rev 60(4):22–28
17. Martin E, Davide N, Julian S (2019) The impact of artificial intelligence along the insurance
value chain and on the insurability of risks. Geneva Papers Risk Insurance—Issues Practice
2022(47):205–241. https://doi.org/10.1057/s41288-020-00201-7
18. Bonson E, Bednarova M (2019) Blockchain and its implications for accounting and auditing.
Meditari Account Res 27(5):725–740
19. Blockchain in the Insurance Industry: What to Expect in the Future? https://www.dataversity.
net/blockchain-in-the-insurance-industry-what-to-expect-in-the-future/
20. IRDAI Annual Report. https://www.irdai.gov.in/ADMINCMS/cms/frmGeneral_NoYearList.
aspx?DF=AR&mid=11.1
21. Investment in InsurTech. https://www2.deloitte.com/us/en/pages/financialservices/articles/
fintech-insurtech-investment-trends.html
22. Raj R, Dixit A, Saravanakumar A, Fathima A, Dornadula R, Ahmad S (2021)
Comprehensive review of functions of blockchain and crypto currency in finance and bank-
ing. Des Eng 9:3649–3655
23. Mhlanga D (2021) Financial inclusion in emerging economies: the application of machine
learning and artificial intelligence in credit risk assessment. Int. J. Financ. Stud. 9:39
24. Mhlanga D (2022) Human-centered artificial intelligence: the superlative approach to
achieve sustainable development goals in the fourth industrial revolution. Sustainability
14:7804. https://doi.org/10.3390/su14137804
Classification and Detection of Acoustic Scene and Event Using Deep Neural Network
Sandeep Rathor
Abstract Sound is a basic medium for understanding the world. On the basis of sound, we can analyze or recognize events, environments, or scenes. Recently, artificial intelligence techniques have been prominently applied to handle speech signals, and remarkable achievements have been exhibited by utilizing speech signals with deep neural networks. Therefore, a multi-model approach to recognize acoustic events and scenes is proposed, using a deep neural network. In the proposed work, temporal features are captured using an LSTM, and a dense layer (DL) is utilized for capturing the nonlinear combination of those features. Experimental results are obtained on the TUT 2017 dataset with acceptable accuracy, i.e., 85.36%.
1 Introduction
Speech is the most convenient and fast medium of communication. On the basis of speech, communicators can understand each other’s domain, situation, mood, and purpose. The domain of communication reflects the context, such as politics, medicine, advertisement, research, games and sports, etc. [1]. If persons are discussing a pandemic, then the domain is “Medical”; similarly, if two persons are discussing an IPL match, then the domain is “Games and Sports.” Emotions and sentiments can also be recognized through communication. Emotions can be classified as happy, sad, joy, surprise, excited, etc., while sentiments can be classified as favorable, unfavorable, and neutral. Similarly, there are different types of sound in real life, such as cooking, travelling, playing, watching television, a passing vehicle, singing, crying, etc. Nowadays, there is increasing interest in analyzing numerous sounds in real-life environments, such as cooking sounds in a room or the sounds of vehicles passing by [2]. Therefore, an automatic analysis can be performed to classify the environment on the basis of the sound.
S. Rathor (B)
Department of Computer Engineering and Applications, GLA University, Mathura, India
e-mail: sandeep.rathor@gla.ac.in
The sound can be of a bird, a car, a musical instrument, harmonic noises, multiple noises, etc. [3]. We can also recognize critical situations; such a situation is naturally considered an event, and it can be generated acoustically. A shooting, a scream, glass breaking, an explosion, or an emergency siren are examples of such artifacts [4]. Mainly, “environmental sound detection” is utilized for two tasks, i.e., event detection and scene classification. Acoustic event detection is the process of detecting event-level labels such as “birds singing,” “mouse clicking,” “travelling,” or “playing”; it is simply the identification of “trigger words.” Acoustic scene classification is the process of predicting the scene of a recording, such as “park,” “office,” “train,” or “cricket.” The neural network is well suited to recognizing events [5]. However, to implement a multimodal system, i.e., for both event and scene, a deep neural network is more suitable [6].
Figure 1 shows the two tasks that can be recognized from sound signals, i.e., event recognition and scene recognition. The main objective of the proposed research is to perform this recognition using a bi-directional LSTM and a deep neural network on a standard dataset.
Fig. 1 Sound signals mapped to scenes (kitchen, market, train/bus) and events (cooking, crying, people talking)
2 Related Work
3 Proposed Methodology
A proposed framework for acoustic scene and event recognition is shown in
Fig. 2. Acoustic features are extracted from the input speech signals by using Mel-
frequency cepstral coefficients (MFCCs). The input feature map is first convolved with
two-dimensional filters in the convolution layer, and then its dimension is decreased
via maxpooling.
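As an illustration of this feature-extraction step, the short sketch below computes MFCCs with librosa; the file name and parameter values are placeholders, not the paper's exact configuration.

```python
# Minimal sketch of MFCC feature extraction, assuming librosa is installed;
# the input file and n_mfcc value are illustrative assumptions.
import librosa
import numpy as np

def extract_mfcc(path, n_mfcc=40):
    # Load the audio as a mono signal at its native sampling rate
    signal, sr = librosa.load(path, sr=None, mono=True)
    # Compute MFCCs; the result has shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Transpose to (n_frames, n_mfcc) so each row is one time step,
    # the layout expected by a recurrent (LSTM) layer
    return mfcc.T.astype(np.float32)

features = extract_mfcc("audio_clip.wav")  # hypothetical input file
print(features.shape)
```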
The input acoustic signal is sent to the feature extraction phase, which calculates
effective parameters. These parameters are specified to improve the audio signal's
feature extraction. They aid in the characterization of sound segments by separating
one from the other. Adapted signals are split into classes x_i, for i = 1, 2, …, n,
according to the classification. During the training process, the acoustic model learns
from characteristics and parameters. Unsupervised and supervised learning are the
two types of learning. Labels in supervised learning represent marked vectors that
indicate class membership. In unsupervised learning, the training data are divided
into classes based on hidden variables [15]. The output of the maxpooling
layer is then concatenated and input to a fully connected layer, which is used to learn
relationships between the features.
In the proposed model, the softmax activation function is used. The output of this layer
is passed to a dropout layer to overcome the overfitting problem. In addition,
one more activation function, the rectified linear unit (ReLU), is used in the event layers
to recognize the events. The proposed network is optimized using the softmax
cross-entropy objective function (Fig. 3).
The LSTM is used to discover patterns in sequential features and is effective when
learning long-term temporal dependencies. The input gate, forget gate, and
output gate are its three gates. The inter-word dependencies are effectively learned
using a bi-directional LSTM layer; this layer is in charge of learning the sentence's
word order.
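A minimal PyTorch sketch of one way to realize the architecture described above — a bi-directional LSTM over MFCC frames, a dropout layer, dense layers, and separate scene and event heads trained with softmax cross-entropy — is shown below. Layer sizes and class counts are assumptions for illustration, not the paper's verified configuration.

```python
# Illustrative Bi-LSTM + dense-layer model with two output heads.
import torch
import torch.nn as nn

class SceneEventNet(nn.Module):
    def __init__(self, n_mfcc=40, hidden=64, n_scenes=5, n_events=5):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=n_mfcc, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.3)            # mitigates overfitting
        self.shared = nn.Linear(2 * hidden, 64)   # dense layer on LSTM output
        self.scene_head = nn.Linear(64, n_scenes)
        self.event_head = nn.Linear(64, n_events)

    def forward(self, x):                  # x: (batch, frames, n_mfcc)
        out, _ = self.bilstm(x)
        feat = torch.relu(self.shared(self.dropout(out[:, -1, :])))
        # Raw logits are returned; nn.CrossEntropyLoss applies softmax
        # internally, matching a softmax cross-entropy objective.
        return self.scene_head(feat), self.event_head(feat)

model = SceneEventNet()
scene_logits, event_logits = model(torch.randn(8, 100, 40))
```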
4 Results and Discussion
In order to recognize the acoustic scenes and events, we evaluated the performance of
the proposed model on the TUT2017 dataset [16]. In this dataset, some events were not
labeled and were therefore tagged manually. Measured parameters like recall, precision, and
F-score are also reported to validate the research [17].
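For reference, the sketch below shows how such validation parameters can be computed with scikit-learn; the label arrays are hypothetical stand-ins for the model's scene predictions.

```python
# Illustrative computation of precision, recall, and F-score; the
# ground-truth and predicted labels are placeholder examples.
from sklearn.metrics import precision_recall_fscore_support

y_true = ["kitchen", "office", "park", "market", "kitchen"]
y_pred = ["kitchen", "park", "park", "market", "kitchen"]

precision, recall, fscore, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"Precision={precision:.2f}, Recall={recall:.2f}, F-score={fscore:.2f}")
```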
Figure 4 shows the features extracted from an audio signal using the spectral
centroid, while Fig. 5 shows the spectrogram of the speech signals.
Figure 6 shows the parameter calculation for scene recognition, which is used to
validate the results of the proposed model. Figure 7 shows the parameter calculation
for event recognition in the proposed model. The values show that the proposed model
is capable of recognizing acoustic scenes and events with good accuracy.
Table 1 shows the comparison between the proposed model and other state-of-the-
art methods proposed by various researchers. The results shown in Table 1 indicate
that the accuracy of the proposed model is better than that of the other models.
Fig. 6 Parameter calculation for scene recognition across the scene classes kitchen, office, park, market, and train/bus
5 Conclusion
This paper explored the idea of event and scene recognition. An LSTM
model is proposed to classify acoustic events and scenes using different fully
connected layers and Bi-LSTM. The proposed model is executed on the TUT2017 stan-
dard dataset to recognize events and scenes. The achieved accuracy is close to 85%.
Fig. 7 Parameter calculation for event recognition across the event classes cooking, mouse clicking, bird singing, crying, and talking
Table 1 Comparison of proposed model with other methods
Method          Event (F-score)   Scene (F-score)
Tonami et al.   0.776             0.66
RNN             0.683             0.59
Proposed        0.89              0.73
To implement the proposed model, the Python programming language is used. In the context
of parametric evaluation, precision and recall are calculated for each event and scene.
In future, the proposed research can be extended to recognize the whole environment
from video data.
References
1. Rathor S, Agrawal S (2021) A robust model for domain recognition of acoustic communication
using Bidirectional LSTM and deep neural network. Neural Comput Appl 33(17):11223–11232
2. Mesaros A, Diment A, Elizalde B, Heittola T, Vincent E, Raj B, Virtanen T (2019) Sound event
detection in the DCASE challenge. IEEE/ACM Trans Audio Speech Lang Process 27(6):992–
1006
3. Barchiesi D, Giannoulis D, Stowell D, Plumbley MD (2015) Acoustic scene classification:
classifying environments from the sounds they produce. IEEE Signal Process Mag 32(3):16–34
4. Dosbayev Z et al (2021) Audio surveillance: detection of audio-based emergency situations. In:
Wojtkiewicz K, Treur J, Pimenidis E, Maleszka M (eds) Advances in Computational Collective
Intelligence. ICCCI 2021. Communications in Computer and Information Science, vol 1463.
Springer, Cham. https://doi.org/10.1007/978-3-030-88113-9_33
5. Valenti M, Diment A, Parascandolo G, Squartini S, Virtanen T (2016) Acoustic scene classifi-
cation using convolutional neural networks. In: Proceedings of the detection and classification
of acoustic scenes and events 2016 workshop, pp 95–99
6. Huang CW, Narayanan SS (2017) Characterizing types of convolution in deep convolutional
recurrent neural networks for robust speech emotion recognition, 1–19
7. Liu AA, Shao Z, Wong Y et al (2019) Multimed Tools Appl 78:677. https://doi.org/10.1007/
s11042-017-5532-x
8. Cohen R, Ruinskiy D, Zickfeld J, IJzerman H, Lavner Y (2020) Baby cry detection: deep
learning and classical approaches. In: Pedrycz W, Chen SM (eds) Development and analysis
of deep learning architectures. Studies in computational intelligence, vol 867. Springer, Cham
9. Hayashi T, Watanabe S, Toda T, Hori T, Le Roux J, Takeda K (2016) Bidirectional LSTM-HMM
hybrid system for polyphonic sound event detection
10. Conka D, Cizmar A (2019) Acoustic events processing with deep neural network. In: 2019
29th international conference radioelektronika (RADIOELEKTRONIKA), pp 1–4. https://doi.
org/10.1109/RADIOELEK.2019.8733502
11. Ford L, Tang H, Grondin F, Glass J (2019) A deep residual network for large-scale acoustic scene
analysis. In: Proceedings of annual conference international speech communication association
INTERSPEECH, vol 2019-Septe, pp 2568–2572
12. Ma J, Tang H, Zheng WL, Lu BL (2019) Emotion recognition using multimodal residual LSTM
network. In: Proceedings of the 27th ACM international conference on multimedia, pp 176–183
13. Zhang S, Zhao X, Tian Q (2019) Spontaneous speech emotion recognition using multiscale
deep convolutional LSTM. IEEE Trans Affect Comput c:1
14. Xia X, Togneri R, Sohel F, Zhao Y, Huang D (2019) A survey: neural network-based deep
learning for acoustic event detection. Circuits Syst Signal Process 38(8):3433–3453
15. Tonami N, Imoto K, Niitsuma M, Yamanishi R, Yamashita Y (2019) Joint analysis of acoustic
events and scenes based on multitask learning. In: 2019 IEEE workshop on applications of
signal processing to audio and acoustics (WASPAA). IEEE, pp 338–342
16. Mesaros A, Heittola T, Diment A, Elizalde B, Shah A, Vincent E, Raj B, Virtanen T (2017)
DCASE 2017 challenge setup: tasks, datasets and baseline system. In: Proceedings workshop
on detection and classification of acoustic scenes and events (DCASE), pp 85–92
17. Tripathi R, Jalal AS, Agrawal S (2019) Abandoned or removed object detection from visual
surveillance: a review. Int J Multimed Tools Appl 78(6):7585–7620
Analysis of a Novel Integrated Machine
Learning Model for Yield and Weather
Prediction: Punjab Versus Maharashtra
1 Introduction
Agriculture is the cardinal source of livelihood for about 59% of India's population.
Indian agriculture in the twenty-first century is structurally different from, and more robust
than, the one prevalent during the green revolution era, which commenced in the 1970s.
Tremendous advancements and the emergence of new technologies can be witnessed
in this sector [1]. Agriculture has gone through drastic alterations throughout the
decades; machines are highly automated, and unmanned aerial vehicles (UAVs) and
orbital satellites are becoming essential. Agriculture contributes about 17% of the
gross domestic product (GDP) of the Indian economy and provides employment to over
60% of the population [2]. The Government of India has ratified a number of measures
to ameliorate the system of agricultural marketing, including standardization of weights and
measures, establishment of warehouses, open regulated markets, and policies like
MSP and PDS.
India is a multiproduct agricultural powerhouse and produces an enormous range
of food and non-food crops; the yield depends on weather, soil fertility, season,
water and nutrients absorbed by crops, and the dosage of fertilizers and pesticides [3].
Predicting crop yield for a limited area of land is an arduous task in an agro-
based country like India. The yield rate can be accelerated by monitoring crop growth,
accurate weather predictions, field productivity zoning, crop disease prevention and
management, and forecasting crop yield. Each kind of crop has its optimum growth
requirements [4–6]. In India, agricultural yield predominantly depends upon weather
conditions. For instance, rice cultivation primarily relies upon rainfall, and bean
cultivation demands a major amount of sunlight. Weather prediction is a challenging
task due to the dynamic nature of the atmosphere [7]. This research also helps the
farmer by predicting weather more precisely than previous studies, so that farmers
can be alerted well in advance. Prediction is done on the basis of variables like
rainfall, sunlight, and the pH value of the soil.
Advances in machine learning have created new opportunities to revamp predic-
tion in agriculture. Crop production rates depend on the topography and geographic
conditions of the region (e.g., hilly areas, river ground, mountainous regions, and
depth regions), weather conditions (e.g., rainfall, sunlight, groundwater level, temper-
ature, and pH value of soil), soil type (e.g., sandy, clay, saline, and loam soil), and
soil composition and irrigation methods [8–10]. Different prediction models are used
for different types of crops grown all over India [11, 12]. This research chronicles
different machine learning prediction models, concepts, and algorithms. We also
attempt to unearth the least traversed areas concerning the integration of machine
learning techniques in the agriculture sector. This research will help prospective
researchers get a better understanding of which prediction models have been used
to date and which areas still need to be focused upon. The study is segregated into
various sections, each depicting a particular aspect of agriculture.
The main objective of this research is to propose an intelligent and interac-
tive prediction system that aims at predicting the crop yield before harvesting by
learning from the past 20 years' data of the farming land and that helps farmers through timely
weather forecasts and identification of crop condition by using machine learning tech-
niques. Factors significant to crop production, such as farm area, production
of the crop in previous years, the seasons of farming for different Indian states,
temperature of the area, humidity, crop yield, growing season, and water requirements
of the crop, were considered for the experimental analysis. To anticipate continuous values,
various machine learning techniques are used, and the data is pre-processed by using
SSIS.
This paper is organized into the following sections. Section 1 presents the
prefatory phase. Section 2 explores the related research work done in the
field. Section 3 presents the proposed framework to deal with the limitations of the
existing systems. Section 4 reveals the results and discussion, followed by the conclusion
in Sect. 5.
2 Literature Review
Priya et al. [13] proposed a system for predicting the yield of a crop on the basis
of existing data by using the random forest algorithm. Real data of Tamil Nadu, a state
of India, was used in this research for building the models, and the same models
were tested with samples. Random forest was the only algorithm used for crop
yield prediction. Jeong et al. [14] produced outputs which
proved that random forest is a compelling and versatile machine learning algorithm
for predicting the crop yield of wheat, maize, and potato, in comparison with multiple
linear regression, at both territorial and worldwide scales. The dataset consists of yield data
from US counties and northeastern seaboard regions. Manjula [15] proposed
a system to predict crop yield from preceding data. This is accomplished by imple-
menting association rule mining on agriculture data and predicting the crop yield.
The paper proposed an analysis of crop yield prediction using data mining tech-
niques for a selected region, i.e., a district of Tamil Nadu in India. Paswan and
Begum [16] presented an extensive review of literature analyzing feedforward neural
networks and traditional statistical methods to predict agricultural crop production;
the traditional statistical method included in this study is linear regression. From the results,
they concluded that better communication between the fields of statistics
and neural networks would benefit both. Veenadhari [17] proposed fore-
casting crop yield based on climatic parameters. In this research,
a software tool "Crop Advisor" was developed as a Web page for forecasting
the influence of climatic parameters on crop yields. The main algorithm used in this
study was C4.5. It produces the influencing climatic parameter on the crop yields of
selected crops in selected districts of Madhya Pradesh. Ramesh [18] proposed an anal-
ysis of crop yield prediction using the multiple linear regression (MLR) technique and
a density-based clustering technique for the East Godavari district of Andhra Pradesh
in India; in this study, the results of the two methods were compared for the specific
region. Shahane [19] proposed a system for prediction on crop cultivation that is basically an aggre-
gation of sustainability, soil analysis, crop and fertilizer recommendation, and crop
yield prediction.
The limitations of the existing approaches are as follows:
1. Data pre-processing is the first and crucial step while creating a machine learning
model. In the previous studies, data pre-processing is done with the help of Python
language which consumes a lot of time, and large codes are generated.
2. Insufficient availability of data (too few data) is a major problem in existing
approaches. The studies stated that their systems worked for the limited data that
they had at hand, and indicated data with more variety should be used for further
testing.
3. Moreover, algorithms like “lasso regression” and “linear discriminant analysis”
have never been used in crop prediction and weather forecasting.
3 Proposed Model
Modern and contemporary technologies are gaining more attention with respect
to prediction and predictive analysis approaches. Predictive techniques are being
favored in recent times due to their immense scope for knowing the agricultural yield
in advance. This framework provides the farmer/user with an approximation of how much
crop yield will be produced depending upon the season, crop, area, and production.
Weather has a profound influence on crop growth, total yield, the amount of pesticides
and fertilizers needed by the crop, and all other activities carried out throughout the
growing season.
Features of the proposed model are as follows:
• The proposed model showed how beneficial the amalgam of machine learning
with crop yield prediction and weather forecasting could be in agriculture.
• The proposed framework uses algorithms like lasso regression, Gaussian Naïve
Bayes, and linear discriminant analysis, which have never been used in crop predic-
tion and weather forecasting, and hence gives better accuracy than the existing
systems.
• The framework utilizes the SSIS platform via Visual Studio (BIDS) for pre-
processing the data, which results in less time and simpler code. Python is a very
powerful programming language; combined with SSIS, it can provide robust and
flexible solutions to several problems.
• The framework is well supported by measure-based evaluation, which ultimately
validates the performance (Fig. 1).
The technology used for data pre-processing in this research is SSIS (SQL Server Inte-
gration Services), a platform for building planned data integration and data transfor-
mation solutions. SSIS was used to remove duplicate values via the sort transformation
editor, which worked efficiently in removing all null values, and a conditional split
was performed as required by each algorithm; i.e., raw data was combined from various sources
through SSIS.
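The paper performs these steps in SSIS; purely as an illustration of the same cleaning logic (deduplication, null removal, conditional split), a rough pandas analogue might look like this. The column names and values are assumed, and this is not the authors' pipeline.

```python
# Pandas analogue of the SSIS steps described above; illustrative data only.
import pandas as pd

raw = pd.DataFrame({
    "State": ["Punjab", "Punjab", "Maharashtra", "Maharashtra", "Punjab"],
    "Rainfall": [650.0, 650.0, 890.0, None, 700.0],
    "Yield": [3.2, 3.2, 2.9, 3.1, 3.4],
})

clean = raw.drop_duplicates().dropna()   # deduplication and null removal

# Conditional split: route rows to the dataset each model needs,
# mirroring SSIS's conditional split transformation.
punjab = clean[clean["State"] == "Punjab"]
maharashtra = clean[clean["State"] == "Maharashtra"]
```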
4 Results and Discussion
Different machine learning algorithms, such as random forest, linear regression, lasso
regression, and support vector machine, are applied to the Maharashtra and Punjab
data from the dataset to predict the crop yield in advance of harvesting, and they
were compared using the R2-score and MSE measures.
• R2-score: R-squared is a statistical measure that defines the goodness of fit of the
model. The ideal R2-score value is 1; the closer the value of R-squared is to 1, the
better the model. The R2 value can also be negative if the model is worse than the
average fitted model:
R-square = 1 − (SS_res / SS_total)
where SS_res is the residual sum of squares and SS_total is the total sum of squares.
• Mean squared error: MSE is a statistical measure defined as the mean of the
squares of the differences between actual and estimated values. The smaller the
value of MSE, the better the model (a short computed example follows this list).
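A short sketch of both measures, computed here with scikit-learn on hypothetical yield values, is given below.

```python
# Illustrative computation of the two evaluation measures defined above.
from sklearn.metrics import r2_score, mean_squared_error

y_true = [3.1, 4.0, 5.2, 2.8]   # hypothetical actual yields
y_pred = [3.0, 4.3, 5.0, 3.1]   # hypothetical predicted yields

# R-square = 1 - (SS_res / SS_total); closer to 1 is better
r2 = r2_score(y_true, y_pred)
# MSE = mean of squared differences between actual and estimated values
mse = mean_squared_error(y_true, y_pred)
print(f"R2 = {r2:.3f}, MSE = {mse:.3f}")
```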
For Punjab, from Table 1, we can say that random forest is the best algorithm for crop
yield prediction, with an accuracy of 96.72% and the highest R2-score with the least
error, while SVM performs worst. Table 2 shows the accuracy representation for finding
the best-fit algorithm.
For Maharashtra, from Table 3, it is concluded that random forest gives the
highest accuracy of 96.33% with the best R2-score and minimum error, while linear
regression turned out to be the worst algorithm for crop yield prediction.
The graph in Fig. 2 shows that the higher the R2-score and the lower the MSE, the better
the accuracy of an algorithm. The graph depicts that random forest shows
the best R2-score with the least MSE value; the closer the value of the R2-score is to 1, the
better the algorithm.
Various algorithms such as KNN, Gaussian Naïve Bayes, logistic regression, decision
tree, SVM, and linear discriminant analysis were applied for weather forecasting to help
farmers adapt to the situation and take preventive measures before harvesting
the crop. Different parameters such as precision, recall, and F1-score are used to
find the algorithm with the best accuracy in weather prediction. From Table 4, it is
concluded that Gaussian Naïve Bayes is the best algorithm for weather prediction,
with an accuracy of 91.89%.
The graph in Fig. 3 is plotted against the cross-validation score; the higher the cross-
validation score, the better the algorithm. Gaussian Naïve Bayes (NB) shows the
highest score and is the best-fit algorithm for weather prediction.
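A hedged sketch of this kind of classifier comparison, using scikit-learn's cross_val_score on placeholder data, is shown below; the paper's actual features (e.g., rainfall, sunlight, soil pH) and dataset are not reproduced here.

```python
# Illustrative cross-validated comparison of weather classifiers.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))           # placeholder for rainfall, sunlight, soil pH
y = rng.integers(0, 2, size=200)   # placeholder weather labels

for name, clf in [("GaussianNB", GaussianNB()),
                  ("KNN", KNeighborsClassifier()),
                  ("LogReg", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV score = {scores.mean():.3f}")
```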
5 Conclusion
For the proposed model, random forest provides a maximum accuracy of 96.33% and is
highly efficient in the prediction of crop yield. Gaussian Naïve Bayes is the proficient
algorithm, with the highest accuracy of 91.89%, in forecasting weather. The performance of the
model is found to be relatively sensitive to the quality of weather prediction, which
in turn underscores the significance of weather forecasting techniques. This model
lessens the troubles confronted by farmers and will serve as a means to offer
farmers the information they need to achieve high yields and maximize profits.
In future, the training dataset used for the model can be further
augmented with crop-imaging data. Additionally, newer algorithms and learning
methods could be used for prediction, enhancing the accuracy of the learning model,
along with better measures for calculating the accuracy of the employed classifier.
References
1. Pantazi XE, Moshou D, Oberti R, West J, Mouazen AM, Bochtis D (2017) Detection of biotic
and abiotic stresses in crops by using hierarchical self-organizing classifiers. Precis Agric
18:383–393
2. Moshou D, Bravo C, West J, Wahlen S, McCartney A (2004) Automatic detection of “yellow
rust” in wheat using reflectance measurements and neural networks. Comput Electron Agric
44:173–188
3. Sangeeta SG (2020) Design and implementation of crop yield prediction model in agriculture.
Int J Sci Technol Res 8(01)
4. Shin HC, Roth HR, Gao M, Lu L (2016) Deep convolutional neural networks for computer-
aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans
Med Imag 35(5)
5. Torrey L, Shavlik J (2010) Transfer learning. In: Handbook of research on machine learning
applications and trends: algorithms, methods, and techniques, Olivas ES et al (eds). IGI Global,
pp 242–264
6. Champaneri M, Chachpara D, Chandwadkar C, Rathod M (2020) Crop yield prediction using
machine learning. Int J Sci Res (IJSR)
7. Alexandros O, Cagata C, Ayalew K (2022) Deep learning for crop prediction: a systematic
literature review. New Zealand J Crop Horticult Sci
8. Folnovic T (2021) Importance of weather monitoring in farm production, Agrivi. All rights
Reserved
9. Pisner DA, Schnyer DM (2020) Support vector machine, in Machine Learning
10. Alonso J, Castañón ÁR, Bahamonde A, Support vector regression to predict carcass weight in
beef cattle in advance of the slaughter. Comput Electron Agric
11. Suganya M, Dayana R, Revathi R (2020) Crop yield prediction using supervised learning
techniques. Int J Comput Eng Technol 11(2)
12. Amit S, Nima S, Saeed K (2022) Winter wheat yield prediction using convolutional neural
networks from environmental and phonological data. Sci Rep Nat
13. Priya P, Muthaiah U, Balamurugan M (2018) Predicting yield of the crop using machine learning
algorithm. Int J Eng Sci Res Technol. 7(4). ISSN: 2277–9655
14. Jeong J, Resop J, Mueller N et al. Random forests for global and regional crop yield prediction.
PLoS ONE J
15. Manjula E, S (2017) A model for prediction of crop yield. Int J Comput Intell Inf 6(4)
16. Paswan RP, Begum SA (2013) Regression and neural networks models for prediction of crop
production. Int J Sci Eng Res 4(9)
17. Veenadhari S, Dr. Misra B, Dr. Singh CD Machine learning approach for forecasting crop yield
based on climatic parameters. In: International conference on computer communication and
informatics (ICCCI)
18. Ramesh D, Vardhan B (2015) Analysis of crop yield prediction using data mining techniques.
Int J Res Eng Technol 4(1):47–473
19. Shahane SK, Tawale PV (2016) Prediction on crop cultivation. Int J Adv Res Comput Sci
Electron Eng (IJARCSEE) 5(10)
20. Ferentinos KP, Yialouris CP, Blouchos P, Moschopoulou G, Kintzios S (2013) Pesticide residue
screening using a novel artificial neural network combined with a bioelectric cellular biosensor.
Hindawi publishing corporation BioMed research international
Machine Learning Techniques for Result
Prediction of One Day International
(ODI) Cricket Match
Abstract Cricket is one of the most popular and most-watched sports today. Test
matches, One Day Internationals (ODIs), and Twenty20 Internationals are the three
forms in which it is played. Until the last ball of the last over, no one can predict who
will win the match. Machine learning is a field that uses existing data to predict
future results. The goal of this study is to build a model that will predict the winner
of a One Day International match before it begins. Machine learning techniques will
be applied to training and testing datasets to predict the winner of an ODI match
based on the specified features. The data for the model will be collected from Kaggle,
and some will be collected from different cricket Web sites, because the data
obtained from Kaggle only cover matches up until July 2021. After that, prediction
will be done, and the model will provide advantages to team management in terms of
improving team performance and increasing the chance of winning the game. This
model will be used to predict the outcomes of the next Cricket World Cup 2023,
which will be the 13th edition of the men's ODI Cricket World Cup and will be
hosted by India in 2023. Also, this work will serve as a guide, as there is
much to be done in the field of sports.
1 Introduction
Cricket was initially brought to North America in the seventeenth century through
English colonies, and it went to other areas of the world in the eighteenth century.
Colonists introduced it to the West Indies, while British East India Company sailors
introduced it to India. It arrived in Australia almost immediately after colonization
began in 1788, and in the middle of the nineteenth century, it came to New Zealand
and South Africa. Cricket is governed by the International Cricket Council (ICC),
the sport's international governing body. A cricket match might end in a
"win" for one of two teams or in a "tie." In a limited-overs game, if the game cannot
be completed on time, the game may conclude in a "draw;" in other forms of cricket,
a "no result" may be possible. When one side scores more runs than the other, and all
of the opposing team's innings have been completed, the game is "won." The
team with the most runs "wins," while the side with the fewest runs "loses." If the
match is called off before all of the innings have been finished, the result could be a
tie or no result. When the scores are tied at the end of play, the game is declared a
"tie," but only if the team batting last has finished its innings. Only two Tests have
ever ended in a tie, which is remarkable in cricket. In several one-day cricket formats,
such as Twenty20, a Super Over or a bowl-out is commonly used as a tiebreaker to
settle a result that would otherwise be a tie. If a match ends without a win or a tie,
the result is a "draw," as described in Law 16. When one or both sides do not finish
their innings before the planned end of play, the game is called a draw; no matter
how many runs either side has scored, the match is ultimately drawn. If a limited-
overs match that has already started cannot be completed due to weather or minor
disruptions, a "no result" is declared.
When rain is a factor, the match is frequently said to be "washed out." The
match can be "abandoned" or "cancelled" if weather or other conditions stop it
from happening. Cricket is unique in that it has three different forms: Test cricket, Twenty20
International cricket, and One Day International cricket. A One Day International
(ODI) is a sort of limited-overs cricket match in which two sides compete against
each other. Each team is given a set number of overs, currently 50, and the game
can take up to 9 h, in which each side bats only once. Each side's innings finishes
either when its allotted number of overs has been completed or when all ten wickets
have been lost.
The Cricket World Cup is a four-yearly international cricket championship that is the top
one-day cricket competition and one of the most-viewed sporting events in the
world (Table 1).
As technology progresses and applications such as fantasy 11 and betting sites
grow more popular, people will rely on the predictions offered by ML models.
Table 1 Results of the ODI Cricket World Cup 1975–2019
Year      Winner
1975      West Indies
1979      West Indies
1983      India
1987      Australia
1992      Pakistan
1996      Sri Lanka
1999      Australia
2003      Australia
2007      Australia
2011      India
2015      Australia
2019      England
2 Literature Review
• The author of the paper [1] used data from 12 seasons of IPL matches. In this
paper, the dataset of the first 11 seasons, comprising 580 matches, is used as the
training dataset, and the last season, with 60 matches, is used as the testing dataset.
Numbers are assigned to the names of the teams. The author used various ML
modules and assembled them; the winning team is determined by
computing the total percentage of all model outcomes. Decision tree regression,
random forest regression, support vector machine, Naive Bayes, multiple linear
regression, and logistic regression are some of the algorithms that were utilized.
An aggregate accuracy of nearly 90% is achieved.
• Chowdhury et al. [2] projected the winner of ODI cricket matches played between
India and Pakistan. They manually obtained information on all ODI matches between India
and Pakistan between 1978 and 2019 from www.espncricinfo.com. The data were
preprocessed for model creation, with tied matches being removed, and so on.
Using logistic regression, the chance of the Indian team winning was 70.6% higher
on home ground and 2.28 times higher for day-night contests. This research allows
the discovery, in hindsight, of what favors a winning match for Team India.
• Jalaz et al. [3] investigated the impact of two machine learning models, decision
trees and multilayer perceptron networks, on the outcome of a cricket match.
Based on these findings, the Cricket Outcome Prediction System was created
for estimating the ultimate result of a particular match; the developed method
considers pregame variables such as the ground, venue, and innings. ESPN
cricinfo was used to obtain the data. All ODI matches from January 5, 1971,
through October 29, 2017, are included in this dataset, giving 3933 ODI
match outcomes. Some of the matches in the dataset were removed from the
analysis during the cleaning phase. On comparison of accuracy, the multilayer
perceptron has a score of 0.574, whereas the decision tree classifier has a score
of 0.551.
• Mago et al. [4] predict the winner of an IPL match before it begins. To determine
the winner of the IPL, machine learning algorithms are trained on key features. The
SEMMA approach was used for the study of the IPL T20 match winner dataset.
The dataset was preprocessed to ensure consistency by eliminating missing values
and encoding variables into numerical format. First, a decision tree was used,
which predicted the winner with an accuracy of 76.9%. The parameters
of the decision tree model were fine-tuned to increase model performance,
improving it from 76.9% to 94.0%. The random forest model was then applied and
predicted the winner with an accuracy of 71%. As this was not sufficient, the random
forest model was tweaked via parameter tuning, and the results improved to 80%.
The XGBoost model was used last; its outcome was 94.23% without any parameter
adjustment.
• Baasit et al. [5] examined popular machine learning algorithms to
declare the winner of the 7th edition of the T20 World Cup 2020, which was
hosted in Australia. The ESPN cricinfo dataset was used in this analysis. The
research employs four different learning methods (C4.5, random forest, extra
trees, and ID3). Random forest was determined to be the best algo-
rithm using proprietary efficiency criteria, achieving a standard efficiency of
80.86%. Australia was expected to win the T20 Men's Cricket World Cup for
the next two years.
• Aggrwal et al. [6] predicted match results collaboratively along with the ability of each player
to contribute to the match's result. The data were gathered from techgig.com.
The database comprises data from the previous 500 IPL matches, which were
preprocessed. Support vector machine, CTree, and Naive Bayes are the three machine
learning approaches used, achieving accuracies of 95.96%, 97.98%, and 98.99%,
respectively.
• Barot et al. [7] gave a model used to rate a bowler's or batsman's
performance; various match aspects are used to investigate what
determines the outcome of a cricket match, and the outcome is also
predicted using a variety of features. The data for analysis and prediction were
collected from www.espncricinfo.com, which includes data from previous IPL
editions, and retrieved from www.kaggle.com. For match predictions, machine
learning methods such as SVM, logistic regression, decision tree, random forest,
and Naive Bayes were used. The best accuracies were over 87% and 95% for the
decision tree and logistic regression methods, respectively.
• Islam et al. [8] presented a way of estimating a cricket player's performance in a
future match. The suggested model is based on statistical data acquired from reli-
able sports sources on the Bangladesh national cricket team's players. As selec-
tion methods, recursive feature elimination and univariate selection are used, and
as machine learning algorithms, linear regression and support vector machines
with linear and polynomial kernels are used. For the forthcoming match, machine
learning algorithms are employed to anticipate how many runs the batsman will
score and how many runs the bowler will concede. The model correctly forecasts
batsman Tamim and bowler Mahmudullah with up to 91.5% accuracy, with other
players' predictions being similarly accurate.
• Rameshwarie et al. [9] created a model that can forecast outcomes while the game
is still being played, i.e., live prediction. The number of wickets lost, the match's
venue, the teams' rankings, the pitch report, and the home team's advantages were
all considered in this study. The main goal of this research is to develop a model for
predicting the final score of the first innings and the outcome of the second innings
in a limited-overs cricket match. Two separate models based on prior matches
have been provided, one for the first innings and the other for the second innings,
utilizing the linear regression classifier and the Naive Bayes classifier, respectively.
A reinforcement algorithm is also employed.
• Rudrapal et al. [10] used a deep neural network model to predict the outcome of a
football match automatically. There are various obstacles and instances where the
suggested method fails to predict the outcome of a match. The information was gathered
from a variety of online sources. MLP, SVM, Gaussian Naive Bayes, and random
forest are among the algorithms used, which show accuracies of 73.57%, 58.77%,
65.84%, and 72.92%, respectively.
• Gagana et al. [11] predicted the number of runs for each ball by using the batsman's
previous runs as observed data. Data from all past IPL matches were collected for
the analysis. The Naive Bayes classifier, decision tree, and random forest were
employed, with accuracies of 42.95%, 79.02%, and 90.27%, respectively, when
70% of the data was used for training and 30% for testing.
• Kapadiya et al. [12] performed a thorough study and review of the literature in
order to provide an effective method for predicting player performance in the
game of cricket. This model will aid in the selection of the best team and, as a
result, increase overall team performance. For player performance prediction, a
meteorological dataset is used with cricket match statistics. The accuracy rates
for Naive Bayes, decision tree, random forest, SVM, and weighted random forest
were 58.12%, 86.50%, 92.25%, 68.78%, and 93.73%, respectively.
• Passi et al. [13] developed a model that considers both teams' player performances,
such as the number of runs a batsman will score and the number of wickets a bowler
will take. Prediction models are developed utilizing Naive Bayes, random forest,
multiclass SVM, and decision tree classifiers for both aims. Scraping tools were used
to collect data from cricinfo.com. The most accurate classifier for both datasets
was random forest, which correctly predicted batsman runs with 90.74% accuracy
and bowler wickets with 92.25% accuracy.
• Lamsal et al. [14] offered a multifactorial regression-based approach to calculating
points for each player in the league; the strength of a team is determined based
on the historical performance of the players who have played the most for the squad.
Six machine learning models were proposed and utilized to predict the result
of each 2018 IPL match 15 min prior to the start of the game, just after the toss.
The proposed model accurately predicted over 40 matches, with the multilayer
perceptron model surpassing the others at 71.6% accuracy (Fig. 1).
Fig. 1 Comparison of accuracy of decision tree, random forest, SVM, linear regression, Naïve Bayes, and XGBoost models
3 Comparative Analysis
See Table 2.
Table 2 Machine learning techniques for result prediction of One Day International (ODI) cricket match

Authors | Domain | Year | Model | Dataset | Accuracy/Result
Pallavi et al. | Cricket prediction using machine learning | 2020 | SVM, decision tree, random forest, Naïve Bayes, logistic regression, and multiple linear regression | IPL from 2008 to 2019 | 90% aggregate
Chowdhury et al. | ODI cricket forecast in logistic analysis | 2020 | Logistic regression | espncricinfo.com | NA
Jalaz et al. | Predicting the outcome of an ODI match using decision tree and MLP networks | 2018 | Decision tree, MLP networks | espncricinfo.com | 0.574 and 0.551, resp.
Daniel Mago et al. | The cricket winner prediction with application of machine learning and data analytics | 2019 | Decision tree, random forest classifier, XGBoost | IPL from season 2008 to 2017 | 94.87%, 80.76%, 94.23%, resp.
Ab. Baasit et al. | Predicting the winner of the ICC T20 Cricket World Cup 2020 using machine learning techniques | 2020 | Random forest, C4.5, ID3, and extra trees | espncricinfo.com | 80.86%, 79.73%, 74.69%, 79.67%, resp.
Shilpi Aggrwal et al. | Using machine learning to predict the outcomes of IPL T20 matches | 2018 | Support vector, CTree, and Naive Bayes | techgig.com | 95.96%, 97.98%, 98.99%, resp.
Harshiit Barot et al. | Study and prediction for the IPL | 2020 | SVM, logistic regression, decision tree, random forest, and Naïve Bayes | Kaggle.com, espncricinfo.com | 83.67%, 95.91%, 87.95%, 83.67%, 81.63%, resp.
Aminul Islam et al. | Machine learning algorithms for predicting player performance in ODI cricket | 2018 | Linear regression, support vector machine | Sports Web sites | Batsman Tamim: 91.5% accuracy; bowler Mahmudullah: 75.3% accuracy
Rameshwarie et al. | Winning prediction and live cricket score | 2018 | Linear regression, Naïve Bayes, reinforcement algorithm | NA | NA
Dwijen Rudrapal et al. | Predicting the outcome of a football match using deep learning | 2021 | MLP, SVM, Gaussian Naive Bayes, and random forest | Sports websites | 73.57%, 58.77%, 65.84%, and 72.92%, resp.
Gagana et al. | A view on using machine learning to analyze IPL match results | 2019 | Naïve Bayes classifier, decision tree, random forest | dataworld.com | 42.95%, 79.02%, and 90.27%, resp.
4 Challenges in Existing Models
Various challenges involved in the existing models of ODI cricket matches are given
as follows:
• Data shortage: The dataset of ODI cricket matches currently available is incomplete,
as it only comprises matches up to July 2021. An updated dataset is required to
overcome this deficit of data.
• ODI model: Very little work has been done on building a model for the ODI
format of cricket. As per the latest research, the main focus is on building a
prediction model for the Cricket World Cup 2023.
• ODI algorithms: Few algorithms have been trained in previous research.
Trained algorithms are required in this active field of study; the algorithms that
produce the best results are retained for future use.
• Prediction models: Models for outcome prediction are rapidly evolving, with
various new strategies being developed and existing techniques being changed to
improve performance. As per the latest research, new and more advanced ways for
outcome prediction are required in this active field of research.
5 Proposed Model
The following methodology will be used in this work, which consists of the different
phases shown in Fig. 2.
Fig. 2 Phases of the proposed methodology: dataset, attribute selection, trained data, and predicted result
Fig. 3 Chart showing how many ODI matches each team has played
5.2 Dataset
The dataset is collected from Kaggle, and some data were added manually to it.
The dataset consists of 7734 ODI matches. Irrelevant information such as the date
is discarded, while team1, inn, rpo, runs, team2, overs, result, and ground are included.
Machine learning models take data only in numeric format, so the features team1,
ground, and team2 are converted into numbers (Fig. 3).
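A minimal pandas sketch of this encoding step is shown below; the rows and values are assumed for illustration.

```python
# Illustrative numeric encoding of the categorical match features.
import pandas as pd

df = pd.DataFrame({
    "date":   ["2021-06-01", "2021-06-04"],   # hypothetical rows
    "team1":  ["India", "Australia"],
    "team2":  ["England", "India"],
    "ground": ["Lords", "MCG"],
    "result": [1, 0],
})
df = df.drop(columns=["date"])  # irrelevant information discarded

for col in ["team1", "team2", "ground"]:
    # Map each distinct categorical value to an integer code
    df[col] = df[col].astype("category").cat.codes
```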
6 Conclusion
Our major goal in this research is to use machine learning methods to construct a
model that can predict the outcome of an ODI match before it starts. In this work,
data of ODI matches played after July 2021 will be included in the dataset. We selected
8 key features that will give the best possible prediction accuracy. As Table 2
shows, the highest accuracy is that of [6], and the lowest is that of [11]. So, after analyzing
each paper, we identified the key factors that increase prediction accuracy. The
model is based on data from previous matches between the teams. This work includes
efficiency and accuracy checks. This approach can be applied to various forms of
cricket, such as women’s cricket, domestic cricket, and other sports, to predict the
winner. Also, this will help in the development of a robust prediction model in the
future.
References
1. Tekade P, Markad K, Amage A, Natekar B (2020) Cricket match prediction using machine
learning. Int J Adv Sci Res Eng Trends 5(7)
2. Zayed M (2020) One day international (ODI) cricket match prediction in logistic analysis: India
VS. Pakistan. Int J Hum Movement Sports Sci. https://doi.org/10.13189/saj.2020.080629
3. Kumar J, Kumar R, Kumar P (2018) Outcome prediction of ODI cricket matches using decision
trees and MLP networks. In: IEEE (ICSCCC)
4. Vistro DM, Rasheed F, David LG (2019) The cricket winner prediction with application of
machine learning and data analytics. Int J Sci Technol Res 8(09)
5. Basit A, Alvi MB, Fawwad, Alvi M, Memon KH, Shah RA (2020) ICC T20 cricket world cup
2020 winner prediction using machine learning technique. IEEE https://doi.org/10.13140/RG.
2.2.31021.72163
6. Agrawal S, Singh SP, Sharma JK (2018) Predicting results of indian premier league T-20
matches using machine learning. IN: IEEE international conference on communication system
and network tech
7. Barot H, Kothari A, Bide P, Ahir B, Kankaria R Analysis and prediction for the Indian
premier league. In: IEEE 2020 international conference for emerging technology (INCET)
8. Anik AI, Yeaser S, Hossain AGMI, Chakrabarty A (2018) Player’s performance prediction in
ODI cricket using machine learning algorithms. In: IEEE 2018 4th international conference on
electrical engineering and information and communication technology (iCEEiCT)
9. Lokhande R, Chawan PM (2018) Live cricket score and winning prediction. Int J Trend Res
Develop 5(1), ISSN: 2394–9333
10. Rudrapal D, Boro S, Srivastava J, Singh S (2020) A deep learning approach to predict football
match result. https://doi.org/10.1007/978-981-13-8676-3_9, ResearchGate
11. Gagana, Paramesha K (2019) A perspective on analyzing IPL match results using
machine learning. Int J Sci Res Develop 7(03)
12. Kapadiya C, Shah A, Adhvaryu K, Barot P (2020) Intelligent cricket team selection by
predicting individual players’ performance using efficient machine learning technique. Int
J Eng Adv Technol (IJEAT), 9(3). ISSN: 2249–8958 (Online)
13. Passi K, Pandey N (2018) Increased prediction accuracy in the game of cricket using machine
learning. Int J Data Mining Knowl Manage Process (IJDKP) 8(2)
14. Lamsal R, Choudhary A (2020) Predicting outcome of Indian premier league (IPL) matches
using machine learning. ResearchGate arXiv:1809.09813
Recommendation System Using Neural
Collaborative Filtering and Deep
Learning
Abstract Recommender systems have transformed the nature of the online service
experience due to their quick growth and widespread use. In today’s world, the
recommendation system plays a very vital role. At every point of our life, we use a
recommendation system from shopping on Amazon to watching a movie on Netflix.
A recommender system bases its predictions, like many machine learning algo-
rithms, on past user behavior. The goal is to specifically forecast user preference for
a group of items based on prior usage. The two most well-liked methods for devel-
oping recommender systems are collaborative filtering and content-based filtering.
However, the traditional methods, namely content-based filtering
(CB) and collaborative filtering (CF), lag behind because of
issues such as cold start and scalability. The approach of this paper
is to overcome the problems of CF as well as CB. We built an advanced recom-
mendation system based on neural collaborative filtering, which uses implicit
feedback and measures accuracy with the help of the hit ratio, making it more accurate
and efficient than traditional recommendation systems.
1 Introduction
selecting Websites, news articles, TV listings, and other information are among
the latest developments in the field of recommendation systems. Such systems'
users frequently have a variety of competing needs. There are many variations in
people's personal tastes, socioeconomic and educational backgrounds, and personal
and professional interests. Therefore, it is desirable to have customized intelligent
systems that process, filter, and present information in a way that is appropriate for
each user. Recommendation systems traditionally relied on clustering, KNN, and
matrix factorization techniques.
Deep learning has achieved outstanding success in recent years in a variety of fields, from
image recognition to natural language processing. The traditional approaches for
recommendation systems are content filtering and collaborative filtering. Content
filtering is used broadly for creating recommendation systems that use the content
of items to create features that match the user profile. Items are compared with
previous items liked by the user, and the best match to the user profile is then
recommended [2]. Collaborative filtering (CF) is the most popular
method for recommendation systems; it exploits data gathered from
user behavior in the past (likes and dislikes) and then recommends items to the
user.
Collaborative filtering suffers from cold start, sparsity, and scalability problems [3]. CF
algorithms are often divided into two categories: memory-based methods
(also known as nearest neighbors methods) and model-based approaches. Memory-
based approaches attempt to forecast a user's choice based on the evaluations of other
users or products with similar preferences. Locality-sensitive hashing, which
implements the nearest neighbors method in linear time, is a common memory-based
methodology. On the other hand, model-based methods are developed with the
help of data mining and ML techniques to reveal patterns based on a
training set [4]. However, an advanced recommendation system uses deep learning,
as it is more powerful than the traditional methods. Deep learning's capability to
grasp nonlinear and nontrivial relationships between users and items, as well as to
incorporate extensive data, makes its potential practically unlimited and has raised
the level of recommendation quality that many industries achieve. Complex deep
learning systems, rather than traditional methods, power today's state-of-the-art
recommender systems such as those of Netflix and YouTube.
2 Related Work
from the customers, allowing Amazon to quantify their preferences. The thumbs-
up button on YouTube is yet another example of explicit feedback from users [5].
However, the problem with this feedback is that it is seldom given. Remember when you
hit the like button on YouTube or contributed a response (in the form of a rating) to
your online purchases? Probably not. The number of videos you explicitly
rate is far smaller than the number of videos you watch on YouTube.
It is a technique of filtering items that a user might enjoy based on the responses of
other users. The cornerstone of a personalized recommender system is collaborative
filtering, which involves modeling users' preferences for products grounded in their
prior interactions (e.g., ratings and hits). The recommendation problem is commonly
termed the collaborative filtering (CF) task with implicit feedback, with the goal
of recommending a selection of items to users [6].
Tapestry was one of the first collaborative filtering-based recommender systems
to be implemented. The explicit opinions of members of a close-knit community,
such as an office workgroup [7], were used in this method. For Usenet news and
videos, the GroupLens research system [8, 9] provides a pseudonymous collaborative
filtering approach. Ringo [10] and Video Recommender [11] are email- and Web-based
systems for making music and movie suggestions, respectively.
Although there are other deep learning architectures for recommendation systems,
we believe that the structure that He et al. [6] proposed is the most manageable to
implement and is also the most straightforward one. Most recommendation systems
are based on content-based filtering, collaborative filtering, or hybrid filtering, which
are sound models but have disadvantages that let them down in certain scenarios.
To overcome the disadvantages of these models, we come up with a powerful
recommendation system using neural collaborative filtering, which uses implicit feed-
back and measures accuracy with the help of the hit ratio, making it more accurate and
efficient than traditional recommendation systems. The model we built (recom-
mendation system) gives the best results and accuracy for a movie search and
recommends items similar to it.
4 User Embeddings
Before we get into the model's architecture, let us get acquainted with the concept of
embeddings. An embedding captures the similarity of vectors from a higher-dimensional
space in a low-dimensional space. Let us take a closer look at user embeddings
to better understand this notion. Say we aim to serve our visitors based on
their preferences for two genres of movies: action and fictional films. Assume the
first dimension represents the user's preference for action films and the second
dimension their preference for fictional films (Fig. 1).
Assume Bob to be our very first user. He enjoys action films but
dislikes fictional films. We place Bob in the two-dimensional space as a vector
according to his preferences.
This two-dimensional space is called an embedding. Embedding reduces
the dimensionality of our user representation so that users can be denoted meaningfully
in a low-dimensional space. Users who have identical movie choices are grouped
together in this embedding (Fig. 2).
Of course, we are not limited to representing our users in only two dimensions. We can
employ any number of dimensions to represent our users. At the
expense of model complexity, a larger number of dimensions would help us capture
the attributes of each user more precisely. We will use eight dimensions in our code.
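As a minimal sketch of this idea, assuming a PyTorch implementation, an 8-dimensional user embedding can be declared as follows; the user count is a placeholder.

```python
# Each user ID is mapped to an 8-dimensional dense vector, as in the text.
import torch
import torch.nn as nn

num_users = 1000                              # hypothetical user count
user_embedding = nn.Embedding(num_users, 8)   # 8 dimensions, as stated above

bob = torch.tensor([3])                       # a user ID
print(user_embedding(bob).shape)              # torch.Size([1, 8])
```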
5 Learned Embeddings
6 Architecture of Model
Now that we have a good understanding of embeddings, we can specify the archi-
tecture of the model. As can be seen, the item and user embeddings are crucial
to our model. Let us have a look at the model architecture using the training sample
below:
userId: 3, movieId: 1, interacted: 1 (Fig. 3).
The embedding layer, which is a fully connected layer that converts the sparse
representation into a dense vector, is located above the input layer. Then, the user and
item embeddings are fed to the multi-layered neural network, which we call the neural
collaborative filtering layers. These layers map the latent vectors to prediction
scores. The capability of the model depends on the size of the final hidden layer X.
In the final output layer, the prediction score ŷ_ui is produced, and the model is trained
to minimize the loss between y_ui and ŷ_ui. The item and user vectors for movieId =
1 and userId = 3 are one-hot encoded as inputs to the model. The true label
(interacted) is 1 because this is a positive sample (the movie was genuinely rated by
the user).
Fig. 3 a Visual representation of the model architecture. b Neural collaborative filtering framework
The item input vector and the user input vector are, respectively, fed to the item
embedding and user embedding, resulting in shorter, denser item and user vectors.
The embedded item and user vectors are concatenated and then traverse a
sequence of fully connected layers that yield a prediction vector. Finally, we use
a sigmoid function to arrive at the most probable class. Because 0.8 > 0.2, the most
likely class is 1 (the positive class) in the case above.
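A minimal PyTorch sketch of this architecture — user and item embeddings concatenated and passed through fully connected NCF layers to a sigmoid score — might look as follows; the sizes are illustrative assumptions, not the exact configuration used in the paper.

```python
# Illustrative neural collaborative filtering model.
import torch
import torch.nn as nn

class NCF(nn.Module):
    def __init__(self, num_users, num_items, dim=8):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.item_emb = nn.Embedding(num_items, dim)
        self.fc = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, user_ids, item_ids):
        # Concatenate the dense user and item vectors, pass them through the
        # NCF layers, and squash the score into (0, 1) with a sigmoid.
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return torch.sigmoid(self.fc(x)).squeeze(-1)

model = NCF(num_users=1000, num_items=4000)            # hypothetical sizes
score = model(torch.tensor([3]), torch.tensor([1]))    # userId=3, movieId=1
```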
Our model has now been trained and is ready to be evaluated on the test
dataset. In traditional machine learning (ML) projects, we evaluate our models using
metrics such as accuracy (for classification tasks) and RMSE or MAE (for regression
tasks). For evaluating recommender systems, such metrics are far too simplistic.
To define suitable and meaningful metrics for evaluating recommender systems,
we must first understand how modern recommender systems are used.
Take a look at Amazon’s Website, which also has a list of recommendations (given
below) (Fig. 4).
The idea here is that the user is not required to interact with each and every item
on the suggestion list. Rather, we simply want the user to communicate/interact with
at minimum a single element from the list of recommendations; if the user does
that the recommendations will work perfectly. To replicate this, use the assessment
guidelines below to get a list of ten recommended items per user.
• Randomly choose 99 items per user that they have not interacted with up till now.
• Add the above 99 items alongside the test item (the item that the user actually
interacted with). The total is now 100 items.
• Apply the algorithm to the above 100 items and rank them based on their predicted
probabilities.
• Select the top ten items from the above list of 100 items. If the test item
is among the top ten, we call it a 'hit.'
• Repeat the procedure for all users. The average number of hits is then used to calculate the
hit ratio.
This hit ratio is generally used for evaluating recommender systems (Table 1).
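A hedged sketch of this evaluation procedure for a single user, reusing the NCF sketch above, is given below; all counts and IDs are assumptions.

```python
# Rank the held-out test item among 99 random unseen items and check
# whether it lands in the top ten, per the procedure listed above.
import torch

def hit_at_10(model, user_id, test_item, negative_items):
    # Score the test item together with 99 non-interacted items
    items = torch.tensor([test_item] + list(negative_items))
    users = torch.full_like(items, user_id)
    scores = model(users, items)
    # Take the ten highest-scoring items; a 'hit' means the test item is there
    top10 = items[torch.topk(scores, k=10).indices]
    return int(test_item in top10)

# The overall hit ratio is the average of this indicator over all users.
```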
Alongside the rating, a timestamp column is present that records the date and time
when the review was submitted. Via this timestamp column, we will use the leave-
one-out methodology as our train–test split strategy. The most recent review
is used as the test set for each user, while the rest are used as training data (refer
to Fig. 5).
A total of 38,700 movies have been reviewed by users. This user's most recent
film review was for the 2018 blockbuster Black Panther. For this user, we will utilize
this movie as the testing data and the remaining rated movies as training data. This
train–test split strategy is widely utilized when training and evaluating recommender
systems. We cannot make a random split, because we could end up using a user's
recent review for training and older reviews for testing. This introduces look-ahead
bias and data leakage, and the trained model's performance would not generalize to
real-world performance (Graph 1).
Table 1 Number of recommendations versus calculated hit ratio

Number of recommendations (N)    Hit ratio
1      0.19
2      0.30
3      0.40
4      0.56
5      0.58
6      0.60
7      0.63
8      0.80
9      0.80
10     0.80
11     0.89
12     0.90
13     0.90
14     0.95
15     0.95
16     0.97
17     0.99
18     1.00
19     1.00
20     1.00
9 Conclusion
Table 2 Comparison of various state-of-the-art models

S. No.    Model                            Model accuracy (%)
1         NCF                              86
2         Firefly                          84
3         Hybrid                           76
4         Collaborative-based filtering    65
5         Content-based filtering          63
The Importance of Selected LMS Logs
Pre-processing Tasks on the Performance
Metrics of Classification Models
Abstract Learning analytics and educational data mining are current research disci-
plines that can provide interesting and hidden insights into the effectiveness of
different learning styles, the complexity of courses, educational content difficul-
ties, and instructional design issues. Simultaneously, they can help to understand the
concepts and reasons and estimate future performance or possible dropout of the students. However, even though the contribution of these research areas is promising, the availability of end-to-end ML tools has caused many scholarly papers to underestimate the importance of the data understanding and data pre-processing phases of the knowledge discovery process. Subsequently, this leads to incorrect or imprecise
interpretation of the results. Therefore, this paper aims to emphasize the importance
and impact of individual steps of the pre-processing phase on the quality of the ML
models. The paper introduces a case study in which different data pre-processing
tasks are applied to an open dataset of LMS logs before using an SVM classifier. As
a result, the paper confirms the importance and significant impact of a suitably chosen
set of pre-processing tasks on selected performance metrics of two ML classification
models.
1 Introduction
In recent years, the educational landscape has evolved dramatically. Due to the COVID-19 pandemic, the number of learning management systems (LMSs) deployed at universities, schools, and educational organizations has grown rapidly. An LMS is a
software application used to manage, document, track, and provide electronic educa-
tional technologies, courses, and training programs [1]. In addition, LMSs strive to
use online technology in their courses to enhance the effectiveness of traditional
face-to-face education [2].
In this context, educational data mining (EDM) and learning analytics (LA)
have emerged as new fields of research that examine educational data to address a variety of instructional research issues, such as identifying successful students in a given course or identifying students who may drop out or require additional attention during the learning process. While EDM is a fast-emerging discipline that focuses
on uncovering knowledge and extracting relevant patterns from educational infor-
mation systems, LA is more general and considers the environment in which the
learning process occurs. Both disciplines aim to assist students at various phases
of their academic careers using digital traces they leave in the systems [3]. As a
result, insights can uncover different learning styles, determine the complexity of
courses, identify specific areas of the content that cause difficulties in understanding
the concepts, and receive insights about the future performance or possible dropout
of the students.
The visualization and analysis of these data are often carried out using a variety of machine learning techniques included in EDM and LA methodologies to identify interesting and useful hidden patterns for predicting students' performance [4, 5]. The study of these
data may yield significant information that is beneficial to both teachers and students
if all phases of the knowledge discovery process are done thoroughly. Moreover, it is
expected that the research team understands not only the experiment design but also the background of the learning process, the resources of the educational data, and the importance of the correct application of individual pre-processing techniques [6].
Even though these requirements seem ordinary, the availability of end-to-end ML tools has caused many scholarly papers to underestimate the importance of the data understanding and data pre-processing phases of the knowledge discovery process, which often leads to incorrect or imprecise interpretation of the obtained results. The paper attempts to identify the most common data pre-processing methods that may be used to improve the performance of classification models. Therefore, the aims of the work presented in this paper are summarized as follows:
1. Evaluation of the impact of integrating multiple pre-processing approaches on the performance of classification algorithms.
2. Determination of the pre-processing techniques which lead to a more precise classification.
The main aim of this paper is to emphasize the importance and impact of individual
steps of the pre-processing phase on the quality of the ML model using an open
dataset of LMS logs.
The paper is structured as follows. The related works section summarizes the
importance of individual data pre-processing tasks and their impact on model perfor-
mance. Simultaneously, it provides an overview of the papers which deal with the
educational data pre-processing, especially LMS logs. The next sections introduce a
case study in which particular data pre-processing steps are presented using the open
dataset of LMS logs. Finally, the results and discussion sections compare the results of two ML models to which different sets of pre-processing tasks had been applied. In addition, their impact on selected performance metrics is discussed in detail.
2 Related Works
Data pre-processing has the largest influence on the possible model generalization
based on machine learning algorithms. An estimate indicates that pre-processing can take between 50 and 80 percent of the whole classification process, highlighting the significance of the pre-processing phase in model development [7]. Furthermore,
enhancing data quality is essential for improving ML model performance metrics.
As was mentioned before, the paper is focused on the pre-processing phase of
machine learning research in education, which is frequently overlooked and unclear because researchers often do not consider it important to explain, or do not disclose, which data pre-processing techniques they implemented and why.
A dataset is composed of data items known as points, patterns, events, instances,
samples, observations, or entities [8]. Consequently, these data objects are usually
characterized by many attributes/features that offer the key characteristics of an entity,
such as the size of an object and the time at which an event occurred. A feature may
be a single, measurable quality, or a component of an event. The amount of training
data grows exponentially with the number of input space features. Features can be
broadly classified as either categorical or numerical.
Data pre-processing is the initial step in ML techniques, in which data is trans-
formed/encoded so that the computer can quickly study or understand it. An unpro-
cessed dataset cannot be used to train a machine learning model. Incomplete, noisy,
and inconsistent data, inherent to data obtained from original resources, continue to
cause issues with data analysis. Therefore, it is necessary to address the concerns
upfront and consider if the dataset is large enough, too small or fractured for further
analysis. At the same time, it is necessary to identify corrupted and missing data
that can decrease the prediction potential of the model. Pre-processing conditions the input data so that subsequent feature extraction and interpretation become easier. Reducing or adding data dimensions may increase the overall performance of the model.
Additionally, if the accessible educational data does not contain a significant quan-
tity of all types of data, which provides a complete picture of the learning process,
then the information derived from the data may be unreliable because the missing or
redundant attributes may reduce the model’s precision [9].
In many problems, the dataset contains noisy data, and eliminating noisy instances is one of the most challenging tasks in machine learning [10]. Another challenging issue is distinguishing outliers from genuine data values [11]. Inliers, in turn, are incorrect data values located within the center of a statistical distribution, and their localization and correction are extremely difficult [12].
The management of missing data is a frequent issue that must be addressed through
data preparation [13]. Frequently, numerical and categorical datasets must be meticulously controlled. On the other hand, it is well known that several algorithms can
handle categorical cases more effectively or exhibit greater performance primarily
with such examples. Whenever this occurs, discretization of numerical data is of
paramount importance [14]. The grouping or discretization of categorical data is
a highly effective solution to the abovementioned difficulty. As a result, the initial
dataset is converted to the numerical format.
A large amount of categorical data is difficult to manage if the frequencies of many
categories vary greatly [15]. This frequently raises questions such as which subset
of categories gives the most useful information and, thus, which quantity should
be chosen for training the ML model [16]. In addition, a dataset with an excessive number of features, or with strongly correlated features, should not be fed into the learning process, because such data do not provide additional relevant information [17–20]. Therefore, it is necessary to select features and to exclude unneeded information irrelevant to the research. As a result, feature selection techniques are useful in this
case [21, 22].
As can be seen, partial pre-processing tasks are mentioned in learning analytics
papers. However, a systematic review of the role of data pre-processing in learning analytics or educational data mining is rare. As stated in [21] or [22], it is essential to actively raise researchers' and educational policymakers' awareness of the importance of the pre-processing phase as a prerequisite for creating a reliable database prior to any analysis.
3 Methodology
The importance and impact of individual steps of the pre-processing phase on the
quality of the ML model will be demonstrated on an open dataset of LMS logs, which
is available for LA research [23]. Figure 1 visualizes the sequence of individual data
pre-processing steps, which should aim at the preparation of the dataset on which
different ML classification tasks can be realized. They are described in the following
subsections in detail.
The dataset used within the research comes from the August 2016 session of the
course “Teaching with Moodle”. Moodle Pty Ltd employees teach this course twice
a year in the form of online sessions.
The session was offered entirely online through an LMS Moodle from August 7
to September 4, 2016. There were 6119 students registered in this course, of which 2167 gave permission to utilize their data, resulting in 32,000 log entries. A total of 1566 students obtained a course badge, and 735 students successfully completed the course, which required completing all exercises within it.
The presented dataset consists of six selected tables extracted from the Moodle database using SQL into CSV (comma-separated values) files. Each dataset
has a pseudonymous username to facilitate the reconstruction of linkages between
individuals and their accomplishments, including activities, log entries, badges, and
grades. Other columns of data were omitted due to privacy concerns or because they
did not offer relevant information for this dataset [23].
Table 1 shows descriptions of six input CSV files with raw data. Each of these files
has numerous features that can be used as inputs to ML techniques to construct an adequate
predictive model for predicting student performance.
The dataset consists of 2167 student records, but 213 records were removed because the files, taken together, contained academic information about only 1954 students. Although no record was entirely uninformative, the problem of missing values still had to be dealt with. Missing data are acknowledged as one of the important concerns that must be carefully addressed during the pre-processing phase, prior to the use of machine learning algorithms, in order to develop effective machine learning models. The historical records were considered, which provided information about each student's grades earned through various historical activities.
As part of this step, 6 of the 16 activities had to be removed. The most prevalent strategies for handling missing values were used, such as manual filling, replacement with mean or zero/null values, and adaptation of an imputation procedure, while taking care not to distort the analysis results. Simultaneously, it was necessary to remove 752 records because there was no information about the students who participated in these obsolete activities. The final dataset contained 1203 records after cleaning.
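For illustration, a minimal pandas sketch of such a cleaning step is given below; the file name, column names, and the names of the six removed activities are hypothetical placeholders rather than the actual schema of the dataset.

import pandas as pd

# Hypothetical merged export of the six Moodle CSV files; all names are illustrative.
df = pd.read_csv("moodle_logs_merged.csv")

# Drop records for students whose academic information is missing from the export.
df = df.dropna(subset=["userid"])

# Drop the columns of the (assumed) six obsolete activities.
obsolete = ["At4", "At5", "At6", "At7", "At8", "At9"]  # placeholder activity names
df = df.drop(columns=[c for c in obsolete if c in df.columns])

# Replace the remaining missing numeric grades, e.g., with the column mean.
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())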
Data scaling is a step required to ensure the validity of predictive modeling, mainly when the input attributes have different scales. An ML algorithm is trained more efficiently when the dataset is scaled or normalized; as a result, a better prediction can be obtained and processing or training is sped up. Normalization, however, requires extra effort: if due care is not taken, the dataset may lose its internal structure, which leads to lower accuracy [24].
The max–min normalization and z-score standardization are two of the most widely used methods. Normalization is the process of scaling attribute values within a specific range so that all attributes have approximately similar magnitudes. While min–max normalization is sensitive to outliers, because their presence can significantly alter the data range, z-score standardization is less affected by them; it is typically employed to transform a variable into a normal distribution with a mean of zero and a standard deviation of one [25]. Because the dataset did not contain significant outliers, the data were normalized using min–max normalization to the interval [−1, 1].
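A minimal scikit-learn sketch of this normalization step is given below; the toy feature matrix is purely illustrative.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0, 200.0], [20.0, 400.0], [15.0, 300.0]])  # toy feature matrix

# Min–max normalization of every attribute to the interval [-1, 1].
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)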
Data reduction typically relies on three main methods. The first is to directly pick variables of interest using domain expertise. The second is to choose important variables for further study using statistical feature selection methods. The third is to apply feature extraction techniques to generate usable features for data analysis. Unfortunately, the majority of datasets contain useless features, which can negatively impact the performance of learning algorithms.
Feature selection (FS) methods can be roughly categorized into three groups: filter, wrapper, and embedded approaches. The filter technique is a basic and quick way of selecting features in which variables are ranked and selected based on specified univariate criteria.
This research applied a filter-based strategy utilizing a selection algorithm based on information gain. The filter approach was based on two criteria for feature selection: correlation (a correlation matrix) and information-gain attribute evaluation.
A correlation matrix is suitable for checking the linear relationship between
features, as shown in Fig. 2. The primary objective was to mitigate the difficulty of
high-dimensional data by reducing the number of attributes without compromising
classification accuracy. In this case study, the correlation matrix showed that the final
outcome of the students was substantially correlated with their participation in each
course activity.
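A minimal scikit-learn sketch of these two filter criteria is given below, using mutual information as a stand-in for information-gain attribute evaluation; the dataframe and column names are assumptions based on Table 3.

import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Assumed: df holds the pre-processed features and a "Result" target column.
X = df.drop(columns=["Result"])
y = df["Result"]

corr = X.corr()  # linear relationships between pairs of features

# Information-gain-style ranking via mutual information with the class label.
scores = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
print(scores.sort_values(ascending=False))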
Figure 3 illustrates that the highest values were received by the features badgesNo, forum, and feedback, followed by categories relating to academic involvement such as lesson, quiz, other modules, and so on. As illustrated in Fig. 3, a significant subset of features was picked while others were removed. The features examined in this study thus received the highest ranking, indicating that students' involvement throughout the educational process significantly impacts their academic achievements.
In contrast to feature selection, feature extraction seeks to create new features
based on linear or nonlinear combinations of existing variables. Principal component
analysis (PCA) and statistical methods are two representative linear feature extraction
techniques. The number of extracted principal components or features is decided by
the proportion of total data variance explained, e.g., the principal components should
be capable of explaining at least 80 or 90 percent of the total data variation.
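A minimal scikit-learn sketch of this variance-threshold-based choice of components follows; X_scaled stands for the normalized feature matrix from the scaling step.

from sklearn.decomposition import PCA

# A float n_components keeps as many principal components as are needed
# to explain at least that fraction of the total data variance.
pca = PCA(n_components=0.90)
X_reduced = pca.fit_transform(X_scaled)
print(pca.n_components_, pca.explained_variance_ratio_.sum())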
The following 21 attributes shown in Table 3 were selected as a result of feature
selection: “BadgesNo”, “AttemptCount”, “At1”, “At2”, “At3”, “ActivityHistory”,
“OtherModules”, “AtRisk”, “M1”, “M2”, “M3”, “M4”, “M5”, “M6”, “M7”, “M8”,
“M9”, “M10”, “M11”, “M12”, and “Result”.
The “Result” attribute was linked to whether the student passed or failed the
course based on the completion of each module. Initially, the records from "mdl_course_modules.csv" regarding each action within the course module were extracted. Then, "mdl_course_modules_completion.csv" was parsed to extract the results of those students' participation in those activities.
In general, the dataset exhibited issues typical of this type of educational data.
A large number of features caused the dataset to be multidimensional. Some of the features were not meaningful for classification, and others were not connected to the outcome.
Fig. 3 Highly ranked features after applying filter-based evaluation using gain ratio
In our situation, however, the data were not strongly skewed between the students who passed and those who did not.
Typically, the unbalanced data problem arises as a result of learning algorithms
ignoring less frequent classes in favor of more frequent ones. As a result, the resulting
classifier cannot correctly identify data instances that belong to classes that are
inadequately represented [5].
The support vector machine (SVM) classification algorithm was used to train the datasets, as it is suitable for this kind of dataset composed of numerical features. SVM works for classification and prediction problems; the idea behind it is to find the hyperplane that best separates the class labels. Moreover, it is designed to deal with numeric attributes, handling nominal ones only after converting them to numeric data types.
The final pre-processed dataset for building the predictive model consisted of 1203
student information records stored in 20 features, 19 numerical, and one categorical.
The original aim was to predict whether a student would pass or fail. The categorical feature was encoded into a numerical one so that the sklearn functions could work properly.
The final pre-processed dataset was roughly balanced, containing 617 unsuccessful students and 586 successful ones. The data were divided into training and testing datasets to increase the efficiency and stability of the ML models.
The confusion matrix shows the efficiency of the final model along with the overall accuracy. Attention was paid to the false positive and false negative values within the confusion matrix. Here, a false positive identifies a successful student as a failure, while a false negative identifies an unsuccessful student as successful. A false positive can have more impact than a false negative.
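A minimal scikit-learn sketch of the encoding, splitting, SVM training, and confusion-matrix evaluation is given below; the split ratio and kernel are assumptions rather than the exact settings used in this case study.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Assumed: df is the final pre-processed dataframe with "Result" as the target.
X = df.drop(columns=["Result"])
y = LabelEncoder().fit_transform(df["Result"])  # encode the categorical target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)  # assumed split ratio

clf = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes; the off-diagonal
# cells are the false positives and false negatives discussed above.
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))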
The evaluation of the result without the pre-processing of the dataset is shown
in Table 4. The results with pre-processing of the dataset are shown in Table 5. The evaluation was carried out on a test dataset consisting of 394 successful students and 398 dropouts. The results showed that the false positive values were greatly
reduced for the pre-processed dataset, and the overall accuracy was increased by
16%.
5 Conclusion
tasks. It can be assumed that the impact of the adopted pre-processing techniques
would vary from one classification algorithm to another.
Future research will investigate the impact of various pre-processing strategies
on other classification and clustering algorithms. In addition, the appropriate pre-
processing procedures for such datasets must be determined in order to address
problems that are less frequent.
Acknowledgements This work was supported by the Scientific Grant Agency of the Ministry
of Education of the Slovak Republic and Slovak Academy of Sciences under Contract VEGA-
1/0490/22, and by the European Commission ERASMUS+ Program 2021, under Grant 2021-1-
SK01-KA220-HED-000032095.
References
1. Skalka J, Švec P, Drlík M (2012) E-learning and quality: the quality evaluation model for e-
learning courses. In: Divai 2012 - 9th International scientific conference on distance learning
in applied informatics
2. Amrieh EA, Hamtini T, Aljarah I (2015) Preprocessing and analyzing educational data set
using X-API for improving student’s performance. In: 2015 IEEE Jordan conference on applied
electrical engineering and computing technologies (AEECT). IEEE, pp 1–5
3. Alcalá-Fdez J, Sanchez L, Garcia S, del Jesus MJ, Ventura S, Garrell JM, Herrera F (2009)
KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Comput
13(3):307–318
4. Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism
prediction instruments. Big Data 5(2):153–163
5. Kabathova J, Drlik M (2021) Towards predicting student’s dropout in university courses using
different machine learning techniques. Appl Sci 11(7):3130
6. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-
sampling technique. J Artif Intell Res 16:321–357
7. Kadhim AI (2018) An evaluation of preprocessing techniques for text classification. Int J
Comput Sci Inf Secur (IJCSIS) 16(6):22–32
8. Cui ZG, Cao Y, Wu GF, Liu H, Qiu ZF, Chen CW (2018) Research on preprocessing technology
of building energy consumption monitoring data based on a machine learning algorithm. Build
Sci 34(2):94–99
9. Davis JF, Piovoso MJ, Hoo KA, Bakshi BR (1999) Process data analysis and interpretation.
Adv Chem Eng 25:1–103. Academic Press
10. Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study. Artif Intell Rev
22(3):177–210
11. Khoshgoftaar TM, Van Hulse J, Napolitano A (2010) Comparing boosting and bagging tech-
niques with noisy and imbalanced data. IEEE Trans Syst Man Cybern Part A Syst Hum
41(3):552–568
12. van Hulse J, Khoshgoftaar TM, Napolitano A (2007) Experimental perspectives on learning
from imbalanced data. In: Proceedings of the 24th international conference on Machine
learning, pp 935–942
13. Farhangfar A, Kurgan LA, Pedrycz W (2007) A novel framework for imputation of missing
values in databases. IEEE Trans Syst Man Cybern Part A Sys Hum 37(5):692–709
14. Elomaa T, Rousu J (2004) Efficient multisplitting revisited: optima-preserving elimination of
partition candidates. Data Min Knowl Disc 8(2):97–126
15. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn
Res 3(2003):1157–1182
16. Skillicorn DB, McConnell SM (2008) Distributed prediction from vertically partitioned data.
J Parallel Distrib Comput 68(1):16–36
17. Czarnowski I (2010) Prototype selection algorithms for distributed learning. Pattern Recogn
43(6):2292–2300
18. Xiao W, Ji P, Hu J (2022) A survey on educational data mining methods used for predicting
students’ performance. Eng Rep 4(5):e12482
19. Mingyu Z, Sutong W, Yanzhang W, Dujuan W (2022) An interpretable prediction method for
university student academic crisis warning. Complex Intell Syst 8(1):323–336
20. Ismael MN (2022) Students performance prediction by using data mining algorithm techniques.
Eurasian J Eng Technol 6:11–25
21. Feldman-Maggor Y, Barhoom S, Blonder R, Tuvi-Arad I (2021) Behind the scenes of
educational data mining. Educ Inf Technol 26(2):1455–1470
22. Luna JM, Castro C, Romero C (2017) MDM tool: a data mining framework integrated into
Moodle. Comput Appl Eng Educ 25(1):90–102
23. Dalton E (2017) Learn Moodle August 2016 anonymized data set. [Dataset]
24. Munk M, Drlík M (2011) Impact of different pre-processing tasks on effective identification of
users’ behavioral patterns in web-based educational system. Procedia Comput Sci 4:1640–1649
25. Munk M, Kapusta J, Švec P (2010) Data preprocessing evaluation for web log mining:
reconstruction of activities of a web visitor. Procedia Comput Sci 1(1):2273–2280
Analysis of Deep Pre-trained Models
for Computer Vision Applications: Dog
Breed Classification
Meith Navlakha, Neil Mankodi, Nishant Aridaman Kumar, Rahul Raheja,
and Sindhu S. Nair
Abstract Machine perception is one of the most lucrative domains in the modern landscape, and one of the most challenging tasks embodied by this domain is analyzing and interpreting images. Image recognition has seen several advancements over the years, such as the introduction of pre-trained models, which have removed much of the complexity associated with developing high-performance deep neural networks.
In this paper, we have proposed several eminent pre-trained models that have the
ability to categorize dogs as per their breeds. The main objective of this paper is
to present a fair comparison of the proposed models and establish the nonpareil
model on the basis of comparison metrics such as accuracy, validation accuracy,
time requirements, precision and recall scores of the model. It was observed that
ResNet 152V2 performed the best with respect to the accuracy, precision and recall
scores, Inception-ResNet gave the best validation accuracy and NASNet-Mobile had
the highest efficiency albeit with inferior performance in accordance with the other
evaluation metrics.
1 Introduction
There has been considerable debate among the prominent governing bodies of canine registries regarding the official number of recognized dog breeds. For example, the American Kennel Club (AKC) recognizes only 195 breeds, while the Fédération Cynologique Internationale (FCI) officially recognizes 360 breeds. There isn't an
exact, worldwide, internationally agreed-upon number, but it would be safe to say
that there are several hundreds of canine breeds and having these many breeds poses
a serious problem. Many of these breeds have easily perceptible differences in their
characteristics such as height, weight and color that make them easy to distinguish,
but at the same time, there are several dog breeds that are nearly identical with
small, imperceptible differences that make it very difficult for us as human beings to
distinguish them from each other. The task of classifying dogs on the basis of their
breeds is a suitable candidate for being included as a machine perception problem
that can be managed with the help of convolutional neural networks (CNNs). CNNs are a type of artificial neural network, explicitly designed to process pixel data, that is primarily used in image recognition and processing.
One of the most impactful advancements in the usage of CNNs has been the
introduction of transfer learning and pre-trained models. Transfer learning is a subset
of artificial intelligence (AI) and machine learning (ML) which aims to apply the
knowledge obtained from one task to a different but similar task. Pre-trained models
are deep neural networks which have been trained on an extensive dataset, typically
to solve some substantial image classification task. One can either use such a pre-trained model as it is or personalize it to fit a given task. This paper puts the
spotlight on the following pre-trained models:
• DenseNet 201
• InceptionV3
• Inception-ResNetV2
• MobileNet
• NASNet-Mobile
• ResNet 152V2
• Xception
The models mentioned above are just a few of the several pre-trained models available in the Keras framework, and deciding which model to use is a problem that this paper aims to solve. The core objective of our research is to conduct a fair comparison
of all the seven proposed pre-trained models and establish the nonpareil model. For
the purpose of comparison, we will be making use of standard metrics such as the
accuracy, validation accuracy, time requirements, precision and recall scores of the
model.
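A minimal Keras sketch of this transfer-learning setup is given below, using one of the listed models with a frozen base and a new 120-way classification head; the pooling layer and head design are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, Model

# Frozen ImageNet feature extractor with a new classification head.
base = tf.keras.applications.ResNet152V2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained ImageNet weights fixed

x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(120, activation="softmax")(x)  # 120 dog breeds
model = Model(base.input, outputs)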
For the purpose of classifying canine images on the basis of their breeds, we
will be using the Stanford Dogs dataset that is publicly available on the Web site
known as Kaggle. The Stanford Dataset is an image dataset containing 20,580 images
corresponding to 120 different canine breeds, some of which are visible in Fig. 1.
This dataset has been curated by using ImageNet for the sole objective of developing
machine learning models to solve the challenging problem of classifying dog breeds
that have nearly indistinguishable features.
2 Related Work
The majority of the papers published on dog breed classification have chosen a
segment of the entire population of breeds that the dataset consists of (10, 13, 15,
etc.). Even though this is computationally less resource intensive as it takes less
training time, it certainly lacks practical utility. We, for the purpose of this paper,
have chosen 120 breeds of the dogs present in the Stanford Dataset on Dogs [1].
Wang et al. [2] have proposed the use of two fine-tuned pre-trained models for the
task of image classification of dogs and cats. VGG16 and VGG19 are the two models
being considered for the research, both of which operate on only 21 separate varieties
of dogs and cats. The proposed models have an accuracy of 98.47% and 98.59%,
respectively. Dąbrowski et al. [3] use the Inception-v3 pre-trained model to classify among dog breeds, achieving an accuracy of 78.8%. While these studies achieved excellent levels of accuracy, they fall short in certain aspects. They focus only on a segment of breeds, which severely limits their usability in real-world applications. These papers also have a very limited scope of comparison, as they target only a couple of the several pre-trained models available for image classification.
Bouaafia et al. [4] have proposed the use of four pre-trained models for the task
of traffic sign identification in the domain of computer vision. The four selected
models are VGG16, VGG19, AlexNet and ResNet-50. The authors of the paper
have managed to obtain exemplary levels of accuracy for all of the aforementioned
pre-trained models, but there are certain aspects that the research fails to take into
account. The study has a very confined scope of comparison as it chooses to only
focus on four pre-trained models. This makes it difficult for the readers to get the
complete overview of the pre-trained model landscape.
Varshney et al. [5] have proposed the use of a transfer learning approach on
two neural networks, VGG16 and Inception-v3. The scope of the study is, however, restricted to only two of the several pre-trained models available for image categorization, which limits the comparison. Furthermore, the accuracy achieved by the study (validation accuracy of 0.545 and training accuracy of 0.69281) is not competitive, rendering the model practically unusable. Also, the overhead of deploying two pre-trained models for the purpose of breed prediction, given the accuracy achieved, does not constitute a reasonable trade-off.
Junaidi et al. [6] have proposed a transfer learning approach for the task of image
classification of an egg incubator. The proposed approach consists of two pre-trained
models which are VGG16 and VGG19. The scope of research is severely limited due
to the fact that it only takes two models into account and both of them are versions
of VGG. Comparing only VGG models results in the paper not being able to provide
the reader with more significant results and insights. In addition to this drawback, the
paper achieves a maximum accuracy of 92% which has been adequately surpassed
by some of the models considered in our research paper.
Nemati et al. [7] have proposed the use of transfer learning in order to clas-
sify the patients as COVID-19 positive or negative using chest X-ray images. This
paper inspired us to extensively use pre-trained models in our research. They have
compared the accuracies of 27 different models some of them being Inception-v3,
ResNet, Xception, NASNet, VGG16, VGG19, etc. This paper became the source of
inspiration as the accuracy they managed to attain was extraordinary. Moreover, the baseline problem statement of our paper and theirs was similar, i.e., both relate to image classification using CNNs.
Akhand et al. [8] have proposed the use of transfer learning to perform facial
emotion recognition (FER) using images of an individual. This paper makes use of
the following pre-trained models VGG16, VGG19, ResNet-18, ResNet-34, ResNet-
50, ResNet-152, Inception v3 and DenseNet 161 to carry out the above task. This
paper uses the base pre-trained models for feature extraction and an additional dense
layer for the classification task which led to accuracies of 96.51 and 99.52% in the
FER task. This was a major inspiration for the methodologies used in our paper. In
addition to this, the paper also has a similar framework to the methodology proposed
in this paper.
All the papers related to dog breed prediction had a restricted scope, limited to at most three models. We have considered seven models in total, making our comparison more elaborate and comprehensive.
3 Dataset
Images of 120 different dog breeds are included in the Stanford Dogs dataset. The
purpose of this dataset was to do fine-grained image categorization utilizing images
and annotations from ImageNet [9]. It was initially gathered to help with fine-grained
image categorization, a difficult task, given that several dog breeds have almost
similar traits or differ in color and age. It comprises 20,580 images of 120 dog breeds.
Figure 2 contains the top 20 classes. It contains images with class labels, bounding
box dimensions of the main subject of that image and the dimensions of the image
itself. The bounding box consists of the dimensions of the subject of the image. The
dataset is made up of 2 folders, i.e., the images folder and the annotations folder. The
names of the objects in each folder are the same thus linking them together.
4 Data Augmentation
The data augmentation technique is used to enhance the diversity of our training data by applying stochastic but plausible transformations, such as image rotation, skewing, scaling, and horizontal or vertical shifting, to the existing dataset. This helps to reduce the chances of overfitting. Figure 3 displays the augmented images. Data augmentation is implemented using the ImageDataGenerator class available in TensorFlow's Keras library.
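A minimal sketch using the ImageDataGenerator class is given below; the transformation magnitudes and directory layout are illustrative assumptions.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stochastic but plausible transformations: rotation, shear (skewness),
# zoom (scaling), and horizontal/vertical shifts.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=30,
    shear_range=0.2,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
train_gen = train_datagen.flow_from_directory(
    "images/train",  # assumed layout: one subfolder per breed
    target_size=(224, 224), batch_size=32, class_mode="categorical")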
5 Proposed Work
5.1 Models
etc. The network was able to learn significant feature patterns for a wide variety of
pictures as a result. The network accepts 224 × 224 image input.
Xception Xception [13] is a CNN, 71 layers deep, that builds upon depthwise separable convolutions. It is trained on ImageNet, allowing the network to categorize images into a thousand distinct categories. The network accepts 299 × 299 image input.
DenseNet 201 DenseNet 201 [14] is a CNN, 201 layers deep. It is trained on ImageNet, allowing the network to categorize images into a thousand distinct categories. The network accepts 224 × 224 image input.
Table 1 depicts all the key characteristics of all the pre-trained models being
considered in our research on dog breed classification.
The dataset was split into training, testing and validation segments. Each segment
was preprocessed and augmented as mentioned in Sect. 4. The resultant images
of training and testing segments were fed to the pre-trained model. While training
the model, accuracy, precision, recall were observed for every epoch. Finally, the
validation accuracy was computed by evaluating the complete model. The entire
workflow is demonstrated in Fig. 4.
Fig. 4 Flowchart
6 Experimental Performance
The experiment was performed using Python and Keras package with Tensorflow-gpu
on Kaggle having 13 GB RAM and 2-core Intel Xeon CPU, along with Tesla P100
16 GB VRAM GPU. The parameters used in the learning process for all pre-trained
models were as follows (a minimal configuration sketch using these values is given after the list):
• Batch-Size: 32
• Epochs: 25
• Loss function: Categorical Crossentropy
• Optimizer: Adam, RMSProp and Adamax
• Learning Rate: 0.001 and 0.01
• Metrics: Accuracy, Precision and Recall
• Activation: Softmax
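The sketch below compiles and trains with these listed values; model is assumed to be a pre-trained network with a new classification head (as in the earlier transfer-learning sketch), and train_gen/val_gen are the augmented generators from Sect. 4. Only Adam with learning rate 0.001 is shown; RMSProp and Adamax are swapped in analogously.

import tensorflow as tf

# Batch size 32 is set on the generators; metrics are tracked per epoch.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
history = model.fit(train_gen, validation_data=val_gen, epochs=25)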
The ratio of true positives to all predicted positives is used to calculate precision. Precision is a metric used in classification and pattern recognition. It helps us to check how well the model can classify positive samples.

Precision = TP / (TP + FP)
The ratio of correctly categorized positive (+ve) records to all the positive (+ve) records is termed recall. A higher recall score denotes a larger fraction of positive samples correctly detected.

Recall = TP / (TP + FN)

where TP denotes true positives, FN false negatives, and FP false positives.
The F1-score is computed by taking the harmonic mean of a model's recall and precision, combining the two into a single effective statistical measure. It is generally used to compare models.

F1-score = 2 × (Precision × Recall) / (Precision + Recall)
7.2 Analysis
We have implemented the above pre-trained models using the proposed framework as
shown in Fig. 4; the comparison between them is displayed in Table 2, and the optimal
versions for each pre-trained model are presented in Table 3. We have emboldened
the best results for the chosen hyper-parameters across all the pre-trained models. We
have plotted the accuracy versus epochs, validation accuracy versus epochs and F1-
score versus epochs graphs as visible in Figs. 5, 6 and 7, respectively. As witnessed
in Table 3, the accuracy, precision, and recall scores for ResNet 152V2 appear the best, although the model may have overfit, as its validation accuracy is not comparable to its training accuracy.
The validation accuracy is highest for Inception-ResNet at approximately 82.5%, as witnessed in Table 3, which demonstrates that it has generalized best on the dog dataset, along with overall decent accuracy, precision, and recall scores. It can
also be observed from Table 3 that NASNet-Mobile is the most efficient model as
it requires the least amount of time to train, but overall the model falls short due to
inferior values achieved for metrics such as accuracy, validation accuracy, precision
and recall.
8 Conclusion
9 Future Work
References
1. Khosla A, Jayadevaprakash N, Yao B, Li F-F (2011) Novel dataset for fine-grained image
categorization: Stanford dogs. In: Proc. CVPR workshop on fine-grained visual categorization
(FGVC), vol 2, no 1. Citeseer
2. Wang I-H, Lee K-C, Chang S-L (2020) Images classification of dogs and cats using fine-tuned
VGG models. In: 2020 IEEE Eurasia conference on IOT, communication and engineering
(ECICE). IEEE, pp 230–233
3. Dąbrowski A, Lichy K, Lipiński P, Morawska B (2021) Dog breed library with picture-based
search using neural networks. In: IEEE 16th International conference on computer sciences
and information technologies (CSIT). IEEE, pp 17–20
4. Bouaafia S, Messaoud S, Maraoui A, Ammari AC, Khriji L, Machhout M (2021) Deep
pre-trained models for computer vision applications: traffic sign recognition. In: 2021 18th
International multi-conference on systems, signals & devices (SSD). IEEE, pp 23–28
5. Varshney A, Katiyar A, Singh AK, Chauhan SS (2021) Dog breed classification using deep
learning. In: 2021 International conference on intelligent technologies (CONIT). IEEE, pp 1–5
6. Junaidi A, Lasama J, Adhinata FD, Iskandar AR (2021) Image classification for egg incubator
using transfer learning of VGG16 and VGG19. In: 2021 IEEE International conference on
communication, networks and satellite (COMNETSAT). IEEE, pp 324–328
7. Nemati MA, BabaAhmadi A (2022) An investigation on transfer learning for classification
of COVID-19 chest x-ray images with pre-trained convolutional-based architectures. In: 2022
30th International conference on electrical engineering (ICEE). IEEE, pp 880–884
8. Akhand MAH, Roy S, Siddique N, Kamal MAS, Shimamura T (2021) Facial emotion
recognition using transfer learning in the deep CNN. Electronics 10(9):1036
9. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on computer vision and pattern recognition. IEEE, pp 248–255
10. Ojha N, Kumar A (2020) A comparison based breast cancer high microscopy image classifica-
tion using pre-trained models. In: 2020 IEEE Students conference on engineering & systems
(SCES). IEEE, pp 1–6
11. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam
H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications.
arXiv preprint arXiv:1704.04861
12. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transferable architectures for scalable
image recognition. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 8697–8710
13. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceed-
ings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
14. Ovreiu S, Paraschiv E-A, Ovreiu E (2021) Deep learning & digital fundus images: glaucoma
detection using DenseNet. In: 2021 13th International conference on electronics, computers
and artificial intelligence (ECAI). IEEE, pp 1–4
15. Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, inception-ResNet and the
impact of residual connections on learning. In: Thirty-first AAAI conference on artificial
intelligence
An Efficient Recognition
and Classification System for Paddy Leaf
Disease Using Naïve Bayes
with Optimization Algorithm
Abstract In today's world, paddy is the main crop consumed by most people, yet several diseases occur in paddy crops, so only a limited quantity of the crop is yielded. Due to a lack of technical and scientific knowledge, it is not easy for farmers to predict these diseases, and physically recognizing and categorizing paddy diseases requires considerable time. Recognition and classification of diseases from the leaf is accordingly one of the recent research topics in artificial intelligence, and an automatic and exact recognition scheme has become vital to reducing this issue. A novel method has been developed to find the paddy diseases brown spot, bacterial blight, blast, and sheath rot through the application of machine learning (ML) classification methods. In this research article, a robust methodology is proposed that is divided into different phases. In the preprocessing phase, the background is removed by converting the RGB images into HSV images; in the segmentation phase, a k-means clustering method is used to separate the normal fraction, the diseased fraction, and the background. To enhance the accuracy rate of the planned method, firefly optimization has been fused with the Naïve Bayes (NB) classification method. Afterward, 494 leaf images from our dataset covering five dissimilar paddy leaf diseases are used to train the proposed NB-with-firefly model. The NB classifier is then trained with the characteristics extracted by the NB-with-firefly method. The proposed method capably recognized and classified paddy leaf diseases of five different categories and achieved a 98.64% accuracy rate.
1 Introduction
Fig. 1 i Bacterial blight paddy leaf ii Blast paddy leaf iii Brown spot paddy leaf iv Sheath rot
paddy leaf
quality. Generally, plant infections are caused by several pathogens such as bacteria, viruses, and fungi. Normally, these pathogens disrupt the plant's photosynthetic process and interrupt its growth. Thus, the main task is to categorize paddy plant diseases at a premature stage [9].
At present, farmers use their personal knowledge and experience to classify and identify infections. Without identifying the plant infections, farmers use insecticides in extreme quantities, which cannot assist in inhibiting the disease but may adversely affect the plants. Moreover, some paddy leaf diseases may generate similar spot areas, and different lesions can be formed by the same disease because of different paddy leaf varieties and local conditions or climate [10]. Therefore, misclassification at times has a bad effect on paddy cultivation. Consequently, professional guidance on paddy infections is required. In rural areas, paddy disease professionals cannot offer rapid remedies to the farmers at the correct time, and manual recognition and classification of paddy diseases require costly devices and a large amount of time. Furthermore, manual image inspection needs additional eyes to verify and crosscheck its accuracy. An automatic scheme can recognize and classify infection-affected pictures more accurately than manual classifying procedures.
The research approach defines a system to recognize and categorize infections of paddy leaves. It guides farmers to make a precise evaluation and encourages them to augment production. In this study, an ML system for categorizing paddy leaf disease images has been developed by implementing NB with a firefly (FF) optimization algorithm. Extracted features are used to train the NB algorithm. The research system is then evaluated on a test database of 200 paddy leaf images and achieved an AUC of 98.64 percent.
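For rough illustration only, a Gaussian Naïve Bayes baseline on synthetic placeholder features is sketched below; the actual system extracts color/texture features from the segmented lesion regions and tunes the classifier with the firefly algorithm, which is not shown here.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Placeholder stand-ins for the features extracted from segmented lesion
# regions and for the five disease labels.
X_features = np.random.rand(494, 10)
y = np.random.randint(0, 5, 494)

X_train, X_test, y_train, y_test = train_test_split(
    X_features, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print(accuracy_score(y_test, nb.predict(X_test)))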
The rest of this research paper is organized as follows. Several recognition and classification methods for diseases in paddy crops are analyzed in Sect. 2. The approach implemented for classifying paddy leaf diseases and the detailed feature extraction that defines the ROI for classification are elaborated in Sect. 4. Experimental analysis is given in Sect. 5. The conclusion is given in Sect. 6.
2 Related Work
With the advancement of technology, recent research models are related to the recognition and classification of plant leaf diseases, and machine learning plays an important role in recognizing these diseases.
Ramesh et al. [11] based the classification and recognition of paddy leaf diseases on a DNN-Jaya approach. Using image acquisition, pictures of paddy plant leaves were acquired from the field for normal paddy leaves, bacterial blight, brown spot, and blast infections. During image preprocessing, for background elimination, the RGB images were transformed into HSV pictures, and binary pictures based on the H (hue) and S (saturation) components were extracted to segment the infected and non-infected parts. A clustering algorithm was used to segment the infected parts, the normal parts, and the background. The classification of the diseases was performed using an optimized DNN along with the Jaya algorithm. Experimental analysis was done and compared with ANN, DAE, and DNN. The planned model acquired an accuracy of up to 98.9% for blast disease, 95.34% for bacterial blight, 92.1% for sheath rot, 94.56% for brown spot, and 90.54% for normal leaf pictures, as reported by Nidhis et al. [12].
Sun et al. [13] developed maize leaf disease detection based on a deep convolutional neural network. They used a dataset of maize leaves divided into three sets (training, validation, and testing), totaling 8152 images including severe diseases; these images were calibrated by human plant pathologists. Detection was performed by a single-shot multibox detector (SSD), and the experimental outcomes depicted the feasibility and efficiency of their proposed model.
Sladojevic et al. [14] introduced a deep convolutional neural network built with CaffeNet, a deep learning framework developed by the Berkeley Vision and Learning Center. The dataset, totaling 4483 images downloaded from the Internet, included 15 categories of diseased and non-diseased images. A softmax classifier was trained from scratch using the backpropagation algorithm, and the deep learning model achieved an overall accuracy of 96.3 percent.
Using machine learning approaches, Liu et al. [15] developed a significant model for detecting and classifying apple leaf diseases, namely mosaic, rust, brown spot, and Alternaria leaf spot. The proposed AlexNet-based model and GoogLeNet were efficient and reliable, achieving 97.62 percent accuracy, and a convex optimization algorithm was used for a faster convergence rate.
Zang et al. [16] designed a new approach based on the support vector machine (SVM) to automatically detect plant leaf diseases. The RGB model was converted to HSI (hue, saturation, and intensity), gray, and YUV models to identify the colors in the images, and the images were segmented using a region-growing algorithm. The most important features were selected by genetic and correlation-based feature selection algorithms. Finally, the classification phase was performed by SVM, and a correctness of 90 percent was achieved.
Sengupta et al. [17] used a supervised incremental classifier with a particle swarm optimization algorithm to identify rice diseases with 84.02 percent accuracy. The proposed method yielded a model whose classification time complexity increases only incrementally, making the system more reliable and efficient for finding rice plant diseases. Moreover, the incremental classifier model is suitable for application to rice plant disease datasets, accounting for aspects such as the changing appearance of rice leaves due to nature, environmental conditions, and geological and biological components.
In [18], Jiang et al. proposed an apple disease identification model based on image processing techniques and deep learning methods, namely INAR-SSD, designed by introducing GoogLeNet. In the experiments, the database of disease images was divided between training and testing phases in the ratio 4:1. This approach accomplished an accuracy level of 96.52 percent and is a feasible solution for real-time detection of apple diseases.
Ramcharan et al. [19] proposed a method to detect diseases of cassava (Manihot esculenta Crantz), which is considered a food security crop and a source of carbohydrates in the human food chain. A dataset of 11,670 images of cassava leaves was collected from the International Institute of Tropical Agriculture (IITA). The researchers used a deep learning method, with 10 percent of the dataset used for the training phase and the other 90 percent used for the testing phase. The transfer learning method was used for training on cassava images, and a deep convolutional neural network detected the diseases with an accuracy of 96 percent.
Batmavady et al. [20] proposed image processing and neural network methods to identify cotton leaf diseases. The dataset of cotton leaves was collected from PlantVillage. The process starts with preprocessing techniques: filtering the image, removing noise, and converting the input image to grayscale. A set of features extracted from the processed image was used by a radial basis function neural network classifier. Further, an SVM classifier divided the dataset into testing and training phases and was then used to recognize cotton diseases with better accuracy.
Ma et al. [21] proposed a deep convolutional neural network (DCNN) method for symptom-wise identification of four types of cucumber diseases, i.e., downy mildew, anthracnose, target leaf spot, and powdery mildew. The DCNN achieved a good accuracy of 93.4% on 14,208 symptom images, and the comparative study used support vector machine and random forest classifiers along with AlexNet.
Bai et al. [22] investigated cucumber leaf spot disease and implemented a segmentation method based on morphological operations and the watershed algorithm to work against the complex backgrounds of the disease images. The algorithm exploits neighborhood grayscale information, which improves its noise-filtering capacity. The proposed method is an impressive and robust segmentation method for recognizing cucumber diseases, with a segmentation success rate of 86.32%.
A sparse representation (SR) classifier was proposed by Zhang et al. [23] for recognizing cucumber diseases, with segmentation done by the k-means algorithm. The SR classifier effectively reduces the computational cost, and the method improved the recognition rate to 85.7%. Moreover, a lesion feature extraction method was used to extract the important features of the cucumber leaf dataset.
Shah et al. [24] surveyed the various image processing (IP) and machine learning (ML) methods used for recognizing rice plant (RP) viruses from pictures of diseased rice plants. The survey presented different methods for the recognition and categorization of the diseases and compared the surveyed papers on rice plant disease in terms of image database, number of classes, preprocessing and segmentation methods, classifiers, and so forth. Singh et al. [25] proposed research on blast infection of the paddy leaf, in which specialists observe the picture of the plant leaf and take the required actions. The disease recognition approach was a color-slicing method that perceives the infected spots and the destroyed proportion of the complete leaf and advises whether the disease has occurred so that it can be removed in time to prevent losses. In that research, the recognition of blast disease of the paddy leaf was presented; the planned model acquired an accuracy of up to 96.6%, and the new recognition approach provides better outcomes than edge-recognition models.
Sharma et al. [26] examined numerous machine learning (ML) algorithms and evolutionary computation together with deep learning methods for identifying paddy diseases. The research considered three main paddy ailments: paddy blast, bacterial leaf blight, and brown spot. The findings of a comprehensive comparative investigation concluded that transfer learning approaches were better than traditional ML algorithms. The findings could be applied in farming practice to recognize paddy diseases early so that an immediate strategy can be put in place. To extrapolate the research's results, emphasis needs to be placed on working with large-scale datasets in subsequent investigations (Table 1).
3 Problem Definition
Human-based perception is the conservative method that was used to monitor leaf diseases in previous years. It is complicated, time-consuming, and expensive to obtain expert advice, and because many farmers are not educated and lack knowledge about the types of diseases, the naked-eye method suffers from many downsides. Moreover, using machine learning methods to recognize leaf diseases and make the right decision in selecting the proper treatment is also challenging. To overcome the downsides of the conservative methods, a new machine learning classification method is required; machine learning approaches are still poorly documented, especially in plant leaf disease detection. The disease can then be identified, and a solution for it found, by the classifier using the classification as well as detection methods.
Table 1 Summary of different classification techniques to detect different diseases

| Author name, year | Type of leaves | Classification technique | Algorithm | Diseases detected | Accuracy |
|---|---|---|---|---|---|
| Ramesh et al. [11] | Rice/paddy leaf | DNN | Jaya optimization algorithm | Blast, bacterial blight, brown spot, sheath rot | 97% |
| Sun et al. [13] | Maize leaf | Convolutional neural network | Single-shot multibox detector (SSD) | Northern maize leaf blight | 91.83% |
| Sladojevic et al. [14] | Infected leaves | CaffeNet | Backpropagation algorithm | 13 different diseases | 96.3% |
| Liu et al. [15] | Apple leaf | GoogleNet and CaffeNet | Convex optimization algorithm | Mosaic, rust, brown spot, Alternaria leaf | 97.62% |
| Zhang et al. [16] | Apple leaf | SVM | Genetic algorithm and region growing | Powdery mildew, mosaic, and rust | 90.00% |
| Sengupta et al. [17] | Rice | Particle swarm optimization | IPSO (incremental PSO) | Rice diseases | 84.02% |
| Jiang et al. [18] | Apple leaf | Convolutional neural networks (INAR-SSD model) | Oriented object detection algorithm | 5 types of apple diseases | 96.52% |
| Ramcharan et al. [19] | Cassava | Convolutional neural network and machine learning method (SVM) | Transfer learning algorithm | Mosaic disease, red mite damage, healthy and brown leaf spot, brown streak disease | 96% |
| Batmavady et al. [20] | Cotton leaf | RBF neural network, SVM | – | Cotton leaf diseases | Sensitivity 92.00%, accuracy 90.00%, specificity 96.00% |
| Ma et al. [21] | Cucumber leaf | AlexNet | Random forest and support vector machine | Downy mildew, anthracnose, target leaf spot, and powdery mildew | 93.4% |
| Bai et al. [22] | Cucumber leaf | Morphological operations | Watershed algorithm | Leaf spot | 86.32% |
| Zhang et al. [23] | Cucumber leaf | Sparse representation | k-means cluster algorithm | Downy mildew, bacterial angular leaf spot, Corynespora cassiicola, scab, gray mold, anthracnose | 85.7% |
4 Proposed Methodology
The proposed structure is developed in five phases: image acquisition, image preprocessing, segmentation, feature extraction, and image classification. The implemented work establishes a computer vision system that automatically recognizes and categorizes diseased and non-diseased paddy leaves. The various stages of the investigation are given in Fig. 2.
Image acquisition is the basic step of gathering the dataset of paddy leaf disease photos, taken for this design from the https://archive.ics.uci.edu/ml/datasets/Paddy+Leaf+Diseases site. The research system uses three disease categories of paddy images: brown spot, bacterial blight, and sheath rot. In the proposed system, the paddy leaves are captured using a higher-resolution digital camera (HRDC). For the subsequent detection and classification of disease, the captured paddy pictures are transferred to the processor, where the processing takes place.
In the second phase, preprocessing, the images in the database are resized and cropped to 256 × 256 pixels. The image background is removed by considering the hue values. First, the paddy leaf image in the red, green, and blue (RGB) model is transformed into the HSV model [27]. In the HSV model, the saturation (S) value is measured, as it captures the whiteness. Depending on a threshold value of about 90, the leaf picture is transformed into a binary picture, and the binary image is associated with the actual RGB picture to create the label: the leaf region contains the possible disease segment, while the background part of the picture is hidden.
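A minimal sketch of this step is given below, assuming OpenCV; the file names and the exact saturation threshold are illustrative assumptions based loosely on the "threshold value of about 90" mentioned above, not the paper's exact settings.

```python
# Hedged sketch: RGB -> HSV conversion and saturation thresholding to separate
# the leaf from the whitish background, followed by masking the RGB picture.
import cv2

img = cv2.imread("paddy_leaf.jpg")                 # BGR image (hypothetical file)
img = cv2.resize(img, (256, 256))                  # 256 x 256, as in the text
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)         # H in [0,179], S and V in [0,255]

# Whitish background has low saturation; leaf tissue has high saturation.
_, binary = cv2.threshold(hsv[:, :, 1], 90, 255, cv2.THRESH_BINARY)

# Associate the binary mask with the original picture to label the leaf region.
leaf_only = cv2.bitwise_and(img, img, mask=binary)
cv2.imwrite("leaf_mask.png", binary)
cv2.imwrite("leaf_only.png", leaf_only)
```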
This phase performs the segmentation of the paddy leaf picture; the k-means clustering model is analyzed in this research. Clustering is a technique for grouping the pixels of a picture. The infected region is isolated from the leaf picture through grouping: clusters developed in the paddy leaf picture estimate the infected region and the healthy part.
k-means Clustering
Images are selected using the image-read operation and displayed. After that, the color transformation of the selected picture is computed from the actual picture, and the pictures are grouped based on their components; the groups are segmented based on the segmentation algorithm [20]. k-means is a simple unsupervised learning algorithm that follows a simple model to classify the pixels into a specified number of clusters. In each iteration, the centroid is taken as the center of a cluster, and each k-centroid is recomputed so that similar data points are assigned to their closest center. With each new loop, the k centers move to new positions until no more alterations occur. Figure 3 shows the diseased portion.
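A minimal sketch of this clustering step follows, assuming scikit-learn and a NumPy RGB array; the choice of k = 3 and the rule for picking the diseased cluster (lowest mean green, i.e., brownish lesions) are illustrative assumptions only.

```python
# Hedged sketch: k-means colour clustering of leaf pixels, returning a boolean
# mask that estimates the infected region of the paddy leaf picture.
import numpy as np
from sklearn.cluster import KMeans

def segment_diseased(img_rgb, k=3):
    h, w, _ = img_rgb.shape
    pixels = img_rgb.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    labels = km.labels_.reshape(h, w)
    diseased = int(np.argmin(km.cluster_centers_[:, 1]))  # brown lesions: low green
    return labels == diseased   # boolean mask estimating the infected region
```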
This research work extracts both texture and color features. The color characteristics comprise the extraction of the mean and standard deviation of each channel. The color feature extraction steps, as far as they are reproduced in the source, are:

R = img(a, b); G = img(a, c); B = img(d, b);
r = R / 254.5, g = G / 254.5, b = B / 254.5;
Step 4: Compute the standard deviation of r, g, b and measure the correlation factor;
Step 5: Combine the extracted features;
End
Here, img = image, R = red, G = green, and B = blue.
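The following hedged sketch computes the per-channel means, standard deviations, and correlation factors in the spirit of the steps above; the 254.5 normalization constant is taken from the text, while the exact feature ordering and the function name are assumptions.

```python
# Hedged sketch of the colour-feature step: normalise each channel, then take
# mean, standard deviation, and pairwise channel correlations.
import numpy as np

def color_features(img_rgb):
    r = img_rgb[:, :, 0] / 254.5        # normalisation constant as in the text
    g = img_rgb[:, :, 1] / 254.5
    b = img_rgb[:, :, 2] / 254.5
    feats = []
    for ch in (r, g, b):
        feats += [ch.mean(), ch.std()]  # mean and standard deviation per channel
    for u, v in ((r, g), (g, b), (r, b)):
        feats.append(np.corrcoef(u.ravel(), v.ravel())[0, 1])  # Step 4: correlation
    return np.array(feats)              # Step 5: combined feature vector
```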
The research work selects the valuable features using a firefly optimization algorithm. This feature selection is a metaheuristic, nature-inspired method based on the flashing behavior of fireflies and the phenomenon of bioluminescent communication. The surviving tail of its pseudocode is:

End for j;
End for k;
Step 5: Rank the fireflies and search for the current best value;
End while;
Step 6: Process the resulting values and get the output;
End.
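Since only the tail of the pseudocode survives, the following is a highly simplified, hypothetical sketch of binary firefly-style feature selection consistent with the nested j/k loops above; the fitness callback, population size, iteration count, and mutation rate are all illustrative assumptions rather than the paper's parameters.

```python
# Hedged sketch: each firefly is a binary mask over features, brightness is the
# classifier score, and dimmer fireflies move toward brighter ones.
import numpy as np

def firefly_select(X, y, fitness, n_fireflies=10, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((n_fireflies, n_feat)) < 0.5        # binary feature masks
    pop[~pop.any(axis=1), 0] = True                      # guard: keep >= 1 feature
    light = np.array([fitness(X[:, m], y) for m in pop]) # brightness = score
    for _ in range(n_iter):
        for j in range(n_fireflies):                     # cf. "End for j" above
            for k in range(n_fireflies):                 # cf. "End for k" above
                if light[k] > light[j]:                  # j moves toward brighter k
                    copy = rng.random(n_feat) < 0.5
                    pop[j, copy] = pop[k, copy]
                    pop[j] ^= rng.random(n_feat) < 0.05  # small random perturbation
                    if not pop[j].any():
                        pop[j, 0] = True
                    light[j] = fitness(X[:, pop[j]], y)
    best = int(np.argmax(light))                         # Step 5: rank, keep best
    return pop[best]                                     # Step 6: selected features
```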
4.6 Classification
The classification process is the most important factor in paddy leaf disease prediction. The research system implements the Naïve Bayes (NB) method to classify diseased leaves. It calculates the probability of each feature within a class independently, without depending on the other features. The Naïve Bayes method is a supervised learning method, and it is separated into two phases: (i) training and (ii) testing.
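A minimal sketch of the two-phase Naïve Bayes step with scikit-learn follows, assuming a feature matrix X and label vector y produced by the earlier stages; the Gaussian NB variant is an assumption, as the paper does not name one.

```python
# Hedged sketch: train/test split, then Naive Bayes training and testing.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model = GaussianNB().fit(X_train, y_train)    # phase (i): training
pred = model.predict(X_test)                  # phase (ii): testing
print("accuracy:", accuracy_score(y_test, pred))
```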
Accuracy Rate
The research model is proposed for the paddy leaf detection system to enhance the accuracy parameter compared with the existing algorithms. The accuracy of the research model increases as the number of images increases. The proposed accuracy rate is 98.64%.
Cross-Entropy Loss
The research model is also proposed to decrease the entropy loss parameter compared with the existing algorithms. The entropy loss of the research model decreases as the predicted probability increases. The proposed entropy loss value is 0.0064.
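For reference, the reported cross-entropy loss follows the standard definition over $N$ test images and $C$ disease classes:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c},$$

where $y_{i,c}$ is 1 when image $i$ belongs to class $c$ and 0 otherwise, and $p_{i,c}$ is the predicted probability. The loss tends to zero as the predicted probabilities of the true classes approach 1, which matches the behavior described above.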
5 Result Analysis
This section describes the comparative analysis with various parameters and mathematical expressions. The research work was performed in the MATLAB simulation tool, and a desktop application was designed with GUIDE. The experimental analysis uses a total of 494 images, comprising 125 brown spot, 102 bacterial blight, 61 sheath rot, 120 normal, and 86 blast images.
Almost all of the 494 paddy leaf images were reserved for the training module. For the testing module of the research model's Naïve Bayes classification method, long-term database images of paddy leaf diseases were selected. The research flowchart shows the working of the proposed model. All images were stored in .png and .jpg formats. Characteristics like segment area, edges, and the HSV model are very essential.
The complete proposal is mainly concerned with different regions; the edges from segmentation, the HSV color model, and filtration play important roles in the research system. Different types of performance metrics are calculated with DIP methods, such as the segmented area. This research improves the performance compared with the existing classification method (DNN-Jaya), which can be seen in several kinds of global features. The system performance is calculated in terms of accuracy rate, loss, TPR, TNR, FDR, FPR, etc.
The application window is created with the MATLAB GUIDE environment toolbox; the design information is displayed, and the code is written in the script window. The first step is to separate the two groups of the classification and recognition system: the training module and the testing module are defined in this conceptual framework. The training module demonstrates how to upload many photos at once. Preprocessing techniques are used to spot distortions in the submitted image, and a filtering approach is implemented to eliminate unwanted noise if distortion is present. The HSV model defines the color space. After that, the KMC (k-means clustering) algorithm is developed in this system to divide the data into a number of clusters or groups; the segmentation process involves locating the region of the filtered image. Then, the feature extraction (FE) algorithm is developed using the HoG method, which extracts the global features and calculates segmented image features such as energy, mean, SD, and image contrast. After that, the given data and labels (targets) are used to train the Naïve Bayes network. The testing module shows the upload of the test picture: it transforms the color picture to a grayscale picture, segments the regional area, extracts the unique features, and calculates the performance metrics.
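A minimal sketch of the HoG feature extraction step with scikit-image follows; the orientation count and cell/block sizes are common library defaults used for illustration, not the paper's stated parameters.

```python
# Hedged sketch: colour picture -> grayscale picture -> HoG feature vector.
from skimage.color import rgb2gray
from skimage.feature import hog

def hog_features(img_rgb):
    gray = rgb2gray(img_rgb)                  # grayscale conversion, as above
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)
```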
The firefly optimization algorithm is based on modifying a performance metric, varying the search strategy, and changing the solution space to simplify the search using various probability distributions. The detection process is done by Naïve Bayes classification. Figures 4 and 5 show two kinds of graph plots comparing the planned and previous parameters, such as accuracy and cross-entropy rate. The planned approach has improved accuracy compared with various kinds of approaches such as DNN, DNN_JAO, and OF-Naïve Bayes.
Table 2 displays the results of the research effort, including characteristics like accuracy rate (98.64%), cross-entropy loss (0.0064), and TNR, TPR, and FPR values (0.990, 0.9964, and 0.0036, respectively). The methodology has increased the accuracy rate and reduced the loss of the paddy leaf disease detection technique.
Fig. 4 Comparative analysis of the proposed and existing methods using accuracy rate
Fig. 5 Comparative analysis of the proposed and previous methods using entropy loss
Table 3 Comparison of proposed and conventional models (OFNB, DNN + JAO, DNN)

| Parameter | OF-Naïve Bayes | DNN_JAO | DNN |
|---|---|---|---|
| Accuracy rate (%) | 98.64 | 97 | 93.50 |
| Cross-entropy loss | 0.0064 | 0.0100 | 0.01700 |
Table 3 compares the intended and prior techniques using the OFNB, DNN + JAO, and DNN classifiers. The accuracy of the methodology is 98.64 percent with OFNB, 97 percent with DNN + JAO, and 93.50 percent with DNN. The cross-entropy loss is 0.0064 with OFNB, 0.0100 with DNN + JAO, and 0.01700 with DNN.
6 Conclusion
This research paper describes a large number of operations carried out to improve image quality. The normal phase includes image preprocessing of the paddy leaf images: RGB-to-grayscale conversion and RGB-to-HSV model conversion. Filtration methods such as Gaussian and three-dimensional box filtering are applied to enhance picture quality and eliminate unwanted noise. After that, a k-means clustering approach is implemented to compute the region of the image. From these images, inner and outer quality characteristics like color, global, and texture features are extracted using the HoG method. The color image features are extracted through the mean and standard deviation, and the feature metrics are extracted through HoG. Classification is done by the optimized firefly Naïve Bayes (OFNB) method. After that, performance metrics such as TP, TN, FN, and FP are computed, and the accuracy rate is improved compared with the existing methods (ODNN). The automatic paddy leaf disease detection system is implemented using different categories of features like image contrast, entropy, homogeneity, energy, and color. For feature extraction, the histogram of oriented gradients (HoG) is used, resulting in unique feature calculation. The firefly optimization approach results in low dimensionality and complexity. k-means clustering segments the paddy leaf picture, which gives the maximum accuracy rate with a lower error rate and entropy loss. The proposed system is also compared with the OFNB, DNN_JAYA, and DNN classifiers, where the OFNB classifier gives the highest disease detection accuracy rate (98.64%), followed by DNN_JAYA (97%) and then DNN (93.50%). In future work, image processing and deep learning methods using the F-CNN algorithm will be developed to improve system performance. In addition, other options such as hybrid approaches like clustering models or RNNs can be used to improve the PSNR and precision values of the classification procedure, and further image processing algorithms can be developed for the identification of leaf diseases.
References
1. Islam MA, Hossain T (2021) An automated convolutional neural network based approach for
paddy leaf disease detection. Int J Adv Comput Sci Appl 12(1):280–288
2. Pantazi XE, Moshou D, Tamouridou AA (2019) Automated leaf disease detection in different
crop species through image features analysis and one class classifiers. Comput Electron Agric
156:96–104. https://doi.org/10.1016/j.compag.2018.11.005
3. Sun G, Jia X, Geng T (2018) Plant diseases recognition based on image processing technology.
J Electr Comput Eng 2018:1–7. https://doi.org/10.1155/2018/6070129
4. International Rice Research Institute (2009). Crop health: diagnostic of common diseases
of rice. Retrieved from http://www.knowledgebank.irri.org/ipm/terms-and-definitions.htm.
Accessed on Aug 2013
5. Phadikar S, Sil J, Das AK (2013) Rice diseases classification using feature selection and rule
generation techniques. Comput Electron Agric 90:76–85. https://doi.org/10.1016/j.compag.
2012.11.001
6. Retrieved from http://www.rkmp.co.in/research-domain
7. Pratheba R, Sivasangari A, Saraswady D (2014) Performance analysis of pest detection for agri-
cultural field using clustering techniques. In: 2014 International conference on circuits, power
and computing technologies [ICCPCT-2014]. IEEE, pp 1426–1431. https://doi.org/10.1109/ICCPCT.2014.7054833
8. Han L, Haleem MS, Taylor M (2015) A novel computer vision-based approach to automatic
detection and severity assessment of crop diseases. In: 2015 Science and information conference
(SAI). IEEE, pp 638–644. https://doi.org/10.1109/SAI.2015.7237209
9. Lichtenthaler HK (1996) Vegetation stress: an introduction to the stress concept in plants. J
Plant Physiol 148(1–2):4–14. https://doi.org/10.1016/s0176-1617(96)80287-2
10. Yao Q, Guan Z, Zhou Y, Tang J, Hu Y, Yang B (2009) Application of support vector machine for
detecting rice diseases using shape and color texture features. In: 2009 International conference
on engineering computation. IEEE, pp 79–83. https://doi.org/10.1109/ICEC.2009.73
11. Ramesh S, Vydeki D (2019) Recognition and classification of paddy leaf diseases using opti-
mized deep neural network with jaya algorithm. Inf Process Agric 7(2):249–260. https://doi.
org/10.1016/j.inpa.2019.09.002
12. Nidhis AD, Pardhu CNV, Reddy KC, Deepa K (2019) Cluster based paddy leaf disease detec-
tion, classification and diagnosis in crop health monitoring unit. In: Computer aided interven-
tion and diagnostics in clinical and medical images. Lecture notes in computational vision and
biomechanics, vol 31, pp 281–291. Springer, Cham. https://doi.org/10.1007/978-3-030-04061-1_29
13. Sun J, Yang Y, He X, Wu X (2020) Northern maize leaf blight detection under complex field
environment based on deep learning. IEEE Access 8:33679–33688. https://doi.org/10.1109/
ACCESS.2020.2973658
14. Sladojevic S, Arsenovic M, Anderla A, Culibrk D, Stefanovic D (2016) Deep neural networks
based recognition of plant diseases by leaf image classification. Comput Intell Neurosci 2016:1–
11. https://doi.org/10.1155/2016/3289801
15. Liu B, Zhang Y, He D, Li Y (2017) Identification of apple leaf diseases based on deep
convolutional neural networks. Symmetry 10(1):11. https://doi.org/10.3390/sym10010011
16. Zhang C, Zhang S, Yang J, Shi Y, Chen J (2017) Apple leaf disease identification using genetic
algorithm and correlation based feature selection method. Int J Agric Biol Eng 10(2):74–83.
https://doi.org/10.3965/j.ijabe.20171002.2166
17. Sengupta S, Das AK (2017) Particle swarm optimization based incremental classifier design
for rice disease prediction. Comput Electron Agric 140:443–451. https://doi.org/10.1016/j.compag.2017.06.024
18. Jiang P, Chen Y, Liu B, He D, Liang C (2019) Real-time detection of apple leaf diseases
using deep learning approach based on improved convolutional neural networks. IEEE Access
7:59069–59080. https://doi.org/10.1109/access.2019.2914929
19. Ramcharan A, Baranowski K, McCloskey P, Ahmed B, Legg J, Hughes DP (2017) Deep
learning for image-based cassava disease detection. Front Plant Sci 8:1852. https://doi.org/10.3389/fpls.2017.01852
20. Batmavady S, Samundeeswari S (2019) Detection of cotton leaf diseases using image
processing. Int J Recent Technol Eng 8(2S4):169–173. https://doi.org/10.35940/ijrte.b1031.0782s419
21. Ma J, Du K, Zheng F, Zhang L, Gong Z, Sun Z (2018) A recognition method for cucumber
diseases using leaf symptom images based on deep convolutional neural network. Comput
Electron Agric 154:18–24. https://doi.org/10.1016/j.compag.2018.08.048
22. Bai X, Li X, Fu Z, Lv X, Zhang L (2017) A fuzzy clustering segmentation method based on
neighborhood grayscale information for defining cucumber leaf spot disease images. Comput
Electron Agric 136:157–165. https://doi.org/10.1016/j.compag.2017.03.004
23. Zhang S, Wu X, You Z, Zhang L (2017) Leaf image based cucumber disease recognition using
sparse representation classification. Comput Electron Agric 134:135–141. https://doi.org/10.1016/j.compag.2017.01.014
24. Shah JP, Prajapati HB, Dabhi VK (2016) A survey on detection and classification of rice plant
diseases. In: 2016 IEEE International conference on current trends in advanced computing
(ICCTAC). IEEE, pp 1–8. https://doi.org/10.1109/ICCTAC.2016.7567333
25. Singh A, Singh ML (2015) Automated color prediction of paddy crop leaf using image
processing. In: 2015 IEEE Technological innovation in ICT for agriculture and rural
development (TIAR). IEEE, pp 24–32. https://doi.org/10.1109/TIAR.2015.7358526
26. Sharma M, Kumar CJ, Deka A (2022) Early diagnosis of rice plant disease using machine
learning techniques. Arch Phytopathol Plant Prot 55(3):259–283
27. Junhua C, Jing L (2012) Research on color image classification based on HSV color
space. In: 2012 Second international conference on instrumentation, measurement, computer,
communication and control. IEEE, pp 944–947. https://doi.org/10.1109/IMCCC.2012.226
28. Salfikar I, Sulistijono IA, Basuki A (2018) Automatic samples selection using histogram of
oriented gradients (HOG) feature distance. EMITTER Int J Eng Technol 5(2):234–254. https://doi.org/10.24003/emitter.v5i2.182
Development of Decision-Making
Prediction Model for Loan Eligibility
Using Supervised Machine Learning
1 Introduction
Machine learning and artificial intelligence are among the most widely used technologies today for enhancing automation in day-to-day business. This paper applies machine learning to enhance and automate the day-to-day business of the loan-lending process. Nowadays, financial institutions suffer from the huge pressure of non-performing assets. To reduce the non-performing asset problem, it is important to predict, before lending, whether a customer is capable of repaying the loan. The prime objectives of this model are to lower the fear of default on loan repayment and to filter out creditworthy applicants for lending fear-free loans. It has also been noticed that a good credit score alone is no guarantee against loan default: in a huge number of cases, people with good credit scores have also defaulted on loan repayment. Financial experts therefore believe that other mechanisms should also be applied to cope with the loan-default problem, because loan default has many contributing factors. So, the objectives of this paper are to analyze the existing loan eligibility system using supervised machine learning, to develop a decision-making prediction model for loan eligibility through supervised machine learning algorithms, and to compare the results with the existing model and enhance the accuracy results with the developed loan eligibility model.
2 Related Works
Many research papers were studied to understand the different aspects of machine learning models related to loan prediction and to find useful information for proposing a more robust model.
Shinde et al. [1] used two machine learning models, logistic regression and random forest, with k-fold cross-validation to predict the eligibility of loan applicants. Using this method, their ML model reached an accuracy of 79 percent with 600 cases in the dataset.
A work by Ramya et al. [2] used a dataset of six hundred cases containing the attributes of loan applicants. This dataset was divided into two sets: a train dataset, used for training the machine learning model, which carries all of the independent variables along with the target variable, and a test dataset, which carries all of the independent variables but not the target variable. The machine learning model was applied to forecast the target variable for the test dataset; a logistic regression model was used to forecast the binary result.
Another work by Sarkar et al. [3], "A Research Paper on Loan Delinquency Prediction", describes the design and development of a system that assists in detecting and formulating loan delinquency in a detailed way that can be easily understood by borrowers. Since it involves prediction work, they used a number of machine learning algorithms, i.e., neural networks and linear regression.
Sheikh et al. [4] used a logistic regression model for their proposed machine learning work on prediction of loan approval. They used a dataset of 1500 cases with a number of features and applied various data preprocessing techniques to reach the optimal accuracy of the machine learning model.
Vaidya [5] introduces how the applied domain is moving swiftly toward automating the process, the importance of automating the process, and the introduction of artificial intelligence and machine learning algorithms into it. Decision-making machines are the further enhancement of this trend after machine learning models. The explained models are implemented with several machine learning algorithms. With a more ample set of co-related factors, the paper exercised logistic regression as the machine learning technique for the likelihood and forecasting approach and achieved an accuracy between 70 and 80%.
From the above related works, it can be seen that their ML prediction accuracy did not exceed 80 percent. In the financial sector, loan lending is the main source of profit. To increase profit, it is important for financial institutions that their loans be returned on time without their applicants defaulting. This proposed paper will help to reduce defaulting applicants and to filter creditworthy loan applicants with an accuracy of more than 90 percent. This accuracy will be reached with the help of a higher number of generated cases in the dataset used to train the model. The main challenge in the proposed model is to generate a dataset close to the actual dataset, because due to the sensitive nature of the information, datasets with many cases are not freely available: in the related works above, the maximum number of cases used was 1500.
3 Problem Statement
Financial institutions, financial organizations, and big and small non-banking finance companies (NBFCs) transact in several varieties of loans, like house loans, vehicle loans, business loans, personal loans, etc., in every region of a state or country. These financial companies provide services in cities, towns, and village regions. When an applicant files a loan application, the financial company validates the creditworthiness of the applicant before sanctioning or rejecting it. This paper provides a resolution for automating the loan eligibility process by deploying several machine learning algorithms. Applicants apply on the company's portal with a simple application form and get the status of their application. The application form consists of fields like gender, marital status, academic qualifications, dependent member details, yearly income, loan amount, customer credit history, and other related items. Using a machine learning algorithm to automate this process, the model picks out the facts about the applicants who deserve to have a loan amount sanctioned, so that financial companies can focus on those applicants. Loan eligibility forecasting is a real-life matter: almost every financial/NBFC organization must address its loan-sanctioning process. If the loan qualification procedure is unmanned, it can cut back many hours of resources and upgrade the capacity of services to customers in other operations. Enhancing customer assistance with satisfactory service and reducing processing costs are always important. Nevertheless, the financial gain can only be realized if financial companies have a sturdy and trustworthy model for error-free forecasting of which customers' loans should be admitted and which rejected, so as to decrease the fear of the loan-default problem, or NPAs.
4 Proposed Model
The dataset was downloaded from Kaggle, which provides a huge source of datasets for learning purposes. This kind of data is very sensitive in nature, so the financial institutions that publish it for learning purposes release only a small number of cases. The download consists of two sets: one for training and one for testing. The training dataset is used to train the model after being divided in a 70:30 ratio: the bigger part is used to train the model, and the smaller part is used to test it, so the accuracy of the developed model can be calculated.
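A minimal sketch of this split follows, assuming pandas and scikit-learn; the train.csv file name is hypothetical, and the column names follow the dataset structure described in the next subsection. In practice, the preprocessing sketched later would run before this split.

```python
# Hedged sketch: load the Kaggle training file and split it 70:30.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")
X = pd.get_dummies(df.drop(columns=["Loan_ID", "Loan_Status"]))  # encode categoricals
y = df["Loan_Status"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
```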
Structure of Dataset This section describes the attributes of the dataset considered in developing the machine learning model.
The dataset has 13 attributes with different data types. Loan_ID is an object type holding the loan identifier. Gender is an object type holding the applicant's gender, i.e., male/female. Married is an object type holding whether the applicant is married, as yes/no. Dependents is an object type holding the number of dependents of the applicant. Education is an object type holding the applicant's qualification. Self_Employed is an object type holding the applicant's self-employment status. ApplicantIncome is an int64 type holding the applicant's income. CoapplicantIncome is a float64 type holding the co-applicant's income. LoanAmount is a float64 type holding the loan amount in USD. Loan_Amount_Term is a float64 type holding the loan term in months. Credit_History is a float64 type holding whether the applicant meets the criteria. Property_Area is an object type holding whether the property lies in a city, town, or village area. Loan_Status is an object type holding whether the loan is approved or rejected.
Cases Generation of Dataset As seen in all the previous related works, the number of cases in their datasets is small, and a low number of cases impacts the performance of a machine learning model. So, for this model, more cases were generated using data-generation techniques with the help of the following online dataset case generation tools:
• Faker
• Mockaroo
• Mock Turtle.
After using these tools, the number of cases increased from 614 to 4298. The increased number of cases in the dataset positively impacted the accuracy of the developed machine learning model; an illustrative sketch of such a generation step is given after Fig. 1.
Figure 1 shows sample cases of the dataset after case generation, i.e., 4298 rows × 13 columns.
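The following hedged sketch shows how such synthetic cases might be produced with Faker and Python's random module; the field names match the schema above, but every value range here is an assumption, not the settings actually used with Faker, Mockaroo, or Mock Turtle.

```python
# Hedged sketch: generate synthetic loan-application cases with Faker.
import random
from faker import Faker

fake = Faker()

def synthetic_case():
    return {
        "Loan_ID": fake.unique.bothify(text="LP######"),
        "Gender": random.choice(["Male", "Female"]),
        "Married": random.choice(["Yes", "No"]),
        "Dependents": random.choice(["0", "1", "2", "3+"]),
        "Education": random.choice(["Graduate", "Not Graduate"]),
        "Self_Employed": random.choice(["Yes", "No"]),
        "ApplicantIncome": random.randint(1500, 20000),
        "CoapplicantIncome": round(random.uniform(0, 10000), 2),
        "LoanAmount": round(random.uniform(50, 600), 1),
        "Loan_Amount_Term": random.choice([120.0, 180.0, 240.0, 360.0]),
        "Credit_History": random.choice([1.0, 0.0]),
        "Property_Area": random.choice(["Urban", "Semiurban", "Rural"]),
        "Loan_Status": random.choice(["Y", "N"]),
    }

extra_cases = [synthetic_case() for _ in range(4298 - 614)]  # grow 614 -> 4298 rows
```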
After the data are collected, the next step is exploring the data for better understanding. Understanding the data is very important for filtering out values that would decrease the performance and precision of the deployed model: if all values are fitted into the model, outliers or extreme values will impact its performance and precision. Missing values are also important to fill, because they also impact the performance of the ML model. Data analysis of the dataset is likewise important for understanding the relations among its attributes. Figure 2 shows the statistical analysis of the data.
Exploratory data analysis (EDA) of the gender feature shows that 80 percent of applicants are male and 20 percent are female (see Fig. 3). Figure 4 shows credit history against loan status for paid and unpaid debts, where 1.0 indicates applicants with paid debts and 0.0 applicants with unpaid debts. It can be seen that loans were approved for 85 percent of the paid-debt applicants (1.0) and disapproved for 85 percent of the unpaid-debt applicants (0.0), as per the dataset after EDA.
Figure 5, a box plot, shows that the dataset has a lot of outlier values. These outliers are treated in the data preprocessing section to reduce their number in the dataset.
Data preprocessing is the most important part of developing a machine learning model: accuracy and performance depend heavily on it, and among the machine learning development steps, data preprocessing takes the maximum amount of time. Exploratory data analysis shows the following outcomes:
• The dataset has some missing values. Before applying the machine learning model, it is important to fill those missing values.
• The dataset has a certain number of extreme values, or outliers, that must be addressed before applying the machine learning model (see Fig. 5).
The problems described above are addressed in the following manner (see the sketch after this list):
• Addressing missing values: Missing values are filled by applying the mode method for each categorical attribute and the mean method for continuous attributes.
• Addressing outliers or extreme values: Extreme values are normalized/standardized by applying a log transformation to the attributes that have outliers.
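A minimal pandas sketch of both rules follows, continuing from the DataFrame loaded in the earlier sketch; the split of columns into categorical and continuous is an assumption for illustration.

```python
# Hedged sketch: mode-fill for categorical columns, mean-fill for continuous
# ones, then a log transform to tame the skewed loan amounts.
import numpy as np

categorical = ["Gender", "Married", "Dependents", "Self_Employed", "Credit_History"]
continuous = ["LoanAmount", "Loan_Amount_Term"]

for col in categorical:
    df[col] = df[col].fillna(df[col].mode()[0])   # mode method for classed variables
for col in continuous:
    df[col] = df[col].fillna(df[col].mean())      # mean method for continuous variables

df["LoanAmount_log"] = np.log(df["LoanAmount"])   # log transform reduces the outliers
```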
Figures 6 and 7 show the loan amount values before and after standardization/normalization of the data, respectively.
Using domain knowledge, some new features were engineered that positively impacted model accuracy and performance. Three new features were created in the dataset (see the sketch after this list):
• Total Income: The applicant's income and the co-applicant's income are combined, because the chances of loan approval are higher when the total income of the applicant is higher.
• EMI: Higher EMIs for a sanctioned loan might make it difficult for the applicant to pay back.
• Balance Income: The idea behind the balance income feature is that if the balance income is higher after paying EMIs, there is a better chance that the applicant can repay the loan amount.
See Fig. 8.
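A hedged sketch of the three engineered features follows; the simple amount/term EMI formula and the thousands-scale adjustment are assumptions, since the paper does not state its exact formulas.

```python
# Hedged sketch of the three engineered features on the same DataFrame.
df["Total_Income"] = df["ApplicantIncome"] + df["CoapplicantIncome"]
df["EMI"] = df["LoanAmount"] / df["Loan_Amount_Term"]
# Assuming LoanAmount is recorded in thousands, put EMI on the income scale.
df["Balance_Income"] = df["Total_Income"] - df["EMI"] * 1000
```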
• First, the dataset is collected, and then the exploratory data analysis (EDA) process is performed.
• On the basis of the EDA, the dataset is preprocessed, and both the train dataset and the test dataset are made ready for the machine learning modeling process.
• The machine learning model is built, and model validation is performed.
After reaching the optimal accuracy of the developed machine learning model, the model forecasts whether a loan applicant is eligible for endorsement of the loan or not.
This section of the paper describes the selection of the machine learning model. We are working on a classification problem in which the prediction considers whether a loan applicant is eligible for loan sanctioning or not. Many classification machine learning models are available, and each has different merits and limitations. For this proposed model, we use the k-nearest neighbor (KNN) algorithm, the random forest algorithm, and the support vector machine algorithm and compare their optimal accuracies for the loan eligibility prediction model.
KNN is a supervised machine learning algorithm best suited for classification and regression prediction problems. It is a type of lazy learning algorithm because it does not follow a special training phase: it only stores the dataset and acts on it at classification time. It is also known as a non-parametric learning algorithm because it makes no assumptions about the dataset. After applying the k-nearest neighbor (k-NN) algorithm with a value of k = 5, the ML model reached a maximum accuracy of 99.94 percent, a minimum accuracy of 99.77 percent, and an overall accuracy of 98.84 percent, with a recall score of 0.96, an F1 score of 0.97, and a precision score of 0.98.
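A minimal sketch of this step with scikit-learn follows, using k = 5 as stated and assuming the encoded split from the earlier sketch; pos_label="Y" assumes the Kaggle target coding.

```python
# Hedged sketch: k-NN with k = 5 plus the metrics quoted in the text.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
pred = knn.predict(X_test)
print("accuracy :", accuracy_score(y_test, pred))
print("recall   :", recall_score(y_test, pred, pos_label="Y"))
print("f1       :", f1_score(y_test, pred, pos_label="Y"))
print("precision:", precision_score(y_test, pred, pos_label="Y"))
```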
The next model used for the proposed system is the random forest model. This model is also applied to classification and regression problems, and it is more reliable for this problem. It too is a supervised machine learning algorithm. It is rooted in the decision tree algorithm, but it performs better than a single decision tree because its essential idea is to combine multiple decision trees when producing the final output, rather than relying on individual trees. A higher number of decision trees in a random forest leads to higher accuracy and also prevents the overfitting problem. After applying this algorithm to the dataset with a maximum depth of 10, the ML model reached a maximum accuracy of 95.09 percent, a minimum accuracy of 93.63 percent, and an overall accuracy of 93.7 percent, with a recall score of 1.0, an F1 score of 0.95, and a precision score of 0.91.
After the random forest model, the support vector machine (SVM) model is implemented for the proposed system. The support vector machine is also a supervised machine learning algorithm, applied to classification problems, regression, and outlier detection. This model is completely different from the other classification algorithms: for classification, SVM chooses the decision boundary that maximizes the distance from the closest data points of all the classes. This decision boundary is called a hyperplane, which helps to find the best boundary or best line between the classes. After applying the SVM algorithm with kernel = 'linear', the ML model reached a maximum accuracy of 76.93 percent, a minimum accuracy of 74.03 percent, and an overall accuracy of 75.52 percent, with a recall score of 1.0, an F1 score of 0.84, and a precision score of 0.73.
In this research paper, supervised machine learning models are used to forecast the loan eligibility of applicants on the basis of their credit history, total income (the sum of the applicant's and co-applicant's incomes), number of dependents, EMI, and employment status, to approve eligible loan applicants for further processing. After developing the ML models, it was found that the KNN model has better accuracy than the random forest and SVM models, and the SVM model has the lowest accuracy. A stratified k-fold cross-validation algorithm was also applied to the machine learning models to reach optimal accuracy with validation. All three models are tuned to optimal and satisfactory levels of accuracy, which will definitely help to reduce the non-performing asset problem of financial institutions.
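The following sketch mirrors this comparison under stated assumptions: max_depth = 10 for the random forest and kernel = 'linear' for the SVM follow the text, while the fold count and the assumption of an already numerically encoded X and y are illustrative.

```python
# Hedged sketch: stratified k-fold validation of the remaining two classifiers.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

models = {
    "RandomForest": RandomForestClassifier(max_depth=10, random_state=0),
    "SVM (linear)": SVC(kernel="linear"),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: min={scores.min():.4f} max={scores.max():.4f} mean={scores.mean():.4f}")
```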
This model could be enhanced further with more domain knowledge and more features. In this model, three engineered features are addressed, i.e., EMI, total income, and balance income; in the future, more features could be addressed, like age-related illness and the progress of startup businesses, drawing on loan-handling experience and a deeper understanding of why applicants fail to repay loans. The more diverse the collected scenarios of loan repayment or failure, the greater the future scope to enhance the system.
References
1. Shinde A, Patil Y, Kotian I, Shinde A, Gulwani R (2022) Loan prediction system using machine
learning. In: ICACC, vol 44, article no. 03019, pp 1–4. https://doi.org/10.1051/itmconf/20224403019
2. Ramya S, Jha PS, Vasishtha IR, Shashank H, Zafar N (2021) Monetary loan eligibility prediction
using machine learning. IJESC 11(7):28403–28406
3. Sarkar A, Sai KK, Prakash A, Sai GVV, Kaur M (2021) A research paper on loan delinquency
prediction. IRJET 08(4):715–722
4. Sheikh MA, Goel AK, Kumar T (2020) An approach for prediction of loan approval using
machine learning algorithm. In: 2020 International conference on electronics and sustainable
communication systems (ICESC). IEEE, pp 490–494
5. Vaidya A (2017) Predictive and probabilistic approach using logistic regression: application
to prediction of loan approval. In: 2017 8th International conference on computing, commu-
nication and networking technologies (ICCCNT). IEEE, pp 1–6. https://doi.org/10.1109/ICCCNT.2017.8203946
6. Madan M, Kumar A, Keshri C, Jain R, Nagrath P (2020) Loan default prediction using decision
trees and random forest: a comparative study. IOP Conf Ser Mater Sci Eng 1022(012042):1–12
7. Supriya P, Pavani M, Saisushma N, Kumari NV, Vikash K (2019) Loan prediction by using
machine learning models. Int J Eng Tech (IJET) 5(2):144–148
8. Raj JS, Ananthi JV (2019) Recurrent neural networks and nonlinear prediction in support vector
machine. JSCP 1(1):33–40
9. Jency XF, Sumathi VP, Sri JS (2018) An exploratory data analysis for loan prediction based
on nature of clients. Int J Recent Technol Eng (IJRTE) 7(4S):176–179
10. Turkson RE, Baagyere EY, Wenya GE (2016) A machine learning approach for predicting bank
credit worthiness. In: 2016 Third international conference on artificial intelligence and pattern
recognition (AIPR). IEEE, pp 1–7. https://doi.org/10.1109/ICAIPR.2016.7585216
11. Kim H, Cho H, Ryu D (2018) An empirical study on credit card loan delinquency. Econ Syst
(Elsevier) 42(3):437–449
12. Tariq HI, Sohail A, Aslam U, Batcha NK (2019) Loan default prediction model using sample,
explore, modify, model and assess (SEMMA). J Comput Theor Nanosci 16(8):3489–3503
13. Prasad P, Tripathi K (2021) Natural scene text localization and removal: deep learning and
navier-stokes inpainting approach. In: 2021 IEEE International conference on electronics,
computing and communication technologies (CONECCT). IEEE, pp 1–5. https://doi.org/10.
1109/CONECCT52877.2021.9622609
14. Malhotra R, Kaur K, Singh P (2021) Wavelet based image fusion techniques: a comparison
based review. In: 2021 6th International conference on communication and electronics systems
(ICCES). IEEE, pp 1148–1152
15. Gomathy CK, Charulata, Aakash, Sowjanya (2021) The loan prediction using machine learning.
Int Res J Eng Technol 08(10):1322–1329
16. Dosalwar S, Kinkar K, Sannat R, Pise N (2021) Analysis of loan availability using machine
learning techniques. IJARSCT 9(1):15–20
Prediction of Osteoporosis Using
Artificial Intelligence Techniques:
A Review
S. K. Chawla
Computer Science and IT, Central University of Jammu, Jammu & Kashmir, India
e-mail: scsachin110@gmail.com
D. Malhotra (B)
Department of Computer Science and IT, Central University of Jammu, Jammu & Kashmir, India
e-mail: deepti.csit@cujammu.ac.in
1 Introduction
Over 200 million people worldwide, mostly the elderly, suffer from osteoporosis, a metabolic bone disease [1–5]. Reduced bone density increases bone fragility and fracture susceptibility; spinal compression fractures are the most common type. According to the World Health Organization, osteoporosis is defined as a bone mineral density that is 2.5 standard deviations or more below the mean for a young, healthy adult (a DXA T-score ≤ −2.5) [6–9].
Because not all physicians have access to this technology, it may not be feasible to routinely test the general population using DXA, even though early detection can reduce the risk of future morbidity and mortality from fracture-related consequences. To screen patients accurately and decrease over-diagnosis and misdiagnosis, extensive study has been done on when and how to utilize DXA and how to avoid giving patients a false sense of security [10–14]. Osteoporosis is commonly diagnosed in clinical settings using a battery of clinical tests, though some of the treatments can be pricey and a few of the tests might generate erroneous results as a consequence of anthropogenic or other chemical flaws [15–20].
As a result, we need an expert system that can scan medical records in a variety of formats and deliver accurate, dependable results without fatigue or error [21]. With the growing trend of technology, AI techniques have evolved in medical diagnosis and have been found to be a promising solution for improving health care [22]. Many CAD systems based on demographic parameters (age, gender, eating habits, and lifestyle) and vision-based modalities (MRI, DXA, and CT scans) have recently been developed in the field of osteoporosis with better accuracy [23–25]. However, such systems are not deployed in clinical practice, possibly because of high cost or difficult data-acquisition procedures [13]. Keeping this in mind, researchers are attempting to develop a cost-effective system using AI techniques and X-ray images that is convenient, can be used in real time, and yields better results with greater accuracy (Fig. 1).
2 Literature Review
This section provides a brief insight into the existing osteoporosis detection
techniques introduced by the various researchers covering the period 2016–2021.
• Gao et al. [1] provided a systematic study showing how the use of medical images and AI techniques to diagnose osteoporosis has developed. The bias and quality of the included studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) methodology. They reported an accuracy of 95 percent.
• Ho et al. [2] note that dual-energy X-ray absorptiometry (DXA) is the gold standard in this discipline and has been recommended as the primary method for determining bone mineral density (BMD), an indicator of osteoporosis; standard radiography is recommended for osteoporosis screening as opposed to diagnosis. They achieved an accuracy of 88 percent.
• Fang et al. [3] developed a fully automatic method for bone mineral density (BMD) computation in CT images using a deep convolutional neural network (DCNN) and outlined a plan for using deep learning for individuals with primary osteoporosis. They were 82 percent accurate on average.
• Shim et al. [4] proposed ML algorithms that are highly accurate at predicting the risk of osteoporosis. They could assist primary care professionals in identifying which female patients at high risk of osteoporosis should receive a bone densitometry examination. Their models achieved accuracies of 0.742 (ANN), 0.724 (SVM), and 0.712 (KNN).
• Yamamoto et al. [5] built a model using convolutional neural networks (CNNs) based on hip radiographs to diagnose osteoporosis at an early stage. They achieved an accuracy of 93 percent.
• Yasaka et al. [6] propounded a method based on unenhanced abdominal computed tomography (CT) images, in which a deep learning method calculates the bone mineral density (BMD) of the lumbar vertebrae. They achieved an accuracy of 97 percent.
• Kalmet et al. [7] found that, compared to doctors and orthopedists, the CNN performed better, with an accuracy of 99%.
• Park et al. [8] aimed to help doctors make an accurate diagnosis of colorectal cancer. Their recommended CNN design, made up of one fully connected layer and 43 convolutional layers, achieved an accuracy of 94.39 percent.
• La Rosa [9] developed a robotized computer-aided diagnostic strategy for the detection of OA that incorporates a variety of previously offered standardizations, like MLR and ICA feature extraction. The recommended CAD has higher categorization rates than those found in past investigations. The multivariate linear regression (MLR) technique was used to analyze 1024 X-ray pictures of people with osteoarthritis.
• Ferizi et al. [10] applied deep learning algorithms to analyze MRI images of knee bones using the 3D MRI datasets SKI10, OAI Imorphics, and OAI ZIB. The 3D CNN and SSM models were used in this analysis; their accuracies on the SKI10, OAI Imorphics, and OAI ZIB 3D MRI datasets were 75.73 percent, 90.4 percent, and 98.5 percent, respectively.
• Rehman et al. [11] applied deep learning methods to X-ray pictures of osteoarthritis patients using the KNN and SVM algorithms. KNN was found to be 100% accurate and SVM 79% accurate for normal images, whereas both KNN and SVM were found to be 100% accurate for abnormal images.
• Jaiswal et al. [12] analyzed knee osteoarthritis from X-ray images using deep learning techniques. The authors described a novel method for noninterventional detection and analysis of knee OA using X-rays, which might make the process of finding a treatment for knee pain more efficient. With a separate testing set, their approach yields the best multi-class grouping results: a mean multi-class accuracy of 66.71 percent, a radiographic OA accuracy of 0.93, a quadratic weighted Kappa of 0.83, and an MSE of 0.48, comparable to typical human assessment. The study included 3000 patients from the osteoarthritis initiative dataset.
• Reshmalakshmi and Sasikumar [13] used MRI imaging of knee bones to diagnose knee bone disorders. The technique made use of online MRI datasets and employs the scale-space local binary pattern feature extraction method. The accuracy of the research was 96.1 percent.
• Antony et al. [14] used MR images of the knee joint to identify cartilage lesions, such as cartilage softening, fibrillation, fissuring, focal defects, broad thinning due to cartilage degeneration, and acute cartilage damage. They achieved 87.9% accuracy.
• Liu et al. [15] report that the results of total knee arthroplasty (TKA) using patient-specific instruments (PSIs) are comparable to those of TKA using conventional equipment in terms of post-operative radiography results. They had a 95% accuracy rate.
• Ebsim et al. [16] describe a strategy for integrating different detection techniques to find femur and radius fractures; these algorithms extract several types of characteristics to detect fractures. They were 82.6 percent accurate.
• Deniz et al. [17] assessed conventional radiography and MRI for early diagnosis and classification of osteoporotic vertebral fractures (OVFs) and measured the rate of OVF misdiagnosis. They were 81 percent accurate.
• Reshmalakshmi and Sasikumar [18] demonstrate an automatic proximal femur segmentation technique based on a deep convolutional neural network (CNN). They had a 95% accuracy rate.
• Gornale et al. [19] determined whether quantitative susceptibility mapping (QSM) accurately measures postmenopausal women's osteoporosis. They had an 86 percent accuracy.
• Schotanus et al. [20] draw a conclusion by integrating the results of a traditional X-ray image processing method with a fuzzy expert system used to calculate the degree of osteoporosis. To indicate the likelihood of osteopenia, the authors display the percentage reduction in bone density; they determined the ratio of trabecular to total bone energy to be 0.7985, indicating a loss of bone density of 7.985 percent.
• Chen et al. [21] use a DCNN to automatically assess the degree of knee osteoarthritis from radiographs. When adjusted for regression loss, the network's multi-class grade 0–4 classification accuracy in this study is 59.55 percent.
• Hordri et al. [22] segmented a knee X-ray picture using the active contour algorithm before applying several feature extraction methods. Using a random forest classifier, the retrieved features yielded an accuracy rate of 87.92 percent.
3 Comparative Analysis
Table 1 Comparative analysis of existing prediction techniques (recoverable rows)

| Author | Objectives | Data | Input data amount | Methods | Trained data | Main results |
|---|---|---|---|---|---|---|
| La Rosa [9] (2019) | A decision support tool for the early identification of knee OA utilizing X-ray imaging and machine learning, developed using data from the OA initiative | X-ray | 1024 images | MLR | Train data 80%, test data 20% | AUROC 82.9% |
| Ferizi et al. [10] (2019) | Automated knee bone and cartilage segmentation for the osteoarthritis initiative using statistical shape knowledge and convolutional neural networks | MRI | 3D MRI datasets SKI10, OAI Imorphics, OAI ZIB | 3D CNNs, SSMs | 3D MRI datasets SKI10, OAI Imorphics, OAI ZIB | 75.8%, 90.4%, 98.4% |
| Rehman et al. [11] (2019) | Identification of knee osteoarthritis using texture analysis | X-ray | X-ray images of OA patients | KNN, SVM | Train data 70%, test data 30% | Normal: KNN = 99%, SVM = 79%; abnormal: KNN = 100%, SVM = 100% |
| Gornale et al. [19] (2018) | An additional and accurate indicator of osteoporosis in postmenopausal women: bone susceptibility mapping with MRI | MRI | QCT images dataset | CNN | 70% trained, 30% testing | 87% |
| Schotanus et al. [20] (2016) | Osteoporosis detection using fuzzy inference | X-ray | 20 patients' images | FEDI | 20 patients' images | 79% |
| Chen et al. [21] (2016) | DCNN for determining the severity of radiographic osteoarthritis in the knee | X-ray | 8892 images of knee joints | DCNN | 7030 images | 95% |
| Hordri et al. [22] (2016) | Identifying osteoarthritis from a knee X-ray image | X-ray | 200 knee images | Random forest classifier | 40% trained, 60% testing | 87.92% |
Deep learning models can learn features directly from image or visual input and can do so with astounding accuracy, in certain cases even better than human performance.
Figure 3 depicts the percentage usage rate of the various medical data modalities for analysis of osteoporosis. From this figure, it can be observed that there is extensive research on X-ray image data for early osteoporosis prediction, due to its ease of availability and lower cost in comparison with the other modalities (MRI and CT scans).
This section proffers the existing gaps and challenges in the field of early osteoporosis
prediction.
1. Insufficient Standardized Datasets Specifically for images of the knee joint,
few standardized sample datasets are available. The generalizability of a model
may be constrained by the fact that researchers have created their datasets under
controlled conditions. Hence, there is a dire need to create open-source datasets
that will help the research community build a robust model for early osteoporosis
prediction [17–20, 22].
2. Sample Size in Osteoporosis Prediction Models In the case of X-ray images,
open-source datasets are available on multiple sites, but there is a huge amount
of heterogeneity in the data, which in turn affects the performance of the model.
To overcome this heterogeneity, transfer learning techniques could be employed.
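A minimal sketch of such a transfer learning setup is given below. It assumes an ImageNet-pretrained ResNet-18 backbone from torchvision; neither the backbone nor the library is prescribed by the studies surveyed here, so this is illustrative only:

# Hedged sketch: transfer learning for knee X-ray classification.
# Assumption: torchvision's ImageNet-pretrained ResNet-18; the surveyed
# studies do not prescribe a specific backbone.
import torch.nn as nn
from torchvision import models

# Load a backbone pretrained on a large, heterogeneous image corpus.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor so only the new head is trained,
# which helps when the target X-ray data is small and heterogeneous.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the five severity classes used later
# in this study (Healthy, Doubtful, Minimal, Moderate, Severe).
model.fc = nn.Linear(model.fc.in_features, 5)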
Fig. 3 Percentage utilization ratio of data modalities in early osteoporosis prediction
From the exhaustive literature, it has been observed that various researchers have
developed X-ray imaging data and intelligent systems for osteoporosis detection, but
no study has been conducted to classify osteoporosis subclasses. Hence, there is a need
to develop an efficient framework with the potential to classify osteoporosis and its
subclasses. Figure 4 shows the percentage usage of various body regions in detecting
early osteoporosis. From this figure, we observed that very little work has been done on
the knee region in this field due to limited dataset availability and heterogeneity in
osteoporosis datasets. Our proposed framework is based on knee X-ray images. Keeping
this essence in mind, this study proposes an intelligent osteoporosis classifier based on
knee X-ray imaging data. We propose a deep learning model with the potential to
classify osteoporosis and its subclasses. Data collection, data preprocessing, the
prediction model, and results are among the main elements (Fig. 5).
Data collection is the most important aspect of the diagnosis system, and selecting
an appropriate sample for machine learning trials is crucial. The dataset will be
taken from the Kaggle repository and contains knee X-ray data on knee joints (1656
images) spanning a variety of severity categories; the image categories are described
in the data labeling step below.
After data collection, preprocessing of the dataset will be employed to improve the
images, so that the data might aid in the early detection of osteoporosis cases at
various stages. We will resize each input image while maintaining the aspect ratio in
the initial preprocessing stage to lower the training costs. Additionally, to balance
the dataset, we will perform up-sampling and down-sampling. During the up-sampling
procedure, areas are randomly cropped to increase minority classes. Flipping and 90°
rotation are commonly used to balance the samples of the various classes, improve the
dataset, and prevent overfitting. To match the cardinality of the smallest class,
instances of majority classes are removed during the down-sampling process. Each image
in the generated distributions is mean normalized to remove feature bias and minimize
training time before being flipped and rotated.
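The following is a minimal sketch of these preprocessing steps, assuming PIL/NumPy and an illustrative target width of 224 px (the text does not fix exact sizes):

# Hedged sketch of the described preprocessing; sizes are illustrative.
import numpy as np
from PIL import Image

def preprocess(path, target_width=224):
    img = Image.open(path).convert("L")  # knee X-rays treated as grayscale
    # Resize while maintaining the aspect ratio to lower training costs.
    w, h = img.size
    img = img.resize((target_width, max(1, round(h * target_width / w))))
    arr = np.asarray(img, dtype=np.float32)
    return arr - arr.mean()              # mean normalization removes bias

def augment(arr):
    # Flipping and 90-degree rotation, used to balance minority classes.
    return [arr, np.fliplr(arr), np.rot90(arr)]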
In a machine learning model, data labeling is an essential part of the data prepro-
cessing stage. The quality of a dataset can be significantly impacted by any error
or inaccuracy made in this procedure; additionally, a predictive model's overall
effectiveness could be degraded, resulting in misleading conclusions. The proposal uses
the dataset acquired from the Kaggle repository, which divides the images into
5 classes: Class 0 is Healthy; Class 1 is Doubtful; Class 2 is Minimal; Class 3 is
Moderate; Class 4 is Severe.
5.2.2 Normalization
5.2.3 Resizing
Image resizing is essential to ensure that all the input images are the same size,
because their sizes vary. Deep learning models must scale all input images to the same
size before they are fed into the model, because such models often learn quickly on
smaller images and accept inputs of a uniform size.
5.2.4 Denoising
A DCNN will be used in the proposed model to identify and categorize images. It makes
use of a hierarchical model to construct a network that resembles a funnel before
producing a fully connected layer where the output is processed and all neurons are
connected. The main advantage of the DCNN over existing techniques is that it
automatically extracts key features without the need for human intervention. A DCNN
would therefore be a great option for spotting osteoporosis in its early phases.
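A minimal Keras sketch of such a funnel-shaped DCNN for the five classes is given below; the layer sizes and input size are illustrative assumptions, as the study does not specify an exact architecture:

# Hedged sketch: a funnel-shaped DCNN for 5-class knee X-ray grading.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 1)),           # assumed input size
    # Hierarchical "funnel": spatial size shrinks while feature depth grows.
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),        # fully connected layer
    layers.Dense(5, activation="softmax"),       # Healthy ... Severe
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])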
6 Conclusion
After reviewing the existing literature relating to automated prediction models
for early osteoporosis detection, the findings suggest that machine learning and deep
learning techniques have been frequently used in the osteoporosis diagnostic field. From
the exhaustive study, it has been observed that nearly 70% of the studies employed
machine learning techniques and 30% employed deep learning techniques; however, work on
osteoporosis prediction based on deep learning remains limited. The prime focus of this
study is to critically assess and analyze AI-based models for early osteoporosis
prediction using several modalities (X-ray, MRI, CT scans) and AI techniques over the
years 2016–2021. After carefully examining the existing gaps and challenges, this paper
elucidates some future directions that need to be addressed and proposes an intelligent
osteoporosis classifier using X-ray imaging data and a deep convolutional neural
network. This classifier will be implemented in the future and will pose as a potential
aid to research scholars and health practitioners by providing a more precise,
effective, and timely diagnosis of osteoporosis.
References
22. Hordri NF, Samar A, Yuhaniz SS, Shamsuddin SM (2017) A systematic literature review on
features of deep learning in big data analytics. Int J Adv Soft Comput Appl 9(1):32–49
23. Gornale SS, Patravali PU, Manza RR (2016) Detection of osteoarthritis using knee X-ray
image analyses: a machine vision-based approach. Int J Comput Appl 145:20–26
24. Madani A, Moradi M, Karargyris A, Syeda-Mahmood T (2018) Semi-supervised learning with
generative adversarial networks for chest X-ray classification with the ability of data domain
adaptation. In: 2018 IEEE 15th International symposium on biomedical imaging (ISBI 2018),
pp 1038–1042. https://doi.org/10.1109/ISBI.2018.8363749
25. Sharma AK, Toussaint ND, Elder GJ, Masterson R, Holt SG, Robertson PL, Ebeling PR,
Baldock P, Miller RC, Rajapakse CS (2018) Magnetic resonance imaging-based assessment of
bone microstructure as a non-invasive alternative to histomorphometry in patients with chronic
kidney disease. Bone 114:14–21. https://doi.org/10.1016/j.bone.2018.05.029
26. Marongiu G, Congia S, Verona M, Lombardo M, Podda D, Capone A (2018) The impact
of magnetic resonance imaging in the diagnostic and classification process of osteoporotic
vertebral fractures. Injury 49(Suppl 3):S26–S31. https://doi.org/10.1016/j.injury.2018.10.006
Analyzing Stock Market with Machine
Learning Techniques
Abstract The financial market is extremely volatile, and this unstable nature of the
stock market is not easy to understand. But technological advancements have given
a ray of hope that it might be possible that one can make the machines understand
this level of volatility and can make accurate predictions about the future market
prices. This paper emphasizes various techniques by which machines can learn the
financial markets and their future trends/movements. This paper has made use of
four such techniques along with sentiment analysis on the news related to the under-
taken tickers. This study shows that classification techniques give a good estimate of
unusual highs and lows of the market, which in turn can prove helpful for the traders
in taking timely and accurate decisions, i.e., bullish or bearish trends. This study is
focused on determining the trends of the market while considering not only the stock
trends but also the sentiments of the news headlines, using the polarity scores. The
ensembled technique has given better results than other techniques in terms of R2
score and mean absolute error.
1 Introduction
The number of investors in the stock market is growing enormously every day. From
the very early days of the stock market, investors have used traditional ways of
trading, i.e., statistical analysis, fundamental analysis, past episodes of events in
the markets, and technical indicators. These techniques need continuous involvement of
the investor and are very time-consuming. The increasing number of
traders makes investors constantly look for better, more accurate, trustworthy, and
faster ways to make predictions.
Technological advancements are making it possible to automate all of these tasks. But
matching the accuracy of predictions made by experienced and professional traders, who
have been in the stock field for generations, is not easy. With the availability of big
data and machine learning techniques, developing new algorithms or methodologies for
stock market prediction is very much possible. Mehta et al. [1] show in their paper
that finance-related news has a significant effect on next-day stock prices. News
headlines scraped from a reputed web portal, i.e., moneycontrol.com [2], using a
customized Python-based script were then classified by ticker along with the
corresponding sentiments derived from the news dataset.
Qualitative analysis shows that news data affects stock price movements [3], and the
two are interdependent. This study considered only news headlines instead of entire
news articles, presuming that the headlines alone have a significant effect on
investors and that headlines are a trustworthy and rapid way of influencing the stock
market. A quantitative study was done with data gathered using the Yahoo Finance API.
2 Review of Literature
The majority of stock market forecasting revolves around statistical and technical
analysis, though various studies have now shown clear evidence of a strong
interdependency between news data and stock movements. Mohan et al. [4] used variations
of RNN-LSTM in terms of input, i.e., prices, text polarity, text, and a multivariate
model, along with Facebook Prophet and ARIMA. The results show a good relationship
between stock prices and news data. They obtained promising results with RNN, but their
model failed when it came to lower stock prices and high volatility, whereas Mehta et
al. [1] implemented an SM prediction tool taking in sentiments, news data, and past
data for forecasting SM prices. The authors used SVM, LR, NB, and long short-term
memory. Akhtar et al. [5] preprocessed the historical dataset and then applied machine
learning techniques, i.e., Random Forest and SVM classifiers. The authors achieved an
accuracy of 78.7% with SVM and 80.8% with the RF classifier. As per them, additional
parameters, such as financial ratios and the sentiments of the masses used to figure
out the interrelationship of clients with employees, can be used as input. Emioma and
Edeki [6] used a least-squares LR model with close price as the dependent variable and
everyday stock value as the independent variable. Their model made good predictions,
with 1.367% MAPE and 0.512 RMSE, and is reusable with tickers of other stock markets.
Sharma et al. [7] discussed technical analysis, fundamental analysis, basic technical
indicators, and some of the most popular machine learning techniques for forecasting.
In a further study, they proposed a hybrid model for stock market forecasting on the
basis of news sentiments and a historical dossier on Nifty50 data [8].
For Thormann et al. [9], long short-term memory with lagged closing prices served as
the basis of their study. Their study acts as a guide on using and preprocessing
Twitter data, which in turn is combined with technical indicators to forecast AAPL.
Their model performs better than the undertaken base model. Kedar [10] used fresh
Twitter data and integrated the results of sentiment analysis of these tweets with the
results of the ARIMA model applied over the historical dataset. Their study shows that
a change in the company leads to a change in accuracy level due to COVID-19.
Chen et al. [11] suggested a hybrid model based on XGBoost integrated with the firefly
algorithm for forecasting; firefly helps in the optimization of the hyperparameters of
XGBoost. Then, stocks with high potential are selected, and MV (mean–variance)
optimization is applied over them. Their hybrid model performs far better than the
other state-of-the-art models. They attribute the success of the model to considering
characteristics of the SM that may affect upcoming prices, to increasing accuracy
using their own IFAXGBoost model, and to using the MV model to better allocate assets.
Sarkar et al. [12] proposed a model that combines the ways investors and traders make
predictions with technical and financial analysis using news headlines, and made
predictions about Google prices. They used sentiment analysis for the headings of
collected news along with a long short-term memory model for historical data. Their
approach shows a prominent improvement in the forecasting results.
Gondaliya et al. [13] worked with a dataset influenced by COVID-19. They selected and
applied the top six algorithms, i.e., DT, KNN, LR, NB, RF, and SVM, to the data. LR and
SVM showed better forecasting solutions. They suggested that these top algorithms can
be used as input for building a robust forecasting model.
Gupta and Chen [14] analyzed the effect of sentiments derived through StockTwits. The
analysis is done via extraction of features from the tweets and application of ML
techniques. The relation between the average SM prices on an everyday basis and the
daily headlines is studied. The extracted sentiment feature is then used with past data
to improve the SM price predictions.
Li et al. [15] proposed a new approach integrating technical indicators with
sentiments. A two-layered LSTM is proposed to learn five years of time-series data. The
base models MKL and SVM are used for comparison with the proposed approach; LSTM gave
better results, i.e., accuracy and F1-score, in comparison with MKL and SVM. Yadav and
Vishwakarma [16] examined various important architectures related to sentiment
analysis. They presented DL models such as various types of neural networks, including
bi-directional recurrent neural networks, and derived the conclusion that LSTM gives
more appropriate predictions than other DL techniques. They also highlighted the
various languages being used for performing SA.
3 Proposed System
3.1 Overview
This study aims to categorize stocks as per their respective industries and derive
sentiments from the corresponding news data. Figure 1 shows the architecture of the
proposed system. There are three major parts: (1) data collection using the Yahoo
Finance API and preprocessing of collected news headlines, (2) categorization of
tickers using various models, and (3) analysis of the categorized (derived) labels.
The Yahoo Finance API with Python is used to collect the historical dataset of two
years, whereas a custom Python script is used for scraping news headline data. This
approach made it possible to extract specific words from the news headlines, which is
the basic requirement of our model. Specific rules were defined for generating a rule
set for building the dictionary. Nifty50 is one of the topmost indices of India. The
features included were open, close, volume, and headline data only; detailed news
articles were not taken into consideration. The news was gathered on a daily basis to
get up-to-date data about the latest tendency of the stock market.
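A minimal sketch of this collection step is shown below; the yfinance package and the NSE ticker "INFY.NS" are assumptions for illustration, as the paper only states that the Yahoo Finance API with Python was used:

# Hedged sketch: two years of daily historical data via Yahoo Finance.
import yfinance as yf  # assumed wrapper; the paper names only the API

hist = yf.download("INFY.NS", period="2y", interval="1d")
features = hist[["Open", "Close", "Volume"]]  # features used in this study
print(features.tail())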
News headlines from a reputed and trustworthy financial website, i.e., moneycon-
trol.com [2], were used. While collecting the news, factors such as considering only
news headlines specific to the undertaken tickers and eliminating detailed news
articles were kept in mind. After finalizing the list of news, the headlines were
preprocessed with the use of regular expressions [11]. Hashtags and other references
were also rejected so that classification of the text could be done unbiasedly.
Preprocessing steps including lowercase conversion, stemming, and lemmatization were
applied to the news data, and preprocessing of the historical data was also performed.
Features were selected according to the level at which they affect the close price of
the stocks. Both datasets were checked for null values, which were replaced with mean
values.
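A minimal sketch of this headline cleaning, assuming NLTK's Porter stemmer and WordNet lemmatizer (the paper names the steps but not the exact implementations), is:

# Hedged sketch of the headline preprocessing steps described above.
import re
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()

def clean_headline(text):
    text = text.lower()                              # lowercase conversion
    text = re.sub(r"#\w+|@\w+|http\S+", " ", text)   # drop hashtags/references
    text = re.sub(r"[^a-z\s]", " ", text)            # keep letters only
    words = [lemmatizer.lemmatize(stemmer.stem(w)) for w in text.split()]
    return " ".join(words)

print(clean_headline("INFY surges 5% after #results beat estimates!"))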
This study made use of news headlines that were scraped and filtered for a specific
ticker, the company name, i.e., INFY. Polarity scores of all the text data, combined
based on the date, were calculated. The Python libraries nltk, punkt, and vader were
used for sentiment analysis. After the calculation of polarity scores, the datasets
were combined based on the date column (intersection of dates), as there were days when
the stock market was closed due to weekends or national holidays, and days when no
major or meaningful news was available. The combined dataset was then used as input for
different machine learning techniques. Our model used a Random Forest Regressor [17], a
Gradient Boosting Regressor [22], a Decision Tree Regressor [23], and finally a voting
regressor technique that integrates the three listed above. The split ratio used was
0.2 (Figs. 2, 3 and 4).
Fig. 3 Polarity Score and open price as inputs for close price predictions
Fig. 4 Polarity score, open price, no. of trades, low price, high price as inputs for close price
predictions
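The following is a minimal sketch of this pipeline, assuming scikit-learn regressors with default hyperparameters and illustrative stand-in data (the actual scraped headlines and prices are not reproduced here):

# Hedged sketch: VADER polarity + voting ensemble, split ratio 0.2.
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# Illustrative stand-ins for the scraped headlines and Yahoo Finance prices.
dates = pd.date_range("2022-01-03", periods=6, freq="B").astype(str)
news = pd.DataFrame({"date": dates,
                     "headline": ["INFY beats profit estimates"] * 3 +
                                 ["IT stocks slump on weak guidance"] * 3})
prices = pd.DataFrame({"date": dates,
                       "Open": [1890.0, 1902.5, 1876.0, 1860.4, 1881.1, 1899.9],
                       "Close": [1905.3, 1871.2, 1869.0, 1874.6, 1895.2, 1903.7]})

# Compound polarity per headline, combined per date, then merged on dates
# present in both datasets (market holidays and news-free days drop out).
news["polarity"] = news["headline"].map(
    lambda h: sia.polarity_scores(h)["compound"])
daily = news.groupby("date", as_index=False)["polarity"].mean()
data = prices.merge(daily, on="date")

X, y = data[["Open", "polarity"]], data["Close"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)

# Voting regressor integrating the three regressors listed above.
ensemble = VotingRegressor([("rf", RandomForestRegressor()),
                            ("gb", GradientBoostingRegressor()),
                            ("dt", DecisionTreeRegressor())])
ensemble.fit(X_tr, y_tr)
print("R2 score:", ensemble.score(X_te, y_te))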
5 Conclusion
This paper shows the analysis and comparison of four regression techniques, namely the
RF Regressor, GB Regressor, DT Regressor, and an ensembled regression technique, for
stock market prediction. Python's nltk, punkt, and vader APIs were used for sentiment
analysis. The model can be used for forecasting the next three to five days for any
NIFTY-indexed stock. It can be concluded that polarity used as the only feature is
non-dominant, but when combined with the open price, it gives excellent results. In the
future, improved methods of deciding polarity scores for sentiment analysis of news can
be used. To derive the sentiments of stakeholders, the financial annual reports of the
companies or tweets can also be used, which may enhance the prediction results.
Moreover, hyperparameter tuning of the models can further improve the model.
References
1. Mehta P, Pandya S, Kotecha K (2021) Harvesting social media sentiment analysis to enhance
stock market prediction using deep learning. PeerJ Comput Sci 7:1–21. https://doi.org/10.7717/
peerj-cs.476
2. Business News | Stock and Share Market News | Finance News | Sensex Nifty, NSE, BSE Live
IPO News. Retrieved from https://www.moneycontrol.com/. Accessed on 10 Feb 2022
3. Zhao W et al (2018) Weakly-supervised deep embedding for product review sentiment analysis.
IEEE Trans Knowl Data Eng 30(1):185–197. https://doi.org/10.1109/TKDE.2017.2756658
4. Mohan S, Mullapudi S, Sammeta S, Vijayvergia P, Anastasiu DC (2019) Stock price predic-
tion using news sentiment analysis. In: 2019 IEEE Fifth international conference on big data
computing service and applications (BigDataService), pp 205–208. https://doi.org/10.1109/
BigDataService.2019.00035
5. Akhtar MM, Zamani AS, Khan S, Shatat ASA, Dilshad S, Samdani F (2022) Stock market
prediction based on statistical data using machine learning algorithms. J King Saud Univ Sci
34(4):101940. https://doi.org/10.1016/j.jksus.2022.101940
6. Emioma CC, Edeki SO (2021) Stock price prediction using machine learning on least-squares
linear regression basis. J Phys Conf Ser 1734:012058. https://doi.org/10.1088/1742-6596/1734/
1/012058
7. Sharma K, Bhalla R (2022) Stock market prediction techniques: a review paper. In: Second
international conference on sustainable technologies for computational intelligence. Advances
in intelligent systems and computing, vol 1235. Springer, Singapore, pp 175–188. https://doi.
org/10.1007/978-981-16-4641-6_15
8. Sharma K, Bhalla R (2022) Decision support machine: a hybrid model for sentiment analysis
of news headlines of stock market. Int J Electr Comput Eng Syst 13(9):791–798. https://doi.
org/10.32985/ijeces.13.9.7
9. Thormann ML, Farchmin J, Weisser C, Kruse RM, Safken B, Silbersdorff A (2021) Stock
price predictions with LSTM neural networks and twitter sentiment. Stat Optim Inf Comput
9(2):268–287. https://doi.org/10.19139/soic-2310-5070-1202
10. Kedar SV (2021) Stock market increase and decrease using twitter sentiment analysis and
ARIMA model. Turk J Comput Math Educ 12(1S):146–161. https://doi.org/10.17762/tur
comat.v12i1s.1596
11. Chen W, Zhang H, Mehlawat MK, Jia L (2021) Mean–variance portfolio optimization using
machine learning-based stock price prediction. Appl Soft Comput 100:106943. https://doi.org/
10.1016/j.asoc.2020.106943
12. Sarkar A, Sahoo AK, Sah S, Pradhan C (2020) LSTMSA: A novel approach for stock market
prediction using LSTM and sentiment analysis. In: 2020 Int Conf Comput Sci Eng Appl
(ICCSEA), pp 4–9. https://doi.org/10.1109/ICCSEA49143.2020.9132928
13. Gondaliya C, Patel A, Shah T (2021) Sentiment analysis and prediction of Indian stock market
amid Covid-19 pandemic. IOP Conf Ser Mater Sci Eng 1020(1):012023. https://doi.org/10.
1088/1757-899X/1020/1/012023
14. Gupta R, Chen M (2020) Sentiment analysis for stock price prediction. In: Proc 3rd Int Conf
Multimed Inf Process Retrieval (MIPR), pp 213–218. https://doi.org/10.1109/MIPR49039.
2020.00051
15. Li X, Wu P, Wang W (2020) Incorporating stock prices and news sentiments for stock market
prediction: a case of Hong Kong. Inf Process Manag 57(5):102212. https://doi.org/10.1016/j.
ipm.2020.102212
16. Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a
review. Artif Intell Rev 53(6):4335–4385. https://doi.org/10.1007/s10462-019-09794-5
17. Reddy NN, Naresh E, Kumar VBP (2020) Predicting stock price using sentimental analysis
through twitter data. In: Proc (CONECCT) 6th IEEE Int Conf Electron Comput Commun
Technol, pp 1–5. https://doi.org/10.1109/CONECCT50063.2020.9198494
18. Suhail KMA et al (2021) Stock market trading based on market sentiments and reinforcement
learning. Comput Mater Contin 70(1):935–950. https://doi.org/10.32604/cmc.2022.017069
19. Subasi A, Amir F, Bagedo K, Shams A, Sarirete A (2021) Stock market prediction using
machine learning. Procedia Comput Sci 194(November):173–179. https://doi.org/10.1016/j.
procs.2021.10.071
20. Rouf N et al (2021) Stock market prediction using machine learning techniques: a decade survey
on methodologies, recent developments, and future directions. Electronics 10(21):2717. https://
doi.org/10.3390/electronics10212717
21. Raubitzek S, Neubauer T (2022) An exploratory study on the complexity and machine learning
predictability of stock market data. Entropy 24(3):332. https://doi.org/10.3390/e24030332
22. Polamuri SR, Srinivas K, Mohan AK (2019) Stock market prices prediction using random
forest and extra tree regression. Int J Recent Technol Eng 8(3):1224–1228. https://doi.org/10.
35940/ijrte.C4314.098319
23. Yang JS, Zhao CY, Yu HT, Chen HY (2020) Use GBDT to predict the stock market. Procedia
Comput Sci 174(2019):161–171. https://doi.org/10.1016/j.procs.2020.06.071
Machine Learning Techniques for Image
Manipulation Detection: A Review
and Analysis
1 Introduction
Fig. 1 Organization of the paper: types of manipulation, literature survey for IMD, datasets, and conclusion
2 Image Processing in ML
ML algorithms learn from data and follow a predefined pipeline or set of steps. To
begin, ML algorithms require a large amount of high-quality data in order to learn and
predict highly accurate results. As the scale of data in machine learning increases, so
does the performance. For ML image processing, the images should be well processed,
interpretable, and generic. This is where Computer Vision (CV) comes into play, a field
concerned with machines' ability to understand image data. CV can be used to process,
load, transform, and manipulate images in order to create an ideal dataset for the
machine learning model [4].
The preprocessing steps used are as follows (a minimal sketch is given after the list):
1. Changing all of the images to the same format.
2. Removing extraneous areas from images.
3. Converting them to numbers so that algorithms can learn from them (array of
numbers).
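The sketch below illustrates these three steps; it assumes PIL/NumPy and an illustrative 128 × 128 working size, neither of which is prescribed by the text:

# Hedged sketch of the three preprocessing steps listed above.
import numpy as np
from PIL import Image

def to_feature_vector(path, size=(128, 128), crop_box=None):
    img = Image.open(path).convert("RGB")        # 1. one common format
    if crop_box:                                 # 2. drop extraneous areas
        img = img.crop(crop_box)                 #    (left, upper, right, lower)
    img = img.resize(size)
    return np.asarray(img, dtype=np.float32).ravel()  # 3. array of numbers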
These features (processed data) are then used in the subsequent steps of selecting and
developing a machine learning model that classifies unknown feature vectors against the
large database of feature vectors. After this, a suitable algorithm such as Bayesian
nets, decision trees, genetic algorithms, nearest neighbors, or artificial neural
networks can be selected.
The diagram in Fig. 2 explains the working of a traditional machine learning image
processing method for image data [4].
Fig. 2 Machine learning workflow of image processing for image data [4]

Dimensionality reduction divides and reduces the raw data into more manageable groups
so that processing becomes simpler. An important characteristic of these large datasets
is their large number of variables, which require a great deal of computing power to
process. Dimensionality reduction helps in attaining the best features from large
datasets by merging the selected and combined variables into feature sets, which
effectively reduces the amount of data. Processing these features is simpler, which
helps in identifying accurate and unique features that describe the actual data [5].
Feature Extraction Techniques
• Bag of words This is the most commonly used NLP technique. The process involves
extracting words or attributes from a sentence, manuscript, website, or other source
and categorizing them by frequency of use. As a result, feature extraction is one of
the most important aspects of this entire process (a short sketch is given after this
list).
• Image processing Image processing is a fascinating field. In this domain, one
primarily works with images to understand them. A variety of methodologies, as well as
feature extraction and algorithms, are used to identify and process features such as
shapes, edges, or motion in a digital image or video.
• Auto-encoders The primary goal of auto-encoders, which are unsupervised in nature, is
data efficiency. The feature extraction procedure is used to identify the key features
of the data for coding, learning from the original data to derive the new
representation [5].
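As an illustration of the bag-of-words technique above, the following minimal sketch assumes scikit-learn's CountVectorizer as the word-frequency extractor:

# Hedged sketch: bag-of-words feature extraction on two toy documents.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the image was manipulated",
        "the original image was not manipulated"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)       # term-frequency matrix
print(vectorizer.get_feature_names_out())     # extracted vocabulary
print(counts.toarray())                       # frequency of use per document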
The last step after feature extraction is training and modeling the image data using
various algorithms that learn from the patterns with specific parameters, so that the
trained model can be used to predict previously unknown data.
3 Types of Manipulations
Any operation performed on an image or video that causes the visual content to differ
from its original version is referred to as digital manipulation. Moreover, many image
processing approaches, such as rotation, downsizing, and the application of global
filters to images, partially manipulate the information represented by the visual
content. As a result, image forensic approaches are becoming increasingly effective at
detecting maliciously manipulated visual content. Image manipulation techniques are
classified into two types: content preserving and content altering. According to the
authors, steganography is also a type of image manipulation technique because it alters
the image content invisibly. The sections of the paper that follow explain various
types of image tampering and manipulation detection techniques. Manipulations like
copy-paste, splicing, and retouching are possible by employing both basic image
processing techniques and advanced methods based, for example, on GANs [6] (Fig. 3).
Fig. 3 Types of image manipulation [7]: content preserving (errors due to storage/noise/transformation/quantization; enhancement and restoration) and content altering (position change, deletion, change of texture, color, edge, background, shadows)
Its rich feature representation can expose distinct manipulative tactics. The
manipulation detection technique is then analyzed and its performance compared. NIST16
includes splicing, removal, and copy-move tampering techniques for multi-class image
manipulation detection. To learn the unique visual tampering artifacts and noise
characteristics of each class, the manipulation classification classes are adapted. The
performance of each tamper class is shown below (Table 1).
Splicing is the simplest and is most likely to produce RGB artifacts such as artificial
corners, contrast discrepancies, and noise artifacts. Because the inpainting which
occurs after the removal process has a major impact on the noise features, removal
detection outperforms copy-move. The most difficult tamper technique is copy-move, in
which the same noise distribution perplexes the noise stream, resulting in the same
contrast between the two regions [1].
Figure 4 shows detections for splicing, copy-move, and removal manipulation, where the
RGB and noise maps provide distinct information. RGB-N improves the recognition
accuracy for various tampering techniques by incorporating the characteristics of the
RGB image with the noise features.
5 Literature Survey
Nataraj et al. [13] performed holistic image manipulation detection using pixel
co-occurrence matrices. They used the Media Forensics (MediFor) dataset for the
proposed work. The model used was a CNN, and they achieved an accuracy of 81%. Kwon et al. [14]
presented localization and detection of JPEG artifacts for image manipulation. The
features used were DCT coefficients and JPEG artifacts, and they used the CASIA,
Fantastic Reality, and IMD2020 datasets. The accuracy achieved was 81%. Horvath et al.
[15] used a vision transformer model for detecting manipulation in satellite images.
They used the xview2 and WorldView3 datasets, consisting of manipulated satellite
images. Their model performed better than previous unsupervised detection techniques. Dang
et al. [16] presented face image manipulation detection based on a CNN, using face
region features. They used the HF-MANFA and MANFA frameworks on the DSI-1 dataset and
an imbalanced dataset; the approach outperformed existing expert and intelligent
systems. Shi et al. [17] presented a paper on image manipulation detection using a
global semantic consistency network. The features exploited were texture and semantic
information. The datasets used in this paper are NIST2016 and CASIA, and they proposed
GSCNet, which significantly outperforms previous methods in terms of performance. Zhou
et al. [18] proposed image manipulation detection using a neural network architecture
based on geometric rectification, with features extracted at the pixel level. They gave
an RNN model for the detection of manipulation. The datasets used were Pascal VOC07, an
imbalanced dataset, and CASIA; the model achieved the desired performance on common
tampering artifacts. Wei et al. [19] developed an algorithm based on edge detection and
Faster R-CNN, using tamper and edge features. They built the model using Faster R-CNN
and an RPN. The datasets used were NIST2016, Columbia, and CASIA, and the proposed
model was more effective than other traditional algorithms. In 2021, Bekci et al. [20]
presented cross-dataset face manipulation detection utilizing deepfakes. They used
metric learning and a steganalysis-rich model. The datasets on which experiments were
done are FaceForensics++, DeepfakeTIMIT, and CelebDF. They also gave a deepfake
detection framework for face manipulation detection; under unseen manipulations, the
framework improved accuracy by 5% to 15% and showed a high degree of generalization.
Table 2 (continued)
Year | Authors | Objective | Features used | Model | Dataset | Metrics | Results
2018 | Bayar et al. [2] | Constrained convolutional neural networks: a novel approach to detect image manipulation | Image content | CNN, ET classifier | Large-scale dataset | Accuracy | 99%
2019 | Qi et al. [3] | Detecting fake news using multi-domain visual information | Visual feature, pixel domain | CNN and CNN-RNN | Sina Weibo dataset | Accuracy | 84.6%
2020 | Tolosana et al. [9] | Deepfakes and beyond: facial manipulation and fake detection | Image content | GAN | DFFD database, Celeb-DF database | Accuracy | 100%, but the Celeb-DF database showed only 60% AUC accuracy
2021 | Dong et al. [12] | MVSS-Net: image manipulation detection using multi-view multi-scale supervision | Pixel level, image level | MVSS-Net, ConvGeM | CASIAv2, DEFACTO dataset | Accuracy | It outperformed existing methods in both within-dataset and cross-dataset situations
Table 2 (continued)
Year | Authors | Objective | Features used | Model | Dataset | Metrics | Results
2020 | Shi et al. [17] | Image manipulation detection using a global semantic consistency system | Texture and semantic | GSCNet | NIST2016 and CASIA | Accuracy | It outperformed previously compared methods with improved performance
2021 | Zhou et al. [18] | Geometric rectification-based neural network architecture for image manipulation detection | Pixel level | RNN | Pascal VOC07, imbalanced dataset, CASIA | Accuracy | It achieved the desired performance with common tampering artifacts
2019 | Wei et al. [19] | Creating a Faster R-CNN image manipulation detection algorithm using edge detection | Tamper feature, edge detection | Faster R-CNN, RPN | NIST2016, Columbia, and CASIA | Accuracy | It was more effective than other traditional algorithms
2021 | Bekci et al. [20] | Cross-dataset face manipulation detection | Deepfakes | Deepfakes detection framework | FaceForensics++, DeepfakeTIMIT, CelebDF | Accuracy | It improved the accuracy by 5% to 15%
Experiments on four standard image manipulation datasets using the two-stream Faster
R-CNN framework showed that the method not only detects tampering artifacts but also
differentiates various tampering techniques with improved performance. The Random
Forest approach in [10] achieves an accuracy of 83.6% on the Sina Weibo dataset, which
is not especially high, while the MVSS-Net approach [14] outperformed recent studies in
both intra- and inter-database instances. The CNN ET classifier approach [2] uses a
large-scale dataset and achieves an accuracy of 99%. Furthermore, the vision
transformer approach in [16] performs image splicing detection on the xview2 dataset
and achieves superior performance to previous unsupervised splicing techniques, while
the deepfakes and beyond approach proposed in [11] achieves an accuracy of 100%,
although the Celeb-DF database showed only 60% AUC accuracy with the GAN model.
This section outlines the datasets that are accessible for image manipulation detection
(IMD). Table 3 compares and contrasts several commonly used datasets, which were
gathered from different online platforms and mainstream media portals.

When comparing datasets, a few trends emerge. CASIA provides spliced and copy-move
images of various objects; the tampered portions are specifically selected, and
post-processing techniques such as filtering and obscuring are employed. The difference
between manipulated and authentic images is fuzzified to create ground truth masks.
COVER is a tiny dataset that focuses on copy-move operations; it conceals tampering
artifacts by covering common items as spliced areas, and it also includes ground truth
masks. The Columbia dataset concentrates on splicing of uncompressed images, and masks
with underlying data are provided [1]. The CoMoFoD dataset was created to detect
copy-move forgery. It includes 260 crafted pictures categorized into two sets: small
(512 × 512 pixels) and big (3000 × 2000 pixels). Each set comes with a forgery, a mask
of the tampered area, and the authentic image. The manipulation of images is classified
into five categories: translation, rotation, scaling, combination, and distortion [6].
Many of the provided datasets are outdated; such datasets are inadequate for tackling
the issue of image manipulation on recent image data, because manipulators' techniques
change over time. The altered area in the CASIA v2.0 dataset has semantic features
beyond a living creature or a vehicle, and certain relevant data could be derived from
the uniformity between the two datasets. The pixel values in the feature space are
close together. Furthermore, the image and its consistency are divided using
correlation among nearby pixels; the term region is used by some clustering algorithms.
As a result, three experiments are carried out based on multiple traditional feature
extraction methods in order to split the pictures into various portions.
There are several challenges associated with detecting image manipulation and locating
the manipulated regions in an image, which are as follows:
(a) Real-time Data Collection The manual task of identifying image manipulation
is extremely subjective. Typically, manipulated images must be detected from various
sources using manual detection techniques; detecting these manipulated images can pose
a considerable challenge.
(b) Less Distinct Data The disparity in the number of available datasets containing
less distinct images presents a challenge in image detection.
(c) Exploring More Features Researchers in the field of image manipulation detection
are trying to construct a larger multimodal dataset by fusing currently available
datasets, but achieving better generalization across different datasets is still a
challenging task.
(d) Other Challenges Integrating machine learning preprocessing and post-processing
operations to search for higher-level cue information for effective detection and
identification of tampered regions in manipulated images is a challenging task.
9 Conclusion
In this modern world, the manipulation of digital images has become very popular.
Easily available editing software and image-altering tools are widely used to alter
images. These manipulations can mislead the public and can malign a person's character,
business, criminal inquiry, or political opinions. Hence, it has become very important
to detect whether an image is tampered with or not. The present survey shows that
various techniques are available to detect image manipulation using machine and deep
learning, but because of the high complexity of deep learning models, machine learning
approaches are often preferred. Deep learning models also do not perform well when the
data is very small, so they require vast data, and training them requires a lot of
computational power, which makes them more time-consuming and resource intensive. The
CNN, RNN, and LSTM models are among the deep learning techniques used for manipulation
detection; Faster R-CNN outperforms the MFCN and LSTM. To address these challenges,
machine learning techniques are better to use, as they use large amounts of data to
learn from patterns and create self-learning algorithms so that machines can learn
quickly by themselves and make decisions based on the learned data. This study also
investigates the various datasets available for detecting tampered regions. Even though
several methods and techniques for combating manipulation have advanced over the last
decade, there are still several ongoing research challenges, such as JPEG compression,
low accuracy, limited available data, and the lack of rapid, real-time detection.
References
1. Zhou P, Han X, Morariu VI, Davis LS (2018) Learning rich features for image manipulation
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition
(CVPR), pp 1053–1061
2. Bayar B, Stamm MC (2018) Constrained convolutional neural networks: a new approach
towards general purpose image manipulation detection. IEEE Trans Inf Forensics Secur
13(11):2691–2706
3. Qi P, Cao J, Yang T, Guo J, Li J (2019) Exploiting multi-domain visual information for fake
news detection. In: Proc IEEE Int Conf Data Mining (ICDM), pp 518–527. https://doi.org/10.
1109/ICDM.2019.00062
4. Machine learning image processing. Retrieved from https://nanonets.com/blog/machine-lea
rning-image-processing/. Accessed on 23 Jan 2022
5. What is feature extraction? Feature extraction in image processing”. Retrieved from https://
www.mygreatlearning.com/blog/feature-extraction-in-image-processing/. Accessed on 23 Jan
2022
6. Novozámský A, Mahdian B, Saic S (2021) Extended IMD2020: a large-scale annotated dataset
tailored for detecting manipulated images. IET Biometrics 10(4):392–407. https://doi.org/10.
1049/bme2.12025
7. Retrieved from https://www.researchgate.net/figure/Types-of-Image-Manipulation_fig1_3207
03095
8. Jin Z, Cao J, Zhang Y, Zhou J, Tian Q (2017) Novel visual and statistical image features for
microblogs news verification. IEEE Trans Multimed 19(3):598–608. https://doi.org/10.1109/
TMM.2016.2617078
9. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) Deepfakes and
beyond: a survey of face manipulation and fake detection. Inf Fusion 64(June):131–148. https://
doi.org/10.1016/j.inffus.2020.06.014
10. Heller S, Rossetto L, Schuldt H (2018) The PS-battles dataset - an image collection for
image manipulation detection. arXiv:1804.04866v1 pp 1–5. Retrieved from https://arxiv.org/
pdf/1804.04866.pdf
11. Shi Z, Shen X, Kang H, Lv Y (2018) Image manipulation detection and localization based on
the dual-domain convolutional neural networks. IEEE Access 6:76437–76453. https://doi.org/
10.1109/ACCESS.2018.2883588
12. Dong C, Chen X, Hu R, Cao J, Li X (2022) MVSS-Net: multi-view multi-scale supervised
networks for image manipulation detection. IEEE Trans Pattern Anal Mach Intell 45(3):3539–
3553
13. Nataraj L, Goebel M, Mohammed TM, Chandrasekaran S, Manjunath BS (2021) Holistic
image manipulation detection using pixel co-occurrence matrices. arXiv:2104.05693v1, pp
1–6. Retrieved from https://arxiv.org/pdf/2104.05693.pdf
14. Kwon MJ, Nam SH, Yu IJ et al (2022) Learning JPEG compression artifacts for image manip-
ulation detection and localization. Int J Comput Vis 130:1875–1895. https://doi.org/10.1007/
s11263-022-01617-5
15. Horvath J, Baireddy S, Hao H, Montserrat DM, Delp EJ (2021) Manipulation detection in
satellite images using vision transformer. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition (CVPR) workshops, pp 1032–1041. https://doi.org/
10.1109/CVPRW53098.2021.00114
16. Dang LM, Hassan SI, Im S, Moon H (2019) Face image manipulation detection based on a
convolutional neural network. Expert Syst Appl 129:156–168. https://doi.org/10.1016/j.eswa.
2019.04.005
17. Shi Z, Shen X, Chen H, Lyu Y (2020) Global semantic consistency network for image manip-
ulation detection. IEEE Signal Process Lett 27:1755–1759. https://doi.org/10.1109/LSP.2020.
3026954
18. Zhou Z, Pan W, Wu QMJ, Yang C-N, Lv Z (2021) Geometric rectification-based neural network
architecture for image manipulation detection. Int J Intell Syst 36(12):6993–7016. https://doi.
org/10.1002/int.22577
19. Wei X, Wu Y, Dong F, Zhang J, Sun S (2019) Developing an image manipulation detection
algorithm based on edge detection and faster R-CNN. Symmetry (Basel) 11(10):1–14. https://
doi.org/10.3390/sym11101223
20. Bekci B, Akhtar Z, Ekenel HK (2020) Cross-dataset face manipulation detection. In: 2020 28th
Signal Process Commun Appl Conf (SIU), pp 34–37. https://doi.org/10.1109/SIU49456.2020.
9302157
21. Bassi MA, Lopez MA, Confalone L, Gaudio RM, Lombardo L, Lauritano D (2020) Enhanced
Reader.pdf. Nature 388:539–547
22. Zheng L, Zhang Y, Thing VLL (2019) A survey on image tampering and its detection in
real-world photos. J Visual Commun Image Representation 58:380–399
Improving Real-Time Intelligent
Transportation Systems in Predicting
Road Accident
O. F. Ademola
Covenant University, Ogun State, Ota, Nigeria
e-mail: omolola.ademola@covenantuniversity.edu.ng
S. Misra
Ostfold University College, Halden, Norway
e-mail: sanjay.misra@hiof.no
A. Agrawal (B)
Amity University, Haryana, India
e-mail: akshatag20@gmail.com
1 Introduction
Almost 1.5 million people die every year as a result of traffic collisions, which
indicates that more than 3000 deaths are recorded every day; in addition, 2 to 5
million people are injured in road accidents [1].
Nigeria is one of the countries worst affected by road accidents; despite integrated
efforts to reduce fatal road accidents, the country still falls victim to such mishaps.
With a human population of approximately 207 million, a high vehicle population
estimated at more than 30 million, and a total road length of about 194,000 km (34,120
km of federal, 30,500 km of state, and 129,580 km of local roads), the country has
suffered severe losses due to fatal car accidents [1]. Survey results have shown that
the death rate in road accidents among young adults, who form a major part of the
global economy, is very high [2]. The problem of road accidents may worsen in the
future due to high population growth and migration to urban areas around the country.
Various road safety strategies, methods, and countermeasures have been proposed and
applied to resolve this problem, such as training and retraining drivers in adherence
to safety tips; the Federal Road Safety Corps (FRSC) and Vehicle Inspection Officers
are responsible for ensuring compliance with speed limit regulations, ensuring that
vehicles are in perfect condition, building sustainable roads, and repairing damaged
roads. Hence, it is vital to develop more technologically driven and practical
solutions to reduce road accidents.
Advances in intelligent transportation systems (ITS) have been deployed in most of the
world, presenting new opportunities for developing sustainable transportation systems
[3]. Several cities around the world, especially in developed countries, incorporate
the use of ITS. One might claim that ITS is one of the oldest technologies constituting
the Internet of Things, but it can still be seen as leading-edge in sustainable cities.
In Madrid, for example, every public transport network and part, including trains,
trams, busses, and bus stops, is linked to a central control center, which collects and
processes data in real time to deliver smart and efficient services and applications to
end users [4].

A large amount of data can be generated using intelligent transport systems. The
technology advancement in ITS, such as smart cards, GPS, sensors, video detectors,
social media, and so on, has increased the complexity, variety, and quantity of
information generated and collected from vehicles and the movements of persons [5].
Massive volumes of data are being recorded from the different devices that make up the
ITS; however, traditional data management systems are inefficient and cannot fully
analyze the data being produced for deployment of an effective transportation system,
because the data volume and complexity are not compatible with them. To combat this
problem, a candidate solution is the use of Big Data analytics tools such as Apache
Spark, which has been found to process vast volumes of data and has been used
extensively in academia, stock markets, organizations, and industries [6]. An efficient
structure is necessary to design, implement, and manage the transportation system in
order to meet the computational requirements of massive data analysis. In this context,
Apache Spark has become a centralized engine for large-scale analysis
of data across a variety of ITS services. The deployment of Apache Spark in an
intelligent transportation system can support applications in real-time traffic control
and in estimating the average speed and the congested sections of a highway. Apache
Spark is much quicker and simpler to use with this advanced model [7]. Apache Spark
uses the memory of a computer cluster to minimize reliance on the underlying
distributed network, which results in significant improvements over Hadoop's MapReduce.
The contribution of this paper is the use of Apache Spark to filter data obtained from
Twitter relating to road accidents. This data will be used to predict the likelihood of
road accidents occurring with the use of machine learning algorithms, and based on
those predictions, the best-predicting algorithm can be deployed for use in the
intelligent transportation system.
The rest of this paper is divided into four sections. Section 2 discusses related works
and the identified research gaps which led to this study. Section 3 presents the
methodology of the research, which deploys Apache Spark for organizing and cleaning the
raw data; the processed data are then analyzed using machine learning algorithms to
make predictions. Furthermore, the results and discussion are given in Sect. 4, and
Sect. 5 concludes the research study.
2 Literature Review
ITS devices such as microwave vehicle detection systems (MVDS) are also considered as a
means of collecting road accident data and data relating to vehicle speed, distance,
occupancy, and vehicle type; these road devices can generate thousands of records of
road activity weekly [12].
Decision tree classifiers, PART induction rules, Naive Bayes, and multilayer
perceptrons were used by [13] to establish the essential variables for the prevention
of accident seriousness. By comparing the various models obtained, the authors
concluded that, with a value of 0.08218, the tree classification and the rule induction
are the most accurate. Age, gender, nationality, year of the accident, and accident
form were the most significant variables in accident fatality.
A software structure proposed by [14] draws significant relationships between the
variables linked to road accidents. Applied to the dataset on road accidents in
Morocco, the proposed work selected the appropriate rules employing multiple criteria
analysis. Ultimately, the system forecasts deaths and injuries based on time series
analysis using the selected rules. The consistency of the rules is calculated by the
support value, or the occurrence frequency of the rules, and their confidence.
A study by [15] examined the use of Twitter as a data source and natural language
processing to improve the efficacy of road incident detection. The results showed that
only 5 percent of the information received was useful after analysis, i.e., tweets that
were connected with traffic accidents and could be geocoded on a map. For the complete
classification of the dataset as traffic accidents, the researchers registered an
accuracy of 0.9500; on the other hand, the precision value for the geocoding phase from
tweets was 0.5200. Public data sources, such as the road monitoring network and police
records of traffic incidents, were checked. The authors affirmed that the frequency of
the postings was steady, peaking at weekends.
The results of a proposed ANN were evaluated by [16], who reported a correlation of
0.991, an R-squared value of 0.9824, a mean square error of 4.115, and a root mean
square error (RMSE) of 2.0274. The model was proposed for predicting road accidents
based on an artificial neural network, taking into account not only accident details,
such as driver behavior, cars, time and hour, and road structure, but also certain
information on road geometry and road volume statistics. The authors consider the
variable vertical degree of road curvature to be the critical parameter influencing the
number of road accidents. Artificial neural networks were developed by [17] to predict
the severity of road accidents by preprocessing road accident data using K-means
clustering to sort the data and improve the prediction. To verify the results obtained,
the authors applied the ordered test model, finding that the ANN yields a higher
precision, with a value of 0.7460, above the 0.5990 obtained from the other models.
Predictive analyses were performed using a supervised learning perspective
incorporating a seasonal autoregressive integrated moving average (SARIMA) model, and a
Kalman filter was developed by [18]. The work done in [19] suggested that multi-task
learning (MTL) can be integrated into a deep learning model in order to learn features
for unattended flow prediction efficiently. This deep learning model allowed for an
automatic prediction process while guaranteeing a high level of learning precision.
Another work considers the use of a deep learning approach for range estimation in
object detection, in this case of cars, which aids in improving safety in self-driving
vehicles [20]. For highway scenarios, the model can reduce range estimation errors to
an acceptable amount. The behavior of drivers in delivering the required support to
ensure safety is often taken into account in autonomous vehicles.
The research reported by [21] used a Bayesian network, J48 decision trees, and
artificial neural networks to identify the most significant variables in order to
predict the frequency of road accidents. This research shows that the Bayesian network
produced the most accurate results, with values of 0.8159 and 0.7239 and an F-measure
of 0.723. Findings showed that lighting, road condition, and weather condition could
result in accidents on the road. Bad roads were identified by the system and marked as
likely causes of road accidents.
In [22], the authors proposed a method for automated detection of road traffic events
from tweets in the Saudi dialect using machine learning and Big Data technologies.
First, they create a classifier and train it with four machine learning algorithms
(support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes) to
filter tweets into relevant and irrelevant. Then, they train other classifiers to
identify different types of events: accidents, bridges, road closures, traffic damage,
fire, weather, and social occurrences. Analyses of one million tweets showed that their
method automatically detects road traffic events, their location, and time without
prior awareness of the events. To the best of their knowledge, theirs was the first
work to use the Apache Spark Big Data platform for detecting traffic events from Arabic
tweets. The research gap is that the approach could not extract the exact location of
events in the location detection step, and the data obtained from this work was not
filtered well enough to get the relevant information about road accidents.
When considering real-time data processing, Apache Spark is the best choice. It has a
component called Spark Streaming, which has the advantage of managing batch and
streaming workloads with a single execution engine, thus overcoming the constraints of
conventional streaming systems. Spark Streaming allows Spark to extend its core
scheduling capability to process data in real time.
3 Methodology
It is highly difficult to process huge data in reasonable time and make decisions in real time. A number of important problems, namely proper preprocessing, real-time analytics, and a communication model, are posed in the literature. Therefore, we explore the criteria for a resourceful communication model based on Big Data analysis and propose a standardized architecture for processing data in real time in an intelligent transport network. We will make use of data obtained from Twitter about the transportation system, which will help us improve the intelligent transportation system.
Spark Streaming is used for the real-time data processing. With Spark Streaming, live data are processed in a scalable, fault-tolerant manner with high throughput and real-time latencies of about 0.5 s. Spark uses resilient distributed datasets (RDDs) to organize data and to recover from failures.
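As an illustration of this streaming setup, the following is a minimal PySpark sketch, assuming a feeder process pushes raw tweet text to a local socket; the host, port, and application name are placeholders, not the authors' actual configuration:

```python
# Minimal Spark Streaming sketch: consume live tweet lines in ~0.5 s batches.
# The socket source stands in for the Twitter streaming API feed.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="TrafficTweets")
ssc = StreamingContext(sc, batchDuration=0.5)  # ~0.5 s micro-batches, as in the text

tweets = ssc.socketTextStream("localhost", 9009)  # placeholder feeder socket
tweets.foreachRDD(lambda rdd: print("tweets in batch:", rdd.count()))

ssc.start()
ssc.awaitTermination()
```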
The proposed architecture for predicting road accidents using Apache Spark consists of four main sections: Big Data gathering and regulation, data preprocessing, data processing, and data prediction.
Data are collected using the Twitter streaming API. A social media account will be created, and keyword sets will be defined using hashtags. This account will be used to follow the keywords relating to road accidents; tweets describing the different road accidents occurring in different locations will be posted using the hashtag. These data can be logged and kept on record in order to be used in decision making.
The raw data are retrieved and stored with the JavaScript Object Notation (JSON) file extension, and these files are stored in MongoDB. For the proposed work, attributes are used for the event prediction. The attributes are: the timestamp, the user who made a post using the hashtag, the location where the accident occurred, the text content of the post, and the road name of the detected event location. Each attribute is separated by the delimiter character "\\".
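A small sketch of reading these records back from MongoDB and splitting the delimiter-separated attributes might look as follows; the database and collection names and the payload field are illustrative assumptions:

```python
# Hypothetical example: fetch stored tweets and split the "\\"-delimited
# attributes (timestamp, user, location, text, road name) described above.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
raw_tweets = client["traffic_db"]["raw_tweets"]  # assumed names

for doc in raw_tweets.find().limit(5):
    timestamp, user, location, text, road = doc["payload"].split("\\")
    print(timestamp, location, road)
```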
The next step after collecting the data is preprocessing. Data obtained from social media are vulnerable to incompatible data from various sources, misplaced text, unnecessary words, illegal characters, and noise. The preprocessing technique helps to filter out unnecessary information and reduce the noise. Preprocessing is applied before the actual processing and helps to clean out inaccuracies in the collected information. Spark SQL is utilized to preprocess the data, and the words or text obtained from the tweets are transformed into tokens. Tweets obtained from social media cannot be analyzed directly because of the
noise. A supervised machine learning model is then fed with this preprocessed data. The preprocessed data are stored back into MongoDB as cleaned and parsed data.
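A hedged sketch of this Spark-based cleaning and tokenization step is given below; the input file and column names are assumptions, and the regular expression is only one plausible way to strip URLs and illegal characters:

```python
# Preprocessing sketch: remove URLs/illegal characters, lowercase the text,
# then tokenize and drop stop words before the supervised learning stage.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lower, regexp_replace
from pyspark.ml.feature import Tokenizer, StopWordsRemover

spark = SparkSession.builder.appName("Preprocess").getOrCreate()
df = spark.read.json("raw_tweets.json")  # assumed dump of the raw tweets

clean = df.withColumn("text", lower(regexp_replace("text", r"http\S+|[^a-zA-Z\s]", "")))
tokens = Tokenizer(inputCol="text", outputCol="tokens").transform(clean)
filtered = StopWordsRemover(inputCol="tokens", outputCol="filtered").transform(tokens)
filtered.select("filtered").show(3, truncate=False)
```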
Social media comprise different kinds of noise; this noise can be reduced by making use of an optimal estimator known as the Kalman filter. The Kalman filter is utilized for a fast response in the data processing when filtering out the noisy data. The expression for the Kalman filter is given below:
IDF(t, D) = log( |D| / DF(j, N) )    (2)
The algorithm converts the input, the lists of tokens, into a vector matrix of tokens. The results of the term frequency are transferred to the IDF algorithm, after which the IDF sorts the content vectors, producing an output that is stored in the content set. The content set is passed as the input to the classification algorithm.
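In Spark MLlib, this TF-IDF step can be sketched as follows, assuming the `filtered` token column from the preprocessing sketch above:

```python
# TF-IDF sketch: term frequencies feed the IDF of Eq. (2); the transformed
# vectors form the content set passed to the classifiers.
from pyspark.ml.feature import HashingTF, IDF

tf = HashingTF(inputCol="filtered", outputCol="tf").transform(filtered)
idf_model = IDF(inputCol="tf", outputCol="features").fit(tf)
content_set = idf_model.transform(tf)  # input to the classification algorithm
```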
Tweet Classification
The collected tweets, as we know, do not all relate to traffic. A binary classification will be applied to categorize the tweets into two classes: traffic related or non-traffic related.
After that, a model is built, which can be evaluated using metrics such as accuracy, recall, specificity, and precision. The four classification models used are support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes. The model that best fits the traffic event detection task will be used to classify real-time data relating to areas in Nigeria.
Parallel computing is employed to build and train the models using MLlib in Apache Spark. Labels 1 and 0 are assigned, signifying traffic related and non-traffic related, respectively. The models are built and trained with the default input parameters. Training data are supplied for the models to learn from; the pattern is found from the label of each tweet text in the training data. Next, each model's accuracy is evaluated using the evaluation metrics and a cross-validation approach. Furthermore, with the best selected model, new tweets are predicted and categorized as 0 or 1 iteratively. For further processing, we filter out tweets that are not associated with traffic. We also summarize the data for interpretation and insight through various views, such as hourly counts of the number of tweets and a display of the distribution of traffic event locations.
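One of the four models can be sketched with MLlib as below. Note that LinearSVC, DecisionTreeClassifier, and NaiveBayes ship with pyspark.ml, whereas KNN does not and would need a custom or third-party implementation; the column names and the `content_set` input follow the TF-IDF sketch above:

```python
# Training sketch for the binary traffic/non-traffic classifier
# (label 1 = traffic related, label 0 = non-traffic related).
from pyspark.ml.classification import LinearSVC

train_df, test_df = content_set.randomSplit([0.7, 0.3], seed=42)
svm_model = LinearSVC(featuresCol="features", labelCol="label", maxIter=10).fit(train_df)
scored = svm_model.transform(test_df)  # adds a "prediction" column
```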
For the training and test sets, we use a 70/30 split. We need to know the performance of our model, in particular on unseen data. The first step is to split the dataset into training and test sets using the cross-validation approach. Finally, through the measurement metrics, we analyze the prediction results on the test data. The outputs of the four classification models (support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes) were compared, and we pick the best model to forecast actual Twitter data.
In our experiment, we use different training/test split ratios: 50/50, 60/40, 70/30, and 80/20. We use a train/test split, one of the cross-validation assessment methods, and divide our dataset into two non-overlapping sections (a training set and a test set). It is simple but efficient for validation purposes. The training set is used to train our model, while a test set and/or holdout set is used to evaluate the models' output on unseen data using the evaluation metrics.
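The split ratios and accuracy evaluation described above could be sketched like this; the ratios come from the text, while the seed and the choice of LinearSVC as the example model are assumptions:

```python
# Evaluate one model across the 50/50, 60/40, 70/30 and 80/20 splits.
from pyspark.ml.classification import LinearSVC
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

evaluator = MulticlassClassificationEvaluator(labelCol="label", metricName="accuracy")
for ratio in [[0.5, 0.5], [0.6, 0.4], [0.7, 0.3], [0.8, 0.2]]:
    train_df, test_df = content_set.randomSplit(ratio, seed=7)
    model = LinearSVC(featuresCol="features", labelCol="label").fit(train_df)
    print(ratio, evaluator.evaluate(model.transform(test_df)))
```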
For geo-mapping, we need to locate spatial positions in Cartesian coordinates (latitude, longitude) in order to define the geographical distribution of the traffic status. This helps to analyze tweets and to extract valuable information. The Google Maps Geocoding API is used for this. Geocoding is the transformation of an address into geographical coordinates (latitude, longitude) to identify the position of the given input on the map. Reverse geocoding facilitates the opposite: it transforms geographical coordinates into a human-readable address, providing information about the particular place, such as the postal code, road name, town, road number, and area.
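A minimal reverse-geocoding sketch against the Google Maps Geocoding API is shown below; the API key is a placeholder, the sample coordinates are only an illustrative point near Port Harcourt, and the response parsing is simplified:

```python
# Reverse geocoding: coordinates -> human-readable address.
import requests

def reverse_geocode(lat, lng, key="YOUR_API_KEY"):
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    resp = requests.get(url, params={"latlng": f"{lat},{lng}", "key": key})
    results = resp.json().get("results", [])
    return results[0]["formatted_address"] if results else None

print(reverse_geocode(4.8156, 7.0498))  # illustrative coordinates
```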
4 Results and Discussion
This section shows the results of applying the four models, support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes, to predict the likelihood of the causes of road accidents and what might actually make these road accidents occur.
Before we look into the models used in predicting road accidents, we observed that a number of attributes, such as those relating to a person being involved in a crash, roadway reconstruction, and environmental conditions, contribute to the factors used to study the causes: fire, road closure, road damage, social events, and weather conditions such as heavy rainfall can also be said to cause road accidents. The likelihood of these attributes can be calculated statistically as demonstrated below. The formula for the statistic is expressed in Eq. (4), and descriptive statistics of the explanatory variables are presented in Table 1.
Z = (1/n) Σ_{k=1}^{n} ẑ_{kt1}    (4)
where ẑ_{kt1} is the random variable individually drawn from the sample data collected from various events relating to road accidents, and n is the finite sample size of those events.
From Table 1, the result obtained for the standard deviation shows that, across the different events that have occurred, the higher the value of the standard deviation, the more likely that variable is to occur often.
[Chart: comparison of the SVM, Decision Tree, KNN, and Naïve Bayes models on the evaluation metrics (scores between 0 and 0.8)]
KNN outperformed the other models in terms of accuracy, specificity, and precision. However, a recall of 92% was achieved for both KNN and the support vector machine.
Different events have also been measured with the four classification models (support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes) under the evaluation metrics. These events, also referred to as variables in Table 1, are likely causes of road accidents, and they have been evaluated using the four classification models considered in this paper. The results show that the best-performing model is KNN. The chart in Fig. 3 shows that for damaged roads, accidents, traffic, weather conditions, road shutdowns, social events, road work, and fire, KNN gives a better yield.
Figure 4 represents the number of vehicles on one of the roads in Port Harcourt city at particular times, which amounts to causing traffic. From 7:00 to 9:00 am (daybreak) and from 11:25 am to 12:30 pm (when the sun is up), the roads tend to be extremely busy with heavy traffic, when many vehicles are on the road. This is because from 7:00 to 9:00 am offices resume, school children are on
their way to school, and many people tend to be on the road at that time. Therefore, this system tells us when and where the traffic is at its highest at a particular time. The system has the ability to recognize obstacles on the road; information is extracted from the tweets about the cause of the current traffic, and information about the number of cars on a particular road can be extracted. A large number of cars on the road can result in road accidents as these vehicles struggle to move forward; this can raise a red flag, as care must be taken to ensure that impatient drivers, in their haste to reach their destinations on time, do not cause an accident on that path (Fig. 5).
The tweets obtained are measured in millions. The words extracted from these tweets include road, congestion, and highway. After gathering the tweets, it was observed that the major cause of traffic congestion was the blockage of one side of the main roads, making it difficult for drivers to reach their destinations on time; on another occasion, a heavy tanker fell on the road and caught fire, making it difficult for cars to pass, as passing cars might also catch fire. The more vehicles on the road, the lower the average speed of the cars, which also results in a longer time to reach one's destination. We can likewise use real-time traffic information to estimate the best path between two points.
The objective of this paper is to process data obtained from tweets concerning road accidents, road traffic, road events, and other activities that occur on the road, using Apache Spark to yield an intelligent transportation system. Based on the results, we can conclude that the system operates well in real time when implemented on Apache Spark. Apache Spark helped classify the Big Data obtained from tweets into a format in which we can better predict when a road accident is likely to occur.
[Fig. 4: number of vehicles recorded over the day, from about 6:15 to 22:25 (y-axis: number of vehicles, 0–350; x-axis: recorded time)]
[Fig. 5: hourly number of tweets, from about 8:15 to 23:52 (y-axis: number of tweets, up to 30,000,000; x-axis: time (hourly))]
5 Conclusion
The data collected from the tweets are used to make predictions for the intelligent transport system. Apache Spark was used to filter out irrelevant tweets from the raw data. Support vector machine, decision tree, k-nearest neighbor (KNN), and Naïve Bayes models were used to make predictions, based on the relevant tweets, of road accidents occurring around Nigeria.
The locations of events were extracted. The models showed they were able to detect road accident events and make further predictions of how effective the intelligent transportation system is. Future research can incorporate an alarm system that sends information to the intelligent transportation system to prevent road accidents from occurring. Given a huge data size, this system is efficient enough for processing and offers a solution for real-time processing in an intelligent transportation system. Incorporating Apache Spark into an intelligent transportation system is of great advantage in reducing the number of road accidents. Additionally, it would be helpful to compare the prediction output with other sophisticated methods, such as artificial neural networks.
Acknowledgements This paper was sponsored by Covenant University, Ota, Ogun State, Nigeria.
References
1. Favour OI, et al (2016) Statistical analysis of pattern on monthly reported road accidents in
Nigeria. Sci J Appl Math Stat 4(4):119–128. https://doi.org/10.11648/j.sjams.20160404.11
2. Igho OE, Isaac OA, Eronimeh OO (2015) Road traffic accidents and bone fractures in Ughelli,
Nigeria. IOSR J Dent Med Sci 149(4):2279–861. https://doi.org/10.9790/0853-14452125
3. Rezaei M, Klette R (2014) Look at the driver, look at the road: No distraction! No accident!.
In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR),
pp 129–136. Retrieved from https://openaccess.thecvf.com/content_cvpr_2014/html/Rezaei_
Look_at_the_2014_CVPR_paper.html
4. Menouar H, Guvenc I, Akkaya K, Uluagac AS, Kadri A, Tuncer A (2017) UAV-enabled intel-
ligent transportation systems for the smart city: applications and challenges. IEEE Commun
Mag 55(3):22–28. https://doi.org/10.1109/MCOM.2017.1600238CM
5. Buele J, Salazar LF, Altamirano S, Aldás RA, Urrutia-Urrutia P (2019) Platform and mobile
application to provide information on public transport using a low-cost embedded device.
RISTI-Rev Iber Sist e Tecnol Inf 476–489
6. Babar M, Arif F, Jan MA, Tan Z, Khan F (2019) Urban data management system: towards big
data analytics for internet of things based smart urban environment using customized Hadoop.
Futur Gener Comput Syst 96:398–409. https://doi.org/10.1016/j.future.2019.02.035
7. D’Silva GM, Khan A, Gaurav, Bari S (2017) Real-time processing of IoT events with historic
data using Apache Kafka and Apache Spark with dashing framework. In: 2017 2nd IEEE Inter-
national conference on recent trends in electronics, information & communication technology
(RTEICT), pp 1804–1809. https://doi.org/10.1109/RTEICT.2017.8256910
8. Contreras-Castillo J, Zeadally S, Guerrero-Ibañez JA (2017) Internet of vehicles: architecture,
protocols, and security. IEEE Internet Things J 5(5):3701–3709. https://doi.org/10.1109/JIOT.
2017.2690902
9. Beil C, Kolbe TH (2017) CityGML and the streets of New York — a proposal for detailed street space modelling. In: Proceedings of the 12th International 3D GeoInfo Conference. ISPRS Ann Photogramm Remote Sens Spat Inf Sci vol IV-4/W5, pp 26–27
10. Cao G, Michelini J, Grigoriadis K et al (2015) Cluster-based correlation of severe braking events
with time and location. In: 2015 10th System of systems engineering conference (SoSE), pp
187–192. https://doi.org/10.1109/SYSOSE.2015.7151986
11. Salas A, Georgakis P, Petalas Y (2017) Incident detection using data from social media. In: 2017
IEEE 20th International conference on intelligent transportation systems (ITSC), pp 751–755
Improving Real-Time Intelligent Transportation Systems in Predicting … 239
12. Shi Q, Abdel-Aty M (2015) Big data applications in real-time traffic operation and safety
monitoring and improvement on urban expressways. Transp Res Part C Emerg Technol 58(Part
B):380–394
13. Alkheder S, Taamneh M, Taamneh S (2017) Severity prediction of traffic accident using an
artificial neural network. J Forecast. 36(1):100–108
14. Ait-Mlouk A, Agouti T (2019) DM-MCDA: a web-based platform for data mining and multiple
criteria decision analysis: a case study on road accident. SoftwareX. https://doi.org/10.1016/j.
softx.2019.100323
15. Gu Y, Qian Z, Chen F (2016) From twitter to detector: real-time traffic incident detection using
social media data. Transp Res Part C Emerg Technol 67:321–342. https://doi.org/10.1016/j.trc.
2016.02.011
16. Çodur MY, Tortum A (2015) An artificial neural network model for highway accident
prediction: a case study of Erzurum, Turkey. Promet Traffic Transp 27(3):217–225
17. Taamneh M, Alkheder S, Taamneh S (2017) Data-mining techniques for traffic accident
modeling and prediction in the United Arab Emirates. J Transp Saf Secur 9(2):146–166
18. Lippi M, Bertini M, Frasconi P (2013) Short-term traffic flow forecasting: an experimental
comparison of time-series analysis and supervised learning. IEEE Trans Intell Transp Syst
14(2):871–882
19. Deng S, Jia S, Chen J (2019) Exploring spatial–temporal relations via deep convolutional neural
networks for traffic flow prediction with incomplete data. Appl Soft Comput 78:712–721.
https://doi.org/10.1016/j.asoc.2018.09.040
20. Parmar Y, Natarajan S, Sobha G (2019) DeepRange: deep-learning-based object detection and
ranging in autonomous driving. IET Intell Transp Syst 13(8):1256–1264
21. Castro Y, Kim YJ (2016) Data mining on road safety: factor assessment on vehicle accidents
using classification models. Int J Crashworthiness 21(2):104–111
22. Alomari E, Mehmood R, Katib I (2019) Road traffic event detection using twitter data,
machine learning, and apache spark. In: 2019 IEEE SmartWorld, ubiquitous intelli-
gence & computing, advanced & trusted computing, scalable computing & communica-
tions, cloud & big data computing, internet of people and smart city innovation (Smart-
World/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), pp 1888–1895. https://doi.org/10.1109/
SmartWorld-UIC-ATC-SCALCOM-IOP-SCI.2019.00332
Consumer Buying Behavior Analysis
During COVID-19 Phase Using
Statistical Data Mining and Multi-verse
Stochastic Algorithm
Abstract COVID-19 has changed the marketing and retail sectors in particular; the epidemic has changed people's habits. Consumer behavior analysis is commonly used to analyze client purchase patterns and flows. For the purpose of analyzing consumer behavior and purchase trends across the COVID-19 era, this work combined a statistical strategy with a data mining approach. Furthermore, a survey-based (online) data analysis was conducted for the evaluation, with retailers and customers filling out a questionnaire that comprised demographic information and product associations obtained during the epidemic. In order to test the association rules of data mining, the information for this study was gathered from a nearby grocery. Additionally, the primary data were converted to secondary (balanced) data using the multi-verse optimization algorithm. One of the newest meta-heuristic optimization algorithms, multi-verse optimization (MVO) imitates the multiverse hypothesis of physics and simulates the interaction of many universes. The algorithm is based on natural phenomena and employs a stochastic method to accomplish its objective. Finally, statistical analysis has been carried out to look into the purchasing and selling trends of retailers within the dataset. In this method, association rules are generated via pincer search, which counts the supports of candidates in each run
A. Sinha (B)
Department of Computer Science, IGNOU, New Delhi, India
e-mail: anuragsinha257@gmail.com
M. Bhargavi
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, Guntur, India
N. K. Singh
Department of Computer Science, BIT Mesra, Ranchi, Jharkhand, India
D. Narayan
Department of Biotechnology, Amity University, Ranchi, Jharkhand, India
N. Garg
Department of Mathematics and Statistics, IIT Kanpur, Kanpur, Uttar Pradesh, India
S. Pal
Department of Physics, IIT Kanpur, Kanpur, Uttar Pradesh, India
using a bottom-up approach, in addition to the supports of chosen item sets using a top-down approach; these are referred to as the maximum frequent candidate sets. The proposed binary versions transfer a continuous version of the MVO algorithm to its binary counterparts using the idea of transformation functions.
1 Introduction
this kind of research, where the associations between the product and the buying investigation have to be measured [7, 8]. The major problem with such data is the clustering and the establishment of the correlations within it. With the introduction of meta-heuristic algorithms, more complex kinds of optimization problems have been solved. A multi-verse algorithm has recently been proposed based on a meta-heuristic swarm intelligence algorithm, which is inspired by multi-verse astrophysics. In this paper, we have used this MVO algorithm for the data [9].
To overcome the complexity of the distance measures between datasets, Manhattan and Euclidean distance clustering optimizations are implemented. Data mining is one of the most widely used methods for consumer behavior analysis, implemented through association rule mining. In this paper, we have used an association rule mining-based statistical mining approach for examining the frequent sets of items that were purchased during COVID-19 by the same sets of customers. In the statistical data mining part, we examined the maximum threshold associations of products with the data mining approach. Business decision making is one of the critical factors that can be improved when a company comes into contact with a customer segment and their preferences for a product, whether through an offline or online marketing framework. As a result, implementing consumer behavior analysis using a data mining approach increases business efficiency and productivity [4]. The major contribution of this paper is that, in feature selection, a collection of M features is selected from a collection of N features in the data, with M ≤ N, so that the value of a certain evaluation criterion or function is optimized over the set of every conceivable feature subset. In this article, we suggest, test, and discuss an effective strategy based on the recent multi-verse optimizer (MVO) for feature selection and parameter optimization to increase the accuracy of the apriori algorithm's variables. To the best of our knowledge, this is the first time data optimization using MVO has been applied in this setting. We suggest an enhanced architecture to increase the data resilience and generalization ability. The paper is arranged as follows: Sect. 1 contains the introduction, Sect. 2 the literature review, Sect. 3 materials and methods, Sect. 4 the proposed method, Sect. 5 the results and discussion, and Sect. 6 the conclusion and future work.
2 Literature Review
In Ref. [1], the authors used consumer behavior analysis for a neuro-marketing application in which a maximum threshold of 90% was obtained from the features collected from 20 people; it was used to obtain maximum efficiency in the marketing application area using EEG signals. In Ref. [2], the authors used association rule mining for consumer behavior analysis on supermarket data, examining consumers' buying patterns toward supermarket products using a conventional data mining approach. In Ref. [10], the authors used web-based association rule mining to examine customer factors toward particular product associations on an e-commerce website using a conventional data mining approach. In Ref. [4], the authors used association rule-based market
basket analysis to examine customer engagement in a supermarket in Thailand, based on different threshold metrics on the products, such as support and gain, converted into a feature vector after calculation. In Ref. [5], the authors employed a data mining-based association rule mining approach for increasing sales and predicting consumer behavior with accuracy at the stated confidence level. In Ref. [3], the multi-verse optimization algorithm is used to create voltage amplitude and discrete phase distributions in the dipole elements for the creation of flat-top beam/pencil beam patterns. While the phase distributions of these two patterns are different, they both have similar amplitude distributions. Simulation findings show that this algorithm completed its mission successfully and that it also outperforms algorithms such as Particle Swarm Optimization, Gray Wolf Optimization, and Imperialist Competitive Optimization. In Ref. [6], the authors used different statistical approaches for consumer behavior analysis, such as the chi-square and ANOVA tests, which give the maximum entropy on the validation data, performed on a secondary dataset. In Ref. [11], the authors used different machine learning algorithms, implemented in Python, for consumer behavior analysis based on supermarket data with an optimized algorithm, which gives the model's maximum threshold confidence level. In some works, the high exploration
and local optima avoidance of the MVO algorithm are the source of the MVO-based SVM's dependability and robustness. The rapid shifts in the solutions produced by the use of white/black holes highlight the exploration process and aid in escaping local optima stagnation. The WEP and TDR parameters also help MVO to accurately exploit the promising regions over the course of iterations, increasing the generalization power and resilience of the SVM after an initially thorough broad search of the search space [9, 12]. Anxiety, COVID-related dread, and sadness all predicted consumer behavior toward necessities, whereas necessities-only behavior was predicted by anxiety. Furthermore, personality characteristics, perceived economic stability, and self-justifications for buying were all found to predict consumer behavior toward necessities and non-necessities. We now know more about how consumer behavior changed during the COVID-19 pandemic thanks to the current study; the findings may be used to create marketing plans that take psychological elements into account in order to cater to the requirements and feelings of genuine consumers [13]. Due to the development of new optimization approaches that have been effectively used to address such stochastic mining challenges, data mining optimization has drawn a lot of interest in recent years. To build evolutionary optimization algorithms (EOAs) for mining two well-known machine learning datasets, one study applies four alternative optimization strategies, with the Iris and Breast Cancer datasets chosen to assess the proposed strategies [9]. [14] discusses how actions like increasing home cold storage capacity could undermine system resilience by exacerbating bullwhip effects, amplifying consumer demand shocks that are propagated to upstream food supply chain actors, whereas responses like improving food skills can reduce the propagation of shocks through the supply chain by allowing greater flexibility and less waste.
3 Materials and Method
For this research, we conducted a survey gathering data in the form of categorical values, segmented into several sections such as demographic data, psychological buying patterns, and the products bought during COVID [2]. The data were then combined into close values and converted into secondary form, on which we implemented statistical analysis to obtain entropy levels and variance and to perform hypothesis testing, discussed in the later part of the paper. The questionnaire is represented in the figure below. We also gathered associations of the products bought by collecting payment receipt data from various local markets, which were later preprocessed using data discretization and a multi-verse stochastic algorithm, as shown in Fig. 1 [4, 5, 10].
The multi-verse algorithm is one of the recently developed approaches in the field of metaheuristics and nature-inspired optimizers. The concept behind the multi-verse algorithm is that the universe contains an infinite number of universes within itself, together with the theories that underlie wormholes, white holes, and black holes. In this algorithm, the wormholes represent the combined exploration and
exploitation parts, combined with white and black holes; the variables are referred to as objects, whose inflation rate drives the search for solutions over the iterations [6, 11]. The core mathematical model of the MVO method is given in Eq. (1). The mathematical modeling of the multi-verse algorithm depicts objects being exchanged between the universes, done using roulette wheel selection. In every iteration the best universe is identified, where D is the number of variables, N is the number of universes, and U is the total solution set formulated over these universes; each universe is characterized by its normalized inflation rate [10].
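A rough, generic sketch of this roulette-wheel exchange of objects between universes is given below; it illustrates the MVO idea under simplifying assumptions and an invented toy fitness function, and is not the authors' implementation:

```python
# MVO-style exchange: dimensions of universes receive objects from donor
# universes picked by roulette wheel on normalized inflation (fitness) rates.
import numpy as np

rng = np.random.default_rng(0)
N, D = 10, 4                     # N universes, D variables (objects)
U = rng.random((N, D))           # candidate solutions

def exchange(U, fitness):
    infl = fitness / fitness.sum()          # normalized inflation rates
    for i in range(N):
        for j in range(D):
            if rng.random() < infl[i]:
                k = rng.choice(N, p=infl)   # roulette-wheel pick of a donor
                U[i, j] = U[k, j]
    return U

fitness = 1.0 / (1.0 + U.sum(axis=1))       # toy fitness, for illustration only
U = exchange(U, fitness)
```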
Data preprocessing is the overall scheduling and cleaning of the data that has been collected and transformed from the various data sources by employing different denoising techniques. The data cleaning is done, and the segregated part of the data is transformed into a normalized form. In this process, several anomalies and irregularities are eliminated from the data. As an outcome, the mined information is shown in an abstracted form, and this portrayal of the whole data collection delivers logically equivalent results for the processed information [15]. In dimensionality reduction, data encoding schemes are applied in order to obtain a reduced or "compressed" representation of the original data; examples include data compression procedures (such as wavelet transforms and principal component analysis), attribute subset selection (e.g., eliminating irrelevant attributes), and attribute construction (e.g., deriving a small set of more helpful attributes from the original set) [4]. Suppose you have chosen to use a distance-based mining algorithm for your investigation, such as a neural network, nearest-neighbor classifier, or clustering. Such methods produce better results if the data to be analyzed are normalized, or at the very least scaled to a smaller range such as [0.0, 1.0] [5]. Customer data, for instance, contain the attributes age and annual salary, and the annual salary attribute generally takes much larger values than age. Hence, if the attributes are left unnormalized, distance measurements taken on annual salary will generally outweigh distance measurements taken on age. Discretization and concept hierarchy generation can also be beneficial, where raw attribute values are replaced by ranges or higher conceptual levels [3]. For instance, raw values for age may be replaced by higher-level concepts such as youth, adult, or senior [6]. Furthermore, concept hierarchy generation is an integral tool for data mining in that it permits mining data at various degrees of abstraction. For instance, the removal of redundant data
4 Proposed Method
this subvector, which contains the M components of the dataset's quantities [15, 16]. Figure 2 shows the proposed model and the hierarchy of the work.
sse = (1/x) Σ g_c σ(c_j, r_i)    (2)
This equation gives the Euclidean distance between the centroid c_j and the ith data point r_i, which is represented with m dimensions as r_i = (r_i1, r_i2, …, r_im).
σ(c_j, r_i) = (1/x) × g_c σ(c_j, r_i)    (3)
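A small numpy sketch of the SSE measure behind Eqs. (2) and (3), summing Euclidean distances between each point and its assigned centroid; the data and assignment below are invented for illustration:

```python
# SSE over a clustering: sum of Euclidean distances point -> assigned centroid.
import numpy as np

def sse(points, centroids, assignment):
    diffs = points - centroids[assignment]      # m-dimensional differences
    return np.sqrt((diffs ** 2).sum(axis=1)).sum()

pts = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0]])
cents = np.array([[1.5, 1.5], [8.0, 9.0]])
print(sse(pts, cents, np.array([0, 0, 1])))     # toy example
```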
data. These data indicate how many characteristics are preferred, with total support expressed on a point scale. The respondents are then asked to provide pair-wise comparisons over the range of attributes, which depicts the relative importance of each of them [11, 19]. The two-way steps show the pair-wise comparison with the matrix of N attributes, taken for the kth individual variable. The greater the importance of the reciprocal, the more the data are reformatted into the pair-wise comparison matrix format. This case shows AHP being conducted as an integrated part of the data analysis: it denotes the importance of each variable, where the support of the variables is reciprocal in the pair-wise matrix, and this coded information is either true positive, false positive, or false negative in nature. Association rule mining is a well-researched approach for finding the interrelations between item variables within a large, scalable transactional database. It is used to identify strong rules within the data using different measures over the variables.
The database has N attributes, which are called items, and contains the set of transactions over those items. Each transaction has a unique transaction ID and attribute IDs, and each rule can be defined by its support, confidence, antecedent, and consequent [4].
s = (t_a + t_c) / t    (4)

support = (number of transactions containing a) / (total number of transactions)    (5)

where s is the support, t_a + t_c is the number of transactions that contain the antecedent and the consequent, and t is the total number of transactions.
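A generic illustration of Eqs. (4)-(5), computing support and confidence for a rule A ⇒ C over a list of transactions; the transactions and items below are invented:

```python
# Support = share of transactions containing the item set;
# confidence(A => C) = support(A and C) / support(A).
def support(transactions, items):
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

tx = [{"bread", "milk"}, {"bread", "mask"}, {"mask", "sanitizer"}]
print(support(tx, {"mask"}))                    # 2/3
print(confidence(tx, {"mask"}, {"sanitizer"}))  # 0.5
```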
observed value and the expected value, expressed as (y_observed − y_predicted). The least squares method, which minimizes the sum of squares of these residuals, is one of the most used techniques for locating the regression line. The residual is computed with the following equation:
Residual of item shingles = (y observed set) / (number of outliers)    (7)
• Accuracy: the number of true positive cases compared to the total number of predicted positives.

Accuracy = TP / (TP + FP)    (8)
• Recall: the share of true positive cases among the actual positive instances.

Recall = TP / (TP + FN)    (9)
• F1-measure: the weighted average of precision and recall, representing both measures; it can be more informative than accuracy when the classes are imbalanced (see the sketch after this list).
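A minimal sketch of these metrics from confusion-matrix counts is given below; note that Eq. (8), labeled accuracy in the text, is the usual precision formula TP/(TP + FP), and the counts used here are invented:

```python
# Metrics of Eqs. (8)-(9) plus F1, from confusion-matrix counts.
def precision(tp, fp): return tp / (tp + fp)   # Eq. (8) as written in the text
def recall(tp, fn): return tp / (tp + fn)      # Eq. (9)
def f1(p, r): return 2 * p * r / (p + r)       # harmonic mean of the two

p, r = precision(87, 13), recall(87, 10)       # toy counts
print(round(p, 3), round(r, 3), round(f1(p, r), 3))
```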
5 Result and Discussion
Figure 3 shows the frequent items purchased during lockdown, retrieved from the transaction data, together with the maximum confidence metrics of the product associations [7, 18, 19]. In retail and e-commerce settings, any particular item is likely to account for a small share of transactions. Here, we have aggregated up to the product category level, and even very popular categories are present in only 5% of transactions. Consequently, item sets with two or more item categories will account for a vanishingly small share of total transactions (e.g., 0.01%); only 0.014% of transactions contain a product from both the sports and leisure and the health and beauty categories. These are typically the kinds of numbers we work with when we set pruning thresholds in the following section [4]. Figure 4 gives the frequency range of the data in each cluster, based on internal data coverage distance, with its skewness measured, and shows the customer segments by income and purchasing habit obtained through clustering. Figure 5 shows the confusion matrix of the correctly classified data. The frequency level of data in e-commerce can vary widely depending on the specific type of data being analyzed. The convergence plot of MVO is given below, showing the correction in data optimization; a box plot is also given to depict the classification accuracy based on the SSE results of MVO, PSO, DE, and GA across all experiments. According to the results in Figs. 3 and 4, the MVO can recognize the comparatively best groups with the minimum SSE for all datasets. The same data were classified and tested in a data mining testing environment using the J48 classifier [4].
6 Conclusion and Future Work
In this paper, we have shown how data preprocessing can optimize and enhance the results of a model. The paper discusses the classification and segmentation of customer purchasing patterns during COVID-19, based on responses collected from customers together with their transaction data, where the correlation of the responses is tested using the AHP method. We also used multiple optimizations for data clustering and Manhattan distance analysis after the data preprocessing. We implemented the apriori algorithm over the customers' buying patterns based on association rules, which gives 87% accuracy, tested in the Weka data mining environment with machine learning algorithms. We amalgamated the fuzzy c-means clustering method with association mining to segment the customer types based on the data collected in the survey. The major limitation of the proposed system is that it was tested on a smaller dataset; more imbalanced and unstructured data on the web remain to be tested for consumer behavior analysis and usability prediction.
References
1. Watada J, Yamashiro K (2006) A data mining approach to consumer behavior. In: First interna-
tional conference on innovative computing, information and control - volume I (ICICIC’06),
Beijing, China, vol 2, pp 652–655. https://doi.org/10.1109/ICICIC.2006.191
2. Gol M, Abur A (2015) A modified Chi-Squares test for improved bad data detection. In: 2015 IEEE Eindhoven PowerTech, Eindhoven, Netherlands, pp 1–5. https://doi.org/10.1109/PTC.2015.7232283
3. Mujianto AH, Mashuri C, Andriani A, Jayanti FD (2019) Consumer customs analysis using
the association rule and apriori algorithm for determining sales strategies in retail central. E3S
Web Conf 125:23003. https://doi.org/10.1051/e3sconf/201912523003
4. Yingzhuo X, Xuewen W (2021) Research on community consumer behavior based on associ-
ation rules analysis. In: 2021 6th International conference on intelligent computing and signal
processing (ICSP), Xi’an, China, pp 1213–1216. https://doi.org/10.1109/ICSP51882.2021.940
8917
5. Amin CR et al (2020) Consumer behavior analysis using EEG signals for neuromarketing
application. In: 2020 IEEE Symposium series on computational intelligence (SSCI), Canberra,
ACT, Australia, pp 2061–2066. https://doi.org/10.1109/SSCI47803.2020.9308358
6. Singh SP, Kumar A, Yadav N, Awasthi R (2018) Data mining: consumer behavior analysis.
In: 2018 3rd IEEE International conference on recent trends in electronics, information &
communication technology (RTEICT), Bangalore, India, pp 1917–1921. https://doi.org/10.
1109/RTEICT42901.2018.9012300
7. Bender KE, Badiger A, Roe BE, Shu Y, Qi D (2022) Consumer behavior during the COVID-
19 pandemic: an analysis of food purchasing and management behaviors in U.S. households
through the lens of food system resilience. Socio Econ Plann Sci 82:101107. https://doi.org/
10.1016/j.seps.2021.101107
8. Peighambari K, Sattari S, Kordestani A, Oghazi P (2016) Consumer behavior research: a
synthesis of the recent literature. SAGE Open 6(2):215824401664563. https://doi.org/10.1177/
2158244016645638
Jaspreet Singh, Chander Prabha, Gurpreet Singh, Muskan, and Amit Verma
Abstract Intelligence refers to the ability to learn and apply knowledge in new situations. Recent computational and electronic advances have increased the level of autonomous, semi-intelligent behavior exhibited by systems, so new terms like ambient intelligence and pervasive computing have started to emerge. Artificial intelligence (AI) and ambient intelligence (AmI) are inextricably linked technologies that cooperate with each other. Artificial is something made by human beings, and ambience is something that surrounds us, while ambient intelligence is assumed to be something artificial. Pervasive computing is the growing trend of embedding computational capability into everyday objects to make them communicate effectively and perform useful tasks that satisfy end users' resource needs. The paper provides bibliometric information, generated with the VosViewer software, about present trends in ambient intelligence and pervasive computing technologies using the Scopus and Web of Science databases, which will help researchers develop ideas for research in these domains. It also presents a comprehensive study of ambient intelligence and pervasive computing, covering the application areas that are dramatically affected by them in today's world. Undoubtedly, both technologies have strongly influenced recent developments over the past few years, and we can expect their scope to continue to multiply in upcoming years.
1 Introduction
In the coming future, explicit source and destination devices will not be an issue, since sensors as well as processors will be built into ordinary objects, and the system will smoothly adjust to the user's needs and preferences. This is what ambient intelligence (AmI) envisions for intelligent computing. Ambient intelligence is a complex AI system that can respond immediately to human presence; the response users receive from Siri or Alexa when it recognizes their voice is one specific example. Using intelligence concealed in the network connecting multiple devices, ambient intelligence would enable gadgets to work together to support persons in carrying out routine activities, chores, and traditions in an intuitive way [1]. Pervasive computing is the capacity to implant knowledge in regular objects such that the individual who interacts with the object needs a diminished degree of interaction with the electronic gadget [2]. Pervasive and ubiquitous are phrases that are frequently misused and confused with one another. Pervasive is related to the term penetrate, which means to spread throughout, whereas ubiquitous is derived from the word ubiquity, which means to be present everywhere. Ubiquitous computing refers to transparency between technology-embedded objects where there is man-to-machine conversation. In contrast, pervasive computing corresponds to a level of public acceptance where technology-embedded objects are completely invisible or are fully embraced by the environment, i.e., beyond man-to-machine interaction. Pervasive computing is thus seen as an effective implementation, whereas ubiquitous computing is viewed as a paradigm.
The remainder of the paper is organized as follows: Section II presents a bibliometric analysis of the ambient intelligence and pervasive computing fields. Section III throws light on the technology of ambient intelligence, covering its properties and applications in today's world. Section IV unfolds pervasive computing, covering its properties and applications in today's world. Finally, Section V summarizes current challenges for ambient intelligence and pervasive computing.
database. A total of 405 documents was found in the database. The UK leads the list with the highest number of publications, 53; India is at number 9 with 13 publications. The graph in Fig. 3 shows the top 14 countries of the world by number of publications. We use a filter of a minimum of nine publications, and the graph shows the countries that match this condition. About 405 document results were retrieved from the Scopus database. Keywords used: "pervasive computing" AND "ambient intelligence".
Co-occurrence Keywords
Out of 2944 keywords, 63 met the threshold (minimum number of occurrences of a keyword: 10). The highest number of occurrences is for ambient intelligence, with 254 occurrences; pervasive computing has 119 occurrences, as shown in Figs. 4, 5, and 6.
3 Ambient Intelligence
A new field called ambient intelligence adds intelligence to the surroundings we live in. Research addressing ambient intelligence (AmI) draws on advances made in pervasive computing, artificial intelligence, and sensor networks. Intelligent, pervasive, highly unobtrusive electronic systems that are integrated into human-made environments and adapted to the demands of individuals are known as ambient intelligence and pervasive computing. Such interactions of different contemporary information and communication technologies (ICT) aid people
4 Pervasive Computing
real-time energy rates, cloud computing with everlasting memory, coin-sized disk devices, small color video displays, and voice processing technologies will be available. Users will be able to communicate and access data at any time from anywhere in the world thanks to these capabilities [11]. Three technologies have converged to form ubiquitous computing [12]:
– Microelectronic technology produces compact, powerful, and energy-efficient devices and displays.
– Modern telecommunication technology provides universal roaming as well as enhanced bandwidth and data transfer rates.
– The Internet, regulated by numerous management-system standards and by industry, provides a foundation for connecting various components into an interconnected system comprising security, service, and billing systems.
A homogeneous environment featuring complete Internet connectivity is delivered through pervasive computing. A variety of techniques, including Internet connectivity, speech recognition, networking, artificial intelligence, and wireless computing, enable pervasive computing. Daily computer operations become incredibly approachable thanks to pervasive computing devices. Numerous possible implementations of pervasive computing exist, ranging from intelligent transport systems and geographic tracking to home care and other services. The properties of pervasive computing are listed below in Table 2.
– New distributed gadgets and services will have to be incredibly easy to use and install, because professional administrators are not typically found in home situations; the devices should not need to be configured, updated, or retired by professional programmers. Users must therefore be given the ability to control the settings and activities of their home environments, while still allowing some degree of "autonomy" in the form of self-configuration and self-adaptation of those ecosystems [15].
– It is necessary that AmI systems are informed of the preferences, intentions, and needs of the user. AmI technologies should be aware of when it is more practical
6 Conclusion
This paper envisages the current scenario and research potential in the said fields using bibliometric analysis, the numerous real-world applications of the said technologies, and the current challenges in the underlying fields. The bibliometric analysis conducted in this paper unfolds the trend, i.e., user interest in the fields of ambient intelligence and pervasive computing, according to various parameters such as country-wise and year-wise publications. Around the globe, these fields are gaining attention. The scope of the keywords, with their co-occurrences and the citations of the respective documents, is analyzed, and the achieved results are presented in figures, which show the continuous increase in publications and research trends in these fields. In recent years, the scope of ambient intelligence and pervasive computing has expanded tremendously, with almost all branches of software research and practice strongly feeling their impact. In the context of this paper, the increasing use of ambient intelligence and pervasive computing environments and applications has been determined, and the study presented will give the most important suggestions for future researchers in developing and employing enhanced applications. We are aware that the goals put forth for AmI and pervasive systems are not easily achievable, but the field is gaining momentum at a fast pace. So, looking at the current growth, we need researchers who can build effective systems utilizing the properties of the above technologies for implementation in the numerous sectors that will need them in the near future.
References
1. Pantoja CE, Soares HD, Viterbo J, Seghrouchni A E-F (2018) An architecture for the
development of ambient intelligence systems managed by embedded agents. In: SEKE, pp
214–215
2. AbdulSattar K, Al-Omary A (2020) Pervasive computing paradigm: a survey. In: 2020 Inter-
national conference on data analytics for business and industry: way towards a sustainable
economy (ICDABI). IEEE, pp 1–5
3. Gams M, Gu I Y-H, Härmä A, Muñoz A, Tam V (2019) Artificial intelligence and ambient
intelligence. J Ambient Intell Smart Env 11(1):71–86. https://doi.org/10.3233/AIS-180508
4. Muskan, Singh G, Singh J, Prabha C (2022) Data visualization and its key fundamentals: a comprehensive survey. In: 2022 7th International conference on communication and electronics systems (ICCES). IEEE, pp 1710–1714
5. Dobre C, Mavromoustakis CX, Garcia NM, Goleva RI, Mastorakis G (2016) Ambient assisted
living and enhanced living environments: principles, technologies and control. Butterworth
Heinemann
6. Rakotonirainy A, Tay R (2004) In-vehicle ambient intelligent transport systems (I-VAITS):
towards an integrated research. In: Proceedings of the 7th international IEEE conference on
intelligent transportation systems (IEEE Cat. No.04TH8749). IEEE, pp 648–651
7. Letchner J, Krumm J, Horvitz E (2006) Trip router with individualized preferences (TRIP):
incorporating personalization into route planning. In: AAAI, pp 1795–1800
8. Dashtinezhad S, Nadeem T, Dorohonceanu B, Borcea C, Kang P, Iftode L (2004) TrafficView:
a driver assistant device for traffic monitoring based on car-to-car communication. In: 2004
IEEE 59th vehicular technology conference. VTC 2004-Spring (IEEE Cat. No.04CH37514),
vol 5. IEEE, pp 2946–2950
9. Devi A, Rathee G, Saini H (2022) Secure information transmission in intelligent transportation
systems using blockchain technique. In: Intelligent cyber-physical systems for autonomous
transportation. Internet of Things. Springer, pp 257–266. https://doi.org/10.1007/978-3-030-
92054-8_15
10. Papageorgiou N, Apostolou D, Verginadis Y, Tsagkaropoulos A, Mentzas G (2018) A situa-
tion detection mechanism for pervasive computing infrastructures. In: 2018 9th International
conference on information, intelligence, systems and applications (IISA). IEEE, pp 1–8
11. Xu W, Xin Y, Lu G (2007) A system architecture for pervasive computing. In: Third
international conference on natural computation (ICNC 2007), vol 5. IEEE, pp 772–776
12. Henricksen K, Indulska J, Rakotonirainy A (2002) Modeling context information in pervasive
computing systems. In: International conference on pervasive computing. Lecture Notes in
Computer Science, vol 2414. Springer, pp 167–180
13. Satyanarayanan M (2001) Pervasive computing: vision and challenges. IEEE Pers Commun
8(4):10–17
14. Hansmann U, Merk L, Nicklous MS, Stober T (2013) Pervasive computing handbook. Springer
Science & Business Media, Springer-Verlag Berlin Heidelberg
15. Becker C, Julien C, Lalanda P, Zambonelli F (2019) Pervasive computing middleware: current
trends and emerging challenges. CCF Trans Pervasive Comp Interact 1:10–23. https://doi.org/
10.1007/s42486-019-00005-2
16. Silva-Rodríguez, Nava-Muñoz SE, Castro LA, Martínez-Pérez FE, Pérez-González HG,
Torres-Reyes F (2021) Predicting interaction design patterns for designing explicit interactions
in ambient intelligence systems: a case study. Pers Ubiquit Comput 26:1–12
17. Singh J, Singh G, Verma A (2022) The anatomy of big data: concepts, principles and challenges.
In: 2022 8th International conference on advanced computing and communication systems
(ICACCS). IEEE, pp 986–990
18. Shaheed SM, Abbas J, Shabbir A, Khalid F (2015) Solving the challenges of pervasive
computing. J Comput Commun 3:41–50
19. Singh J, Singh G, Bhati BS (2022) The implication of data lake in enterprises: a deeper analytics.
In: 2022 8th International conference on advanced computing and communication systems
(ICACCS). IEEE, pp 530–534
20. Singh J, Bajaj R, Kumar A (2021) Scaling down power utilization with optimal virtual machine
placement scheme for cloud data center resources: a performance evaluation. In: 2021 2nd
Global conference for advancement in technology (GCAT). IEEE, pp 1–6
21. Singh J, Duhan B, Gupta D, Sharma N (2020) Cloud resource management optimization:
taxonomy and research challenges. In: 2020 8th International conference on reliability, infocom
technologies and optimization (trends and future directions) (ICRITO). IEEE, pp 1133–1138
Improving the AODV Routing Protocol
Against Network Layer Attacks Using
AntNet Algorithm in VANETs
R. S. Majeed
Faculty of Computer Engineering, Al-Esraa University College, Baghdad, Iraq
e-mail: rand@esraa.edu.iq
M. A. Abdala
Department of Medical Instrumentation Techniques Engineering, AL-Hussain University College,
Karbala, Iraq
e-mail: dr.m.ahmed@huciraq.edu.iq
D. A. Alwahab (B)
Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary
e-mail: aalwahab@inf.elte.hu
the percentage of enhancement is 16.82%, while with the flooding attack, AntNet shows the lowest enhancement efficiency, with a value of 9.04%.
1 Introduction
Security is the most serious issue impacting the performance of a VANET. Due to the scale of the network and the mobility of its nodes, it is vulnerable to different types of attacks and may also be vulnerable to jamming [4]. Due to the sensitive nature of the data transmitted over a VANET, applications need to be designed so that they are protected from manipulation and data loss caused by malicious nodes. In network security, attacks can be divided into three main classes of threats: threats against the authenticity, confidentiality, and availability of resources [5].
Attacks may affect vehicular networks in different ways: some may raise the network delay, others may cause network congestion and create routing loops, and other types may prevent the sending vehicle node from finding the correct route to the destination [6]. Three types of attacks at the network layer are tested in this research:
• Blackhole attack: A blackhole node presents itself as having the shortest path to the destination whose packets it wants to attack. The malicious intermediate node in the vehicular network advertises the availability of fresh routes, so it can always reply to route requests and capture the data packets [7]. Figure 2 shows how a blackhole attacker node operates in an ad hoc network. Source node S wants to send packets to receiver node D. It first sends a Route Request (RREQ) message to all neighbor nodes (A, B, and C). Assuming node C has the best route to node D, it would be the first to return a Route Reply (RREP) message to node S. Before this happens, however, the blackhole node M sends a false RREP through node A claiming a higher sequence number than the destination node, so node S sends the packets via node A, assuming this is the shortest path to the receiver. The attacker node M then receives the packets and drops them [8].
• Flooding attack: The main goal of this attack is to exhaust network resources such as bandwidth by broadcasting fake packets addressed to non-existent nodes, bringing the network down so that users become unable to access it [9]. Flooding attacks can be classified into data flooding, RREQ flooding, and synchronization flooding; RREQ flooding is used in this research. The attacker vehicle floods RREQ packets toward existing or non-existent nodes, filling the routing tables of its neighbors with RREQs, so that only a small number of data packets can reach the destination and the bandwidth and resources of the network are wasted. The high mobility of the ad hoc network makes this attack difficult to recognize. Figure 3 shows the RREQ flooding attack used in this research [10].
• Rushing attack: This attack (also known as a denial-of-service or novel attack) behaves as a denial of service against all currently proposed ad hoc routing protocols (e.g., AODV). Usually, the source node finds a suitable route by looking in its cache of previously learned routes; in this attack, the attacker node exploits a property of the route discovery operation [11], as shown in Fig. 4.
• Step 2: If the receiving node is the intended destination, it replies by sending a Route Reply (RREP) message to the source node so that data transmission can start. If not, it starts a local route search over its neighboring nodes to find an available route to the destination.
• Step 3: Finally, the transmission is checked. If it is not successful, the route discovery procedure is restarted by activating route maintenance to find an available transmission path.
1. Using artificial agents called ants to generate a solution to the given problem.
2. Using the information collected in the past to create better future solutions (a sketch of this idea follows).
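AntNet's update rules are not reproduced in this chapter, so the following is only a minimal sketch of the two principles above: forward ants sample next hops from a pheromone table, and backward ants reinforce the hops that produced short trip times. The table values, reward formula, and node names are illustrative assumptions, not the authors' implementation.

```python
import random

# Pheromone table: for each destination, a routing probability per next hop.
# AntNet initializes these uniformly over the known neighbors.
pheromone = {"D": {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}}

def choose_next_hop(dest: str) -> str:
    """A forward ant picks a next hop with probability proportional to pheromone."""
    hops, weights = zip(*pheromone[dest].items())
    return random.choices(hops, weights=weights, k=1)[0]

def reinforce(dest: str, hop: str, trip_time: float, best_time: float) -> None:
    """A backward ant rewards the hop it used; shorter trips earn more reward."""
    r = min(best_time / trip_time, 1.0)      # reward in (0, 1]
    table = pheromone[dest]
    table[hop] += r * (1.0 - table[hop])     # strengthen the chosen hop
    for other in table:
        if other != hop:
            table[other] *= 1.0 - r          # evaporate the alternatives

# An ant that returned via C after a fast trip shifts future traffic toward C.
reinforce("D", "C", trip_time=0.8, best_time=0.8)
print(choose_next_hop("D"))
```

Because the probabilities stay normalized after every update, a node that silently drops packets never generates backward ants and steadily loses pheromone, which is the intuition behind applying AntNet against the attacks studied here.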
2 Proposed Method
In this research, three types of network layer attacks (blackhole, flooding, and rushing) were used to examine their effect on VANETs. In addition, combinations of more than one routing attack applied at the same time are proposed to show their effect on AODV. The solution used in this research to eliminate the effect of these network attacks is the AntNet algorithm, an Ant Colony Optimization (ACO) technique.
The process by which the VANET uses the AntNet solution against the blackhole attack includes the following steps, as shown in Figs. 5 and 6 (a sketch of the detection idea follows this list):
• First, to find the destination, the node that wants to transmit data sends an RREQ message to all nodes in the network.
• If a blackhole node receives the RREQ message, it advertises that it has a fresh route. Otherwise, the destination discovery procedure continues, and data transmission starts if it succeeds.
• If the ACO algorithm is used, it detects the attacker node and finds a suitable route to the destination by rediscovering the path. Otherwise, the network performance degrades.
• Finally, data transmission starts successfully, with the effect of the attack eliminated.
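The chapter does not spell out how the attacker node is detected. One common heuristic, shown below purely as an illustrative assumption, flags route replies whose destination sequence number jumps implausibly far ahead of the last value the source has seen, which is the typical fingerprint of a blackhole's fake "fresh route"; the threshold and function names are hypothetical.

```python
# Hypothetical RREP sanity check for blackhole-style replies.
SEQ_JUMP_THRESHOLD = 50   # illustrative; tuned per network in practice

def is_suspicious_rrep(advertised_seq: int, last_known_seq: int) -> bool:
    """Flag replies whose sequence number jumps unrealistically far ahead."""
    return advertised_seq - last_known_seq > SEQ_JUMP_THRESHOLD

def handle_rrep(route_table: dict, dest: str, next_hop: str,
                advertised_seq: int, last_known_seq: int) -> str:
    if is_suspicious_rrep(advertised_seq, last_known_seq):
        # Probable blackhole: ignore the reply and let the ant agents
        # rediscover a path that avoids this neighbor.
        return "rediscover"
    route_table[dest] = next_hop
    return "accept"
```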
For the rushing and flooding attacks, and for the hybrid attacks combining (blackhole + rushing) and (blackhole + flooding) at the same time on the VANET, the same ACO-based process is used to eliminate their effect on AODV, as shown in Fig. 7.
Fig. 8 Total number of dropped packets of the VANET without attacks, with attacks, and after implementing the ACO algorithm, at a car speed of 30 km/h, for a the blackhole (Bkh) attack, b the rushing attack and the hybrid (Bkh + rushing) attack, and c the flooding attack and the hybrid (Bkh + flooding) attack
Two simulators were used jointly to build the VANET simulation: SUMO 0.25.0 [17] as a traffic simulator to generate realistic vehicular mobility, and NS2 (Network Simulator 2) version 2.35 [18] as a network simulator to configure suitable parameters for the VANET, as shown in Fig. 8.
Table 1 shows the parameters used for the simulation. A wireless channel was chosen for the connection; the AODV routing protocol was used; an omnidirectional antenna was used for signal propagation; 50, 100, 150, and 200 nodes were positioned randomly over a (2700 × 2700) m² area; the simulation time was 600 s; and IEEE 802.11p was used as the VANET MAC protocol. The following three sets of scenarios were simulated and analyzed:
• The first set of scenarios runs the simulation of the AODV routing protocol to analyze the performance of the network while changing the speed and number of nodes.
• The second set of scenarios runs the simulation of AODV under the effect of the attacks to see how they affect the performance of the VANET.
• The third set of scenarios runs the simulation after implementing the ACO algorithm and then compares the results of all scenarios.
3 Results
In this research, the total number of dropped packets and the average throughput were measured to study and analyze the overall performance of the VANET under the effect of the flooding, blackhole, and rushing attacks and their combinations, and to assess how the network performance improved after implementing the proposed ACO-based solution.
The following performance metrics were chosen:
• Total number of dropped packets: the difference between the numbers of sent and received packets.
• Average throughput [19]: the rate of successful packet delivery over a communication channel; in Eq. (1), the factor 8 converts the received packets into bits. Throughput is usually measured in bps, Kbps, or Mbps (a computation sketch follows).
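Equation (1) itself is not reproduced in this extract; the sketch below shows the usual form of the computation consistent with the description above, where the factor 8 converts received bytes to bits. The packet size and numbers are illustrative assumptions.

```python
def average_throughput_kbps(received_packets: int,
                            packet_size_bytes: int,
                            duration_s: float) -> float:
    """Average throughput = received data in bits / observation time.

    Multiplying by 8 converts the received bytes into bits, matching the
    description of Eq. (1); dividing by 1000 expresses the result in Kbps.
    """
    bits = received_packets * packet_size_bytes * 8
    return bits / (duration_s * 1000.0)

# Example: 12,000 packets of 512 B received over the 600 s simulation.
print(f"{average_throughput_kbps(12_000, 512, 600.0):.2f} Kbps")  # 81.92 Kbps
```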
Fig. 9 Average throughput of the VANET without attacks, with attacks, and after implementing the ACO algorithm, at a car speed of 30 km/h, for a the blackhole (Bkh) attack, b the rushing attack and the hybrid (Bkh + rushing) attack, and c the flooding attack and the hybrid (Bkh + flooding) attack
4 Discussion
After computing and analyzing the results for the total number of dropped packets and the average throughput for 50–150 nodes in all of the implemented situations, the results are compared and discussed to show how network security attacks affect the performance of AODV in VANETs and how the implementation of the AntNet algorithm eliminated the effect of these attacks. Table 2 shows the percentages of affected values of the total number of dropped packets without attacks, with attacks, and after implementing the ACO algorithm, as follows:
• The percentage of affected values for total dropped packets with the blackhole attack is high compared with the flooding and rushing attacks.
• Increasing the node speed from 30 to 80 km/h increases the percentage of affected values of total dropped packets.
• The values for the hybrid of the rushing and blackhole attacks applied at the same time are low compared with the blackhole attack alone.
• The ACO percentage values with the blackhole attack are low compared with the other attacks; for example, with 50 nodes and a speed of 30 km/h, the effect of the blackhole attack is decreased from 53.85% to 16.84%.
Fig. 10 Total number of dropped packets of the VANET without attacks, with attacks, and after implementing the ACO algorithm, at a car speed of 80 km/h, for a the blackhole (Bkh) attack, b the rushing attack and the hybrid (Bkh + rushing) attack, and c the flooding attack and the hybrid (Bkh + flooding) attack
Fig. 11 Average throughput of the VANET without attacks, with attacks, and after implementing the ACO algorithm, at a car speed of 80 km/h, for a the blackhole (Bkh) attack, b the rushing attack and the hybrid (Bkh + rushing) attack, and c the flooding attack and the hybrid (Bkh + flooding) attack
Table 2 Percentage values (%) of total dropped packets for the VANET with attacks and with ACO
Network status | 30 km/h (50 / 100 / 150 / 200 nodes) | 80 km/h (50 / 100 / 150 / 200 nodes)
Without attack | 0.013 / 3.67 / 3.29 / 6.15 | 0.47 / 4.4 / 10.44 / 12.44
Bkh | 53.85 / 44.48 / 57.45 / 32.39 | 37.02 / 44.6 / 99.3 / 26.83
ACO (Bkh) | 16.84 / 23.64 / 51.56 / 14.00 | 31.97 / 34.2 / 55.94 / 10.5
Rushing | 0.014 / 37.53 / 8.29 / 14.80 | 0.014 / 43.04 / 2.67 / 13.41
ACO (rushing) | 0.014 / 24.45 / 30.19 / 13.34 | 0.014 / 42.44 / 9.57 / 14.66
Bkh + rushing | 42.06 / 37.65 / 75.43 / 30.14 | 26.92 / 41.28 / 95.12 / 18.41
ACO (Bkh + rushing) | 15.15 / 22.64 / 75.46 / 12.63 | 56.73 / 16.38 / 98.42 / 12.64
Flooding | 0.014 / 40.45 / 3.35 / 6.15 | 0.014 / 8.55 / 10.53 / 4.73
ACO (flooding) | 0 / 31.23 / 5.76 / 13.12 | 0 / 2.95 / 8.36 / 10.97
Bkh + flooding | 60.57 / 41.20 / 60.96 / 31.95 | 72.35 / 46.44 / 97.82 / 26.30
ACO (Bkh + flooding) | 45.43 / 23.61 / 37.66 / 16.49 | 77.40 / 30.22 / 56.90 / 10.97
5 Conclusions
attacker node. This decreases its effect on the VANET and hence decreases the total number of dropped packets, which is high under the blackhole attack.
• Implementing the hybrid (Bkh + flooding) attack has a worse effect on the VANET than (Bkh + rushing), because both the blackhole and flooding attacks aim to increase the number of dropped packets, while the goal of the rushing attack is to gain access to the communication between the source and destination nodes.
• The speed and the number of nodes in the network affect the ACO-based solution; the efficiency of the algorithm is reduced as these parameters increase.
References
1. Botkar SP, Godse SP, Mahalle PN, Shinde GR (2021) VANET challenges and opportunities,
1st edn. CRC Press, U.S
2. Kaur M, Kaur S, Singh G (2012) Vehicular Ad-hoc networks. J Global Res Comput Sci 3(3):61–
64
3. Qays R (2015) Simulation and performance enhancement of VANET routing protocols. Al-
Nahrain University, Baghdad, Iraq, College of Information Engineering
4. Xiao L, Zhuang W, Zhou S, Chen C (2019) Learning-based VANET communication and
security techniques, 2nd edn. Springer Nature, Switzerland AG
5. Rehman S, Arif Khan M, Zia TA, Zheng L (2013) Vehicular Ad-hoc networks (VANETs)—an
overview and challenges. J Wirel Networking Commun 3(3):29–38
6. Qureshi K, Abdullah H, Mirza A, Anwar R (2015) Geographical forwarding methods in
vehicular Ad-hoc networks. Int J Electr Comput Eng (IJECE) 5(6):1407–1416
7. Salih R, Abdala M (2017) Blackhole attack effect elimination in VANET networks using
IDS-AODV, RAODV and AntNet algorithm. J Telecommun 36(1):1–5
8. Salih R (2017) Development of security schemes for VANETs. Al-Nahrain University,
Baghdad, Iraq, College of Information Engineering
9. Jindal S, Maini R (2014) An efficient technique for detection of flooding and jamming attacks
in wireless sensor networks. Int J Comput Appl (IJCA) 98(10):25–33
10. Rani S, Narwal B, Mohapatra AK (2018) RREQ flood attack and its mitigation in Ad-hoc
network. Springer Nature Singapore Pte Ltd, pp 599–607
11. Kumar S, Sahoo B. Effect of rushing attack on DSR in wireless mobile Ad-hoc network.
Department of Computer Science & Engineering, NIT Rourkela, Orissa
12. Mishra P, Sharma S, Bairwa A (2019) Mitigating the effect of rushing attack on AODV routing
protocol in mobile Ad-hoc network. Int J Res Anal Rev (IJRAR) 6(1):1154–1170
13. Kaur R, Rana S (2015) Overview on routing protocols in VANET. Int Res J Eng Technol
(IRJET) 2(3):1333–1337
14. Abdullah A, Aziz R (2014) The impact of reactive routing protocols for transferring multimedia
data over MANETs. J Zankoy Sulaimani 16(4):9–24
15. Sudhakar T, Inbarani H (2019) Improving the performance of AntNet protocol using perfor-
mance measures modeling in mobile Ad-hoc network. Int J Recent Technol Eng (IJRTE)
8(4):733–737
16. Pal S, Ramachandran K, Dhanasekaran S, Paul ID, Amritha (2014) A review on anomaly
detection in manet using AntNet algorithm. Middle-East J Sci Res 22(5):690–697
17. DLR: http://www.dlr.de/ts/en/desktopdefault.aspx/tabid-9883/16931_read-41000/
18. ISI: http://www.isi.edu/nsnam/ns/
19. Hota L, Nayak B, Kumar A, Sahoo B, Ali GN (2022) A performance analysis of VANETs
propagation models and routing protocols. Sustainability 1–20
A Comparative Study of IoT-Based
Automated Hydroponic Smart Farms:
An Urban Farming Perspective
Abstract With the increase in world population and the simultaneous decline in the
resources available for farming, meeting the food demands of all the inhabitants is
a substantial challenge. Adopting advanced technologies to maximize agricultural
yield with the judicious use of available resources is the means to tackle these chal-
lenges. Internet of Things (IoT), one such budding technology, has made automation
possible in various facets of human lives and changed the way human–machine inter-
action takes place. IoT is a network of interrelated things that have the ability to transfer data among one another and with the surrounding environment. Integration of IoT
technology with modern farming facilitates various cost-effective intelligent farming
methods such as indoor farming and precision agriculture which, subsequently, leads
to a reduction in resource wastage and optimal utilization of resources such as land,
water, labor and fertilizers. This paper aims to review the most recent papers in the
IoT-assisted smart farming domain. The intention is to provide a comparative anal-
ysis of all the recent developments in the area of IoT-based hydroponic smart farms
to future researchers.
1 Introduction
Farming is the oldest means of support in the history of mankind. With the global population explosion, high urbanization rates, climate change, and the resource crisis, the gap between the demand for and supply of food is widening at a significant and alarming rate. The total arable land area used for agriculture declined from 39.47% in 1991 to 37.73% in 2015 [1]. This has put the agriculture industry under immense pressure to produce more food even as the available resources decline. Luckily,
2 Related Work
Related surveys by Srilakshmi et al. [5], Elijah et al. [6], Farooq et al. [7], Raneesha et al. [8], and Farooq et al. [9] on the use of IoT in the agriculture domain have been published in the past few years.
Srilakshmi et al. [5] analyzed three major implementations of IoT in smart agriculture: irrigation automation using a Smart Irrigation Decision Support System (SIDSS), the use of IoT to detect nitrate levels in the soil through planar-type interdigital sensors, and the use of IoT in precision agriculture to sense and maintain all the environmental parameters remotely. Elijah et al.
[6] identified five broad categories of the applications of IoT in the agriculture sector
as: (a) monitoring and maintenance of farm and livestock, (b) tracking and tracing of
parameters, (c) IoT-assisted agricultural machinery, (d) IoT-based precision agricul-
ture and (e) greenhouse production. The study also discussed already existing IoT
solutions for agriculture in the market and compared them based on their features.
Farooq et al. [7] discussed three main IoT applications in agriculture as follows: (a)
precision farming which includes climate and irrigation monitoring, disease moni-
toring, farm management, etc.; (b) livestock management which includes animal
health monitoring, GPS-based monitoring, heat stress level, etc.; and (c) greenhouse
monitoring which includes weather monitoring, water management, plant monitoring
and agricultural drones. In this article, the authors also identified over 60 smartphone
applications for agricultural practices along with their features. Raneesha et al. [8] reviewed 60 scientific publications on recent IoT applications in the farming sector based on their sensors, the technology used, and the IoT sub-vertical. The study identified water and crop management as the sub-verticals making the greatest use of IoT, and livestock and irrigation management as those making the least. Environmental temperature was the most commonly used sensor, and Wi-Fi was the most used communication technology. Farooq et al. [9] conducted a review of research work published during 2015–2020 in the IoT-based farming domain. The study reviewed the existing literature on IoT frameworks for water management, intelligent irrigation systems, crop monitoring, pest control, etc., highlighting their scope and methodology along with a comparison of their architectures and their pros and cons.
The aim of this study is to recognize the current advancements of IoT for the automation of smart farms; to that end, the published work in the domain has been carefully examined, compared, and analyzed. This comprehensive study acts as a resource that supports a precise understanding of the basics of IoT in smart agriculture and helps future researchers conduct further research in the domain. A thorough and systematic review of the existing literature was conducted to meet this aim.
To identify vital literature, the following keywords were searched in scholarly databases: "Smart Farming" OR "IoT Farming" OR "Precision Agriculture" OR "IoT in Hydroponics" OR "Indoor Farming" OR "Automated Indoor Farms" OR "Automated Irrigation" OR "Automated Nutrient Control".
In total, 64 papers were identified through the keyword search. The first level of paper selection was made on the basis of title, abstract, and text; the selected papers were then shortlisted again based on the type of indoor farm, the technology used, and the type of automation implemented. With this two-step shortlisting procedure, 38 papers were selected for this study.
Based on the existing research work, three broad categories of IoT-based smart
hydroponic farms have been formulated and discussed in this paper, namely, IoT-
based farm climate monitoring and control systems, IoT-based automated irrigation
systems and IoT-based nutrient control systems.
Jaiswal et al. [10] designed a prototype of a fully automated greenhouse for hydroponics and vertical farming with security provisions using machine learning. The greenhouse is equipped with sensors whose measurements are sent to the cloud for decision-making. When the temperature, humidity, or light intensity reaches its threshold value, the system takes automatic actions to control the parameter (a sketch of this pattern follows). It is equipped with a security feature based on facial recognition, and livestream surveillance is available to users at all times.
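Several of the reviewed systems share this sense-compare-actuate loop. The sketch below shows its general shape only; the parameter names, threshold values, and stand-in sensor/actuator functions are illustrative assumptions, not details taken from [10].

```python
# Illustrative thresholds; real deployments tune these per crop and season.
THRESHOLDS = {"temp_c": 30.0, "humidity_pct": 80.0, "light_lux": 10_000}

def read_sensors() -> dict:
    # Stand-in for real sensor drivers (e.g., a DHT22 and a light sensor).
    return {"temp_c": 32.5, "humidity_pct": 76.0, "light_lux": 8_500}

def actuate(parameter: str) -> None:
    # Stand-in for relay control of a fan, humidifier, or grow light.
    print(f"corrective action triggered for {parameter}")

def control_step() -> None:
    """One pass of the sense-compare-actuate loop: act when a reading
    crosses its configured threshold."""
    readings = read_sensors()
    for parameter, limit in THRESHOLDS.items():
        if readings[parameter] > limit:
            actuate(parameter)

control_step()  # a deployment would schedule this periodically
```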
Dholu et al. [11] proposed implementing precision agriculture with a cloud-based IoT application that senses all the required parameters and controls the actuations. The proposed system contains sensors that collect soil moisture, temperature, and humidity data from the farm and send them to the microcontroller unit, which forwards the data to the ThingSpeak platform for analysis. Users can see the current sensor readings through a mobile application that fetches data from the cloud.
Bhojwani et al. [12] proposed an IoT-based model to monitor and analyze different
parameters affecting the growth and production of crops in an agricultural field.
Sensors are used to send live weather and soil condition data to the cloud server
where they are analyzed and presented to the user in the form of graphs. The proposed model helps farmers decide on the ideal crop by analyzing the weather and soil conditions of the field.
Doshi et al. [13] developed an IoT-based plug-and-sense portable prototype, powered by a power bank, with a notification system to send farm updates to remote farmers. A greenhouse was equipped with the prototype; an ESP32 takes readings from the sensors, sends them to the cloud for notification, and then sleeps for 18 min. LEDs are also used to notify the farmers.
Herman et al. [14] proposed a monitoring and controlling system for hydroponics based on IoT and fuzzy logic to control the parameters for precision agriculture. Data from the sensors are read by an Arduino Uno, and fuzzy logic makes a decision based on the range of values determined for each parameter. The output of the fuzzy logic determines how long to open the tap valves for the pH solution and nutrient water, and whether to open or close the para-net curtain (a controller sketch follows).
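The fuzzy rules themselves are not given in the summary above; the sketch below only illustrates the general shape of such a controller, with triangular membership functions and a Sugeno-style weighted output for the valve-on time. The membership breakpoints and rule outputs are illustrative assumptions.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def ph_valve_seconds(ph: float) -> float:
    """Sugeno-style rule base: the more strongly the pH reads 'too acidic',
    the longer the pH-correction valve stays open."""
    too_acidic = tri(ph, 4.0, 5.0, 6.0)   # membership: pH clearly too low
    in_range = tri(ph, 5.5, 6.5, 7.5)     # membership: pH in the target band
    # Rule outputs: 'too acidic' -> dose for 10 s, 'in range' -> 0 s.
    weight = too_acidic + in_range
    return (too_acidic * 10.0 + in_range * 0.0) / weight if weight else 0.0

print(ph_valve_seconds(5.8))  # 4.0 s: both rules fire, weighted average
```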
Palande et al. [15] built a cost-effective and completely automated hydroponic system that requires no human interaction to control the parameters of plant growth. The system consists of two Arduino nodes and a Raspberry Pi as the main controller. The nodes collect data from the sensors and send it to a gateway connected to the Raspberry Pi, which makes all the data available on a web interface that can also take user inputs. The system controls the growth parameters automatically and informs the user of any abnormal sensor data.
Aliac et al. [16] designed an integrated IoT system for the surveillance and maintenance of a hydroponic garden, with the aim of providing the perfect habitat for the plants to grow. A Raspberry Pi processes the data received from the sensors in real time, uploads it to a Firebase database, and takes the necessary actions based on the commands received in return to maintain the ideal growth conditions and control the irrigation and nutrient solution intake for the crops. A fully functional automated system was built in which a web application monitored and controlled the farm's drainage system, fan, sprinkler, and water pump; a warning system notified users of farm conditions; and the recommended conditions for each crop type were displayed in the application.
Nurhasan et al. [17] implemented automated water level monitoring and control using fuzzy Sugeno inference and a website application. The system obtains data from the sensors
6 Conclusion
IoT empowers hydroponics and makes urban farming and precision agriculture attainable. With the implementation of IoT, the precision agriculture aspect of hydroponics can be automated. This leads to intelligent use of the available resources and minimizes the resource wastage associated with human intervention in the farm. The
Table 1 Comparative study of the functionality of 38 hydroponic IoT smart systems (columns: references; sensing of weather (air) and of irrigation and nutrients (water); automation of weather, irrigation, and nutrient level; mobile/web application)
conclusion drawn from the study is that IoT is an evolutionary technology which, when implemented with hydroponics, can open a new level of urban farming. This study brings forth a thorough literature survey of existing IoT-enabled hydroponic farms and highlights the need for an IoT-based hydroponic farm with automation for climate monitoring and control, irrigation control, and nutrient manipulation and control. A system with all of the above automations will be a fully automated farm in the truest sense and will have the capability to revolutionize urban farming.
7 Future Scope
From the analysis of the farms presented in Sect. 5, the following research gaps were identified:
• Less than 50% of the farms were automated with weather monitoring and control.
• Less than 25% of the systems were equipped with automated nutrient control.
• Further, less than 50% of the systems implemented full automation of their respective verticals, and no system had provisions for all three categories of automation identified in Sect. 4, i.e., weather automation, irrigation automation, and nutrient automation.
Thus, it is evident that an indoor hydroponic farm with complete automation at all levels has not yet been designed. To promote urban farming using indoor hydroponic farms and to use the available resources judiciously, the research gap in the existing work needs to be bridged; a hydroponic system that is automated at all levels of its day-to-day activity is the pressing priority.
References
11. Dholu M, Ghodinde KA (2018) Internet of things (IoT) for precision agriculture application.
In: 2018 2nd international conference on trends in electronics and informatics (ICOEI). https://
doi.org/10.1109/icoei.2018.8553720
12. Bhojwani Y et al (2020) Crop selection and IoT based monitoring system for precision agri-
culture. In: 2020 international conference on emerging trends in information technology and
engineering (Ic-ETITE). https://doi.org/10.1109/ic-etite47903.2020.123
13. Doshi J et al (2019) Smart farming using IoT, a solution for optimally monitoring farming
conditions. Procedia Comput Sci 160:746–751. https://doi.org/10.1016/j.procs.2019.11.016
14. Herman, Surantha N (2019) Intelligent monitoring and controlling system for hydroponics
precision agriculture. In: 2019 7th international conference on information and communication
technology (ICoICT). https://doi.org/10.1109/icoict.2019.8835377
15. Palande V et al (2018) Fully automated hydroponic system for indoor plant growth. Procedia
Comput Sci 129:482–488. https://doi.org/10.1016/j.procs.2018.03.028
16. Aliac CJG, Maravillas E (2018) IOT hydroponics management system. In: 2018 IEEE 10th
international conference on humanoid, nanotechnology, information technology, communica-
tion and control, environment and management (HNICEM). https://doi.org/10.1109/hnicem.
2018.8666372
17. Nurhasan U et al (2018) Implementation IoT in system monitoring hydroponic plant water
circulation and control. Int J Eng Technol 7(4.44):122. https://doi.org/10.14419/ijet.v7i4.44.
26965
18. Ani A, Gopalakirishnan P (2020) Automated hydroponic drip irrigation using big data. In: 2020
second international conference on inventive research in computing applications (ICIRCA).
https://doi.org/10.1109/icirca48905.2020.9182908
19. Bharti NK et al (2019) Hydroponics system for soilless farming integrated with android appli-
cation by internet of things and MQTT broker. In: 2019 IEEE Pune section international
conference (PuneCon). https://doi.org/10.1109/punecon46936.2019.9105847
20. Bhattacharya M et al (2020) Smart irrigation system using internet of things. In: Applications
of internet of things lecture notes in networks and systems, pp 119–129. https://doi.org/10.
1007/978-981-15-6198-6_11
21. Changmai T et al (2018) Smart hydroponic lettuce farm using internet of things. In: 2018 10th
international conference on knowledge and smart technology (KST). https://doi.org/10.1109/
kst.2018.8426141
22. Cho WJ et al (2017) An embedded system for automated hydroponic nutrient solution
management. Trans ASABE 60(4):1083–1096. https://doi.org/10.13031/trans.12163
23. Chowdhury MEH et al (2020) Design, construction and testing of IoT based automated indoor
vertical hydroponics farming test-bed in Qatar. Sensors 20(19):5637. https://doi.org/10.3390/
s20195637
24. Sen D, Dey M, Kumar S, Boopathi CS (2020, June) Smart irrigation using IoT. Int J Advan
Sci Technol 29(4s):3080–9. http://sersc.org/journals/index.php/IJAST/article/view/22479
25. Domingues DS et al (2012) Automated system developed to control PH and concentration of
nutrient solution evaluated in hydroponic lettuce production. Comput Electron Agric 84:53–61.
https://doi.org/10.1016/j.compag.2012.02.006
26. Dudwadkar A (2020) Automated hydroponics with remote monitoring and control using IoT.
Int J Eng Res Technol 9(6). https://doi.org/10.17577/ijertv9is060677
27. Gori A et al (2017, Sept) Smart irrigation system using IOT. Int J Advan Res Comput Commun
Eng 6(9):213–216
28. Kaburuan ER et al (2019) A design of IoT-based monitoring system for intelligence indoor
micro-climate horticulture farming in Indonesia. Procedia Comput Sci 157:459–464. https://
doi.org/10.1016/j.procs.2019.09.001
29. Lakshmanan R et al (2020) Automated smart hydroponics system using internet of things. Int
J Electr Comput Eng (IJECE) 10(6):6389. https://doi.org/10.11591/ijece.v10i6.pp6389-6398
30. Lakshmiprabha KE, Govindaraju C (2019) Hydroponic-based smart irrigation system using
internet of things. Int J Commun Syst. https://doi.org/10.1002/dac.4071
31. Mahale RB, Sonavane SS (2016) Smart poultry farm monitoring using IoT and wireless sensor
networks. Int J Advan Res Comput Sci 7(3):187–190. http://www.ijarcs.info/index.php/Ijarcs/
article/view/2665
32. Mahendran M et al (2017) Implementation of smart farm monitoring using IoT. Int J Current
Eng Sci Res (IJCESR)
33. Mehboob A et al (2019) Automation and control system of EC and PH for indoor hydroponics
system. IEEC
34. Mohanraj I et al (2016) Field monitoring and automation using IOT in agriculture domain.
Procedia Comput Sci 93:931–939. https://doi.org/10.1016/j.procs.2016.07.275
35. Montoya AP et al (2017) Automatic aeroponic irrigation system based on arduino’s platform.
J Phys: Conf Ser 850:1–11. https://doi.org/10.1088/1742-6596/850/1/012003
36. Munandar A et al (2018) Design and development of an IoT-based smart hydroponic system.
In: 2018 International seminar on research of information technology and intelligent systems
(ISRITI). https://doi.org/10.1109/isriti.2018.8864340
37. Pallavi S et al (2017) Remote sensing and controlling of greenhouse agriculture parameters
based on IoT. In: 2017 international conference on big data, IoT and data science (BID). https://
doi.org/10.1109/bid.2017.8336571
38. Perwiratama R et al (2019) Smart hydroponic farming with IoT-based climate and nutrient
manipulation system. In: 2019 international conference of artificial intelligence and information
technology (ICAIIT). https://doi.org/10.1109/icaiit.2019.8834533
39. Ramachandran V et al (2018) An automated irrigation system for smart agriculture using the
internet of things. In: 2018 15th international conference on control, automation, robotics and
vision (ICARCV). https://doi.org/10.1109/icarcv.2018.8581221
40. Deepika S et al (Jan 2020) Enhanced plant monitoring system for hydroponics farming
ecosystem using IOT. GRD J Eng 5(2):12–20
41. Shekhar Y et al (2017) Intelligent IoT-based automated irrigation system. Int J Appl Eng Res
12(8):7306–7320. https://www.ripublication.com/ijaer17/ijaerv12n18_33.pdf
42. Siddiq A et al (2020) ACHPA: A sensor based system for automatic environmental control in
hydroponics. Food Sci Technol 40(3):671–680. https://doi.org/10.1590/fst.13319
43. Valiente FL et al (2018) Internet of things (IOT)-based mobile application for moni-
toring of automated aquaponics system. In: 2018 IEEE 10th international conference on
humanoid, nanotechnology, information technology, communication and control, environment
and management (HNICEM). https://doi.org/10.1109/hnicem.2018.8666439
44. Van L-D et al (2019) PlantTalk: a smartphone-based intelligent hydroponic plant box. Sensors
19(8):1763. https://doi.org/10.3390/s19081763
45. Velmurugan S et al (2020, 2 May) An IOT-based smart irrigation system using soil moisture
and weather prediction. Int J Eng Res Technol (IJERT) Electr–2020 8(7)
46. Yolanda D et al (2016) Implementation of real-time fuzzy logic control for NFT-based hydro-
ponic system on internet of things environment. In: 2016 6th international conference on system
engineering and technology (ICSET). https://doi.org/10.1109/icsengt.2016.7849641
Smart Healthcare System Based on AIoT
Emerging Technologies: A Brief Review
Abstract The Artificial Intelligence of Things (AIoT), one of the fastest-rising topics, has become a constant subject of conversation and in recent years has succeeded in attracting the attention of many people. Many researchers have now focused their research on AIoT; in health care, however, the research is still in the Stone Age. A Smart Healthcare System is nothing new but an innovative and more helpful version of traditional medical facilities; artificial intelligence, IoT, and cloud computing will surely take our healthcare system to new heights. This technology is becoming more important by the day, as it is more convenient and more personalized for doctors and, of course, for patients too. Its role in the healthcare domain is also essential, as it focuses on in-body monitoring services and maintains the healthcare records of patients. The introductory part of this paper discusses the great use humans can make of Smart Healthcare Systems. A brief literature survey presents ideas about various techniques proposed in smart health care. The later section expounds on the various sensors used in smart health care, which are of great benefit to elderly people: sensors such as nap monitors or breath monitors help people take care of their health and reduce the risk of future health issues. The existing challenges in the use of AIoT in smart health care are discussed, along with its applications and real-life examples from the healthcare industry, which are also part of this research paper.
1 Introduction
Can machines think? This question, raised by Alan Turing in 1950 after his Second World War work on breaking German codes, gave a great idea of how AI can improve the living standards of an individual. In the journey of answering this question, the Smart Healthcare System can be considered a milestone in favor of human progress. The innovative idea of smart health care was born out of the concept of the "Smart Planet" proposed by IBM (Armonk, NY, USA) in 2009 [1]. Smart health care is a health service system that uses technology such as wearable devices, IoT, and the mobile internet to dynamically access information [1]. The idea of AIoT came into existence when attempts were made to implement artificial intelligence in combination with the Internet of Things (IoT). AIoT establishes itself in various aspects of everyday life by providing users with ambient living, working, and domestic environments [2], and it is spreading its wings in e-health. In health care, AI along with IoT is used to improve treatment methods, perform predictive analysis of disease, monitor patients in real time, and support patient care and medication. The Smart Healthcare System is a relatively recent concept whose diffusion has been increasing rapidly in recent years [3]. Many nations in the world still have far to go to satisfy the WHO condition of at least one doctor per thousand patients, so this new technology can resolve one of today's biggest problems in a very efficient manner. Smart health applications will act as a catalyst for improving the quality of the healthcare services provided by government or private hospitals and for reducing the burden on health professionals [4]. In the healthcare field, it can promote interaction between all parties, help the parties make informed decisions, ensure that participants get the services they need, and facilitate the rational allocation of resources [1].
Various studies by other prominent authors are discussed in Sect. 2 as part of the literature survey. Section 3 of the paper discusses the various sensors used in different types of devices to achieve a Smart Healthcare System. Section 4 discusses the challenges faced in smart health care that need to be overcome so that no problems arise in the future from this technological development. Some applications of AIoT are discussed in Sect. 5. Finally, Sect. 6 concludes the paper and opens the doors to future scope.
2 Literature Survey
Many research papers, journals, and magazines have already discussed the concept of a Smart Healthcare System using AIoT. Some of the studies by various authors are discussed below.
Tian et al. [1] present the view that smart health care is a boon in today's scenario. Their paper traces the idea of smart health care to the smart planet concept and mainly focuses on applications of the Smart Healthcare System. It also introduces the concepts of smart hospitals and virtual assistance, and then discusses various problems and solutions associated with this innovative new idea for medical facilities.
Priyadarshini et al. [2] explained various technologies for IoT. The work in this paper focuses on devices that gather and share information directly with each other and with the connected cloud. The paper also covers the Internet of Nano-Things (IoNT) and the Internet of Bio-Nano Things (IoBNT). The study, however, does not offer any profound concept for overcoming the challenges it discusses.
García-Magariño et al. [5] emphasize the fact that wearable technology and applications improve the quality of life. Their work shows that new technology embedded in daily life can evaluate the risk of future disease by keeping a regular record of physical activity and symptoms and warning the user to take preventive measures. Mobile applications can also keep track of emotional evaluation.
Shweta et al. [6] explained the role of IoT in big healthcare data. Their research prioritizes the use of IoT and machine learning for disease prediction and diagnosis systems in health care. Around the world, deaths are caused by various diseases, some of which can be treated if the problem is identified early, reducing the chance of death. The work gathers an analysis based on data collection and, based on the survey, discusses diseases such as dementia, brain tumors, and heart disease detected early, as well as an IoT model for predicting lung cancer. The paper also focuses on healthcare applications of ML and IoT and briefly describes their advantages; it does not, however, describe the disadvantages of the devices used for health care.
Chander et al. [7] provide case studies and success stories on machine learning and data mining for cancer prediction. Among the many diseases, cancer remains at the top of the list of the deadliest to cure, and early prediction can be helpful for rehabilitation. To analyze this, the paper applies machine learning, whose techniques can work with various optimization procedures to classify patterns, and data mining, which, in brief, involves finding patterns in data. It discusses various kinds of cancer in both males and females and their detection, and includes a comparative analysis of the work done by various researchers on different types of cancer. Issues and challenges are also discussed, but no substantial concept for overcoming those challenges is presented.
Ganapathy et al. [8] show confidence in drones used in health care to provide various facilities. Their paper mainly focuses on how drones ease the process
Figure 1 shows how data are collected from the sensors on the devices and then processed and stored in the cloud for analytic processing, enabling future prescriptions to draw on past reports, which also helps doctors with diagnosis. The cloud also sends real-time alerts and healthcare reports back to the devices (a sketch of this path follows).
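The device-to-cloud path in Fig. 1 can be made concrete with a few lines of Python. The topic names, alert threshold, and publish stub below are illustrative assumptions rather than the API of any specific platform.

```python
import json
import time

ALERT_LIMIT_BPM = 120  # illustrative tachycardia threshold

def publish(topic: str, payload: dict) -> None:
    # Stand-in for a cloud uplink (e.g., an MQTT or HTTPS client).
    print(topic, json.dumps(payload))

def forward_reading(device_id: str, heart_rate_bpm: int) -> None:
    """Device-to-cloud step from Fig. 1: store every reading for analytics
    and push a real-time alert back toward the care team when needed."""
    reading = {"device": device_id, "bpm": heart_rate_bpm, "ts": time.time()}
    publish("health/records", reading)        # storage / analytics path
    if heart_rate_bpm > ALERT_LIMIT_BPM:
        publish("health/alerts", reading)     # real-time alert path

forward_reading("wrist-01", 128)
```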
Table 1 lists some of the sensors used in health care. AIoT is nothing new but a traditional way of problem-solving enriched with innovative ideas: its applications fit already-invented devices with newly invented, intelligent sensors, giving new wings to problem-solving. Different sensors are designed for different purposes; some can detect changes in the environment and deliver the required results.
4 Challenges
renovation to satisfy current global necessities. Dealing with the cost issues of the
Smart Healthcare System is the biggest challenge.
Limited thinking: Many times, doctors opt for a treatment that does not follow the accepted laws and rules of medical science and come out with a result that makes them declare, "It's unbelievable, it's a miracle!" Machines, however, cannot think outside the box: they treat every patient in exactly the way they were trained to. AI machines cannot be innovative and ingenious, as they cannot beat the power of human intelligence. Without creating machines that can genuinely think like humans, smart health care is barely viable.
No feelings and emotions: For suitable treatment and medical help, a healthy doctor–patient relationship is necessary. A doctor cannot treat a patient well unless he can feel the patient's pain; the doctor has to take on different forms while treating the patient, sometimes hard as stone and sometimes soft as a flower. AI machines can be excellent performers, but they still have no feelings. This results in no emotional attachment with humans and may sometimes be harmful to users if the right care is not taken.
Increased dependency on machines: Human beings need to understand and accept the reality that an excess of anything is bad. If the practice of depending on machines for even small tasks persists, the glorious sun will set on us in the future no matter how high we climb in technology and development. Technology is there to assist us, not to make us lazy, so creating a balance between a healthy lifestyle and a smart future is vital. Technological development is amazing, and it only goes wrong when machines remind you of how powerful you are.
Security and privacy: The ubiquitous connected sensors [15] collect massive amounts of user data, which is transmitted through IoT networks and stored in the cloud. The biometric information contained within this data adds a level of sensitivity. Data leakage may occur even from ciphertext on the AIoT system; in the digital universe, data doubles every two years, and quintillions of bytes of data are generated every day [16]. Data security and privacy are therefore crucial areas of concern in AIoT applications.
Multimodal data: In health care, AIoT consists of a large number of heterogeneous sensors generating a huge stream of data in numerous formats and sizes (multimodal data), which challenges its processing, storage, and transmission [17]. This results in transmission delays at the cloud, edge, and fog nodes of the AIoT system, and managing this data and increasing efficiency are still a challenge.
Table 1 (continued)
Sensors Functions and features
Heart rate monitor • These monitors work by measuring electrical signals from the heart, which are transmitted to a wristwatch or data center
• Many models analyze the data via a computer; having that data allows you to interpret your workout and better understand the benefits of your exercise
• A rough estimate can be obtained by monitoring your pulse the old-fashioned way, feeling it at your wrist or neck, but that can be disruptive to the actual workout
• A digital heart rate monitor can give precise, real-time readings
• Heart rate monitors used in hospitals are wired and contain multiple sensors; consumer heart rate monitors are for everyday use and do not contain wires
• Modern heart rate monitors use either optical or electrical signals to record heart activity
• They are usually 90% accurate in their readings, but sometimes an error of more than 30% can persist for several minutes
Calorie counter • Counting calories can help you lose weight by bringing awareness to what you consume each day
• It can also act as a helping hand that guides you about the eating patterns you may need to alter, keeping you on course to lead a healthy life
Pulse oximeter • An affordable sensor used to measure the oxygen saturation level of hemoglobin in the blood
• The sensor is very easy to use: it is simply positioned on the fingertip, and the photodetector in the sensor measures the intensity of the transmitted light at each wavelength, from which the amount of oxygen in the blood is calculated
• It is a very quick device, giving results within 5 s, and has a bright screen for easy readability. The device can give an incorrect reading when it shows a low-battery sign on the screen [10]
CT/MRI/ultrasound scanner • Helpful for examining images of body parts produced using X-rays, electromagnetic waves, or magnetic fields
• A CT scan is not suggested during pregnancy; it delivers a small dose of radiation, but one that is not harmful. A CT scan takes about 5 min, while an MRI takes approximately 30 min [11]
Breath monitor • The respiratory cycle, i.e., the inhaled and exhaled flow of air in the trachea, can be examined by a breath monitor
• It is placed on the throat to detect the breathing frequency and alert the doctors for a sudden clinical response. It is helpful in cases where death would otherwise occur due to late detection of respiratory deterioration
• Spire (a breathing monitor device) has a battery life of around 7 days and is charged wirelessly using a special pad that the device is simply placed upon, negating the need for extra wires and plugs [12]
Table 1 (continued)
Sensors Functions and features
Nap monitor • It uses accelerometers to track the quantity and quality of sleep
• It consists of small motion detectors and measures the movements a person makes while asleep
Fig. 2 Challenges faced in the Smart Healthcare System: high cost, limited thinking, no feelings and emotions, security and privacy, multimodal data, increased dependency on machines, managing computational power at edge devices, computational scheduling, data monopoly, and energy consumption at data centers
Managing computational power at edge devices: Edge devices have limited computational and storage power, so the real-time deployment of models on these devices is critical. Further, network pruning, quantization, and compression of the models and data are needed, which is still a challenge.
Computational scheduling: A computational scheduling challenge may occur in real-world AIoT systems, as some deep computations may need to be offloaded from the edge devices to the cloud center or fog node. Further, to deal with changing user requirements over time and with unbalanced data, a more robust and dynamic scheduling strategy is needed [18].
Data monopoly: Most of the data collected in AIoT is unlabeled, and labeling it would be expensive and time-consuming [19]. Many advances have been made in unsupervised learning, particularly self-supervised learning, for leveraging multimodal AIoT data and providing solutions to the challenges of rare cases and new classes of healthcare problems. Data monopoly may be another challenge in the AI era, as AIoT companies providing AIoT support for smart health care restrict market entry for new competitors. This challenge faced by new entrants stems from the non-accessibility and non-availability of the vast data collected and protected by already established parties.
The applications of AIoT are uncountable; AIoT has spread its roots into every field. It plays a momentous role in a wide range of healthcare applications, as it allows patients and the elderly to live self-sufficiently [4]. The healthcare industry has been quick to adopt AIoT, and the reason for this drift is the integration of AIoT features into medical devices. This greatly improves the quality of life and the effectiveness of the service, especially for elderly patients with chronic conditions who require regular supervision. Moreover, if minor problems persist or are identified, the IoT system itself can advise patients accordingly [21]. AI has proven to be a boon for the healthcare industry.
Some of the applications of AIoT are presented in this section, and Fig. 3 gives a pictorial idea of the applications of AIoT in the healthcare system. It shows that health data are collected from the person using wearable devices or devices used in hospitals; the cloud gateway then streams the collected data; the unstructured data are processed in a data lake; and a data warehouse stores the processed data. Further processing is done using machine learning (ML) algorithms and models, analysis is performed, and the cloud server consolidates all the information into Electronic Health Records (EHRs), which are further analyzed and inspected through the medical staff, patient, and admin interfaces, showing the transparency of the system (a sketch of this pipeline follows). AI comprises machine learning (ML), neural networks (NNs), and deep learning (DL). ML is further divided into (a) supervised, (b) unsupervised, (c) semi-supervised, and (d) reinforcement learning. Artificial neural networks (ANNs) are trained in deep learning and are further categorized into convolutional neural networks (CNNs) and recurrent neural networks (RNNs). All of the above approaches are used in health care to make systems more robust and, according to patients' needs, to develop smart healthcare instruments managed either remotely or at a specific doctor's or hospital location. AI along with IoT is used in the areas of health care shown in Fig. 4. The following section discusses various applications where AIoT is used as a technology.
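The stages of Fig. 3 map naturally onto a small processing pipeline. The function names, record fields, and the rule standing in for a trained ML model below are illustrative assumptions meant only to make the data flow concrete.

```python
raw_events = [  # readings arriving through the cloud gateway
    {"patient": "p01", "sensor": "hr", "value": 118},
    {"patient": "p01", "sensor": "spo2", "value": 93},
]

def data_lake(events: list) -> list:
    """Unstructured landing zone: keep everything as it arrived."""
    return list(events)

def data_warehouse(events: list) -> dict:
    """Structure the lake's events into per-patient vital records."""
    records: dict = {}
    for e in events:
        records.setdefault(e["patient"], {})[e["sensor"]] = e["value"]
    return records

def ml_screen(vitals: dict) -> bool:
    """Stand-in for a trained model scoring the vitals."""
    return vitals.get("hr", 0) > 110 or vitals.get("spo2", 100) < 94

def build_ehr(warehouse: dict) -> dict:
    """Attach the model's flag so staff, patient, and admin views share it."""
    return {p: {"vitals": v, "flagged": ml_screen(v)} for p, v in warehouse.items()}

print(build_ehr(data_warehouse(data_lake(raw_events))))
```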
Wearable technology: Wearable technology has produced several solutions that improve the health and quality of people's lives. Wearable belts are of great use: they help monitor physical activity by counting steps during the day and, by tracking the heart's activity, ensure that the wearer's heart rate stays within a healthy, non-risky range [5].
Blood pressure monitoring system: An irregular fall or sudden rise in blood pressure is one of the biggest problems people face these days. Many popular blood pressure devices are safe and simple to use [22]. To collect a patient's BP in real time, an electronic BP monitor is attached to an IoT sensor. This greatly benefits both doctors and patients, as it provides quick service in a medical emergency.
Wheelchair management: AIoT has produced a very innovative solution, the Smart Wheelchair [23]. Smart wheelchairs are of great help to elderly and incapacitated individuals who cannot move their chairs physically; instead, they are given the ability to move them through hand gestures.
The AIoT holds the promise of improving people's quality of life. If technology is at the next level today, it is only because of the remarkable work done in the area of AI along with IoT, and we now aim to achieve much more, as technology is considered a great growling engine of change. This paper surveys how AIoT, when implemented in traditional medical facilities, became a major benefactor of our medical delivery system and gave rise to the idea of a Smart Healthcare System. To provide health care to everybody, at all times and anywhere, smart health care plays an active role while keeping quality high, and it is receiving increasing attention in society. The main aim of this review paper is to give an idea of the Smart Healthcare System, the challenges faced, the solutions yet to be discovered, and much more. In short, it can be considered a stepping stone for researchers who are interested in the field of AIoT and, particularly, in a smart and intelligent way of dealing with the medical health issues of the generations.
References
1. Tian S, Yang W, Le Grange JM, Wang P, Huang W, Ye Z (2019) Smart healthcare: making medical care more intelligent. Glob Health J 3(3):62–65
2. Priyadarshini S, Sahebzathi S, Narmatha M, Delshi Howsalvya Devi R (2016, July) The internet of things in healthcare: a survey. GRD J Glob Res Dev J Eng, Int Conf Innovations Eng Technol (ICIET), pp 559–564. e-ISSN: 2455-5703
3. Marr B (2018, May) How much data do we create every day? The mind-blowing stats everyone
should read. Forbes
4. Sanghavi J (2019) Review of smart healthcare systems and applications for smart cities. In:
ICCCE 2019. Springer Singapore
5. Magarino I-G, Sarkar D, Lacuesta R. Wearable technology and mobile applications for
healthcare. 2019:6247094. published 21 May (2019)
6. Agarwal S, Prabha C (2021) Diseases prediction and diagnosis system for healthcare using IoT
and machine learning, smart healthcare monitoring using IoT with 5G: challenges, directions,
and future predictions. CRC Press, pp 197–228
7. Prabha C, Sharma G (2021, Dec) Applications of machine learning in cancer prediction and
prognosis, cancer prediction for industrial IoT 4.0: a machine learning perspective. CRC Press
(Taylor & Francis). eBook ISBN 9781003185604
8. Ganapathy K (2022) Drones in healthcare, issue 47. https://www.asianhhm.com/healthcare-
management/drones-in-healthcare
9. Kaur G, Gupta M, Kumar R (2021) IOT-based smart healthcare monitoring system: a systematic
review. 25(1):3721–3728
10. Kapoor A. Pulse oximeter buying guide: Price, how to buy the right oximeter types and more.
https://timesofindia.indiatimes.com/most-searched-products/health-and-fitness/buying-guide/
pulse-oximeter-buying-guide-features-price-how-to-buy-the-right-one-more/articleshow/799
92362.cms. Accessed 13 Jan 2022
11. Fletcher J (2019, Oct) What is the difference between CT scans and MRI scans? https://www.
medicalnewstoday.com/articles/326839
12. McIntosh J (2014) https://www.medicalnewstoday.com/articles/278847, June 26
13. Zhang K, Ni J, Yang K, Liang X, Ren J, Shen X (2017) Security and privacy in smart city
applications: challenges and solutions. IEEE Commun Mag 122–129
14. World Bank. Poverty and inequality platform. https://data.worldbank.org/indicator/SI.POV.
NAHC
15. Yang Q, Liu Y, Chen T, Tong Y (2019) Federated machine learning: concept and applications.
ACM Trans Intell Syst Technol 10(2):1–19
16. Nazir S, Ali Y, Ullah N, García-Magariño I (2019) Internet of things for healthcare using effects
of mobile computing: a systematic literature review. Wirel Commun Mobile Comput. https://
doi.org/10.1155/2019/5931315
17. Duan L-Y, Liu J, Yang W, Huang T, Gao W (2020) Video coding for machines: a paradigm of
collaborative compression and intelligent analytics, arXiv preprint arXiv:2001.03569
18. Jing L, Tian Y (2020) Self-supervised visual feature learning with deep neural networks: a
survey. IEEE Trans Pattern Anal Mach Intell
19. He K, Fan H, Wu Y, Xie S, Girshick R (2020) Momentum contrast for unsupervised visual
representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition (CVPR), pp 9729–9738
20. Yuan H, Bi J, Zhou M, Liu Q, Ammari AC (2021) Biobjective task scheduling for distributed
green data centers. IEEE Trans Autom Sci Eng 18(2):731–742. Art no 8951255
21. Zhang J, Tao D (2020, Nov) Empowering things with intelligence: a survey of the progress,
challenges, and opportunities in artificial intelligence of things. IEEE Internet Things J 20(10)
Smart Healthcare System Based on AIoT Emerging Technologies: … 311
22. Islam SMR, Kwak D, Kabir H, Hossain M, Kwak K-S (2015) The internet of things for health
care: a comprehensive survey. IEEE Access 3:678–708
23. Yash KS, Sharma T, Tiwari P, Venugopal P, Ravikumar CV (2020, June) A smart wheelchair
for health care systems. IJEET 11(4):22–29
24. Huh J-H, Choi J-H, Seo K (2021) Smart trash bin model design and future for smart city.
11(11):4810. https://doi.org/10.3390/app11114810
Intelligent Communication for Internet of Things (IoRT)
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_24
1 Introduction
With the use of Internet of Things (IoT) technology, consumers may make their
current equipment smarter and connect it to the Internet so that it can communicate
with other devices. The Internet of Things is growing immensely, and a large
number of devices are being connected to IoT, making it a $300 billion industry.
IoT is the idea of connecting any device to the Internet and to other connected
devices (as long as it has an on/off switch). IoT is a vast network of connected
humans and objects that all gather and share information about how they are used
and their environment. Incorporation with the Internet implies that each device
uses an IP address as a distinctive identifier. The idea is to manage objects through
the Internet, and Arduino is an ideal platform for such applications [1]. Technologies
like sensors and actuators are widely used to integrate people, processes, and
devices in the Internet of Things.
technical analytics, this total integration of IoT with humans allows for real-time
decision-making. Robotics is one area where the technology of Internet of Things is
gaining traction. Robotics is a cutting-edge, rapidly expanding technology that has
radically altered various elements of human society in recent decades.
Robots generally perform tasks that are tedious and repetitive in nature, as well as
dangerous and critical tasks that cannot be undertaken by a human being. Initially,
the robots used in these applications were single machines limited by their own
hardware and computational capabilities. To address these concerns, robots were
first connected in a communication network [2], over either a wired or wireless
link, resulting in the creation of Networked Robotic Systems. Nowadays, the concept
of "Cloud Robotics" is evolving: efficient robotic systems that depend on a cloud
computing infrastructure to access a large volume of computing power and data to
support their operations. Contrary to networked robots, no single independent
system is used for all sensing, processing, and memory. Many open issues and
challenges remain. Maintaining a balance between the real-time requirements of
diverse situations and performance precision is a difficult task [3], even for robot
memory, because the notion of cloud robotics is founded on real-time requirements.
Because cloud storage refers to data that is stored remotely, there is a requirement
for better cloud security. Other challenges that tend to stand in the way of the
efficient performance of cloud robots are rapidity, remoteness, network reliability,
etc.
The Internet of Robotic Things is a concept in which intelligent devices can track
events, fuse sensor information from a number of sources, use local and distributed
intelligence to determine the appropriate actions, and then act to modify or control
objects in the real world, sometimes while physically moving through it. IoRT
involves many technologies, such as sensors and actuators, communication
technologies, processing, data fusion, modelling and dynamic mapping, virtual
and augmented reality, voice recognition, voice control, decentralized cloud [4],
adaptation, machine learning, end-to-end operation, internet technologies, and a
safety and security framework [5]. Robotic devices act intelligently in the sense that
they have monitoring (and sensing) capabilities while also being able to obtain
sensor data from other sources that is fused for the device's "acting" purpose. The
machine can also use local and distributed "intelligence", which is another
"intelligent" feature. In other words, it may analyze data from the events it watches
(which, in many cases, entails the use of edge computing or fog computing) and has
access to analyzed data.
IoRT makes use of the IP protocol and its IPv6 version to achieve the benefits of
current communication and cloud-based interoperable technologies. By connecting
external resources to robots through the Internet, IoRT aims to increase Internet
usage [6]. IoRT delivers sophisticated robotic capabilities, allowing
multidisciplinary solutions to emerge for a variety of fields. The Internet of Robotic
Things is envisioned as being built on top of the cloud robotics paradigm, utilizing
characteristics of cloud computing such as cloud storage, virtualization technology,
and the three service models (software, platform, and infrastructure as a service),
while taking advantage of the Internet of Things (IoT) and its enabling technologies
to handle enormous amounts of data. To meet the goals of networked robotics,
flexibility in application design and implementation is suggested, with distributed
computing resources used as a core utility. Other technologies, such as multi-radio
access to link smart devices, artificial intelligence to provide optimized solutions
for difficult challenges, and intelligent systems to provide superior service, support
IoRT solutions in addition to IoT capabilities. The Internet of Robotic Things (IoRT)
is positioned at the pinnacle of cloud robotics, combining IoT technology with the
autonomous and self-directed behavior of connected robotic things to foster refined
and creative solutions that utilize distributed resources.
2 Architecture
The physical layer is the bottom layer of the IoRT architecture. It comprises the
hardware components, such as vehicles, robots, mobile phones [7], healthcare
equipment, home appliances, sensors, and actuators. Robots work as perceptive
agents, collaborating with each other to perform specific tasks and achieve desired
objectives in a distributed fashion. Examples include a healthcare assistant robot
that helps a person regain the ability to walk, and a "TUG" robot that is intended to
move independently around a hospital to deliver supplies like medication and clean
linens. The sensors installed in them are used for the perception of data from the
external environment, sensing the movement of objects and the actions happening
around them, while actuators perform the required actions such as turning smart
lights or air conditioning on and off. Sensors and actuators can both be incorporated
into a single system to improve and optimize performance and accomplish a shared
objective through distributed operations.
The network layer is the second layer, arriving after the physical layer. It provides
for the connectivity of the robotic systems, including cellular connectivity such as
3G and LTE/4G. Internet of Things (IoT) development will speed up thanks to 5G
technology [8]. To provide constant connectivity, some short-range technologies
are employed between the robotic objects, including WiFi, Bluetooth Low Energy
(BLE) [7], 6LoWPAN, Broadband Global Area Network (BGAN) [9], and Near-
Field Communication. Common long-range protocols are Sigfox, LoRaWAN, and
NB-IoT; NB-IoT operates on a licensed spectrum with less interference than Sigfox
and LoRaWAN [5], which operate on unlicensed frequency bands. Zigbee and
6LoWPAN are based on IEEE 802.15.4. The Routing Protocol for Low Power and
Lossy Networks (RPL) and 6LoWPAN are popular IoT protocols used at the
network layer [10]. For efficient information transmission among robotic network
infrastructure located at a greater distance [10], LoRa has been implemented [1].
The IoRT architecture's top layer, which handles end-user actions, is entirely
dependent on software implementations. Standard and user programs for
monitoring, processing, and managing environmental elements and agents are
necessary to achieve the objectives of the integrated Internet of Robotic Things.
Sensors, actuators, and robots are installed and used in the smart environment at
the service and application layer. A variety of settings, such as commercial
organizations, R&D facilities, and data centers, can use cloud computing. IoRT can
be thought of as a field with endless potential. Modern robotics must include
machine learning techniques because they may be applied to tasks like mapping,
localization (knowing where a robot is), and learning the environment.
3 Application Domains
3.1 Manufacturing
With the advancement of the Fourth Industrial Revolution, manufacturing
technology has developed in response to new inventive technologies, creating
high-quality goods and services. It has resulted in smart manufacturing, which
incorporates adaptability, monitoring, and change. The Fourth Industrial
Revolution's digital technology breakthroughs must be matched by improvements
in production methods and raw materials. Task scheduling in manufacturing has
become more adaptable and expandable with the arrival of cloud robotics. Robotic
fleets are used for planning, delivery, and the moving of various items like
equipment and cartons. Robots communicate with the cloud to obtain information
about the industrial architecture and other robots already installed in the system.
Robotics has thus adapted to manufacturing uncertainties.
3.2 Agriculture
Robots can help prevent untimely harvests. There are numerous factors that affect
agricultural productivity, including the weather, poorly trained staff, and inadequate
farm management. Through the application of internet of robotic things technologies,
farmers may raise high-value crops. Self-guiding robotic drones use ground control
systems, GPS, image processing, and infrared cameras. Farmers employ drones as
a service (DaaS) to monitor fields, forecast crop yields, and identify insect infes-
tations. These robots come with a variety of sensors, including ones that measure
temperature, humidity, and crop hazards.
3.3 Health Care
IoRT has several uses and offers health, societal, and economic benefits, especially
for patients who require specialized treatment, such as those with mental disabilities
or patients at higher risk of stroke. The future of edge computing includes
applications for virtual reality, wireless health monitoring [14], and robotics, where
a quick response to sensor input is required. Robotics combined with sensors and
Internet of Things devices offers various benefits [4], providing real-time health
information, identifying patient problems, lowering the likelihood of an incorrect
diagnosis, adjusting prescription dosages, etc. [15].
3.4 Education
Adaptable and intelligent tactics are applied by robots in the education domain in
order to build and maintain social interactions with humans. Robots also provide
support services such as homework help and learning assistance. Electrodermal
activity (EDA) is a shift in skin conductivity [16]. Furthermore, because minors are
unable to respond to these stimuli in the same way that adults do, the EDA feedback
in children may differ significantly from the average adult response.
3.5 Surveillance
To safeguard a specific region, IoRT systems are created and deployed to provide
rapid and reliable information. Monitoring of sites and individuals is essential to
reduce security risks in sensitive regions, military areas, public places, and ordinary
houses. In order to reduce the limitation of blind areas, a number of CCTV cameras
can be installed outdoors as well as indoors to cover most of the area. A versatile
field monitoring robot has been used to identify mines and hazardous chemicals. A
NodeMCU WiFi module is used to connect controllers and develop robots that can
navigate and collect data on any terrain. The data assembled by the robot's sensors
is sent to cloud servers [17]. An embedded web server makes it simple to monitor
and control any device that is located in a remote location [18].
3.6 Disaster Management
Every year, a number of catastrophes like typhoons, earthquakes, and tsunamis
occur. When a catastrophe of this size occurs, time is extremely valuable. The
immediate priority is to save as many lives as possible, minimize casualties, and
quickly restore vital services. After receiving deep learning training, robots can be
quite helpful in these kinds of circumstances. The first step in achieving this is to
utilize robots to collect data about the immediate area. An external network builds
on these inputs to produce an AI model, which the internal system then evaluates.
For extra performance testing, the prototype was also transferred from the cloud to
the local computer. Once the model reaches a certain level of reliability, it is
eventually implemented into the robots for the following stage of the learning
process (Fig. 1).
4 Literature Survey
CRAFT is a flexible block cipher designed to defend against differential fault
analysis attacks. The 64-bit plaintext, 128-bit key, and 64-bit public tweak make up
the algorithm's structure, which is based on involutory building blocks. The round
function implements S-box, mix-column, nibble-position-based permutation,
constant addition, and tweakey addition operations [19].
The block cipher CHAM was introduced in 2017 by Korean researchers. The
algorithm's generalized four-branch Feistel structure is designed for IoT devices
with limited resources, and its key schedule is quite simple [19].
Lilliput is a lightweight block cipher based on an Extended Generalized Feistel
Network (EGFN) that Mumthaz and Geethu optimized. The suggested design uses
the PRESENT S-box [7] and a key schedule corresponding to the DES key schedule.
LILLIPUT uses an 80-bit key, a 64-bit block size, and 30 rounds, in which a
nibble-level round function operates [20].
By enhancing PRESENT, Subhadeep Banik et al. proposed GIFT, a small, fast,
energy-efficient, and more secure SPN block cipher. Bit permutation is employed
for diffusion, since the S-box of PRESENT is expensive. By jointly designing the
permutation with the Difference Distribution Table (DDT) and Linear
Approximation Table (LAT) of the S-box, GIFT improves on the PRESENT cipher.
GIFT-64, with 28 rounds, and GIFT-128, with 40 rounds, are the two proposed
variants; the key size for both is 128 bits. To improve resistance against linear and
differential cryptanalysis, linear hulls, and the clustering effect, GIFT uses a
smaller, less expensive S-box than PRESENT [21].
The proposed SIT (Secure IoT) cipher is a compact 64-bit symmetric-key block
cipher with five rounds and a key size of 64 bits. SIT is a hybrid design that
combines an SPN with a Feistel structure. The method incorporates a few logical
operations as well as some swapping and substitution. Energy efficiency is
increased by using five distinct keys for the five rounds of encryption. The SIT
algorithm uses a Feistel network of substitution-diffusion functions to provide
confusion and diffusion. The key expansion block makes use of a 64-bit cipher key
supplied by the user, and the key generation F-function is based on a modified
KHAZAD [22].
Utilizing several guidelines and suggestions from the literature, Sufyan Salim
Mahmood AlDabbagh [23] proposed a new lightweight Feistel block cipher called
Design 32-bit Lightweight Block Cipher Algorithm (DLBCA). With 15 rounds and
an 80-bit key size, DLBCA is built on a 32-bit block size. DLBCA performs the
following operations across its four levels: S-box, bit permutation, exclusive-OR
rotation, and key operations. It offers a strong defense against boomerang and
differential attacks.
SKINNY and MANTIS were presented by Christof Beierle et al. in 2016. SPN is
used by the SKINNY family of tweakable block ciphers. It uses a small S-box S4,
obtained by deleting the final NOT gate, that is extremely similar to the S-box of
PICCOLO. Block, key, and tweak sizes are all configurable in SKINNY, and data is
loaded in rows as in AES. A new sparse diffusion layer and a new, lightweight key
schedule were both introduced by SKINNY [24].
In order to provide a greater search space, SKINNY takes advantage of a key size
that is larger than the block size, together with fewer rounds. Leakage-resilient
implementations are enabled by SKINNY's tweakable capabilities.
MANTIS is a low-latency tweakable block cipher with a 64-bit block size, a
128-bit key size, and a 64-bit tweak. By employing its S-box and linear layer for
quick diffusion, it improves upon MIDORI. MANTIS employs the row-wise added
round constants of PRINCE. In MANTIS, tweak scheduling is used to guarantee a
large number of active S-boxes and boost security against related-tweakey attacks
by increasing the number of rounds. Because MANTIS largely mirrors its MIDORI
counterpart, a dedicated security analysis has not been performed extensively [24].
A series of lightweight symmetric-key block ciphers called SPARX and LAX,
based on the ARX (modular Addition/bitwise Rotation/XOR) design, was proposed
by Daniel Dinu and colleagues in 2016. To add nonlinearity and enough diffusion,
the design uses sparse linear layers and a large ARX-based S-box termed the
arx-box. Through the avoidance of table lookups, ARX lessens the impact of
side-channel attacks. By reducing the number of operations needed, this design
style also enables fast software implementations. The cipher SPARX was created
using the Long Trail Strategy (LTS), a dual of the Wide Trail Strategy (WTS).
SPARX-64/128 has eight steps with three rounds each, SPARX-128/128 has eight
steps with four rounds each, and SPARX-128/256 uses ten steps with four rounds
each [26].
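To make the ARX idea concrete, the following Python sketch shows one generic 16-bit ARX round in the style of Speck (it is not the exact SPARX arx-box; the rotation amounts 7 and 2 follow Speck-32): only modular addition, bitwise rotation, and XOR are used, so no lookup tables are required.

MASK16 = 0xFFFF

def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16

def rotr16(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK16

def arx_round(left, right, round_key):
    # Addition, Rotation, XOR: the three operations that replace an S-box table
    left = (rotr16(left, 7) + right) & MASK16
    left ^= round_key
    right = rotl16(right, 2) ^ left
    return left, right

Because the round is built only from constant-time word operations, there is no table lookup whose address could leak through a cache side channel.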
5 Research Challenges
5.3 Privacy
Who should have access to the data, and how it should be used, are two more
potential ethical issues. Things become more serious, particularly for sensitive data
like financial or medical information. According to the existing cultural agreement
on privacy, no data collector should be given full, unrestricted access to or use of
any data without the necessary consent or authorization [27]. This problem needs
to be appropriately addressed because the majority of IoRT programs have a high
capability to collect data from both their working environment and interacting
clients.
In robots, security and trust are the main concerns. Particularly in the case of IoRT,
where cloud involvement is essential, we confront two significant security
challenges. First and foremost, the IoRT-VM environment needs to be reliable. For
instance, in military applications, IoRT-enabled robotic objects must be able to
distinguish, among a variety of IoRT-VM infrastructures, those that may be trusted,
allowing them to connect to trusted infrastructure while avoiding known harmful
IoRT-VM infrastructure. Three strategies can be used to address this issue: building
trust, measuring trust, and reputation-based trust [16]. Second, the owners or
controllers of future robotic systems must be able to trust them to start computing
tasks on IoRT-based clouds, and the cloud must be set up to allow for owner or
controller verification. Here, we must make sure that these outsourced tasks are not
being carried out by malicious code. Secret data may be stored permanently on
IoT-enabled cloud servers while the logical shadow of the data is replicated to
private cloud servers. As a result, strict approaches are required to safeguard the
integrity, trust, and secrecy of IoRT data.
6 Proposed Methodology
contrast, the pLayer exchanges or permutes each input bit individually in the order
depicted in Table 2, where i denotes the input bit position and P(i) denotes the
output position. The key register is updated by rotating it left by 61 bit positions,
passing the leftmost four bits through the sBoxLayer, and XORing the current
round counter into bits 19 to 15.
This process is repeated for 31 rounds, and in the final step (round number 32)
the data is only exclusive-ORed with the round key, without passing through the
sBoxLayer and the pLayer. During each round, both the processed data and the key
register are updated. The data obtained after round 32 is the encrypted data. Since
decryption uses the same key, the decryptor must first run the key schedule through
all 32 rounds of updating before beginning the decryption process (Tables 3 and 4).
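The round structure just described can be made concrete with a short Python sketch of PRESENT-80 (64-bit block, 80-bit key), written from the published cipher specification; it is a minimal illustration rather than the authors' benchmarked code.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def present80_encrypt(plaintext, key):
    # plaintext: 64-bit integer, key: 80-bit integer
    state, k = plaintext, key
    for round_counter in range(1, 32):
        # addRoundKey: XOR the leftmost 64 bits of the key register
        state ^= k >> 16
        # sBoxLayer: apply the 4-bit S-box to all 16 nibbles
        state = sum(SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
                    for i in range(16))
        # pLayer: bit i moves to position (16 * i) mod 63; bit 63 stays
        state = sum(((state >> i) & 1) << (63 if i == 63 else (16 * i) % 63)
                    for i in range(64))
        # key schedule: rotate left by 61, S-box the top nibble,
        # then XOR the round counter into bits 19..15
        k = ((k << 61) | (k >> 19)) & ((1 << 80) - 1)
        k = (SBOX[k >> 76] << 76) | (k & ((1 << 76) - 1))
        k ^= round_counter << 15
    # round 32: final key addition only, as described above
    return state ^ (k >> 16)

Decryption first advances the key schedule through all 32 round keys, as noted above, and then applies the inverse layers in reverse order.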
Table 5 System configuration
RAM: 4 GB
Processor: i5
HDD: 512 GB
Python: 3.9.7
6.2 Results
See Table 5.
Algorithm name: PRESENT
Input word length: 8 bits
Average execution time for encryption: 0.00100088119506835941 ms
Average execution time for decryption: 0.002008676528930664 ms
See Fig. 3.
Fig. 3 Results
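One way such averages can be reproduced is sketched below; the loop count and test values are illustrative, and present80_encrypt is the sketch given earlier, not the authors' measured code.

import timeit

# Time many encryptions of an arbitrary test block and report the mean in ms
runs = 10000
total = timeit.timeit(lambda: present80_encrypt(0x0123456789ABCDEF, 0x0),
                      number=runs)
print(f"average encryption time: {total / runs * 1000:.6f} ms")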
7 Conclusion
IoT and robotics are two terms, each of which encompasses a wide range of
technologies and ideas. An emerging field called the Internet of Robotic Things
aims to incorporate robotic technologies into IoT contexts.
and communicate with each other over the Internet of Robotic Things in order to
carry out complex tasks. IoRT principles, technologies, applications, and current
obstacles have been discussed in this paper. Energy-constrained devices and sensors
will constantly communicate with one another, and the security of these communi-
cations cannot be compromised. In this paper, a lightweight security method called
PRESENT is introduced for this purpose. PRESENT is one of the most lightweight
encryption techniques. As a result, many researchers have been fascinated by the
creation of lightweight block ciphers, notably over the past seven years.
References
1. Simoens P, Dragone M, Saffiotti A (2018) The internet of robotic things: a review of concept,
added value and applications. Int J Adv Robot Syst 15(1):1–11. https://doi.org/10.1177/172
9881418759424
2. Yfantis EA, Fayed A (2014) Authentication and secure robot communication. Int J Adv Robot
Syst 11(1):1–6. https://doi.org/10.5772/57433
3. Schwan KS, Bihari TE, Taulbee GM, Weide BW (1987) High-performance operating system
primitives for robotics and real-time control systems. ACM Trans Comput Syst 5(3):189–231.
https://doi.org/10.1145/24068.24070
4. Dixon C, Frew EW (2009) Maintaining optimal communication chains in robotic sensor
networks using mobility control. Mobility Netw Appl 14:281–291. https://doi.org/10.1007/
s11036-008-0102-0
5. Sabry SS, Qarabash NA, Obaid HS (2019) The road to the internet of things: a survey. In:
2019 9th Annual information technology, electromechanical engineering and microelectronics
conference (IEMECON). IEEE, pp 290–296. https://doi.org/10.1109/IEMECONX.2019.887
6989
6. Yoshino D, Watanobe Y, Naruse K (2021) A highly reliable communication system for internet
of robotic things and implementation in RT-middleware with AMQP communication interfaces.
IEEE Access 9:167229–167241. https://doi.org/10.1109/ACCESS.2021.3136855
7. Liao B, Ali Y, Nazir S, He L, Khan HU (2020) Security analysis of IoT devices by using mobile
computing: a systematic literature review. IEEE Access 8:120331–120350. https://doi.org/10.
1109/ACCESS.2020.3006358
8. Tsai W-C, Tsai T-H, Xiao G-H, Wang T-J, Lian Y-R, Huang S-H (2020) An automatic key-
update mechanism for M2M communication and IoT security enhancement. In: 2020 IEEE
International conference on smart internet of things (SmartIoT). IEEE, pp 354–355. https://
doi.org/10.1109/SmartIoT49966.2020.00067
9. Ray PP (2016) Internet of robotic things: concept, technologies, and challenges. IEEE Access
4:9489–9500. https://doi.org/10.1109/ACCESS.2017.2647747
10. Gulzar M, Abbas G (2019) Internet of things security: a survey and taxonomy. In: 2019 Inter-
national conference on engineering and emerging technologies (ICEET). IEEE, pp 1–6. https://
doi.org/10.1109/CEET1.2019.8711834
11. Villa D, Song X, Heim M, Li L (2021) Internet of robotic things: current technologies, appli-
cations, challenges and future directions. arXiv:2101.06256v1, pp 1–8. Retrieved from https://
arxiv.org/pdf/2101.06256.pdf
12. Park J-H, Baeg S-H, Ryu H-S, Baeg M-H (2008) An intelligent navigation method for service
robots in the smart environment. IFAC Proc Volumes 41(2):1691–1696
13. Kumar N, Takács M, Vámossy Z (2017) Robot navigation in unknown environment using
fuzzy logic. In: 2017 IEEE 15th International symposium on applied machine intelligence and
informatics (SAMI). IEEE, pp 279–284. https://doi.org/10.1109/SAMI.2017.7880317
14. Zhu Y, Zhang-Shen R, Rangarajan S, Rexford J (2008) Cabernet: connectivity architecture for
better network services. In: CoNEXT ’08: proceedings of the 2008 ACM CoNEXT conference,
pp 1–6. https://doi.org/10.1145/1544012.1544076
15. Alsulaimawi Z (2020) A privacy filter framework for internet of robotic things applications.
In: 2020 IEEE security and privacy workshops (SPW). IEEE, pp 262–267. https://doi.org/10.
1109/SPW50608.2020.00059
16. Khalid S (2021) Internet of robotic things: a review. J Appl Sci Technol. Trends 2(3):78–90.
https://doi.org/10.38094/jastt203104
17. Romeo L, Petitti A, Marani R, Milella A (2020) Internet of robotic things in smart domains:
applications and challenges. Sensors (Switzerland) 20(12):1–23. https://doi.org/10.3390/s20
123355
18. Rajkumar K, Kumar CS, Yuvashree C, Murugan S (2019) Portable surveillance robot using
IoT. Int Res J Eng Technol 6(3):94–97
19. Hadipour H, Sadeghi S, Niknam MM, Song L, Bagheri N (2019) Comprehensive security anal-
ysis of CRAFT. IACR Trans Symmetric Cryptol 2019(4):290–317. https://doi.org/10.13154/
tosc.v2019.i4.290-317
20. Adomnicai A, Berger TP, Clavier C, Francq J, Huynh P, Lallemand V, Le Gouguec K, Minier
M, Reynaud L, Thomas G (2019) Lilliput-AE: a new lightweight tweakable block cipher for
authenticated encryption with associated data. Submitted to NIST Lightweight Project, pp 1–58
21. Rani DJ, Roslin SE (2021) Optimized implementation of gift cipher. Wireless Pers Commun
119(3):2185–2195. https://doi.org/10.1007/s11277-021-08325-2
22. Mishra Z, Mishra S, Acharya B (2021) High throughput novel architecture of SIT cipher
for IoT application. In: Nanoelectronics, circuits and communication systems. Lecture notes
in electrical engineering, vol 692. Springer, pp 267–276. https://doi.org/10.1007/978-981-15-
7486-3_26
23. AlDabbagh SSM (2017) Design 32-bit lightweight block cipher algorithm (DLBCA). Int J
Comput Appl 166(8):17–20. https://doi.org/10.5120/ijca2017914088
24. Beierle C et al (2016) The SKINNY family of block ciphers and its low-latency variant
MANTIS. In: Advances in cryptology – CRYPTO 2016. CRYPTO 2016. Lecture notes in
computer science, vol 9815. Springer, Berlin, Heidelberg, pp 123–153. https://doi.org/10.1007/
978-3-662-53008-5_5
25. Patil J, Bansod GV, Kant KS (2017) LiCi: a new ultra-lightweight block cipher. In: 2017
International conference on emerging trends & innovation in ICT (ICEI). https://doi.org/10.
1109/ETIICT.2017.7977007
26. Seok B, Lee C (2019) Fast implementations of ARX-based lightweight block ciphers (SPARX,
CHAM) on 32-bit processor. Int J Distrib Sens Networks 15(9):1–9. https://doi.org/10.1177/
1550147719874180
27. Sengupta J, Ruj S, Bit SD (2019) End to end secure anonymous communication for secure
directed diffusion in IoT. In: ICDCN ’19: Proceedings of the 20th international conference on
distributed computing and networking, pp 445–450. https://doi.org/10.1145/3288599.3295577
28. Kumari AS, Mandi MV (2019) Implementation of present cipher on FPGA for IoT applications.
IJERT 8(8):26–29
29. Santa FM, Jacinto E, Montiel H (2019) PRESENT cipher implemented on an ARM-based
system on chip. In: Data mining and big data. DMBD 2019. Communications in computer
and information science, vol 1071. Springer, pp 300–306. https://doi.org/10.1007/978-981-32-
9563-6_31
30. Jacinto EG, Montiel HA, Martínez FS (2017) Implementation of the cryptographic algorithm
‘present’ in different microcontroller type embedded software platforms. Int J Appl Eng Res
12(19):8092–8096
Energy-Efficient Priority-Based Routing Model for Smart Health Care
Abstract The Internet of Things is a network that connects a huge number of
devices to collect data over the internet. The Internet of Things has a wide variety
of applications, including smart health care. Using IoT, the patient can monitor
medical data at any time and from any location. Smart health care is one of the key
areas where IoT infrastructures and solutions are widely used to facilitate the best
possible patient surveillance, accurate diagnosis, and timely treatment of patients
with existing diseases. Smart healthcare systems, on the other hand, face various
challenges, including energy consumption, data integrity, data privacy, and data
transmission speed. Keeping these challenges in view, we have suggested a system
model based on priority routing. The proposed model uses RPL along with a
priority approach for prioritizing healthcare data. The healthcare data is classified
into three classes: critical, medium, and normal, to deliver the most critical data to
the healthcare centers first. As a result, patients are not limited to a certain selection
of health centers and specialists at a particular instance. RPL works for low-power
and lossy networks, and the QoS mechanism provides a certain bandwidth for the
data to reach the destination. The congestion overhead of the channels is controlled
by time division multiple access. The TDMA time slot is used for synchronization
between the source and destination to reduce energy use. The data in a TDMA slot
is used to check the traffic. If the traffic is high, normal data is sent; otherwise,
priority data is sent. Our proposed model promises the delivery of healthcare data
on time by using RPL with TDMA. The DODAG topology ensures that healthcare
data travels along the shortest possible path to improve performance metrics.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_25
1 Introduction
The proliferation of the Internet of Things and related computing has resulted in
the development of smart systems such as smart cities, smart transportation
systems, smart energy, and smart homes. The IoT interconnects all objects (living
and non-living) to form a physical network in which all processes, such as sensing,
processing, and communication, are automatically controlled and supervised
without human intervention [1]. The healthcare business has developed immensely
by adopting some level of automation, since there is so much room for
improvement. By bringing together medical devices, healthcare professionals, and
patients, the present healthcare business has progressed beyond hospitals [2]. By
transforming traditional healthcare systems into current smart healthcare systems,
the Internet of Things has brought great change to the field of health care [3]. The
smart healthcare system (SHS) was created to deal directly with patient
health-related information [4]. SHS gives in-depth insights on illness symptoms
and minimizes the need for frequent health checks, which can aid the elderly,
diabetics, and others with secure management. The process of selecting a pathway
for data to travel from its source to its destination is known as routing. Routing is
done by a router, a dedicated piece of equipment. In the OSI paradigm, a router
operates at the network layer, whereas in the TCP/IP model, it operates at the
internet layer.
Smart health care uses a new generation of technologies like the Internet of
Things, big data, and cloud computing to remodel the usual medical system in an
all-around way. Nowadays, health care is becoming more efficient, convenient, and
personalized. With the aim of introducing the concept of smart health care, we first
review the literature concerning prioritized routing protocols for better quality of
service. The prevailing problems in smart healthcare systems are addressed to
formulate solutions. Lastly, we propose a priority-based model to reduce delay
during the transmission of data. Routing priority is the priority assigned to data for
accessing a destination when capacity becomes available [5]. The priority is
allocated to two or more entities or classes of data that are waiting for a routing
destination to become available [6]. Priority-based routing breaks a tie by deciding
which entity gets access to the destination first when it becomes available. The
router forwards packets using information from the packet header and its
forwarding table, and the packets are routed using routing techniques. A routing
protocol is essentially the code that determines the best path for the packets to
travel.
Routing protocols use metrics to determine the best path for packet delivery from
source to destination, such as hop count, bandwidth, latency, and current network
load. The routing algorithm generates and maintains the routing table for the path
determination process. Routing performance is judged by performance measures
such as hop count. The hop count indicates how many times a packet must pass
through an internetworking device, such as a router, on its way from source to
destination. Using hops as the main criterion, the path with the fewest hops is
deemed the optimum way to get from source to destination. Similarly, the time it
takes a router to process, queue, and transfer a datagram is referred to as delay.
Protocols that use this metric calculate the delay values for all links along the path
from beginning to end; the optimum path is the one with the smallest total delay.
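As a small illustration of hop count as a routing metric, the following Python sketch selects a path with the fewest hops via breadth-first search; the toy topology and names are illustrative, not taken from the paper.

from collections import deque

def min_hop_path(graph, src, dst):
    # Breadth-first search visits nodes in order of hop distance,
    # so the first path found to dst has the fewest hops
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # destination unreachable

# Toy topology: A-C-E (2 hops) is preferred over A-B-D-E (3 hops)
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
         "D": ["B", "E"], "E": ["C", "D"]}
print(min_hop_path(graph, "A", "E"))  # ['A', 'C', 'E']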
The rest of the paper is organized as follows: the review of literature is covered in
Sect. 2. The problems of smart health care are discussed in Sect. 3. Section 4
describes the proposed smart healthcare model, in which we discuss RPL and its
working, and QoS with its algorithm and various benefits in smart health care.
Conclusion and future work are presented in Sect. 5.
2 Literature Survey
The authors of [8] used an RPL-based solution for reducing IoT device energy
consumption. A time division multiple access time slot is employed to synchronize
the sender and receiver and decrease energy usage. The proposed technique was
simulated in NS-2 to compare the proposed and traditional methods' routing
overhead, energy consumption, and end-to-end delay.
In [9], the authors presented a priority-based energy-efficient routing protocol
(PEERP) for reliable data transmission in smart health care. PEERP divides health
data into two categories, emergency circumstances and important health data,
based on their importance. Based on this division and the importance PEERP
assigns to healthcare data packets, the efficiency of the model was tested using
simulation software. In the simulation experiments, the PEERP technique extended
the lifetime of the network.
In [10], the RPL protocol is used for minimizing carbon emissions and energy
consumption with significant improvement in QoS metrics. The RPL protocol was
applied to low-power healthcare gadgets, including smart processes, which use a
hybrid paradigm to restore the power efficiency and transmitter status of RPL, and
procedures are used to calculate the ETX value for a connection after sending a
data packet.
The authors of [11] devised a routing and transmission power control algorithm to
outline an energy-efficient, reliable, and cost-effective RPL system for IoT. A
restriction of this model is that all of the nodes investigated are homogeneous, each
with a CC2420 RF transceiver.
The authors of [12] used a routing strategy to improve energy efficiency and, as a
result, sensor longevity. The suggested idea is supported by an energy-efficient
routing protocol with a dual prediction model (EERP-DPM) for IoT in health care.
MATLAB software and the MySignals HW V2 hardware platform were used to
design and test the proposed system.
The authors of [4] formulated a system in which healthcare information
identification was included in the IP data packet at the sensor level, QoS was
modified at the router level, and the highest priority was given to routing the
healthcare data packets. This system was tested using a TI LaunchPad.
The authors of [13] developed an energy-efficient solution for IoT applications in
which nodes consume less power and preserve the network's lifespan. A network
and communication power quality algorithm is built for stable, energy-efficient
operation of IoT applications. For a network containing nodes with different types
of RF transceivers, the suggested solution might not perform well. The comparative
analysis of various priority-based routing mechanisms is depicted in Table 1.
3 Challenges in Smart Health Care
With data in this crowded environment coming from different IoT devices, the
basic challenge is to manage the data and give high priority to health-related data
so that patients can be taken care of in a smarter way in this modern world. There
are different challenges faced during the transmission of data that may prevent it
from reaching its destination in proper time. These challenges include energy
consumption, volume of data, data integrity and privacy, data transmission speed,
throughput, and delay, and they are discussed subsequently.
• Energy Consumption
Energy consumption is a major concern that various researchers are dealing with
in the IoT-driven environment [18]. Energy is consumed due to the frequent
occurrence of collisions, repetitive congestion due to the limited bandwidth of the
channel, and data transmission over longer distances. Other factors that contribute
to energy consumption are the limited battery capacity, computing speed, and
memory of IoT devices.
• Volume of Data
IoT devices generate a large amount of data, known as big data, that is
characterized by volume, velocity, and variety. As the bandwidth of the channel is
limited, it becomes difficult to route the data; assigning priority to the transmitted
data therefore solves the problem to some extent [19]. Massive data generation
results in data loss and delay and reduces throughput. Therefore, efficient and
reliable mechanisms must be developed to handle the big data generated by smart
motes.
• Data Integrity and Privacy
In today's healthcare industry, data integrity is a chronic issue. Data integrity
assures that the information is accurate and has not been tampered with in any way
[20]. Data integrity is threatened by hackers who gain continuous access to the
data for monetary gain and to extract useful insights from it. Therefore, data
integrity must be ensured in a smart healthcare system. For the healthcare business
to preserve patients' personal information and comply with laws, data privacy is
critical. Due to the influence of various attacks like DDoS
Table 1 Comparative analysis of priority-based routing techniques in the state-of-the-art IoT
• El Zouka and Hosni [4], 2017. Model: RPL-based system for IoT. Technique: self-rejoining algorithm. Simulation tools: RPL (QU-RPL). Analysis: an RPL-based IoT system that is reliable and low-cost. Demerits: nodes using multiple types of RF transceiver do not perform well. Performance metrics: throughput, latency
• Ambarkar and Shekokar [10], 2020. Model: RPL model. Techniques: ant colony-inspired algorithm; decentralized ant-based algorithm. Simulation tools: TDMA. Analysis: growing network efficiency in terms of best packet delivery. Demerits: the node's maximum speed was only 25 m/s. Performance metrics: bandwidth, delay
• Hathaliya et al. [3], 2020. Model: PEERP, for reliable data transmission. Technique: reliable routing algorithm. Simulation tools: MATLAB platform. Analysis: enlarges the lifetime of the system. Demerits: the ATTEMPT protocol does not perform well after 2500 rounds. Performance metrics: network lifetime, throughput, and path loss
• Choudhary et al. [14], 2020. Model: a sensor-driven approach to prioritize healthcare data in a congested IoT environment. Technique: priority decision algorithm. Simulation tools: TI CC1310 LaunchPads. Analysis: the proposed technology could greatly enhance remote medical operations. Demerits: routing issues for healthcare data packets. Performance metrics: delay, latency
• Debroy et al. [15], 2021. Model: dual prediction model (EERP-DPM). Techniques: decision-making algorithm; LMS algorithm. Simulation tools: EERP-DPM. Analysis: increases the network lifetime of sensors in IoT. Demerits: network duration is less. Performance metrics: throughput, end-to-end delay
• Proposed model, 2022. Model: priority-based smart healthcare model. Technique: priority algorithm using QoS. Simulation tools: TI LaunchPad. Analysis: saves time, cost, and, more importantly, human lives. Performance metrics: bandwidth, delay
• Safara et al. [8], 2020. Model: three-tier clustering technique. Technique: routing algorithm. Simulation tools: NS-3. Analysis: extends network lifetime. Demerits: many problems remain, from general to specific. Performance metrics: delay, latency
• McGhin et al. [16], 2020. Model: BHEEM. Technique: ant colony-inspired algorithm. Simulation tools: NS-3. Analysis: design and analysis of a smart health monitoring system. Demerits: the implementation of the smart contract is not defined formally. Performance metrics: bandwidth, throughput
• Ray et al. [17], 2019. Model: S2SH framework. Technique: priority decision algorithm. Simulation tools: TI CC1310 LaunchPads. Analysis: a detailed framework for a smart healthcare system is developed. Demerits: the integration of the various subsystems must be validated before approval. Performance metrics: path loss, end-to-end delay
attacks, man-in-the-middle, and battery drainage attacks, maintaining data privacy
becomes crucial. Advanced machine learning mechanisms and blockchain-based
solutions must be employed to ensure the privacy of data.
• Data Transmission Speed
The primary source of energy consumption in IoT devices is data transmission
[21]. Processing and delivering redundant data consumes part of the energy used
by these devices; therefore, reducing the number of redundant transmissions
preserves the energy of the network. There must be a trade-off between the data
transmission speed and the required bandwidth to ensure reliable transmission of
data.
• Throughput
The number of packets that can be delivered to the end medical server is
determined by this performance indicator, with higher throughput indicating better
network quality. Routing services with high performance and low packet loss are
required for the patient monitoring system [22]. The overall network lifetime,
which corresponds to the number of active sensor nodes, determines the number of
packets received at the medical server.
• End-to-End Delay
End-to-end delay is the amount of time it takes a packet of data to get from the
origin to the destination node [23]. The IoT is implemented in healthcare
applications to communicate sensitive data from the sensor nodes to a health
server. The sensed data is not always ordinary; in certain circumstances it is
critical, and as a result it must be transmitted to the target system quickly.
4 Proposed Smart Healthcare Model
The proposed smart healthcare model entails a sensing layer, followed by priority
assignment through the addition of a label M, a prioritized routing layer, and
healthcare services, as illustrated in Fig. 1. The functionality of each layer is
discussed in detail in the following sections.
(A) Sensing Layer
IoT systems are being developed for a range of applications and urban services,
including health care and smart city transportation. On the other hand, collecting
massive volumes of data, including multimedia content, from these networks
usually causes congestion within the core network. We suppose that a sensor is
installed on a person's body to capture certain healthcare data such as diabetes
sugar levels, A1C, and so on. The sensors capture this information and classify the
data into three classes. The classified data is sent as input to the priority algorithm,
which prioritizes the information and gives critical data the highest priority. The
priority will be written into the headers: critical data is labeled M/11, normal data
headers are assigned M/01, and medium data is
labeled M/10. Critical data is then fed into RPL, which then goes through the
DODAG topology [24] to locate the best data delivery path.
Fig. 1 The proposed model: pulse, ECG, and BP sensors feed priority assignment (critical data M/11, medium data M/10, normal data M/01); a prioritized routing layer based on DODAG with TDMA slots and QoS metrics (delay, latency, throughput, bandwidth, network lifetime); and healthcare services (medicine, pharmacy, psychology) reached via intermediate nodes
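To make the sensing-layer classification concrete, here is a small Python sketch that tags a reading with one of the three header labels of Fig. 1; the pulse thresholds and function names are illustrative assumptions, while the M/11, M/10, and M/01 labels come from the paper.

HEADER_LABEL = {"critical": "M/11", "medium": "M/10", "normal": "M/01"}

def classify_pulse(bpm):
    # Illustrative thresholds only; the paper does not specify them
    if bpm < 40 or bpm > 130:
        return "critical"
    if bpm < 55 or bpm > 110:
        return "medium"
    return "normal"

reading = 142
packet = {"data": reading, "label": HEADER_LABEL[classify_pulse(reading)]}
print(packet)  # {'data': 142, 'label': 'M/11'}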
(B) Prioritized Routing Layer
RPL uses a dynamic process to build and maintain a directed, acyclic,
destination-oriented routing graph. Traffic in this graph is directed toward the
DODAG root [25]; the edges form a route from each node to the DAG root. If the
network is in a stable condition, RPL employs a limited DODAG Information
Object (DIO) beacon technique to maintain the DODAG structure. The DIO
beacon is controlled using a trickle timer, which is employed to generate RPL
control messages. Trickle timers permit nodes to limit the number of messages
they send while maintaining network stability. As long as a node receives
communication that is consistent with its own data, it exponentially increases the
interval between the packets it sends until the interval reaches its maximum. The
suggested DODAG is described by the routing input DODAG = (V, N), where V is
the set of nodes and N represents the number of bits to communicate in a packet, in
accordance with Eq. (1):
Here, V_0 is the DODAG root, N_s denotes the first bit, and N_i^0 and N_i^1
denote the values 0 and 1 in the binary coding, respectively. The time division
multiple access slot is used to synchronize the source and destination and to reduce
energy utilization. From the information in a TDMA slot [26], the traffic is
examined first: regular data is sent while traffic is heavy; otherwise, priority data is
sent. Node priorities and data rates are used to allocate space for sending packets,
and the coordinator is responsible for allocating time slots. If a number of nodes
have identical priorities, the TDMA choice is decided by the transmission rate.
Frame values can be changed, using the data unit, to put a frame into idle mode.
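A coordinator-side slot allocation consistent with this description might look like the following Python sketch; the node records, priority encoding (0 = highest), and frame size are illustrative assumptions.

def allocate_slots(nodes, frame_slots):
    # Serve higher-priority nodes first; break ties by higher data rate
    order = sorted(nodes, key=lambda n: (n["priority"], -n["data_rate"]))
    return {slot: node["id"] for slot, node in enumerate(order[:frame_slots])}

schedule = allocate_slots(
    [{"id": "BP", "priority": 2, "data_rate": 10},
     {"id": "ECG", "priority": 0, "data_rate": 250},
     {"id": "pulse", "priority": 1, "data_rate": 50}],
    frame_slots=3)
print(schedule)  # {0: 'ECG', 1: 'pulse', 2: 'BP'}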
To synchronize the transmitter and receiver, preamble bits are used in the first
section of the frame [27, 28]. While no data is on the channel, the receiver samples
the pattern 01010101; once the receiver sees 10101011, which contains two
consecutive one bits, it knows that genuine data begins after the two consecutive
ones and is ready to receive. The subsystem aligns on these preamble bits so that
the rest of the bits are received correctly. The number of repeats is determined by
the initial settings for the module, established by the information center inside the
preamble. The bits following the preamble are interpreted as the destination once
the data on the channel has been validated, i.e., after receiving the consecutive bits
11 of the pattern 10101011. The source field of the frame holds the sender's
address, while the address of the equipment to which the message is transmitted is
saved in the destination field. The size of the payload portion of the packet is
defined in the control part of the packet. If an acknowledgment must be
transmitted, the acknowledgment bit inside the packet control area is set to true.
The payload is the space used by the signal processor to receive data from the
sensor over the serial connection between the signal processor and the sensor; the
payload size can be set anywhere between 0 and 32 bytes on the device settings
page. The cyclic redundancy check at the end of the frame is responsible for the
frame's integrity: when error checking is enabled, the frame is validated, and if the
computed CRC does not match, the frame is treated as corrupt.
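The frame layout described above can be sketched in Python as follows; the exact field widths, the 4-byte CRC-32, and the control-byte encoding are illustrative assumptions, since the paper does not fix them.

import zlib

PREAMBLE = 0xAB  # 10101011: genuine data begins after the two consecutive ones

def parse_frame(raw: bytes):
    # Assumed layout: preamble | source | destination | control | payload | CRC-32
    if raw[0] != PREAMBLE:
        raise ValueError("no valid preamble on the channel")
    src, dst, control = raw[1], raw[2], raw[3]
    length = control & 0x3F            # payload size: 0..32 bytes per the text
    ack_requested = bool(control & 0x40)
    payload = raw[4:4 + length]
    crc = int.from_bytes(raw[4 + length:8 + length], "big")
    if zlib.crc32(raw[:4 + length]) != crc:
        raise ValueError("CRC mismatch: frame is corrupt")
    return {"source": src, "destination": dst, "ack": ack_requested,
            "payload": payload}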
(C) Quality of Service (QoS)
The QoS receives the prioritized data via the DODAG [29]. Since all the data is
healthcare data, QoS further prioritizes it into critical and normal classes
automatically by using a priority mechanism [30]. The data headers are further
modified to give priority to critical data. QoS halts all other data and transmits the
critical data over a dedicated portion of bandwidth, and hence reduces
delay. QoS returns to standard processing when critical healthcare data has been
sent.
Algorithm 1: Modified QoS
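Algorithm 1 can be pictured as a strict-priority queue over the three header labels. The following Python sketch is a minimal illustration of that behavior, assuming ties within a class are served first-in, first-out; the class labels come from the paper, while the queue structure and names are illustrative.

import heapq

PRIORITY = {"M/11": 0, "M/10": 1, "M/01": 2}  # critical, medium, normal

class ModifiedQoSQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0  # FIFO tie-breaker within one priority class

    def enqueue(self, label, packet):
        heapq.heappush(self._heap, (PRIORITY[label], self._seq, packet))
        self._seq += 1

    def dequeue(self):
        # Critical (M/11) traffic always leaves first; normal data resumes
        # once no higher-priority packets remain
        return heapq.heappop(self._heap)[2]

q = ModifiedQoSQueue()
q.enqueue("M/01", "routine BP reading")
q.enqueue("M/11", "cardiac alarm")
print(q.dequeue())  # -> 'cardiac alarm', despite arriving later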
Thus, the healthcare data passing through the proposed model will reach the
healthcare centers; hence, the health of patients will be monitored, and patients
will receive appropriate instructions or treatment in time.
5 Conclusion
References
18. Hossain MS, Muhammad G (2018) Emotion-aware connected healthcare big data towards 5G.
IEEE Internet Things J 5(4):2399–2406. https://doi.org/10.1109/JIOT.2017.2772959
19. Chen SK et al (2012) A reliable transmission protocol for zigbee-based wireless patient moni-
toring. IEEE Trans Inf Technol Biomed 16(1):6–16. https://doi.org/10.1109/TITB.2011.217
1704
20. Shakeel PM, Baskar S, Dhulipala VRS, Mishra S, Jaber MM (2018) RETRACTED ARTICLE:
Maintaining security and privacy in health care system using learning based deep-Q-networks.
J Med Syst 42:186. https://doi.org/10.1007/s10916-018-1045-z
21. Luo E, Bhuiyan MZA, Wang G, Rahman MA, Wu J, Atiquzzaman M (2018) PrivacyProtector:
Privacy-protected patient data collection in IoT-based healthcare systems. IEEE Commun Mag
56(2):163–168. https://doi.org/10.1109/MCOM.2018.1700364
22. Elhoseny M, Ramírez-González G, Abu-Elnasr OM, Shawkat SA, Arunkumar N, Farouk A
(2018) Secure medical data transmission model for IoT-based healthcare systems. IEEE Access
6:20596–20608. https://doi.org/10.1109/ACCESS.2018.2817615
23. Moosavi SR et al (2015) SEA: A secure and efficient authentication and authorization archi-
tecture for IoT-based healthcare using smart gateways. Procedia Comput Sci 52(1):452–459.
https://doi.org/10.1016/j.procs.2015.05.013
24. Mahmud MA, Abdelgawad A, Yelamarthi K (2017) Energy efficient routing for Internet of
Things (IoT) applications. In: 2017 IEEE International conference on electro information
technology (EIT). IEEE, pp 442–446. https://doi.org/10.1109/EIT.2017.8053402
25. Besher KM, Beitelspacher S, Nieto-Hipolito JI, Ali MZ (2021) Sensor Initiated healthcare
packet priority in congested IoT networks. IEEE Sens J 21(10):11704–11711. https://doi.org/
10.1109/JSEN.2020.3012519
26. Koutras D, Stergiopoulos G, Dasaklis T, Kotzanikolaou P, Glynos D, Douligeris C (2020)
Security in IoMT communications: a survey. Sensors 20(17):4828. https://doi.org/10.3390/s20
174828
27. Redondi A, Chirico M, Borsani L, Cesana M, Tagliasacchi M (2013) An integrated system
based on wireless sensor networks for patient monitoring, localization and tracking. Ad Hoc
Netw 11(1):39–53. https://doi.org/10.1016/j.adhoc.2012.04.006
28. Balasubramanian V, Otoum S, Aloqaily M, Ridhawi IA, Jararweh Y (2020) Low-latency vehic-
ular edge: a vehicular infrastructure model for 5G. Simul Modell Pract Theory 98:101968.
https://doi.org/10.1016/j.simpat.2019.101968
29. Saha HN et al (2017) Health monitoring using Internet of Things (IoT). In: 2017 8th Annual
industrial automation and electromechanical engineering conference (IEMECON). IEEE, pp
69–73. https://doi.org/10.1109/IEMECON.2017.8079564
30. Sciences H (2016) 4(1):1–23
An Internet of Things-Based Mining Worker Safety Helmet Using ESP32 and Blynk App
Abstract In underground mining industries, there are many factors that may put
workers' lives at risk. Many people have lost their lives due to emergency situations
in the mines. These emergency situations occur due to the leakage of harmful
gases or sudden explosions, and workers may also face ventilation problems. As
we know, coal contributes greatly to the production of energy, so coal mining
plays a major role. Considering all these factors, we have introduced a smart
helmet system which helps workers get warned under any emergency
circumstances. The main objective of this prototype is to ensure the safety of the
workers, and the technique we use to do this is to provide proper communication
between the worker and the monitoring system. The communication can be
provided using various methods such as Bluetooth, ZigBee, and Wi-Fi modules.
Here, we have chosen a Wi-Fi module, i.e., the ESP32, as it provides a higher
transmission rate along with covering the longest distance. The data can be
received by a monitoring system through the Blynk app, which is an open-source
IoT platform. The gases present in the environment are detected by our smart
helmet using the related sensors and are sent to the monitoring system. By
implementing this model, we have established communication so that the worker
can feel safe entering the mines and can concentrate on his/her work.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_26
temperature also gets raised, which is abnormal compared to normal conditions.
"Deeper the mines, greater the risk." India holds the world's fifth largest coal
reserves and is considered a home for vast coal deposits. In the year 2020, the
environment ministry cleared the way for 14 new coal mine projects, and it is
estimated that by 2024, coal production could be raised to one billion tons. Along
with the increase in production and development, the risk to mine workers is also
increasing. As per the records, coal mines have recorded the highest number of
mining accidents. According to ministry data, an average of 549 deaths was
reported every year.
At present, workers in the mining industry use only a normal helmet for protection
against potentially dangerous bumps. Adding technology to that helmet to support
communication between the worker and the monitor helps to a large extent in
identifying a hazardous atmosphere inside the mine. So far, prototype helmets for
the safety of mining workers have been built using different wireless technologies
like Bluetooth, ZigBee, etc., and with different kinds of sensors. Our prototype is
developed as an advancement over the existing systems. We have used Wi-Fi
technology with a combination of four different sensors to cover the maximum
possibilities for indicating hazardous occurrences in the mines. To overcome
hazards and maintain a safe atmosphere in underground mines, our design
implements a better communication technology for sensing and warning. This
monitoring system allows the mine to intelligently make changes based on the
data monitored in the Blynk app, which is an open-source IoT platform. This kind
of progress using IoT technology plays a prominent role in ensuring the safety of
mining workers.
2 Literature Survey
the monitoring system using ThingSpeak, which is an IoT open-source platform [6,
7]. In 2020, Tajane et al. made a module that provides well-founded communication
between the worker and the monitoring station using a ZigBee module [8]. In 2013,
Pradeepkumar and Usha recognized the high-risk cases that exist in mines, like the
accumulation of unhealthy gases, and also detected collisions, etc. [9]. In 2015,
Pradeepkumar et al. provided a workable and full-fledged solution for establishing
a private LoRa network using both hardware and software algorithms [10].
In 2020, Kumar et al. researched the provision of a secure environment using a
technology named LoRa [11]. In 2017, Jagadeesh and Nagaraj implemented a smart
helmet for the detection of unhealthy situations in mining areas using the Internet
of Things [12]. In 2014, Shabina created a helmet using technologies like wireless
sensor networks and radio frequency for ensuring the safety of underground mining
workers [13]. In 2013, Ussdek et al. designed a high-sensitivity gas detection and
monitoring system [14]. In 2010, Sheng and Yunlong analyzed coal and gas
accidents and proposed countermeasures for mining safety and environmental
protection [15, 16]. In 2009, Zhou et al. made a module to monitor the gases present
using a chain-type WSN [17].
3 Proposed Method
In this proposed method, we use four input devices: an alcohol sensor, a gas sensor, a temperature sensor, and a smoke sensor. The alcohol sensor senses the ethanol present in the air, the gas sensor detects the gases present in the atmosphere, the temperature sensor measures the temperature of air, liquids, or solids, and the smoke sensor module senses smoke, an indicator of fire. The output devices are a liquid crystal display for self-monitoring and a buzzer for warning whenever danger is detected. All of these are connected to the ESP32 Wi-Fi module, which acts as the main device mediating communication between the input and output devices. A power supply is given to the ESP32 Wi-Fi module (Fig. 1).
ESP32 is the main component in this project. The components used are an alcohol sensor, a smoke sensor, a gas sensor, a DHT-11 temperature sensor, an LCD (16 × 2), and a buzzer. The negative terminals of the MQ-6, MQ-3, and DHT-11 are linked to the ground pin of the ESP32. The positive terminal of the gas sensor is coupled to the G18 pin of the ESP32, the alcohol sensor is tied to the G19 pin, the DHT-11 sensor is linked to the G15 pin, and the smoke sensor is bridged to the G21 pin of the ESP32 Wi-Fi module. The buzzer's positive terminal is connected to the G13 pin (Fig. 2).
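The paper does not list firmware, but the described wiring maps naturally onto a few lines of MicroPython. The sketch below is a minimal illustration under the assumption that the MQ sensors' digital alarm outputs are wired to the named GPIOs and that the Blynk upload is handled elsewhere; the temperature threshold is hypothetical.

```python
# Minimal MicroPython sketch for the described wiring (illustrative only).
# Assumes the MQ modules' digital alarm outputs (DO) are on the named GPIOs;
# many MQ breakout boards pull DO low on alarm, so invert logic if needed.
from machine import Pin
import dht
import time

gas = Pin(18, Pin.IN)             # MQ-series gas sensor on G18
alcohol = Pin(19, Pin.IN)         # MQ-3 alcohol sensor on G19
smoke = Pin(21, Pin.IN)           # MQ-series smoke sensor on G21
temp_sensor = dht.DHT11(Pin(15))  # DHT-11 on G15
buzzer = Pin(13, Pin.OUT)         # buzzer on G13

TEMP_LIMIT_C = 40                 # hypothetical alarm threshold

while True:
    temp_sensor.measure()
    danger = (gas.value() or alcohol.value() or smoke.value()
              or temp_sensor.temperature() > TEMP_LIMIT_C)
    buzzer.value(1 if danger else 0)   # warn the worker locally
    # readings would also be pushed to the Blynk dashboard here
    time.sleep(1)
```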
5 Components
1. ESP32
ESP32 is a general-purpose microcontroller, a System on Chip (SoC) capable of handling wireless connections such as Wi-Fi and Bluetooth. It is a low-power system that can operate in a dual mode, and it is low in cost compared with other microcontroller systems of comparable capability. ESP32 has 26 general-purpose input and output pins along with Vin, gnd, and enable pins; 15 of these pins can be used as ADC pins and 2 as DAC pins (Fig. 3).
Fig. 3 ESP32
2. Gas Sensors
We use three kinds of gas sensors for three different sensing operations: an MQ-3 sensor for detecting alcohol, i.e., harmful vapors such as isopropyl alcohol, and MQ-2 and MQ-6 sensors for sensing combustible gases and smoke, respectively. All these MQ-series sensors contain a coil or heater inside that heats up when it comes into contact with the target reactants (Fig. 4).
3. DHT-11 Sensor
DHT-11 is a low-cost digital humidity and temperature sensor. It contains a capacitive sensor for measuring humidity and a thermistor for measuring temperature. It does not need any analog input pins and can easily be interfaced with any kind of microcontroller (Fig. 5).
Fig. 5 DHT-11
6 Result Analysis
When harmful vapors such as isopropyl alcohol are present, the alcohol sensor detects them, and we get an alert message like the one shown in Fig. 7.
If there is any fire around or near the mining worker, the MQ-6 sensor senses the smoke and we receive an alert message as shown in Fig. 8.
Combustible gases such as methane, carbon monoxide, and LPG are detected by the MQ-2 sensor, and the resulting alert message is shown in Fig. 9.
Figure 10 shows the prototype of our proposed system, "Mining worker safety helmet using IoT".
7 Conclusion
Finally, we conclude that a smart helmet that helps mining workers avoid various risk factors has been developed. It is cost-effective and very efficient; as a result, workers now have a safer place to work. Being operated using IoT makes its operation easy to track and improves the transfer of information. The components used have integrated features that also ease operation. Monitoring of the output can be accessed by any individual using any open-source IoT platform; here we make use of the Blynk app, which is easy to understand and monitor. The sole aim of this proposed system, to create a safe environment for mining workers and to provide better communication, was finally achieved.
8 Future Scope
In the future, this prototype can be extended beyond the existing model to recognize various other dangerous conditions, such as the accumulation of carbon monoxide or exhaled gases in mine areas. Additionally, the model can be augmented with sensors such as pressure sensors and IR sensors. The system can also be enhanced with attributes such as signal strength and range, and the worker's heartbeat and blood pressure can be checked and monitored by adding the required equipment.
References
1. Sravani B, Rambabu K (2017) A smart and secured helmet for mining workers. Int J Adv Res
Trends Eng Technol (IJARTET) 4(3):112–118
2. Behr CJ, Kumar A, Hancke GP (2016) A smart helmet for air quality and hazardous event detec-
tion for the mining industry. In: 2016 IEEE International conference on industrial technology
(ICIT). IEEE, pp 2026–2031. https://doi.org/10.1109/ICIT.2016.7475079
3. Kumar GR, Reddy BK (2018) Internet of things based an intelligent helmet for wireless sensor
network. Int J Eng Sci Res Technol (IJESRT) 7(6):88–92
4. Dhanalakshmi A, Lathapriya P, Divya K (2017) A smart helmet for improving safety in mining
industry. Int J Innov Sci Res Technol (IJISRT) 2(3):58–64
5. Deokar SR, Kulkarni VM, Wakode JS (2017) Smart helmet for coal mines safety monitoring
and alerting. Int J Adv Res Comput Commun Eng (IJARCCE), ISO 3297:2007 Certified.
6(7):1–7
6. Umapathi N, Teja S, Roshini, Kiran S (2020) Design and implementation of prevent gas
poisoning from sewage workers using Arduino. In: 2020 IEEE International symposium on
sustainable energy, signal processing and cyber security (iSSSC). IEEE, pp 1–4. https://doi.
org/10.1109/iSSSC50941.2020.9358841
7. Borkar SP, Baru VB (2018) IoT based smart helmet for underground mines. Int J Res Eng Sci
Manage (IJESM) 1(9):52–56
8. Tajane PS, Shelke SB, Sadgir SB, Shelke AN (2020) IoT mining tracking & worker safety
helmet. Int Res J Eng Technol (IRJET) 7(4):5587–5590
9. Pradeepkumar G, Usha S (2013) Effective watermarking algorithm to protect electronic patient
record using image transform. In: 2013 International conference on information communication
and embedded systems (ICICES). IEEE, pp 1030–1034. https://doi.org/10.1109/ICICES.2013.
6508251
10. Pradeepkumar G, Prasad CV, Rathanasabhapathy G (2015) Effective watermarking algorithm
to protect electronic patient record using DCT. Int J Softw Hardware Res Eng 3(11):16–19
11. Kumar GP, Saranya MD, Tamilselvan KS, SU Jhanani, Iqbal MJL, Kavitha S (2020) Investi-
gation on watermarking algorithm for secure transaction of electronic patient record by hybrid
transform. In: 2020 Fourth international conference on I-SMAC (IoT in social, mobile, analytics
and cloud) (I-SMAC). IEEE, pp 379–383. https://doi.org/10.1109/I-SMAC49090.2020.924
3411
12. Jagadeesh R, Nagaraj R (2017) IoT based smart helmet for unsafe event detection for mining
industry. Int Res J Eng Technol 4(1):1487–1491
13. Shabina S (2014) Smart helmet using RF and WSN technology for underground mines safety.
In: 2014 International conference on intelligent computing applications. IEEE, pp 305–309.
https://doi.org/10.1109/ICICA.2014.105
14. Ussdek MEM, Junid SAMA, Majid ZA, Osman FN, Othman Z (2013) High-sensitivity gas
detection and monitoring system for high-risk welding activity. In: 2013 IEEE Conference on
systems, process & control (ICSPC). IEEE, pp 256–261. https://doi.org/10.1109/SPC.2013.
6735143
15. Sheng XZ, Yunlong Z (2010) Accident cause analysis and countermeasures of coal and gas outbursts in recent years in our country. Min Saf Environ Prot 37(1):84–87
16. Umapathi N, Sabbani S (2022) An Internet of Things (IoT)-based approach for real-time kitchen
monitoring using NodeMCU 1.0. In: Futuristic communication and network technologies.
VICFCNT 2020. Lecture notes in electrical engineering, vol 792. Springer, Singapore, pp
35–43. https://doi.org/10.1007/978-981-16-4625-6_4
17. Zhou G, Zhu Z, Chen G, Hu N (2009) Energy-efficient chain-type wireless sensor network
for gas monitoring. In: 2009 Second international conference on information and computing
science. IEEE, pp 125–128. https://doi.org/10.1109/ICIC.2009.140
Time Series-Based IDS for Detecting Botnet Attacks in IoT and Embedded Devices
Abstract Existing intrusion detection systems (IDS) find it very demanding to detect growing cyber-threats amid the voluminous network traffic generated by the increasing number of Internet of Things (IoT) devices. Security attacks, moreover, tend to be unpredictable. There are significant challenges in developing adaptive and robust IDS for IoT that avoid false warnings and assure high detection efficiency, especially as botnet attacks become more ubiquitous. Motivated by these facts, this paper studies different types of botnet attacks and how they are conveniently launched through open and vulnerable IoT devices. It then extensively studies the growing trend of deep learning (DL) techniques for their ability to detect botnet attacks by learning from time series data, specifically in the IoT environment. Hackers are exploiting the Internet of Things, creating millions of new vulnerability points in critical infrastructure; we must build greater consensus on IoT security standards and trust in security across critical infrastructure.
1 Introduction
The Internet of Things (IoT) has received enormous attention in the recent past as a result of its unique applications and support for a variety of fields, including industrial applications, health care, automation, environmental sensing, and so on. The Internet of Things needs an infrastructure upon which programs, devices, and
services may be interconnected and used to receive, interact with, store, analyze, and exchange information from the physical world [1]. The IoT is an expanding media concept in which things embedded with sensors and actuators can sense their surroundings, interact with one another, and exchange information over the Internet. To provide useful information and make timely decisions, a large number of sensors and actuators are required for real-time monitoring of the environment in various industrial sectors [2]. There are approximately 50 billion IoT devices connected to the Internet, and this number is likely to expand in the next few years. These devices produce a large amount of data that can be used by various applications. Despite the fact that it provides a vast variety of services and applications, the IoT is vulnerable to cyberattacks [3]. Cyber attackers can harm systems in several different ways, such as restricting access, stealing information, or eliminating a specific target [4]. Because of the diversity, immense scale, limited hardware resources, and universal availability of IoT systems, IoT security has become a challenge [5]. Several aspects of IoT security, such as data validation, transparency, and access controls, have been strengthened. These security mechanisms are designed to work in conjunction between the user and the IoT, yet they still have security flaws. These flaws can lead to a wide variety of issues, and research is needed to address them.
As a result, a dedicated module is required to provide IoT network security []. Network security can be handled by a software application known as an intrusion detection system (IDS), which examines the system or network for malicious activity [6]. The basic goal of any IDS is to distinguish between abnormal and normal traffic patterns [7]. The presence of anomalies often wastes many resources and leads to serious situations, so the detection of anomalies may have a significant impact on the overall efficiency of monitoring systems. Several techniques to implement anomaly-based intrusion detection systems have been presented, but machine learning (ML) is currently a popular approach among researchers [8]. Machine learning-based IDSs for securing IoT networks have been described in many surveys. Machine learning advances have brought a new era for artificial intelligence and opened the path for the creation of intelligent intrusion detection systems [9]. ML is widely regarded as one of the best computational paradigms for providing embedded intelligence in IoT devices. Machine learning is applied to tasks including classification, regression, and density estimation, and its techniques are used in applications such as fraud detection, virus detection, authentication, speech recognition, and bioinformatics. Machine learning algorithms, however, are complex; they require a large amount of domain knowledge as well as human assistance, and they can only do what they are designed for.
Deep learning models are frequently used to tackle problems involving sequential input data, such as time series. Time series forecasting is part of predictive analytics [10]. In time series analysis, it is possible to perform regression analysis against a set of past values of the variables. In most anomaly-based intrusion detection approaches, time series patterns are used to train the models. In the IoT environment, numerous methods have been developed to expose abnormalities in historical data using real-time analytics and the prediction of abnormal actions [11]. However, the use of DL
such as web downloads, exploitation tools, popup advertisements, and email attachments, hackers can produce zombie devices after the initial malware infection. In a centralized botnet, the herder leads the network of bots to a command and control (C&C) server. In peer propagation there is a peer-to-peer botnet, where infected devices work on connecting with additional zombies. When the bot herder has corrupted a large number of bots, the attacks can be organized in stage 3. To receive their orders, the zombie devices download the most recent response from the command and control channel. Each bot then executes its commands and participates in hostile behavior. In this way, the herder continues to maintain and expand the bot network from afar, allowing a variety of crimes to be carried out. Botnets do not target specific individuals, because the purpose of the bot herder is to infect as many devices as possible so that malicious assaults can be carried out.
Figure 1 [21] shows the working of a botnet. This procedure is divided into several steps. Botnets, when fully operational, may carry out large-scale attacks. To improve a botnet's capability, hackers must provide it with additional machines or gadgets. The bot herder is needed to guide the network's connected infected devices: it works with remote commands and directs the gadgets to complete specific tasks. The Mirai and Zeus botnet attacks are examples of botnets [21]. Botnet identification is difficult due to the diversity of botnet protocols and architectures, and botnet attacks in particular are extremely challenging to detect. After establishing the botnet, the intruder can use the C&C server to manage the devices and perform attacks against target hosts. Any attacker will be attracted once infection becomes easy and the produced bot population has become stable.
IoT is widely accepted and used in a variety of fields, including healthcare, smart homes, and agriculture [22]. On the other hand, the Internet of Things faces resource constraints and varied surroundings, such as low processing power and storage. These limitations make it difficult to provide and deploy a secure system on IoT devices and worsen the IoT environment's already-existing issues. As a result of the vulnerability of IoT devices, numerous types of attacks are feasible [23]. The IoT-based botnet threat is among the most common; it grows quickly and has a larger effect than other types of attacks. A botnet attack is a substantial cyberattack launched from fully automated, malware-infected machines; a botnet controller turns infected devices into "zombie bots" [24]. A smart bot is generally a software robot that hunts for vulnerable smart nodes and infects them to make them part of a larger botnet, as conventional bots do. It resembles a malware propagation procedure running in the background. The management of the smart botnet is handled by a malicious node that deploys these bots to complete tasks cohesively. Distributed attacks, spam, phishing, click fraud, spambots, brute-force attacks, and spyware are all examples of such coordinated action [25]. The dearth
[Figure: botnet malware life cycle — 1. infection, 2. connection, 3. control, 4. multiplication; payloads include ransomware, phishing, traffic sniffing, keylogging, DDoS, trojans, and spyware]
That they might spread and inflict real harm using the same infiltration techniques as the original threat demonstrates IoT device manufacturers' persistent disdain for even basic security procedures [34]. If the security industry does not respond more quickly and build new countermeasures, increasingly sophisticated attacks may become the norm, threatening the Internet's infrastructure [35].
With the rapid development of threats and the variability of attack methods, Internet of Things (IoT) devices face significant challenges in detecting security flaws and attacks [36]. Many detection technologies and methodologies that use full time series data captured during malware operation and are based on machine learning or deep learning are becoming more popular. Almost all current intrusion detection and prevention systems do not harness the power of time series modeling [37]. By using time series models, software developers will be able to better manage resource allocation and system readiness to fight malicious activity. Time series analysis is a technique for forecasting future events based on the premise that the future will resemble past patterns; forecasting is the practice of fitting models to historical data to predict future values [38]. Time series analysis is a dynamic study area that has gained much attention because of its potential applications in a wide variety of fields [39]. Much effort was put into the development and refinement of time series analysis models over the last several decades. Among the areas of study, ranging from dimensionality reduction to data segmentation, one of the most critical concerns is time series prediction for acquiring future trends and tendencies [40]. The results can be used for a variety of applications, such as production planning, control, optimization, and so on. As a result, different models for handling this problem have been presented, such as Auto-Regressive Integrated Moving Average (ARIMA), filtering-based approaches, support vector machines, and so on.
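As a concrete illustration of this forecasting idea (a generic sketch, not a method from the cited works), the following fits an ARIMA model to a synthetic univariate traffic series with statsmodels and flags points whose forecast error breaks a simple three-sigma rule:

```python
# Illustrative ARIMA-based anomaly flagging on a univariate traffic series
# (e.g., packets per second). Data and thresholds are made up.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
traffic = rng.normal(100, 5, 300)          # synthetic "normal" traffic
traffic[250:] += 40                        # injected burst (e.g., botnet scan)

train, test = traffic[:200], traffic[200:]
model = ARIMA(train, order=(2, 0, 1)).fit()
forecast = model.forecast(steps=len(test))

residuals = np.abs(test - forecast)
threshold = 3 * np.std(train)              # simple 3-sigma rule
anomalies = np.where(residuals > threshold)[0] + 200
print("anomalous time steps:", anomalies)
```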
Machine learning and deep learning are important in every sector, making machines intelligent; examples of machine learning are everywhere. Machine learning algorithms are complex and require a lot of domain knowledge and human intervention, whereas deep learning holds more promise for AI creators and the rest of the world in this respect. Over the last few decades, researchers have made considerable progress in detecting botnets using classical machine learning methods such as Naive Bayes, SVM, and random forests, and clustering algorithms such as DBSCAN and X-means, based on a variety of aspects, to create models that can distinguish malicious network traffic [39]. These detection models have low false-positive and false-negative rates. Artificial neural networks (ANNs) are versatile computational frameworks that may be used to accurately solve a wide range of time series
problems. However, using artificial neural networks to model linear problems has shown mixed results, so it is not a good idea to use them indiscriminately on any data. From network traffic, the convolutional neural network (CNN) primarily learns spatial characteristics along the spatial dimension. Because malicious behavior affects IoT devices, a CNN-based deep learning algorithm was proposed to detect small variations in power-consumption data. CNNs are most suitable for spatial data and classification tasks such as images; they take fixed-size input and generate fixed-size output [41]. Along the time dimension, a recurrent neural network (RNN) learns the properties of data traffic in time series. One proposed model, based on a DL method combining CNN+RNN, automatically extracts features from the two dimensions of time and space [39]. An RNN detects botnets by characterizing communication patterns in a network as a series of time-varying states. A Bi-LSTM RNN model was built to identify botnet activities in consumer IoT devices and networks [42]. The benefit of using a Bi-LSTM network for detection is that it can extract features for classification by learning the contextual correlation of vectors in the sequence more effectively. The LSTM model is also used for network anomaly detection. LSTM networks are a sort of RNN that includes a hybrid of special and standard units. RNNs suffer from the exploding and vanishing gradient problems; to solve this, LSTM and GRU were introduced [43]. LSTM addresses these issues by introducing new gates, such as the input and forget gates, which allow greater control of the gradient flow and better preservation of long-range dependencies. GRU was introduced as a modification of LSTM, with a simpler structure, originally for machine translation.
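To make the recurrent approach concrete, a minimal Keras sketch for classifying fixed-length windows of traffic features as benign or botnet could look as follows; the window shape, layer sizes, and placeholder data are assumptions, not taken from the cited works:

```python
# Illustrative Bi-LSTM classifier over sliding windows of traffic features.
# Shapes and hyperparameters are assumptions, not from the cited papers.
import numpy as np
import tensorflow as tf

T, F = 50, 10                      # 50 time steps, 10 features per step
X = np.random.rand(1000, T, F).astype("float32")   # placeholder windows
y = np.random.randint(0, 2, 1000)                  # 0 = benign, 1 = botnet

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, F)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # Bi-LSTM as in [42]
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)
```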
5 Conclusion
Botnets are seen as a severe threat to cybersecurity, as they serve as a platform for a variety of cybercrimes, including DDoS attacks against critical targets, virus distribution, phishing, and click fraud. Despite the fact that malevolent botnets have been around for a long time, it is still very difficult to detect them in their growing phase, and botnet research is still in its infancy. In this paper, we focused on IoT intrusion detection systems and studied different machine learning and deep learning models for detecting various forms of IoT network attacks on time series data, which are typically carried out by botnets. We also analyzed the detrimental consequences of Mirai, its variants, and other similar botnets on the Internet, depicting the reality that IoT devices reflect. We find that machine learning models are complex and are commonly utilized in projects that require predicting outcomes or identifying trends; machine learning is used when limited, structured data is available. Deep learning models work well with time series data and perform well for predictive analysis. Because of its greater accuracy when trained with massive volumes of data, deep learning is becoming increasingly popular, with RNN as the best intrusion
References
1. Al-Taleb N, Saqib NA (2022) Towards a hybrid machine learning model for intelligent cyber
threat identification in smart city environments. Appl Sci (Switzerland) 12(4):1863. https://doi.
org/10.3390/app12041863
2. Malhotra P, Singh Y, Anand P, Bangotra DK, Singh PK, Hong W-C (2021) Internet of things:
evolution, concerns and security challenges. Sensors 21(5):1–35. https://doi.org/10.3390/s21
051809
3. Anand P, Singh Y, Selwal A, Alazab M, Tanwar S, Kumar N (2020) IoT vulnerability assess-
ment for sustainable computing: threats, current solutions, and open challenges. IEEE Access
8:168825–168853. https://doi.org/10.1109/ACCESS.2020.3022842
4. Anand P, Singh Y, Selwal A, Singh PK, Felseghi RA, Raboaca MS (2020) IoT: internet of
vulnerable things? Threat architecture, attack surfaces, and vulnerabilities in the internet of
things and its applications towards smart grids. Energies (Basel) 13(18):4813. https://doi.org/
10.3390/en13184813
5. Smys S, Basar A, Wang H (2020) Hybrid intrusion detection system for internet of things (IoT).
J ISMAC 2(4):190–199. https://doi.org/10.36548/jismac.2020.4.002
6. Putchala MK (2017) Deep learning approach for intrusion detection system (IDS) in the
internet of things (IoT) network using gated recurrent neural networks (GRU). Retrieved
from https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=wright150368045
2498351&disposition=inline
7. Lazarevic A, Ertoz L, Kumar V, Ozgur A, Srivastava J (2003) A comparative study of anomaly
detection schemes in network intrusion detection. In: Proceedings of the 2003 SIAM inter-
national conference on data mining, pp 1–12. https://doi.org/10.1137/1.9781611972733.3
8. Sinclair C, Pierce L, Matzner S (1999) An application of machine learning to network intrusion
detection. In: Proceedings 15th annual computer security applications conference (ACSAC’99).
IEEE, pp 371–377. https://doi.org/10.1109/CSAC.1999.816048
9. 2018 10th Computer Science and Electronic Engineering (CEEC). IEEE, 2018.
10. Wu Y, Liu Y, Ahmed SH, Peng J, El-Latif AAA (2020) Dominant data set selection algorithms
for electricity consumption time-series data analysis based on affine transformation. IEEE
Internet Things J 7(5):4347–4360. https://doi.org/10.1109/JIOT.2019.2946753
11. Saufi SR, Ahmad ZAB, Leong MS, Lim MH (2019) Challenges and opportunities of deep
learning models for machinery fault detection and diagnosis: a review. IEEE Access 7:122644–
122662. https://doi.org/10.1109/ACCESS.2019.2938227
12. Ahmad Z, Khan AS, Shiang CW, Abdullah J, Ahmad F (2021) Network intrusion detection
system: a systematic study of machine learning and deep learning approaches. Trans Emerg
Telecommun Technol 32(1):e4150. https://doi.org/10.1002/ett.4150
13. Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion
detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50. https://doi.org/10.1109/TETCI.
2017.2772792
14. Jiang K, Wang W, Wang A, Wu H (2020) Network intrusion detection combined hybrid
sampling with deep hierarchical network. IEEE Access 8:32464–32476. https://doi.org/10.
1109/ACCESS.2020.2973730
15. Otoum S, Kantarci B, Mouftah HT (2019) On the feasibility of deep learning in sensor network
intrusion detection. IEEE Networking Lett 1(2):68–71. https://doi.org/10.1109/LNET.2019.
2901792
36. Denning DE (1987) An intrusion-detection model. IEEE Trans Software Eng SE-13(2):222–
232. https://doi.org/10.1109/TSE.1987.232894
37. Gao X, Shan C, Hu C, Niu Z, Liu Z (2019) An adaptive ensemble machine learning model for
intrusion detection. IEEE Access 7:82512–82521. https://doi.org/10.1109/ACCESS.2019.292
3640
38. Henderson T (2020) Time series analysis for botnet detection. Master's thesis, George Mason University
39. Han Z, Zhao J, Leung H, Ma KF, Wang W (2019) A review of deep learning models for time
series prediction. IEEE Sens J 21(6):7833–7848. https://doi.org/10.1109/JSEN.2019.2923982
40. Zhang J, Pan L, Han Q-L, Chen C, Wen S, Xiang Y (2022) Deep learning based attack detection
for cyber-physical system cybersecurity: a survey. IEEE/CAA J Automatica Sinica 9(3):377–
391. https://doi.org/10.1109/JAS.2021.1004261
41. Wurzinger P, Bilge L, Holz T, Goebel J, Kruegel C, Kirda E (2009) Automatically generating
models for botnet detection. In: Computer security – ESORICS 2009. ESORICS 2009. Lecture
notes in computer science, vol 5789. Springer, Berlin, Heidelberg, pp 232–249. https://doi.org/
10.1007/978-3-642-04444-1_15
42. Viinikka J, Debar H, Mé L, Séguier R (2006) Time series modeling for IDS alert management.
In: ASIACCS ’06: Proceedings of the 2006 ACM symposium on information, computer, and
communications security, pp 102–113. https://doi.org/10.1145/1128817.1128835
43. Fu R, Zhang Z, Li L (2016) Using LSTM and GRU neural network methods for traffic
flow prediction. In: 2016 31st Youth academic annual conference of Chinese association of
automation (YAC), pp 324–328. https://doi.org/10.1109/YAC.2016.7804912
Swarm Intelligence-Based Energy-Efficient Framework in IoT
Abstract The Internet of Things (IoT) has been developed for use in a variety of fields in recent years. An IoT network is embedded with numerous sensors that can sense data directly from the environment. The network's sensing components function as sources, observing environmental occurrences and sending important data to the appropriate data center. When the sensors detect the stated phenomenon, they relay the observed data to a central station. Sensors, however, have limited processing, energy, transmission, and memory capacities, which can have a detrimental influence on the system. Our current research concentrates on lowering sensor energy consumption in IoT networks by choosing the most appropriate potential node in the network to optimize energy usage. Throughout this paper, we suggest a fusion of techniques that combines PSO's exploitation capabilities with GWO's exploration capabilities. The fundamental concept is to combine the strengths of PSO's capability to exploit with the Grey Wolf Optimizer's ability to explore for efficient potential node selection. The proposed method is compared with the traditional PSO, GWO, Hybrid WSO-SA, and HABC-MBOA algorithms on the basis of several performance metrics.
1 Introduction
The Internet of Things (IoT) [1] has grown rapidly in recent years, and there is an abundance of enabling gadgets that use this technology. The terms "Internet" and "Things" refer to a global network that is interconnected and based
Table 1 lists all of the acronyms used in the text. The remainder of the paper is organized as follows: the literature is discussed in Sect. 2; the GWO–PSO hybrid is examined in Sect. 3; the proposed energy-efficient model is described in Sect. 4; and the conclusion is presented in Sect. 5.
The research contributions of this study are as follows:
• To conduct a comparative analysis of energy-efficient PSO-based techniques in IoT.
• To define and design an energy-efficient framework for IoT.
• To hybridize the working of PSO and GWO for efficient potential node selection in IoT.
2 Literature Survey
Alqattan and Abdullah [14] found that PSO is more reliable than ABC. The authors used the ABC technique along with the PSO algorithm to evaluate the protein structure prediction problem. Different metrics were utilized to evaluate the performance of the two algorithms, including colony size S (the total number of employed and onlooker bees), swarm population size N, self-confidence S1, and swarm-confidence S2. Using these parameters, the authors demonstrate that the PSO methodology outperforms the Artificial Bee Colony in terms of time, average number of function evaluations, and accuracy by 70%, 73%, and 3.6%, respectively.
Cluster heads (CHs) in a wireless sensor network (WSN) consume more energy due to the increased overhead of receiving and gathering data, according to Rao et al. [15]. The authors presented a cluster head selection approach using Particle Swarm Optimization for energy efficiency. The CH selection is influenced by a fitness function of the remaining energy, load, temperature, and aliveness of nodes; the cluster head is thus chosen to optimize network speed and network lifetime. The CH is chosen as a high-energy node with low load, latency, range, and heat, and the fitness function should be maximized to increase the network's stability and efficiency. In addition, Iwendi et al. [16] describe a fitness function to determine the cluster head (CH) using α, β, Φ, ω, and θ as weighted parameters; the computational terms used in calculating it are FFenergy (energy computation), FFdistance (distance computation), FFdelay (delay computation), FFtemperature (temperature computation), and FFload (load computation), and the fitness function is the total of these values [8].
Vijayalakshmi and Anandan [17] discussed the Tabu-PSO model, a hybrid PSO and Tabu method, to select the cluster head with the least power utilization rate in the cluster and to increase the flexibility of picking the CH in an IoT network by utilizing a hybrid heuristic approach [5]. By expanding the number of clusters and enhancing the node survival rate, Tabu search was used to increase the population diversity of PSO in order to prevent local-optimum issues. In comparison with the low-energy adaptive clustering hierarchy algorithm and Particle Swarm Optimization, their suggested methodology effectively reduces the overall packet loss rate by 27.32% and the average end-to-end delay by an average of 1.2 s.
Further, in [11], the authors introduce GWO, a novel SI-based algorithm inspired by grey wolves. Twenty-nine test functions were used to evaluate the proposed algorithm's performance in terms of search, attack, avoidance of local optima, and convergence. The authors found that GWO gave highly competitive results compared with well-known heuristics such as PSO [18], GSA [19], DE, EP, and ES.
In 1995, Kennedy and Eberhart introduced the PSO concept [18]. PSO is a population-based stochastic optimization methodology. It is made up of a swarm of particles (fish, birds, etc.) wandering across a search area for probable solutions to complex problems. Each individual has a velocity vector and a position vector that represents a possible solution to the problem. The velocity here can refer to processing time or coverage, and the position to the rank of a test case in a testing process. In addition, each particle has a small memory that remembers its own best position so far as well as a global best position obtained through interaction with its neighbors. Particle Swarm Optimization took what it observed from such swarm behavior and applied it to optimization challenges. In PSO, every solution is a "bird" in the solution area, referred to as a particle or individual. Every individual possesses a fitness value, evaluated by the fitness function to be optimized, along with a velocity that guides its flight. Each individual follows the current optimal individual through the solution area. PSO starts with a set of random solutions and then iterates over generations in search of an optimal solution. Each particle is updated every cycle by comparing two "best" values. The first is the best fitness the particle has achieved so far (this fitness value is also stored); it is called pbest. The other "best" value tracked by the particle swarm optimizer is the best value attained so far by any individual in the swarm; this global best is called gbest. Figure 1 depicts the overall concept of Particle Swarm Optimization.
Due to its advantages over other algorithms like GA [25], PSO places greater emphasis on maximizing WSN lifespan. It has a number of advantages, including ease of use, the ability to avoid local optima, and quicker convergence. Its fitness function takes into account the leftover energy of nodes as well as the distance between them,
giving the WSN an optimum path thanks to PSO's ability to avoid local optima. PSO is also utilized for node localization, CH selection, and cluster creation, among other things. The goal of PSO implementations is to aid energy management by lowering energy costs per process and thereby increasing node lifespan.
Pseudocode of PSO
Step 1: Begin
Step 2: Initialization
  For each particle
    (a) Initialize the particle's position with a uniform distribution
    (b) Initialize the particle's velocity
  End For
Step 3: Do
  For each individual
    Evaluate the fitness function
    If the fitval (fitness value) is superior to the pbest in the history
      Set the current value as the new pbest value
    End If
  End For
  Select the individual with the best fitval (fitness value) among all individuals as gbest
  For each individual
    Update vel_i(t + 1) as determined by Eq. (1)
    Update x_i(t + 1) as determined by Eq. (2)
  End For
Until the stopping criterion is fulfilled
End Begin
After selecting the two best values, the individual or particle adjusts its velocity and position using the velocity update equation (1) and the position update equation (2).
Velocity Update Equation
vel_i(t + 1) = w · vel_i(t) + l1 · r1 · [pbest_i(t) − x_i(t)] + l2 · r2 · [gbest(t) − x_i(t)]    (1)
Position Update Equation
x_i(t + 1) = x_i(t) + vel_i(t + 1)    (2)
where
i           particle index
w           inertia coefficient
l1, l2      learning factors (0 ≤ l1, l2 ≤ 2)
r1, r2      random variables (0 ≤ r1, r2 ≤ 1)
vel_i(t)    velocity of the i-th particle at time t
x_i(t)      current position of the particle at time t
pbest_i(t)  particle's best solution up to time t
gbest(t)    global best solution at time t
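Equations (1) and (2) translate directly into a compact NumPy implementation; the sketch below runs the update loop on a toy sphere objective rather than the paper's IoT fitness function:

```python
# Generic PSO sketch implementing Eqs. (1) and (2) on a toy objective.
import numpy as np

def pso(objective, dim=2, n_particles=30, iters=100, w=0.7, l1=1.5, l2=1.5):
    rng = np.random.default_rng(1)
    x = rng.uniform(-5, 5, (n_particles, dim))      # positions
    vel = np.zeros((n_particles, dim))              # velocities
    pbest, pbest_val = x.copy(), objective(x)
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = w * vel + l1 * r1 * (pbest - x) + l2 * r2 * (gbest - x)  # Eq. (1)
        x = x + vel                                                     # Eq. (2)
        val = objective(x)
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], val[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, pbest_val.min()

sphere = lambda x: (x ** 2).sum(axis=1)   # toy fitness: minimize ||x||^2
print(pso(sphere))
```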
Mirjalili et al. [11] introduced the GWO algorithm, inspired by the social structure and hunting behavior of grey wolves. Testing revealed its capability and strong performance in handling a variety of traditional engineering design challenges, including spring tension, welded beams, and so on. Grey wolf leadership is the source of inspiration for the GWO algorithm. Grey wolves are apex predators. Within the leadership structure there are four different sorts of grey wolves: α, β, δ, and ω.
The optimal answer in the GWO algorithm is represented by the alpha (α) wolf. The beta (β) and delta (δ) wolves are the population's second- and third-best solutions, and the omega (ω) wolves are the remaining candidate solutions. The GWO method assumes that the alpha, beta, and delta wolves hunt, with the omega wolves trailing after them. The three primary aspects of grey wolf hunting are: (1) following the prey, chasing it down, and approaching it; (2) pursuing, encircling, and harassing the prey until it comes to a complete stop; and (3) attacking the prey by surprise.
The encircling and attacking behavior is modeled as
D = |C · Xp(t) − X(t)|,  X(t + 1) = Xp(t) − A · D,  with A = 2a · r1 − a and C = 2 · r2,
where t is the iteration number, Xp is the position of the prey, and X is the position of a grey wolf. A and C are coefficient vectors, a decreases linearly from 2 to 0 over the iterations, and r1 and r2 are random vectors in [0, 1]. Dα, Dβ, and Dδ are the corresponding distances to the alpha, beta, and delta wolves.
Pseudocode of GWO
Step 1: Begin
Step 2: Initialize a, C, and i = 1
Step 3: Calculate each individual's fitness in the population
  (a) Xα = individual with the best fitness value
  (b) Xβ = individual with the second-best fitness value
  (c) Xδ = individual with the third-best fitness value
  While (i < Maximum_itr)
    For each individual
      Update the position of the current individual using
      X(t + 1) = (X1 + X2 + X3)/3, where X1, X2, and X3 are the position vectors toward the α, β, and δ wolves
    End For
    Update a and C
    Calculate the fitness of all individuals
    Update Xα, Xβ, Xδ
    i = i + 1
  End While
  Return Xα
Step 4: Return the best solution
Step 5: End
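The update step in this pseudocode can likewise be written in NumPy; the following generic sketch follows the standard GWO equations of Mirjalili et al. [11] on the same toy objective, not the paper's node-selection fitness:

```python
# Generic GWO sketch following the standard update of Mirjalili et al. [11].
import numpy as np

def gwo(objective, dim=2, n_wolves=30, iters=100):
    rng = np.random.default_rng(2)
    X = rng.uniform(-5, 5, (n_wolves, dim))
    for t in range(iters):
        fit = objective(X)
        alpha, beta, delta = X[np.argsort(fit)[:3]]   # three best wolves
        a = 2 - 2 * t / iters                         # 'a' decreases 2 -> 0
        X_new = np.zeros_like(X)
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random((2, n_wolves, dim))
            A, C = 2 * a * r1 - a, 2 * r2
            D = np.abs(C * leader - X)                # distance to leader
            X_new += leader - A * D                   # X1, X2, X3 terms
        X = X_new / 3                                 # X(t+1) = (X1+X2+X3)/3
    fit = objective(X)
    return X[np.argmin(fit)], fit.min()

sphere = lambda x: (x ** 2).sum(axis=1)
print(gwo(sphere))
```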
4 Proposed Framework
• Physical Layer: Diverse nodes are generally dispersed across the geographical region in the perception layer. The intelligent gadgets that operate on this bottom layer are each given a unique identifier and are characterized by a detector, a processing unit, a transceiver unit, and a legitimate power supply. Manufacturers are creating a variety of smart devices with varying specifications, standards, and technologies.
• Network Layer: The middle layer is the core layer where transmission of data takes place. The data collected by sensor nodes in the perception layer is sent to the network layer for processing. The network layer is broken into three portions:
1. pbest and gbest node selection: pbest and gbest are the particle-best and global-best solutions, respectively. Their selection is done via the PSO approach: each particle adjusts its velocity and position using the velocity update equation (1) and the position update equation (2) after obtaining the two best values.
2. Potential node selection in each region for energy efficiency: This stage entails aggregating sensor nodes and choosing potential nodes (PNs) for all of the clusters in each region. The information acquired from all of the nodes is compiled by the potential node and transferred to the IoT base node. To determine the potential node, the PSGWO methodology is used.
3. Attain optimal solution and performance metrics: To obtain the best solution, the PSO model is used with GWO. By employing the GWO technique's exploration capabilities, the PSO method is prevented from being trapped in local minima, and hence an optimal solution can be attained (a sketch of one possible hybrid velocity update follows this list). The performance metrics utilized to pick the potential node include the amount of load, living nodes, energy, network lifetime, and throughput.
• Application Layer: Mobile consumers, businesses, and huge organizations all benefit from the services provided by the application layer. It is the topmost, user-interactive layer, where the real communication is initiated and reflected. The cost function may be assessed using quality criteria such as latency, node lifespan, and residual energy.
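The paper does not give the hybrid's update rule explicitly; the fragment below sketches one common way such a hybrid can be realized (similar in spirit to the HPSOGWO of Şenel et al. [21]), where the PSO velocity is pulled toward the three GWO leaders rather than a single gbest. The function and its parameters are illustrative assumptions, not the authors' exact update rule:

```python
# Hypothetical PSGWO velocity step: PSO exploitation guided by the three
# GWO leaders (alpha, beta, delta) instead of a single gbest term.
import numpy as np

def hybrid_velocity(vel, x, pbest, alpha, beta, delta, w=0.7, l=1.5, rng=None):
    rng = rng or np.random.default_rng()
    r = rng.random((4,) + x.shape)                       # independent random factors
    leader_pull = ((alpha - x) * r[1] + (beta - x) * r[2]
                   + (delta - x) * r[3]) / 3             # exploration via GWO leaders
    return w * vel + l * r[0] * (pbest - x) + l * leader_pull
```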
5 Conclusion
Despite the fact that IoT has huge potential in a variety of applications in the modern period, there are a number of barriers to overcome. Privacy, energy optimization, networking, hardware configuration concerns, data network congestion, and other issues must be solved to improve the resilience of IoT. We chose to focus on the energy optimization problem in this study. To resolve this issue, a hybrid metaheuristic framework based on PSO–GWO is devised to reduce the sensors' energy usage in IoT networks. The key advantage of PSO is that there are fewer parameters to tweak. PSO reaches the optimal result through particle interaction; however, it converges relatively slowly toward the global optimum in a high-dimensional search area. To overcome this, PSO is hybridized with GWO's exploration capabilities to avoid local-minimum problems. In this study, several performance parameters, including energy consumption, network lifetime, alive nodes, temperature, and throughput, are taken into account to choose the best potential node for the IoT network. Using various simulations, we will evaluate the performance of the proposed algorithm and compare it to other metaheuristic techniques such as the PSO, GWO, Hybrid WSO-SA, and HABC-MBOA algorithms.
References
1. Chopra K, Gupta K, Lambora A (2019) Future internet: the internet of things-a literature review.
In: 2019 International conference on machine learning, big data, cloud and parallel computing
(COMITCon). IEEE, pp 135–139. https://doi.org/10.1109/COMITCon.2019.8862269b
2. Alaa M et al (2017) A review of smart home applications based on internet of things. J Netw
Comput Appl 97:48–65. https://doi.org/10.1016/j.jnca.2017.08.017
3. Rana B, Singh Y (2021) Internet of things and uav: an interoperability perspective. In:
Unmanned aerial vehicles for internet of things (IoT): concepts, techniques, and applications,
pp 105–127. https://doi.org/10.1002/9781119769170.ch6
4. da Costa VCF, Oliveira L, de Souza J (2021) Towards a taxonomy for ranking knowledge in
internet of everything. In: 2021 IEEE 24th International conference on computer supported
cooperative work in design (CSCWD). IEEE, pp 775–780. https://doi.org/10.1109/CSCWD4
9262.2021.9437857
5. Miraz MH, Ali M, Excell PS, Picking R (2015) A review on internet of things (IoT), internet
of everything (IoE) and internet of nano things (IoNT). In: 2015 Internet technologies and
applications (ITA). IEEE, pp 219–224. https://doi.org/10.1109/ITechA.2015.7317398
6. Rachman T (2018) Angew Chem Int Ed 6(11):951–952 (pp 10–27)
7. Rana B, Singh Y, Singh PS (2021) A systematic survey on internet of things : energy efficiency
and interoperability perspective. Trans Emerg Telecommun Technol 32(8):e4166 (pp 1–41).
https://doi.org/10.1002/ett.4166
8. Rana B, Singh Y, Singh H (2021) Metaheuristic routing: a taxonomy and energy-efficient
framework for internet of things. IEEE Access 9:155673–155698. https://doi.org/10.1109/
ACCESS.2021.3128814
9. Kumoye AO, Prasad R, Fonkam M (2020) Swarm intelligence algorithm and its application: a
critical review. In: 2020 International conference in mathematics, computer engineering and
computer science (ICMCECS). IEEE, pp 1–7. https://doi.org/10.1109/ICMCECS47690.2020.
246996
10. Sun W, Tang M, Zhang L, Huo Z, Shu L (2020) A survey of using swarm intelligence algorithms
in IoT. Sensors (Switzerland) 20(5):1420. https://doi.org/10.3390/s20051420
11. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey Wolf Optimizer. Adv Eng Softw 69:46–61.
https://doi.org/10.1016/j.advengsoft.2013.12.007
12. Long W, Xu S (2016) A novel grey wolf optimizer for global optimization problems. In: 2016
IEEE Advanced information management, communicates, electronic and automation control
conference (IMCEC). IEEE, pp 1266–1270. https://doi.org/10.1109/IMCEC.2016.7867415
13. Devika G, Ramesh D, Karegowda AG (2020) Swarm intelligence–based energy-efficient clus-
tering algorithms for WSN: overview of algorithms, analysis, and applications. In: Swarm
intelligence optimization: algorithms and applications, pp 207–261
14. Alqattan ZNM, Abdullah R (2013) A comparison between artificial bee colony and particle
swarm optimization algorithms for protein structure prediction problem. In: Neural information
processing. ICONIP 2013. Lecture notes in computer science, vol 8227, no Part 2. Springer,
Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_42
15. Rao PCS, Jana PK, Banka H (2016) A particle swarm optimization based energy efficient
cluster head selection algorithm for wireless sensor networks. Wireless Netw 23:2005–2020.
https://doi.org/10.1007/s11276-016-1270-7
16. Iwendi C, Maddikunta PKR, Gadekallu TR, Lakshmanna K, Bashir AK, Piran MJ (2021) A
metaheuristic optimization approach for energy efficiency in the IoT networks. Softw Pract
Exp 51(12):2558–2571. https://doi.org/10.1002/spe.2797
17. Vijayalakshmi K, Anandan P (2019) A multi objective tabu particle swarm optimization for
effective cluster head selection in WSN. Cluster Comput 22:12275–12282. https://doi.org/10.
1007/s10586-017-1608-7
18. Okwu MO, Tartibu LK (2021) Particle swarm optimisation. In: Metaheuristic optimization:
nature-inspired algorithms swarm and computational intelligence, theory and applications.
Studies in computational intelligence, vol 927. Springer, Cham, pp 5–13. https://doi.org/10.
1007/978-3-030-61111-8_2
19. Duman S, Güvenç U, Sönmez Y, Yörükeren N (2012) Optimal power flow using gravitational
search algorithm. Energy Convers Manag 59:86–95. https://doi.org/10.1016/j.enconman.2012.
02.024
20. Manshahia MS (2019) Grey wolf algorithm based energy-efficient data transmission in internet
of things. Procedia Comput Sci 160:604–609. https://doi.org/10.1016/j.procs.2019.11.040
21. Şenel FA, Gökçe F, Yüksel AS, Yiğit T (2019) A novel hybrid PSO–GWO algorithm
for optimization problems. Eng Comput 35:1359–1373. https://doi.org/10.1007/s00366-018-
0668-5
22. Kaur S, Mahajan R (2018) Hybrid meta-heuristic optimization based energy efficient protocol
for wireless sensor networks. Egypt Inform J 19(3):145–150. https://doi.org/10.1016/j.eij.2018.
01.002
23. Rastogi R, Srivastava S, Tarun, Manshahia MS, Varsha, Kumar N (2021) A hybrid optimization
approach using PSO and ant colony in wireless sensor network. In: Mater today: proceedings.
https://doi.org/10.1016/j.matpr.2021.01.874
24. Sundaramurthy S, Jayavel P (2020) A hybrid grey wolf optimization and particle swarm
optimization with C4.5 approach for prediction of rheumatoid arthritis. Appl Soft Comput
94:106500. https://doi.org/10.1016/j.asoc.2020.106500
25. Lambora A, Gupta K, Chopra K (2019) Genetic algorithm- a literature review. In: 2019 Interna-
tional conference on machine learning, big data, cloud and parallel computing (COMITCon).
IEEE, pp 380–384. https://doi.org/10.1109/COMITCon.2019.8862255
Performance Analysis of MADM Techniques in Cognitive Radio for Proximity Services
1 Introduction
Spectrum overcrowding has become an old chestnut: the burgeoning of wireless and mobile devices makes spectrum scarce. Each wireless service has its own licensed spectrum. Some wireless services have a wide spectrum with a small number of users, so their spectrum remains underutilized, while others remain over-utilized [1]. To overcome this underutilization, dynamic spectrum access (DSA) comes into play; a scheme that uses DSA is referred to as Cognitive Radio (CR) [2].
CR has the capability to overcome this spectrum underutilization. It permits secondary users to access the licensed spectrum of primary users while evading interference with the PU. The prospect of CR enhancing spectrum efficiency is thus well founded [3–5].
The key features of Cognitive Radio for proximity services are spectrum sensing and the spectrum database. CR devices survey the spectrum bands in their region to discover the authorized users and white spaces. These spectrum holes are identified and separated synchronously and can be utilized without an authorization. Spectrum sensing can be divided into two categories, non-cooperative and cooperative. In the cooperative method, spectrum data is shared among unlicensed user devices, while in the other each unauthorized user senses independently. The Federal Communications Commission (FCC) suggested a spectrum-database idea to get rid of the problems of the spectrum sensing approach and to utilize TV white space. To carry the increasing number of devices that use the radio frequency spectrum, a merged approach is advantageous: it ensures that devices can rapidly detect unexploited spectrum and upgrade quality of service.
When an underutilized frequency band is used by an unlicensed user, data transmission must be maintained on the channel by a spectrum handoff scheme [4]. Spectrum handoff relies on spectrum sensing to observe a suitable optimum channel. Handoff triggering relies on three techniques: proactive handoff, reactive handoff, and hybrid handoff [6]. These schemes allow secondary users to switch to another channel without interrupting their transmission when the PU returns to its licensed band [7, 8].
2 MADM Methodologies
MADM coincides with quad-play services, comprising fixed voice, fixed video, and paid TV services together with mobile voice and data services.
The alternatives are WiMAX, Wi-Fi, Cellular, and Satellite, and the attributes include delay, data rate, packet loss ratio, and price, as shown in Table 1. Some attributes are considered beneficial, such as data rate, while others are detrimental. The decision of MADM depends on the precedence level of the alternatives. MADM methods have strong decision-making capabilities (such as prioritization and selection) over the available alternatives, which is the reason for choosing MADM to select the optimum network during spectrum handoff. A few mechanisms to resolve multiple-attribute decision-making issues are shown in Fig. 1.
GRA is the method of calculating the grey relational degree [9] and utilizes grey system theory. The strength of this procedure is that its calculations and priority decisions are simple, and it decreases error probability. Its weakness is that it handles only uncertainties in the data, such as missing or partial information, or data sets too small for processing. The steps of Grey Relational Analysis are shown in Fig. 2.
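As a generic illustration of these steps (with invented numbers, not the paper's data), the grey relational grade of each candidate network can be computed as follows:

```python
# Generic GRA sketch: normalize a decision matrix, compute grey relational
# coefficients against the ideal sequence, then average into a grade.
import numpy as np

X = np.array([[0.80, 0.82, 0.71],      # rows: networks (alternatives)
              [0.81, 0.62, 0.58],      # cols: attributes (made-up values)
              [0.89, 0.85, 0.68]])
norm = (X - X.min(0)) / (X.max(0) - X.min(0))   # benefit-type normalization
delta = np.abs(norm.max(0) - norm)              # deviation from ideal sequence
rho = 0.5                                       # distinguishing coefficient
coeff = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
grade = coeff.mean(axis=1)                      # grey relational grade per network
print(grade)                                    # higher grade = better candidate
```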
Fig. 4 ROC curve for energy detection with different noise uncertainty factor
In the proposed scheme, we implement the entropy, GRA, cost function, SAW, and TOPSIS methodologies and estimate their performance using MATLAB. Results are acquired for each approach with precedence, and the networks WiMAX, Wi-Fi, Cellular, and Satellite are ranked.
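The entropy step derives attribute weights from the contrast in the decision matrix: attributes whose values vary more across networks receive higher weight. A minimal sketch with made-up values (the paper's actual measurements are not given):

```python
# Entropy-based attribute weighting sketch (generic method; values invented).
import numpy as np

# rows: WiMAX, Wi-Fi, Cellular, Satellite; cols: delay, data rate, PLR, price
X = np.array([[60., 70., 2., 50.],
              [90., 54., 5., 20.],
              [40., 60., 3., 60.],
              [30., 75., 4., 90.]])
P = X / X.sum(axis=0)                               # column-wise proportions
E = -(P * np.log(P)).sum(axis=0) / np.log(len(X))   # entropy per attribute
w = (1 - E) / (1 - E).sum()                         # higher contrast -> higher weight
print(w.round(3))
```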
Table 2 Network selection ranking matrix using alternatives for quad-play services
Alternatives        Voice    Video     Data     Mobile VOIP
WiMAX               0.8055   0.8248    0.7154   0.6100
Wi-Fi               0.8183   0.6196    0.5848   0.5176
Cellular            0.8879   0.85314   0.6804   0.6266
Satellite           0.8285   0.91966   0.7587   0.6558
Ideal for handoff   Cellular WiMAX     Satellite Satellite
RSD (%)             4.34     16.06     10.80    9.9
RSD stands for Relative Standard Deviation and helps evaluate the precision of data: the smaller the RSD, the more precise the data, whereas a large RSD means the results spread further from the mean, degrading the quality of the data. The best network for spectrum handoff is Cellular (0.8879), and the best quad-play service is the video service, as given in Table 2 and represented as a bar graph in Fig. 5.
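For instance, applying RSD = 100 · σ/μ to the GRA voice column of Table 2 approximately reproduces the reported value:

```python
# RSD = 100 * (standard deviation / mean), here for the GRA voice column.
import numpy as np

voice = np.array([0.8055, 0.8183, 0.8879, 0.8285])  # WiMAX, Wi-Fi, Cellular, Satellite
rsd = 100 * voice.std(ddof=1) / voice.mean()
print(round(rsd, 2))   # ~4.37, close to the 4.34% reported in Table 2
```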
Cost Function-Based Method
The best network for spectrum handoff is WiMAX (0.8530), and the best service is the video service, as given in Table 3; the data is shown as bar graphs using MATLAB (Fig. 6).
TOPSIS Method
The optimum network for spectrum handoff is Cellular (0.8261), as given in Table 4; based on this table, the data is represented in Fig. 7.
SAW Method
The ideal network for spectrum handoff is Satellite (0.8707), as given in Table 5; the SAW data is also represented as a bar graph in Fig. 8.
The graph compares the Relative Standard Deviation (RSD) across the MADM algorithms; the lower the RSD, the more precise the data, as shown in Fig. 9. For GRA, the RSD for the voice, video, data, and mobile voice-over-Internet services is 4.34, 16.06, 10.80, and 9.9, respectively. These values are small compared with the other methodologies; therefore, this method is superior for spectrum handoff.
Fig. 5 Representation of the GRA algorithm applied to voice, video, data, and mobile VOIP services
Fig. 6 Representation of the cost function applied to voice, video, data, and mobile VOIP services
Fig. 7 Representation of the TOPSIS algorithm applied to voice, video, data, and mobile VOIP services
Fig. 8 Representation of the SAW algorithm applied to voice, video, data, and mobile VOIP services
Fig. 9 RSD (%) of voice, video, data, and mobile VOIP services under the GRA, cost function, TOPSIS, and SAW methods
6 Conclusion
In this paper, we have proposed a spectrum handoff scheme for selecting the best network according to quad play services. In the CR network, the MADM algorithm plays a pivotal role in combination with the entropy method, which helps to estimate the attribute weights according to the CR preferences. All MADM methods were useful for determining the best network for the spectrum handoff process. The GRA method uses the attribute values directly, which reduces the error probability, and its computation is straightforward. The cost function method also relies on attribute values, and its implementation is uncomplicated. GRA outperforms all the other algorithms because its results are the most precise, as verified by the Relative Standard Deviation: the lower the RSD value, the higher the precision of the data. In future research, these algorithms can be applied to network mobility in CR vehicular networks.
References
1. Yucek T, Arslan H (2009) A survey of spectrum sensing algorithms for cognitive radio
applications. IEEE Commun Surv Tuts 11(1):116–130 (1st Quart.)
2. Song M, Xin C, Zhao Y, Cheng X (2012) Dynamic spectrum access: From cognitive radio to
network radio. IEEE Wireless Commun 19(1):23–29
3. Ghasemi A, Sousa ES (2008) Spectrum sensing in cognitive radio networks: requirements,
challenges and design trade-offs. IEEE Commun Mag 46(4):32–39
4. Mahendru G, Shukla AK, Patnaik LM (2021) An optimal and adaptive double threshold based
approach to minimize error probability for spectrum sensing at low SNR regime
5. Akyildiz IF, Lee W, Vuran MC, Mohanty S (2008) A survey on spectrum management in
cognitive radio networks. IEEE Commun Mag 46(4):40–48
6. Akyildiz IF, Lee WY, Chowdhury KR (2009) Spectrum management in cognitive radio ad hoc
networks. IEEE Netw 23(4):6–12
7. Jaiswal M, Sharma AK, Singh V (2013) A survey on spectrum sensing techniques for cognitive
radio. Proc Conf ACCS 1–14
8. Kumar K, Prakash A, Tripathi R (2016) Spectrum handoff in cognitive radio networks: a
classification and comprehensive survey. J Netw Comput Appl 61:161–188
9. Bicen AO, Pehlivanoglu EB, Games S, Akan OB (2015) Dedicated radio utilization for
spectrum handoff and efficiency in cognitive radio networks. IEEE Trans Wireless Commun
14(9):5251–5259
10. Verma R, Singh NP (2013) GRA based network selection in heterogeneous wireless networks.
Wirel Pers Commun 72:1437–1452
11. Divya A, Nandakumar S (2019) Adaptive threshold based spectrum sensing and spectrum
handoff using MADM methods for voice and video services. IEEE Commun
Image Processing and Computer Vision
Comparison of Debris Removal
in Pap-Smear Images Using Enhanced
Erosion and Dilation
Abstract The pap-smear test is considered one of the most common methods avail-
able for cervical cancer screening. Women above a particular age are supposed to
undergo the cervical screening procedure at least once a year to identify whether
there is the presence of cancerous cells. Since the manual screening of each cell
from a pap-smear slide is tedious and time-consuming, automated pap-smear anal-
ysis is the need of the day. In the slides, during automated analysis, the presence of
unwanted materials such as bacteria and blood particles may produce a false diag-
nosis. The complex cell structure is one of the main challenges faced during the process. Usually, researchers perform debris removal along with segmentation; however, debris removal can also be performed while enhancing the images, which improves the quality of the enhanced images. Therefore, debris removal plays a very important role in automated pap-smear analysis. In this paper, debris removal is performed using enhanced erosion and dilation along with image enhancement, which simplifies the subsequent processes. The debris removal methods are compared using two different binarization techniques. Initially, the images are preprocessed using the diffusion stop function-based CLAHE algorithm, and then the images undergo Kittler binarization. Mathematical morphological operations like erosion and dilation are applied to the image to remove the debris. Next, instead of Kittler binarization, Otsu's method is used. The results are evaluated with the performance measures sensitivity, specificity, and accuracy. From the results obtained, with a sensitivity of 99%, specificity of 99%, and accuracy of 98%, it is clear that the first method, i.e., Kittler binarization with morphological operations, gives better performance.
S. Haridas (B)
Department of Computer Science, Avinashilingam Institute for Home Science and Higher
Education for Women, Coimbatore, India
e-mail: soumya.smya@gmail.com
T. Jayamalar
Department of Information Technology, Avinashilingam Institute for Home Science and Higher
Education for Women, Coimbatore, India
e-mail: jayamalar_it@avinuty.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 391
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_30
1 Introduction
Cervical cancer is one of the most dangerous diseases that can be identified and
treated if diagnosed at an early stage. Different diagnosis methods are available for
cervical cancer detection, out of which pap-smear analysis is the most common
method available. In a pap-smear, the doctor takes cells out of the cervical portion
of the patient, and it is sent to the laboratory for analysis. The laboratory techni-
cians prepare slides and analyze them under the microscope to identify the nature
of the cells. The cytotechnician analyzes the images by considering the signs of
malignancy [1]. These slides prepared may contain artifacts or debris due to blood
particles, air drying, or bacteria. Liquid-based cytology (LBC) slide preparation is
used to avoid this. In LBC, the sample is immersed in a solution, which is subse-
quently processed to normalize it, remove unwanted components (such as red blood
cells), and deposit an adequate mono-layer sample on a glass slide [2]. The cytol-
ogists can find the cancerous and other types of cells, such as precancerous ones,
after analyzing the slides [3]. Besides manual analysis, automated pap-smear anal-
ysis is also performed in which the software checks every slide based on specific
features to classify them as cancerous or non-cancerous. Usually, the automated
analysis consists of preprocessing the images, segmentation, feature extraction and
selection, and finally, classification. The accurate classification of cancerous as well
as non-cancerous cells is possible only if the nucleus and cytoplasm are correctly
identified and segmented [4]. Preprocessing improves the quality of the input images through methods such as noise removal and image enhancement. After preprocessing, the region of interest is identified using segmentation, and the remaining
processes follow this. Many researchers have suggested methods for debris removal
during the segmentation phase. If the debris is removed early, it will help reduce
the time required for the following processing phase. In this paper, we propose a
method in which debris removal is performed after preprocessing. Here, the image is
first preprocessed for contrast enhancement and noise removal. Kittler binarization
is applied to this image to separate the background and foreground to identify the
objects. Morphological erosion and dilation are used in this image to fill the gaps and
remove protrusions. This method helps in removing the debris. Since segmentation
separates the nucleus and cytoplasm from the image, if the debris is removed at an
earlier stage, it will reduce the possibility of false identification of cells. Some debris
resembles the nucleus in shape; if such debris is removed, the possibility of falsely identifying debris as a nucleus can be avoided. The time required for segmentation
can also be reduced using this approach.
2 Literature Survey
Kuko et al. have performed debris removal by thresholding [5]. After creating a
threshold image, contour pixels are found, and bounding rectangles are formed based
on these pixel values. The bounding rectangles below a threshold value of 2% of the
area of the sliding window are rejected and considered debris. The main drawback
of the method is that there is a chance of missed cells due to the intensity change
among some debris particles.
According to William et al., the reason for the failure of many of the existing
automated pap-smear analysis methods is the presence of debris [6], which may
generate some objects that may have similarities with the actual cells. A three-phase
elimination scheme is considered in their paper to classify the cells and debris. The
size, shape, and texture of the objects are analyzed. In size estimation, a threshold
is applied, in shape analysis, P2A descriptor is considered, and in texture analysis,
Zernike moments have been used.
Malm et al. considered a sequential elimination approach for debris removal [7].
During initial segmentation, objects that do not resemble cell nuclei are rejected. This reduces the complexity of the next stages, since fewer objects are left after elimination. The objects are analyzed and classified into cells or
debris in a sequence of steps. First, the area is considered, then the shape (region-
based and contour-based), then elliptical deviation, then texture, and finally, the
average gray value. The proposed method works efficiently by reducing the classifi-
cation dimensionality and saving computational power. The major drawback is that
the training data is to be updated to adapt to new variations.
According to Agarwal et al. [8], adaptive thresholding is applied to the image
after preprocessing, which will convert the image into a binary format. Mathematical
morphological operations such as closing and erosion are applied to the image after
that, and later the final nuclei objects are obtained.
Martin et al. developed an automatic rejection of non-cellular outlines [9]. The
process is carried out based on some quantitative metrics. Four different filters based
on the coefficient of variance of cytokeratin and Hoechst fluorescence (C&HF), the
correlation coefficient of C&HF, standard deviation of C&HF, and circularity value
of C&HF were considered, and the outlines were rejected accordingly. Cutoff values
for the filters were determined manually. Only a small number of samples were studied; the effectiveness of the filters can be analyzed further only with a larger dataset.
Moshavegh et al., in their paper, performed artifact rejection in such a way that
the nucleus is retained, thereby removing other unwanted objects [10]. The features
used here are size, shape, and nuclear granularity. These features are calculated for each segmented nucleus-like object, which is then checked against rules on size, texture, and shape. The objects that do not satisfy the rules on these features were rejected.
Kumar et al., in their paper, performed artifact rejection using SVM after feature
extraction [11]. After segmenting the images using Laplacian of Gaussian, the objects
in the image were classified as cervical cells or artifacts based on a set of ranked
features. The sequential minimal optimization method was used to optimize the
selected features. The selected features were ranked using the maximization func-
tion, and features were selected based on ranking and histogram analysis. They had
achieved an excellent level of proper classification.
Oscanoa et al., in their paper, performed artifact rejection by extracting certain
features that can differentiate cells from debris [12]. Area and perimeter of the objects
are calculated. Using thresholding, objects with values less than a fixed threshold are rejected. They achieved a nuclei detection efficacy of 92.3%. The main drawback is missed nuclei due to low edge contrast and faint staining.
Various methods for debris removal have been discussed in the literature review,
and the need for a method that can remove the debris at an earlier stage of automated
pap-smear analysis is identified. Most of the methods discussed above do the debris
removal either after or during the segmentation process. Since the segmentation stage requires the correct identification of the nucleus and cytoplasm, the presence of debris may cause false results. If a method can remove debris from the
image before segmentation is performed, it will reduce the effort in the segmentation
phase.
The images were taken from the Sipakmed dataset, which contains 4049 isolated cells. It falls into five categories, namely superficial-intermediate cells, parabasal cells, koilocytotic cells, dyskeratotic cells, and metaplastic cells [13].
The images from the dataset are given to the DSF-CLAHE algorithm for enhancement [14]. The DSF-CLAHE algorithm enhances the images so that each object in the image is clearly distinguishable for further processing. There is a chance of debris such as bacteria, air-drying artifacts, dye concentration, and other unwanted material in the image. Debris may hamper or reduce the performance of cervical cancer diagnosis and generate a large number of suspicious objects, degrading the classification accuracy. Therefore, the removal of debris is a necessary process, and carrying it out earlier makes the subsequent processing more effective. In this paper, the debris is removed using the enhanced erosion and dilation operation. Here, two binarization techniques, namely Otsu's method and Kittler binarization, are applied to the images separately. Then, the erosion and dilation operations are applied to the results separately. The Kittler enhanced erosion and dilation (KED) method considerably improves the debris removal performance compared with Otsu enhanced erosion and dilation (OED) (Fig. 1).
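Since DSF-CLAHE is the authors' own enhancement variant [14], a rough stand-in for this preprocessing stage can be sketched with standard CLAHE from scikit-image (the file name is hypothetical):

```python
from skimage import io, exposure

# Hypothetical input file; standard CLAHE is used here only as a stand-in
# for the authors' diffusion stop function-based CLAHE (DSF-CLAHE)
img = io.imread("pap_smear.png", as_gray=True)
enhanced = exposure.equalize_adapthist(img, clip_limit=0.02)  # CLAHE, output in [0, 1]
```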
The steps of KED are given below.
Step 1: Initially, the images are binarized using the Kittler binarization method. To find a threshold value, the Kittler approach utilizes a mixture of Gaussian distribution functions. The image is split into two portions using this threshold value, background and foreground, both of which are modeled by Gaussian distributions. The mathematical formulation of the Kittler method is given by

K_mix(g) = P_BG · K_BG(g) + P_FG · K_FG(g)

where K_BG(g) and K_FG(g) are the Gaussian distributions of the background and foreground regions, P_BG and P_FG are the corresponding mixing proportions, and K_mix(g) is the mixture of these two Gaussian distributions.
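A minimal Python sketch of this binarization-plus-morphology step is given below; the threshold search implements the standard Kittler-Illingworth minimum-error criterion, and the number of erosion/dilation iterations is an illustrative assumption rather than the authors' exact setting:

```python
import numpy as np
from scipy import ndimage

def kittler_threshold(img):
    """Kittler-Illingworth minimum-error threshold for a uint8 grayscale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    g = np.arange(256, dtype=float)
    best_t, best_cost = 128, np.inf
    for t in range(1, 255):
        p1, p2 = p[:t].sum(), p[t:].sum()
        if p1 == 0 or p2 == 0:
            continue
        m1 = (g[:t] * p[:t]).sum() / p1
        m2 = (g[t:] * p[t:]).sum() / p2
        v1 = ((g[:t] - m1) ** 2 * p[:t]).sum() / p1
        v2 = ((g[t:] - m2) ** 2 * p[t:]).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue
        # Minimum-error criterion for a two-Gaussian mixture
        cost = p1 * np.log(np.sqrt(v1) / p1) + p2 * np.log(np.sqrt(v2) / p2)
        if cost < best_cost:
            best_cost, best_t = cost, t
    return best_t

def remove_debris(img, iterations=2):
    binary = img > kittler_threshold(img)
    # Erosion removes small debris and protrusions; dilation restores cell size
    eroded = ndimage.binary_erosion(binary, iterations=iterations)
    return ndimage.binary_dilation(eroded, iterations=iterations)
```

Swapping `kittler_threshold` for Otsu's threshold (available as `skimage.filters.threshold_otsu`) yields the OED variant compared in Table 1.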
Table 1 Comparison of debris removal methods
Method Sensitivity (%) Specificity (%) Accuracy (%)
KED 99 99 98
OED 84 94 90
The images from the database are preprocessed using the diffusion stop function-
based CLAHE algorithm. After preprocessing, the enhanced images are given as
input to the debris removal process. Initially, Kittler binarization is applied to the images to make the objects in the image visible. Morphological operations remove the smaller objects present in the image during the process and help to remove small protrusions and fill the gaps in the images. By this method, the debris in the image is removed to a large extent. In the second method, instead of Kittler binarization, Otsu's method is used along with
morphological erosion and dilation. The results are evaluated using performance
measures like sensitivity, specificity, and accuracy. Table 1 shows the performance
values of the two methods when compared with each other.
KED performs with a sensitivity of 99%, specificity of 99%, and accuracy of 98%
when compared with OED having 84%, 94%, and 90%, respectively, as the perfor-
mance values. The image results are shown in Figs. 2 and 3. Before segmentation,
the debris is removed, which causes less effort and time during the segmentation.
The obtained output shows that the debris removal method using Kittler binarization performs better than Otsu's method.
5 Conclusion
The process of segmenting pap-smear images is a challenging task due to the presence of unwanted particles in the image that resemble the nucleus; therefore, debris removal is one of the primary concerns in automated analysis. In this paper, we have proposed a debris removal method using binarization and morphological operations.
Fig. 2 Images before and after KED. a is the input image, b shows the method applied on the input
image, and c shows the debris removed output image
Fig. 3 Images before and after KED. a Input image, b method applied on the input image, and c
Debris removed output image
Here, two debris removal methods have been compared. By applying mathematical morphological operations, the unwanted objects in the images can be removed to a large extent. By applying our method to the images, objects other than cells in
the image are eliminated. The results are evaluated based on performance measures
like sensitivity, specificity, and accuracy. The KED method performs well with a
sensitivity of 99%, specificity of 99%, and an accuracy of 98%. The image outputs
clearly show that the debris removal method using KED performs well. Each cell
can be seen clearly and is ready for further processing. Different methods are available for debris removal, but the proposed method gives a precise result. In the future, other binarization methods can be used along with morphological operations, and this debris removal method can be applied to more images from different datasets.
References
1. World Health Organization (2006) Comprehensive cervical cancer control: a guide to essential
practice. WHO Press
2. Grohs H, Husain O (eds) (1994) Automated cervical cancer screening. IGAKU-SHOIN Medical Publishers, Inc. https://doi.org/10.1002/dc.2840130221
3. Saslow D, Solomon D, Lawson H, Killackey M, Kulasingam S, Cain J, Garcia F, Moriarty
A, Waxman A, Wilbur D, Wentzensen N, Downs L, Spitzer M, Moscicki A, Franco E, Stoler
M, Schiffman M, Castle P, Myers E (2012) American cancer society, American society for
colposcopy and cervical pathology, and American society for clinical pathology screening
guidelines for the prevention and early detection of cervical cancer. Am J Clin Pathol 137:516–
542. https://doi.org/10.1309/AJCPTGD94EVRSJCG
4. Wasswa W, Obungoloch J, Basaza-Ejiri A, Ware A (2019) Automated segmentation of nucleus,
cytoplasm and background of cervical cells from pap-smear images using a trainable pixel level
classifier. In: 2019 IEEE applied imagery pattern recognition workshop (AIPR). IEEE, pp 1–9.
https://doi.org/10.1109/AIPR47015.2019.9174599
5. Kuko M, Pourhomayoun M (2019) An ensemble machine learning method for single and clus-
tered cervical cell classification. In: 2019 IEEE 20th international conference on information
reuse and integration for data science (IRI), IEEE, pp 216–222. https://doi.org/10.1109/IRI.
2019.00043
6. Wasswa W, Ware A, Bazaza-Ejiri A, Obongoloch J (2019) Cervical cancer classification from
Pap-smears using an enhanced fuzzy C-means algorithm. Inform Med Unlocked 14:23–33
7. Malm P, Balakrishnan B, Sujathan N, Kumar VK, Bengtsson RE (2013) Debris removal in
pap-smear images. Comp Methods Progr Biomed 111(1):128–138
8. Agarwal P, Sao A, Bhavsar A (2015) Mean-shift based segmentation of cell nuclei in cervical
PAP-smear images. In: 2015 fifth national conference on computer vision, pattern recognition,
image processing and graphics (NCVPRIPG), IEEE, pp 1–4. https://doi.org/10.1109/NCV
PRIPG.2015.7490039
9. Martin D, Sandoval TS, Ta CN, Ruidiaz ME, Cortes-Marteos MJ, Messmer D, Kummel AC,
Blair SL, Rodrriguez JW (2011) Quantitative automated image analysis system with automated
debris filtering for the detection of breast carcinoma cells. Acta Cytologica 55:271–280. https://
doi.org/10.1159/000324029
10. Moshavegh R, Bejinordi BE, Menhert A, Sujathan K, Malm P, Bengston E (2012) Automated
segmentation of free-lying cell nuclei in Pap smears for malignancy-associated change analysis.
In: 2012 34th annual international conference of the IEEE EMBS San Diego, IEEE, pp 5372–
5375. https://doi.org/10.1109/EMBC.2012.6347208
400 S. Haridas and T. Jayamalar
11. Kumar RR, Kumar AV, Kumar SPN, Sudhamony S, Ravindrakumar R (2011) Detection and
removal of artifacts in cervical cytology images using support vector machine, In: 2011 IEEE
international symposium on IT in medicine and education, IEEE, pp 717–721. https://doi.org/
10.1109/ITiME.2011.6130760
12. Oscanoa J, Mena M, Kemper G (2015) A detection method of ectocervical cell nuclei for Pap
test images, based on adaptive thresholds and local derivatives. Int J Multimed Ubiquitous Eng
10(2):37–50
13. Plissiti ME, Dimitrakopoulos P, Sfikas G, Nikou C, Krikoni O, Charchanti A (2018)
SIPAKMED: a new dataset for feature and image based classification of normal and patho-
logical cervical cells in Pap smear images, In: 2018 IEEE international conference on image
processing (ICIP) , IEEE, pp 7–10. https://doi.org/10.1109/ICIP.2018.8451588
14. Haridas S, Jayamalar T (2022) Pap smear image enhancement using diffusion stop function
based CLAHE algorithm. In: 2022 8th international conference on advanced computing and
communication systems (ICACCS), IEEE, pp 1048–1054. https://doi.org/10.1109/ICACCS
54159.2022.9785050
Image Enhancement with Deep Learning
and Intensity Transformation
of No-Reference Grayscale Fluorescence
Microscopy Images
Abstract One of the essential preprocessing steps for raw microscopy images is to enhance the contrast between the background and the foreground. In this
work, we implement a deep learning-based image enhancement without using any
reference microscopy images for the task. The deep neural network is combined
with intensity transformation curves and trained with a loss function to obtain the
enhanced results. The proposed framework is shown to have a great potential to
enhance the low contrast grayscale fluorescence microscopy images. We show the
superiority of this method over some traditional image processing techniques using
quantitative metrics. For example, edge-based contrast measure is 13.1528, 84.8378,
and 146.0890 for Fluo-N2DH-GOWT1, Fluo-C2DL-Huh7, and Fluo-C2DL-MSC
datasets, respectively, which is a significant improvement when compared with some
classical methods like contrast limited adaptive histogram equalization. One of the advantages of this method is that it does not need any reference images for training.
Image enhancement is an essential preprocessing step in many image analysis appli-
cations, and this preprocessing technique may be used to improve the contrast, to
better differentiate the background from the cells in microscopy images. This further
helps in the tasks such as cell segmentation, quantification, and tracking, which are
essential for computational biology research.
1 Introduction
image into enhanced image via low contrast/high contrast image pairs. For the input images, multiple exposure images were generated, the highest-quality image was considered as the reference, and a convolutional neural network (CNN) model was then trained end to end in a supervised manner to produce the enhanced output image [10]. A neural network architecture inspired by bilateral grid processing was designed that is capable of image enhancement but needs pairs of input-output images [11]. In general, implementing these enhancement techniques requires raw images and their ideal reference image pairs to train a deep learning network, and obtaining ideal reference images is usually a tough task.
In unpaired image contrast enhancement, a set of images is used as the source domain and a set of improved images as the target domain, without requiring correspondence between the source and target domains. An unpaired learning method for image enhancement was proposed which, starting from a given set of input images with desired characteristics, learns an image enhancer that transforms any input image into an enhanced image with those characteristics. This framework utilizes two-way generative adversarial networks [12]. CURve Layers
(CURL) [13] is a neural block which is augmented to an encoder/decoder backbone
network. CURL uses global image adjustment curves. These curves are estimated
while training the network to enhance image characteristics. These adjustments are
controlled by a well-defined loss function.
Finally, in no-reference image enhancement, there is no need for paired or unpaired images. In one such method, a fully convolutional network is utilized to learn weighted histograms from the original input images. This technique effectively enhances the areas with low contrast and leaves the areas with acceptable contrast unaltered [14]. In another method, low-light images were enhanced without paired or unpaired images using deep intensity transformation curves and well-formulated no-reference loss functions [15].
Inspired by the article on Zero-DCE [15], in this work we design a deep learning-based image enhancement method for grayscale fluorescence microscopy images. One of the attractive features of this method is that it does not require any reference images for training the network. In this work, we train the network three times for three different types of grayscale fluorescence microscopy images. The raw microscopy images with low contrast are given to a deep convolutional neural network. The pixels in the output layer of the network are then mapped to a well-designed second-order quadratic curve. This curve is applied multiple times (iterations) to achieve higher-order curves, which helps in achieving better image enhancement. The network is then trained with well-formulated no-reference loss functions to achieve the desired enhancement results. An example of enhancing a low contrast grayscale fluorescence microscopy image of mouse stem cells belonging to the Fluo-N2DH-GOWT1 dataset [16] is shown in Fig. 1.
The contributions of this work are listed below:
1. We investigate a deep learning-based image enhancement method for no-
reference grayscale fluorescence microscopy images.
2. We adjust the deep learning architecture and total loss function of Zero-DCE to
suit the grayscale image enhancement.
3. We demonstrate the performance of the deep learning framework with no-
reference image quality metrics and compare our results with some of the
state-of-the-art techniques.
The organization of the remaining article is as follows. Section 2 describes the methodology used in this work, Section 3 illustrates the results obtained, and Section 4 presents the conclusion, which is followed by the references.
The dataset for the current study is taken from the ISBI cell tracking challenge [16].
Specifically, we study the enhancement of Fluo-N2DH-GOWT1, Fluo-C2DL-Huh7,
and Fluo-C2DL-MSC 2D fluorescence microscopy image datasets. The Fluo-N2DH-
GOWT1 dataset has two time-lapse image sequences of 92 frames each, thus it
consists of a total of 184 frames of 2D fluorescence microscopy images of mouse
stem cells. The Fluo-C2DL-Huh7 dataset has two time-lapse image sequences of
30 frames each, thus consisting of a total 60 frames of 2D fluorescence microscopy
images. The Fluo-C2DL-MSC dataset has two time-lapse image sequences of 48
frames each, thus consisting of a total of 96 frames of 2D fluorescence microscopy
images.
2.2 Methodology
We adopt the deep learning framework and the intensity curves as mentioned in Zero-DCE [15]. However, we modify and tune the process to suit grayscale images, and we believe that this is the first deep learning-based image enhancement for no-reference grayscale fluorescence microscopy images. The quadratic intensity transformation curve is applied iteratively as

LE_n(X) = LE_{n−1}(X) + A_n(X) · LE_{n−1}(X)(1 − LE_{n−1}(X)), LE_0(X) = I(X), n = 1, …, 4

where I(X) denotes the input image, X are the pixel coordinates, and A_1(X) through A_4(X) are the parameter maps produced as the outputs of channels 1 through 4 of the network. Higher-order iterations increase the order of the intensity transformation curves, thus increasing the dynamic range. Moreover, applying the parameter maps helps the network learn to select the best parameter pixel-wise, producing best-fitting curves according to each pixel value.
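A minimal numpy sketch of this iterative curve application (the function name is illustrative; the four parameter maps are assumed to come from the network's output channels):

```python
import numpy as np

def apply_intensity_curves(img, A_maps):
    """img: grayscale image scaled to [0, 1];
    A_maps: list of per-pixel parameter maps A1..A4 predicted by the network."""
    le = img.astype(np.float64)
    for A in A_maps:
        le = le + A * le * (1.0 - le)  # LE_n = LE_{n-1} + A_n * LE_{n-1} * (1 - LE_{n-1})
    return np.clip(le, 0.0, 1.0)
```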
No-Reference Loss Function. The network is trained with exposure control loss.
The design of this loss function is critical since the ground truth reference images are
not available for the network to learn. The exposure control loss L_exp computes the distance between the target brightness level E and the average intensity Y_K of a local region. The value of E is set to 0.6. This loss helps in moderating the brightness level. The exposure control loss L_exp is represented as

L_exp = (1/M) Σ_{K=1}^{M} |Y_K − E|   (5)

where M is the number of local regions.
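A sketch of this loss in PyTorch, assuming the enhanced output is a (B, 1, H, W) tensor in [0, 1] and non-overlapping 16 × 16 local regions (the region size is our assumption):

```python
import torch
import torch.nn.functional as F

def exposure_control_loss(enhanced, E=0.6, region=16):
    # Y_K: average intensity of each non-overlapping local region (Eq. 5)
    Y = F.avg_pool2d(enhanced, kernel_size=region)
    return torch.mean(torch.abs(Y - E))
```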
3 Results
Image enhancement enables image details to be perceived better than in the original image. Quantifying image enhancement is not an easy task, especially when original/reference image pairs are not available. In this study, we use no-reference image quality metrics, namely contrast enhancement-based contrast-changed image quality measure (CEIQ) [17], screen image quality evaluator (SIQE),
accelerated screen image quality evaluator (ASIQE) [18], histogram spread (HS)
[19], edge-based contrast measure (EBCM) [20], and discrete entropy (DE) [21].
CEIQ is a no-reference image quality assessment to measure the quality of image
without reference image. CEIQ learns a regression module with the features like
histogram equalization, structural-similarity index (SSIM), histogram-based entropy,
and cross entropy to infer the quality score. A high value of CEIQ means better image
quality.
SIQE and ASIQE are no-reference methods to evaluate the perceptual quality of screen content pictures using big data learning. These methods extract certain features from the image, and a regression model is trained on a number of training images labeled with visual quality prediction scores. Higher SIQE and ASIQE values indicate better quality. Humans perceive edges most significantly in an image, and this observation is considered in the design of EBCM. The contrast c(i, j) for a pixel x(i, j) located at (i, j) is defined as

c(i, j) = |x(i, j) − e(i, j)| / |x(i, j) + e(i, j)|   (6)

with the mean edge gray level

e(i, j) = Σ_{(k,l)∈N(i,j)} g(k, l) x(k, l) / Σ_{(k,l)∈N(i,j)} g(k, l)   (7)

where N(i, j) represents all the neighboring pixels of pixel (i, j) and g(k, l) represents the edge value at pixel (k, l). We consider a 3 × 3 neighborhood, and g(k, l) is the magnitude of the image gradient calculated using the Sobel operators. The
EBCM for an image X is calculated as the average value
EBCM(X) = (1/(MN)) Σ_{i=1}^{M} Σ_{j=1}^{N} c(i, j)   (8)
where M and N are the number of rows and columns of the image, respectively. For an enhanced image, the EBCM is higher than for the original image.
Histogram spread is defined as

HS = (Q3 − Q1) / (x_max − x_min)   (9)

where Q3 and Q1 are the third and first quartiles of the image histogram and x_max − x_min is the possible pixel value range. It is observed that low contrast images with narrow histograms have a low value of HS when compared with high contrast images with uniform histograms. The discrete entropy of an image X measures its information content; a higher value indicates richer content. It is given by

DE(X) = −Σ_i P(x_i) log2 P(x_i)   (10)

where P(x_i) is the probability of the pixel intensity x_i, obtained from the normalized histogram.
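A rough numpy/scipy sketch of the EBCM, HS, and DE computations as defined above (the 3 × 3 neighborhood excludes the center pixel here, and an 8-bit intensity range is assumed for HS and DE):

```python
import numpy as np
from scipy import ndimage

def ebcm(img):
    x = img.astype(np.float64)
    g = np.hypot(ndimage.sobel(x, axis=0), ndimage.sobel(x, axis=1))  # edge magnitude
    k = np.ones((3, 3)); k[1, 1] = 0           # 3x3 neighborhood, center excluded
    e = ndimage.convolve(g * x, k) / np.maximum(ndimage.convolve(g, k), 1e-12)
    c = np.abs(x - e) / np.maximum(np.abs(x + e), 1e-12)
    return c.mean()                             # Eq. (8)

def histogram_spread(img):
    q1, q3 = np.percentile(img, [25, 75])
    return (q3 - q1) / 255.0                    # Eq. (9), 8-bit pixel range assumed

def discrete_entropy(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()              # Eq. (10)
```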
The quantitative results of the above metrics for the grayscale fluorescence datasets [16] are shown in Table 1. The quality evaluation metrics used are CEIQ, EBCM, DE,
HS, SIQE, and ASIQE. We have compared our approach to the other two well-known
methods—Autocontrast and CLAHE. It can be observed in Table 1 that the metrics
are significantly superior for the proposed method. Our method has very good quality
evaluation metrics for the datasets Fluo-N2DH-GOWT1 and Fluo-C2DL-Huh7. The
proposed method also did very well for the image Fluo-C2DL-MSC in all the quality
metrics except for DE and SIQE.
Figure 3 displays the raw images and their corresponding output images. It is
apparent that the proposed method is giving better visual output. The improvement
of the results is because we used a combination of deep learning and intensity trans-
formation curves with well-defined no-reference loss function to train the network
for obtaining the desired results. Moreover, this method does not require a reference
image for training the deep learning network.
Table 1 Performance comparison of different image quality metrics. The best result for each image
type is highlighted
Dataset Method Quality metrics
CEIQ EBCM DE HS SIQE ASIQE
Fluo-N2DH-GOWT1 Autocontrast 1.6295 1.0611 2.3487 0.0040 0.6603 0.7132
CLAHE 1.6526 1.0428 2.3800 0.0120 0.6622 0.7210
Proposed 1.9331 13.1528 3.0800 0.0862 0.6747 0.7338
Fluo-C2DL-Huh7 Autocontrast 1.9857 6.4561 5.1894 0.0392 0.6528 0.6988
CLAHE 2.2823 12.6114 5.9950 0.0826 0.6850 0.7078
Proposed 3.1450 84.8378 6.3552 0.3220 0.7392 0.7505
Fluo-C2DL-MSC Autocontrast 2.0078 0.4007 3.9220 0.0274 0.6380 0.7026
CLAHE 2.0230 0.2014 5.3151 0.0681 0.7315 0.7246
Proposed 2.0238 146.0890 3.9029 0.1213 0.7267 0.7618
Fig. 3 Original microscopy images and their enhanced outputs for different image enhancement
methods
4 Conclusion
References
1. Meijering E, Dzyubachyk O, Smal I (2012) Methods for cell and particle tracking. Methods
Enzymol 504(February):183–200
2. Panteli A, Gupta DK, de Bruijn N, Gavves E (2020) Siamese tracking of cell behaviour patterns. Proc Mach Learn Res 121:570–587
3. Al-Kofahi Y, Zaltsman A, Graves R, Marshall W, Rusu M (2018) A deep learning-based
algorithm for 2-D cell segmentation in microscopy images. BMC Bioinformatics 19(1):1–11
4. Das DK, Maiti AK, Chakraborty C (2015) Automated system for characterization and clas-
sification of malaria-infected stages using light microscopic images of thin blood smears. J
Microsc 257(3):238–252
5. Harder N, Mora-Bermúdez F, Godinez WJ, Wünsche A, Eils R, Ellenberg J et al (2009)
Automatic analysis of dividing cells in live cell movies to detect mitotic delays and correlate
phenotypes in time. Genome Res 19(11):2113–2124
6. Luengo-Oroz MA, Pastor-Escuredo D, Castro-Gonzalez C, Faure E, Savy T, Lombardot B
et al (2012) 3D+t morphological processing: Applications to embryogenesis image analysis.
IEEE Trans Image Process 21(8):3518–3530
7. Liu H (2013) Adaptive gradient-based and anisotropic diffusion equation filtering algorithm
for microscopic image preprocessing. J Signal Inf Process. 4(01):82–87
8. Cakir S, Kahraman DC, Cetin-Atalay R, Cetin AE (2018) Contrast enhancement of microscopy
images using image phase information. IEEE Access. 6:3839–3850
9. Al-Ameen Z (2018) Contrast enhancement for color images using an adjustable contrast
stretching technique. Int J Comput 17(2):74–80
10. Cai J, Gu S, Zhang L (2018) Learning a deep single image contrast enhancer from multi-
exposure images. IEEE Trans Image Process 27(4):2049–2062
11. Gharbi M, Chen J, Barron JT, Hasinoff SW, Durand F (2017) Deep bilateral learning for
real-time image enhancement. ACM Trans Graph 36(4)
12. Chen YS, Wang YC, Kao MH, Chuang YY (2018) Deep photo enhancer: unpaired learning
for image enhancement from photographs with GANs. Proc IEEE Comput Soc Conf Comput
Vis Pattern Recognit, 6306–6314
13. Moran S, McDonagh S, Slabaugh G (2020) CuRL: neural curve layers for global image
enhancement. Proc—Int Conf Pattern Recognit, 9796–9803
14. Xiao B, Xu Y, Tang H, Bi X, Li W (2019) Histogram learning in image contrast enhancement.
IEEE Comput Soc Conf Comput Vis Pattern Recognit Work, 1880–1889, June
15. Guo C, Li C, Guo J, Loy CC, Hou J, Kwong S, et al (2020) Zero-reference deep curve estimation
for low-light image enhancement. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit,
1777–1786
16. Cell Tracking Challenge. http://celltrackingchallenge.net/, last accessed 2022/8/17
17. Li L, Yan Y, Lu Z, Wu J, Gu K, Wang S (2017) No-reference quality assessment of deblurred
images based on natural scene statistics. IEEE Access. 5(7):2163–2171
18. Gu K, Zhou J, Qiao JF, Zhai G, Lin W, Bovik AC (2017) No-reference quality assessment of
screen content pictures. IEEE Trans Image Process 26(8):4005–4018
19. Tripathi AK, Mukhopadhyay S, Dhara AK (2011) Performance metrics for image contrast. In:
ICIIP 2011—Proc 2011 Int Conf Image Inf Process (Iciip), 0–3
20. Celik T, Tjahjadi T (2012) Automatic image equalization and contrast enhancement using
Gaussian mixture modeling. IEEE Trans Image Process 21(1):145–156
21. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423
Real-Time Human Action Recognition
with Multimodal Dataset: A Study
Review
1 Introduction
Fig. 1 Actions of a golf swing (a) and a forward kick (b) are examples of depth map sequences [3]
Fig. 2 Decomposition of human activities
2 Background
For identifying video clips of people acting, recorded by conventional space-time-based RGB cameras, techniques such as spatiotemporal volume characteristics as well as trajectories are extensively used. In [4], spatiotemporal interest points and an SVM classifier were combined to recognize human action, with cuboid descriptors used to express the actions. Activities in a series of videos were identified using SIFT-feature trajectories described at three degrees of abstraction. To accomplish action categorization, several local motion characteristics were assembled into a spatiotemporal bag of features (BoF) [3]. Motion energy images (MEIs) and motion-history images (MHIs), motion templates that characterize the spatial and temporal properties of human motions in videos, were introduced in [5]; a hierarchical extension computing dense motion flow from MHIs was later provided with improved accuracy. A significant drawback of adopting hue- or intensity-based approaches is the sensitivity of recognition to variations in illumination, which restricts recognition robustness.
Research on action recognition based on depth data has expanded with the introduction of RGBD sensors. In skeleton-based techniques, skeletal joint locations are retrieved from depth pictures. A customized spherical coordinate system and histograms of 3D joint positions (HOJ3D) were used to create a view-invariant posture representation. Using LDA, the HOJ3D features were reprojected and grouped into K posture visual words, and a continuous hidden Markov model was used to model the sequential evolution of these visual words. Based on EigenJoints (i.e., variations in joint position) integrating data on offset, motion, and still posture, a Naive Bayes Nearest-Neighbor (NBNN) classifier was used to identify human behavior. Due to errors in skeleton estimation, many skeleton-based techniques have limits. Additionally, the skeleton information is not always available to developers.
To discriminate between various actions, several techniques extract spatiotemporal information from the complete collection of points of a depth map sequence [6]. An action graph over a bag of 3D points was used to describe body positions and the dynamics of actions. The 3D point sampling technique, however, produced a lot of data, necessitating a time-consuming training phase. To efficiently describe body shape as well as movement information for distinguishing actions, a histogram of oriented gradients (HOG) computed over depth motion maps has been used. A weighted sampling strategy was used to extract random occupancy pattern (ROP) features from depth pictures; these features were demonstrated to be robust to occlusion by using a sparse coding strategy to encode them effectively during action recognition. In order to preserve spatial and temporal context while managing intra-class variability, 4D spatiotemporal patterns were used as features, and then a straightforward classifier based on cosine distance was applied for action recognition. A hybrid action recognition system incorporating depth and skeleton data was also employed: local occupancy patterns and 3D joint positions were used as features, and an actionlet ensemble model was learned to characterize each action and account for intra-class variance.
Both the 3D structure and shape information can be recorded using a depth map. Alemayoh et al. [7] suggested characterizing the motion of an action by projecting depth pictures onto three orthogonal Cartesian planes. Because it is computationally straightforward, the same strategy is used throughout this work, while the method for obtaining the DMMs is changed. In more detail, each 3D depth frame is used to create three 2D projected maps map_v that represent the front, side, and top perspectives,

where v = {f, s, t}   (1)

For a point (x, y, z) in a depth frame, where z denotes the depth value in the orthogonal coordinate system, the pixel values in the three projected maps are given by z, x, and y, respectively.
The absolute difference between two consecutive projected maps (before thresholding) is used to determine the motion energy for each projected map. The depth motion map DMM_v of a depth video sequence with N frames is created by stacking all the motion energies throughout the full sequence as follows:

DMM_v = Σ_{i=a}^{b} |map_v^i − map_v^{i−1}|   (2)

where a and b denote the starting and ending frames considered.
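A minimal numpy sketch of Eq. (2) for one view, assuming the projected maps have already been computed and stacked as an (N, H, W) array (the function name and default bounds are illustrative):

```python
import numpy as np

def depth_motion_map(maps, a=2, b=None):
    """maps: (N, H, W) stack of projected maps map_v^1..map_v^N for one view v.
    Returns DMM_v = sum_{i=a}^{b} |map_v^i - map_v^{i-1}|  (Eq. 2)."""
    m = maps.astype(np.float64)
    b = m.shape[0] if b is None else b
    diffs = np.abs(np.diff(m[a - 2:b], axis=0))  # |map^i - map^{i-1}| for i = a..b
    return diffs.sum(axis=0)
```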
Over the past two decades, the categorization of human activities has remained a difficult job in computer vision, and earlier studies on describing human behavior show a lot of potential in this field. According to the type of sensor data they use, we first divide human action recognition techniques into two broad categories: (i) unimodal and (ii) multimodal identification approaches. According to how they represent human activities, each of these is further broken into smaller divisions. As a result, we suggest a hierarchical classification of human activity recognition techniques, as shown in Figs. 3 and 4.
Fig. 4 Representative frames of the main human action classes for various datasets [8]
5 Unimodal-Based Methods
6 Multimodal-Based Methods
The dataset exhibits high intra-class variability and high inter-class similarity. The corresponding values are shown in Table 1.
In Tables 1 and 2, we calculated the precision and recall values of the tested data, comparing them with the latest relevant data [10] on precision, recall, and accuracy. We also categorized subjects by age, in ranges from 1–10 up to 40–50, for monitoring human activity. Some results are better for ages 25–40, i.e., middle age. This dataset can be used in further studies considering image [11–14] and pattern [15–19] analysis.
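For reference, the precision and recall values reported in the tables follow the standard definitions, sketched below:

```python
def precision_recall(tp, fp, fn):
    # tp, fp, fn: true positive, false positive, false negative counts per class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

print(precision_recall(tp=90, fp=10, fn=15))  # (0.9, ~0.857)
```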
8 Conclusion
References
1. Chen C, Liu K, Kehtarnavaz N (2016) Real-time human action recognition based on depth
motion maps. J Real-Time Image Proc 12(1):155–163
2. Cheng X et al (2022) Real-time human activity recognition using conditionally parametrized
convolutions on mobile and wearable devices. IEEE Sensors J 22(6):5889–5901
3. Park J, Lim W-S, Kim D-W, Lee J (2022) Multi-temporal sampling module for real-time
human activity recognition. IEEE Access
4. Mazzia V et al (2022) Action transformer: a self-attention model for short-time pose-based
human action recognition. Pattern Recog 124:108487
5. Andrade-Ambriz YA, Yair A et al (2022) Human activity recognition using temporal
convolutional neural network architecture. Expert Syst Appl 191:116287
6. Sun X et al (2022) Capsganet: deep neural network based on capsule and GRU for human
activity recognition. IEEE Systems J
7. Alemayoh TT, Lee JH, Okamoto S (2021) New sensor data structuring for deeper feature
extraction in human activity recognition. Sensors 21(8):2814
8. http://crcv.ucf.edu/data/UCF_Sports_Action.php
9. Vrigkas M, Nikou C, Kakadiaris IA (2015) A review of human activity recognition methods.
Front Robot AI 2:28
10. Kumar M, Gautam P, Semwal VB (2023) Dimensionality reduction-based discriminatory clas-
sification of human activity recognition using machine learning. In: Proceedings of Third
International Conference on Computing, Communications, and Cyber-Security. Springer,
Singapore, pp 581–593
11. Joshi K, Diwakar M, Joshi NK, Lamba S (2021) A concise review on latest methods of image
fusion. Recent Advances in Computer Science and Communications (Formerly: Recent Patents
on Computer Science) 14(7):2046–2056
12. Sharma T, Diwakar M, Singh P, Lamba S, Kumar P, Joshi K (2021) Emotion analysis for
predicting the emotion labels using Machine Learning approaches. In: 2021 IEEE 8th Uttar
Pradesh Section International Conference on Electrical, Electronics and Computer Engineering
(UPCON), pp. 1–6, November. IEEE
13. Diwakar M, Sharma K, Dhaundiyal R, Bawane S, Joshi K, Singh P (2021) A review
on autonomous remote security and mobile surveillance using internet of things. J Phys:
Conference Series 1854(1):012034, April. IOP Publishing
14. Tripathi A, Sharma R, Memoria M, Joshi K, Diwakar M, Singh P (2021) A review analysis on
face recognition system with user interface system. J Phys: Conference Series 1854(1):012024.
IOP Publishing
15. Wang Y et al (2021) m-activity: Accurate and real-time human activity recognition via
millimeter wave radar. In: ICASSP 2021–2021 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). IEEE
16. Sun B, Wang S, Kong D, Wang L, Yin B (2021) Real-time human action recognition using
locally aggregated kinematic-guided skeletonlet and supervised hashing-by-analysis model.
IEEE Trans Cybernetics
17. Varshney N et al (2021) Rule-based multi-view human activity recognition system in real time
using skeleton data from RGB-D sensor. Soft Comp, 1–17
18. Hossain T, Ahad M, Rahman A, Inoue S (2020) A method for sensor-based activity recognition
in missing data scenario. Sensors 20(14):3811
19. AlShorman O, Alshorman B, Masadeh MS (2020) A review of physical human activity
recognition chain using sensors. Indonesian J Elect Eng Inform (IJEEI) 8(3):560–573
KnowStress: A Mobile Application Prototype for Detection of Stress and Stress-Related Disorders
Abstract In today's growing world, we often overlook the need to pay heed to one of the most important constituents of our overall health, viz., mental health. This might be due to working hours that do not allow a person to pay the attention this faculty deserves; this in turn may leave disorders such as bipolar disorder, ADHD, PTSD and anxiety disorder un-diagnosed, which may cause further health complications. Hence, it is the need of the hour to make available a solution that enables people to get themselves diagnosed within a short span of time, at their own convenience and from the comfort of their homes. KnowStress has
been developed with the aim of improving access to mental health resources for those
who lack access to conventional support and to help people explore how technology
can be used to improve general wellbeing. KnowStress provides diagnostic tests for
screening stress and stress-related disorders by using well known tests and tools
that are used by medical professionals. Our proposed model utilises the K-means
clustering algorithm to categorise an individual’s responses into the groups of “low
stress”, “bipolar disorder”, “PTSD” and “ADHD / GAD” with a validation accuracy
of 88.86%.
1 Introduction
Everyone is affected by stress; there is no real way to totally escape it. Furthermore,
depending on their personal, psychosocial, professional and biological backgrounds,
some people are more significantly affected by its effects. Stress is essentially a
human defence mechanism, but it is crucial to avoid letting it rule your life.
Stress can have many different types of origins, including physical, psycholog-
ical, emotional and social ones. This application can be used independently, as a
bridge to in-person therapy, or as an addition to existing therapy. There are many
wonderful advantages to adopting mental health apps, including their affordability
and portability. One does not have to worry about scheduling appointments, waiting
lists or insurance when using mental health applications.
Additionally, this app allows for privacy and confidentiality and can serve as a
safe haven for people who may be reluctant to disclose that they have mental health
problems in person or who fear being disparaged or shunned by others. The current
scope of this app extends to stress and stress-related disorders as follows:
• ADHD [1]: A long-term disorder characterised by impulsivity, hyperactivity and
trouble paying attention.
• GAD [2]: Severe, ongoing anxiety that interferes with daily activities.
• Bipolar Disorder [3]: A condition characterised by cycles of mood swings that
range from manic highs to depressive lows.
• PTSD [4]: A condition marked by an inability to recover after being exposed to
or witnessing a horrific incident.
All the state-of-the-art stress-related applications focus on only one particular
stress-related disorder. We first try to get an estimate of the overall stress of an indi-
vidual and then redirect the user to the most probable disorder. Most of the available
platforms were calibrated keeping in mind the western standards of living. This led
us to design an app calibrated according to Indian standards of living. The majority of the available platforms made no provision for spreading awareness about the diagnosed disorder; propagating awareness would, in turn, expand the user base of our app. Most of the available platforms reviewed on the Internet did not seem to have incorporated personalisation. Our application deals with this issue by maintaining records of past tests and recommending blogs and tests accordingly.
2 Related Work
Fliege et al. [5] proposed the PSQ scale, an indicator for diagnosing stress which
may tend to trigger or exacerbate other stress-related disorders. It is explicitly recom-
mended for clinical facilities and is often prescribed by the doctor to have a cursory
estimate of the severity of the stress. PSQ index is calculated based on the responses
to the PSQ, where the respondent marks the options ranging from 1 (“Never”) to 4
(“Always”) depending upon the severity of stress associated feelings. Higher scores
indicate greater levels of stress. However, the questions in the questionnaire are either too blunt or too vague, which makes them difficult for the user to understand and could lead to a misdiagnosis. The paper also does not explain the meaning of each question, which could lead to misinterpretation and thus, again, a misdiagnosis. The PSQ is a valid way of checking stress but almost always requires a doctor's guidance.
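As a rough sketch of how such a score can be normalised (assuming an n-item questionnaire scored 1–4, so the raw score ranges from n to 4n; the exact PSQ index formula should be taken from [5]):

```python
def psq_index(responses):
    # responses: list of item scores, each from 1 ("Never") to 4 ("Always")
    n = len(responses)
    raw = sum(responses)
    # Normalise the raw score from its [n, 4n] range to a [0, 1] index
    return (raw - n) / (3 * n)

print(psq_index([2, 3, 1, 4, 2]))  # e.g. ~0.467
```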
Li and Liu [6] introduced a technique that makes use of two deep neural networks
that were created for the processing of physiological information captured by sensors
in order to detect stress and emotions. To strengthen the robustness of the networks,
the neural networks must be trained and tested on much larger datasets containing a
variety of human populations. The justification for subjecting the neural networks to
a dataset that is typical of the entire human population is because each individual’s
sensitivity to stress conditions (i.e. the conditions under which they experience stress)
differs. Also, the method requires high quality sensors which is not economically
feasible for everyone. Another disadvantage to this method is that it is invasive
and can harm the human body instead of helping with the diagnosis. The neural networks operate on unknown data and are thus unsupervised, but this may lead to wrong diagnoses.
Albertetti et al. [7] have proposed a novel approach for the detection and binary
classification of stressful events by taking into account both the features extracted
from physiological signals and the information provided by participants via ques-
tionnaires. The signals mainly focused on blood volume pulse (BVP) and electro-
dermal activity (EDA). For the purpose of categorising stressful events, the research
study considers three machine learning models which are gradient boosting decision
tree, recurrent neural network (RNN) and convolutional recurrent neural network
(CRNN). Out of these three models it was observed that the RNN performed the
best with a macro F1-score of 71%. Even though the results obtained show that the
novel approach proposed in this research was successful, the main drawback of this
method is that it depends on receiving a constant influx of physiological signals from
wearable devices such as the Empatica E4 bracelet which is not always accessible for
all intended users. This drawback along with the fact that the study was conducted
on a small scale imbalanced dataset with metrics such as accuracy being ignored,
causes the research and its subsequent results to fall short in certain critical aspects.
Kumar et al. [8] have conducted an empirical study on eight existing machine
learning algorithms for the main objective of predicting the occurrence of psycho-
logical disorders like anxiety, stress and depression. The research also involves the
classification of these aforementioned disorders into classes such as “extremely
severe”, “severe”, “moderate”, “mild” and “normal” on the basis of their severity.
The research also includes the proposal of a hybrid classification algorithm that
will detect and classify the psychological disorder. Datasets created by using the
DASS42 and DASS21 tools were used to train and test the models with a train-test
split of 75:25. On the DASS42 data, it was observed that the radial basis function
network (RBFN) performed the best amongst the eight existing machine learning
models with a maximum accuracy of 97.48% and while the hybrid model gave a
comparable performance, it fell short in the aspect that it is very time consuming.
On the DASS21 data gathered by the authors, it was observed that the random forest
model gave the best performance with an accuracy of 100% and a score of 1 across
metrics such as precision and recall. Such atypical results indicate that the model
has possibly overfitted the available data and this suspicion is later confirmed by the
authors themselves as they report on the imbalanced nature of the small size data
obtained from the DASS21 questionnaire. Inconsistent results, poor performance
of the proposed model and the lack of a better quality, more representative dataset
results in the research not coming up to scratch.
Baheti et al. [9] proposed a method to detect the day-to-day stress-related expres-
sions and emotions of the user by using their daily conversations on social media
and other text-based applications based on sentiment analysis. The proposed model
automatically extracted the sentiment-related keywords from the text and fed them to the TensiStrength framework [10] for identifying the strength of the sentiment of the words used on the social networking sites. TensiStrength returned a score ranging
from −5 to +5 for each word, and the final classification was performed by SVM.
The proposed model provided an NLP-based sentiment analysis, but its scope was restricted to social media conversations, as the model was trained on a Twitter sentiment dataset. A model trained on a more realistic, broader-scoped dataset focusing on daily activities is required to gauge an individual's stress.
Despite the rapid surge in the number of smartphone-based mental health apps, the usability of these applications is still restricted. These applications were marketed as potential solutions to the corresponding stress disorders but lacked severely in their effectiveness to manage, moderate and treat them. Williams and Pykett
[11] conducted a study where they analysed 39 different mental health applications
focusing on the social, environmental and mental angles of life, the overall user
experience and whether the primary purpose was fulfilled or not. The paper clearly
highlighted that the current monitoring mental health apps especially for treating
anxiety and depression are still lacking in terms of efficacy. Most of the applications focused more on user engagement than on improving testing efficacy.
3 Proposed Work
The frontend is built with React.js, and the backend runs the Django REST framework (DRF). For our application, we have chosen the SQLite database. The ML models reside on the
Django server. The machine learning models are trained and ready to predict once
the server has been booted successfully. Figure 1 depicts the proposed ML model
which predicts the stress disorder based on the responses from the GST. The music
and the blogs model illustrated in the architecture will recommend music and blogs
based on the performance of the user in the GST. An enhanced version of these
models, one that also incorporates intelligence, will be completed as part of the future
works of the project.
3.2 Questionnaire
Some of the questions included in the GST (Questions 2 and 5) are positive in nature; hence, their influence on the stress calculation should be adjusted accordingly. Along with determining whether the user has stress or not, we
have also embedded certain deterministic specialised questions in the GST. These questions help the unsupervised ML model estimate the probabilities of various stress-related disorders. Based on these results, the user may decide which specialised
disorder test he/she needs to attempt so as to get a conclusive insight into whether
the user has the disorder or not.
Generalised Stress Test:
1. People tend to feel irritable or grouchy at times. In the last month, have you felt
this way?
2. Individuals tend to feel lonely or isolated. Have you felt this way in the last
month?
3. Have you found yourself feeling more tired than usual in the last month?
4. People often fear that they may not be able to attain their goals. In the last
month, have you felt this way?
5. In the last month, have you felt more calm than usual?
6. People often experience elevated levels of frustration. In the last month, have
you felt this way?
7. In the last month, have you felt more tense than usual?
8. In the last month, have you felt as if you are always in a hurry?
9. Have you found yourself feeling more worried than usual in the last month?
10. In the last month, have you been able to enjoy and unwind?
11. Do you often feel you are doing things because you have to, and not because
you want to?
12. Mental exhaustion is very common these days. In the last month, did you find
yourself feeling more mentally exhausted than usual?
13. Insomnia and the inability to relax have become common in today's world. Did you have trouble relaxing in the last month?
14. In the last month, have you often felt overly active and compelled to do things?
15. Did you face difficulties in concentrating or paying attention in the last month?
16. Do you experience frequent mood swings?
17. Do you experience repeated, distressing memories or dreams about traumatic
events from the past?
Minimum Score = 0.
Maximum Score = 60.
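The scoring is straightforward to implement. Below is a minimal sketch, assuming each answer is recorded as an integer on a 0..k scale (the exact per-question scale is not stated here), with the positively phrased Questions 2 and 5 reverse-scored as described above:

def gst_score(responses, k):
    # responses: answers to Questions 1..17 in order; k: top of the answer scale
    total = 0
    for q, r in enumerate(responses, start=1):
        # reverse-score the positively phrased questions (2 and 5)
        total += (k - r) if q in (2, 5) else r
    return total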
questions to be marked as “Yes”. The accuracy of RMS has been compared against
other available screening tools and validated in the study conducted by Sayyah et al.
[14].
4 Dataset
None of the datasets currently available on the Internet match the required specifications: they are either not aimed at the chosen age bracket or not geared towards the standard of living in India. A field survey based on the GST questionnaire was therefore conducted with the objective of developing a reliable dataset targeted at the population aged 15–25 residing in India. The target audience for this survey was mainly students, who often suffer from various stress-related disorders such as ADHD, bipolar disorder, GAD and PTSD. The survey yielded a pertinent dataset of 976 records, of which 85% (train split) were used to train our machine learning models and the remaining 15% (test split) were used to test them. The data collected via the survey is independent of time-related factors and is not affected by the elevated stress levels that may be observed during specific periods such as the exam season.
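For illustration, the 85:15 partition can be produced as below; `records` is a placeholder for the 976 survey records, and the random seed is an assumption:

from sklearn.model_selection import train_test_split

# 85% of the 976 records for training, 15% held out for testing
train_records, test_records = train_test_split(records, test_size=0.15,
                                               random_state=42)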
5 Implementation
5.2 Models
The data samples in the aforementioned dataset are clustered on the basis of their simi-
larities by utilising clustering algorithms such as (1) density-based spatial clustering
of applications with noise (DBSCAN) and (2) K-means.
DBSCAN: It clusters the samples based on their density distribution, expanding clusters from dense regions. It is well suited to data containing clusters of comparable density.
The DBSCAN algorithm is built on this intuitive notion of "clusters" and "noise". The core idea is that for every point of a cluster, the neighbourhood of a given radius must contain at least a minimum number of points; the algorithm has a memory complexity of O(n). The following parameters were set for the DBSCAN model (a code sketch follows the list):
• algorithm: auto
• eps: 0.1
• leaf_size: 30
• metric: euclidean
• min_samples: 50
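A minimal sketch of fitting DBSCAN with these parameters in scikit-learn, assuming X holds the numeric GST response vectors:

from sklearn.cluster import DBSCAN

# X: numeric GST response vectors (assumed)
model = DBSCAN(algorithm="auto", eps=0.1, leaf_size=30,
               metric="euclidean", min_samples=50)
labels = model.fit_predict(X)  # a label of -1 marks points treated as noise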
The screenshots of the KnowStress mobile prototype can be seen in Figs. 8, 9 and
10.
Both K-means and DBSCAN models were trained on the aforementioned dataset and
subsequently analysed by using evaluation metrics such as the silhouette coefficient
and confusion matrix.
Silhouette Coefficient:
The silhouette coefficient helps to determine the quality of the clusters: higher values typically indicate more cohesive clusters. The coefficient ranges from −1 to +1; silhouette coefficients close to +1 suggest that the sample is far from the neighbouring clusters, while a value of 0 means it lies close to or on the decision boundary. For a sample, with a the mean distance to the other points in its own cluster and b the mean distance to the points in the nearest neighbouring cluster, it is computed as

s = (b − a) / max(a, b)

Fig. 9 GST

Table 1 Silhouette coefficient
Model     Silhouette coefficient    Validation accuracy (%)
K-means   0.8249                    88.86
DBSCAN    0.8130                    50.98
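A minimal sketch of computing these scores with scikit-learn, assuming X holds the survey feature vectors; the choice of four clusters (one per targeted disorder) is an assumption:

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# X: survey feature vectors (assumed)
km_labels = KMeans(n_clusters=4, random_state=0).fit_predict(X)
print("K-means silhouette:", silhouette_score(X, km_labels))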
6.2 Analysis
For the purpose of clustering data samples into distinct clusters, K-means and
DBSCAN were the two clustering algorithms that were considered. On comparing
the performance of these two models, it was observed that K-means performed better
with a silhouette coefficient of 0.8249, compared to DBSCAN's 0.8130. The comparison between DBSCAN and K-means shown in Table 1
depicts the silhouette scores and the validation accuracy of both the models.
The validation dataset consisting of 50 records was collected with the help of
a reputed medical institution. This dataset was used to validate the K-means and
DBSCAN models. It was observed that K-means gave a superior validation accuracy
of 88.86% as visible in Table 1. DBSCAN, however, had an inferior performance
on the validation dataset with just 50.98% accuracy. Moreover, another issue with
DBSCAN was to decide how to handle a response which was categorised as noise.
Since we focus on four disorders currently, a GST response labelled as noise can be
from an extreme end of any disorder or even from a low stress score. This uncertainty
on how to handle label “noise” is also a major criteria for choosing K-means over
DBSCAN. The confusion matrix for the selected K-means models is displayed in
Fig. 11. The vertical axis depicts the actual labels, and the horizontal axis the expected
labels as predicted by the model.
7 Conclusion
In the current work, we have reinvented the way to classify people as stressed or
not stressed by leveraging modern-day machine learning techniques. The proposed application, KnowStress, would help doctors as well as individuals gauge stress levels in a comfortable manner, allowing tests to be prescribed to determine general stress levels and the likelihood of specific stress-related disorders such as ADHD, bipolar disorder, GAD and PTSD. Our proposed GST
model utilises the K-means clustering algorithm which has a validation accuracy
of 88.86%. The KnowStress app also includes blogs that are aimed at spreading
awareness about stress-related disorders and music that would help the users meditate
and relax.
8 Future Work
The current model was trained on a small dataset of only 976 records; increasing the size of the dataset would improve the model's performance and generalisation. Furthermore, the scope of the research can be extended
by incorporating additional stress-related disorders and experimenting with newer
state-of-the-art models.
References
1. Kessler RC et al (2004) The World Health Organization adult ADHD self-report scale (ASRS):
a short screening scale for use in the general population
2. Spitzer RL, Kroenke K, Williams JBW et al (2006) A brief measure for assessing generalized anxiety disorder: the GAD-7
3. McIntyre RS, Patel MD, Masand PS, Harrington A, Gillard P, McElroy SL, Sullivan K et al
(2021) The Rapid Mood Screener (RMS): a novel and pragmatic screener for bipolar I disorder.
Current Med Res Opinion 37(1)
4. Prins A et al (2016) The primary care PTSD screen for DSM-5 (PC-PTSD-5): development
and evaluation within a veteran primary care sample. J General Internal Med 31(10)
5. Fliege H, Rose M, Arck P, Walter O, Kocalevent R-D, Weber C, Klapp B (2005) The Perceived
Stress Questionnaire (PSQ) reconsidered: validation and reference values from different clinical
and healthy adult samples
6. Li R, Liu Z (2020) Stress detection using deep neural networks. BMC Med Inform Decis Mak
20:285
7. Albertetti F, Simalastar A, Rizzotti-Kaddouri A (2020) Stress detection with deep learning
approaches using physiological signals. In: International Conference on IoT Technologies for
HealthCare, pp 95–111. Springer, Cham
8. Kumar P, Garg S, Garg A (2020) Assessment of anxiety, depression and stress using machine
learning models. Procedia Comp Sci 171
9. Baheti RR, Kinariwala SA (2019) Detection and analysis of stress using machine learning
techniques. Int J Eng Adv Tech (IJEAT). 9(1). ISSN: 2249-8958
10. Baheti RR, Kinariwala SA (2019) Survey: sentiment stress identification using tensi/strength
framework. Int J Sci Res Eng Dev 2(3)
11. Williams JE, Pykett J (2022) Mental health monitoring apps for depression and anxiety in
children and young people: a scoping review and critical ecological analysis. Soc Sci Med
297:114802
12. Brevik EJ, Lundervold AJ, Haavik J, Posserud M-B (2020) Validity and accuracy of the ADHD
Self-Report Scale (ASRS)
13. Pranckeviciene A, Saudargiene A, Gecaite-Stonciene J, Liaugaudaite V, Griskova-Bulanova
I, Simkute D, Naginiene R, Dainauskas LL, Ceidaite G, Burkauskas J (2022) Validation of
the patient health questionnaire-9 and the generalized anxiety disorder-7 in Lithuanian student
sample. PLoS One
14. Sayyah M, Delirrooyfard A, Rahim F (2022) Assessment of the diagnostic performance of two
new tools versus routine screening instruments for bipolar disorder: a meta-analysis. Brazilian
J Psychiatry 44
15. Williamson MLC et al (2022) Diagnostic accuracy of the primary care PTSD screen for DSM-5
(PC-PTSD-5) within a civilian primary care sample. J Clin Psych. https://doi.org/10.1002/jclp.
23405
Handwritten Character Evaluation
and Recommendation System
1 Introduction
The velocity at which handwritten data is created in the real world necessitates manual processing, since computers still struggle to decipher and extract information from handwritten text.
2 Literature Review
This section provides a brief overview of some works done in this field by various authors.
In this study, Mahdianpari et al. [5] looked at the potential of deep convolutional neural networks (CNNs) for classifying multispectral remote sensing images. For wetland mapping in Canada, they examined seven well-known deep convolutional nets: DenseNet121, InceptionV3, VGG16, VGG19, Xception, ResNet50, and InceptionResNetV2 (see Table 1).
In this study [10], Inunganbi described a novel decision forest-based strategy for transfer learning, which is utilised to recognise characters. In order to move knowledge from source tasks to a specific target task, they presented two enhancements to the decision forest architecture. Experiments on the MNIST dataset show that it outperforms typical decision trees. The method also performs well when compared to other state-
of-the-art classifiers. Thus, they present a novel approach for transferring knowledge
from several source tasks to a single target task in this study. As a result, a classifier
that may use knowledge from related tasks to improve predicted performance on the
target task has been created.
According to Khan and Nazir [11], cursive text recognition is considered the most difficult task in the field of machine learning and pattern recognition, but due to minor differences in character shape and the large number of characters in its character database, the Pashto language presents even more challenges to the research community. Using an
MLSTM-based deep learning approach, the suggested research study demonstrates
the construction of an optimal OCR system for the recognition of isolated handwritten
Pashto letters. The suggested model’s applicability is demonstrated utilising a deci-
sion trees classification tool based on zonal feature extraction and invariant moments-
based techniques. The MLSTM-based OCR system has an overall accuracy rate of
89.03%, whereas DT-based identification rates of 72.9% are reached employing zonal feature vectors and 74.56% for the invariant-moments-based feature map.
The recognition of Chinese characters with a vast character set has always been a
difficult challenge that has needed solving quickly. This study by He and Zhang [12] focuses on the enhancement of the CRNN method and provides
a character recognition algorithm based on feature fusion to address the problem
of character recognition in artificial intelligence machine learning. According to their experiments, fine-tuning achieves higher accuracy than plain transfer learning, as it can better extract the features of the current dataset. A deep convolutional neural network is constructed based on MNIST, which is better suited for Chinese character recognition data. Finally, it achieved an accuracy of 0.99 on both the training and test data, indicating that the neural network model can fully fit the Chinese character recognition training set.
Syed Yasser Arafat and Muhammad Javed Iqbal address the issue that the non-Latin and cursive script of the Urdu language presents in their article [13]. They suggested utilising FasterRCNN with CNNs and regression residual neural networks (RRNNs) to recognise Urdu writing in real situations. The authors made use of a collection of five unique datasets with embedded pictures in Urdu text. Using a two-stream deep neural network (TSDNN), this technique can recognise partial portions of synthetic pictures with an accuracy of 95.20% and real-world photos with an accuracy of 76.6%.
Chernyshova introduced an extremely lightweight framework that can run on an embedded or mobile device in this research [14]. The author suggested a compact ANN that operates quickly and accurately on photos of poor quality. Tesseract 4.0 OCR and ABBYY FineReader 15 are both outperformed by this effort in terms of speed and accuracy. The model was trained using MNIST and MIDV-500, and when evaluated with data from the 1961 Census Sample, the model's accuracy was 96.69% (Table 1).
3 Dataset
The EMNIST dataset [15] was used to train the model as it contains 814,255 images
of handwritten characters (both capital and small mixed) and digits (zero to nine).
The input is an image of size 28 × 28 which is flattened into a single dimensional
array, thus the input provided to the model is a 1D array of size 1 × 784 and the final
output layer gives 62 outputs as there are 62 different labels (26 for capital alphabets,
26 for small alphabets, and 10 for digits). A total of 697,932 images were used to train this model, and 116,323 were used to test it (Fig. 1).
For the evaluation part of the programme, ‘Times New Roman’ font glyphs were
used that were downloaded from a website https://graphemica.com/. This website
contains glyphs in various font styles, each of size 500 × 500 × 3. A total of 62 images were downloaded from this website, as there are 62 output classes.
4 Proposed Methodology
The method proposed consists of six major steps that can be categorised in three
parts (Fig. 2).
4.1 Pre-Processing
Pre-processing is a technique for preparing data for further processing steps and converting raw data into a usable format. Preparing the EMNIST data for the deep learning neural networks involved the following steps.
4.1.1 Reshaping
As the shape of the imported EMNIST data is in the form of a single dimensional
array, it has to be converted into a two-dimensional single channel image as shown
in Fig. 3.
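A one-line sketch of this step, assuming `flat_rows` is an (n, 784) NumPy array of EMNIST samples:

import numpy as np

flat_rows = np.load("emnist_rows.npy")      # hypothetical (n, 784) array
images = flat_rows.reshape(-1, 28, 28, 1)   # 28 x 28 single-channel images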
Table 1 Literature review summary

S. no | Author | Dataset used | Technique used | Metrics | Performance
1 | Masoud Mahdianpari, Bahram Salehi, Fariba Mohammadimanesh, and Yun Zhang | Multispectral remote sensing images | SVM, RF, DenseNet121, InceptionV3, VGG16, VGG19, Xception, ResNet50, InceptionResNetV2 | Accuracy, Kappa coefficient, and F1-score | 96.17% accurate
2 | Sri Manchala, Jayaram Kinthali, Kowshik Kotha, and Jagilinki Kumar | IAM dataset | CNN, RNN, CTC | Accuracy rate | 90.30% accurate
3 | Ritik Dixit, Rishika Kushwah, and Samay Pashine | MNIST dataset | MLP, SVM, CNN | Accuracy rate | 99.31% accurate
4 | S. M. Shamim, Md Badrul Miah, Angona Sarker, Masud Rana, and Abdullah Jobair | Digit dataset | MLP, SVM, Naïve Bayes, Bayes Net, random forest, J48, random tree | Accuracy, time consumption, Kappa coefficient, RMS, mean absolute error | 90.37% accurate
5 | Md Hossain, Zainul Hossain, Manik Abadin, and Md Ahmed | BanglaLekha-Isolated multipurpose comprehensive handwritten isolated character samples dataset | FRDNN, TDNN, CNN | Accuracy rate | 96.99% accurate
6 | Sanasam Inunganbi and Robin Katariya | Meitei Mayek dataset | AlexNet, GoogLeNet, ResNet-18, ResNet50, VGG16 | Accuracy rate | 98.47% accurate
7 | Sulaiman Khan and Shah Nazir | Handwritten Pashto characters database | MLSTM, decision trees | Accuracy, time consumption, recall, F-score, precision, specificity | 89.03% accurate
8 | Dongdong He and Yaping Zhang | MNIST | Inception-v3, Inception-ResNet-v2, CRNN-based feature fusion | Accuracy rate | 99% accurate
9 | S. Y. Arafat and M. J. Iqbal | Custom Urdu text dataset | TSDNN, FasterRCNN, RRNN, SqueezeNet, GoogLeNet, Resnet18, Resnet50 | Accuracy rate | 99.06% accurate
10 | Y. S. Chernyshova, A. V. Sheshkus, and V. V. Arlazarov | MNIST, MIDV-500, the 1961 census for England and Wales subsample | CNN, ANN | Accuracy rate | 96.69% accurate
Fig. 3 Reshaping
To identify characters in our images, we first convert the images to pure black and white by converting them to grayscale and then applying a threshold. Then, we dilate the images to enhance the characters and apply a Gaussian filter on top to smooth the image. After all the pre-processing is done, we find the contours in the image and store them in a list. Finally, we get the bounding rectangle for each contour and use it to separate the characters out of the entire image (Fig. 4). Each of the separated characters is then resized to 28 × 28 in order to perform prediction on it. A sketch of this pipeline follows.
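A hedged OpenCV sketch of this segmentation pipeline; the threshold method and kernel sizes are assumptions for illustration, not the authors' exact settings:

import cv2

img = cv2.imread("handwriting.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold to black and white (inverted so characters are foreground)
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
bw = cv2.dilate(bw, None, iterations=1)        # enhance the characters
bw = cv2.GaussianBlur(bw, (3, 3), 0)           # smooth the image
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
chars = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)           # bounding rectangle per contour
    chars.append(cv2.resize(bw[y:y + h, x:x + w], (28, 28)))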
The process of extracting information from data that may be used for categorisation
purposes is known as feature extraction. The Adam optimizer and the categorical cross-entropy loss function were used to train all of the models. We utilised a variety of
models for this, as seen below.
CNNs are the models most commonly applied to analyse images. In our proposed model, we used two sets of convolutional layers and an output layer, totalling 11 layers and 989,502 parameters, as shown in Fig. 5. The input given to the model was the 28 × 28 × 1 images, as in EMNIST, without any further processing.
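A minimal Keras sketch of a network of this general shape; the filter counts and dense width here are assumptions and will not reproduce the reported 989,502 parameters:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),   # first convolutional set
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # second convolutional set
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(62, activation="softmax"),    # 62 EMNIST classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])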
Transfer learning applies knowledge gained from one problem to solving another. In our case, we used popular deep learning models pre-trained on the ImageNet
dataset to extract features from the EMNIST data.
Since the dataset used to train models was ImageNet, EMNIST data had to be
processed before it could be passed onto the network. Therefore, all the images that
were originally of size 28 × 28 were scaled up to 84 × 84 × 3 due to the minimum
size requirement. Also the models were slightly modified at the input and output
layers to accommodate the EMNIST data (Fig. 6).
The following models were tested.
VGG16 (Simonyan and Zisserman [16]): It contains five sets of convolutional layers and an output layer, for a total of 16 layers and 14,714,688 parameters. The network was trained on ImageNet with an image size of 224 × 224 × 3, and it achieved 92.7% accuracy in ILSVRC 2014. VGG19 contains five sets of convolutional layers and two fully connected layers before the output layer, for a total of 19 layers (three more convolutional layers than VGG16) and 20,151,422 parameters.
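As a hedged illustration of this transfer-learning setup (not the authors' exact configuration), an ImageNet-pretrained VGG16 backbone with the 84 × 84 × 3 input and a new 62-way head might look like:

from tensorflow import keras
from tensorflow.keras import layers

base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(84, 84, 3))
base.trainable = False  # assumption: backbone frozen for feature extraction
model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(62, activation="softmax"),  # 62 EMNIST classes
])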
Xception
ResNet50
Chollet et al. [19]: It is a modified form of the ResNet model which contains 48 convolutional layers along with one max-pool and one average-pool layer, for a total of 23,714,750 parameters. It allows very deep neural nets because its skip connections mitigate the vanishing gradient problem by providing an alternate path. It reached an accuracy of 92.5% in ILSVRC 2015.
After the character is recognised, the next step is to find how accurate the formation of the character is. This was done using similarity learning. For example, cursive writing books have characters printed with perfect formation; the child is asked to write similarly, and the two are compared to judge whether he/she is right or wrong.
Similarly, the idea behind this approach is that we can compare the image of a character formed by a child with a perfectly formed character and find the accuracy. Similarity learning is a sub-field of machine learning whose goal is to measure how similar or related two objects are. There are many metrics that may be utilised to measure the similarity index between two images; the one used here is structural similarity (SSIM).
SSIM was proposed in 2004 as an evolved version of the older UQI, and its performance backed its popularity, as it consistently outperformed many other metrics. The only downside for our use case is that SSIM requires the images to be of the same size, which is not always practical; the images therefore have to be resized, which introduces distortions and errors.
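A sketch of the similarity check, using scikit-image's structural_similarity; the file names are placeholders, and the glyph is resized to match the test image as discussed:

import cv2
from skimage.metrics import structural_similarity as ssim

child = cv2.imread("child_char.png", cv2.IMREAD_GRAYSCALE)
glyph = cv2.imread("times_glyph.png", cv2.IMREAD_GRAYSCALE)
glyph = cv2.resize(glyph, child.shape[::-1])  # resizing introduces distortion
score = ssim(child, glyph)                    # 1.0 means structurally identical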
After we know what the character is and how accurately it is formed, the next step is to find what is wrong with it and suggest a way to improve the character formation next time. This was done simply by comparing the ratio of the length (shown in Fig. 7 with a red line) and width (shown in Fig. 7 with a blue line) of the character under test with the length and width of the Times New Roman font glyphs, and the orientation of the character by calculating the angle between the lines drawn from the extreme points of the image contours.
The extreme points of the image contours refer to the leftmost, rightmost, top, and bottom points of the image, found using the contours and thresholds. Then, we calculate the distances between the left–right points (shown in Fig. 8 in red) and the top–bottom points (in blue). Finally, we compare the ratio of the lengths of the lines and the angle between them to suggest an improvement in either the ratio or the orientation of the character. A sketch of this computation follows.
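An illustrative computation of the extreme points and the length/width ratio, assuming `c` is one OpenCV contour of the character:

import numpy as np

left   = c[c[:, :, 0].argmin()][0]   # leftmost point
right  = c[c[:, :, 0].argmax()][0]   # rightmost point
top    = c[c[:, :, 1].argmin()][0]   # topmost point
bottom = c[c[:, :, 1].argmax()][0]   # bottommost point
width  = np.linalg.norm(right - left)
length = np.linalg.norm(bottom - top)
ratio = length / width               # compared against the glyph's ratio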
Fig. Testing accuracy (accuracy score vs. models): 85.72, 77.45 (VGG 16), 75.2 (VGG 19), 74.21; models shown: VGG 16, VGG 19, Xception, ResNet50
As we can see from Table 2, when deep learning and transfer learning are combined, they may produce a very accurate model capable of accurately detecting characters, as seen in the example below, where the prediction probabilities for an input image were:

Capital C = 0.55
Small c = 0.17
The best accuracy was obtained with the proposed CNN-based neural net, as the other transfer learning models were not properly optimised for the task and were not trained on the entire dataset owing to the lack of computational resources. Thus, there is still huge scope for improvement.
Also, for characters like the capital 'C' and small 'c', whose formations are very similar in nature, most of the error comes from the inability to differentiate between the small and capital letters. This could be tackled by combining the similarity learning concept with the existing model's prediction probabilities. Instead of a single prediction probability, we use the top two prediction probabilities, which give us two candidate characters; we compare the similarity of the input with the Times New Roman glyphs of both and give priority to the score that is more differentiated. This method provides an efficient result for cases where capital and small letter formations overlap, such as the letters C and Z.
This has been shown via the example in Fig. 9.
6 Conclusion
algorithms could also lead to the network becoming more resilient to cases where the formation of small and capital letters is similar. The same methodology could also be adopted for other scripts such as Chinese and Devanagari.
References
1. Garg S, Kumar K, Prabhakar N, Ratan A, Trivedi A (2018) Optical character recognition using
artificial intelligence. Int J Comp Appli 179:14–20. https://doi.org/10.5120/ijca2018916390
2. Joshi P, Agarwal A, Dhavale A, Suryavansi R, Kodolikar S (2015) Handwriting analysis for
detection of personality traits using machine learning approach. Int J Comput Appl 130(15):40–
45. https://doi.org/10.5120/ijca2015907189
3. Wiley V, Lucas T (2018) Computer vision and image processing: a paper review. Int J Artificial
Intell Res
4. Jia X (2017) Image recognition method based on deep learning. In: 29th Chinese Control and
Decision Conference (CCDC), May 28, pp 4730–4735
5. Mahdianpari M, Salehi B, Mohammadimanesh F, Zhang Y (2018) Very deep convolutional
neural networks for complex land cover mapping using multispectral remote sensing imagery.
Remote Sens 10:1119. https://doi.org/10.3390/rs10071119
6. Manchala S, Kinthali J, Kotha K, Kumar J (2020) Handwritten text recognition using deep
learning with TensorFlow. Int J Eng Res 9. https://doi.org/10.17577/IJERTV9IS050534
7. Dixit R, Kushwah R, Pashine S (2020) Handwritten digit recognition using machine and deep
learning algorithms. Inter J Comp Appl 176:27–33. https://doi.org/10.5120/ijca2020920550
8. Shamim SM, Miah B, Sarker A, Rana M, Jobair A (2018) Handwritten digit recognition using
machine learning algorithms. Indonesian J Sci Tech 18. https://doi.org/10.17509/ijost.v3i1.
10795
9. Hossain M, Hossain Z, Abadin M, Ahmed M (2021) Handwritten bangla numerical digit
recognition using fine regulated deep neural network. Eng Int 9(2):73–84. https://doi.org/10.
18034/ei.v9i2.551
10. Inunganbi S, Katariya R (2022) Transfer learning for handwritten character recognition. https://
doi.org/10.1007/978-981-16-6369-7_63
11. Khan S, Nazir S (2022) Deep learning based Pashto characters recognition: LSTM-based
handwritten Pashto characters recognition system. In: Proceedings of the Pakistan Academy
of Sciences: A. Physical Comput Sci 58:49–58. https://doi.org/10.53560/PPASA(58-3)743
12. He D, Zhang Y (2021) Research on artificial intelligence machine learning character recognition
algorithm based on feature fusion. J Phys: Conf Ser 2136:012060. https://doi.org/10.1088/1742-
6596/2136/1/012060
13. Arafat SY, Iqbal MJ (2020) Urdu-text detection and recognition in natural scene images using
deep learning. IEEE Access 8:96787–96803. https://doi.org/10.1109/ACCESS.2020.2994214
14. Chernyshova YS, Sheshkus AV, Arlazarov VV (2020) Two-step CNN framework for text line
recognition in camera-captured images. IEEE Access 8:32587–32600. https://doi.org/10.1109/
ACCESS.2020.2974051
15. Cohen G, Afshar S, Tapson J, van Schaik A (2017) EMNIST: an extension of MNIST to
handwritten letters
16. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv 1409.1556
17. Chollet F (2017) Xception: deep learning with depthwise separable convolutions, 1800–1807.
https://doi.org/10.1109/CVPR.2017.195
18. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition, 770–778.
https://doi.org/10.1109/CVPR.2016.90
19. Chollet F et al (2015) Keras. Available at: https://keras.io/api/applications/
20. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: From error
visibility to structural similarity. IEEE Trans Image Process
Prediction of Lung Disease
from Respiratory Sounds Using
Convolutional Neural Networks
Abstract Recently, rising pollution worldwide has been leading to many diseases that are currently incurable and for which treatments remain inadequate. These ailments have an impact on the tubes (airways) that carry gases like oxygen into and out of the lungs, frequently obstructing or tightening them. Airway ailments include asthma, chronic obstructive pulmonary disease (COPD), and bronchiectasis. People with airway illnesses frequently describe their sensation as "attempting to exhale through a straw". The main aim is to detect lung diseases that affect humans and pose serious risks to the human body. Currently, a collective
system for identifying and recommending lung problems is not easily available to
a large number of people, and it is technically implemented by machine learning
algorithms. This research makes use of deep learning techniques to learn from the
respiratory audio signal of patients. The unwanted noise is removed by means of
preprocessing, and it is converted to overlapping frames. The FFT is used to convert
the signals into the spectrogram, and it is used for predicting the lung diseases. The
convolution neural network is applied to spectrogram with eight hidden layers to
predict COPD, upper respiratory tract infection (URTI), lower respiratory tract infection (LRTI), etc. This model provides 95% accuracy, and a comparison
is done with the help of different optimizers like ADAM, SGD, and RMSPROP. This
system might be improved by further changing both the preprocessing techniques
and fine-tuning the parameters. This system detects lung diseases at an early stage, so that the disease does not affect the whole body.
1 Introduction
The main components of the human respiratory system are the lungs. On each side
of the chest are two spongy, air-filled organs called the lungs (thorax). Through
the bronchi, tubular branches of the trachea (windpipe), air is transferred to and from the lungs. The bronchi split into bronchioles, which ultimately become microscopic branches. The lungs carry oxygen from the atmosphere into the bloodstream and release
carbon dioxide from the circulation into the atmosphere, a process known as gas
exchange, in the respiratory system. Lung sickness includes asthma, COPD, infec-
tions such as influenza, pneumonia, and tuberculosis, lung cancer, and a number
of other breathing problems. The respiratory system, a network of tissues and organs that aids in breathing, can be affected by disorders that lead to respiratory failure. Your nasal cavity, mouth, lungs, and blood vessels are all part of your respiratory system, which also includes the muscles that operate your
lungs. Together, these parts transport oxygen around the body and expel waste gases
like carbon dioxide. Many disorders can have an impact on the tissues and organs
that comprise the respiratory system. Some develop as a result of airborne irritants,
such as viruses or bacteria that induce infection. Others arise due to illness or ageing.
The following conditions can produce inflammation or otherwise impact the respi-
ratory system. Allergies: some people develop respiratory allergies after inhaling proteins such as dust, mould, and pollen. These proteins can induce airway
inflammation. Asthma: breathing becomes challenging due to the chronic (long-
term) inflammation of asthma that damages the airways. Infection: infections can
cause pneumonia or bronchitis. The flu (viral disease) and the common cold are both
respiratory diseases. Disease: respiratory ailments include lung cancer and COPD.
These conditions can compromise the respiratory system’s ability to circulate oxygen
through the body and filter out waste gases. An acoustic medical tool called a stetho-
scope is used to hear inner noises in an animal or human body. It typically has a small,
skin-contact resonator in the shape of a disc, and one or two tubes that connect to
two earpieces. A stethoscope can be used to listen to blood flow in arteries and veins
as well as heart, lung, and gastrointestinal noises. There are different types of stethoscopes, one of which is the electronic stethoscope. Electronic stethoscopes are more costly than normal stethoscopes, so most doctors use normal ones.
By electronically amplifying body noises, electronic stethoscopes use modern tech-
nology to overcome these low sound levels. Acoustic sound waves collected through
the chest piece must be converted into electrical signals, which are then delivered
through specially developed circuitry and processed for optimal listening. By employing electronic stethoscopes, the sound signal may be saved in memory and used for further processing. Convolutional neural networks (CNNs) are a hierarchical
feature detector that is biologically inspired. It can learn very abstract properties and
reliably recognize items. Here, the CNN is used to determine whether an individual has COPD,
URTI, bronchiectasis, pneumonia, or bronchiolitis, or whether the patient is healthy.
It is worth noting that the majority of the good multi-class classification results came
from hybrid DL methods. Training a reliable fully convolutional architecture could be
time-consuming and extremely intensive, despite the fact that it shows exceptionally
promising results without the need of complicated feature engineering procedures.
The training procedure is iterative as well, and it incorporates a large number of
model parameters, including datasets.
2 Literature Survey
Serbes et al. [1] have developed an approach for extracting multiple feature sets
from pulmonary signals using time–frequency and time–scale analysis. The collected
feature sets are put into three distinct machine learning methods, both individu-
ally and as a group of networks. Furthermore, prior to the time–frequency analysis, bandwidths containing no-crackle data are deleted using the dual-tree complex wavelet transform to increase the model's effectiveness. Lin et al. [2] have announced the devel-
opment of an advanced wheeze detecting system. The order cutoff average approach
and a backpropagation neural network are used to create a wheeze detection system.
Some characteristics from processed spectra are used to train a BPNN, which then
analyzes test samples to see if they are asthmatic sounds. Chen et al. [3] have gathered
a data collection of normal and pathologic heart and lung noises. An effective and
automated diagnostic procedure is particularly appealing since it may help detect
possible hazards at an early stage, even without the assistance of a professional
doctor. Islam et al. [4] have shown that, even in the absence of wheeze, enhanced signal processing of anterior lung sound data can be used to distinguish between normal and asthmatic patients. For the multichannel signal, a spectral subband-based feature
extraction strategy is developed that works with ANN and SVM classifiers. To cate-
gorize normal and asthmatic participants, a collection of statistical characteristics is calculated from each subband and fed to ANN and SVM classifiers. Zulfiqar et al. [5]
have noted that respiratory sound (RS) features and their analysis are an important part of pulmonary pathology because they provide symptomatic information about a patient's lungs. Doctors used to rely on mere listening to differentiate clinical
symptoms in pulmonary sound using a standard stethoscope, which is considered a
low-cost and safe approach of assessing patients. Because lung illness is indeed the
third leading cause of mortality globally, properly characterizing the RS anomaly
is critical in order to reduce mortality. Hafke-Dys et al. [6] have shown that AI analysis of sounds captured during conventional stethoscope auscultation can be used to detect the intensities of pathological breath events. The presence of
aberrant noises was assessed by a panel of three physicians who were not aware
of the AI predictions. The performance of each indicator in discriminating across
groups was evaluated. Lang et al. [7] introduce graph moderated CNNs (GS-CNNs)
for categorizing respiratory sounds into normal, crackle, and wheeze, using a low
labelled sample group and a high unlabelled sample group. The results show that
the suggested GS-CNNs exceed regular CNNs, and that the more graph-RS data
incorporated, the better the results. Pouyani et al. [8] have presented computer-based lung sound (LS) analysis, investigated as a potential technique for evaluating lung function. Background noise from various sources significantly pollutes
the LS signal. Traditional denoising procedures may be unsuccessful due to the LS’s
noisy nature and spectrum overlap with many noise sources. This study provides an
adaptive strategy for filtering LS signals in a noisy environment based on wavelet
transformation and ANN. In this technique, the DWT is combined with an ANN acting as an adaptable-structure filter. You et al. [9] have proposed a method
of automatic cough recognition in realistic audio recordings of patients that is crucial
for diagnosing and monitoring respiratory disorders like COVID-19. To date, several detection systems have been invented, but none has reached practical requirements.
They added a convolution just before LSTM to enhance cough characteristics and
maintain the sequence information in the sound source. The unique model on the final
feature map contains an incorporated boundary regression for improved detection
efficiency and more precise borders, which is critical for future analysis.
3 Proposed System
Lung disease is a symptom of many severe conditions that may badly affect a patient's health and sometimes lead to death. Severe pulmonary illnesses damage the bronchi or other tissues of the lungs, severely affecting human lungs. Human respiratory sounds are used to assess the health of the lungs. A wheezing sound is used to identify common lung diseases like COPD, LRTI, and URTI. Respiratory sound is recorded by digital stethoscopes, and these sounds can be used to find diseases of the lungs. Hence, the proposed system employs a CNN model for finding the type of lung disease at an early stage, which may save patients' lives. Figure 1 illustrates the proposed architecture for classifying lung disease.
The following steps are involved in the processing of the model: (1) audio extrac-
tion, (2) dataset preprocessing, and (3) developing a CNN model for the detection of lung diseases. Every audio signal consists of many features. The first step is to convert the
signals from time domain to frequency domain in order to extract the features of the
signal. Extracted features are converted into numerical data, and it is given as input
to the model for training and testing, which results in the detection of the lung disease in the specific signal.
Respiratory sounds are critical indicators of respiratory diseases. The lung
sound dataset was gathered locally at Jordan University of Science and Technology’s
King Abdullah University Hospital in Irbid, Jordan. The freely accessible ICBHI
challenge database was used to supplement this dataset. In addition to the 1176
recordings, the dataset references contained a total of 215 patients with 309 clinically
obtained lung sound recordings. A total of 70 individuals with respiratory illnesses
such as bronchitis, fever, heart failure, bronchiectasis, respiratory problems, and
chronic obstructive pulmonary disease were included in the primary dataset (COPD).
Data was also collected from 35 healthy controls. This project employs a dataset from Kaggle, which includes 920 .wav sound files, 920 annotation .txt files, a file with each patient's illness, a file explaining the file naming format, a file listing 91 names (filename_differences.txt), and a file containing demographic information for each patient. Two primary research teams, from Portugal and Greece, created the
respiratory sound database. It contains 920 annotated recordings ranging in duration
from 10 to 90 s. These recordings were collected from 126 patients. There are 6898
respiratory cycles in all, 1864 of which contain the crackles, 886 of which contain the
wheezes, and 506 of which have both the crackles and wheezes. The data comprises
both the respiratory sounds and noise recordings that mimic real-world scenarios.
Patients include young people, adults, and the elderly.
Preprocessing prepares the data for further processing and for its use in the prediction and analysis of the audio; the process also involves converting the audio signals into the desired form for the chosen mode (analogue or digital) and enhancing the audio clarity, which makes the analysis of the audio data more effective and produces much better output (Fig. 2).
The audio signals consist of frames and frequencies and also contain unnecessary noise and unwanted components. The first step is breaking the signals into overlapping frames, after which they undergo the FFT. The FFT decomposes a signal into distinct spectral components and as a result provides frequency information about the signal. Next, the audio signal processed by the FFT is passed into the filter banks; the resultant signal is passed through a log stage and again undergoes the FFT, and the result helps in producing the spectrogram of the desired extracted features that is useful in analysing the audio data and predicting the lung diseases (Fig. 3).
In the frequency domain, mathematical functions and signals are expressed in terms of frequency rather than time. A spectrum graph, for example, depicts the amount of signal present in each frequency band, whereas a time-domain graph depicts changes over time. The data can, however, be translated from the time domain to the frequency domain, as described below.
3.2.1 Audio Framing
The FFT will induce distortions since audio is a non-stationary process; over a brief duration, however, audio can be treated as stationary. The audio signal is therefore divided into short frames, and the FFT is the same size as each audio frame. The frames must overlap: we do this to maintain some association between frames and to avoid losing information at the frame boundaries after applying a window function. The FFT assumes that the audio is continuous and periodic, and framing the signal makes it approximately periodic. A window function is applied to each frame; without it, the process picks up high-frequency distortions. The window ensures that the signal terminates close to zero on both ends. Choosing the right window is difficult and time-consuming; the Hanning window is used here for its simplicity (Fig. 4). A sketch of this step follows.
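A minimal sketch of the framing, windowing, and FFT steps; the frame length and hop size are assumptions, and `signal` is a placeholder for the recorded audio array:

import numpy as np

frame_len, hop = 1024, 512  # assumed frame and hop sizes
frames = [signal[i:i + frame_len] * np.hanning(frame_len)
          for i in range(0, len(signal) - frame_len + 1, hop)]
spectra = [np.abs(np.fft.rfft(frame)) for frame in frames]  # spectrogram columns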
Mel frequency cepstral coefficients (MFCCs) are a technique for extracting audio characteristics. The MFCC divides the frequency band into subbands using the mel scale and then extracts the cepstral coefficients using the discrete cosine transform (DCT). The mel scale is based on how people discern between frequencies, making it well suited to sound processing. MFCC is a good algorithm for extracting both high-frequency and low-frequency information (Fig. 5).
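A hedged sketch of MFCC extraction with librosa; the number of coefficients (40) is an assumption, not a value taken from the paper:

import librosa

y, sr = librosa.load("breath.wav", sr=None)         # file name is a placeholder
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)  # mel-scale cepstral features
features = mfcc.mean(axis=1)                        # fixed-length vector per recording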
A CNN is a biologically inspired hierarchical feature detector that can learn very abstract properties and reliably recognize items. This CNN approach is preferred over other traditional techniques for the following reasons: CNNs use weight sharing to lower the number of parameters, which smooths training and helps prevent overfitting.
The initial step of a CNN is the convolution layer. Convolution layers perform a
convolution operation on the input and transmit the output to the following layer. A
convolution is a process that turns all of the pixels or sound signals in its receptive area
into a single value. For example, applying a convolution to a sound signal reduces
the signal size whilst also bringing all of the information in the field together into
a single signal. The convolutional layer’s final output is a vector. This may employ
several types of convolutions depending on the sort of issue we need to solve and the
features we want to learn (Fig. 6).
The dimension of the feature maps is lowered by pooling layers. As a result, there is a reduction in the number of parameters that must be learned and in the amount of calculation carried out within the network. The pooling stage summarises the attributes of a feature map generated by a convolutional layer. Flattening is the process of combining all of the resulting pooled 2D feature maps into a single continuous linear vector. To categorise image or sound data, the flattened matrix is provided as input to the fully connected layer. Softmax is a squashing function: squashing functions confine the function's output to the range 0 to 1, which allows the result to be interpreted simply as a probability.
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)    (1)
The rectified linear unit (ReLU) function is an extra step on top of the convolution operation. The ReLU is employed to boost nonlinearity, which is needed because images are highly nonlinear. Padding, in convolutional neural networks, describes the number of pixels added to an image while it is processed by the CNN kernel.

The input is a 1D signal of length 193. This input is convolved with 64 different filters of size [1, 5], so 64 different outputs are generated. Conv1D takes all 64 input channels one by one and generates 64 different output channels for each input channel; in total, 64 × 64 = 4096 channel combinations are produced. The next convolution applies 128 different filters; Conv1D takes all 128 input channels one by one and generates 128 different output channels for each input channel, 128 × 128 = 16,384 combinations in total. The result is sent as input to the next layer. The filters in the pooling layer are of size 1 × 1, the number of filters is 2, and they are used with stride = 1; the output size is thereby reduced to 24 × 24 × 2. The resultant values from max pooling layer 3 are given to the next convolutional layer, whose 256 filters of size 1 × 1 produce 256 different outputs; Conv1D again combines input and output channels, 256 × 256 = 65,536 combinations in total, with "ReLU" activation.

Large networks trained on relatively small datasets can overfit the training data. A typical dropout value for a hidden layer lies in the range 0.5–0.8, and input layers employ a higher value, such as 0.8. In this case, we use 0.3 as the dropout value for the 1D array. As a guide, a network with n = 100 nodes and a dropout rate of p = 0.3 would require about n/p = 100/0.3 ≈ 333 nodes whilst using dropout.

After feature extraction, the result of layer 5 is flattened into one dimension: the output of the third convolutional layer is flattened, and the resulting matrix is reduced to one vector with a selected number of neurons, which is then given to the fully connected (dense) neural network. Like other classifiers, the dense part of a CNN requires a feature vector, so the output of the convolutional section must be transformed into a single dimension that the dense part can utilise. Flattening is the term for this process: it takes the result of the convolutional layers and flattens the whole structure into a single-dimension feature vector that the dense layer may use for final classification. The output activation is softmax (activation = activations.softmax), which converts the raw output scores into a probability distribution over the disease classes. A sketch of a network of this shape follows.
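A minimal Keras sketch of a 1D CNN of this general shape, assuming a 193-length feature vector per recording and six output classes (COPD, URTI, bronchiectasis, pneumonia, bronchiolitis, healthy); the kernel sizes and dense width are assumptions:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(193, 1)),
    layers.Conv1D(64, 5, activation="relu"),    # 64 filters of size 5
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 5, activation="relu"),   # 128 filters
    layers.MaxPooling1D(2),
    layers.Conv1D(256, 1, activation="relu"),   # 256 filters of size 1
    layers.Dropout(0.3),                        # dropout value from the text
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),      # probability per disease class
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])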
4 Experimental Analysis
• A true positive (TP) is an outcome in which the model correctly predicts the positive class.
• A true negative (TN) is an outcome in which the model correctly predicts the negative class.
• A false positive (FP) is an outcome in which the model incorrectly predicts the positive class.
• A false negative (FN) is an outcome in which the model incorrectly predicts the negative class.
5 Performance Analysis
Accuracy:
For the purposes of assessing the planned task, one of the most important measures is accuracy. Accuracy is defined as the ratio of true positive and true negative samples to the total number of samples.

Precision:
Precision is the proportion of correctly identified positive samples to all samples identified as positive (whether correctly or incorrectly). It measures the model's accuracy in classifying a sample as positive.
F1-Score:
The F1-score is a single statistic that takes the harmonic mean of a classifier's precision and recall. It is mostly used to compare the performance of two classifiers.

Recall:
Recall is the ratio of true positives to the sum of true positives and false negatives; it measures how many of the actual positive samples the model retrieves.
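In terms of the counts defined above (TP, TN, FP, FN assumed to be integer counts), the four metrics are:

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1_score  = 2 * precision * recall / (precision + recall)  # harmonic mean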
Optimizers such as Adam, SGD, and RMSprop help to compare the efficiency of the system in various ways and aid the detection of lung diseases; precision, recall, F1-score and accuracy help to detect the diseases with high accuracy in an efficient manner (Tables 1 and 2; Figs. 8, 9, and 10).

Fig. 8 Comparison of various optimizers for epoch 50 (F1-score)
6 Conclusion
This system is used to determine whether an individual has COPD, URTI, bronchiec-
tasis, pneumonia, or bronchiolitis, or whether the patient is healthy. It is worth noting
that the majority of the good multi-class classification results came from hybrid
deep learning methods. Training a reliable fully convolutional architecture could be
time-consuming and extremely intensive, despite the fact that it shows exceptionally
promising results without the need of complicated feature engineering procedures.
The training procedure is iterative as well, and it incorporates a large number of model
Prediction of Lung Disease from Respiratory Sounds Using … 465
References
1. Serbes G, Sakar CO, Kahya YP, Aydin N (2013) Pulmonary crackle detection using time-
frequency and time-scale analysis. Dig Signal Process 23:1012–1021
2. Lin BS, Wu HD, Chen S (2015) Automatic wheezing detection based on signal processing of
spectrogram and backpropagation neural network. J Healthc Eng 6:649–657
3. Chen Q, Zhang W, Tian X, Zhang X, Chen S, Lei W (2016) Automatic heart and lung sounds
classification using convolutional neural networks. In: 2016 Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference (APSIPA), pp 1–4
4. Islam MA, Bandyopadhyaya I, Bhattacharyya P, Saha G (2018) Multichannel lung sound analysis
for asthma detection. Comput Methods Programs Biomed 159:111–123
5. Zulfiqar R, Majeed F, Irfan R, Rauf HT, Benkhelifa E, Belkacem AN (2021) Abnormal respi-
ratory sounds classification using deep CNN through artificial noise addition. Frontiers Med
8
6. Hafke-Dys H, Kuźnar-Kamińska B, Grzywalski T, Maciaszek A, Szarzyński K, Kociński J
(2021) Artificial intelligence approach to the monitoring of respiratory sounds in asthmatic
patients. Front Phys
7. Lang R, Fan Y, Liu G, Liu G (2021) Analysis of unlabeled lung sound samples using semi-
supervised convolutional neural networks. Appl Math Comput 411:126511
8. Pouyani MF, Vali M, Ghasemi MA (2022) Lung sound signal denoising using discrete wavelet
transform and artificial neural network. Biomed Signal Process Control 72:103329
9. You M, Wang W, Li Y, Liu J, Xu X, Qiu Z (2022) Automatic cough detection from realistic
audio recordings using C-BiLSTM with boundary regression. Biomed Signal Process Control
72:103304
SemWIRet: A Semantically Inclined
Strategy for Web Image
Recommendation Using Hybrid
Intelligence
Abstract With a rapid increase of Websites, retrieving information from the Internet
has become a very challenging and time-consuming endeavour. There is always room for improvement in yielding and recommending the correct information so that the user gets results pertinent to the topic they are searching for on the Internet. To ease the work of the user, an enhanced recommenda-
tion system is best suited to recommend the optimal solution. Furthermore, relevant
articles are displayed according to the user's queries. The relevant articles recommended are likely to be of no use to the user if the recommendation algorithm is not effective. This paper proposes a semantics-infused hybrid intelligence approach to recommendation for user-centric queries, using concept similarity and ANOVA cosine similarity together with extreme gradient boosting (XGBoost) for document classification. Using the flickr30k dataset, the above experiment was conducted, and 97.16% accuracy was achieved.
1 Introduction
The World Wide Web as the name implies is the largest accumulation of existing
information throughout the world. We are living in the era of Web 3.0, which is also
known as the “semantic Web” as it has harnessed the coexisting powers of big data and
M. Y. Bobde
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Kattankulathur, India
G. Deepak (B)
Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru,
Manipal Academy of Higher Education, Manipal, India
e-mail: gerard.deepak.christuni@gmail.com
A. Santhanavijayan
Department of Computer Science and Engineering, National Institute of Technology,
Tiruchirappalli, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 467
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_36
machine learning together, where every user's data and activities are analyzed and used to deliver a more personalized Web browsing experience and richer app experiences.
experiences. Information extraction from the current scenario where a tremendous
amount of data is being generated on a daily basis is not just a tedious task but also
involves enormous efforts to link the user queries with the precise contents throughout
the World Wide Web. Even if emerging learning mechanisms are subsumed properly, most Web-based search algorithms tend to underperform because much of the information looks very similar. Most Web-based recommendation systems are of two types: query-centric and user-centric. Query-centric Web page recommendation systems pivot only on the relevance of the Web pages to the query that is launched into the system. However, when the Web page recommendation system is user-centric, the focus is predominantly on personalization and pleasing the user. These systems do not ease the diversification of search results, as there is a scarcity of real-world knowledge infused into the system. The Semantic Web builds on the large amount of data available in the current structure of the Web by including metadata about the Web that can be reasoned over. A strong semantic approach that can transform the existing data into knowledge, or that can incorporate knowledge for the recommendation, is the need of the hour. The annotation-based text recognition system is the best of the three options available, the others being the context-based recommendation system and the hybrid recommendation system, because its data is labelled and it performs text-based classification that addresses the user-specific query effectively as well as efficiently.
Motivation: As emerging technologies improve the quality and enrichment of the personalization needed by users, the Web is moulding itself accordingly and is now at Web 3.0, which is based purely on semantics and keeps improving the quality of searched queries. Still, more semantic methods are much needed to cater precisely to the demanding informational needs of users; to fulfil this, we need an annotation-based recommendation system, which, in addition, represents the state of the art.
Contribution: A semantically inclined strategy for Web image recommendation using hybrid intelligence is proposed for Web image retrieval. Several preprocessing techniques are employed in the preprocessing phase, amongst them lemmatization, tokenization, stop-word removal, and neural network extraction. Texts relevant to the topic are identified using the XGBoost classifier. The labelled categorical dataset comprising Web images is obtained from the flickr30k dataset to enhance the labels. By combining semantic similarities such as concept and ANOVA cosine similarity, the approach achieves high accuracy and distinguishes results with the maximum level of consistency: an accuracy of 97.17%, an F-measure of 97.16%, and an FDR value of 0.04 are attained by the proposed model.
Organization: The remaining paper proceeds as follows: Section 2 discusses
related works. Section 3 elaborates on the proposed system architecture. Section 4
deals with the implementation and results. Section 5 concludes the paper.
2 Related Works
It is very difficult to find precise information on the Web, where billions of data items can be found related to any field of search. Several researchers have studied this topic to arrive at a particular and effective solution to such Web image retrieval problems. Xie et al. [1] stated that a cross-modal hashing (CMH) algorithm is an effective
method of retrieving images from the Web quickly, whereas traditional CMH uses a
streaming approach for the generation of the hash functions and codes that are quite
ineffective for retrieval of Web images using batch learning. Chen [2] stated that, with digital colour-space quantization of the image colour distribution, the selected colour model is closely related to the colour space, and hence images can be retrieved easily. Li [3] asserted, within a framework of isomorphic hashing, three key principles useful for guiding the design of hashing algorithms, offering more precise weight coefficients and a greater diversity of weights with higher precision, and proposed a novel hashing algorithm based on these guidelines, called BWLH.
Liu et al. [4] exploited the CNN structure through a binary code learning framework named deep supervised hashing (DSH), in which a compact similarity-preserving binary code is learned for large-scale image data, with each image taking discrete values of approximate shape. Gupta et al. [5] performed a
study, in which they assessed existing image retrieval techniques in order to fetch
information from the cyberspace by using textual annotations on the images. Liu
et al. [6] analyzed the architecture of the deep convolutional neural network and concluded that fusing two kinds of deep convolutional features is effective for retrieving images. Wu et al. [7] proposed a framework with a novel local feature learning algorithm for image retrieval that requires only image-level annotations and can be trained end-to-end. Hou et al. [8] presented V-RSIR, an open-access Web-
based tool that requires minimum training to identify RS images. It also allows users
to view RS image quantities and their spatial distributions. Huang et al. [9] proposed
the MRBDL model which is multi-concept retrieval using bimodal deep learning
through the use of convolutional neural networks, semantic correlations between
a visual image and its context can be effectively captured. Vijayarajan et al. [10] leveraged Google's knowledge graph, a knowledge base that Google uses to supplement its search engine's results with semantic-search data obtained from a range of sources.
Sejal et al. [11] proposed an algorithm to recommend images based on ANOVA
cosine similarity, where text and visual features are compared to fill the semantic
gap. This allows for visual synonyms to be computed based on semantically related
visual features found in images. Kaushik et al. [12] developed a new image search
algorithm for a search engine API, which filters out the relevant images from all the
images retrieved from the Wikipedia API and displays them to the user. Nair et al.
[13] proposed work on medical image retrieval in which they made an effort to find ways and means of bridging the semantic gap and inferred that the feedback obtained from the user is a key factor. Kolahkaj [14] used a combination of the wavelet transform and colour histogram to extract features, in order to bridge the semantic gap between the low-level visual features of images and their high-level semantics.
The trees are formed in a boosting order, with each successive tree striving to mini-
mize the mistakes of the prior tree. Each tree builds on its predecessors’ knowledge
and corrects any faults that remain. As a consequence, the next tree in the sequence
will use an updated set of residuals to learn. Numerical features should be scaled and categorical features encoded to achieve the best performance with XGBoost. The classifier follows the fit-predict pattern, so after preprocessing the data needs to be divided into train and test sets. We preferred XGBoost because it handles different sorts of sparsity through its sparsity-aware split-finding algorithm, because it is especially designed to exploit modern hardware by designating internal buffers in each thread for storing gradient information, and because its regularization helps prevent overfitting.
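As a minimal sketch of this step (column and file names below are illustrative assumptions, not the authors' exact pipeline), the scaling, encoding, splitting, and fit-predict cycle can be wired together as follows:

```python
# Hypothetical sketch of the XGBoost classification step described above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from xgboost import XGBClassifier

df = pd.read_csv("features.csv")             # assumed feature table
X, y = df.drop(columns=["label"]), df["label"]

num_cols = X.select_dtypes("number").columns
cat_cols = X.columns.difference(num_cols)

# Scale numerical features and encode categorical ones, as the text advises.
X[num_cols] = StandardScaler().fit_transform(X[num_cols])
X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])

# Divide into train and test sets, then use the fit-predict cycle.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
clf = XGBClassifier(reg_lambda=1.0, n_estimators=300)  # L2 regularization guards overfitting
clf.fit(X_tr, y_tr)
print((clf.predict(X_te) == y_te).mean())    # test accuracy
```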
The dataset is classified using the input query words: by supplying the features from the input query words, the dataset is classified using the XGBoost classifier. From the resulting classified dataset, the images are extracted from the labels, because the dataset is a labelled categorical dataset comprising Web images. These labels are then subjected to synonymization; in this case, the WordNet synsets are used for the purpose of synonymization. Once the synonyms are generated for each of the provided labels, label enrichment is performed by aggregating the categorical domain ontology with these labels through computation of the concept similarity. Concept semantic similarity in information retrieval means that the degree of semantic matching between the text and the user's inquiry is always reflected in the similarity; the term "similarity" refers to the degree of resemblance between two ideas. For two concept pairs (A, B) and (C, D), we define
$$s((A, B), (C, D)) := \frac{1}{2}\left(\frac{|A \cap B|}{|A \cup B|} + \frac{|C \cap D|}{|C \cup D|}\right) \quad (1)$$
The similarity measurement in Eq. (1) is used to order the exact sibling concepts, where A and B are the labels of X obtained from label synonymization, and C and D are the labels of Y obtained from the categorical domain ontology.
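A direct translation of Eq. (1), assuming each label has already been expanded into a set of terms (the set construction itself is an assumption of this sketch):

```python
# Sketch of the concept similarity in Eq. (1); A, B come from label
# synonymization and C, D from the categorical domain ontology.
def concept_similarity(A: set, B: set, C: set, D: set) -> float:
    jaccard_ab = len(A & B) / len(A | B) if A | B else 0.0
    jaccard_cd = len(C & D) / len(C | D) if C | D else 0.0
    return 0.5 * (jaccard_ab + jaccard_cd)

# Example: sibling concepts sharing most of their terms score close to 1.
print(concept_similarity({"car", "auto"}, {"car"}, {"bike"}, {"bike", "cycle"}))
```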
On the aggregation of the categorical domain ontology with the labelled
synonymized classified image labels, the entity enrichment is achieved. The initial
enriched query words, as well as the labelled aggregated with the ontology, are
subjected to the computation of Shannon’s entropy and the normalized Google
distance. By comparison, Shannon's entropy computes the information measure, whereas the normalized Google distance determines the semantic similarity. The normalized Google distance (NGD) computes semantic similarity from the number of Google search hits for a set of keywords: keywords with similar or identical natural-language meanings are "near" in normalized Google distance units, whereas terms with different meanings are "far" away. The NGD between two search keywords p and q is given in Eq. (2):

$$\mathrm{NGD}(p, q) = \frac{\max\{\log f(p), \log f(q)\} - \log f(p, q)}{\log N - \min\{\log f(p), \log f(q)\}} \quad (2)$$

where N is the total number of Web pages searched by Google times the average number of singleton search phrases found on those pages; f(p) and f(q) are the numbers of hits for search terms p and q, respectively; and f(p, q) is the number of Web pages on which both p and q appear. If NGD(p, q) = 0, then p and q are as similar as possible, whilst if NGD(p, q) ≥ 1, then p and q are different.
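The computation in Eq. (2) can be sketched as follows, assuming the hit counts f(p), f(q), and f(p, q) have already been obtained from a search engine (the count-fetching step is omitted here):

```python
from math import log

def ngd(f_p: float, f_q: float, f_pq: float, n: float) -> float:
    """Normalized Google distance of Eq. (2) from precomputed hit counts."""
    num = max(log(f_p), log(f_q)) - log(f_pq)
    den = log(n) - min(log(f_p), log(f_q))
    return num / den

# Terms that co-occur on most of their pages are "near" (NGD close to 0).
print(ngd(f_p=9000, f_q=8000, f_pq=7500, n=25e9))
```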
The deviation criterion for Shannon's entropy is set to 0.25, the threshold for the normalized Google distance is set to 0.75, and the threshold for concept similarity is set to 0.5, because we want more aggregation of the categorical ontology through concept similarity. The normalized Google distance threshold of 0.75 is chosen mainly because we want more relevant entities to be populated, and Shannon's entropy is used for computing the information measure with a step deviation of 0.25; all of this is done under the influence of Gaussian adaptation.
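The filtering behaviour these thresholds describe can be sketched as below; the per-entity scores are assumed precomputed, and the direction of each comparison is an interpretive assumption of this sketch:

```python
# Sketch of the relevance filtering described above; entropy deviation, NGD,
# and concept similarity are assumed precomputed per candidate entity.
ENTROPY_STEP_DEV = 0.25
NGD_THRESHOLD = 0.75
CONCEPT_SIM_THRESHOLD = 0.5

def keep_entity(entropy_dev: float, distance: float, concept_sim: float) -> bool:
    # Keep entities that are informative, near in NGD, and conceptually similar.
    return (entropy_dev <= ENTROPY_STEP_DEV
            and distance <= NGD_THRESHOLD
            and concept_sim >= CONCEPT_SIM_THRESHOLD)
```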
Signal processing systems use Gaussian adaptation, also referred to as normal or natural adaptation, to maximize the yield of components given the statistical variance of component values. Evolutionary algorithms address a wide range of optimization problems and typically provide good approximate solutions. The stochastic adaptive process implies that an n-dimensional sample x, with x^T = (x1, x2, …, xn), is taken from a multivariate Gaussian distribution N(m, M) with mean m and moment matrix M.
The samples are tested to determine whether they pass or fail. Here, m* and M* are the Gaussian's first- and second-order moments confined to the pass samples, and s(x) is a function determining the probability that x is chosen as a passing sample, with 0 < s(x) < q ≤ 1. The average probability of finding pass samples (the yield) is

$$P(m) = \int s(x)\, N(x)\, \mathrm{d}x$$

For every s(x) and any value of P < q, there is always a Gaussian probability density function (p.d.f.) that is tuned for maximum dispersion. A local optimum requires the criteria m = m* and M proportionate to M*.
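A toy Monte Carlo sketch of this adaptive process follows; the pass region encoded by passes() is an illustrative assumption, not part of the paper:

```python
# Toy sketch of Gaussian adaptation: sample x ~ N(m, M), keep "pass" samples,
# and move m, M toward the pass-sample moments m*, M*.
import numpy as np

rng = np.random.default_rng(0)
m, M = np.zeros(2), np.eye(2)

def passes(x: np.ndarray) -> bool:
    return np.linalg.norm(x - np.array([1.0, 1.0])) < 1.5  # assumed criterion

for _ in range(50):
    samples = rng.multivariate_normal(m, M, size=500)
    accepted = samples[[passes(x) for x in samples]]
    if len(accepted) < 2:
        continue
    m = accepted.mean(axis=0)              # m*  (first-order moment)
    M = np.cov(accepted, rowvar=False)     # proportional to M*
print("yield ≈", len(accepted) / 500, "mean:", m)
```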
Finally, the matching labels and images are semantically rearranged in increasing order of the similarity given by the normalized Google distance, and the corresponding images are recommended to the user. If the user records any further clicks, these current user clicks are passed again into the Google knowledge graph API model; if no user clicks are recorded, the search ends; otherwise, the search continues until no further user clicks are recorded. Since the current user clicks constitute an annotation-based Web image search, only the terms pertaining to the current user clicks are sent. Annotation-based image search is used because the World Wide Web is overpopulated with annotation-labelled images, and in the Semantic Web the label has to match the image exactly; as a result, annotation-based semantic search is more than sufficient compared to traditional content-based Web search.
4 Implementation
Here, Eqs. (4) and (5) represent the precision and recall values, respectively, whilst
Eqs. (6) and (7) represent the accuracy as well as the F-measure, and Eq. (8) represents
the false discovery rate (FDR).
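In the usual confusion-matrix notation of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these standard formulations are:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4) \qquad \text{Recall} = \frac{TP}{TP + FN} \quad (5)$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (6) \qquad F\text{-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (7)$$

$$\mathrm{FDR} = \frac{FP}{FP + TP} \quad (8)$$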
The dataset used for the experimentation is flickr30k. The performance of SemWIRet was compared using IRACS, MRDL, and OBIR as baseline models; XGBoost with KNN was also used as an experimental hybridization in the evaluation of the proposed SemWIRet model for retrieving Web images. As shown in Table 1, precision, recall, accuracy, and F-measure in %, along with the FDR value, are used as the metrics to evaluate the performance of the proposed SemWIRet model. Standard formulations of precision, recall, accuracy, F-measure, and FDR were used; precision, recall, accuracy, and F-measure indicate the relevance of the results yielded, whereas the FDR depicts the number of false positives captured by the system. Table 1 indicates that the IRACS model yields the lowest precision of 87.12%, the lowest recall of 89.44%, the lowest accuracy of 88.28%, and overall the lowest F-measure of 88.26%, with the highest FDR value of 0.13. The MRDL model yields 91.16% precision,
94.41% recall, and 92.78% accuracy, as well as 92.75% F-measure with a 0.09
FDR. Similarly, the OBIR model returns an average F-measure of 92.75%, an average recall of 93.62%, an accuracy of 92.29%, and an FDR of 0.09, with an average precision of 91.37%. By integrating XGBoost and KNN, we arrive
at an average precision of 91.23%, an average recall of 91.12%, and an average
accuracy of 90.67% with an FDR of 0.10. The proposed SemWIRet model yields
the highest precision of 96.63%, the highest average recall of 97.71%, the highest
accuracy of 97.17%, and the highest F-measure of 97.16% along with the lowest
FDR value of 0.04. The proposed model excels in precision, recall, accuracy, F-measure, and FDR mainly because it incorporates, first and foremost, the LDA model for topic modelling: the query words are enhanced by uncovering all the hidden topics relevant to the query. Furthermore, the knowledge graph API is used to incorporate auxiliary knowledge in much greater depth into the framework; it is a standard knowledge store that ensures a very high entity density by extracting content from Google's knowledge graph API, so the query words are enriched by means of entities. Most importantly, the XGBoost model is used for classification, label synonymization takes place using WordNet 3.0, and a standard categorical domain ontology is used for label aggregation. In particular, the incorporation of Shannon's entropy and the normalized Google distance under the Gaussian adaptation mechanism ensures very high relevancy; Shannon's entropy with its step-deviation value and the normalized Google distance with its threshold strengthen the relevance computation mechanism, refining the initial feasible solution into a much more organized one. As a result, the proposed SemWIRet model performs much better: it is a semantics-infused model in which XGBoost serves as the classification model alongside semantic similarity schemes such as the normalized Google distance and Shannon's entropy, and, most importantly, lateral auxiliary knowledge added by means of the knowledge graph API and LDA topic modelling ensures that a large density of global knowledge is fed into the localized framework, making the proposed model much better than the baseline models.
IRACS yields the lowest F-measure, precision, recall, and accuracy and the highest false discovery rate value mainly because it uses ANOVA with cosine similarity. The usage of ANOVA with cosine similarity is novel and attempts to fill the semantic gap; however, cosine similarity is quite naive and traditional in scope. Although it computes relevance results, no global knowledge is fed into the algorithm: the algorithm learns only from the entities and features provided in the localized dataset. As a result, the IRACS model does not perform as expected.
Similarly, the MRDL model yields only mediocre accuracy, F-measure, precision, and recall when compared to the SemWIRet model. The reason is that it uses a deep learning technique that exploits both the characteristics of the image and the characteristics of the text, all trained using a convolutional neural network. The image features, however, do not provide better auxiliary knowledge for the model; rather, they tend to make it more primitive. Enriching the texts with multiple concepts does increase the accuracy of the model, but this gain is retarded by the usage of image features relative to text features. Moreover, global, auxiliary, and background knowledge is not supplemented into the model, so semantic correlation happens only with respect to the labelled concepts of the query rather than incorporating entity enrichment; that is why this model does not perform as expected.
The OBIR model does not perform as expected because, although it uses ontology and NLP techniques along with the resource description framework (RDF) and incorporates many semantic methods, the bag-of-words model it selects makes the relevance computation fail. The auxiliary knowledge remains the same, but the relevance computation model is not very rigid; as a result, there is a lag in the OBIR model. Ontologies increase the auxiliary knowledge, and the RDF knowledge is quite strong, but RDF's subject-predicate-object structure increases the complexity that has to be handled, which again causes a lag in the model.
When XGBoost and KNN are used together, the same classifier as in SemWIRet is used, but the hybrid nonetheless fails with respect to precision, recall, accuracy, and F-measure and has a higher FDR, mainly because absolutely no auxiliary knowledge is fed into the framework and relevance computation by means of a strict semantic similarity model based on varying thresholds is absent. Moreover, KNN is a very naive classifier, so using a high-end classifier alone does not suffice; a relevance computation mechanism with auxiliary knowledge is required, and its absence is the reason the proposed XGBoost with KNN does not perform as expected.
From Fig. 2, when the precision and recall curves are plotted together, the proposed SemWIRet model shows better accuracy versus the number of recommendations than the other baseline models. The curve makes it very clear that SemWIRet retains the highest precision as the number of recommendations increases, compared to the baseline models IRACS, MRDL, OBIR, and XGBoost with KNN. The second and third positions are occupied by MRDL and OBIR, respectively, and they have almost similar precision, recall, accuracy, and F-measure; the fourth position belongs to XGBoost with KNN, and the IRACS model has the lowest precision versus number of recommendations curve.
5 Conclusions
Using a semantics-infused hybrid intelligence model for Web image retrieval, a Web image recommendation system is developed. The proposed system first preprocesses labels based on the input images, and then XGBoost is used to classify documents based on the preprocessed labels. Using concept and ANOVA cosine semantic similarities, many relevance factors are integrated to obtain the top relevant query results, which are further enriched through Google's knowledge graph API, by which auxiliary knowledge is fed in much greater depth into the framework; LDA topic modelling is also used to uncover all hidden topics and enrich the query words.
Finally, the experiment is carried out on the modified flickr30k dataset, yielding a 97.16% F-measure with the lowest FDR value of 0.04 for the proposed SemWIRet model.
References
1. Xie et al (2016) Online cross-modal hashing for web image retrieval. In: Thirtieth AAAI
Conference on Artificial Intelligence
2. Chen YY (2016) The image retrieval algorithm based on colour feature. In: 7th IEEE
International Conference on Software Engineering and Service Science (ICSESS)
3. Li H (2019) A novel web image retrieval method bagging weighted hashing based on local
structure information. Int J Grid Utility Comp 11(1)
4. Liu et al (2016) Deep supervised hashing for fast image retrieval. IEEE Comp Vision Pattern
Recogn, 2064–2072
5. Gupta et al (2020) Comparative analysis of image retrieval techniques in cyberspace. Int J
Students’ Res Tech Manag
6. Liu H et al (2017) Image retrieval using fused deep convolutional features. In: ICICT
7. Wu et al (2021) Learning deep local features with multiple dynamic attentions for large-scale
image retrieval. In: ICCV
8. Hou et al (2019) V-RSIR: an open access web-based image annotation tool for remote sensing
image retrieval. In: IEEE
9. Huang C, Xu H, Xie L, Zhu J, Xu C, Tang Y (2018) Large-scale semantic web image retrieval
using bimodal deep learning techniques. Inf Sci 430:331–348
10. Vijayarajan V, Dinakaran M, Tejaswin P, Lohani M (2016) A generic framework for ontology-
based information retrieval and image retrieval in web data. HCIS 6(1):1–30
11. Sejal D, Ganeshsingh T, Venugopal KR, Iyengar SS, Patnaik LM (2016) Image recommendation
based on ANOVA cosine similarity. Procedia Comp Sci 89:562–567
12. Kaushik A, Jacob B, Velavan P (2022) An exploratory study on a reinforcement learning
prototype for multimodal image retrieval using a conversational search interface. Knowledge
2(1):116–138
13. Nair LR, Subramaniam K, PrasannaVenkatesan GKD, Baskar PS, Jayasankar T (2021) Essen-
tiality for bridging the gap between low and semantic level features in image retrieval systems:
an overview. J Ambient Intell Humaniz Comput 12(6):5917–5929
14. Kolahkaj M (2022) An image retrieval approach based on feature extraction and self-supervised
learning. In: 2022 Second International Conference on Distributed Computing and High
Performance Computing (DCHPC), pp 46–51, March. IEEE.
15. Noor J, Shanto MNH, Mondal JJ, Hossain MG, Chellappan S, Al Islam AA (2022)
Orchestrating image retrieval and storage over a cloud system. IEEE Trans Cloud Comp
16. Deepak G, Priyadarshini JS (2018) Personalized and enhanced hybridized semantic algorithm
for web image retrieval incorporating ontology classification, strategic query expansion, and
content-based analysis. Comput Electr Eng 72:14–25
17. Gulzar Z, Leema AA, Deepak G (2018) PCRS: Personalized course recommender system
based on hybrid approach. Procedia Comp Sci 125:518–524
18. Kumar A, Deepak G, Santhanavijayan A (2020) HeTOnto: a novel approach for concep-
tualization, modeling, visualization, and formalization of domain centric ontologies for heat
transfer. In: 2020 IEEE international conference on electronics, computing and communication
technologies (CONECCT), pp 1–6, July
19. Varghese L, Deepak G, Santhanavijayan A (2019) An IoT analytics approach for weather
forecasting using raspberry Pi 3 model B+. In: 2019 fifteenth international conference on
information processing (ICINPRO), pp 1–5, December. IEEE
20. Ojha R, Deepak G (2021) Metadata driven semantically aware medical query expansion. In:
Iberoamerican Knowledge Graphs and Semantic Web Conference, pp 223–233, November.
Springer, Cham
21. Rithish H, Deepak G, Santhanavijayan A (2021) Automated assessment of question quality
on online community forums. In: International Conference on Digital Technologies and
Applications, pp. 791–800. Springer, Cham
Optimal Drug Recommender
Framework for Medical Practitioners
Based on Consumer Reviews
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 479
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_37
1 Introduction
The clinical space is expanding quickly, with new medications appearing constantly. Simultaneously, with the increase in intricate and complex medication regimens, individuals often get lost, creating a sorry state that must be reconsidered to avoid the undesirable after-effects of a medication routine. In certain cases, individuals consume medications without consulting a doctor, which might become a genuine risk factor later on, particularly when certain insights about the current clinical plan or clinical records are not taken into account. Health-based records and information constitute one of the most promising research domains on the Internet, with ample funding support. According to a survey conducted by
the Pew Internet and American Life Project in 2013, 59% of Americans have looked
online for health information, with 35% of those who did so focusing on identifying
a medical issue. Approximately 40% of drug errors take place because experts provide prescriptions based on their limited sample-space experience. General observations that may contribute to these issues can be summarized as follows [1]:
i. For serious health conditions, several health centers/hospitals lack either infras-
tructure or specialized medical experts.
ii. Efficient diagnosis is mostly dependent on the expert’s expertise, particularly for
inexperienced novices who are generally more prone to commit errors.
Health records about diagnosis data in hospitals have remained untouched and
have not been mined, and this sometimes hides new discoveries with respect to new
conclusions about data values. Patients have begun discussing their clinical reports,
as well as their perspectives on the prescription, as a result of the advancement in
innovation. The result is a large amount of unstructured data that must be managed and examined in order to obtain useful information. Sentiment analysis is a well-known part of NLP that focuses on analyzing the beliefs or viewpoints of people regarding a subject. This domain has grown tremendously on microblogging sites such as Twitter, Instagram, and Flickr, where clients' opinions are extracted from text, photographs, emojis, and recordings. In any
case, this field is mostly ignored in the clinical setting. The Internet’s steady evolution
has increased the amount of client-generated data available on the Web. Patients are
increasingly uploading surveys after taking medications in order to put themselves
out there and raise public awareness.
A recommendation system is a type of information filtering system that tries to
predict how a user would evaluate or prefer a certain item. In layman’s terms, it is
an algorithm that recommends products to consumers based on their interests. For
example, on Netflix, which movie to watch, on e-commerce, which product to buy,
on Kindle, which book to read, and so on. It has many use-cases, some of which are as follows:
i. Personalized content: Enhances the on-site experience by making dynamic
recommendations for various audiences, similar to what Netflix does.
ii. Better product search experience: Helps to categorize products based on their features, e.g., material, season, etc.
A drug recommendation system helps in recommending medicines to patients with a particular condition based on previous consumer reviews and ratings. This is achieved with the help of more accurate feature selection and opinion mining from the previous experiences of consumers. Feature selection is the process of choosing, changing, and transforming raw data into features that may be used in supervised learning, while feature engineering is the process of using statistical or machine learning approaches to turn raw observations into the desired attributes. Sentiment analysis, on the other hand, is used to determine the emotional significance of communications through analytic techniques such as statistics, natural language processing, and machine learning [2–5]. The paper is arranged as follows: Section 2 discusses the motivation for designing the recommender system, Sect. 3 discusses the methodology proposed for the work, Sect. 4 presents the materials and methods employed during the work conducted, and finally, the conclusion is presented in Sect. 5.
2 Related Work
In recent years, a large volume of clinical data spread over several Websites on the
Internet makes it difficult for individuals to locate useful information for improving
their health. Furthermore, the overabundance of medical information has made
it difficult for medical professionals to make patient-centered decisions. These
concerns highlight the potential requirement of recommender systems to be used in
the healthcare industry to assist both end-users and medical professionals in making
more efficient and accurate health-related decisions.
To deal with the growing problem of online information overload, recommender systems strive to provide users with customized products and services. Since the mid-
1990s, many recommender system strategies have been presented, and a range of
recommender system software has lately been built for a variety of applications.
The majority of recommender technologies are used in the areas of e-government,
e-business, e-commerce/e-shopping, e-learning, e-tourism, and so on. However, few
recommender technologies exist in the medical field, and this work focuses on the
construction of a medicine recommender system as well as mining information from
medical case data.
Common recommendation tactics include collaborative filtering (CF), content-
based (CB), knowledge-based (KB), and hybrid recommendation systems. Each
recommendation technique comes with its own set of advantages and disadvantages:
Collaborative filtering (CF)-based recommendation systems aid people in making
decisions based on the opinions of others who share similar interests, although CF
has sparseness, scalability, and cold-start concerns. Knowledge-based (KB) recom-
mendations promote goods to users based on information about the users, products,
and/or their relationships. A functional knowledge base is often kept for KB recom-
mendations that describes how a single item satisfies a specific user’s requirement. To
increase performance and overcome the disadvantages of traditional recommenda-
tion systems, a hybrid model has been designed that employs the best sections of two
or more recommendation strategies into one hybrid strategy. A new recommendation
system is required to solve these difficulties in a new application area [6–9].
With the advent of technology, many innovations have appeared in diversified data mining techniques for recommender systems in the healthcare domain, covering diagnosis, treatment, and prognosis [10]. A collaborative filtering (CF) approach adapted from e-commerce estimates heart attack risk and outperforms existing techniques such as SVM and linear regression [11]. A model titled iCARE, supported by CF and ensemble learning, predicts patient disease risks based on disease history [12]. An incremental CF technique, W-InCF, employing the Mahalanobis distance and fuzzy membership, calculates the risk to pregnant women during the process of delivery [5]. Apart from these, various models were designed based on case-based reasoning (CBR): a recommendation model estimates different states of the diabetes condition according to features shortlisted via rough set feature reduction [13], and a CBR model can also be applied to estimate the radiation dose in cancer treatment, with parameter weights optimized by bee colony optimization [14].
Earlier work acquires lab test results in numerical form as patient features to build a similar-case analogy for predicting test cases. Accurate prediction for such problems is comparatively easy when the result is binary, benign or infected. However, diagnostic test cases, medical inference, and the treatment path in a medication context can be very complex, and no direct steps for treatment may be traceable. Conclusions on prescription must be drawn from diversified combinations of drugs; permissible combinations, diversified permutations, and the effectiveness of relationships between various drugs further increase the complexity of the medication. To date, no accurate and efficient model has been designed to support decisions recommending the perfect drug combination.
3 Proposed Methodology
The proposed work is an effort to estimate the most popular drugs among the options available in the market using NLP assisted by sentiment analysis and learning techniques. As a primary step, the designed recommender system acquires data samples from the corpus and performs data cleaning before applying the recommender model. The steps involved may be summarized as follows:
1. Data Pre-processing
Data is put together in order to transform it and to observe what appears to be normal and what appears to be abnormal. The distribution of a variable shows which values the variable takes and how often it takes those values.
The output shows the mean, standard deviation, minimum, and maximum values. It can be concluded that most of the reviews from consumers are positive. In addition, the maximum of the useful count column is extremely high compared to its other statistics. After this summarization of the dataset, a step further is taken: the analysis of useful and useless drugs is performed for better comprehension of the task. Figure 3 summarizes the rating and useful count columns for the dataset.
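This summary corresponds to a one-line describe call in pandas; the file and column names below are assumptions about the dataset's schema:

```python
# Sketch: summary statistics (mean, std, min, max, ...) for the rating and
# useful-count columns of the review dataset.
import pandas as pd

df = pd.read_csv("drug_reviews.csv")   # assumed file and column names
print(df[["rating", "usefulCount"]].describe())
```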
The next important step is to unveil interesting relationships, trends, and patterns hidden inside the dataset. This can be accomplished by determining the relationships that exist between key factors in the data sample. For this step, the following is done:
i. Check the distribution of the rating and useful count columns, depicted in Fig. 4.
ii. Check the relationship between rating and useful count.
Figure 5 suggests that a positive and almost linear relationship exists between the rating and useful count columns: the useful count increases as the rating increases.
After this, typical data preparation strategies were employed, such as verifying and eliminating null values, removing duplicate rows, deleting superfluous values, and removing noise text from rows. After that, all null values were eliminated. Then a visual analysis of features was done to gain better insight. Top 20 drugs per condition, bottom 20 drugs per condition, number of reviews per year, mean rating per year, and number of reviews per month were some aspects of the visualization [17–20].
Figure 6 depicts a graphic representation of the 10-star rating system's value counts. The vast majority of ratings fall on four values, 10, 9, 1, and 8, whose counts are nearly identical. It demonstrates that the positive side of the scale is higher than the negative, and that people's reactions are polarized.
from the review feature. After the stop-word removal process, tokenization and stemming were done. When stemming is utilized, words are reduced to their word stems; a word stem does not have to be the same as a dictionary-based morphological root, it simply needs to be the same size as or smaller than the word. The reviews were cleaned by eliminating HTML tags, punctuation, quotations, and URLs, among other things. To avoid duplication, the cleaned reviews were lowercased, and then tokenization was used to break the sentences down into little chunks called tokens. Stop words such as "a, to, all, we, with" were also eliminated from the corpus. By executing lemmatization on all tokens, the tokens were returned to their base forms.
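A minimal sketch of this cleaning chain with NLTK follows; the regex patterns and the exact stop-word list are assumptions, not the authors' published code:

```python
# Sketch of the review-cleaning chain: strip HTML/URLs/punctuation, lowercase,
# tokenize, drop stop words, and lemmatize tokens back to their base forms.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg)

STOP = set(stopwords.words("english"))
lemma = WordNetLemmatizer()

def clean_review(text: str) -> list[str]:
    text = re.sub(r"<[^>]+>|https?://\S+", " ", text)  # HTML tags and URLs
    text = re.sub(r"[^a-z\s]", " ", text.lower())      # punctuation, quotes
    tokens = nltk.word_tokenize(text)
    return [lemma.lemmatize(t) for t in tokens if t not in STOP]

print(clean_review("I <b>loved</b> this drug! See https://example.com"))
```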
2. Sentiment Analysis
A bag of words model, which converts documents into vectors and assigns a score to
each word in the document, is a popular technique for constructing sentiment analysis
models. For this step, a bag of word (BOW) model was created with TF-IDF and
count vectorizer. The bag of words model will help in generating a set of vectors
holding the count of word occurrences in the text (reviews), and the TF-IDF model
will enhance the work by including information on both the more significant and
less important words. Bag-of-words vectors are simple to understand; TF-IDF, on the other hand, frequently outperforms plain count vectors in machine learning models. The approach is to give low weight to terms that appear frequently across the dataset, implying that TF-IDF measures relevance rather than recurrence. The probability of finding a word in a document is known as the term frequency (TF):

$$\mathrm{TF}(t, d) = \frac{\text{number of occurrences of } t \text{ in } d}{\text{total number of terms in } d} \quad (1)$$

The inverse document frequency (IDF) is the logarithm of the inverse fraction of documents in which a phrase appears; it detects how document-specific a particular phrase is:

$$\mathrm{IDF}(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \quad (2)$$

where N is the total number of documents in the corpus D and |{d ∈ D : t ∈ d}| is the number of documents containing the term t.
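In scikit-learn terms, the BOW and TF-IDF representations described above look like this (variable names and the two sample reviews are illustrative):

```python
# Sketch: bag-of-words counts plus TF-IDF weighting for the cleaned reviews.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

reviews = ["great drug no side effects", "terrible drug bad side effects"]

bow = CountVectorizer().fit_transform(reviews)      # raw word counts
tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(reviews)
print(bow.shape, tfidf.shape)   # documents x vocabulary-size matrices
```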
3. Model Building
The data modeling for this work was done with Naïve Bayes and random forest classifiers, using the TF-IDF vectorizer. The term frequency-inverse document frequency text vectorizer converts the text into a usable vector by combining term frequency (TF) with document frequency (DF). The term frequency refers to the number of times a term appears in a document. This is a typical approach for converting text into a meaningful numerical representation, which is then used to fit a machine learning algorithm for prediction.
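A sketch of this modeling step, pairing TF-IDF features with Naïve Bayes and a random forest; load_reviews() is a hypothetical helper, and the hyperparameters are assumptions:

```python
# Sketch: TF-IDF features fed to Naive Bayes and random forest classifiers.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

texts, labels = load_reviews()   # assumed helper returning parallel lists
X = TfidfVectorizer(max_features=20000).fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2)

for model in (MultinomialNB(), RandomForestClassifier(n_estimators=200)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```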
Looking at the distribution of useful count in Fig. 7, the difference between the smallest and largest values is roughly 1300, which is significant. The idea is that the more a medication is searched for, the more people read the review, regardless of whether the review is positive or negative, increasing the useful count. As a result, we normalized useful count by condition while developing the recommender system. The work presents a simple technique for extracting opinion from the sample data: the acquired data was subjected to classification to develop a recommender system, consumer reviews were analyzed using the VADER tool and an NLP-based sentiment analysis algorithm, and the model was verified using a random forest classifier that achieved 82.85% accuracy, which is on par with existing techniques. Future work involves comparing different oversampling techniques, using different values of n-grams, and optimizing the algorithms to improve the performance of the recommender system.
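The VADER scoring mentioned above can be sketched as follows; the 0.05 cut-off on the compound score is a common convention, assumed here rather than prescribed by the paper:

```python
# Sketch: VADER polarity scores for a consumer review, thresholded on the
# compound score to obtain a positive/negative label.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()

score = sia.polarity_scores("This medication worked wonders for my migraines.")
label = "positive" if score["compound"] >= 0.05 else "negative"
print(score, label)
```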
5 Conclusion
The clinical space is expanding quickly, with new medications appearing constantly, and with increasingly intricate medication regimens individuals often get lost, a situation that must be reconsidered to avoid the undesirable after-effects of a medication routine. In certain cases, individuals consume medications without consulting a doctor, which might become a genuine risk factor later on, particularly when insights about the current clinical plan or clinical records are not taken into account. The principal aim of drug analysis is to serve humans by freeing them from potential sickness or preventing infection. For a medication to fulfil its planned purpose, it ought to be free from pollutants or other interference that may harm people. This is simply the background understanding of the domain.
The proposed work incorporated the best-predicted result of each approach into the recommendation framework; a good ensemble of the different expected findings is required for improved results and understanding. The work presented simple methods for extracting sentiment from the data and subjected the data to classification to develop an optimal recommender system: consumer reviews were analyzed using the VADER tool and an NLP-based sentiment analysis algorithm, and the model was verified using a random forest classifier that achieved 82.85% accuracy, on par with existing techniques. Future work involves comparing different oversampling techniques, using different values of n-grams, and optimizing the algorithms to improve the performance of the recommender system.
References
1. Fox S (2013) Health Online 2013. Pew Internet & American Life Project, January. https://
www.pewresearch.org/internet/2013/01/15/information-triage
2. McInnes DK, Saltman DC, Kidd MR (2006) General practitioners’ use of computers for
prescribing and electronic health records: results from a national survey. Med J Aust 185:88
3. Ting S, Kwok SK, Tsang AH, Lee W (2011) A hybrid knowledge-based approach to supporting
the medical prescription for general practitioners: Real case in a Hong Kong medical center.
Knowl-Based Syst 24:444–456
4. Esfandiari A, Babavalian MR, Moghadam A-ME, Tabar VK (2014) Knowledge discovery in
medicine: current issue and future trend. Expert Syst Appl 41:4434–4463
5. Lu X, Huang Z, Duan H (2012) Supporting adaptive clinical treatment processes through
recommendations. Comput Methods Programs Biomed 107:413–424
6. Guo WY (2008) Reasoning with semantic web technologies in ubiquitous computing
environment. J Soft 3(8):27–33
7. Hamed AA, Roose R, Branicki M, Rubin A (2012) T-Recs: time-aware Twitter-based drug recommender system. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
8. IBM (2017) IBM Watson health. http://www.ibm.com/watson/health/
9. Kitchenham B, Charters C (2007) Guidelines for performing systematic literature reviews in
software engineering. 2007 Joint Report—EBSE 20 07-0 01
10. Hassan S, Syed Z (2010) From Netflix to heart attacks: collaborative filtering in medical
datasets. In: Proceedings of the 1st ACM International Health Informatics Symposium, pp
128–134
11. Davis DA, Chawla NV, Christakis NA, Barabási A-L (2010) Time to CARE: a collaborative
engine for practical disease prediction. Data Min Knowl Disc 20:388–415
12. Komkhao M, Lu J, Zhang L (2012) Determining pattern similarity in a medical recommender
system. In: Xiang Y, Pathan M, Tao X, Wang H (eds) Data and knowledge engineering. Springer,
Berlin, pp 103–114
13. Teodorovi D, Šelmi M, Mijatovi-Teodorovi L (2013) Combining case-based reasoning with
Bee Colony Optimization for dose planning in well differentiated thyroid cancer treatment.
Expert Syst Appl 40:2147–2155
14. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC et al (2010) Mayo
clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component
evaluation and applications. J Am Med Inform Assoc 17:507–513
15. Kushwaha N, Goyal R, Goel P, Singla S, Vyas OP (2014) LOD cloud mining for prognosis
model (case study. native app for drug recommender system), AIT 04, 03, 20–28
16. Mahmoud N, Elbeh H (2016) IRS-T2D. individualize recommendation system for type 2
diabetes medication based on ontology and SWRL. In: Proceedings of the 10th International
Conference on Informatics and Systems—INFOS ‘16. ACM Press, New York, New York,
USA, 203–209. https://doi.org/10.1145/2908446.2908495
17. Medvedeva O, Knox T, Paul J (2007) DiaTrack. Web-based application for assisted decision-
making in treatment of diabetes. J Comp Sci Colleges 23(1):154–161
18. Protégé (2016) Protégé. http://protege.stanford.edu/. Accessed 15 March 2017
19. Rodríguez A, Jiménez E, Fernández J, Eccius M, Gómez JM, Alor-Hernandez G, Posada-
Gomez R, Laufer C (2009) SemMed: applying semantic web to medical recommendation
systems. In: 2009 First International Conference on Intensive Applications and Services
20. Sun L, Liu C, Guo C, Xiong H, Xie Y (2016) Data-driven Automatic Treatment Regimen
Development and Recommendation. In: KDD’16 Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, 1865–1874
21. Zhang Q, Zhang G, Lu J, Wu D (2015) A framework of hybrid recommender system for
personalized clinical prescription. In: 2015 International Conference on Intelligent Systems
and Knowledge Engineering
22. Zhang Y, Zhang D, Hassan MM, Alamri A, Peng L (2015) CADRE. cloud-assisted drug
recommendation service for online pharmacies. Mobile Netw Appl 20(3):348–355
23. Doulaverakis C, Nikolaidis G, Kleontas A, Kompatsiaris I (2014) Panacea, a semantic-enabled
drug recommendations discovery framework. J Biomed Semantics 5:13
Development of Intelligent Framework
for Early Prediction of Diabetic
Retinopathy
Abstract Diabetes affects blood vessels throughout the body, particularly those in the kidneys and eyes. Diabetic retinopathy (DR) is a condition in which the blood vessels in the eyes get damaged as a result of diabetes. It is a severe public health problem and one of the leading causes of blindness across the world: a kind of microvascular disease that can affect diabetics, causing vision problems and possibly resulting in blindness. If a diabetic person is not treated for a long time, diabetic retinopathy is more likely to develop, and only once it has progressed does it cause symptoms; in the early stages, diabetics may be unaware that they have the disease. Clinically, DR is diagnosed by direct examination or through imaging techniques like fundus photography or optical coherence tomography. Various methodologies have been proposed by different researchers for the prediction of DR in diabetic patients, but the prediction of DR in the early stages has not been covered broadly in the literature. The focus of this research is to design an intelligent framework for the prediction of DR in the early stages in diabetic patients.
1 Introduction
Diabetes is a condition that affects blood vessels all over the body, especially those in
the kidneys and eyes [1]. Diabetic retinopathy (DR) is a disorder in which the blood vessels of the eye are compromised. Diabetic retinopathy is a serious public health
issue and one of the primary causes of blindness worldwide. Diabetic retinopathy is
a microvascular problem that can develop in diabetic people [2]. Diabetic retinopathy
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 491
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_38
impairs vision and can lead to blindness. If a diabetic individual is not treated for a
long time, he or she is more likely to develop diabetic retinopathy [3]. Only the later stages of diabetic retinopathy become symptomatic, and diabetic people may not be aware
that they have been infected with the illness in the early stages. In the clinic, DR is
diagnosed by examining directly at the retinal fundus or through imaging techniques
such as fundus photography or optical coherence tomography [4].
The word “retinopathy” refers to damage to the retina in general. DR arises when
the tiny blood arteries that supply the retina’s tissue and nerve cells are damaged.
There are generally no early warning signals of diabetic retinopathy [5]. It can only
be identified with a thorough eye exam that searches for early indications of the
illness, such as [6]:
• Macular edema (swelling).
• Pale, fatty deposits on the retina.
• Damaged nerve tissue.
• Any changes to the retinal blood vessels.
By 2040, diabetes will affect an estimated 600 million people, with one-third of
them having diabetic retinopathy (DR), the leading cause of vision loss in working-
age adults around the world [7]. Mild non-proliferative DR (NPDR) is a kind of DR
in which microaneurysms are present in the early stages. Proliferative DR (PDR) is a
more advanced kind of DR that can result in significant vision loss. Regular DR tests
are essential in order to receive quick treatment and avert vision loss [8]. Glycemia
and blood pressure control can help to slow the progression of DR, whereas late-
stage treatments like as photocoagulation or intravitreal injection can help to avoid
vision loss [9]. Although many professional associations urge routine DR screening,
thorough DR screening is not routinely used due to difficulties in finding human
assessors [10]. Imaging investigations such as fluorescent angiography and optical
coherence tomography (OCT) can aid in the identification and
treatment of diabetic eye abnormalities. To address abnormalities in the eyes caused
by diabetes, retina specialists employ medications, laser treatments, or surgery [11].
The earlier diabetes-related abnormalities are detected, the better the chances of
keeping vision. The optometrist will have the best chance of diagnosing this and all
other issues with an OCT extended eye examination. Getting your eyes tested at least
once a year might help you avoid losing your eyesight due to diabetes [12]. Main-
taining tight blood sugar and blood pressure management, as well as not smoking,
will help lower the risk of diabetic eye abnormalities. The diagnosis of DR has so far been considered mainly in the later stages in diabetic patients, and hence these techniques are inefficient for detecting DR in the early stages [13]. There are two stages of DR in diabetic patients, and most research has been on the later stage of DR; both of these stages ultimately and gradually result in blindness in diabetic patients [14].
If automatic DR screening is available, it relieves doctors of a significant amount
of effort [15]. With the adoption of automatic DR screening, the ratio of patients
to doctors will drop, saving time and money and increasing the effective use of
existing resources [16]. Another advantage of introducing automatic DR screening
Fig. 1 Images of the retinal fundus at various stages of diabetic retinopathy. a stage II: mild non-
proliferative diabetic retinopathy; b stage III: moderate non-proliferative diabetic retinopathy; c
stage IV: severe non-proliferative diabetic retinopathy; d stage V: proliferative diabetic retinopathy
[18]
is that patients who live in rural places and do not have access to medical facilities
can be treated through telemedicine [17]. We aim to find an efficient and accurate
technique for DR detection that takes less computational time and gives better results
than existing methods (Fig. 1).
2 Literature Review
• Shankar et al. [19] provided another method for preparing fundus pictures by
utilizing histogram-based segmentation to remove areas with lesions. This paper
used the synergic DL (SDL) model as a classification step, and the findings showed
that the given SDL model outperforms common DCNNs on the MESSIDOR-1
database in terms of ACC, SE, and SP.
• Arcadu et al. [20] provided a method to predict DR progression, defined as a 2-step worsening on the early treatment diabetic retinopathy scale. The paper used DCNNs on 7-field-of-view (FOV) images of the RIDE and RISE datasets, achieving a specificity of 77% and a sensitivity of 66%.
• Wang et al. [21] assessed the severity stages of diabetic retinopathy by integrating fundus fluorescein angiography and color fundus images simultaneously, extracting 6 features using the curvelet transform, and feeding them into a multitasking deep learning framework. The recommended approach was evaluated on two independent testing sets using the quadratic weighted Cohen's kappa coefficient, receiver operating characteristic analysis, and precision-recall analysis.
• Ali et al. [22] provided a method for segmentation and classification of DR. Four types of features, histogram (H), wavelet (W), co-occurrence matrix (COM), and run-length matrix (RLM), were extracted for texture analysis, and several ML classifiers achieved classification accuracies of 77.67%, 80%, 89.87%, and 96.33%, respectively. A data fusion technique was utilized to create a fused hybrid feature dataset to increase classification accuracy.
• Bora et al. [23] provided a deep learning system to predict the development of
DR in patients.
• Gangwar and Ravi [24] provided a novel deep learning hybrid method for automatic DR detection. The suggested model was tested on the MESSIDOR-1 diabetic retinopathy dataset and the APTOS 2019 blindness detection dataset (Kaggle). Their model outperformed previously reported findings, attaining test accuracies of 72.33% and 82.18% on the MESSIDOR-1 and APTOS datasets, respectively.
3 Comparative Analysis

Table 1 Comparative analysis of deep learning approaches for DR detection

| Author | Method used | Dataset | Accuracy | Specificity | Remarks |
|---|---|---|---|---|---|
| Shankar et al. [19] | DCNN: histogram-based segmentation + SDL | MESSIDOR-1 | 99.28% | 99.38% | Improved the efficiency and accuracy of categorization by creating architectural improvements to an existing DCNN |
| Wang et al. [21] | DCNN: multitask network using channel-based attention blocks | Shenzhen, Guangdong, China | Not applicable | Not applicable | Another algorithm can be employed |
| Ling et al. [5] | ResNet/RCNN | Private dataset | 82% | 81.3% | Another CNN algorithm might be employed |
| Ali et al. [22] | ML: SMO, Lg, MLP, LMT employed on selected post-optimized hybrid feature datasets | Bahawal Victoria Hospital, Pakistan | MLP: 73.73%; LMT: 73.00%; SLg: 73.07%; SMO: 68.60%; Lg: 72.07% | Not applicable | Specificity can be added |
| Bora et al. [23] | DCNN: Inception v3 | EyePACS | Not applicable | Not applicable | Specificity and accuracy not mentioned |
| Gangwar et al. [24] | DCNN: Inception-ResNet v2 | APTOS 2019, MESSIDOR-1 | APTOS: 82.18%; MESSIDOR-1: 72.33% | Not applicable | |
| Mashal et al. [25] | DCNN: customized highly nonlinear scale-invariant network | EyePACS | 85% | 91% | Improved the efficiency and accuracy of categorization by creating architectural improvements to an existing CNN; DR reduces the number of steps in color fundus imaging |
Here, true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) denote the confusion-matrix counts on which the reported accuracy and specificity figures are based.
Table 1 (continued)

Author | Method used | Dataset | Accuracy | Specificity | Remarks
Mashal et al. [25] | DCNN: customized highly nonlinear scale-invariant network | EyePACS | 85% | 91% | They improved the efficiency and accuracy of categorization by creating architectural improvements to an existing CNN; reduces the number of steps in color fundus imaging

All of the studies presented in this paper used deep learning approaches to control the diabetic retinopathy screening system. Due to a growth in the number of diabetic patients, the demand for efficient diabetic retinopathy screening equipment has lately become a major concern. Using DL for DR detection and classification solves the challenge of picking trustworthy features for ML; nevertheless, it requires a large
amount of data to train. To enhance the number of photos and to overcome overfitting during the training step, most research employed data augmentation. To address the difficulty of data size and to evaluate the DL approaches, 74% of the research included in this paper used public datasets, while 26% used a mix of two or more public datasets, as shown in Fig. 2.
5 Proposed Framework
From Sects. 2 and 3, it is obvious that the algorithm's efficiency and accuracy vary significantly with the dataset employed. Various approaches for the prediction of DR in diabetic patients have been presented by various researchers; however, early-stage prediction of DR has not been depicted comprehensively in the literature. The limitation of those techniques is that they detect DR only in its later stages. To overcome these limitations, an intelligent framework for the early prediction of DR is proposed in this paper.
Various key components involve data collection, data preprocessing, prediction
model, and results.
Data collection: The data is the most important aspect of any study. We used the Kaggle dataset comprising 35,126 color fundus pictures, each with a size of 3888 × 2951 pixels. It comprises photographs from a variety of classes based on the severity of the disease.
Fig.: Distribution of the dataset across the different severity classes.
Preprocessing: The dataset is first put through a preprocessing step whose aim is to enhance the images. The acquired images are enhanced so that the data can better support the subsequent prediction stages.
6 Conclusion
The prediction of DR in the early stages has not been depicted comprehensively in the literature. The primary goal of this work is to provide a short review of
the existing DR detection strategies that employ retina data. As a result, the current
study is an essential step toward the development of an intelligent diagnostic model
that must be adopted soon for the early and accurate diagnosis of DR, which will be
a great aid to the research community and health professionals in designing better
medicine. The focus of this research is to design an intelligent framework for the
prediction of DR in the early stages of diabetic patients.
References
17. Mohammad et al (2020) Exudate detection for diabetic retinopathy using convolutional neural networks. Article ID 5801870. https://doi.org/10.1155/2020/5801870
18. Rishab et al (2017) Automated identification of diabetic retinopathy using deep learning. 124(7):962–969
19. Shankar K, Sait ARW, Gupta D, Lakshmanaprabu S, Khanna A, Pandey HM (2020) Automated
detection and classification of fundus diabetic retinopathy images using synergic deep learning
model. Pattern Recognit Lett 133:210–216
20. Arcadu F, Benmansour F, Maunz A, Willis J, Haskova Z, Prunotto M (2019) Deep-learning
algorithm predicts diabetic retinopathy progression in individual patients. NPJ Digit Med 2:92
21. Wang J, Bai Y, Xia B (2020) Simultaneous diagnosis of severity and features of diabetic
retinopathy in fundus photography using deep learning. IEEE J Biomed Health Inform 24:3397–
3407
22. Ali A, Qadri S, Mashwani WK, Kumam W, Kumam P, Naeem S, Goktas A, Jamal F, Chesneau
C, Anam S et al (2020) Machine learning-based automated segmentation and hybrid feature
analysis for diabetic retinopathy classification using fundus image. Entropy 22:567
23. Bora A, Balasubramanian S, Babenko B, Virmani S, Venugopalan S, Mitani A, Marinho GDO,
Cuadros J, Ruamviboonsuk P, Corrado GS et al (2021) Predicting the risk of developing diabetic
retinopathy using deep learning. Lancet Digit Health 3:e10–e19
24. Gangwar AK, Ravi V (2020) Diabetic retinopathy detection using transfer learning and deep
learning. In: Evolution in computational intelligence. Springer, Singapore, pp 679–689
25. Mashal et al (2017) Detecting diabetic retinopathy using deep learning. IEEE
Healthcare Data Analysis Using
Proposed Hybrid Harmony Genetic
Diagnostic Model
1 Introduction
Proper healthcare data analysis is essential because an inaccurate medical decision sometimes leads to an intolerable loss of lives. Analysis of healthcare data usually helps medical professionals in the early diagnosis of chronic diseases.
Several diagnostic models, expert systems, and medical decision systems have been
described in the literature for the analysis of data from the healthcare domain as
well as early diagnosis of diseases. Evolutionary and swarm intelligence algorithms
are nature-inspired algorithm that mimic the process of natural phenomenon. In the
biomedical field, these algorithms are emerging as a field of research for solving the
optimization problems [1] (like clustering, classification, and prediction).
Clustering is an unsupervised classification or data analysis technique that assembles data items in such a way that items within a group are similar to one another and dissimilar from items in other groups [2]. Numerous evolutionary and swarm-
based diagnostic models have been proposed by researchers for the early diagnosis
of a disease. Nevertheless, accuracy and precision are still challenging issues, exclu-
sively for healthcare datasets. Hence, there is a requirement of designing an effi-
cient intelligent system that can assist humans to make accurate judgements. The
key contribution of this paper is to develop an effective diagnostic model using the
concept of hybridization in harmony search using genetic algorithm operators.
Harmony search (HS) algorithm developed by Geem et al. [3] is a well-known
metaheuristic technique inspired by the improvisation concept of musicians. It is
gaining a wide range of popularity among researchers in different application areas
like medical, agriculture, scheduling, image processing, communication system,
etc. [4]. Effectiveness of a metaheuristic algorithm depends on the proper balance
between its exploration and exploitation phases. Imbalance in diversification and
intensification phases sometimes leads to a trap in local optima that result in the
problem of premature convergence. In this paper, before employment of harmony
search algorithm as a diagnostic model, some improvements have been incorpo-
rated in the traditional algorithm by incorporating genetic operators in the harmony
improvisation step along with parameter tuning. The proposed diagnostic algorithm
is termed as hybrid harmony search genetic algorithm (HHSGA). The efficiency
of this proposed model is evaluated by implementing it on four real-life datasets
of the medical domain from the UCI repository [5]. Moreover, the effectiveness of
the proposed algorithm is also checked by comparing it with other state-of-the-art
metaheuristic algorithms like genetic algorithm (GA), particle swarm optimization
(PSO), and HS.
The remaining sections of this paper are as follows: Section 2 presents the basic
harmony search algorithm and related work. Section 3 provides the application of the
proposed hybrid harmony genetic search algorithm as a diagnostic model. Section 4
covers the experimentation analysis, and simulation results followed by a statistical
analysis of the algorithm. Section 5 discusses the conclusion and future aspects.
2 Background
This section gives an overview of work related to diagnosis of diseases along with
brief overview and various issues related to harmony search.
Al-Muhaideb and Menai [6] discussed various approaches related to prognosis and
diagnosis of disease with medical data using metaheuristic algorithms. The study
includes various learning models, an appropriate selection of metaheuristic tech-
niques as well as performance indicators. Extraction of overlapped data and struc-
ture discovery in medical datasets is a difficult process. Khanmohammadi et al. [7]
designed a hybrid technique in 2017 by combining k-harmonic means and over-
lapping k-means algorithms (KHM-OKM) to resolve the above-mentioned problem.
The resulting outcomes confirm the effectiveness of the proposed hybrid approach
in extracting overlapped data and also resolving sensitivity problems. The Cuckoo
search algorithm was adopted by Gadekallu and Khare [8] for feature reduction by
using a rough-set-based approach for heart disease datasets. The results showed the
effectiveness of the proposed approach as compared to other algorithms.
Accuracy is one of the most important metrics in medical diagnosis. Noureddine et al. [9] addressed the accuracy issues of most medical datasets by using a symbiotic organism search mechanism, and the proposed model significantly reduced the death rate through earlier treatment of diseases. Inaccuracy in any diagnosis
model is mainly due to improper parameter setting and feature selection. These
disputes are resolved by Wang and Chen [10] using chaotic whale optimization
algorithm for medical datasets. Kaur and Kumar [1] designed a new diagnostic model
using water wave optimization algorithm. Furthermore, the premature convergence
issue in the basic algorithm was also resolved using a decay operator. The proposed
model achieved higher accuracy as compared to other techniques.
In the HS analogy, each decision variable provides a value for generating a global optimum solution. Just like musicians, the harmony search algorithm undergoes a harmony improvisation phase in each iteration by changing the values of the decision variables. The basic algorithm
consists of the following steps:
Step 1: Initialize the objective function and parameters: Define the objective
function of the problem, i.e., to minimize (or maximize) the fitness function f (X ).
Furthermore, the values of control parameters of the algorithm like harmony memory
consideration rate (HMCR), harmony memory size (HMS), pitch adjustment rate
(PAR), bandwidth (BW), and maximum number of iteration (MaxIt) or stopping
criteria are also initialized.
Step 2: Initialization of harmony memory: The algorithm starts with the random initialization of harmony memory (HM) through random harmony vectors $HM_{ij}$, where $j$ indexes the decision variables of the $i$th harmony vector:

$$HM_{ij} = LB_j + \mathrm{rand}(0,1) \times \left(UB_j - LB_j\right) \qquad (1)$$

where $i \in \{1, \ldots, \mathrm{HMS}\}$, $j \in \{1, \ldots, d\}$, and $d$ is the dimension of the problem. UB and LB are the upper and lower bounds of the variables, respectively.
Step 3: Harmony improvisation: To generate a new pitch and a better harmony, HS
is governed by three rules. In the first rule, a new pitch is randomly selected from
the existing pitches in harmony memory. In the second rule, the algorithm selects an
existing random pitch from the memory and fine-tunes it through a random bandwidth
(BW) that specifies its variations. Lastly, in the third rule, the algorithm generates
a new pitch. HMCR is a probabilistic parameter that decides whether to apply the
first two rules or the third one for improvement of the pitch. Similarly, PAR is also
a probabilistic parameter that governs the selection between the first and the second
rule for pitch generation. The detailed procedure is given in Algorithm 1.
Step 4: Harmony memory updation: New harmony vector generated after the above
steps is compared with the worst harmony vector. If the new one is superior to the
compared harmony, then it is replaced by the worst one in harmony memory.
Step 5: Check termination criteria: Repeat steps 3 and 4 until the improvisation
stopping criteria are reached. Otherwise, stop the algorithm and get the optimal
solution.
If f(X^new) < f(X^worst) then
    X^worst = X^new and f(X^worst) = f(X^new)
End If
Itr = Itr + 1
End While
Return the global optimal harmony vector
End
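A minimal runnable sketch of Steps 1 to 5 is given below; the objective function, bounds, and parameter values are illustrative assumptions rather than settings taken from this paper.

```python
# Minimal harmony search following Steps 1-5; objective, bounds, and
# parameter values here are illustrative assumptions, not the paper's.
import numpy as np

def harmony_search(f, lb, ub, hms=10, hmcr=0.9, par=0.3, bw=0.05, max_it=1000):
    rng = np.random.default_rng(0)
    d = len(lb)
    # Step 2: random initialization of harmony memory (Eq. 1).
    hm = lb + rng.random((hms, d)) * (ub - lb)
    fit = np.array([f(x) for x in hm])
    for _ in range(max_it):                      # Step 5: stopping criterion
        new = np.empty(d)
        for j in range(d):                       # Step 3: improvisation
            if rng.random() < hmcr:              # rule 1: memory consideration
                new[j] = hm[rng.integers(hms), j]
                if rng.random() < par:           # rule 2: pitch adjustment via BW
                    new[j] += bw * (2.0 * rng.random() - 1.0)
            else:                                # rule 3: random new pitch
                new[j] = lb[j] + rng.random() * (ub[j] - lb[j])
        new = np.clip(new, lb, ub)
        f_new = f(new)
        worst = fit.argmax()                     # Step 4: memory update
        if f_new < fit[worst]:
            hm[worst], fit[worst] = new, f_new
    return hm[fit.argmin()]

# Usage: minimize the 2-D sphere function.
best = harmony_search(lambda x: float(np.sum(x**2)),
                      np.array([-5.0, -5.0]), np.array([5.0, 5.0]))
```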
This paper introduces a novel improved hybrid HS algorithm that uses some operators of the GA during the harmony improvisation and new harmony generation phases of HS. The new algorithm is named the hybrid harmony search genetic algorithm (HHSGA), a hybrid of both HS and GA.
Improvement 1: During the pitch adjustment stage, the value of the BW parameter
is generally taken as constant by most of the researchers. However, its value affects
the modification of the existing harmony vector selected from the harmony pool.
In the proposed approach, the value of BW is initially set according to the maximum and minimum values of the attributes in the datasets and is then gradually changed after each iteration (Eqs. 2 and 3).
Improvement 2: In standard HS, the new harmony generation step follows a random selection mechanism for generating a new pitch. Consequently, at every iteration when a new harmony is generated, the algorithm follows a random search procedure for selecting a new harmony, which enhances its exploration capability. At the same time, however, the algorithm needs to deploy its exploitation capability as well. To overcome this shortcoming, a new hybrid approach is proposed that uses the polygamous selection procedure [12] for selecting a harmony vector from the harmony memory and then improves the exploitation capability by applying an arithmetic recombination operator between the selected and the worst harmony vectors. If the offspring harmony is better than the parent harmony, it replaces the worst harmony in the harmony pool. The pseudo-code of the recombination operator is given as follows:
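The original pseudo-code is not reproduced in the text; the following sketch conveys the idea under stated assumptions: a simple fitness-proportionate pick stands in for the polygamous selection of [12], and alpha is an assumed mixing weight, since Eqs. 5 and 6 are not shown.

```python
# Sketch of Improvement 2. A fitness-proportionate pick stands in for the
# polygamous selection of [12]; alpha is an assumed mixing weight because
# Eqs. 5 and 6 are not reproduced here.
import numpy as np

rng = np.random.default_rng(1)

def select_parent(hm, fit):
    # Lower fitness (a better harmony) receives a higher selection probability.
    w = fit.max() - fit + 1e-12
    return hm[rng.choice(len(hm), p=w / w.sum())]

def recombine(parent, worst, alpha=0.6):
    # Arithmetic recombination: two offspring on the segment between parents.
    c1 = alpha * parent + (1.0 - alpha) * worst
    c2 = alpha * worst + (1.0 - alpha) * parent
    return c1, c2
```

If an offspring evaluates better than its parents, it replaces the worst harmony in the pool, as described above.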
In this paper, the proposed HHSGA approach is used as a clustering technique for
the classification of datasets for the diagnosis of a disease. In the proposed approach,
each harmony vector is a sequence of real numbers that represents the cluster centers
(K). In d-dimensional space, harmony vector is represented as (K*d). Consider a
dataset having k = 2 and d = 3. Then, each harmony vector is displayed as follows:
Here, the min term represents the distance measurement between data point p_m and cluster center c_j (Eq. 7). The proposed HHSGA clustering technique is given as follows:
Begin
Randomly initialize the harmony vectors of HM with k cluster centers. Calculate the distance of each data point from each cluster center using Eq. 7. Assign data objects to the cluster with the minimum value of the fitness function.
While (Itr < MaxIt)   /* apply improved harmony search to generate new cluster centers */
    For j = 1 to d
        If (rand(0,1) < HMCR) then
            X_j^new = X_j^m, where m is a random harmony from the pool
            If (rand(0,1) < PAR) then
                Improve X_j^new using the modified value of BW (Eqs. 2 and 3)
            End If
        Else
            Select X_j^p using the selection operator (Eq. 4)
            Apply the recombination operator between X_j^p and X_j^worst to generate offspring X_j^c1 and X_j^c2 (Eqs. 5 and 6)
        End If
    End For
    Calculate the fitness of X_j^c1 and X_j^c2
    If f(X_j^c1, X_j^c2) < f(X_j^p, X_j^worst) then
        Replace the parents with the offspring
    End If
    Evaluate the new cluster centers and replace the existing ones
    Itr = Itr + 1
End While
Partition the data points of the dataset using the global cluster centers.
Return the global best solution and labeled data points.
End
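As a minimal sketch of this encoding, a harmony vector of length K × d is reshaped into K cluster centers and scored by the summed distance of each data point to its nearest center; this stands in for Eq. 7, which is not reproduced in the text.

```python
# Harmony-vector encoding for clustering: reshape a flat vector into K
# centers and score it by the total point-to-nearest-center distance
# (a stand-in for Eq. 7, which is not reproduced in the text).
import numpy as np

def clustering_fitness(harmony, data, k):
    centers = harmony.reshape(k, data.shape[1])
    # Distance of every point to every center; assign each point to the nearest.
    dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    return dist.min(axis=1).sum(), labels

# Example with k = 2 clusters in d = 3 dimensions, as in the text.
rng = np.random.default_rng(2)
fitness, labels = clustering_fitness(rng.random(2 * 3), rng.random((100, 3)), k=2)
```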
Many diagnostic models have been proposed by researchers for disease diagnosis.
In healthcare datasets, the main aim of the diagnostic model is to maintain a high
level of accuracy in the proper classification of datasets among different clusters.
Henceforth, there is a requirement to design an intelligent diagnostic system that assists humans in the proper diagnosis of diseases and can produce precise results. In
this paper, the proposed hybrid HHSGA algorithm is used as a diagnostic model
for efficient classification of datasets and for generating the optimal clusters. The
proposed model is shown in Fig. 1.
Fig. 1 Proposed diagnostic model: upload dataset, data preprocessing (eliminate class labels), clustering via the improved harmony search (harmony memory selection using polygamous selection, recombination operator), generation of clusters, and disease diagnosis and evaluation (healthy/unhealthy)
4 Experimental Analysis
See Table 1.
To evaluate the performance of the model, the proposed algorithm is compared with
other state-of-the-art algorithms like GA, PSO, and HS using four cluster quality
measures, namely precision, recall, accuracy, and G-measure. These metrics are
mathematically described as follows:
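A minimal sketch, assuming the standard definitions of the four measures and taking the G-measure as the geometric mean of precision and recall:

```python
# Cluster-quality measures from a confusion matrix, assuming the standard
# definitions; the G-measure is taken as the geometric mean of precision
# and recall.
import math

def quality_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    g_measure = math.sqrt(precision * recall)
    return precision, recall, accuracy, g_measure

print(quality_metrics(tp=60, fp=15, fn=20, tn=55))
```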
Results in Tables 2, 3, 4, and 5 show that the proposed HHSGA model provides more significant results than the other models. It is clear from the results that the proposed HHSGA diagnostic model provides a higher level of accuracy
than the other techniques. G-measure is also an important indicator for checking the
diagnostic performance of a model. It is clear from the simulation results that the
proposed HHSGA model provides a better G-measure value for the proper diagnosis
of a disease. It represents the true positive rate in the diagnosis of a disease. From the observations, it is clear that the proposed model has better precision and recall values, which makes it more efficient and robust than the compared models in diagnosing diseases.
Table 2 Simulation results of different metaheuristic models for the Bupa (LD) dataset

Algorithm | Precision mean (SD) | Recall mean (SD) | Accuracy mean (SD) | G-measure mean (SD)
GA | 0.5740 (0.0028) | 0.5447 (0.0020) | 0.5147 (0.0020) | 0.5593 (0.0022)
PSO | 0.5680 (0.0158) | 0.5405 (0.0116) | 0.4872 (0.0153) | 0.5541 (0.0137)
HS | 0.5744 (0.0022) | 0.5459 (0.00174) | 0.4944 (0.0020) | 0.5600 (0.0019)
HHSGA | 0.6237 (1.17e−16) | 0.5767 (1.17e−16) | 0.5246 (1.17e−16) | 0.5998 (0.00)
Table 3 Simulation results of different metaheuristic models for the breast cancer dataset

Algorithm | Precision mean (SD) | Recall mean (SD) | Accuracy mean (SD) | G-measure mean (SD)
GA | 0.9607 (0.0032) | 0.9495 (0.0089) | 0.9593 (0.0055) | 0.9550 (0.0060)
PSO | 0.9597 (0.0051) | 0.9480 (0.0127) | 0.9581 (0.0081) | 0.9638 (0.0089)
HS | 0.9630 (0.0013) | 0.9568 (0.0023) | 0.9636 (0.0016) | 0.9599 (0.0018)
HHSGA | 0.9640 (1.17e−16) | 0.9584 (1.17e−16) | 0.9648 (1.17e−16) | 0.9612 (0.00)
Table 4 Simulation results of different metaheuristic models for the thyroid dataset

Algorithm | Precision mean (SD) | Recall mean (SD) | Accuracy mean (SD) | G-measure mean (SD)
GA | 0.7209 (0.0198) | 0.6664 (0.0420) | 0.6348 (0.0796) | 0.6930 (0.0303)
PSO | 0.6327 (0.1598) | 0.3643 (0.0387) | 0.6553 (0.1120) | 0.6407 (0.0629)
HS | 0.7269 (0.0655) | 0.7211 (0.0215) | 0.6753 (0.0737) | 0.7308 (0.0425)
HHSGA | 0.7385 (0.0249) | 0.7236 (0.0035) | 0.6883 (0.0014) | 0.7239 (0.0143)
Table 5 Simulation results of different metaheuristic models for the Haberman dataset

Algorithm | Precision mean (SD) | Recall mean (SD) | Accuracy mean (SD) | G-measure mean (SD)
GA | 0.5049 (0.0135) | 0.5064 (0.0172) | 0.5078 (0.0071) | 0.4602 (0.0153)
PSO | 0.5041 (0.0121) | 0.5053 (0.0155) | 0.5183 (0.0067) | 0.5047 (0.0138)
HS | 0.4996 (0.0157) | 0.4940 (0.0189) | 0.5169 (0.0250) | 0.5056 (0.0208)
HHSGA | 0.5150 (1.17e−16) | 0.5192 (1.17e−16) | 0.5196 (1.17e−16) | 0.5171 (1.17e−16)
Figures 2a–d illustrate the categories of disease related to a dataset using the
HHSGA model.
The HHSGA model categorizes the liver disorder data point into two categories
such as (a) patients having liver disorder and (b) patients having no liver disorder.
Data objects related to the thyroid dataset have been categorized into (a) normal,
(b) hypothyroidism, and (c) hyperthyroidism clusters using the proposed HHSGA
model. Furthermore, the breast cancer dataset is clustered into malignant or benign
categories, and the Haberman dataset is clustered into two categories, i.e., patients
who survived after surgery and those who did not.
In this paper, a hybrid harmony search and genetic algorithm-based diagnostic model is proposed for the classification of healthcare datasets and the diagnosis of a disease. To
resolve the problem of poor exploitation capability of the basic harmony search algo-
rithm, some improvements have been made during the selection steps of the harmony
vector. Instead of random selection, a polygamous selection followed by a recom-
bination operator has been incorporated into the harmony improvisation steps. The
proposed algorithm is used as a clustering technique for the classification of health-
care data points into clusters that make it applicable as a disease diagnostic model.
The performance of the proposed model has been analyzed using four healthcare
datasets and is compared with other state-of-the-art models. Simulation and statis-
tical analysis using the Friedman test revealed that the proposed HHSGA model is
more robust and efficient in accurate diagnosis of a disease. In future, this work
can be extended for other applications like protein synthesis, software engineering,
etc., using another hybrid approach by integrating the concept of multi-objective
optimization in the proposed technique.
References
1. Kumar DY, Kaur A (2021) Healthcare data analysis using water wave optimization-based
diagnostic model. J Info Commun Tech 20:457–488
2. Hartigan JA (1975) Clustering algorithms. Wiley
3. Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony
search. Simulation 76:60–68
Abstract Software cost estimation is one of the most demanding tasks in project management for new software. The estimation procedure is uncertain, as it largely relies on attributes that are very unclear during the early stages of development. This research provides a method for software cost estimation that performs better than other procedures on the accuracy of effort estimation. A soft computing procedure is investigated to overcome the vulnerability and error in estimation. The study extends the constructive cost model by infusing fuzziness into the estimation of size, the mode of development projects, and the cost drivers contributing to the overall development effort. The primary goal is to examine the role of the fuzzy inference system method in enhancing cost estimation precision using COCOMO II, by describing input variables using fifth GL techniques and comparing their outcomes. The PROMISE dataset is utilized for the assessment of the fuzzy inference system (FIS) procedures. The experiments have been carried out in a MATLAB simulation environment.
1 Introduction
2 Related Work
There are basically two categories of models: algorithmic and non-algorithmic [1]. Both require inputs, namely precise estimates of explicit attributes such as lines of code and other cost drivers like skill range, which are difficult to procure in the early stages of software development. In the 1990s, non-algorithmic models were conceived to extend cost estimation. Analysts have turned toward novel soft computing methodologies, for example ANN, GA, and fuzzy logic [2, 3]. A portion of the early work demonstrates that fuzzy logic offers a powerful linguistic interpretation able to represent imprecision in inputs and outputs while providing learning approaches to the model's structure. It is a procedure for tackling problems that are too complex to be understood quantitatively. It is based on fuzzy set theory and provides a framework for representing linguistic constructs such as many, low, medium, and high, together with an inference structure that enables approximate human reasoning. In contrast, binary set theory describes crisp events that either do or do not happen, whereas this fuzzy perspective quantifies the chance that given events are expected to happen [4, 5].
A fuzzy inference system is a rule-based system whose core knowledge consists of implicit fuzzy IF–THEN rules, in which some words are described by continuous membership functions. FISs can be classified into three kinds: pure fuzzy logic systems, Takagi–Sugeno fuzzy systems, and fuzzy logic systems with fuzzifiers and defuzzifiers. The last kind takes crisp data as input and produces crisp data as output. It was first developed by Mamdani and has been effectively applied to a variety of industrial processes and consumer products [6, 7].
Fuzzification is the initial phase in the fuzzy inference process. It involves a domain transformation in which crisp inputs are converted into fuzzy inputs. Crisp inputs are exact measurements obtained from sensors and passed into the control system for processing, for example temperature, weight, etc. [8–10]. Each crisp input to be processed by the FIS has its own group of membership functions or sets into which it is transformed. This group of membership functions exists within a universe of discourse that holds every relevant value the crisp input can take. The following demonstrates the structure of membership functions within a universe of discourse for a crisp input [11, 12].
The principle of an FIS rests on the capacity of fuzzy logic to model natural characteristics. Such a system contains fuzzy rules built from expert knowledge and is called a fuzzy expert system, depending on its final use. Before FISs, expert systems were already being built for simulation purposes [13, 14]. Those expert systems relied on classical Boolean logic, which is not well suited to handling the continuous nature of the underlying process phenomena. Fuzzy logic enables continuous-valued rules to be brought into expert-knowledge-based simulators [7, 15].
Since Sugeno's initial work, a great number of researchers have been engaged in building fuzzy systems from databases. The steps of fuzzy reasoning performed by FISs are as follows:
1. Compare the input variables with the membership functions on the antecedent part to obtain the membership values of each linguistic label (this step is often called fuzzification).
2. Combine the membership values on the premise part to obtain the firing strength (degree of fulfillment) of each rule.
3. Generate the qualified consequent (either fuzzy or crisp) of each rule depending on its firing strength.
4. Aggregate the qualified consequents to produce a crisp output (this step is called defuzzification).
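A minimal numpy sketch of these four steps for a single input (pcap) and a single output (an effort multiplier); the membership functions and rule base below are illustrative assumptions, not the paper's calibrated model.

```python
# Minimal Mamdani-style inference for one input (pcap) and one output
# (effort multiplier). MF breakpoints and rules are illustrative assumptions.
import numpy as np

def trimf(x, a, b, c):
    # Triangular membership function with breakpoints (a, b, c).
    return np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                 (c - x) / (c - b + 1e-12)), 0.0)

x_out = np.linspace(0.0, 2.0, 201)   # universe of discourse for the output
pcap = 0.35                          # crisp input in [0, 1]

# Step 1: fuzzification of the crisp input.
mu = {"low": trimf(pcap, 0.0, 0.0, 0.5),
      "nominal": trimf(pcap, 0.2, 0.5, 0.8),
      "high": trimf(pcap, 0.5, 1.0, 1.0)}

# Steps 2-3: rule firing strengths clip the consequent membership functions.
rules = {"low": trimf(x_out, 1.0, 1.5, 2.0),      # low pcap -> increased effort
         "nominal": trimf(x_out, 0.5, 1.0, 1.5),  # nominal pcap -> unchanged
         "high": trimf(x_out, 0.0, 0.5, 1.0)}     # high pcap -> decreased effort
clipped = [np.minimum(mu[k], rules[k]) for k in rules]

# Step 4: aggregation (max) and centroid defuzzification.
agg = np.maximum.reduce(clipped)
effort_multiplier = (x_out * agg).sum() / agg.sum()
```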
The FIS is used to execute the various processing steps. Decisions were made by building and adjusting the FIS with the Fuzzy Logic Toolbox software, using graphical tools or command-line functions. The research implements the third GL, fourth GL, and fifth GL techniques using a Mamdani FIS. Figure 1 illustrates the fuzzification and defuzzification process.
The performance analyses and their corresponding results are compared. The results are analyzed using the RMSE criterion; a lower value means a more accurate result.
• Select a specific kind of FIS (Mamdani).
• Define the variables for the input and output.
• Set input and output member functions.
• If–then rules are then entered in the rule editor.
• An explicit model structure is created, and the parameters of the input and output variables are tuned to obtain the ideal output.
COCOMO II is utilized as the base model to assess software project cost. The original model was developed by Boehm in 1981 using aggregated data from 63 projects. It is a good guide for estimating software maintenance cost. The proposed model extends it with new statistical methodologies and strategies that estimate the maintenance cost of software using the fifth GL (fuzzy inference system) technique.
The issue with software cost estimation is that it typically relies on single values of size, cost drivers, and scale factors, assessed from previously completed projects that are fairly similar to the current one [16–18]. Likewise, cost drivers and scale factors need thorough evaluation instead of being assigned a fixed numeric value. To overcome this, it is better to represent these inputs as fuzzy sets, where interval values are expressed through a collection of membership functions such as the triangular MF, trapezoidal MF, and Gaussian MF [19].
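Assuming the standard textbook forms, the trapezoidal and Gaussian membership functions can be written as follows (a triangular form appears in the earlier Mamdani sketch):

```python
# Trapezoidal and Gaussian membership functions in their standard textbook
# forms; the breakpoints and spread are parameters chosen per cost driver.
import numpy as np

def trapmf(x, a, b, c, d):
    # Rises on [a, b], flat on [b, c], falls on [c, d].
    return np.clip(np.minimum((x - a) / (b - a + 1e-12),
                              (d - x) / (d - c + 1e-12)), 0.0, 1.0)

def gaussmf(x, mean, sigma):
    # Bell curve centered at `mean` with spread `sigma`.
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)
```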
The proposed fuzzy software cost estimation model is represented in Fig. 2. Its rules contain linguistic variables related to the project. The FIS uses the connectives "and/or" on COCOMO input variables to form rules. The FIS incorporates many input software attributes: seventeen cost drivers, five scale factors, and one size (KDLOC), with one output, the cost estimation (CE).
Fig. 2 Proposed maintenance cost estimation model using FIS techniques
The fuzzifier takes all inputs and converts them into linguistic values. For each cost driver, a separate FIS is designed. Rules are created with the cost drivers as antecedent parts and the corresponding effort multiplier in the consequent part. The defuzzified value of the effort multiplier is kept for each separate FIS. The scale factors are additionally fuzzified. Programmer capability (pcap) is examined as an example; its fuzzification is based on the COCOMO II calibrated post-architecture model.
Next, the model so obtained is subjected to optimization of its parameters using the fuzzy inference system optimization technique to arrive at better software cost estimation accuracy. Fuzzy operators such as union, intersection, and complement are used. The FIS runs to create solutions over successive generations; hence, the quality of the solutions improves from generation to generation. The procedure ends when an ideal solution is found. The results are analyzed using the root mean square error (RMSE) criterion, which identifies the model that predicts software cost more accurately.
The data used as input and output variables for ideal COCOMO II model development is given in Table 1. The dataset in Table 2 is assembled from the examination of 40 software projects and is adopted from the Software Engineering Repository of the PROMISE dataset, which is open access for research purposes. It comprises 26 attributes: the seventeen standard COCOMO II cost drivers, five scale factors, and size measured in thousands of delivered source lines of code (KDLOC). The output of the model is the cost estimation (CE), measured in man-months.
The estimated efforts using third GL, fourth GL, and fifth GL approaches obtained
are tabulated and compared. The model equation is given as follows:
$$\mathrm{PM} = A \times [\mathrm{Size}]^{\,1.01 + \sum_{i=1}^{5} \mathrm{SF}_i} \times \prod_{i=1}^{17} \mathrm{EM}_i \qquad (1)$$
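A sketch of Eq. 1 as an effort computation; the 0.01 scaling of the scale-factor sum and the constant A = 2.94 are assumptions carried over from standard COCOMO II rather than values stated in this paper.

```python
# Effort computation in the shape of Eq. 1. The 0.01 scaling of the
# scale-factor sum and A = 2.94 are assumptions from standard COCOMO II,
# not values stated in this paper.
import math

def cocomo_ii_effort(size_kdloc, scale_factors, effort_multipliers, a=2.94):
    assert len(scale_factors) == 5 and len(effort_multipliers) == 17
    exponent = 1.01 + 0.01 * sum(scale_factors)
    return a * size_kdloc ** exponent * math.prod(effort_multipliers)

# Illustrative call: nominal effort multipliers, mid-range scale factors.
pm = cocomo_ii_effort(10.0, [3.0] * 5, [1.0] * 17)   # person-months
```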
Fuzzy rules for the COCOMO II-based fuzzy inference system are defined with linguistic variables in the fuzzification procedure. These rules use the connective "and" between the input variables. The rules are defined as follows:
if (rely is vl) then (effort is vl)
if (rely is l) then (effort is l)
if (prec is vl) then (effort is xh)
if (pmat is vh) then (effort is l)
The following rules are used in Figs. 3, 4, and 5:
if (pcap is very low) then (increased effort)
if (pcap is low) then (increased effort)
if (pcap is nominal) then (unchanged)
5 Experimental Results
The assessment consists of comparing the accuracy of the calculated cost with the actual cost. There are numerous evaluation measures for software cost estimation; we applied the most common one, RMSE, defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad (2)$$
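Both error criteria used in this comparison can be computed directly; a small sketch of RMSE (Eq. 2) and of MAE, referenced later as Eq. 3:

```python
# RMSE (Eq. 2) and MAE (Eq. 3) between actual and estimated effort.
import numpy as np

def rmse(actual, predicted):
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mae(actual, predicted):
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))
```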
The software efforts obtained using conventional COCOMO II and the fuzzy MFs were compared. After analyzing the outcomes obtained by applying the third, fourth, and fifth GL techniques, it is shown that the cost estimated by fuzzifying all effort multipliers using the fifth GL (FIS) technique yields the better estimate.
6 Comparison
The parameter of cost estimation models used for the assessment is the MAE, represented in Eq. 3. The effort has been calculated for every observation (Table 3).

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (3)$$

Table 4, with its accompanying chart, presents the comparative examination of the actual cost against the estimated cost using COCOMO II and the third, fourth, and fifth GL techniques. The RMSE and MAE values are calculated using Eqs. 2 and 3. The RMSE values over all projects for COCOMO II, third GL, fourth GL, and fifth GL are 1.2403, 1.0638, 1.075, and 0.9398, respectively, and the MAE values are 0.1650, 0.1251, 0.1236, and 0.1183, respectively. This plainly demonstrates a reduction in the absolute errors; therefore, the proposed model is more suitable for estimating cost.
7 Conclusion
We conclude that the use of fuzzy logic in SCE yields more exact outcomes than the previous empirical model methodology. The RMSE values of cost estimation using the fifth GL based on FIS techniques give better outcomes than the other high-level language techniques. The FIS performs better as it allows likely changes within its intervals, and the accomplished outcomes were closer to the actual cost. FIS has the lowest MAE and the highest accuracy of the three generation-language software methodologies studied.
Future work will incorporate newer procedures; for example, a type-2 fuzzy system can likewise be applied for increasingly precise software forecasts. The above research work can be easily employed in the software industry.
References
1. Lagerström R, von Würtemberg LM, Holm H, Luczak O (2016) Identifying factors affecting
software development cost and productivity. Softw Qual J 20:395–417
2. El Bajta M, Idri A, Fernandez-Alem JL, Ros JN (2015) Software cost estimation for global
software development. In: 10th International Conference on Evaluation of Novel Software
Approaches to Software Engineering, ENASE
3. Veeranjaneyulu N, Suresh S, Salamuddin S, Kim H-Y (2014) Software cost estimation on
e-learning technique using a classical fuzzy approach. Int J Soft Eng Appl 8(11):217–222
4. Patil LV, Waghmode RM, Joshi SD, Khanna V (2017) Generic model of software cost
estimation: A hybrid approach. IEEE Int Adv Comput Conf, IEEE, pp 1379–1384
5. Vu N (2010) Improved size and effort estimation models for software maintenance. University
of Southern California
6. Marounek P (2012) Simplified approach to effort estimation in software maintenance.
University of Economic, Prague, Faculty of Information and Statistics. J Syst Integration,
51–63
7. Wijayasiriwardhane T, Lai R, Kang KC (2011) Effort estimation of component based software
development, a survey. IET Soft 5:216–228
8. Borade JG (2013) Software project effort and cost estimation techniques. Int J Adv Res Comp
Sci Soft Eng 3(8):730–739
9. Chiu S (1994) Fuzzy model identification based on cluster estimation. J Intell Fuzzy Syst,
267–278
10. Farooqui NA, Ritika (2020) A machine learning approach to simulating farmers’ crop choices
for drought prone areas. In: Singh P, Panigrahi B, Suryadevara N, Sharma S, Singh A (eds)
Proceedings of ICETIT 2019. Lecture Notes in Electrical Engineering, vol 605. Springer,
Cham. https://doi.org/10.1007/978-3-030-30577-2_41
11. Aggarwal K, Singh Y, Chandra P, Puri M (2005) Measurement of software maintainability
using a fuzzy model. J Comp Sci. ISSN 1549-3636
12. Sneed H (2004) A cost model for software maintenance & evolution. IEEE, 264–273
13. Patil LV, et al (2014) Develop efficient technique of cost estimation model for software
applications. Int J Comp Appl 87(16):0975–8887, February
14. Maleki I, Ebrahimi L, Jodati S, Ramesh I (2014) Analysis of software cost estimation using
fuzzy logic. Int J Foundations of Comp Sci Tech 4(3):27–41
15. Mukherjee S, Bhattacharya B, Mandal S (2013) A survey on metrics, models & tools of software
cost estimation. Int J Adv Res Comp Eng Tech (IJARCET) 2(9):2620–2625
16. Jing QF, Zhu X-Y, Xiaoyuan XX, Baowen X, Shi Y (2017) Software effort estimation based
on open-source projects: case study of Github 92:145–157, December
17. Patil PK (2015) A review on calibration factors in empirical software cost estimation (SCE)
models. Int J Soft Eng Res Pract 3:1–7
18. Mohammad I, Vinodani K (2014) Development of a software maintenance cost estimation
model: 4TH GL perspective. Int J Tech Res Appl, 6
19. Sehraab SK, Brara YS, Sukhjit NK, Sehra S (2017) Research patterns and trends in software
effort estimation 91:1–21, November
20. Srivastava SK, Prasad P, Varma SP (2016) Evolving predictor variable estimation model for
Web engineering projects. Comput Sci Eng, 68–89
Survey of Text Summarization
Stratification
Abstract The volume of data on the Internet has increased at an exponential rate
during the previous decade. Consequently, the need for a method for converting
this massive amount of raw data into meaningful information that a human brain
can comprehend emerges. Text summarization is a common research technique
that aids in dealing with a massive quantity of data. Automatic summarization is
a well-known approach for distilling the important ideas in a document. It works by
creating a shortened form of the text and preserving important information. Tech-
niques for text summarizing are classified as extractive or abstractive. Extractive
summarization methods reduce the burden of summarization by choosing a few
relevant sentences from the original text. The implications of sentences are calcu-
lated using linguistic and statistical characteristics. This paper investigates extractive
and abstract methods for text summarization. We will also explore many efforts in
automatic summarization, particularly recent ones, in this article.
1 Introduction
Every day, the volume of textual data available digitally increases in the form of web pages, news, academic papers, and articles. Despite so much information being available, it is difficult to find information relevant to a certain user, as most of the data does not match that user's need [1]. Automatic summarization helps in the extraction of useful information while rejecting the extraneous. It can also increase text legibility and cut down on the time people spend searching. The ultimate goal is to
produce summaries that include the major themes in a clear and right manner, while
not including unnecessary material [2–4]. There are multiple ways to do summariza-
tion based on input parameters and desired outputs, but there are mainly two summa-
rization approaches in which we can divide the research, abstractive and extractive.
Extractive summarization selects a few sentences or words or phrases from a source
document and combines them to form a summary without modifying or changing
the input sentences from the document. An abstractive summarization converts the relevant phrases collected from a text into an understandable and coherent semantic form, possibly changing the original sentences. The combination of extractive and abstractive summarization is referred to as "hybrid text summarization" [5].
In general, all automatic text summarizing systems have three phases in their
processing design [6]. The initial stage is to identify the text’s sentences, words, and
other elements. In the processing phase, it transforms the input text into a summary
using a text summarizing approach. Post-processing is the third phase, which entails
rectifying problems in the produced draft summary [7]. Some recent reviews on
automatic text summarizing have been published, with the majority focusing on
extractive summarization approaches [8] since abstractive summarization is difficult
and necessitates extensive Natural Language Processing (NLP). On the basis of
the literature survey done in this paper, Fig. 1 is drawn. In this paper, we will be
discussing the following aspects of text summarization: the approaches/techniques used in summarization, the evaluation used for text summarization, and the advantages of using specific techniques.
1. Indicative summary is being used to offer a fast overview for lengthy documents,
which only includes points of the source text to entice the user for reading the
entire work.
2. An informative summary might be used in place of the original document. It gives the user succinct information from the original material [8].
1. Generic summarizing is a system that may be utilized by any sort of user, and
the summary is independent of the document’s topic [8].
2. A query-based system is a question-and-answer system in which the summary is determined by the user's query; this is known as query-based summarization.
3 Extractive Summarization
In this form summary, just the most important and frequent terms are included from
the original text. The sentences are scored, and those with the highest rating are
included in the summary [10]. Figure 2 shows the architecture of the extractive text
summarization as explained in [11].
TF–IDF is a numerical statistic that shows how important a word is in a specific document. TF refers to the repeated occurrence of the term in the document, and IDF is a metric that reduces the weight of terms repeated across the collection while increasing the weight of phrases that are encountered seldom. Sentences are then assessed according to these weights, and sentences with a high score are included in the summary. One disadvantage of this method is that lengthier sentences generally obtain a higher score simply because they include more words [12].
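A sketch of this scoring scheme using scikit-learn's TfidfVectorizer; normalizing each sentence score by its length is one simple way to offset the long-sentence bias noted above.

```python
# TF-IDF-based extractive scoring: each sentence is scored by the sum of its
# terms' TF-IDF weights, normalized by sentence length.
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Automatic summarization condenses a document into a short summary.",
    "Extractive methods select the most important sentences from the text.",
    "The weather was pleasant on the day of the conference.",
]
tfidf = TfidfVectorizer().fit_transform(sentences)   # one row per sentence
lengths = [len(s.split()) for s in sentences]
scores = [tfidf[i].sum() / lengths[i] for i in range(len(sentences))]

# Pick the top-scoring sentences for the summary.
summary = [s for _, s in sorted(zip(scores, sentences), reverse=True)[:2]]
```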
These text summarizing methods make use of a multi-valued system known as fuzzy
logic. Because the logical concepts “one” and “zero” do not necessarily match the
“real world,” fuzzy logic [13] offers an efficient method for generating feature values
for phrases that lie between these two values. Selecting a set of qualities for each
sentence is the first step in sentence grading. The second step involves applying the
fuzzy logic concept to each statement and assigning a score based on its relevance.
This implies that each sentence is assigned a score between 0 and 1 depending on its
characteristics [8].
This method involves training neural networks to determine which sentences should be included in the summary. A three-layered feed-forward neural network is used [14].
It is natural to believe that summaries should cover various “themes” found in the
papers. If the document collection for which the summary is being generated contains
documents on completely diverse themes, document clustering becomes virtually
mandatory in order to construct a relevant summary. Sentences are picked for their similarity to the theme of the cluster (Ci). The position of the sentence in the paper is the next consideration (Li). The last criterion is its resemblance to the initial phrase of the text it belongs to (Fi) [15]. In this section, we have compared the summarization models based on the technique used, as described in Table 1.
4 Abstractive Summarization
This type of text summarization involves the paraphrasing of the sentences and
the generation of new sentences which are syntactically and semantically correct.
Figure 3 shows the architecture of the abstractive text summarization.
Table 1 Comparison of models based on the technique used and their accuracy in terms of the Rouge metric

Reference | Approach/technique | Remark | Dataset | Accuracy
[18] | Attentional encoder–decoder RNN | Multi-sentence summaries | CNN/DM and GigaWord | 27.14
[19] | Pointer generator, coverage mechanism | Solved inaccurate factual details | CNN/DM | 31.06
[20] | Coverage mechanism | OOV words problem | XSum | 24.53
[21] | Bidirectional RNN, LSTM | Solved limited awareness of the sentence context | CNN/DM | 33.3
[22] | CNN | Utilizes CNNs to allow parallelization over text data | CNN/DM and GigaWord | 33.74
[23] | Attention mechanisms, RL | Quality summaries for long documents | CNN/DM | 32
[24] | RL model | Significant speed-up in training and decoding | CNN/DM | 32.41
[25] | Bottom-up attention | Training the content selection takes 1,000 sentences | CNN/DM | 32.75
[26] | Reinforcement learning | Better overview than numerous baselines | CNN/DM | 33.03
[27] | Transformer encoder–decoder | Good semantic and context features of part embeddings | CNN/DM and XSum | 33.45
[27] | Transformer-based decoder | Can produce a broad spectrum of summaries | GigaWord | 36.69
In various fields, human summaries tend to employ specific sentence forms, known as templates. Based on the style of the input document, the information in the input document is used to fill slots in applicable pre-defined templates to conduct abstractive summarization [17]. To fill template slots, text samples can be retrieved using rules and linguistic clues [8].
Alami et al. [16] used a graph model in which each node represents a word with positional information and is connected to other nodes; directed edges indicate the structure of a sentence. The graph technique has two aspects: creating a textual graph that reflects the original material and producing an abstractive summary. Such a strategy explores and scores multiple sub-paths in the graph in order to produce an abstractive summary [1].
5 Conclusion
Automatic text summarization produces a summary by reducing the size of the orig-
inal content while maintaining important information. Despite the fact that various
techniques have been presented, automatic text summarizing remains a difficult
undertaking, with the results falling far short of quality human summaries. The
majority of researchers concentrate on the extractive strategy. As a result, extractive
summarization has a larger body of literature than abstractive summarization. This
survey examined several approaches and strategies. We can say that combining two or more approaches or procedures is likely to provide positive outcomes, improving the quality of the summaries over utilizing either approach alone.
References
16. Alami N, Meknassi M, En-nahnahi N (2019) Enhancing unsupervised neural networks based
text summarization with word embedding and ensemble learning. Expert Syst Appl 123:195–
211
17. Gupta S, Gupta S (2019) Abstractive summarization: an overview of the state of the art. Expert
Syst Appl 121:49–65
18. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator
networks. In: ACL 2017— Proceedings of the 55th annual meeting of the Association for
Computational Linguistics (Long Pap.2017), vol 1, 1073–1083. https://doi.org/10.18653/v1/
P17-1099
19. Narayan S, Cohen SB, Lapata M (2018) Don’t give me the details, just the summary! Topic-
aware convolutional neural networks for extreme summarization. In: Proceedings of the 2018
conference on empirical methods in natural language processing (EMNLP 2018), pp 1797–
1807
20. Al-Sabahi K, Zuping Z, Kang Y (2018) Bidirectional attentional encoder-decoder model and
bidirectional beam search for abstractive summarization. arXiv Prepr. arXiv1809.06662
21. Zhang Y, Li D, Wang Y, Fang Y, Xiao W (2019) Abstract text summarization with a
convolutional seq2seq model. Appl Sci 9.https://doi.org/10.3390/app9081665
22. Paulus R, Xiong C, Socher R (2018) A deep reinforced model for abstractive summarization.
In: 6th international conference on learning representations (ICLR 2018)—conference track
proceedings, pp 1–12
23. Chen YC, Bansal M (2018) Fast abstractive summarization with reinforce-selected sentence
rewriting. In: ACL 2018—56th annual meeting of the Association for Computational
Linguistics (Long Papers, vol 1), pp 675–686. https://doi.org/10.18653/v1/p18-1063
24. Gehrmann S, Deng Y, Rush AM (2018) Bottom-up abstractive summarization. In: Proceedings
of the 2018 conference on empirical methods in natural language processing (EMNLP 2018),
pp 4098–4109. https://doi.org/10.18653/v1/d18-1443
25. Celikyilmaz A, Bosselut A, He X, Choi Y (2018) Deep communicating agents for abstractive
summarization. In: NAACL HLT 2018 - 2018 conference of the North American chapter
of the Association for Computational Linguistics: human language technologies, vol 1, pp
1662–1675. https://doi.org/10.18653/v1/n18-1150
26. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer L (2019) BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv Prepr. arXiv1910.13461
27. Song K, Wang B, Feng Z, Liu R, Liu F (2020) Controlling the amount of verbatim copying in
abstractive summarization. In: Proceedings of the AAAI conference on artificial intelligence,
vol 34, pp 8902–8909.https://doi.org/10.1609/aaai.v34i05.6420
28. Joshi M, Wang H, McClean S (2018) Dense semantic graph and its application in single
document summarization. In: Emerging ideas on information filtering and retrieval. Springer,
pp 55–67
29. Modi S, Oza R (2018) Review on abstractive text summarization techniques (ATST) for
single and multi documents. In: 2018 international conference on computing, power and
communication technologies (GUCON). IEEE, pp 1173–1176
30. Mahajani A, Pandya V, Maria I, Sharma D (2019) A comprehensive survey on extractive and
abstractive techniques for text summarization. In: Ambient communications and computer
systems. Springer, pp 339–351
31. Hou L, Hu P, Bei C (2017) Abstractive document summarization via neural model with joint
attention. In: National CCF conference on natural language processing and Chinese computing.
Springer, pp 329–338
32. Mohd M, Jan R, Shah M (2020) Text document summarization using word embedding. Expert
Syst Appl 143:112958
33. Moratanch N, Chitrakala S (2016) A survey on abstractive text summarization. In: 2016
International conference on circuit, power and computing technologies (ICCPCT). IEEE, pp
1–7
34. Hsu W-T, Lin C-K, Lee M-Y, Min K, Tang J, Sun M (2018) A unified model for extractive and
abstractive summarization using inconsistency loss. arXiv preprint arXiv:1805.06266
35. Nima S (2018) Abstractive text summarization with attention-based mechanism. MS Thesis,
Universitat Politecnica de Catalunya
36. Zihang D et al (2019) Transformer-xl: attentive language models beyond a fixed-length context.
arXiv preprint arXiv:1901.02860
37. Khandelwal U, Clark K, Jurafsky D, Kaiser L (2019) Sample efficient text summarization using
a single pre-trained transformer. arXiv preprint arXiv:1905.08836
38. Liu PJ, Saleh M, Pot E, Goodrich B, Sepassi R, Kaiser L, Shazeer N (2018) Generating
Wikipedia by summarizing long sequences. In: Proceedings of the ICLR
39. Wani MA, Riyaz R (2016) A new cluster validity index using maximum cluster spread based
compactness measure. Int J Intell Comput Cybern 9:179–204
40. Wani MA, Riyaz R (2017) A novel point density based validity index for clustering gene
expression datasets. Int J Data Min Bioinform 17:66–84
41. Wani MA, Bhat FA, Afzal S, Khan (2020) Advances in deep learning. Springer
42. Wani MR, Wani MA, Riyaz R (2016) Cluster based approach for mining patterns to predict
wind speed. In: Proceedings of the international conference on renewable energy research and
applications, pp 1046–1050
43. Wani MA (2008) Incremental hybrid approach for microarray classification. In Proc. of the 7th
international conference on machine learning and applications, pp 514–520
44. Riyaz R, Wani MA (2016) Local and global data spread based index for determining number
of clusters in a dataset. In: Proceedings of the 15th IEEE international conference on machine
learning and applications, pp 651–656
45. Lin H, Ng V (2019) Abstractive summarization: a survey of the state of the art. In: Proceedings of
the 33rd AAAI conference on artificial intelligence, vol 33, pp 9815–9822. [Online]. Available:
https://www.aaai.org/ojs/index.php/AAAI/article/view/5056
46. Klymenko O, Braun D, Matthes F (2020) Automatic text summarization: a state-of-the-art
review. In: Proceedings of the 22nd international conference on enterprise information systems,
pp 648–655
47. Shi T, Keneshloo Y, Ramakrishnan N, Reddy CK (2018) Neural abstractive text summarization
with sequence-to-sequence models. arXiv:1812.02303. [Online]. Available: http://arxiv.org/
abs/1812.02303
48. Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional trans-
formers for language understanding. In: Proceedings of the 2016 conference of the North Amer-
ican chapter of the Association for Computational Linguistics: human language technologies,
pp 4171–4186
49. Radford A, Improving language understanding by generative pre-training. Open AI J, to be
published
50. Rane N, Govilkar S (2019) Recent trends in deep learning based abstractive text summariza-
tion. Int J Recent Technol Eng 8(3); Sciforce (2019) Towards automatic summarization. Part
2. Abstractive methods. Sciforce Blog. [Online]. Available: https://medium.com/sciforce/tow
ards-automaticsummarization-part-2-abstractive-methods-c424386a65ea
51. Sanad M (2019) A comprehensive guide to build your own language model in Python.
[Online]. Available: https://www.analyticsvidhya.com/blog/2019/08/comprehensive-guide-
language-model-nlp-python-code/. Manning CD, Schutze H, Raghavan P (2008) Introduction
to information retrieval, vol 238. Cambridge University Press, Cambridge, U.K.
52. Jing K, Xu J (2019) A survey on neural network language models. arXiv:1906.03591. [Online].
Available: http://arxiv.org/abs/1906.03591
53. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in
vector space. In: Proceedings of the 1st international conference on learning representations
(ICLR), pp 1–12
54. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summa-
rization. In: Proceedings of the conference on empirical methods in natural language processing,
pp 379–389. [Online]. Available: https://arxiv.org/abs/1509.00685
55. Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recur-
rent neural networks. In: Proceedings of the conference of the North American chapter of the
Association for Computational Linguistics: human language technologies, pp 93–98
56. Nallapati R, Zhou B, Gulçehre C, Xiang B (2016) Abstractive text summarization using
sequence-to-sequence RNNs and beyond. In: CoNLL 2016—20th SIGNLL conference on
computational natural language learning. Proceedings, pp 280–290. https://doi.org/10.18653/
v1/k16-1028
A Systematic Study of Various
Approaches and Problem Areas
of Named Entity Recognition
1 Introduction
To address the different needs of various languages, extra work is required to adapt current methods or to add new ones. Languages can be character based or can vary in morphological patterns, adding to the diverse semantics and syntaxes to cater to. This chapter goes through an introduction to NER and the standards and metrics generally used by researchers, followed by a synopsis of the state of the art available today. For the duration of 2017 to 2021, the problem areas that authors focused on, along with the approaches taken, solution advantages, and future scope, are illustrated.
The motivation behind this study is that it not only depicts the latest trends but also covers one of the major shifts that modified the course of solutions, not only for NER but for NLP in general. This shift, from feature engineering to contextual embedding with the use of transformers, is so strong that it is naturally being adopted by researchers to achieve the state of the art. A summary of the latest trends not only helps with an overall recent picture but also facilitates the planning of the approach that one intends to take. The papers chosen for this study are not of a particular domain or language application, but for NER in general.
The earlier approaches to the NER problem were rule based, including dictionaries, pattern finding, and added grammatical knowledge. Popular probabilistic models such as the hidden Markov model (HMM) and conditional random fields (CRF) were used. Machine learning approaches such as support vector machines (SVM) and decision trees require features that had to be specifically curated; these were then followed by advancing deep learning. The commonly used deep learning methodologies are the recurrent neural network (RNN), long short-term memory (LSTM), and convolutional neural network (CNN), where the bidirectional LSTM with CRF (Bi-LSTM-CRF) is one of the popular models. Among the recent surveys, Li et al. [1] wrote about the deep learning advancements in NER, explaining in detail the deep learning technologies used and the resources involved. For language- and domain-specific surveys, to name a few recent ones, Sharma et al. [2] presented a named entity recognition survey for the Hindi language, a survey specific to food entities was presented by Popovski et al. [3], Georgescu et al. [4] presented a survey for the cybersecurity domain, and Wang et al. [5] worked on a detailed survey of nested entities.
NER is the identification of entities in text that are names of a person, organization, date, or place. The task can be subdivided into two subtasks: identifying the entity with its span and then classifying it. An entity can be a single word, "Amazon," or can span multiple words, such as "University of New York." Thus, it is important to identify and understand the boundary of the entity. The mentioned entity "University of New York" also represents another type of entity, a nested entity, where the system has to decide whether to identify "New York" as the entity in question or "University of New York" as a single entity. Furthermore, there can be a general
1.2 Metrics
NER being a multiclass classification problem, the universally applied evaluation metrics are precision, recall, and F1 score. These are widely used instead of accuracy because they remain good measures even under class imbalance. Precision is the ratio of correctly classified entities to all entities classified into that class. Recall is the ratio of correctly classified entities to the actual entities in that class. The F1 score, the harmonic mean of the two, is a good overall measure and assigns equal weight to precision and recall. Equations 1, 2, and 3 show how precision, recall, and F1 are calculated.
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \quad \text{(total positives predicted)} \tag{1}$$

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \quad \text{(total actual positives)} \tag{2}$$

$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
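As a quick illustration of Eqs. (1)–(3), the following minimal Python sketch (ours, not from any surveyed paper; the labels are made up) computes the per-class scores from aligned gold and predicted token labels. Note that benchmark NER evaluation is usually span level rather than token level; this simplification only illustrates the formulas.

```python
def per_class_prf(gold, pred, label):
    """Per-class precision, recall, and F1 following Eqs. (1)-(3)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0   # Eq. (1)
    recall = tp / (tp + fn) if tp + fn else 0.0      # Eq. (2)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # Eq. (3)
    return precision, recall, f1

# Toy token-level label sequences (made up)
gold = ["PER", "O", "ORG", "ORG", "O", "LOC"]
pred = ["PER", "O", "ORG", "O", "O", "ORG"]
print(per_class_prf(gold, pred, "ORG"))  # -> (0.5, 0.5, 0.5)
```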
NER has many popular datasets, serving general and specific purposes, that are used in benchmark studies. A few popular ones are CoNLL2003 and OntoNotes v5. On the CoNLL2003 dataset, the state of the art is an F1 of 94.6 by Wang et al. [6], who used automated concatenation of embeddings to find the best-performing combination. This is very closely followed by the work of Yamada et al. [7], with an F1 of 94.2, based on LUKE, pretrained entity representations that include context using transformers. For OntoNotes v5 (English), Li et al. [8] reached an F1 score of 92.07 using a new loss function, the Dice loss, in place of cross-entropy. This is very closely followed by the packed levitated markers of Ye et al. [9], with an F1 of 91.9.
2 Literature Review
2.1 Methodology
The sample of papers used for this summary was collected using the Google search engine and other reliable resources and websites for the state of the art. The paper is an effort toward a mix of popular and randomly chosen works representing the general work done during this time. The yearly distribution is not strict and only representational; it should be looked at by subdivision for a close view but taken together overall, as the work done during these five years is presented in an easy-to-compare tabular form. Much work in NER can be seen in applications of these models to different languages and domains; this paper is not exhaustive and does not discuss such efforts, focusing instead on the general issues faced by NER.
Table 1 shows a few of the papers from 2017. Performance is one of the major problem areas. The solution approaches here are incorporating more context or sentence structure information in the training data, or a strong ensemble approach using three different algorithms: particle swarm optimization (PSO), Bayesian combination, and CRF [10]. Performance is followed by the problem area of low availability of labeled data. Supervised learning approaches need a very big corpus for good training, and to perform well in specific domains, the models need to be trained with a domain-specific corpus. In such cases, the unavailability of a corpus can impose a huge cost on the system. This brings one to the importance of unsupervised learning, which does not depend on a huge amount of training data; the details are given in the table. Apart from this, scalability is another important area [11]: to maintain speed and accuracy, MapReduce was implemented with the model to exploit parallelism. Also, accessibility and ease of use by non-experts was a problem area worked on by Dernoncourt et al. [12] using BRAT (an annotation tool) and an artificial neural network (ANN). A solution to low resources that led to improved performance was to incorporate domain knowledge using a disease dictionary and the Unified Medical Language System (UMLS), along with semantic type filtering and CRF, for better domain-specific results, by Kanimozhi et al. [13]. In terms of commonly used models, Bi-LSTM-CRF, CNN, and the graph convolutional network (GCN) can be seen.
Table 2 shows a few of the papers from 2018, where the problem areas listed are robustness, fine-grained entities, and nested entities. These were addressed using summarization, using a sliding adaptive window to include context, while another work exploited correlations using a knowledge base and attention. Nested entities were addressed using dependencies to explore the possible spans of entities (details in the table). Further, the newly introduced and now popular contextual embeddings, vector representations of words, helped models attain significant performance gains and are responsible for changing the course of subsequent NLP research. These contextual embeddings can be pretrained and generated using LSTMs or transformers, e.g., FLAIR, Embeddings from Language Models (ELMo), and Bidirectional Encoder Representations from Transformers (BERT). They can be computationally expensive; thus, a hybrid approach with lexical features and small embeddings can be seen for saving time [19]. Apart from these, in the problem area of low resources, variations of transfer learning using graphs were used to map corpus similarities [20] with n-gram and CRF models. The same problem was also addressed using few-shot or even zero-shot learning via metric, prototypical, and transfer learning [21], but this requires more work in future for multiple classes.
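To make the shift concrete, here is a minimal sketch of obtaining contextual token embeddings, assuming the Hugging Face transformers package and the bert-base-cased checkpoint (our choices, not those of the surveyed papers). Unlike static word vectors, the same surface word receives a different vector in each context.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")
model.eval()

for sentence in ["Amazon is hiring engineers.", "The Amazon river floods yearly."]:
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)
    vectors = output.last_hidden_state[0]  # one vector per word-piece token
    # "Amazon" is assumed to be a single word piece in this vocabulary
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("Amazon"))
    print(sentence, "->", vectors[idx][:4])
```

The two printed vectors differ, which is exactly the property NER models exploit when disambiguating entity mentions.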
Table 3 lists the details from 2019. The problem areas listed here are the imbalance of data and emerging entities, apart from the previously mentioned problem areas. A cloze-word objective was used for pretraining, where the middle word is predicted using the left and right context, to achieve a new state of the art. More work listed in the table is the unified approach to nested and flat entities that treats NER as machine comprehension, or as a query, exploiting more of the associated semantic information compared to general NER classification approaches, which do not have a specific task in mind beyond extraction. To handle the imbalance of data, that is, negative and positive examples not being in the same proportion, a new criterion, the "Dice loss," was proposed; it gives equal weight to false positives and false negatives and is thus less susceptible to data imbalance (details in the table). Also, for the problem area of low availability of resources, effort can be seen in the automatic generation of corpora using bootstrapping [28], with the advantage of being low cost and allowing specific information to be added to the corpus, though future work needs to remove incorrect labels. On the other side, phonetics was explored, with the advantage of working in multilingual settings, by Cabot et al. [29], though a few remaining errors of spelling and disambiguation are to be removed in future.
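For intuition about the Dice loss mentioned above, a minimal PyTorch sketch of a plain soft Dice loss for binary token classification follows; Li et al. [8] use a self-adjusting variant, so treat this as an illustrative simplification, with the smoothing constant an assumption.

```python
import torch

def soft_dice_loss(probs, targets, eps=1.0):
    """probs: predicted probabilities of the positive class; targets: 0/1 gold
    labels. The symmetric overlap term weights false positives and false
    negatives equally, making the loss less sensitive to class imbalance
    than cross-entropy."""
    intersection = (probs * targets).sum()
    denominator = probs.sum() + targets.sum()
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

probs = torch.tensor([0.9, 0.2, 0.7, 0.1])    # made-up predictions
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])  # made-up gold labels
print(soft_dice_loss(probs, targets))          # ~0.14
```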
Table 4 lists the work in 2020. The problem areas emerge again as performance, nested entities, and class imbalance. In a certain language, entity recognition performed better when the researchers added word position to the word vector, because the entity boundary is otherwise difficult to find, using Bi-LSTM + CRF and IDCNN + CRF [34]. Another approach in the direction of low labeled data availability, and thus cost-effectiveness, was suggested using triggers [35], with the benefit of a solution that generalizes even to sentences the model has not seen. A trigger is a phrase that points the system toward an entity, in other words, one that would give a human the intuition to recognize an entity in the given text.
The latest papers, from 2021, are presented in Table 5. The problem areas seen here are overfitting and a unified solution to flat and nested entities. Researchers carried out extensive experimentation on the models by varying embeddings, sentence length, and many other factors, and also added GCN and gating to the Bi-LSTM-CRF model to gain context and robustness. Also seen is the use of adversarial training for robustness: adversarial training adds samples to the training data using an attack algorithm; here, the data are added to fill data gaps. Moving to reinforcement learning, negative data and a span selector were successfully utilized by Peng et al. [40] to achieve comparable gains without an extra corpus, though optimization could be worked on in the future. Figure 2 shows the problem areas identified over this period in the form of a word cloud.
3 Conclusion
The future can be toward contextual embeddings, i.e., ELMo, BERT, etc., as the latest models achieve the state of the art using them in every NLP area. However, they have high computational demands and may need special hardware; any future effort in this direction, or even the use of hybrid models, will make them more available and easily accessible. One prominent problem area identified is the need for a big corpus for supervised approaches, which points us to unsupervised approaches that currently use transfer learning, adversarial training, etc. Efforts here can prove fruitful in future, as more gains in results are yet to be seen. A few problems still need effort, such as out-of-vocabulary (OOV) words: fast-emerging technologies in every field make it difficult to catch up, and this can be a desirable future work area. Much more work can also be foreseen for particular languages and domains that need corpora and specific effort; a lot of work should be seen on this front in the future.
This survey aims to provide an overview of NER's latest techniques and problem areas, with approaches based on recent work. With the above sample of papers, supervised, semi-supervised, reinforcement learning, and unsupervised approaches are covered, along with most of the problem areas that appear repeatedly throughout this period. The paper can help provide future direction to researchers while summing up the latest on this front.
References
1. Li J, Sun A, Han J, Li C (2022) A survey on deep learning for named entity recognition. IEEE
Trans Knowl Data Eng 34(1):50–70
2. Sharma R, Morwal S, Agarwal B (2019) Named entity recognition for Hindi language: a survey.
J Discrete Math Sci Cryptogr 22(4):569–580
3. Popovski G, Koroušić Seljak B, Eftimov T (2020) A survey of named-entity recognition
methods for food information extraction. IEEE Access 8:31586–31594
4. Georgescu TM, Iancu B, Zamfiroiu A, Doinea M, Boja CE, Cartas C (2021) A survey on named entity recognition solutions applied for cybersecurity-related text processing. In: Yang XS, Sherratt S, Dey N, Joshi A (eds) Proceedings of fifth international congress on information and communication technology. Springer, Singapore, pp 316–325
5. Wang Y, Zhu Z, Li Y (2022) Nested named entity recognition: a survey. ACM Trans Knowl
Discov Data, 1556–4681
6. Wang X, Jiang Y, Bach N, Wang T, Huang Z, Huang F et al (2020) Automated concatenation
of embeddings for structured prediction. In: Proceedings of the 59th annual meeting of the
association for computational linguistics and the 11th international joint conference on natural
language processing (volume 1: long papers), pp 2643–2660
7. Yamada I, Asai A, Shindo H, Takeda H, Matsumoto Y (2020) LUKE: deep contextualized
entity representations with entity-aware self-attention. In: Proceedings of the 2020 conference
on empirical methods in natural language processing (EMNLP), pp 6442–6454
8. Li X, Sun X, Meng Y, Liang J, Wu F, Li J (2019) Dice loss for data-imbalanced NLP tasks.
In: Proceedings of the 58th annual meeting of the association for computational linguistics, pp
465–476
9. Ye D, Lin Y, Li P, Sun M (2022) Packed levitated marker for entity and relation extraction.
In: Proceedings of the 60th annual meeting of the association for computational linguistics
(volume 1: long papers), pp 4904–4917, Dublin, Ireland
10. Akkasi A, Varoglu E (2017) Improving biochemical named entity recognition using PSO
classifier selection and Bayesian combination methods. IEEE/ACM Trans Comput Biol Bioinf
14(6):1327–1338
11. Liu F, Xu X, Ji Z (2017) MapReduce based named entity recognition for biology literature. In:
2017 IEEE 2nd advanced information technology, electronic and automation control conference
(IAEAC), pp 1268–1271
12. Dernoncourt F, Lee JY, Szolovits P (2017) NeuroNER: an easy-to-use program for named-entity
recognition based on neural networks. In: Proceedings of the 2017 conference on empirical
methods in natural language processing: system demonstrations, pp 97–102, Copenhagen,
Denmark
13. Kanimozhi U, Manjula D (2017) A CRF based machine learning approach for biomedical
named entity recognition. In: 2017 second international conference on recent trends and
challenges in computational models (ICRTCCM), pp 335–342
14. Cotterell R, Duh K (2017) Low-resource named entity recognition with cross-lingual, character-
level neural conditional random fields. In: Proceedings of the eighth international joint confer-
ence on natural language processing (volume 2: short papers). Asian Federation of Natural
Language Processing, Taipei, Taiwan, pp 91–96
15. Cetoli A, Bragaglia S, O’harney AD, Sloan M (2018) Graph convolutional networks for
named entity recognition. In: Proceedings of the 16th international workshop on treebanks
and linguistic theories, pp 37–45, Prague, Czech Republic
16. Wang C, Chen W, Xu B (2017) Named entity recognition with gated convolutional neural
networks. In: Sun M, Wang X, Chang B, Xiong D (ed) Chinese computational linguis-
tics and natural language processing based on naturally annotated big data. Cham: Springer
International Publishing, pp 110–121
17. Yang J, Zhang Y, Dong F (2017) Neural reranking for named entity recognition. In: Proceedings
of the international conference recent advances in natural language processing (RANLP), pp
784–792, Varna, Bulgaria
18. Liu L, Shang J, Ren X, Xu FF, Gui H, Peng J et al (2018) Empower sequence labeling
with task-aware neural language model. In: Proceedings of the thirty-second AAAI confer-
ence on artificial intelligence and thirtieth innovative applications of artificial intelligence
conference and eighth AAAI symposium on educational advances in artificial intelligence
(AAAI’18/IAAI’18/EAAI’18), Article 644. AAAI Press, pp 5253–5260
19. Ghaddar A, Langlais P (2018) Robust lexical features for improved neural network named-
entity recognition. In: Proceedings of the 27th international conference on computational
linguistics. Association for Computational Linguistics, Santa Fe, New Mexico, USA, pp
1896–1907
20. Sheikhshab G, Starks E, Karsan A, Chiu R, Sarkar A, Birol I (2018) GraphNER: using
corpus level similarities and graph propagation for named entity recognition. In: 2018 IEEE
international parallel and distributed processing symposium workshops (IPDPSW), pp 229–238
21. Fritzler A, Logacheva V, Kretov M (2019) Few-shot classification in named entity recognition task. In: Proceedings of the 34th ACM/SIGAPP symposium on applied computing (SAC '19), pp 993–1000
22. Malykh V, Lyalin V (2018) Named entity recognition in noisy domains. In: 2018 international
conference on artificial intelligence applications and innovations (IC-AIAI)
23. Liu J, Wang L, Zhou M, Wang J, Lee S (2018) Fine-grained entity type classification with
adaptive context. Soft Comput 22(13):4307–4318
24. Song CH, Lawrie D, Finin T, Mayfield J (2020) Improving neural named entity recognition with gazetteers
25. Akbik A, Blythe D, Vollgraf R (2018) Contextual string embeddings for sequence labeling. In:
Proceedings of the 27th international conference on computational linguistics. Association for
Computational Linguistics, Santa Fe, New Mexico, USA, pp 1638–1649
26. Sohrab MG, Miwa M (2018) Deep exhaustive model for nested named entity recognition. In:
Proceedings of the 2018 conference on empirical methods in natural language processing, pp
2843–2849. Association for Computational Linguistics, Brussels, Belgium
27. Peters ME, Neumann M, Gardner M, Clark C, Lee K, Zettlemoyer L (2018) Deep contextu-
alized word representations. In: Proceedings of the 2018 conference of the North American
chapter of the Association for Computational Linguistics: human language technologies, vol
1. Association for Computational Linguistics, New Orleans, Louisiana, pp 2227–2237
28. Kim J, Ko Y, Seo J (2019) A bootstrapping approach with CRF and deep learning models for
improving the biomedical named entity recognition in multi-domains. IEEE Access 7:70308–
70318
29. Cabot C, Darmoni S, Soualmia LF (2019) Cimind: a phonetic-based tool for multilingual
named entity recognition in biomedical texts. J Biomed Inform 94
30. Nawroth C, Engel F, Mc Kevitt P, Hemmje ML (2019) Emerging named entity recognition on
retrieval features in an affective computing corpus. In: 2019 IEEE international conference on
bioinformatics and biomedicine (BIBM), pp 2860–2868
31. Li X, Feng J, Meng Y, Han Q, Wu F, Li J (2019) A unified MRC framework for named entity
recognition. In: Proceedings of the 58th annual meeting of the Association for Computational
Linguistics, pp 5849–5859
32. Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep
bidirectional transformers for language understanding. In: Proceedings of the 2019 confer-
ence of the North American chapter of the Association for Computational Linguistics: human
language technologies, vol 1, pp 4171–4186, Minneapolis, Minnesota
33. Baevski A, Edunov S, Liu Y, Zettlemoyer L, Auli M (2019) Cloze-driven pretraining of self-
attention networks. In: Proceedings of the 2019 conference on empirical methods in natural
language processing and the 9th international joint conference on natural language processing
(EMNLP-IJCNLP), pp 5360–5369, Hong Kong, China
34. Du Y, Zhao W (2020) Named entity recognition method with word position. In: 2020 inter-
national workshop on electronic communication and artificial intelligence (IWECAI), pp
154–159
35. Lin BY, Lee DH, Shen M, Moreno R, Huang X, Shiralkar P et al (2020) TriggerNER:
learning with entity triggers as explanations for named entity recognition. In: Proceedings
of the 58th annual meeting of the Association for Computational Linguistics. Association for
Computational Linguistics, pp 8503–8511
36. Luo Y, Xiao F, Zhao H (2020) Hierarchical contextualized representation for named entity
recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 34, no 05,
pp 8441–8448
37. Thi T, Hanh H (2021) Named entity recognition architecture combining contextual and global
features. In: Ke HR, Lee CS, Sugiyama K (eds) Towards open and trustworthy digital societies.
ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham
38. Şapcı AOB, Taştan O, Yeniterzi R (2021) Focusing on possible named entities in active named entity label acquisition
39. Yu J, Bohnet B, Poesio M (2020) Named entity recognition as dependency parsing. In: Proceed-
ings of the 58th annual meeting of the Association for Computational Linguistics. Association
for Computational Linguistics, pp 6470–6476
40. Peng S, Zhang Y, Wang Z, Gao D, Xiong F, Zuo H (2021) Named entity recognition using
negative sampling and reinforcement learning. In: 2021 IEEE international conference on
bioinformatics and biomedicine (BIBM), pp 714–719
41. Xu L, Jie Z, Lu W, Bing L (2021) Better feature integration for named entity recognition. In: Proceedings of the 2021 conference of the North American chapter of the Association for Computational Linguistics: human language technologies
42. Peng Q, Zheng C, Cai Y, Wang T, Xie H, Li Q (2021) Unsupervised cross-domain named entity recognition using entity-aware adversarial training. Neural Netw 138:68–77
43. Fu J, Liu J, Shi W (2021) Exploiting named entity recognition via pre-trained language model
and adversarial training. In: 2021 IEEE international conference on computer science, elec-
tronic information engineering, and intelligent control technology, CEI 2021. Institute of
Electrical and Electronics Engineers Inc., pp 665–669
44. Yan H, Gui T, Dai J, Guo Q, Zhang Z, Qiu X (2021) A unified generative framework for various NER subtasks. In: Proceedings of the 59th annual meeting of the Association for Computational Linguistics and the 11th international joint conference on natural language processing (ACL-IJCNLP)
From Virtual World Back to Real
Classrooms?
1 Introduction
To tell the truth, there were already a lot of problems with university education before Covid-19 [1–3]. Old, classical methods were not working as before. All participants in the educational process had changed and, maybe most importantly, the "world" around us had changed as well. Perhaps the most significant thing, noticed by many educators all over the world, was that students were not so active, and sometimes it seemed that they were totally bored during the lessons. Often heard: "It's a waste of time to learn such a thing when it is one click away on the internet."
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_43
The question is obvious: Why was the method good some decades ago, and why is it not working now? Now we experience that students forget active thinking; "two clicks" are enough and go! Researchers have detected two main problems:
• Prensky [4] spoke about digital natives first. It is far more than a simple word or label. Spending hours a day on the Internet has changed the physical activity of people's brains. Researchers state that this is the result of the frequent, quick decisions we are stimulated to make, for example when reading a nicely designed homepage with many inspiring links. Therefore, digital natives need bigger stimuli not to be bored.
• Being surrounded by several devices in parallel and making quick decisions causes not only bigger brain activity but also hyper-attention [5]. This problem was identified by Katherine Hayles. Hyper-attention means that a person who uses several devices in parallel must switch between stimuli, and only about 2% of people can manage this properly; for the others, efficiency decreases [6]. As IT professionals, it is easy to give an example that highlights the problem: switching between tasks in a multitasking operating system involves many steps to record where to resume later. "Resetting" the brain back to its previous thoughts likewise needs extra effort.
Some years ago, we decided to use a CRS (classroom response system) in our teaching practice to increase interactivity in our lectures and practices. Mostly at large-scale lectures, with several hundred students participating, we saw that classical methods no longer worked. If a professor asked a question, silence was the only answer. Students did not want to raise their hands when they did not understand something new or when a question came to mind; they might be ashamed to ask in front of the others. So we recognized that their accustomed communication mode is chatting instead of classic oral communication [12]. As a first step, we examined a few possible ready-made software packages.
They are mostly well-designed, good solutions, but with limitations and difficulties. Surely not only our students came up with funny language inventions as ID names ("Donald dug," "No pain-No gain," etc.), causing 5–10-min laughing breaks. Such funny situations are not a complete waste of time, but unfortunately they do not really help the efficiency of lectures. After the trial period, we decided to implement our own system, avoiding such unnecessary cases; see Fig. 1 [13–15].
It became a Bring Your Own Device (BYOD), bidirectional, real-time Web application based on the university's standard authentication process (no funny ID names, no ambiguous language questions, etc.). It was written in C#, using the Web Forms template, and for the real-time features, we used SignalR. It is bidirectional because the teacher may send quiz questions to all of the students' joined devices, and students may send back answers, their questions to the teacher, or a simple "do not understand" signal.
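The production system is C# with SignalR; purely to illustrate the bidirectional flow (the teacher broadcasts a quiz, students send back answers or a "do not understand" signal), an analogous minimal sketch in Python, using the third-party websockets package, might look as follows. The message protocol and names are our assumptions, not E-Lection's actual code.

```python
import asyncio
import websockets

CONNECTED = set()  # all joined teacher/student sockets

async def handler(ws, path=None):  # `path` kept for older websockets versions
    CONNECTED.add(ws)
    try:
        async for message in ws:
            if message.startswith("QUIZ:"):
                websockets.broadcast(CONNECTED, message)  # teacher -> everyone
            else:
                print("from student:", message)  # answer / question / signal
    finally:
        CONNECTED.discard(ws)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # serve forever

if __name__ == "__main__":
    asyncio.run(main())
```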
The concept worked; we measured its efficiency and showed that the results improved when it was used; see Table 1. In our faculty, there are three different training versions students may choose, which is why some of them learn the subject in autumn and others in spring. They were all Hungarian students, and the lectures were given in their native language. Their age group is between 20 and 25, and they come from both the countryside and bigger towns, mixed.

Table 1 Measure of efficiency [20]

                     2017 without E-lection   2018 first year with E-lection
Spring  Missed exam  20.5%                    17.8%
        Average       9.9                     10.6
        Dispersion    4.8                      5.2
        Median       10                       11
Autumn  Missed exam  15.5%                    14.2%
        Average      10.3                     12.2
        Dispersion    4.9                      5.5
        Median       10                       12.2
Besides efficiency, it was also important for our students that this CRS was written by their professors, so the well-known saying "Preaching water and drinking wine!" did not apply.
In spring 2020, from one day to the next, we had to transfer our courses to virtual space without any trial period or planning. What could be done about education, which is usually based on personal connections? [16–18] The main options were:
• LCMS (Learning Content Management System), where educators upload documents, ppts, and video tutorials and create online tests and quizzes, while students upload homework and take the online tests, e.g., Moodle, Canvas, etc.
• VCS (Virtual Classroom System), giving lessons online in real time to keep personal contact with our students as much as we can, e.g., Google Meet, MS Teams.
• Hybrid solutions: some lessons are online and some have only documents; in another version of hybrid learning, only one third of the students attend in person at a time, rotating. This is practically a mix of LCMS and VCS.
473 Hungarian students answered the questions. Their age group is between 20 and 25, and they come from the countryside and from bigger towns too, mixed. Filling out the survey was not compulsory, and we used Google Forms to support anonymity. The survey for Hungarian students is available at https://forms.gle/b5AMePqjmG7DvnWbA.
Now, we focus on the following three questions:
• Do you like online Teams courses? (Grade 1 (not at all) to 5 (very much))
• Do you think a Teams online lesson may substitute for a classical live lesson, where you are personally present? (Grade 1 (absolutely not) to 5 (totally))
• What do you not like about them? (Free text)
Most of them liked it and graded it quite well; see Fig. 2. (We must note that first-semester students' rating was lower than that of experienced students, whose average was grade 4.0; that is a little higher, but not so surprising, as they are practically professionals.) 62% of Hungarian students graded their liking of Teams online courses 4 or 5, while only about 9.5% graded it 1 or 2. The average grade is 3.76, higher than we expected before the survey.
The result for the second question was a little surprising. As we saw with the previous question, they liked online lessons, but at the same time they do not think these can substitute for classic face-to-face teaching; see Fig. 2. The average grade is only 3.04, in contrast to 3.76!
Comparing the trends of the data from the two questions (Hungarian survey), we can clearly see that moving into virtual space forever and neglecting personal classes is not acceptable for a lot of students; see the dotted trend lines in Fig. 2.
So, students liked online courses very much, yet these cannot substitute for the classic educational method. What may cause this controversy? What is the reason for this contradiction? We asked their opinion on what they do not like.
They mentioned a lot of things; we analyzed the results and published them in papers as well [19, 20]. Among others, the results proved that they need more interactivity and more personality, which seemed to be lost in virtual classrooms. In Table 2, we give the average grade of liking of online lessons for the cases where a student mentioned a given problem. Naturally, if they have a problem, the liking is lower.
Though they miss interactivity, we noticed throughout the distance period that students now (maybe owing to this lonely period) are less active than before. They do not like to switch on their microphones or Web cameras. They do not like to share their screens. They prefer chat messages. Therefore, interaction with them is not easy.
Following the country's epidemic regulations, in autumn 2021 we tried a hybrid teaching method. Half of the students were personally in the class and half of them were online, and the next week they swapped, to avoid crowds in the buildings. We started the 2022 spring semester with distance teaching due to a strong Covid wave, but from the 1st of March we continued with face-to-face teaching. The rules for absence were not so strict, because the faculty considered that there might be more illness among students and professors than usual. For the same reason, lessons were streamed and recorded through Teams for those who had to stay at home for any reason.
We, and educators all over the world, notice that something has changed again [23–25]. Yes, as we know, one thing is constant: change. A lot of students do not want to come to the university at all; they prefer to be at home, and they avoid interactivity as much as they can; they were absent up to the maximum limit. As we are committed to understanding the students' needs, we again created an anonymous survey and asked their opinions.
It was done in April, in the 2022 spring semester, after the distance period. 169 Hungarian students answered the questions. Filling out the survey was not compulsory. Their age group is between 20 and 25, and they come from the countryside and from bigger towns too, mixed. We used Google Forms to support anonymity. The survey for Hungarian students is available at https://forms.gle/J5syCkqYXFYmuY148.
Now, we should like to focus on three questions with which we can detect how students' needs have changed regarding online teaching:
• What form of teaching would you choose? (Dropdown list)
• In terms of lecture-style lessons, which do you find most effective? (Dropdown list)
• Which is the most adequate practice style? (Dropdown list)
After two years of studying in an emergency situation, they had already tried fully online and hybrid teaching.
About 30% of students would like to learn fully online, and only 10% want to go back to the traditional school! A lot of students would prefer a mixed style; see Fig. 3. See Fig. 2 again, where we collected what students think about the possible substitution of traditional education with online teaching; the data (Hungarian survey) can be compared.
One of the most precious and serious parts of education is the interaction between the professors and the students: the common work, the interaction, the discussion between them. Think about Dale's cone again. Nowadays, it is said that a professor should work as a tutor and not as a classical teacher who uses frontal teaching tools.
3.4 Lectures
Our second question was: In terms of lecture-style lessons, which do you find most effective? The possible choices were the following: pre-recorded material, no need for live lectures; live lecture with ppt and demos; live lecture with a content outline (ppt); live lecture using demos only.
If somebody prefers pre-recorded lectures, then he or she does not want to be involved and does not want to discuss anything in real time; the possibility of real-time interaction is lost.
Let us see the results in Fig. 5. Surprisingly, about 40% of students chose this ready-made, impersonal lecture style as the most effective. But there is another group, as big as the first (40%), which prefers live lectures with demos and ppts.
Our next question is: Which is the most adequate practice style?
The possible choices were the following: lecture style + small ppt; from an initial sample code, the solution written by the teacher; after a practical presentation, the instructor writes the full solution; after a practical presentation, individual solutions from initial sample code; direct practical exercise presentation, then individual solutions.
The first three choices mean that the professor has the bigger role; they are the more classical, frontal methods. The last two choices mean the professor's role is rather that of a tutor who may give personal help with the solutions; meanwhile, it means a more active learning mode. But the fact is that more than half of the students prefer to attend practice lessons without real individual contributions, where they might understand the context more deeply and could ask better questions; see Fig. 6.
This kind of attitude goes against Dale's experience cone; it goes against the principle of personal care and attention. Covid-19, the emergency situation, and the online lessons strengthened such a strange attitude.
4 Summary
New real-time tools could give a very good base for increasing the level of active learning and giving better motivation to students; on the contrary, we found that a large proportion of students now prioritize convenience and are less inclined to work independently. Unfortunately, this unexpected result comes at the expense of the ability to learn and the knowledge acquired.
It is clear that we cannot step into the same river twice; we must not simply go back to the teaching mode from before Covid-19. The usage of new tools is required by everybody. At this moment, we do not have the philosopher's stone; we have to work on a more successful mixed teaching mode, a new mix of tools, maybe a new E-Lection, as we did before, to bring back students' activity.
References
1. Zitny R et al (2016) Education using mobile technologies. In: ICETA 2016.11.24–25. Stary
Smokovec IEEE, pp 115–120. ISBN 9781509046997
2. Zoltán I, Bakonyi BH, Illés Z Jr (2016) Supporting dynamic, bi-directional presentation
management in real-time. In: Emil V (ed) 11th joint conference on mathematics and computer
science, CEUR-WS.org, 6 p
3. Bakonyi HV, Zoltán I (2017) Real-time tool integration for lectures. In: 15th IEEE inter-
national conference on emerging elearning technologies and applications (ICETA 2017).
Starý Smokovec, Slovakia, 2017.10.26–2017.10.27. IEEE Computer Society Press, Denver,
pp 31–36. ISBN 978-1-5386-3294-9
4. Prensky M (2001) From on the horizon, vol 9, no 5. MCB University Press. https://bit.ly/2Ye
KG7U
5. Hayles NK (2007) Hyper and deep attention: the generational divide in cognitive modes. Profession, pp 187–199
6. Bradberry T (2014) Multitasking damages your brain and career, new studies suggest. https://www.forbes.com/sites/travisbradberry/2014/10/08/multitasking-damages-your-brain-andcareer-new-studies-suggest/2/#6088a80642ef
7. Shaaruddin J, Mohamad M (2017) Identifying the effectiveness of active learning strategies and
benefits in curriculum and pedagogy course for undergraduate TESL students. Creative Educ
8(14). Available at https://www.scirp.org/journal/PaperInformation.aspx?PaperID=80647
8. Opre D et al (2022) Supporting students’ active learning with a computer based tool. Act Learn
High Educ. https://doi.org/10.1177/14697874221100465
9. Brown JL (2016) Quick, click: student response systems evolve in higher ed, New student
response systems offer increased versatility. University Business. http://bit.ly/2fnJMRw
10. Dangel H, Wang C (2008) Student response systems in higher education: moving beyond linear
teaching and surface learning. J Educ Technol Dev Exchange 1(1):93–104. http://www.sicet.
org/journals/jetde/jetde08/paper08.pdf
11. Mader S, Bry F (2019) Audience response systems reimagined. In: Herzog M, Kubincová Z,
Han P, Temperini M (eds) Advances in web-based learning—ICWL 2019. ICWL 2019. Lecture
notes in computer science, vol 11841. Springer, Cham
12. Li R (2020) Communication preference and the effectiveness of clickers in an Asian university
economics course. Heliyon 6(4):e03847. https://doi.org/10.1016/j.heliyon.2020.e03847
13. Bakonyi V, Zoltan I, Verma C (2021) Key element in online education to activate students
with real-time tools. In: Institute of electrical and electronics engineers 2021 2nd international
conference on computation, automation and knowledge management (ICCAKM) conference:
Dubai, United Emirates, 19 Jan 2021–21 Jan 2021. Curran Associates, Red Hook (NY), pp
326–331
14. Bakonyi V, Zoltan I, Verma C (2020) Towards the real-time analysis of talks. In: Chauhan
AK, Singh G (2020) International conference on computation, automation and knowledge
management Dubai. Amity University, United Arab Emirates, pp 322–327, 6 p
15. Bakonyi V, Illes Z, Verma C (2020) Analyzing the students’ attitude towards a real-time class-
room response system. In: Institute of Electrical and Electronics Engineers 2020 international
conference on intelligent engineering and management (ICIEM), London, pp 69–73. ISBN
9781728140971
16. Kővári E, Bak G (2021) University students’ online social presence and digital competencies
in the COVID-19 virus situation. In: Agrati LS et al (eds) Bridges and mediation in higher
distance education. HELMeTO 2020. Communications in computer and information science,
vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-67435-9_13
17. Martin F, Parker M, Deale D (2012) Examining interactivity in synchronous virtual classrooms.
Int Rev Res Open Distrib Learn 13(3):227–261. https://doi.org/10.19173/irrodl.v13i3.1174
18. Bower B (2006) Virtual classroom pedagogy. In: Conference: proceedings of the 39th SIGCSE
technical symposium on computer science education, SIGCSE 2006, Houston, Texas, USA,
3–5 March 3–5. https://doi.org/10.1145/1124706.1121390
19. Bakonyi V, Zoltán I (2020) Real-time and digital solutions in education during emergency
situation in Hungary. In: Abonyi-Tóth A, Stoffa V, Zsakó L (eds) New methods and technologies
in education, research and practice: proceedings of XXXIII. DidMatTech 2020 Conference
Budapest. ELTE Informatikai Kar, Magyarország, 507 p, pp 231–240, 10 p
20. Bakonyi V, Zoltán I (2020) Real-time online courses during emergency situation in Hungary.
In: ICETA 18th international conference on emerging elearning technologies and applications,
Nov 12, Stary Smokovec, Slovakia
21. Bakonyi V, Zoltán I, Szabó T (2021) Real-time interaction tools in virtual classroom systems.
In: Singh PK, Singh Y, Chhabra JK, Illés Z, Verma C (eds) Recent innovations in computing:
proceedings of ICRIC 2021, vol 2. Springer Singapore, Singapore, pp 625–636, Paper:
Chapter 47, 12 p
22. Bakonyi V, Zoltán I, Verma C (2021) Real-time education in emergency situation. In: 2021
international conference on advances in electrical, computing, communication and sustainable
technologies (ICAECT), Piscataway (NJ), USA. IEEE, pp 1–6, 6 p
23. Bialystok L (2022) Education after COVID. https://doi.org/10.7202/1088373ar. Available at:
https://www.researchgate.net/publication/360986155_Education_after_COVID
24. Mazzara M et al (2022) Education after COVID-19. In: Smart and sustainable technology for
resilient cities and communities. https://doi.org/10.1007/978-981-16-9101-0_14. Available at
https://bit.ly/3dsyzOF
25. Koopman O, Joy K, Karen K (2021) The rise of the university without classrooms after COVID-
19. In: Re-thinking the humanities curriculum in the time of COVID-19. CSSALL Publishers
(Pty) Ltd. Available at: https://bit.ly/3zYRf03
Aspect-Based Opinion Mining
Framework for Product Rating
Embedded with Fuzzy Decision
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_44
1 Introduction
Fig. 1 Architecture of a generic opinion analysis system: a corpus is passed through document processing, supported by a lexicon and linguistic resources, and document analysis then produces opinion scores for aspects
While sentiment analysis locates and analyzes the polarity of a text, opinion mining obtains and analyzes people's views about a subject [4–6]. Opinion mining is the technique of categorizing opinions to identify whether the polarity of the opinion holder is neutral, positive, or negative about a product or topic. Figure 2 illustrates the general framework, or phases, involved in identifying the polarity of expressions.
Data gathering. The WWW is a vast repository of data, and the process of opinion mining begins with data collection. Review sites and social networks are only two examples of the diversified sources from which data for opinion mining can be gathered. Flipkart.com, for instance, may be used to accumulate customer opinion data for any product.
Data Preprocessing. The primary step for classification is preprocessing of the data. Preprocessing helps improve the performance of classification algorithms by reducing noise, and real-time sentiment classification is accelerated with its help. Preprocessing involves the following steps (a sketch follows the list):
i. Tokenization
ii. Removal of URLs, hashtags, references, and special characters
iii. Slang word translation
iv. Stemming.
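A minimal sketch of steps i–iv, assuming NLTK for tokenization and stemming (the paper does not name its tools); the slang dictionary is a made-up placeholder.

```python
import re
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize  # requires the nltk "punkt" data

SLANG = {"gr8": "great", "luv": "love"}  # hypothetical slang map
stemmer = PorterStemmer()

def preprocess(review):
    review = re.sub(r"https?://\S+", " ", review)  # ii. remove URLs
    review = re.sub(r"[#@]\w+", " ", review)       # ii. hashtags and references
    review = re.sub(r"[^a-zA-Z\s]", " ", review)   # ii. special characters
    tokens = word_tokenize(review.lower())          # i. tokenization
    tokens = [SLANG.get(t, t) for t in tokens]      # iii. slang translation
    return [stemmer.stem(t) for t in tokens]        # iv. stemming

print(preprocess("Luv this phone!!! gr8 camera http://t.co/x #deal"))
```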
The subsequent step in the classification of opinions is feature extraction, or feature selection. Here, pertinent features used to create an efficient and precise classifier are chosen; the quality of the selected features determines how successful a classification model will be. Once features are identified, the classification of opinions is done using supervised or unsupervised machine learning or lexicon-based algorithms, which is a tough problem.
Opinion Polarity, Evaluation, and Results. In this step, opinions are divided into three categories, positive, negative, and neutral, and the effectiveness of the classification methods is assessed using established performance metrics. Lastly, the opinion mining results are shown as charts or graphs [7–10].
The paper is organized as follows: Sect. 2 presents the motivation for conducting
the study; Sect. 3 describes the materials and method employed for preprocessing
of the corpus and implementing the algorithm adopted for estimating the aspect
score classification of the cleaned data; Sect. 4 briefly discusses the outcomes of the
experiment carried out before drawing a conclusion in Sect. 5.
2 Motivation
Applying various strategies to analyze the opinions and sentiments derived from
data sources is known as opinion mining [11]. Machine learning and lexicon-based
techniques are two processes for classifying opinions expressed. Machine learning
techniques is a commonly utilized and useful strategy in opinion mining algorithms
devised because of its ability to handle enormous amounts of data and automatic
implementation. The machine learning methodology uses an algorithm that enables
systems to understand [12]. Algorithms for machine learning classification are a
group of techniques for discovering patterns in data. Classification algorithm first
creates a framework to classify the testing dataset after learning how to classify
opinions from a training dataset [13]. There are various types of machine learning-
based sentiment classification methods, including supervised learning approach,
unsupervised learning approach, and semi-supervised learning approach [14].
Sentiment lexicons, which are collections of annotated and preprocessed senti-
ment phrases, are used in lexicon-based approaches [15]. Through the discovery of
an opinion lexicon that analyzes the textual material, the lexicon-based technique
carries out opinion mining. Lexicon-based methods can categorize opinions in two
different ways [14]:
i. dictionary-based approach and
ii. corpus-based approach.
The decision tree (DT), Naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM) machine learning algorithms and the SentiWordNet lexicon-based algorithm were compared in [15], which provided a detailed assessment of various machine learning techniques. A model for combined sentiment topics (CST) based on unsupervised learning was introduced in [16]; according to experimental findings, this model outperformed supervised and semi-supervised methods in diversified domains.
The support vector machine and Naive Bayes supervised machine learning methods were covered in [17], which also provided a summary of the opinion mining area. The authors determined that the task of opinion mining is exceedingly difficult and described the metrics for performance evaluation of opinion mining classification methods: accuracy and F1-score. To examine customer feedback, researchers in [18] developed phrase-level opinion mining: frequent itemset mining was utilized for aspect extraction, and the opinion or sentiment orientation of each aspect was determined using the supervised Naive Bayes method. In [19], a dictionary-based method for polarity analysis of a Twitter corpus was presented. Combining various opinion mining classifiers is known as the hybrid classification approach for opinion mining; Shahnawaz and Astya [20] addressed many methods for sentiment analysis, including lexicon-based, machine learning, and hybrid sentiment classification strategies.
3 Proposed Methodology
Extracting aspects from a corpus provides very useful insights into the thought processes of customers, which can serve as a base for product prospects and commercial viability. Aspects are the potential entities rated by reviewers and provide excellent information on sale trends and customer liking [21]. An aspect of a product can be a sentence or a single word. From the work studied, it can be concluded that nouns and noun phrases are generally the potential aspects. Figure 3 depicts the proposed generalized flow graph for the aspect-assisted opinion mining framework for product rating embedded with fuzzy decision. To segregate aspects, a search for nouns and noun phrases in the reviews is required. Aspect extraction is performed at sentence level for all reviews; the steps can be summarized as follows (a sketch appears after the list):
i. Acquire features and split them into sentences; each sentence is then put up for analysis.
ii. Evaluate the part-of-speech tags at the sentence level.
iii. Extract NN-, NNP-, NNS-, etc., tagged words; this step is performed for each sentence.
iv. Estimate the acutance of the extracted words.
v. Remove redundant words.
vi. Work on the most frequent words.
vii. Cluster synonyms and label them.
viii. Estimate subjectivity and polarity.
ix. Classify tokens as positive, negative, or neutral.
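A minimal sketch of steps i–iii and vi, assuming NLTK's POS tagger in place of the Stanford tagger used in the paper; frequent NN/NNP/NNS-tagged tokens are collected as candidate aspects.

```python
from collections import Counter
import nltk  # requires the "punkt" and "averaged_perceptron_tagger" data

def candidate_aspects(review):
    aspects = Counter()
    for sentence in nltk.sent_tokenize(review):              # i. sentence split
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))  # ii. POS tags
        for word, tag in tagged:
            if tag.startswith("NN"):                         # iii. NN, NNP, NNS, ...
                aspects[word.lower()] += 1
    return aspects

review = "The battery life is great. The camera struggles in low light."
print(candidate_aspects(review).most_common())  # vi. most frequent nouns
```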
The aspect table comprises potential aspects and similar words. The aspect-based corpus for product rating was acquired from GitHub; the aspect database comprises 7563 rows, and the visible columns are depicted in Fig. 4. The aspect table so formed is employed for evaluating the subjective phrases [22–24]. The Stanford POS tagger was employed for tagging.
Fig. 3 Generalized flow graph of the aspect-assisted opinion mining framework for product rating embedded with fuzzy decision (input: database of reviews)
Figure 7 depicts the number of trigrams that appear in the corpus; the graph presents information about the expressions people use for products. Subjective phrases are identified via labeled polarity words present in phrases expressing opinions about products. Opinionated phrases so labeled are called subjective phrases and must be analyzed; other redundant phrases that do not contribute to expressing an opinion should be removed, which helps reduce latency further. The proposed technique takes a phrase as a subjective phrase only if it has a feature present in the aspect table. The same was estimated for the aspect corpus and is depicted in Fig. 8. In aspect-based opinion mining, it is of primary importance to identify words that alter the meaning of aspects, as these help in identifying the opinion words that establish the polarity of communication toward those aspects.
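A minimal sketch of this filtering, assuming the TextBlob package for subjectivity and polarity (the paper does not name its scoring library) and a made-up aspect table; a phrase is kept as subjective only if it mentions an aspect-table feature.

```python
from textblob import TextBlob

ASPECT_TABLE = {"battery", "camera", "screen"}  # hypothetical aspect features

def subjective_phrases(phrases):
    kept = []
    for phrase in phrases:
        sentiment = TextBlob(phrase).sentiment
        has_aspect = any(a in phrase.lower() for a in ASPECT_TABLE)
        if has_aspect and sentiment.subjectivity > 0:
            kept.append((phrase, sentiment.polarity, sentiment.subjectivity))
    return kept  # redundant phrases are dropped, reducing later work

print(subjective_phrases(["The battery drains fast", "It was delivered Monday"]))
```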
The proposed technique employs adjectives, verbs, adverb-adjective combinations, and adverb-verb combinations as potential data for polarity estimation; adjectives are typically the most important opinion words in a phrase. Trigrams are searched forward and backward from the aspect location in the sentence to acquire opinion words, using the POS tag data. The technique does not provide satisfactory results if a phrase contains a number of aspects and will then not be able to accurately estimate contextual polarity scores.
Figure 9 presents the scatter diagram between subjectivity and polarity estimated from the corpus. The technique employs the dependency relations that exist between words, and the algorithm extracts opinion words for score estimation. Based on these scores, the polarity of the aspects is estimated; the estimated values are further subjected to fuzzification to predict the prospects of products.
Aspect-Level Score Calculation: To this point, the subjective phrases of a product review have been examined for aspect-related opinion words. In the aspect-level score estimation step, the polarity score of an aspect in a phrase is evaluated by algebraically adding the opinion word scores in that phrase. The proposed methodology assigns polarity ratings to the opinion words before assigning priority values; word sense disambiguation is not considered in the process [25–27] (Fig. 10).
Fig. 9 Scatter diagram between subjectivity and polarity estimated from the corpus
The next step comprises the estimation of the aspect score over the entire set of review comments by customers. The total score of an aspect over all reviews is estimated by algebraically adding the phrase-wise scores of that aspect, with the positive and negative polarities aggregated separately. If the positive score is higher, the conclusion about the product from the entire set of reviews is categorized as positive; otherwise, it is negative. The aspect score for the entire corpus can be estimated using the following relations [28]. For each aspect j of the product:
$$\text{Positive\_Polarity}_{\text{Aggregate}}[j] = \sum_{i} \text{Positive\_pol}_{i,j} \tag{1}$$

$$\text{Negative\_Polarity}_{\text{Aggregate}}[j] = \sum_{i} \text{Negative\_pol}_{i,j} \tag{2}$$
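A minimal sketch of Eqs. (1) and (2): phrase-wise opinion scores per aspect (made up here) are split by sign and summed over all reviews, and the sign of the aggregate decides the verdict.

```python
from collections import defaultdict

# (aspect, phrase-level polarity score) pairs from all reviews (hypothetical)
phrase_scores = [("battery", 0.6), ("battery", -0.2), ("camera", 0.4),
                 ("camera", -0.7), ("battery", 0.3)]

pos_agg = defaultdict(float)  # Positive_Polarity_Aggregate[j], Eq. (1)
neg_agg = defaultdict(float)  # Negative_Polarity_Aggregate[j], Eq. (2)
for aspect, score in phrase_scores:
    if score >= 0:
        pos_agg[aspect] += score
    else:
        neg_agg[aspect] += score

for aspect in sorted(set(pos_agg) | set(neg_agg)):
    verdict = "positive" if pos_agg[aspect] > -neg_agg[aspect] else "negative"
    print(aspect, round(pos_agg[aspect], 2), round(neg_agg[aspect], 2), verdict)
```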
The polarity values of the labeled dataset were then divided into five fuzzy set decision boundaries, confirmed positive, positive, neutral, negative, and confirmed negative, to accurately predict the correlation between the polarity score obtained and the product rating. Figure 11 depicts the fuzzification of the polarity ranges for the five categories.
The estimated polarity was mapped onto the fuzzy set defined for the five ranges: confirmed positive, positive, neutral, negative, and confirmed negative. Figure 12 shows the percentage share of products within the fuzzy boundaries (polarity, plotted from −1.5 to 1.5, against the product index). From Fig. 12, it can be concluded that:
i. Confirmed Positive: products that crossed 0.5 polarity on the positive side had excellent reviews in the corpus. Their share is 36% of the total data, and the fuzzy range suggests that these products will be recommended and bought. They may be rated as top performing and highly saleable.
ii. Positive: products that stay between 0 and 0.5 polarity on the positive side had good reviews in the aspect corpus. Their share is 44% of the total data, the largest, and the fuzzy range suggests that these products will be recommended. They may be categorized as performing satisfactorily and saleable.
iii. Neutral: products that stay at 0 polarity had mixed reviews in the aspect corpus. Their share is 3% of the total data, and the fuzzy range suggests that these products may or may not be recommended at all. Products in this category are average performers.
iv. Negative: products that stay between 0 and 0.5 polarity on the negative side had poor reviews in the aspect corpus. Their share is 12% of the total data, and the fuzzy range suggests that these products will never be recommended; the organization needs to modify them or stop production. Products in this category are below-average performers.
v. Confirmed Negative: products beyond 0.5 polarity on the negative side had extremely poor reviews in the aspect corpus. Their share is 1% of the total data, and the fuzzy range suggests that these products have no future at all and should be completely removed from the list. Products in this category are poor performers.
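For illustration, a crisp version of this mapping is sketched below; the actual fuzzy sets of Fig. 11 would assign graded memberships near the boundaries, which this simplification ignores.

```python
def fuzzy_category(polarity):
    """Map an aggregate polarity score to the five decision ranges above."""
    if polarity > 0.5:
        return "confirmed positive"  # top performing, highly saleable
    if polarity > 0.0:
        return "positive"            # satisfactory, saleable
    if polarity == 0.0:
        return "neutral"             # average performing
    if polarity >= -0.5:
        return "negative"            # below average
    return "confirmed negative"      # poor performing

for p in (0.8, 0.3, 0.0, -0.2, -0.9):
    print(p, "->", fuzzy_category(p))
```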
The proposed algorithm maps the polarity ranges onto the fuzzy boundaries with five classification sets and establishes the correlation between polarity and product rating through the fuzzy boundaries.
5 Conclusion
Work proposed is an effort to rate a product and predict its demand based on the
aspect-assisted opinion mining embedded with fuzzy set decision boundary. The
primary work in aspect-assisted opinion mining is aspect identification, identification
and extraction of words that act as feature of aspects and its inclination detection.
Polarity scores of the labeled dataset were then divided into 5 fuzzy set decision
boundaries, confirmed positive, positive, neutral, negative, and confirmed negative,
to accurately predict the correlation between polarity score obtained and product
rating. Algorithm proposed was operated on aspect-based corpus for product rating
which acquired from GitHub, comprised of 7563 rows of product aspects; confirmed
positive was defined for products that crossed 0.5% polarity on positive side; the
percentage share is 36% of the total data, and fuzzy range suggests that these prod-
ucts will be referred and bought. Positive was defined for the product that stays
between 0 and 0.5% polarity on positive side; the percentage share is 44% of the
total data, and fuzzy range suggests that the products studied will be referred and
has maximum percentage share, Neutral was defined for the product that stays at
0% polarity on axis had mixed reviews in the aspect corpus; the percentage share
is 3% of the total data; negative was defined for the product that stays between
0 and 0.5% polarity on negative side had poor reviews in the aspect corpus; the
percentage share is 12% of the total data, and confirmed negative was defined for the
product that stays between beyond 0.5% polarity on negative side had extremely poor
reviews in the aspect corpus; the percentage share is 1% of the total data; the analysis
suggests that maximum share 44% lies with positive range of fuzzy boundary, and
this can be tapped for trend prediction of customer choice; the algorithm successfully
provides a fuzzy-assisted prediction mechanism for product rating from the aspect-
based corpus; algorithm performs at par with existing technique, and it also provides
percentage share of products presence in market for future trends.
References
1. Saad S, Saberi B (2017) Sentiment analysis or opinion mining: a review. Int J Adv Sci Eng Inf
Technol 7:1660. https://doi.org/10.18517/ijaseit.7.5.2137
2. Abirami AM, Gayathri V (2017) A survey on sentiment analysis methods and approaches. In:
Proceedings of the 2016 eighth international conference on advanced computing, pp 72–76
3. Phan HT, Tran VC, Nguyen NT, Hwang D (2020) Improving the performance of sentiment
analysis of tweets containing fuzzy sentiment using the feature ensemble model. IEEE Access
8:14630–14641
4. Aung KZ, Myo N (2017) Sentiment analysis of students’ comment using lexicon based
approach. In: Zhu G, Yao S, Cui X, Xu S (eds) IEEE/ACIS 16th international conference
on computer and information science. IEEE, pp 149–154
5. Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: Proceedings of the seventh international conference on language resources and evaluation, pp 2200–2204. http://www.lrec-conf.org/proceedings/lrec2010/pdf/769_Paper.pdf
6. Abirami AM, Gayathri V (2017) A survey on sentiment analysis methods and approach. In:
8th IEEE international conference on advanced computing. IEEE Press, Chennai, India, pp
72–76. https://doi.org/10.1109/icoac.2017.7951748
7. Mishra N, Jha CK (2012) Classification of opinion mining techniques. Int J Comput Appl
56(13):1–6. https://doi.org/10.5120/8948-3122
8. Soong H-C, Jalil NBA, Ayyasamy RK, Akbar R (2019) The essential of sentiment analysis
and opinion mining in social media: introduction and survey of the recent approaches and
techniques. In: 9th IEEE symposium on computer applications & industrial electronics. IEEE
Press, Malaysia, pp 272–277. https://doi.org/10.1109/iscaie.2019.8743799
9. Golande A, Kamble R, Waghere S (2016) An overview of feature based opinion mining. In:
Corchado Rodriguez J, Mitra S, Thampi S, El-Alfy ES (eds) Intelligent systems technologies
and applications. Advances in intelligent systems and computing, vol 530. Springer, Cham, pp
633–645. https://doi.org/10.1007/978-3-319-47952-1_51
10. Emam A, Alzahrani M (2017) Opinion mining techniques and tools: a case study on an Arab
newspaper. In: IEEE international conference on computational science and computational
intelligence (CSCI). IEEE Press, Las Vegas, NV, USA, pp 292–296. https://doi.org/10.1109/
csci.2017.49
11. Sindhura, Sandeep Y (2015) Medical data opinion retrieval on Twitter streaming data. In: IEEE
international conference on electrical, computer and communication technologies (ICECCT),
IEEE Press, Coimbatore, India. https://doi.org/10.1109/icecct.2015.7226043
12. Bhavitha BK, Rodrigues AP, Chiplunkar NN (2017) Comparative study of machine learning
techniques in sentimental analysis. In: IEEE international conference on inventive communi-
cation and computational technologies. IEEE Press, Coimbatore, India, pp 216–221. https://
doi.org/10.1109/icicct.2017.7975191
13. Eshak MI, Ahmad R, Sarlan A (2017) A preliminary study on hybrid sentiment model for
customer purchase intention analysis in social commerce. In: IEEE conference on big data and
analytics. IEEE Press, Kuching, Malaysia, pp 61–66. https://doi.org/10.1109/icbdaa.2017.828
4108
14. Rani MS, Sumathy S (2017) Analysis on various machine learning based approaches with
a perspective on the performance. In: IEEE innovations in power and advanced computing
technologies. IEEE Press, Vellore, pp 1–7. https://doi.org/10.1109/ipact.2017.8244998
15. Sankar H, Subramaniyaswamy V (2017) Investigating sentiment analysis using machine
learning approach. In: IEEE international conference on intelligent sustainable systems. IEEE
Press, Palladam, India, pp 87–92. https://doi.org/10.1109/iss1.2017.8389293
16. Usha MS, Devi MI (2013) Analysis of sentiments using unsupervised learning techniques.
In: IEEE international conference on information communication and embedded systems
(ICICES). IEEE Press, Chennai, India. https://doi.org/10.1109/icices.2013.6508203
17. Das MK, Padhy B, Mishra BK (2017) Opinion mining and sentiment classification: a review.
In: IEEE international conference on inventive systems and control. IEEE Press, Coimbatore,
India, pp 1–3. https://doi.org/10.1109/icisc.2017.8068637
18. Jeyapriya A, Selvi CSK (2015) Extracting aspects and mining opinions in product reviews
using supervised learning algorithm. In: 2nd IEEE international conference on electronics and
communication systems. IEEE Press, Coimbatore, India, pp 548–552. https://doi.org/10.1109/
ecs.2015.7124967
19. Biltawi M, Etaiwi W, Tedmori S, Hudaib A, Awajan A (2016) Sentiment classification tech-
niques for Arabic language: a survey. In: 7th IEEE international conference on information
and communication systems. IEEE Press, Irbid, Jordan, pp 339–346. https://doi.org/10.1109/
iacs.2016.7476075
20. Shahnawaz, Astya P (2017) Sentiment analysis: approaches and open issues. In: IEEE interna-
tional conference on computing, communication and automation. IEEE Press, Greater Noida,
India, pp 154–158. https://doi.org/10.1109/ccaa.2017.8229791
21. Kumar, Vinoth V et al (2022) Aspect based sentiment analysis and smart classification in
uncertain feedback pool. Int J Syst Assur Eng Manage 13(1):252–262
22. Kumar A et al (2022) Sentic computing for aspect-based opinion summarization using multi-
head attention with feature pooled pointer generator network. Cogn Comput 14(1):130–148
23. Aggarwal CC (2022) Opinion mining and sentiment analysis. In: Machine learning for text.
Springer, Cham, pp 491–514.
24. Liu B (2012) Sentiment analysis and opinion mining. Synthesis Lect Human Lang Technol 5(1):1–167
25. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth
ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp
168–177
26. Qiu G, Liu B, Bu J, Chen C (2011) Opinion word expansion and target extraction through
double propagation. Comput Linguist 37(1):9–27
27. Liu Q, Gao Z, Liu B, Zhang Y (2013) A logic programming approach to aspect extraction in opinion mining. In: 2013 IEEE/WIC/ACM international joint conferences on web intelligence (WI) and intelligent agent technologies (IAT), vol 1. IEEE, pp 276–283
28. Banjar A, Ahmed Z, Daud A, Ayaz Abbasi R, Dawood H (2021) Aspect-based sentiment analysis for polarity estimation of customer reviews on Twitter. Comput Mater Continua 67(2):2203–2225
The Problems and Organization
of Learning Using Distance Educational
Technologies: Practical Issues
Aliya Katyetova
Abstract Since the beginning of the 2020 pandemic, the educational system in every country has offered and introduced new technologies and instruments for distance education. Each school independently chose an Internet platform through which distance learning was organized. Parents and children worry about the availability and speed of Internet connections, and about access to technology, because problems exist everywhere, in both urban and rural areas. The results of distance learning
in Kazakhstan during the pandemic indicate the insufficient effectiveness of national
telecommunications networks. The author of the article has experience in teaching with distance learning technologies at universities and outlines the problems and basic requirements for the organization of the educational process with the use of distance educational technologies in the Republic of Kazakhstan, namely, the practical questions of which standards and rules apply and how to execute them in the organization of education, including primary school. The author provides a comparative
study of learning management systems used in Kazakhstan education and suggests
that a future task is the identification of the student in distance learning, which can be
performed using facial recognition of students during authorization. Furthermore, the
author plans to conduct a questionnaire survey among teachers, which will demon-
strate the situation during distance learning in computer science lessons at primary
schools. These practical issues will help to avoid the main distance learning problems
of schoolchildren. The present paper is useful for teachers and school management
in the implementation and control of educational activities.
A. Katyetova (B)
Eotvos Lorand University, Budapest, Hungary
e-mail: akatyetova@inf.elte.hu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 585
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_45
1 Introduction
In the spring of 2020, every school in the world, including in Kazakhstan, moved to a new form of education: distance education. This required the Ministry of Education and Science to revise its approaches to the educational process.
Currently, Kazakhstan’s education is going through a period of reforms. The
development of domestic education is focused on the best world and European stan-
dards. Among them are digitalization and automation of the educational process and
the introduction of electronic textbooks in schools in Kazakhstan. According to the
instructional and methodical letter “About the features of the educational process in
secondary education organizations of the Republic of Kazakhstan in the 2021–2022
academic year,” the subject “Digital literacy” was introduced in the first grade from
January 1, 2022. The name of the subject "Information and communication tech-
nologies" in primary school has been changed to "Digital literacy" [1]. Computer
science textbooks have been developed for the youngest students: Now, first-graders
will be taught the basics of computer literacy.
In the national project "Quality Education 'Educated Nation'," the focus is placed on the availability of school content from home 24/7 and the provision of Internet speeds of at least 100 Mbit/s.
During COVID-19, many schools in Kazakhstan introduced distance educational
technologies. In each educational institution, there were problematic issues with implementation. This article was motivated by the author's previous work
in e-learning and provides some suggestions about the means and rules by which
distance learning is organized.
Concerning Learners
Learners (students) who want to be trained using distance educational technologies (DET) must submit an application addressed to the head of the organization/school with a reasoned justification for the possible use of DET in training, together with supporting documents from the parents and/or recommendations of a medical and pedagogical consultation.
The head of the education organization should specify the technologies to be used. These should include:
The education organization has two days to familiarize students and their parents
with the plan and timetable of an educational process used in e-learning.
E-Pedagogy in Disciplines
To provide school students with teaching materials, educational providers need
to have an e-learning strategy, particularly their electronic educational methodical
complexes on all subjects/disciplines of the curriculum (EEMCD), implemented
using DET. The preparation of electronic teaching materials such as a set of a training
package, teaching and methodical materials, handouts, visual, audio, video, and
multimedia materials for educational discipline needs to be provided by the devel-
oper of the course (teacher) based on the approved work training programs developed
in accordance with the curriculum.
Nowadays, many educational institutions use “case technology” where teaching
materials are clearly structured and appropriately collected in a special set (case).
It is therefore recommended that the electronic educational methodical complex of a subject include the obligatory and optional sets of the EEMCD structure defined below.
The compulsory suite of electronic educational and methodical complexes of disciplines should consist of the following (a schematic sketch is given after the list):
• the working learning program including the content of the discipline, calendar
thematic plan, a list of references for recommended reading (basic and supple-
mentary), the modular partition of discipline, the schedule for distance consulta-
tions;
• electronic lecture notes;
• materials for practical works and seminars;
• tasks for students' independent self-study and teacher-led self-study;
• formative materials (test assignments of individual tasks);
• summative assessment materials (test examination tasks, exam questions, tickets,
examinational control works).
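For illustration only, the compulsory suite above can be pictured as a structured "case" in software. The nested layout below simply mirrors the list; all identifiers are invented for this sketch and are not a prescribed standard.

```python
# Schematic layout of a compulsory EEMCD "case"; names are illustrative.
eemcd_case = {
    "working_learning_program": [
        "discipline_content", "calendar_thematic_plan", "reading_list",
        "module_partition", "distance_consultation_schedule",
    ],
    "electronic_lecture_notes": [],
    "practical_works_and_seminars": [],
    "self_study_tasks": ["independent", "teacher_led"],
    "formative_materials": ["individual_test_assignments"],
    "summative_materials": ["exam_questions", "tickets", "control_works"],
}
print(sorted(eemcd_case))  # the six compulsory components
```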
A dedicated service provides organizational and technical support for the learning process by DET and has the following offices in its composition:
• administration of the educational process by DET;
• design of didactic means of DET;
• information and technical support.
The administration office of the educational process by DET plans and organizes the educational process, keeps the records related to DET, and plans further training for teachers and tutors. This service also organizes the collection of all students' control materials (control and course works, essays, papers, written examination papers, and the like) in paper and/or electronic form, transfers these materials to tutors for the assessment of knowledge, and enters the information on current progress received from the tutors into the information database.
The DET pedagogical department works on the development, acquisition, and
mastering of electronic textbooks, multimedia courses, methodical manuals, devel-
oping test systems and other means of control of knowledge, and techniques of using
information resources for DET.
The information and technical support office designs, produces, and maintains
software manuals, as well as informational and technical distance learning tools.
As claimed by Dipak R. Kawade et al., “the Internet of Things (IoT) is known as the
most profound technology which connects most of the digital devices with the help
of the internet" [7]. IoT encompasses connections between people, between people and things, and vice versa. Likewise, automated technology can access all information through the Internet and smart devices.
All people use computers or smartphones. By harnessing this modern and fast-changing electronics and telecommunications industry in education, we can enhance the performance of primary education. IoT is a new technology that uses various methods and modern technologies to improve performance and achieve higher accuracy.
Essentially, e-learning is one type of distance education. Under the legal framework for distance education, technologies are implemented through "case technology," in which teaching materials are clearly structured and appropriately collected in a special set (case), as well as through network and TV technologies in online and offline modes. In distance education, the subjects are students, pedagogical staff, and the organizations that implement training programs for additional education (schools, colleges).
As highlighted by Al Rawashdeh et al., IoT helps teachers and students to provide
access to course content in digital form and exchange knowledge, while increasing
the effectiveness of learning by expanding interaction between teachers and students
through online forums, knowledge sharing, and content sharing [8].
Each school chooses an Internet platform for the organization of the educational
process in a distance learning format with the appropriate and functional infrastruc-
ture and educational content based on the principal teachers’ requests [9]. These
platforms stimulate learning through various functions that include online course
development, assessment, and monitoring of activities for students and teachers.
In conformity with the Kazakhstan methodological recommendations on the orga-
nization of the educational process in secondary education organizations during the
period of restrictive measures related to preventing the spread of coronavirus infec-
tion, one exercise in a playful form was provided for students of grades 3 and 4 on
the subject of ICT in a distance format for one lesson [9]. Traditional educational methods have not fully met the needs of students in this digital era, including during the pandemic of 2020. At the same time, there was an urgent need to acquire new technologies and techniques that would meet students' needs. IoT is one of the best answers to this problem.
For example, during the pandemic in 2020, smartphones were used in Kazakhstan
schools. These means were the main instruments for distance learning in each grade.
As part of the monitoring of emergency distance learning in Kazakhstan schools, JSC "Information-Analytic Centre" conducted a survey attended by about 20 thousand students, parents, and teachers across the country. According to the survey results, in the IV term of the 2019–
2020 academic year, 92% of the students used their smartphones as a learning tool.
However, despite the convenience of connecting via a smartphone, this device is not
the most effective for learning [10].
Teachers chose the Zoom video service for video lessons, video conferences, and
online meetings. More advanced teachers used the Teams platform. Students and
teachers used WhatsApp messenger daily. The WhatsApp application allowed them to connect to a video conference without spending a large amount of traffic, whereas such difficulties arose when connecting through the Zoom application [10].
Besides that, the school program was fully digitalized. Nowadays, more than 12
thousand online lessons are available in two languages, Kazakh and Russian, on all
electronic devices.
“Online mektep” was the most visited online school Web site during the COVID-
19 (see Fig. 4) [11]. Almost the entire school curriculum is presented here: more than
24 thousand lessons under the standards of the Ministry of Education and Science
of the Republic of Kazakhstan. Many resources are available in Kazakh, Russian,
and English languages. What is more, the mobile application “OnlineMektep” is
available on gadgets.
On this Web site, teachers prepare homework and distribute it to all students.
Students can solve these tasks, and teachers can give grades. Based on the students’
grades and individual characteristics of students, the teacher can update their teaching
material or create new material. Teachers can prepare professional topics using smart
devices that will increase student attendance and involve students in classes. It can
be used to create an attractive presentation of educational programs that will help
students focus on the topic, better understand the topic, and memorize information
on the topic for a long time. Furthermore, it will encourage the learning process and
students to have a positive attitude toward learning. IoT will increase readability for
teachers, as well as reduce learning time.
As a result, the author believes that digital literacy has improved. In the book "Role of information and communication technology during COVID-19," Nidhi S. and Meetender hold the same opinion, stating that both students and teachers learned to study and use digital tools during COVID-19, which has allowed them to increase their level of digital literacy [12].
Fig. 4 Online school Web site with ICT materials and tasks for primary schoolchildren [11]
Survey participants recorded how their online learning was going and what problems arose. In total, data
were collected from 110 people (55 students of grades 8–10, 55 parents of primary
schoolchildren of grades 2–4) from 5 regions of the country, where the responses of
students from cities and villages were evenly presented. Among the most frequently
encountered problems, students and parents noted the low quality and speed of the
Internet, a heavy burden on teachers, but at the same time their low involvement in
the learning process, low level or lack of control, knowledge gaps, maintaining the
status quo in assessing students’ knowledge, incomplete educational resources and
applications, and other resulting problems [13].
The problems were not only with the Internet but also with programs and Web sites that were not fully finalized and had many shortcomings. In particular, the electronic diary Kundelik.kz unfortunately does not always open and is not filled in properly. This Web portal is responsible for setting and viewing grades and attendance marks, and for issuing and viewing homework assignments.
A domestic platform “Sphere” was developed. It has become a platform for
distance learning for schoolchildren in Almaty city. The educational content was
developed through the Bilim Media Group Company. Teachers were trained to
successfully conduct distance learning.
The software environment Opiq.kz makes it possible not only to distribute but also to create electronic textbooks. It offers students electronic textbooks for grades 1 to 11. Opiq
electronic textbooks are available on any device: desktop computer, laptop, tablet,
or smartphone. The interface adapts automatically. The cost of the license is 1300
tenge per month, for a set of 85 textbooks from grades 1 to 11.
Simultaneously, video lessons were shown on TV, which served as additional
material for teaching schoolchildren. Unfortunately, as noted in the Ministry of
Education, not all settlements of the country could receive a television picture of
educational channels. These channels were unavailable in more than 600 villages,
and more than 300 students live in them [14].
Below is one international example, from Arizona State University (ASU), USA, of how the university responded to the educational challenges presented by the COVID-19 pandemic:
ASU has ASU Prep Digital (ASUPD), an accredited online school for students and other users who want to take a single online course or learn full-time. ASUPD, constantly promoting the transition to blended learning with local schools, provided guidance and continued to support more than 10,000 learners.
As Carole G. Basile notes, “ASUPD was able to adapt the following initiative to
support schools during the disruptions caused by the pandemic. ASU immediately
responded by launching a reliable set of free online educational resources to support
the transition to distance learning for students and teachers at the national level. This
platform is called ASU for You and includes online classes for teaching students,
access to the main materials of the ASUPD course, a library of instructional videos
that help teachers and parents move to a new level, as well as full support for learning
in schools. The platform ensures consistency between schools and provides teachers
with tools that complement their current distance learning plans by providing metrics
and assessments to make informed decisions” [15].
This study contributes to the theory and practice of organizing education using
distance learning technologies. Currently, electronic textbooks (e-books) are already familiar to everyone in the education system of Kazakhstan. They have become reliable helpers to schoolchildren and students in their independent individual learning. Their effectiveness has been proven in many schools and universities in Kazakhstan. When using e-books, the quality of academic performance increases two to three times, while the duration of training is reduced; thus, one can
learn quickly and efficiently. Additionally, e-books are useful for distance learning;
this was especially noticeable during the pandemic in 2020 when all educational
institutions were forced to switch from the traditional education system to a full
distance learning form. Thus, we can single out Estonia’s experience in the digital-
ization of education. Over the past three years, the use of electronic textbooks in
Estonia has increased tenfold, and the educational literature industry has turned into
a real provider of digital services. There, the introduction of electronic books into
the educational process solved another important task—relieved children of heavy
school backpack. Paper textbooks are provided to students by the school. At the
end of the lessons, books remain on the shelves in school classrooms. Pupils study
at home using electronic textbooks [17]. Estonia has EdTech Opiq, an interactive
digital learning materials platform that replaces all old school textbooks [18].
Many universities have already realized this and have developed or are
developing their distance learning systems in one form or another, including
using e-books. Now, almost all universities in Kazakhstan are training personnel
using distance learning technologies: Web sites and portals have been opened;
special services have been created, and an educational and methodolog-
ical base is being developed. For example, the following university links confirm this: https://satbayev.university/en/second-education, https://polytechonline.kz, https://www.kaznu.kz/ru/17959/page/, https://www.keu.kz/ru/edu/distantsionnoe-obuchenie.html. The Moodle and Platonus systems are taken as the basis of
the portals, and their developments are also used. Portal sections allow for feedback
between the participants in the DET process. Integration between the training portals and the automated distance-learning information systems has been implemented.
In 2008, by order of the Ministry of Education and Science dated July 22, 2008, a list of basic educational organizations was determined for the priority direction of the development of higher and postgraduate education, namely the development of distance learning technologies: the Kazakh-Russian University and the Karaganda University of Kazpotrebsoyuz. However, this order was canceled in 2016 by the order of the
Minister of Education and Science [19]. Despite this, higher education institutions
with a special status can introduce new learning technologies, including distance
learning technologies [20].
Primacy in the introduction of distance learning technology in the Republic of Kazakhstan rightfully belongs to the Kazakh-Russian University (KRU), whose
educational activities since the very foundation of the university (1998–1999) have
been associated with the use of information and satellite educational technology
of the Modern Humanitarian Academy in Moscow [21]. This technology includes
dozens of licensed automated systems that allow users to fully and cost-effectively
deliver educational materials anywhere in the world, provide feedback, and monitor
student progress.
International scientific and practical conferences on distance learning technolo-
gies were held annually at KRU. In 2009, the Ministry of Education and Science confirmed the competence of the Institute of Advanced Training and Retraining (IATR) of Personnel of KRU to conduct short-term courses on distance learning technologies, with the right to issue certificates of the established sample. IATR
KRU has conducted advanced training for more than 2000 teachers and staff from
more than 30 national, state, and non-state educational organizations. One of the
lecturers of the advanced training courses on DET was the author of the present
article, who trained teachers on the development of e-books and EEMCD.
However, the educational portals and Web sites announced carried rather little information intended for the participants in the educational process. Most of the sites had a limited list of sections that could satisfy only a person who knew nothing about the university, i.e., they performed only the function of a business card. Such information was certainly necessary, but the fact that the bulk of universities limited themselves to this did not inspire much satisfaction.
The main task of a university Web site is to help its students master academic disciplines; to help them prepare for seminars, tests, and exams; and to contain information about the individual plan: the schedule of classes, tests, and exams, the topics of term papers and theses, and events at the university and faculty scale. The same applies to school Web sites.
There were various issues of intellectual property protection, development of high-
quality content, identification of the student’s identity, and other questions, which
should be reflected in the regulatory framework.
Along with this, schools use their developments or a single EDUPAGE platform.
The author together with colleagues from IATR KRU helped and advised univer-
sities and other educational organizations on the development of the interface of the
university portal on DET, digital textbooks, EEMCD, and introduced teachers and
university staff to the rules and approaches to the successful implementation of DET
in the educational process and the development of regulatory documents.
Several universities, after completing advanced training courses and using
distance educational technologies, have implemented international double-degree
programs for master's and Ph.D. doctoral studies.
5 Discussion
Vidakis and Charitakis state that the active development of information and commu-
nication technologies (ICT) and their application in everything and everywhere,
especially in the field of education, has prompted educational institutions around the
world to introduce new technologies into teaching and learning processes [22].
Gupta and Bansal claim “The educational institutions and students are now ready
to understand and accept the online teaching methodologies and approaches, digital-
ization brings with it. A number of new technologies and platforms have appeared”
[23].
Thus, learning management systems (LMSs) help educational institutions to provide and implement training programs. For a software product to be included in the category of learning management software systems, it must meet the following criteria (a schematic sketch is given after the list):
• allow courses and teaching materials to be contained in a centralized system
accessible to students for learning purposes;
• store reports on the training progress of individual participants and the perfor-
mance of training programs as a whole;
• allow customizing training programs by individual needs;
• provide opportunities for building plans and schedules and tracking the passage
of training courses and disciplines.
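As a toy illustration of these criteria, the sketch below models the minimal records such a system must keep: centrally stored course materials, per-student schedules, and retrievable progress reports. It is a schematic data model invented for this example, not the schema of any LMS discussed here.

```python
# Schematic LMS data model matching the four criteria above; illustrative only.
from dataclasses import dataclass, field

@dataclass
class Course:
    title: str
    materials: list           # centralized, student-accessible materials

@dataclass
class Enrollment:
    student: str
    course: Course
    deadlines: dict = field(default_factory=dict)  # task -> due day (schedule)
    grades: dict = field(default_factory=dict)     # task -> grade (progress)

    def progress_report(self) -> float:
        """Average grade over completed tasks; 0.0 if nothing completed yet."""
        return sum(self.grades.values()) / len(self.grades) if self.grades else 0.0

ict = Course("Digital literacy", ["lecture_1.pdf", "quiz_1"])
e = Enrollment("student_01", ict, deadlines={"quiz_1": 7}, grades={"quiz_1": 4.5})
print(e.progress_report())
```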
Increasingly, Kazakhstan educational institutions use such LMS tools as EDUPAGE, ONLINEMEKTEP.ORG, PLATONUS, and CANVAS in the educational process. Like the most popular LMS, for instance, MOODLE, they are used for the development, management, and distribution of educational online materials, with shared access for all interested stakeholders (teachers, students, the administration of the educational institution, parents). They also serve as a repository of educational materials, and students using DET have their own training plans with a schedule of classes. Responsible managers and teachers can likewise create and add new modules, subjects, and lessons to the LMS. They set deadlines for completing homework and manage assessment criteria and grades. These systems also allow storing reports on the progress of individual participants' learning.
Table 1 provides information about LMS tools used in Kazakhstan educational
organizations.
According to the study, the EduPage and Moodle systems are popular and frequently used in Kazakhstan educational organizations. At the same time, teachers and students also use the Platonus information system. However, not all tools are open source, as MOODLE is.
Each educational institution has the right to independently choose any educational
platform that meets the needs of teachers and students.
All the mentioned tools allow teachers and students to download and work with e-
books that motivate students for creative work. This is an environment that is familiar
to a child: Gadgets and computer technologies make school classes interesting and
encourage new ideas to be generated.
In addition, together with the practical issues and fundamental requirements for organizing the educational process in the Republic of Kazakhstan using DET, this study can help schools and universities in solving a future task: verifying student identity in distance learning. This may be done by recognizing a student's face using a Webcam.
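A minimal sketch of such webcam-based identity verification is shown below, assuming the open-source face_recognition Python library; it illustrates the general approach only and is not part of any system deployed in Kazakhstan. In practice, such a check would run at authorization time, against a photo captured during registration.

```python
# Sketch: verifying a student's identity from a webcam frame against an
# enrolled photo, using the open-source face_recognition library.
import face_recognition

def verify_student(enrolled_photo: str, webcam_frame: str,
                   tolerance: float = 0.6) -> bool:
    """True if the face in the webcam frame matches the enrolled photo."""
    known = face_recognition.face_encodings(
        face_recognition.load_image_file(enrolled_photo))
    seen = face_recognition.face_encodings(
        face_recognition.load_image_file(webcam_frame))
    if not known or not seen:
        return False  # no face detected in one of the images
    match = face_recognition.compare_faces([known[0]], seen[0],
                                           tolerance=tolerance)
    return bool(match[0])
```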
6 Conclusion
Based on the above, it can be stated that educational organizations that desire to use distance learning technologies in their educational process must adhere to the regu-
latory framework for distance education and follow all the rules and requirements
described in this paper.
At the same time, the government of Kazakhstan is working to update the subject
of “ICT” in schools, the introduction of programming lessons in primary grades, and
References
1. Instructional and Methodical Letter (2021) About the features of the educational process in
secondary education organizations of the Republic of Kazakhstan in the 2021–2022 academic
year, official website of the Y. Altynsarin National Academy of Education. http://uba.edu.kz/
storage/app/media/IMP/IMP_2021-2022_kaz.pdf. Last accessed 27 July 2022
2. On amendments to the Order of the Minister of Education and Science of the Republic of
Kazakhstan dated March 20, 2015 No. 137 “On approval of the Rules for the organization of
the educational process on distance learning technologies”—Information and legal system of
regulatory legal acts of the Republic of Kazakhstan "Adilet". https://adilet.zan.kz/rus/docs/
V2100025038
3. International Association of Universities. COVID-19: Higher Education challenges and
responses, https://www.iau-aiu.net/COVID-19-Higher-Education-challenges-and-responses.
Last accessed 21 July 2022
4. Distance education from the oldest domestic university. https://www.keu.kz/ru/edu/distantsi
onnoe-obuchenie.html
5. Humanitarian College of Astana International University. https://agk.edupage.org/user/?
6. Al-araibi, AAM, Mahrin MNb, Yusoff RCM (2019) Technological aspect factors of E-learning
readiness in higher education institutions: Delphi technique. Educ Inf Technol 24:567–590
7. Dipak K, Kavita O, Poornima N (2018) IOT in primary education. In: International interdis-
ciplinary conference on curriculum reforms in higher education: global scenario (IICCRHE-
2018). Shivaji University Kolhapur
8. Al Rawashdeh AZ et al (2021) Advantages and disadvantages of using e-learning in university
education: analyzing students’ perspectives. Electron J e-Learn 19:107–117
9. Methodological recommendations on the
organization of the educational process in secondary education organizations during the period
of restrictive measures related to preventing the spread of coronavirus infection. The order of
the Minister of Education and Science of Kazakhstan, № 548 (2020). https://www.gov.kz/mem
leket/entities/kdso/documents/details/63523?lang=ru
10. The National report on the state and development of the education system of the Republic
of Kazakhstan as of 2020 (2021) naczionalnyj-doklad-po-itogam-2020_kaz.pdf (iac.kz), pp
201–207
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 603
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_46
of information retrieval [2]. Additionally, the QA has been utilized to create dialogue
systems and chatbots designed to simulate human conversation. There are two main
procedures for processing questions. The first step is to examine the structure of the
user’s query. The second step is to convert the question into a meaningful question
formula that is compatible with the domain of QA [3]. The majority of modern NLP
problems revolve around unstructured data. This entails extracting the data from the
JSON file, processing it, and then using it as needed. An implementation approach
categorizes the task of extracting answers from questions into one of four types:
1. IR-QA (Information retrieval based)
2. NLP-QA (Natural language processing based)
3. KB-QA (Knowledge based)
4. Hybrid QA.
2 General Architecture
The architecture of a question answering system is as follows: the user asks a question, and this query is then used to extract all possible answers from the context. The architecture is depicted in Fig. 1. The overall function of the question processing module, given a question as input, is to process and analyze the question so that the machine can understand its context.
After giving the question as an input, the next big task is to parse the entire context
passage to find the appropriate answer locations. The related results that satisfy the
given queries are collected in this stage in accordance with the rules and keywords.
The similarity is checked after the document processing stage to display the related
answer. Once an answer key has been identified, a set of heuristics is applied to it
in order to extract and display only the relevant word or phrase that answers the
question.
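As a concrete counterpart to this pipeline, the sketch below wires the stages together with the Hugging Face transformers question-answering pipeline: a question and a context passage go in, and an extracted answer span with a confidence score comes out. The checkpoint name is a common public SQuAD-tuned model, chosen here purely as an example.

```python
# Minimal extractive QA sketch mirroring the architecture above.
from transformers import pipeline

# A public SQuAD-tuned checkpoint, used only as an example model.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = ("The LUNAR system was created in 1971 to aid lunar geologists in "
           "evaluating the chemical composition of lunar rock and soil.")
result = qa(question="When was the LUNAR system created?", context=context)
print(result["answer"], result["score"])  # expected span: '1971'
```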
3 Background
“Can digital computers think?” was written by Alan Turing in 1951. He asserted that
a machine could be said to be thinking if it could participate in a conversation using a
teleprinter and imitate a human completely, without any telltale differences. In 1952,
the Hodgkin–Huxley model [5] showed how the brain creates a system that resembles
an electrical network using neurons. According to Hans Peter Luhn [6], “the weight
of a term that appears in a document is simply proportional to the frequency of
the term”. Artificial intelligence (AI), natural language processing (NLP), and their
applications have all been influenced by these events. The BASEBALL program,
created in 1961 by Green et al. [7] for answering questions about baseball games
played in the American league over the course of a season, is the most well-known
early question answering system. The LUNAR system [8], created in 1971 to aid lunar
geologists in easily accessing, comparing, and evaluating the chemical composition
of lunar rock and soil during the Apollo Moon mission, is the most well-known
piece of work in this field. Many earlier models, including SYNTHEX, LIFER, and PLANES [9], also attempted question answering. Figure 2 depicts the stages of evolution of the NLP models.
4 Benchmarks in NLP
Benchmarks are standard task sets, agreed upon by a large community, used for assessing the performance of different systems or models. To ensure that results are widely accepted, researchers evaluate against multiple standard benchmarks. Some of the most renowned benchmarks in wide use are GLUE, SuperGLUE, SQuAD1.1, and SQuAD2.0.
4.2 SQuAD2.0
SQuAD2.0, the Stanford Question Answering Dataset, combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially to look similar to answerable ones. SQuAD2.0 therefore tests the ability of a system not only to answer questions when possible, but also to determine when no answer is supported by the passage. Currently, the IE-NET (ensemble) by RICOH_SRCB_DML leads the leaderboard with an EM score of 90.93 and an F1 score of 93.21.
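The EM and F1 figures quoted above can be reproduced in spirit with the following sketch of SQuAD-style metrics: exact match compares normalized strings, while F1 is computed over overlapping tokens of the prediction and the gold answer. The normalization here simplifies the official evaluation script slightly.

```python
# Simplified SQuAD-style metrics: exact match (EM) and token-overlap F1.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> int:
    return int(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_toks), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the 1971 system", "1971"), f1_score("in 1971", "1971"))
```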
5 Research
In this systematic literature review (SLR), we address the various steps based on the guidelines provided by Okoli and Schabram [11] and Keele [12], which emphasize: the purpose of the literature review, searching the literature, the practical screen, quality appraisal, and data extraction. The amount of written digital information has increased exponentially, necessitating the use of increasingly sophisticated search tools (Pinto et al. [13]; Bhoir and Potey [14]). Unstructured data is being gathered and stored at previously unheard-of rates, and its volume keeps growing (Bakshi et al. [15]; Malik et al. [16]; Chali et al. [17], among others). The main difficulty is creating a model that can effectively extract data and knowledge for various tasks. The tendency of question answering systems in this situation is to glean as many answers from the questions as possible. This SLR is guided by the research questions in Table 1 in an effort to understand how question answering techniques, tools, algorithms, and systems work and perform, as well as their dependability in carrying out tasks.
We gathered journals and papers written in English from various digital libraries and reputed publications using a range of keywords and tried to provide strong evidence for the research questions tabulated earlier.
RQ_1: Fig. 3 shows the popularity of various models based on the number of papers published in each category per year. Here, we can observe that BERT-based models are the most popular in this category.
(Fig. 3 compares yearly publication counts for the RoBERTa, GPT-2, BERT, T5, and XL-Net model families.)
RQ_2: Fig. 4 shows the various question answering fields in which QA models are used. We can see that general-domain QA dominates.
RQ_3: Fine-tuning has given rise to various improvements over the existing models. Moreover, applying different techniques to an existing model can produce a new model that improves upon it. For example, different BERT-based models such as ALBERT, RoBERTa, and DistilBERT, with different parameters, are used according to need, as shown in Table 2.
(Fig. 4 domain shares: 88% general-domain QA, with the remaining fields at 6%, 2%, 2%, and 1%.)
Table 2 Different applications using different models
Tasks               BERT   T5   GPT-2
Language modeling      4    3       3
Text generation        1    3       3
Question answering     7    4       3
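In practice, swapping between these BERT-family variants is often a one-line change. The sketch below uses the Hugging Face transformers AutoModel API with common public checkpoint names (an assumption for illustration, not the exact setup of the surveyed papers) to load interchangeable encoders for question answering.

```python
# Sketch: interchangeable BERT-family encoders behind one QA interface.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Public example checkpoints; any compatible variant can be substituted.
for name in ["bert-base-uncased", "albert-base-v2",
             "roberta-base", "distilbert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForQuestionAnswering.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```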
RQ_4: This is the main purpose of the literature review. This question is answered with the support of Table 3. Many papers were taken into consideration for this comparison [8, 18–38]. Here, we took only three models, as these are the main base models that predominate in the question answering domain.
6 Conclusion
References
1. Abdi A, Idris N, Ahmad Z (2018) QAPD: an ontology-based question answering system in the
physics domain. Soft Comput 22(1):213–230
2. Cao YG, Cimino JJ, Ely J, Yu H (2010) Automatically extracting information needs from
complex clinical questions. J Biomed Inform 43(6):962–971
3. Joulin A, Grave E, Bojanowski P, Mikolov T (2016) Bag of tricks for efficient text classification.
arXiv preprint arXiv:1607.01759
4. Allam AMN, Haggag MH (2012) The question answering systems: a survey. Int J Res Rev Inf
Sci (IJRRIS) 2(3)
5. Hamed SK, Ab Aziz MJ (2016) A question answering system on Holy Quran translation based
on question expansion Technique and neural network classification. J Comput Sci 12(3):169–
177
6. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation
of machine translation. In: Proceedings of the 40th annual meeting of the Association for
Computational Linguistics, pp 311–318
7. Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
8. Liang T, Jiang Y, Xia C, Zhao Z, Yin Y, Yu PS (2022) Multifaceted improvements for
conversational open-domain question answering. arXiv preprint arXiv:2204.00266
9. Sun Y, Wang S, Li Y, Feng S, Tian H, Wu H, Wang H (2020) Ernie 2.0: a continual pre-training
framework for language understanding. In: Proceedings of the AAAI conference on artificial
intelligence, vol 34, no 05, pp 8968–8975
10. Hogan A, Blomqvist E, Cochez M, d’Amato C, Melo GD, Gutierrez C et al (2021) Knowledge
graphs. Synthesis Lectures on Data, Semantics, and Knowledge 12(2):1–257
11. Okoli C, Schabram K (2010) A guide to conducting a systematic literature review of information
systems research
12. So D, Mańke W, Liu H, Dai Z, Shazeer N, Le QV (2021) Searching for efficient transformers
for language modeling. Adv Neural Inf Process Syst 34:6010–6022
13. Turing AM (1951) Can digital computers think? The Turing test: verbal behavior as the hallmark
of intelligence, pp 111–116
14. Bhoir V, Potey MA (2014) Question answering system: a heuristic approach. In: The fifth inter-
national conference on the applications of digital information and web technologies (ICADIWT
2014). IEEE, pp 165–170
15. Bakshi K (2012) Considerations for big data: architecture and approach. In: 2012 IEEE
aerospace conference. IEEE, pp 1–7
16. Malik N, Sharan A, Biswas P (2013) Domain knowledge enriched framework for restricted
domain question answering system. In: 2013 IEEE international conference on computational
intelligence and computing research. IEEE, pp 1–7
17. Chali Y, Hasan SA, Joty SR (2011) Improving graph-based random walks for complex question
answering using syntactic, shallow semantic and extended string subsequence kernels. Inf
Process Manage 47(6):843–855
18. Yao X (2014) Feature-driven question answering with natural language alignment. Doctoral
dissertation, Johns Hopkins University
19. Zhang J, Zhang H, Xia C, Sun L (2020) Graph-BERT: only attention is needed for learning
graph representations. arXiv preprint arXiv:2001.05140
20. Zhang X, Hao Y, Zhu XY, Li M (2008) New information distance measure and its application
in question answering system. J Comput Sci Technol 23(4):557–572
21. Mozafari J, Fatemi A, Nematbakhsh MA (2019) BAS: an answer selection method using BERT
language model. arXiv preprint arXiv:1911.01528
22. Sun C, Qiu X, Xu Y, Huang X (2019) How to fine-tune BERT for text classification? In: China
national conference on Chinese computational linguistics. Springer, Cham, pp 194–206
23. Wang A, Cho K (2019) BERT has a mouth, and it must speak: BERT as a Markov random field
language model. arXiv preprint arXiv:1902.04094
24. Wang Z, Ng P, Ma X, Nallapati R, Xiang B (2019) Multi-passage BERT: A globally normalized
BERT model for open-domain question answering. arXiv preprint arXiv:1908.08167
25. Yang W, Xie Y, Lin A, Li X, Tan L, Xiong K, Li M, Lin J (2019) End-to-end open-domain
question answering with BERTserini. arXiv preprint arXiv:1902.01718
26. Kale M, Rastogi A (2020) Text-to-text pre-training for data-to-text tasks. arXiv preprint arXiv:
2005.10433
27. Lin BY, Zhou W, Shen M, Zhou P, Bhagavatula C, Choi Y, Ren X (2019). CommonGen: a
constrained text generation challenge for generative commonsense reasoning. arXiv preprint
arXiv:1911.03705
28. Ribeiro LF, Schmitt M, Schütze H, Gurevych I (2020) Investigating pretrained language models
for graph-to-text generation. arXiv preprint arXiv:2007.08426
29. Agarwal O, Kale M, Ge H, Shakeri S, Al-Rfou R (2020). Machine translation aided bilingual
data-to-text generation and semantic parsing. In: Proceedings of the 3rd international workshop
on natural language generation from the semantic web (WebNLG+), pp 125–130
30. Moorkens J, Toral A, Castilho S, Way A (2018) Translators’ perceptions of literary post-editing
using statistical and neural machine translation. Translation Spaces 7(2):240–262
31. Ethayarajh K (2019) How contextual are contextualized word representations? Comparing the
geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512
32. Frydenlund A, Singh G, Rudzicz F (2022) Language modelling via learning to rank. In:
Proceedings of the AAAI conference on artificial intelligence, vol 36, no 10, pp 10636–10644
33. Mager M, Astudillo RF, Naseem T, Sultan MA, Lee YS, Florian R, Roukos S (2020) GPT-too: a
language-model-first approach for AMR-to-text generation. arXiv preprint arXiv:2005.09123
34. Qu Y, Liu P, Song W, Liu L, Cheng M (2020) A text generation and prediction system: pre-
training on new corpora using BERT and GPT-2. In: 2020 IEEE 10th international conference
on electronics information and emergency communication (ICEIEC). IEEE, pp 323–326
35. Puri R, Spring R, Patwary M, Shoeybi M, Catanzaro B (2020) Training question answering
models from synthetic data. arXiv preprint arXiv:2002.09599
36. Wang A, Singh A, Michael J, Hill F, Levy O, Bowman SR (2018) GLUE: a multi-task bench-
mark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.
07461
37. Wang A, Pruksachatkun Y, Nangia N, Singh A, Michael J, Hill F, Levy O, Bowman S
(2019) Superglue: a stickier benchmark for general-purpose language understanding systems.
Advances in neural information processing systems, 32
38. Hsu HH, Huang NF (2022) Xiao-Shih: a self-enriched question answering bot with machine
learning on Chinese-based MOOCs. IEEE Trans Learn Technol
Monitoring False Colour in an External
Monitor for Film and Television
Production: An Evaluative Study
S. Pattanayak (B)
Amity School of Communication, Amity University Rajasthan, Jaipur, India
e-mail: Sambhram.pattanayak@gmail.com
M. M. Bishnoi
Department of Humanities Arts and Applied Science, Amity University Dubai, Dubai, UAE
e-mail: mbishnoi@amityuniversity.ae
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 613
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_47
1 Introduction
One of the most common reasons filmmakers choose external monitors over a camera's built-in screen is the small size of that screen. A production monitor is a larger version of a television set. It is better suited to accurately reading camera signals, offering fast processing, precise linearization of an input signal using the optical-electrical transfer function, and faithful colour reproduction. Production
monitors are commonly found in production trucks, studios, shooting sets, and other
permanent control room sites. Like an on-camera monitor, a production monitor
has monitoring tools like waveform, vectorscope, false colour, and on-screen infor-
mation displays. Some also have a variety of waveforms, scopes, and histograms,
ensuring that the user has all the information they need to provide proper lighting
and exposure. Video-centric features such as focus peaking and zebras are becoming
more common, but some cameras still lack them. Only external monitors offer these
features.
Exposure is the amount of light that reaches the camera's sensor, forming visual information over time. Correct exposure is subjective, depending primarily on the specific visual narrative. Aperture, shutter speed, and ISO triangulate the exposure and must be adjusted together to capture correctly exposed images. Under false colour, the external monitor takes a shot's exposure values, ranging from 0 to 100, and renders them as distinct colours for easy viewing. Each pixel on the monitor thus carries an exposure-value colour, an essential weapon in a filmmaker's
arsenal. It is used to convey mood and emotion, inform the viewer about the context
of a scene, or provide information about characters and settings. A colour component transfer function, commonly referred to as the 'gamma', together with the chromaticity of a white point, defines a video colour space. RGB chromaticities describe a colour gamut. These numbers specify how colour data is encoded for a specific video standard. Watching these crucial colours on a large professional monitor gives accurate colour fidelity.
Earlier research has primarily focused on HDR uniform colour spaces, colour transfer, and colour correction methods. The current research problematizes the absence of appropriate exposure and examines accurate exposure reading through the false colour technique.
2 Related Work
Since the year 2000, colour transfer has been actively researched. Reinhard et al.
performed colour transfer technology that aligns the mean and standard deviation of the colour distributions of the source and reference images [1]. The most common format for HDR data is linear RGB, whose channels are highly correlated. RGB pixel values are therefore frequently converted to luma-chroma colour spaces such as YCbCr or Yuv, where 'u' and 'v' denote uniform chromaticity scales, to reduce the influence of pixel manipulation on one channel affecting the others [2].
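As one concrete instance of such a decorrelating transform, the sketch below applies the BT.709 RGB-to-YCbCr conversion (in its analog, normalized form) to split luma from the two chroma channels. The luma weights are the published BT.709 coefficients; the code itself is only an illustration.

```python
# BT.709 RGB -> YCbCr (analog form): luma Y in [0, 1], chroma in [-0.5, 0.5].
# Separating luma from chroma reduces cross-channel effects of pixel edits.
def rgb_to_ycbcr709(r: float, g: float, b: float):
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # BT.709 luma weights
    cb = (b - y) / 1.8556                      # = (B - Y) / (2 * (1 - 0.0722))
    cr = (r - y) / 1.5748                      # = (R - Y) / (2 * (1 - 0.2126))
    return y, cb, cr

print(rgb_to_ycbcr709(1.0, 0.0, 0.0))  # pure red: Y ~= 0.21, Cr = 0.5
```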
Due to advances in multi-spectral imaging techniques, many innovative multi-spectral recording devices have been developed in recent years. Zhenghao et al. addressed colour bias and multicollinearity in RGBN camera colour correction. Although their images suffer from colour desaturation, red-green-blue and near-infrared (RGBN) cameras can capture visible and NIR information simultaneously. The image colour bias becomes apparent when photographing outdoor scenes with high NIR illumination, and the ordinary least squares regression (OLSR) colour-correction result is inadequate due to the multicollinearity among RGBN camera channels [3].
Because of the time savings, modern television stories may cover much more ground than movies. By positioning each camera at a different angle and shooting the scene once, multiple cameras reduce the need for several takes of the same scene. This covers a large part of a screenplay in a considerably shorter span of time than a single-camera production. Inter-camera colour consistency is an issue in multi-camera production. Chunqiu et al. developed a hybrid histogram matching (HHM) algorithm to address this issue. It utilizes the cumulative colour histogram, primarily involving global colour mapping and local colour straightening, to achieve uniform colour presentation among all the cameras [4].
3 Methodology
This is hands-on experimental research. A video camera and two Fresnel lamps of
800 watts each were used as the media tool. This experimental study deploys a ‘false
colour’ technique to primarily determine and visualize wavelengths that the human
eye cannot see. The camera was adjusted at various exposure levels to get the perfect
image in a stable lighting condition (see Figs. 1 and 2). With the help of the false
colour and its IRE values, the monitor shows that 0 is clipped out black and 100 is
white. The purple colour (IRE 0) denotes the absence of all colours or black. The
colours blue and blue variations (IRE 2–24) imply that it is pretty dark not much
exposure to light is available. The colour dark grey (IRE 24–42) implies that light is
leaving (or entering) a dimly lit place. The experiment’s findings reveal that a 60–70
IRE value is an ideal choice for the shot.
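A rough software analogue of the monitor's false colour mode is sketched below: each pixel's IRE value is binned into a band and replaced by an indicator colour. The band edges for black, dark blue, dark grey, and the ideal 60–70 zone follow the text; the remaining bands and all RGB indicator values are assumptions, since real monitors use manufacturer-specific palettes.

```python
# Sketch of a false-colour pass over an IRE (0-100) luma image.
import numpy as np

BANDS = [                    # (upper IRE bound, indicator RGB)
    (0,   (128, 0, 128)),    # purple: clipped black (IRE 0, per the text)
    (24,  (0, 0, 255)),      # blue: quite dark (IRE 2-24, per the text)
    (42,  (96, 96, 96)),     # dark grey: dimly lit (IRE 24-42, per the text)
    (60,  (160, 160, 160)),  # mid grey: assumed intermediate band
    (70,  (0, 200, 0)),      # green: ideal exposure (IRE 60-70, per the text)
    (100, (255, 0, 0)),      # red: assumed overexposure warning band
]

def false_colour(ire: np.ndarray) -> np.ndarray:
    """Map an HxW array of IRE values to an HxWx3 false-colour image."""
    out = np.zeros(ire.shape + (3,), dtype=np.uint8)
    lower = -1.0
    for upper, rgb in BANDS:
        out[(ire > lower) & (ire <= upper)] = rgb
        lower = upper
    return out

frame = np.random.uniform(0, 100, size=(4, 4))
print(false_colour(frame)[0, 0])  # indicator colour of the first pixel
```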
The experiment deployed various types of exposure adjustment, such as f-stop, shutter speed, and gain, as the primary means of obtaining the correct brightness.
Fig. 2 Pre-requisite methods
Similarly, the incoming lighting was attuned in intensity and quality. Illuminance, expressed in footcandles or lux, is used to measure the amount of light present in a given area.
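The relationship between these controls can be summarized by the textbook exposure-value relation, sketched below together with the footcandle-to-lux conversion; the formulas are standard definitions, not readings from the study's own instruments.

```python
# Textbook exposure-value relation for f-stop, shutter speed, and ISO/gain,
# plus the footcandle-to-lux conversion (1 fc = 10.7639 lx).
import math

def exposure_value(f_number: float, shutter_s: float, iso: float = 100) -> float:
    """EV referenced to ISO 100: EV = log2(N^2 / t) - log2(ISO / 100)."""
    return math.log2(f_number ** 2 / shutter_s) - math.log2(iso / 100)

def lux_from_footcandles(fc: float) -> float:
    return fc * 10.7639

print(round(exposure_value(4.0, 1 / 50), 2))  # f/4 at 1/50 s -> EV ~ 9.64
print(round(lux_from_footcandles(100), 1))    # 100 fc -> 1076.4 lx
```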
4 Colour Balance
The media industry is constantly improving image quality with higher frame rates,
larger resolution, brighter colours, and better contrast to enhance the entire watching
experience for both cinema and television. Due to the chromatic adaptation process,
the human visual system perceives display colours differently depending on the envi-
ronment. The trichromatic human visual system, which has three cone sensors, is
used to develop digital cameras. If a camera wants to record colours like a human
vision, the sensor fundamentals should possibly be similar [5]. Chromatic adapta-
tion is the ability of the human visual system to preserve perceived colour appear-
ance despite variations in chromaticity reflected from an object under a wide range
of lighting conditions [6]. Different colour temperature cards are used to modify
the colour temperatures of photos when utilizing various filming equipment. Low
colour temperatures produce warm, reddish images, while high colour temperatures
produce cool, bluish images [7]. Both resolution and colour reproduction determine a
colour display’s visual quality. Because the brightness and contrast settings determine
the best achievable image quality, it is essential to understand how different
brightness and contrast levels affect image quality. Emerging display technologies can
create pictures with
a significantly broader colour gamut than traditional cinema and television distribu-
tion gamuts, allowing for the creation of gamut extension algorithms (GEAs) to fully
utilize the colour potential of these new systems. Colour refers to all the weighted
combinations of spectral wavelengths, expressed in nanometres (nm), emitted by
the sun and visible to the human eye (see Fig. 3). Through adaptation, the human eye
can perceive a dynamic range of over 14 orders of magnitude (i.e. the difference in
powers of ten between the highest and lowest luminance values) in the real world
[8].
Accurate colour reproduction requires chromaticity consistency, a wide colour
gamut, and high brightness [9]. Colour reproduction is determined by the white
point brightness level, repeatable colour gamut, and constant channel chromaticity.
Most present displays are based on the trichromacy property of human vision, which
creates colours by combining three carefully chosen red, green, and blue primaries in
various proportions (see Fig. 4). The chromaticities of these primaries, which form a
triangle in the CIE xy chromaticity diagram, determine the display’s colour gamut.
Colour gamut refers to the range of colours that can be reproduced on an output
device within a colour spectrum or colour space. Every screen will display different
amounts of colour depending on how broad the gamut is.
There will be many colours that humans can perceive but that the display cannot
produce for any given three primary colours, i.e. all colours with chromaticities
outside the triangle associated with the display [10]. A wide colour gamut (WCG) is
included with HDR, bringing an even more comprehensive range of colours to the
table. Whereas a high dynamic range improves a picture’s dynamic range (with
brighter brights and darker darks), a wide colour gamut enhances the quality of colour
reproduction on the screen (with redder reds, greener greens, bluer blues, and so on).
A majority of imaging systems can only collect, process, and display LDR content
with the ITU-R BT.709 colour gamut, which is the standard for high-definition
television image encoding and signal characteristics.
HDR technology strives to capture, distribute, and exhibit various brightness and
colour values that closely match what the human eye can see. HDR increases the
contrast range between the darker and brighter sections of an image on the screen
(the difference between the brightest whites and the darkest blacks that a television
can display is known as contrast). While standard dynamic range (SDR) capturing
and display mechanisms can only deal with 2–3 orders of magnitude of brightness,
HDR has the potential to closely mimic the human visual system by increasing
this range to 5 orders of magnitude and reproducing colour values that are
comparable to what the human eye can perceive [8]. Though wide colour gamut
(WCG) seeks to improve the amount of visible colour that can be portrayed, HDR
aims to express a whole range of perceptible features from shadows to highlights with
sufficiently distinct tonal levels to avoid visual information loss.
WCG and high dynamic range (HDR) requirements should be applied to video
monitors for vivid and realistic displays. However, the WCG’s low transmittance rate
and the HDR’s high expenses for peak brightness capability are essential roadblocks
[11]. Pictures that were tone-mapped to standard screens are re-rendered for high
dynamic range (HDR) displays. Because these new HDR displays have a far more
comprehensive dynamic range than regular monitors, a picture generated for standard
monitors may likely appear overly bright when viewed on an HDR panel [12].
Different manufacturers’ cameras use different interpolation algorithms to
recreate the missing colour values. As a result, some systems use colour attributes to
identify the camera model [13]. HDR acquisition has not been a problem. Camera
manufacturers like Arri, Red, and Sony have recently released digital cinema cameras
that can capture 14–16 stops of dynamic range. True HDR material, on the other
hand, may capture, store, and process more than 16 stops of scene dynamic range
with either ITU-R BT.709 or ITU-R BT.2020 [14]. The PQ-EOTF is used to build and
save HDR master pictures. This guarantees that the HDR movie master makes better
use of the dynamic range initially captured. Dolby algorithms can deliver a better user
experience regardless of where the video is viewed by adapting to the display’s capa-
bilities. Algorithms ensure that the material maintains its original creative purpose
[15]. Users can enjoy a better viewing experience with HDR images because they
provide the full dynamic range that the human visual system (HVS) can perceive
at any level of adaptation. HDR photos have been effectively used to explore the
viewing experience of presented images in the past [16].
Using an instrument to assess exposure is critical, especially when the user wants
to double-check and ensure that the monitor displays the right image. The brain
receives information about luminance and chrominance. Visible light, which the
eye senses from a scene, consists of variable ratios of the three primary colours red,
green, and blue (RGB). In terms of television, the colour white is composed primarily
of red (30%), green (59%), and blue (11%) signals. False colour is much like
waveforms and histograms, in that the user can check the exposure levels of any
image with it. Histograms convey broad exposure through the shape of their
‘mountains’, and waveforms show the RGB values across the horizontal axis of the
frame, whether as a single trace, a vectorscope, or an RGB parade. False colour
evaluates actual brightness values from sampling data on a coloured scale, indicating
IREs ranging from 0 to 100%. False colour is beneficial because it displays each image
section’s Institute of Radio Engineers (IRE) value, allowing users to identify where
under/overexposure occurs. It denotes a video stream’s overall energy (in mV) by
measuring an amplified electronic signal.
This is a challenging scenario for a camera’s dynamic range. For instance, in a
scene with a bright summer sky in the background with a character in the shade, the
cinematographer must ensure that the background sky does not burn out and that the
subject in the shade receives adequate exposure to expose the image appropriately.
As a result, using false colours on the monitor, the sky may appear hot red. This
indicates that the highlights are being clipped. Alternatively, if the cinematographer
lowers the exposure to compensate for the sky, the subject may now be pushed into
the purple or underexposed zone. Hence, the cinematographer may use fill lights for
the character to achieve a balance.
This is commonly used in image processing to represent otherwise invisible infor-
mation. It is a well-recognized technique to show photographs in varied colour
schemes to highlight specific aspects. Purple, blue, black, grey, yellow, orange, and
red are just a few of the hues used in these images. These colours can assist the
operator in determining the amount and quality of exposure an image receives. For
example, high dynamic range (HDR) photographs are frequently represented in false
colour to depict the wide range of luminance in the photographed scene [17]. Mostly,
false colour options are available on an external monitor attached to the camera. Any
digital cinema camera’s raw camera files give images encoded at high colour bit
depth with a native colour gamut that considerably surpasses DCI-specification stan-
dards, and in most cases, meets or exceeds Rec. 2020. The spectrum of a waveform
monitor displays the same type of sampling data.
On a waveform monitor, 0% indicates that no light is transformed into an electrical
signal [18]. The sensor’s highest energy state is represented by 100. This is where
the majority of the light from the sensor is turned into image data. The false colour
matrix applies a range of reference colours to the video image (See Fig. 5) to ‘fill in’
parts of the image at various IREs (illustrated in Table 1). There are usually six or
more colours representing the range of brightness across the entire video frame.
The false colour exposure check measures the camera’s image, tints specific signal
levels a particular colour, and displays the remaining as a black and white image.
A true colour image combines accurate red, green, and blue light measurements.
Although at least one nonvisible wavelength is used in a false colour image, that
band is nevertheless represented in red, green, or blue. As a result, the final image’s
colours may differ from what we expect.
Table 1 IRE levels and the ranges used by the false colour matrix

IRE level   IRE range
100         93 to 100
90          84 to 93
80          77 to 84
70          58 to 77
60          54 to 58
50          47 to 54
40          43 to 47
30          24 to 43
20          15 to 24
10          8 to 15
0           2 to 8
−10         −7 to 2
The inherent linkages between true colour and greyscale videos relating to the same
individual exist, even though the two formats are diverse [19]. White indicates
sections of the scene that are 6 or more stops overexposed. Normal-neutral is 18% grey.
Black is 6 or more stops underexposed. The primary distinction between a true colour
image and a greyscale image is that a true colour pixel may be considered a vector with
three components (red, green, and blue), while a greyscale pixel can only be considered
a scalar grey level [20].
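As an illustration of how a false colour key such as Table 1 can be applied programmatically, the sketch below maps an IRE reading to a colour band. The band boundaries follow Table 1, but the colour names are illustrative assumptions; actual assignments vary by monitor manufacturer.

```python
# Band boundaries follow Table 1; the colour names are illustrative,
# since real false-colour keys differ between monitor manufacturers.
FALSE_COLOUR_BANDS = [
    (-7, 2, "purple"),      # crushed blacks
    (2, 24, "blue"),        # deep shadows
    (24, 43, "dark grey"),
    (43, 58, "green"),      # mid-tones
    (58, 77, "yellow"),     # contains the ideal 60-70 IRE zone
    (77, 93, "orange"),     # approaching clipping
    (93, 101, "red"),       # clipped whites
]

def false_colour(ire: float) -> str:
    """Return the false-colour band for a given IRE reading."""
    for low, high, colour in FALSE_COLOUR_BANDS:
        if low <= ire < high:
            return colour
    return "out of range"

print(false_colour(65))  # 'yellow': ideal exposure per the experiment
print(false_colour(98))  # 'red': clipped highlight
```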
The monitor’s resolution refers to how sharp a screen can be. It also relates to the
size of the screen. Because pixel density makes the image appear sharper on a small
screen, a lower resolution can be used there; higher resolutions are required for a
larger screen. Colours are simply one facet of an image, but they usually determine
how the image looks and give a camera its identity.
Viewing the picture clearly on an inbuilt camera monitor while shooting in direct
sunlight is a difficult chore for a camera operator. Many external displays have a
2200-nit brightness level, which is twice as bright as ordinary monitors’ 1000-nit
brightness. When combined with its maximum brightness capacity, an external monitor’s
high pixel density, high-quality resolution, and 1200:1 contrast ratio prevent the
image from being washed out by ultra-bright sunlight.
Rather than using just any inspection monitor on set to judge the effectiveness of a
video image, false colour employs the pixel data straight from the camera sensor to
establish the integrity of the values existing in the video file. It displays them on
a colourful reference scale that can be read using a colour-coded key provided by
the camera or monitor manufacturer. Another compelling reason to invest in an on-
camera monitor is brightness, as built-in displays frequently fall short of this
criterion, or their high-brightness options quickly deplete battery life.
The limited literature and the few research materials available in the past left room
for further researchers. For this exploratory research, a Panasonic video camera was
used at various exposure values (gain settings of 0 dB, 12 dB, and 24 dB), and two
Fresnel lamps of 800 watts each were used for the experiment. The key light was placed
10 m from the subject, and the backlight was 13 m away in flood mode. False colour was
monitored on a SWIT external monitor. Each exposure stop is represented by a colour:
blue, cyan, green, yellow, orange, and red, presented from dark to light. On the
0-to-100 scale, 0 is black, 100 is white, and the values in between are shades of grey
(see Figs. 6, 7, and 8).
Adjusting false colours produced a beautiful, perfectly exposed image (see
Fig. 9). Overexposed elements appear red, and underexposed parts generally appear
blue. The use of false colour on colour monitors showed what was being exposed in a
frame and by how much (in IRE). Thus, while ensuring that there is neither
underexposure nor overexposure, ‘false colour’ primarily achieves the perfect colour
balance.
Getting the perfect exposure in any professional camera used in film and television
production is vital. This research draws out the advantages and possibilities of false
colour technology to determine different luminance values represented by different
colours that grasp the exposure levels of every section of the shot to see exactly
where under- or overexposures are occurring. This helps the cinematographer or the
camera operator to set the perfect exposure according to the available light and the
scene.
Future work will focus on the external recorder, which can record Apple ProRes
RAW up to DCI 4K60 directly from the sensor of some cameras. An external recorder
supports the essential log formats from major professional camera manufacturers and a
10-bit screen bright enough to monitor log gamma footage properly.
References
1. Reinhard E, Adhikhmin M, Gooch B, Shirley P (2001) Colour transfer between images. IEEE
Comput Graph Appl 21(4):34–41. https://doi.org/10.1109/38.946629
2. Mukherjee R, Debattista K, Bashford-Rogers T, Bessa M, Chalmers A (n.d.) Uniform color
space based high dynamic range video compression. IEEE Trans Circ Syst Video Technol 99
3. Han Z, Jin W, Li L, Wang X, Bai X, Wang H (2020) Nonlinear regression color correction
method for RGBN cameras. IEEE Access 8:25914–25926
4. Ding C, Ma Z (2021) Multi-camera color correction via hybrid histogram matching. IEEE
Trans Circuits Syst Video Technol 31(9):3327–3337
5. Finlayson GD, Zhu Y (2021) Designing color filters that make cameras more colorimetric.
IEEE Trans Image Process 30:853–867
6. Choi K, Suk H-J (2014) User-preferred color temperature adjustment for smartphone display
under varying illuminants. Opt Eng 53(6):61708
7. Hsu W-Y, Cheng H-C (2021) A novel automatic white balance method for color constancy
under different color temperatures. IEEE Access 9:111925–111937
8. Boitard R, Pourazad MT, Nasiopoulos P (2018) Compression efficiency of high dynamic range
and wide color gamut pixels representation. IEEE Trans Broadcast 64(1):1–10
9. Kim TH, Lee YW, Cho HM, Lee IW, Choi SC (1999) Optimization of resolution and
color reproduction for color LCD by control of brightness and contrast levels. In:
Technical digest. CLEO/Pacific Rim ‘99. Pacific Rim conference on lasers and
electro-optics (Cat. No. 99TH8464). https://doi.org/10.1109/cleopr.1999.811568
10. Bertalmio M, Vazquez-Corral J, Zamir SW (2021) Vision models for wide color gamut imaging
in cinema. IEEE Trans Pattern Anal Mach Intell 43(5):1777–1790
11. Kwon KJ, Kim MB, Heo C, Kim SG, Baek JS, Kim YH (2015) Wide color gamut and high
dynamic range displays using RGBW LCDs. Displays 40:9–16
12. Meylan L, Daly S, Sabine S (2007) Tone mapping for high dynamic range displays
13. Chen C, Stamm MC (2015) Camera model identification framework using an ensemble of
demosaicing features. In: 2015 IEEE international workshop on information forensics and
security (WIFS), 1 Nov 2015, pp 1–6
14. Parameter values for ultra-high definition television systems for production and international
programme exchange (2015), [online] Available: https://www.itu.int/dms_pubrec/itu-r/rec/bt/
R-REC-BT.2020-2-201510-I!!PDF-E.pdf
15. Agrawal A, Agrawal A (2021) Dolby vision: advancing the technology of cinema and home
entertainment transformation of an industry. IEEE Comput Graphics Appl 41(2):96–98
16. Melo M, Bessa M, Debattista K, Chalmers A (2014) Evaluation of HDR video tone mapping
for mobile devices. Signal Process Image Commun 29(2):247–256
17. Ciftci S, Akyuz AO, Ebrahimi T (2018) A reliable and reversible image privacy protection
based on false colors. IEEE Trans Multimedia 20(1):68–81
18. Ma F, Jing X-Y, Zhu X, Tang Z, Peng Z (2020) True-color and grayscale video person re-
identification. IEEE Trans Inf Forensics Secur 15:115–129
19. Pattanayak S, Malik F, Verma M (2021) Viability of mobile phone cameras in professional
broadcasting: a case study of camera efficiency of Apple iPhone. In: International conference
on computational intelligence and knowledge economy (ICCIKE), 17 March 2021, vol 11, pp
452–456
20. Liao X, Yu Y, Li B, Li Z, Qin Z (2020) A new payload partition strategy in color image
steganography. IEEE Trans Circuits Syst Video Technol 30(3):685–696
Adaptive Gamification in E-Learning
Platforms: Enhancing Learners’
Experience
1 Introduction
sectors like education, training, marketing, gaming, etc., specifically focusing on the
unpredictable and dynamic needs of the stakeholders [3]. Simoens et al. opine that
gamification contributes to e-learning by enhancing student motivation and engagement
[4], using game elements to create worthy games for the process of teaching.
E-learning facilitates content delivery through tools, web technologies, and virtual
learning environments, making education possible everywhere and at any time [5].
Gamified e-learning systems have been designed to draw learners’ attention and
interest; however, they have faced failures due to uniform design for all learners [6].
Schöbel and Sollner state that all learners are different, that e-learning systems must
therefore be designed to address the individual preferences of the learners, and that
failing to do so is the primary reason e-learning systems fail [7]. Therefore, it
is essential to design tailored gamification systems that offer adaptivity of gamifi-
cation components centering on special necessities [8, 9]. As gamified e-learning is
a budding research area and only a few studies [10–12] have proposed and implemented
the components and operational frameworks for adaptive e-learning systems, the present
work has the following objectives to address the research gap:
• To conduct an exhaustive literature review on the structures of gamification
suitable for e-learning.
• To recognize the design essentials of an efficacious gamified e-learning system.
• To recommend a framework architecture for an e-learning gamification platform.
The paper is organized as follows: This section introduces the significance of adap-
tive gamified e-learning systems and the objectives of the study. Section 2 examines
the methodology followed by the proposed adaptive operational gamified frame-
work for e-learning systems in Sect. 3. Section 4 presents the implications of work
for research and practice. Section 5 concludes the study and presents the future scope
of the work.
2 Methodology
A systematic literature review was conducted to address the stated objectives of the
study. Research papers related to e-learning and gamification are included in the
study. The search was conducted over databases indexed in Google Scholar, Scopus, and
Web of Science (WOS) by entering keywords such as ‘e-learning gamification frameworks’
or ‘gamification design frameworks in e-learning’ in the title, abstract, and metadata,
with the full text including ‘education’, applying the inclusion and exclusion
standards offered in Table 1.
Based on the literature reviewed, the planned adaptive operational gamification
framework comprises different components, such as informative, technical, proposal,
organization, social, economic, and gamification characteristics, to introduce
e-learning gamification, as shown in Fig. 1. In this study, we have proposed the
conceptual framework along with the operational framework for implementing an
e-learning platform that
Table 1 Inclusion/exclusion standards of the research

Inclusion criteria: language: English; access type: open access; time slot: 2017–2022;
document types: articles and conference papers; included content: explicit discussion
of gamification in e-learning platforms and adaptive/personalized/tailored
gamification in e-learning platforms.
Exclusion criteria: language: not English; access type: not available as open access;
document types: reviews, reports, book chapters, etc.
can, in the next stage, be used for course implementation and evaluation (Tables 2
and 3).
The barriers of time and distance have been overcome using e-learning platforms.
However, the drop-out ratio for many e-learning platforms is high. Among the
significant reasons for the dropouts is a dearth of motivation among learners. This is
due to the fact that learners do not all have the same style of learning, while
e-learning platforms offer the same kind of learning experience to everyone. Hence,
gamification has been introduced into e-learning platforms to enhance the motivation
for learning.
Gamification has enhanced learners’ motivation, and we have introduced a framework
that specifically targets the diverse learning styles of learners based on the
students’ interactions with the e-learning system. Thus, the system is an adaptive
gamification system.
Components of the adaptive framework: The first component, the adaptive gamification
engine, presents elements of the system that are mapped to the attributes of the
learners. The management component of the e-learning platform performs the
administrative functions such as enrolling students, granting access rights, etc. The
adaptive
game elements are stored in the repository called adaptive game techniques and
dynamics which constitutes the third component of the system. The game elements
initiate motivation at the time of learning. The curriculum, resources, and learning
content are developed following the gamification plans supported by the adaptive
engine. The influence of adaptive gamification is the desired outcome of the
e-learning platform: the platform associates gamification with attaining enhanced
education by improving engagement, learning practices, motivation, and knowledge.
Operational framework for adaptive framework: The operational aspect of the
adaptive framework includes the use of game thinking in the non-game perspective,
i.e., e-learning in our scenario. The apt usage of game components, dynamics, and
mechanics results in positive effects on the learner’s behavior, in addition to
course objectives and achievements. A gamified course design process (GCDP) [20]
that incorporates game mechanics, elements, and dynamics is the basis for designing
the proposed framework. The framework is a blend of game modules to attain the
anticipated learning outcomes, depending on the student’s reactions. To drive
applicable game dynamics, proper game mechanics are chosen that use suitable game
components. At the time of course delivery, learners’ marks are fed into the
e-learning system to obtain the final grades. If the learner is adequately proficient,
then a certificate is given; otherwise, the learner is directed to attain the required
skills and take a re-assessment, as sketched below. The operational framework
accommodates diverse and dynamic learner behavior through different and customized
game components, envisioned to emphasize the desired learner behavior.
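A minimal sketch of that assessment step follows. The threshold, function names, and data shapes are illustrative assumptions, not part of the GCDP specification.

```python
# Assumed proficiency cut-off and data shapes; illustrative only.
PASS_THRESHOLD = 0.6

def evaluate_learner(marks: list[float]) -> str:
    """Certificate if proficient, otherwise remediation and re-assessment."""
    grade = sum(marks) / len(marks)
    if grade >= PASS_THRESHOLD:
        return "issue_certificate"
    return "remediate_and_reassess"

print(evaluate_learner([0.7, 0.8, 0.5]))   # issue_certificate
print(evaluate_learner([0.3, 0.4, 0.5]))   # remediate_and_reassess
```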
Addressing the customized and societal requirements of the learner provides a
pleasing experience for the learner. Altered game mechanics, elements, and dynamics
are anticipated to augment effectiveness, enthusiasm, efficiency, commitment,
experience, and knowledge throughout the learning.
5 Conclusion
References
1 Introduction
The DLP system is one of the most overhyped and misunderstood security tools. With
at least a half-dozen different names and even more technology approaches [1]
available in the market, it can be difficult to understand the inherent worth of the
tools and which products best suit a given environment. This review will give you the
background knowledge that you need on DLP systems so you can better understand
the technology, know what to look for in a product, and select the most appropriate
one for your cloud computing environment or individual company. There is no consensus
on precisely what constitutes a DLP solution. Some people think of encryption or DLP
for
USB ports [2], while others stick to full product suites. DLP is defined as products
that use deep content analysis to identify, monitor, and protect data at rest, in
motion, and in use, based on central policies [3]. DLP solutions protect sensitive
data while also providing insight into how content is used within the company. Only a
few businesses classify data beyond ‘publicly available’ and ‘everything else’. DLP
enables businesses to gain a better understanding of their data and improve their
content classification and management capabilities. Both DLP as a feature and DLP as a
full solution are available in the market. Basic DLP functions are provided by several
products, particularly email security solutions [4], but they are not complete DLP
solutions. The distinction is that a DLP product consists of centralized management,
policy creation, and enforcement workflow for content and data monitoring and
protection. The user interface and functionality are designed to address the practical
and theoretical problems of content-awareness-based content security (Fig. 1).
2 Literature Review
We must distinguish between content and context. Content awareness is one of the
distinguishing features of DLP solutions. This is different from contextual analysis
and refers to a product’s capacity to do in-depth content analysis utilizing a range
of methodologies. It is simple to comprehend that the content is a letter, with the
envelope and surroundings serving as context. Context includes anything outside of
the letter’s content such as the source, header information, destination, size, recipi-
ents, sender, metadata, format, and time [5]. Context is extremely valuable, and any
DLP System should include it as a part of the overall solution. Business context
analysis is a more sophisticated form of contextual analysis that takes into account
the content’s use at that specific time as well as its environment at the time of analysis
[6]. A component of content awareness is looking into containers and assessing the
contents. Content awareness allows us to utilize context without being constrained
by it [7].
Taking possession of the envelope and opening it is the first step in conducting
content analysis [8]. Following that, the engine will need to parse the context (which
is going to be necessary for the analysis) and delve into the content. This is a
straightforward task in the case of an email containing only plain text; however, if
you want to inspect the contents of a binary file, the process becomes somewhat more
difficult. Every DLP solution available solves this problem with file cracking. File
cracking [9] refers to the technology that is utilized to read and comprehend a file,
regardless of how deeply buried the content may be. Crackers can sometimes read an
Excel spreadsheet embedded in a compressed Word document: unzip the file, read and
analyse the Word document, then read and analyse the Excel data [10].
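A toy illustration of this kind of file cracking is sketched below: zip-based containers (which include .docx and .xlsx files) are opened recursively and every member is scanned for a byte pattern. Real engines parse many more formats natively, and the file name here is hypothetical.

```python
import io
import zipfile

def crack_and_scan(data: bytes, pattern: bytes, depth: int = 0) -> bool:
    """Recursively 'crack' zip-based containers (zip, docx, xlsx, ...) and
    scan every member for a byte pattern; plain members are scanned directly."""
    if depth > 5 or not zipfile.is_zipfile(io.BytesIO(data)):
        return pattern in data
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return any(crack_and_scan(zf.read(name), pattern, depth + 1)
                   for name in zf.namelist())

# Example: scan a .docx (itself a zip) that may embed a zipped spreadsheet
with open("report.docx", "rb") as f:   # hypothetical file
    print(crack_and_scan(f.read(), b"CONFIDENTIAL"))
```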
To assist in file cracking, quite a few of these programmes make use of the
Autonomy or Verity content engines; nevertheless, all of the major tools have quite
a bit of proprietary capacity in addition to the embedded content engine. The
majority of solutions can recognize standard data encryption and use it as a contextual
rule to block or quarantine information [11]. Using recovery keys in conjunction with
corporate data encryption, several technologies make it easier to analyse encrypted
data [12].
Once the basic information has been accessed, seven basic analysis techniques are
applied to that data to discover policy breaches. Each of these techniques has its own
set of advantages and disadvantages.
The first technique examines the material in search of particular rules, such as
16-digit numbers that comply with credit card checksum standards, medical billing
codes, or other textual patterns [13]. The majority of DLP solutions improve upon such
basic rules with their own additional analysis, as in the sketch below.
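The sketch below shows a minimal version of this rule-based technique: a regular expression finds 16-digit candidates and a Luhn checksum filters out false positives. It is an illustration of the idea, not any vendor's implementation.

```python
import re

# 16 digits with optional space/hyphen separators
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", number)][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    return [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]

print(find_card_numbers("order ref 4111 1111 1111 1111 shipped"))
# ['4111 1111 1111 1111'] -- the standard test number passes the Luhn check
```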
This technique is also sometimes referred to as exact data matching. It searches
exclusively for exact matches within a database [14], using either a dump of the
database or live data retrieved from it (through an ODBC connection). You could create
a policy, for instance, that only checks for credit card numbers belonging to your
customer base, allowing you to ignore your employees when they make online purchases.
With this process, you calculate a file’s hash value first and then search for
additional files with the exact same fingerprint. Some people categorize this
technique as a contextual analysis procedure because the content of the files
themselves is not studied. It excels at dealing with media files and other binaries,
where textual analysis is simply not viable [15].
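A minimal version of this fingerprinting idea can be written with ordinary cryptographic hashes, as below. The file names are hypothetical, and production systems typically register many hashes and may use more robust, piecewise hashing.

```python
import hashlib

def file_fingerprint(path: str) -> str:
    """SHA-256 over the raw bytes; the contents are never parsed, which is
    why some classify this as contextual rather than content analysis."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Fingerprints of protected media/binary files, registered ahead of time
PROTECTED = {file_fingerprint(p) for p in ("design.psd", "master.mp4")}

def is_protected(path: str) -> bool:
    return file_fingerprint(path) in PROTECTED

print(is_protected("outbound_attachment.bin"))
```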
This technique searches for a complete or partial match of the material that is being
protected. You are therefore able to construct a policy to safeguard a sensitive
document, and the DLP solution will search for either the full text of the document or
even just a few sentences’ worth of an excerpt from it [16]. For instance, if a worker
copied and pasted even a single paragraph from the business plan for a new product
into an instant message, the DLP system would alert you.
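One common way to approximate partial document matching is word-level shingling, sketched below. Real products hash the shingles and index them for scale; the file name here is hypothetical.

```python
def shingles(text: str, k: int = 8) -> set:
    """Overlapping k-word windows ('shingles') of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def partial_match(message: str, protected_doc: str, threshold: int = 1) -> bool:
    """Flag a message sharing at least `threshold` shingles with the
    protected document, i.e. a copied passage of roughly k words."""
    return len(shingles(message) & shingles(protected_doc)) >= threshold

business_plan = open("business_plan.txt").read()   # hypothetical protected file
im_message = "fyi we will launch the new product in Q3 with a freemium tier"
print(partial_match(im_message, business_plan))
```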
This technique is similar to the statistical methods utilized in the prevention of
spam. It can work with huge volumes of content where it might not be possible to find
exact documents to match.
3.6 Conceptual/Lexicon
Everything that can be put into one of the offered categories is ideal for this
technique. Generally speaking, it is not difficult to describe content that is
associated with privacy, legislation, or industry-specific requirements. It is very
easy to set up and saves substantial policy generation time. Category policies can be
used as a foundation for more advanced policies that are specific to an enterprise.
Most DLP products on the market are based on these seven techniques. Not every product
uses every technique, and there can be big differences in how each technique is
implemented. Most products can also use “chaining techniques”, meaning they can build
complex policies by combining content analysis techniques with contextual analysis
techniques.
4 Technical Architecture
A DLP system is meant to protect content throughout its entire life. In terms of DLP,
this matters in three states of data. Scanning storage and other content repositories
is part of protecting data at rest; it is done to find where sensitive content is
stored, a process we call “content discovery”. A DLP device, for instance, can search
your servers for documents containing credit card details. The file may be encrypted
or erased, or the owner of the file may receive a warning, if the server is not
permitted to store that kind of information. Protecting data in motion means sniffing
network traffic [19] (either passively or in real time with a proxy) to find out what
is being sent over certain channels. One way to do this is to look for bits of
sensitive source code in emails, instant messages, and web traffic. Depending on the
type of traffic, tools that are already in use can often block based on central
policies [15].
Organizations usually start DLP with network-based products that protect both managed
and unmanaged systems in a wide range of ways. Most of the time, it is easier to start
a deployment with network products so that you can quickly cover a large area. Early
products could only do basic alerting and monitoring, but all of today’s products have
advanced features that allow them to work with existing network infrastructure and
offer protective controls, not just detective controls [20]. A passive network monitor
forms the core of the majority of data loss prevention (DLP) solutions. In most cases,
the network monitoring component is installed on a SPAN port at the gateway or in
close proximity to it. Real-time full packet capture, session reconstruction, and
content analysis are among the tasks that may be carried out using this tool [21].
However, this level of performance is not necessary except in extremely rare
situations, as very few organizations currently operate at such a high level of
communications traffic [22]. Additionally, some systems restrict monitoring to
pre-defined port and protocol combinations [23], as opposed to using service/channel
identification based on the content of the packets being monitored [24].
The next crucial element is the incorporation of email. Because email is store and
forward, a variety of features, including filtering, quarantining, and encryption
integration, can be added without encountering the same difficulties as with
synchronous traffic. The majority of products already include a Mail Transport Agent,
allowing you to easily add it as an additional hop in the chain of email recipients
[25]. Proxy integration for filtering/blocking follows almost inevitably: whoever
deploys a DLP system will at some point want to start limiting traffic [26]. With a
bridge, all we need is one computer with two network cards and some kind of content
analysis tool in the middle [27]. A proxy is protocol- or application-specific
middleware that organizes incoming traffic into queues before forwarding it, and so
enables more in-depth analysis. Most commonly, HTTP, FTP, and IM are the protocols
supported by gateway proxies.
Internal networks: A DLP system is rarely utilized on internal communications
other than email, despite being technically able to monitor internal networks.
Gateways offer handy choke points, but internal monitoring presents a difficult
challenge from the perspectives of cost, efficiency, and policy management. Although
some DLP companies offer options for internal monitoring, most organizations place
less importance on this feature (Fig. 2).
Despite the fact that detecting leaks on the network is an effective strategy, it
addresses only one small component of the overall problem. Finding out where all of
that data is housed in the first place is proving to be of equal or even greater value
to a lot of clients these days. The technique we are referring to here is called
content discovery. It is possible that enterprise search tools could help with this,
but they are not well optimized for the difficulty presented here. Enterprise data
classification tools can also be helpful, but based on feedback from a number of
customers, they are not very effective at locating specific rule violations (Table 1).
The DLP tool is equipped with a variety of possible responses that it can implement
in the event that a violation of the data policy is discovered. It can raise an alert
or report, handled as though it were a network breach, and create an incident on the
central management server. It can send the user an email letting them know that their
actions may be in violation of the rules. It can move the file to the central
administration server and leave behind a text file with instructions on how to request
recovery of the file; this action is known as “quarantine and notify”. “Quarantine and
encrypt” means encrypting the file in its current location while typically leaving
behind a plain-text file that describes how to request decryption (Fig. 3).
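The sketch below shows how such a response policy might be dispatched in code. The action names mirror the paragraph above, while the handler bodies are placeholders, not a real product's API.

```python
# Action names mirror the responses described above; the handler bodies
# are placeholders for illustration only.
def alert(v):              print("incident opened on management server:", v)
def notify_user(v):        print("policy-warning email sent to", v["user"])
def quarantine_notify(v):  print("file moved; recovery note left for", v["file"])
def quarantine_encrypt(v): print("file encrypted in place; decryption note left")

RESPONSES = {
    "alert": alert,
    "notify": notify_user,
    "quarantine_and_notify": quarantine_notify,
    "quarantine_and_encrypt": quarantine_encrypt,
}

def respond(policy_action: str, violation: dict) -> None:
    RESPONSES[policy_action](violation)

respond("quarantine_and_notify", {"user": "alice", "file": "/share/plan.docx"})
```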
DLP often starts on the network since doing so is the most effective, efficient, and
cost-friendly approach to acquiring the widest possible coverage. DLP may be broken
down into two categories: network-based and host-based. Network monitoring is
non-intrusive and gives visibility into every system on the network, regardless of
whether it is managed or unmanaged, a server or a workstation, because it does not
interfere with the operation of the system being monitored. The exception is when you
need to crack SSL [27]. Filtering is more complicated, but still not overly
complicated on the network, and it encompasses all of the systems that are connected
to it. However, it is quite clear that this is not a comprehensive solution. If
someone walks out with a laptop, network-level protection no longer covers their data.
Nor can you prevent someone from copying data to a portable storage device such as a
USB drive. The product needs to extend coverage not only to the stored data, but also
to the endpoints, where control and policing are performed directly within the
operating system kernel. If you plug into the kernel of the operating system, you will
be able to monitor user behaviours such as copy-pasting sensitive text (Fig. 4).
• Performance and storage constraints will limit the types of content analysis and
the number of policies that can be imposed locally.
• It is difficult to spot and intercept confidential material that has been encrypted, as
well as difficult to spot data leakage that is occurring across encrypted networks.
• When communicating sensitive information about the company to a private entity,
there should be controls in place that are on the same level as the information being
sent.
• Graphics files often include sensitive information about companies, such as credit
card numbers, academic records, and project specifications. Handling them requires
more CPU time as well as the capacity to perform more computations.
• A DLP network solution is implemented to monitor activities, including email
traffic, chat over IM, communication over SSL, and more. These solutions first need
to be configured in accordance with a predetermined information disclosure policy in
order to distinguish private data from regular data; otherwise, sensitive information
may be disclosed.
Since the beginning of the DLP systems market, at least one hundred different firms
have conducted DLP evaluations. Although not all of them purchased a product and not
all of them put one into operation, those who did generally found the implementation
simpler than that of a great number of other security products. Inappropriate
expectations and a lack of preparation for the business processes and workflows of DLP
are typically the most significant roadblocks to a successful DLP deployment from a
purely technical point of view. Make sure that your expectations are realistic. A DLP
system is a very useful tool for preventing inadvertent disclosures and putting a stop
to poor business processes that involve the usage of sensitive data. The DLP market is
still a few years away from being able to halt knowledgeable bad actors. Although DLP
devices are still in their growing years, they provide very high value to businesses
that take the time to plan effectively and learn how to make the most of their
capabilities. In future, DLP systems should pay special attention to the process of
creating policies and workflows and collaborate with relevant business units to
overcome existing problems. Therefore, there is a need for a study that would analyse
both the content and the context of data in a balanced way.
References
1. Cheng L, Liu F, Yao DD (2017) Enterprise data breach: causes, challenges, prevention, and
future directions. Wiley Interdiscip Rev Data Min Knowl Discov 7. https://doi.org/10.1002/
widm.1211
2. Reddy PV, Reddy KG (2021) An analysis of a meta heuristic optimization algorithms for
cloud computing. In: 2021 5th international conference on information systems and computer
networks (ISCON 2021), pp 2–7. https://doi.org/10.1109/ISCON52037.2021.9702376
3. Michael G (2017) Int J Pure Appl Math 116:273–278
4. Faiz MF, Arshad J, Alazab M, Shalaginov A (2020) Predicting likelihood of legitimate data
loss in email DLP. Futur Gener Comput Syst 110:744–757. https://doi.org/10.1016/j.future.
2019.11.004
Kaushal Binjola
Abstract Anime is a unique style of art animation in shows or movies originating from
Japan. It has received widespread appreciation and recognition from the public. It is
an aggregation of multiple genres and will always have something for everyone. Due
to its wide popularity and ability to grasp attention, it is often the talk of the town
and has led to the genesis of many communities, each based on one or multiple anime
shows or movies. Observation shows that consumption of such media has led many to
relate to a particular character from a show and revolve their personality around said
character. They try to imitate various behavioral habits, tones, and even dialogs from
these characters. Many people have begun using various Japanese terms and dialogs
in natural speech. Unconsciously, anime has started to affect its watchers’ natural
behavior, speech, and vocabulary. This paper aims to study the range of effects of
anime on an average watcher. We will also learn how anime has led to a significant
increase in the knowledge and appreciation of Japanese culture and how it could
affect tourism in the coming years.
1 Introduction
This paper will explain the correlation between the increasing popularity of Japanese
culture in the country and anime’s role in it. The influence of anime is not restricted
to Japanese culture and language but extends way beyond them and bleeds into an
individual’s personality. To be closer to a character, people pick up various quirks
and habits of the character, which can range from talking in a strange voice to
imitating the character at all times, or performing certain habits performed by the
characters; for example, Kaneki Ken, the protagonist of Tokyo Ghoul, is seen cracking
his index finger whenever he is about to power up. This is seen as cool and is also
easily replicable. We also see the use of certain dialogs in everyday conversation.
K. Binjola (B)
K.J. Somaiya College of Engineering, Vidya Vihar East, Mumbai, Maharashtra 400077, India
e-mail: kaushal.binjola11@gmail.com
2 Literature Review
Many studies show that television shows and movies significantly impact viewers.
The social learning theory provides reliable proof and states that people imitate what
they observe, and a great proof of the same is children trying to imitate people around
them like their parents or siblings. The same is true when people consume entertain-
ment media. The study ‘A study on the influence of anime among anime fans in Aizawl’
was carried out by creating focus groups. The researchers
found that the participants were keen on copying the behaviors of their favorite
anime characters [1]. There is also a relation between an individual’s personality and
the media genre they consume. However, the media they consume is not necessarily
the one they like most [2]. This postulation can mean that a genre can easily affect
an individual’s personality, allowing them to take an interest or like another genre
they would have otherwise hated. We can also see the significant effects of media on
personality, wherein a participant from the focus group started finding his younger
sibling endearing after watching an anime about siblings and their adventures [1].
Participants also started picking up personality cues of their favorite characters,
which changed their outlook on life [1]. There are adverse effects too, such as
smoking being promoted among teenagers by movies [3]. We can also see a promotion and
desire
for knowledge of the culture and dominant language of the country to which the
media belongs. We observed this when participants of the focus group were inter-
ested in Japanese food, clothing, music, and language from watching anime [1].
Young children aged 3–8 years are learning and conversing in Hindi in Bangladesh
after watching the Hindi dub of the show Doraemon [4].
The Pearson correlation coefficient is an effective method for checking the similarity
between multiple data variables, and relations between multiple data variables can
thus be quantified.
3 Methodology
The data is collected via a survey using Google Forms. The respondents are between the
ages of 16 and 27. They were asked various questions about their interest in anime and
Japan, aimed at finding how anime has affected their personality and lifestyle.
Table 1 gives the questions asked and the possible answers respondents could give as a
response.
We then applied various exploratory data analysis methods to the data to understand
the respondents better and form meaningful relationships between those who watch and
do not watch anime and the effect anime has had on their life or their choices. An
initial step is calculating the percentage of responses to the ordinal questions asked
of people who watch anime. Time also plays an essential role in the development of
bias, so a critical analysis can be done by checking how prolonged exposure to anime
has affected an individual’s answers. We also applied different kinds of correlations,
like Pearson and Spearman, to check the impact of the various data features on each
other, that is, the degree to which the answer to one question relates to the answer
to another question. Meaningful conclusions can also be drawn from which features are
more related to others.
Pearson coefficient is calculated as
cov(X, Y )
ρ= (1)
σx σ y
where
cov(X, Y ) covariance of X and Y
σ x standard deviation of X
σ y standard deviation of Y.
Spearman coefficient is calculated as
6d 2
ρ =1− 2 i (2)
n n −1
where
d the pair-wise distances of the ranks of the variables x i and yi .
n the number of samples.
Kendall coefficient is calculated as
c−d S 2S
τ= = = (3)
c+d n n(n − 1)
2
where
c the number of concordant pairs
d the number of discordant pairs.
In our case, we will convert our ordinal data into numbers with Strongly Agree
= 2, Agree = 1, Disagree = −1, and Strongly Disagree = −2. Following that,
a correlation coefficient matrix is built, giving the correlation coefficient between
every data variable and every other data variable by applying the formulas above. The
coefficients can then be extracted, and analysis can be performed on the features. The
analysis is performed in Jupyter Notebooks, as in the sketch below.
Google Trends is an online search tool that analyzes a portion of Google searches
to compute how many searches have been done for the entered terms relative to the
total number of searches done on Google over the same time. It allows the user to
see how often specific keywords, subjects, and phrases have been queried over a
specific period. We cannot directly check for the effect of anime on an individual’s
personality via Google Trends, but it helps give an accurate idea of how interest in
Japan and Japanese culture has been changing over the years and how that relates
to the popularization of anime. This directly relates to anime’s effect on Japan and
Japanese culture.
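Comparable data can be pulled programmatically with the community-maintained pytrends library (an assumption for this sketch; the paper itself uses the Google Trends web interface), for example:

```python
# Unofficial pytrends client for Google Trends (pip install pytrends);
# the API is community-maintained and may change.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=330)
pytrends.build_payload(["anime", "Japanese language"],
                       timeframe="2004-01-01 2022-06-30")
trends = pytrends.interest_over_time()   # relative search volume per keyword

# Compare the two curves the same way the paper does
print(trends["anime"].corr(trends["Japanese language"], method="spearman"))
```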
4 Analysis
One way of analyzing the effect of length of exposure on the answers given by the
respondents is to take the ratio of Agree to Disagree responses for all questions
belonging to a particular start year. We have also made an assumption for years with
no disagreements: the disagreement count for such a year is set to one by default to
avoid a division-by-zero error, as in the sketch below.
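A minimal sketch of this ratio computation, with hypothetical response tuples:

```python
from collections import Counter

def agree_disagree_ratio(responses):
    """responses: iterable of (start_year, answer) pairs, e.g. (2015, 'Agree')."""
    agrees, disagrees = Counter(), Counter()
    for year, answer in responses:
        if "Disagree" in answer:        # covers 'Strongly Disagree' too
            disagrees[year] += 1
        elif "Agree" in answer:         # covers 'Strongly Agree' too
            agrees[year] += 1
    # default the disagreement count to 1 to avoid division by zero
    return {year: agrees[year] / max(disagrees[year], 1) for year in agrees}

print(agree_disagree_ratio([(2015, "Agree"), (2015, "Disagree"),
                            (2016, "Strongly Agree")]))
# {2015: 1.0, 2016: 1.0}
```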
The correlation coefficient tells us how strong the relationship between two variables
is. Correlation coefficients range from −1 to 1, wherein 1 signifies total positive
correlation: as X increases or decreases, Y will also increase or decrease,
respectively, by a scaled value of X. −1 signifies total negative correlation: as X
increases or decreases, Y will decrease or increase, respectively, by a scaled value
of X. The Spearman coefficient is used for variables that are not normally distributed
and are nonlinear; for a case such as this paper’s, this proves very useful. It is
also useful in a rank-based system, such as the one our respondents answer in.
5 Result
Two-thirds of the respondents agree that anime has encouraged them to learn
Japanese. Following that, 87.5% have shown an interest in Japanese cuisine. A clear
indication of anime promoting tourism can be seen as 100% of all respondents agree
that they are interested in Japanese Culture and have a desire to travel to Japan. The
survey also shows significant marketing leverage, indicating that two-thirds of all
respondents are interested in clothing and apparel displayed in anime. An exciting
discovery was also made, showing anime’s direct impact on personality. 59% of
anime-watching respondents conveyed that they underwent a personality change or
that anime has affected their personality in one way or another. Furthermore, the
same has also agreed to use dialogs used in anime frequently. A more direct impact
can be seen from the question, “Do you ever act like any favorite anime character and
copy any of their characteristics?” 54% agree and 45% agree they also copy certain
habits or personality trademarks of anime characters they are watching. 50% of all
respondents also expressed the desire to be the main character of an anime.
On analyzing the subjective question of the effect of anime on personality, all the
responses were positive and usually revolved around gaining the ability never to give
up, being tenacious, willful, and having a strong sense of friendship.
Performing the above analysis, we obtained the distribution of years to ratios shown
in Fig. 1. We can see that there was no determining factor as to the effect of year on
the answers provided by the respondents. A notable observation was that the highest
number of respondents started watching anime in 2015, and it has received one of
the most balanced amounts of agreeing to disagreeing responses.
A moderate negative Pearson correlation of −0.544580 can be seen between the
respondents’ age and their desire to learn a language. This can be expected, as the
desire to learn a language decreases as an individual’s age rises. The three
correlations also
reaffirm our analysis of the relation of time exposure to anime and other features,
as all the correlation coefficients used display a low correlation between the start
year of anime and other features. A strong Pearson correlation of + 0.604790 was
seen between a person using more anime dialogs and feeling that anime has affected
their personality. Respondents of a higher age also imagine themselves more as the
main character of an anime, with a Spearman correlation between the two being +
0.497336. Following expectations, a 0.570561 positive Spearman correlation coef-
ficient was seen between liking Japanese cuisine and wanting to travel to Japan. As
interest in Japanese cuisine increases, it will be natural to try it from an authentic
source. Those who feel that their personality has been affected by anime also show a
positive Spearman correlation of 0.560748 to imagining they are the main character
of an anime. Such a coefficient is expected. Those who had their personality affected
have been influenced by a particular character, usually the main one. A high posi-
tive correlation of 0.735706 can be seen between respondents to whom anime has
introduced specific wanted characteristics in their significant other and those who
copy specific characteristics of any characters. This is also intuitive as those who
copy characteristics of a character would also want a significant other similar to the
characters they copy. A substantial positive Pearson correlation of 0.678287 can also
be witnessed between the introduction of anime to Japanese music and the want to
learn Japanese. Lastly, according to expectations, those who enjoy Japanese music
are more likely to have Japanese music introduced to them via anime, as shown by
a powerful positive Spearman correlation coefficient of 0.736528.
Fig. 2 Web searches of anime and Japanese language from 2004 to present

Google Trends gives us a great look at how interest in the Japanese language has
changed alongside interest in anime. According to Fig. 2, which queries Google web
searches from 2004 to the present, we see a similar shape shared by the two graphs,
with the Japanese language curve being offset by some value. Both line charts rise and
fall at the same time and can be
seen to have similar slopes. The two slopes also show a very high positive Spearman
correlation coefficient of 0.885203. Spearman is used here as the distributions are
not normal.
An outlier point can be seen in the Japanese language graph around 2011, which can be
associated with the Japan Tohoku earthquake and tsunami of 2011. The spike of searches
in January of 2017 and fall of searches in March of 2019 cannot be attributed to any
concrete point or theory.
Using the analysis work presented in this paper, we can say that anime has substan-
tially affected the viewer’s personality and has introduced various positive changes
to their mindset. Japanese culture has also received a massive boost from anime,
with people wholeheartedly supporting Japanese cuisine and songs and wanting to
learn the Japanese language. A significant take from the survey can be the impact of
anime on tourism. Japan can use anime as a marketing strategy to invite the youth
to explore Japan firsthand. They can also extend the use of anime to popularize their
education and work sector to invite people from other countries to pursue education
and work in Japan, increasing their economy.
Further analysis can also be done by considering the economic change in Japan’s
industry due to anime and tourism over the years. Sales of anime merchandise and
profits due to tourism can also serve as validation. Furthermore, data can be collected
on the increase in travel to Japan by foreign nationals for a vacation to experience
Japan firsthand. Social media can be used to collect data on the increase in the number
of posts relating to anime and Japan and can be used to perform sentiment analysis
References
1. Muankimi ML (2017) A study on the influence of Anime among Anime Fans in Aizawl.
IJCHSSR 1(1)
2. Jakša S (2020) What anime to watch next? The effect of personality on anime genre selection
3. Heatherton TF, Sargent JD (2009) Does watching smoking in movies promote teenage
smoking? Curr Dir Psychol Sci 18(2):63–67
4. Islam NN, Biswas T (2012) Influence of Doraemon on Bangladeshi children: a CDA
perspective. Stamford J English 7:204–217
5. Benesty J, Chen J, Huang Y, Cohen I (2009) Pearson correlation coefficient. In: Noise reduction
in speech processing. Springer, Berlin, Heidelberg, pp 1–4
6. Adler J, Parmryd I (2010) Quantifying colocalization by correlation: the Pearson correlation
coefficient is superior to the Mander’s overlap coefficient. Cytometry A 77(8):733–742
7. Javaran SH, Sajadi SAN, Karamoozain M (2014) The relation- ship between the social respon-
sibility of club with reputation and fans’ dependency on the team in the football premier
league
8. Croux C, Dehon C (2010) Influence functions of the Spearman and Kendall correlation
measures. Stat Methods Appl 19(4):497–515
9. Malamuth NM, Check JV (1981) The effects of mass media exposure on acceptance of violence
against women: a field experiment. J Res Pers 15(4):436–446
10. Aborisade OP (2013) Data collection and new technology. Int J Emerg Technol Learn (iJET)
8(2):48–52
11. Morgan GA, Harmon RJ (2001) Data collection techniques. J Am Acad Child Adolescent
Psych 40(8):973–976
Synthetic Time Series Data Generation
Using Time GAN with Synthetic
and Real-Time Data Analysis
Abstract Synthetic time series data generation is a broad research area that has
drawn a lot of attention recently. Generating multivariate time-sequenced data
usually assumes that the data is well proportioned and continuous, without missing
values. A good time series generation model should retain the temporal dynamics,
such that a new sequence maintains the actual relationships between variables as
they occur over time. Existing approaches that bring generative adversarial networks
into the sequence setting do not adequately account for the temporal correlations
that are unique to sequential data. At the same time, supervised sequence prediction
models, which allow fine control over network dynamics, are inherently deter-
ministic. Time series data generation also faces problems such as informative missing
values, which lead to an intractable challenge, and long sequences of variable length;
these problems are among the biggest challenges in building a powerful generative
algorithm. Herein, we use an innovative structure to produce realistic time sequences
that combines the flexibility of the unsupervised paradigm with the control provided
by supervised model training. Privacy-aware data analysis safeguards data privacy
during data sharing. Generating accurate, private synthetic data is an NP-hard problem
in every scenario considered. This paper discusses privacy parameters and the data
analysis of real and synthetic data. The synthetic data is enclosed within the boundaries
of the real data and shows similar behavior.
Keywords Data privacy · Dynamic Bayesian network · Synthetic data · Time series
1 Introduction
The temporal setting poses a unique challenge for generative models. A model must
not only capture the distribution of features within each point in time, but must also
capture the potentially complex dynamics of those variables across time. Specifically,
in modeling multivariate sequential data y_{1:T} = (y_1, …, y_T), we wish to accurately
capture the conditional distribution p(y_t | y_{1:t−1}) of temporal transitions. We
introduce time series generative adversarial networks (Time GAN), a natural framework
for generating realistic serialized data across diverse domains. In addition to the usual
unsupervised adversarial losses on both real and synthetic sequences, we introduce a
stepwise supervised loss that uses the original data as supervision, explicitly encouraging
the model to capture the stepwise conditional distributions in the data. This takes
advantage of the fact that the training data carries more information than whether each
data point is real or synthetic: we can learn directly from the real transitions.
Importantly, the supervised loss is minimized by jointly training both the embedding
network and the generator, so that the latent space not only promotes parameter
efficiency but is specifically conditioned to facilitate the generator's learning of
temporal relationships. Eventually, the framework can handle mixed-data settings, in
which static and sequential (discrete plus continuous) time series data are generated
simultaneously.
Our approach is the first to combine the flexibility of the unsupervised GAN framework
with the control afforded by supervised training in autoregressive models. We demonstrate
the benefits through a series of experiments on a variety of real and synthetic dataset
scenarios. Qualitatively, we run t-SNE [1] and PCA [2] analyses to visualize how closely
the generated distribution resembles the original distribution. Quantitatively, using the
"train on synthetic, test on real" (TSTR) framework [3, 4] on a downstream prediction
task, we test how well the generated data preserves the predictive characteristics of the
original. We find that Time GAN achieves consistent and significant improvements over
state-of-the-art benchmarks in generating realistic time series. We also use Gretel for
several comparisons between real and synthetic data. Gretel Synthetics provides so-called
privacy filters, mechanisms that protect the data against the kinds of weaknesses often
exploited by adversaries. For example, synthetic records that are remarkably similar to
the original data can enable membership inference and attribute disclosure attacks.
Another major privacy risk arises from "outlier" records, especially when synthetic
records resemble outliers in the training data. To combat both conditions, filters such
as the outlier and similarity filters can each be dialed to a certain extent based on
the desired level of privacy.
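To make the TSTR protocol concrete, a minimal sketch follows; the saved arrays, the lag length, and the ridge-regression task are illustrative assumptions, not the paper's exact experimental setup.

```python
# Minimal TSTR ("train on synthetic, test on real") sketch: fit a
# one-step-ahead predictor on synthetic sequences, score it on real ones.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

def make_xy(series, lag=5):
    """Turn a (T, d) series into lagged windows X and next-step targets y."""
    X = np.stack([series[t:t + lag].ravel() for t in range(len(series) - lag)])
    y = series[lag:, 0]  # predict the first variable one step ahead
    return X, y

synthetic = np.load("synthetic.npy")  # hypothetical (T, d) arrays
real = np.load("real.npy")

X_syn, y_syn = make_xy(synthetic)
X_real, y_real = make_xy(real)

model = Ridge().fit(X_syn, y_syn)  # trained on synthetic data only
print("TSTR MAE on real data:", mean_absolute_error(y_real, model.predict(X_real)))
```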
2 Related Work
Time GAN is a generative time series model trained adversarially and jointly with an
embedding network, combining a supervised loss with an unsupervised (adversarial)
loss. The approach therefore sits at the intersection of multiple strands of research.
3 Proposed Work
Time GAN comprises four network components: an embedding function, a recovery func-
tion, a sequence generator, and a sequence discriminator. The key insight is that the
autoencoding components are trained jointly with the adversarial components, so that
Time GAN simultaneously learns to encode features, generate representations, and iterate
across time. The embedding network provides the latent space, the adversarial components
operate within that space, and the latent dynamics of real and synthetic data are
synchronized through a supervised loss. We describe each component in turn.
Architecture: The Time GAN variant is intended to capture the characteristics of real
data and the complex evolution of those features over time. The newly introduced embed-
ding network provides a reversible mapping between features and latent representations.
The embedding and recovery networks maintain the relationship between the hidden
vectors and the elements in the latent space, whereas the generator and discriminator
networks remain as in a standard GAN. As in a traditional GAN, the generator and
discriminator are associated with an adversarial loss, also known as the unsupervised
loss [29]. In addition to the unsupervised loss, the model is trained with two other loss
functions, identified as the supervised loss and the reconstruction loss, coupled to the
autoencoding components.
The four components are as follows: (I) Generator: produces synthetic data sequences.
(II) Discriminator: distinguishes sequences of data as real or synthetic. (III) Embedding
network: provides a reversible mapping between features and their hidden (latent)
representations. (IV) Recovery network: maps the latent representations back to the
feature space.
The three types of losses can be described as follows; a sketch of how they are wired
together is given below. (1) Unsupervised loss: the adversarial (min-max) loss relating
the generator and the discriminator. (2) Supervised loss: how well the generator predicts
the next step in the latent space. (3) Reconstruction loss: compares reconstructed and
original data, anchoring the autoencoding components.
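The following PyTorch fragment is a condensed sketch of how these three losses fit together; the five networks are placeholders (any architectures with the right input/output shapes will do), and only the loss wiring reflects the description above.

```python
# Sketch of the three Time GAN losses; the component networks are assumed
# to map (batch, T, ·) tensors as described in the text.
import torch
import torch.nn.functional as F

def timegan_losses(embedder, recovery, generator, supervisor, discriminator, x, z):
    h = embedder(x)            # real features  -> latent codes
    x_tilde = recovery(h)      # latent codes   -> reconstructed features
    e_hat = generator(z)       # random noise   -> synthetic latent codes
    h_hat = supervisor(e_hat)  # enforce stepwise latent dynamics

    # (1) Unsupervised (adversarial) loss on real vs. synthetic codes.
    y_real, y_fake = discriminator(h), discriminator(h_hat)
    d_loss = (F.binary_cross_entropy_with_logits(y_real, torch.ones_like(y_real))
              + F.binary_cross_entropy_with_logits(y_fake, torch.zeros_like(y_fake)))
    g_loss_u = F.binary_cross_entropy_with_logits(y_fake, torch.ones_like(y_fake))

    # (2) Supervised loss: predict the next latent step from real history.
    g_loss_s = F.mse_loss(supervisor(h)[:, :-1], h[:, 1:])

    # (3) Reconstruction loss: autoencoder fidelity in feature space.
    e_loss = F.mse_loss(x_tilde, x)
    return d_loss, g_loss_u, g_loss_s, e_loss
```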
Embedding and recovery functions: The embedding and recovery functions provide
mappings between the feature and latent spaces, allowing the adversarial network to
learn the underlying temporal dynamics of the data through a lower-dimensional repre-
sentation. Let G_S, G_X denote the latent vector spaces corresponding to the feature
spaces S, X. The embedding function e takes static and temporal features to their
latent codes, g_S, g_{1:T} = e(s, x_{1:T}). Herein, we implement e through a recurrent
structure, where e_S: S → G_S is an embedding network for static features, and
e_X: G_S × G_X × X → G_X is a recurrent embedding network for temporal features.
Conversely, the recovery function r takes static and temporal codes back to their
feature representations, s̃, x̃_{1:T} = r(g_S, g_{1:T}). We implement r through a
feedforward network at each step, where r_S: G_S → S and r_X: G_X → X are recovery
networks for static and temporal embeddings [30]. The embedding and recovery functions
can be parameterized by a variety of architectures, with the one prerequisite that they
be autoregressive and respect the sequential (causal) ordering.
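As an illustration of one admissible parameterization (the text leaves the architecture open), the temporal embedding can be a GRU and the recovery a per-step feedforward layer; the dimensions below are arbitrary assumptions.

```python
# Sketch of e_X (recurrent embedding) and r_X (per-step recovery).
import torch.nn as nn

class Embedder(nn.Module):
    """e_X: maps feature sequences x_{1:T} to latent codes g_{1:T}."""
    def __init__(self, feat_dim=6, hidden_dim=24):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid())

    def forward(self, x):          # x: (batch, T, feat_dim)
        out, _ = self.rnn(x)
        return self.proj(out)      # g: (batch, T, hidden_dim)

class Recovery(nn.Module):
    """r_X: maps latent codes g_{1:T} back to feature space, step by step."""
    def __init__(self, feat_dim=6, hidden_dim=24):
        super().__init__()
        self.net = nn.Linear(hidden_dim, feat_dim)  # feedforward at each step

    def forward(self, g):          # g: (batch, T, hidden_dim)
        return self.net(g)
```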
3.2 Datasets
In this paper, we use three multivariate time series datasets, described below:
1. Energy: a dataset characterized by noisy periodicity, high dimensionality, and
correlated features. The UCI appliances energy prediction dataset contains multivariate,
continuous-valued measurements, including numerous temporal features measured at
close intervals.
2. Stock: this dataset, taken from Yahoo Finance, contains six variables: Open, High,
Low, Close, Adj Close, and Volume.
3. Air quality: the dataset contains 9358 hourly averaged responses from an array
of five metal oxide chemical sensors embedded in an air quality device. The device
was located in a significantly polluted area, at road level, in the heart of an
Italian city. Data recorded from March 2004 to February 2005 (one year) repre-
sent the longest freely available record of on-field chemical sensor responses.
Ground-truth hourly averaged concentrations of CO, non-methane hydrocarbons, benzene,
total nitrogen oxides (NOx), and nitrogen dioxide (NO2) were provided by a co-located
certified reference analyzer.
Fig. 1 Field data comparison between training data (violet) and synthetic data (sea green)
for the stock dataset fields Volume and High
4.1 Visualization

We apply t-SNE [1] and PCA [2] analyses to both the original and synthetic datasets
(flattening the temporal dimension). These visualizations help estimate how closely the
distribution of generated data points resembles the real one in a 2D plot, providing a
qualitative assessment. Figure 2 shows the PCA and t-SNE analysis of the datasets: the
PCA analysis shows that all synthetic data points lie within the original data points,
and similarly for t-SNE, the original data encapsulates the synthetic data. Descriptive
analysis of the data points shows that the synthetic data is close to the original data
and varies between the lowest and highest real data points.
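A minimal sketch of this qualitative check is given below; the .npy file names are placeholders for saved sequence batches of shape (N, T, d).

```python
# Flatten the temporal dimension and project real vs. synthetic sequences
# into 2D with PCA and t-SNE for a visual distribution comparison.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

real = np.load("real.npy")            # hypothetical (N, T, d) arrays
synthetic = np.load("synthetic.npy")
flat = np.concatenate([real, synthetic]).reshape(len(real) + len(synthetic), -1)
is_real = np.array([True] * len(real) + [False] * len(synthetic))

for name, reducer in [("PCA", PCA(n_components=2)),
                      ("t-SNE", TSNE(n_components=2, perplexity=30))]:
    xy = reducer.fit_transform(flat)
    plt.scatter(xy[is_real, 0], xy[is_real, 1], s=8, c="black", label="real")
    plt.scatter(xy[~is_real, 0], xy[~is_real, 1], s=8, c="red", label="synthetic")
    plt.title(name); plt.legend(); plt.show()
```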
4.2 Correlation
The term autocorrelation refers to the degree of similarity between (A) a given time
series and (B) a lagged version of itself, over (C) successive time intervals. In
other words, autocorrelation measures the relationship between the current value of
a variable and any of its past values. An autoregressive view of the time series
therefore attempts to estimate the current values of a variable by comparison with
its historical data (Fig. 3).
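The comparison between real and synthetic autocorrelation can be sketched with statsmodels as below; the series files are hypothetical one-dimensional arrays.

```python
# Compare autocorrelation of a real and a synthetic series over 20 lags.
import numpy as np
from statsmodels.tsa.stattools import acf

real = np.load("real_series.npy")          # hypothetical 1-D arrays
synthetic = np.load("synthetic_series.npy")

acf_real = acf(real, nlags=20)
acf_syn = acf(synthetic, nlags=20)
# The results reported below find this gap under 0.2 for most parameters.
print("max |ACF difference| over 20 lags:", np.abs(acf_real - acf_syn).max())
```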
Fig. 2 PCA and t-SNE results for synthetic (red) and real (black) data on the energy dataset
Privacy protection [31] is assessed on the basis of four characteristics: (I) outlier
filter, (II) similarity filter, (III) overfitting prevention, and (IV) differential
privacy. Figure 4 gives details about the privacy of the datasets.
5 Results
The synthetic data generated is not exactly identical to the real data values. The
number of duplicated values depends on the dataset: zero duplicated values appeared in
the synthetic data for some datasets, while 130 duplicated values appeared for the
energy dataset. In the worst case, generating synthetic data for Boolean linear
statistics is an NP-hard problem [32]. Differential privacy was disabled for all three
datasets, although it is very effective at preventing duplicate data generation. The
synthetic quality score is above 85, which indicates that all the privacy factors and
analyses are favorable and that the synthetic data is unique and well structured. The
PCA analysis of the dataset overlaps properly for the chosen dataset, as seen in Fig. 2.
From Fig. 3, we observe that various parameters in the datasets are correlated, while
certain parameters are distant at particular instants of time. The correlation
difference is less than 0.2 for most parameters considered for comparison. Differential
privacy being disabled indicates that no privacy was compromised during synthetic data
generation. The overall accuracy and quality of the data range between 80 and 90. The
privacy outlier filters keep the data secure for usage and (Table 1) help achieve high
accuracy while enabling the synthetic data to fully capture the characteristics of the
original data.
6 Conclusion

Herein, we presented Time GAN, a unique framework for generating serialized time
data that combines a variation of the unsupervised GAN method with the control over
conditional temporal dynamics afforded by autoregressive supervised models. Through
the contributions of the supervised loss and the jointly trained embedding network,
Time GAN demonstrates consistent and significant improvements over high-quality
benchmarks in the production of realistic time series data. We also investigated
privacy parameters on the generated time series data using Time GAN with the help of
Gretel. The privacy mechanisms help preserve the original data with high accuracy,
enabling the synthetic data to fully capture its characteristics during generation.
Such data generation can be used in different fields, such as IoT and health care,
by developers building strong data models with improved hyperparameter tuning of the
embedding network.
References
1. Bermperidis T, Schafer ST, Gage FH, Sejnowski T, Torres EB (2022) Dynamic interrogation
of stochastic transcriptome trajectories using disease associated genes reveals distinct origins
of neurological and neuropsychiatric disorders. Front Neurosci 703
2. Dogariu M, Ştefan L-D, Boteanu BA, Lamba C, Ionescu B (2021) Towards realistic financial
time series generation via generative adversarial learning. In: 29th European signal processing
conference (EUSIPCO), pp 1341–1345
3. Esteban C, Hyland SL, Rätsch G (2017) Real-valued (medical) time series generation with
recurrent conditional gans. arXiv preprint arXiv:1706.02633
4. Yoon J, Jordon J, van der Schaar M (2019) PATE-GAN: generating synthetic data with
differential privacy guarantees. In: International conference on learning representations
5. Liu X, Li L (2022) Prediction of labor unemployment based on time series model and neural
network model. In: Computational intelligence and neuroscience
6. Persaud D (2022) Arts education in a time of crisis: COVID-19 in Los Angeles, 2020–2022.
Doctoral dissertation, UCLA
7. Raval M, Dave P, Dattani R (2021) Music genre classification using neural networks. Int J Adv
Res Comput Sci 12(5)
8. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, Marchand M, Lempitsky
V (2016) Domain-adversarial training of neural networks. J Mach Learn Res 17(1):2096–2030
9. Lamb AM, Parth Goyal AGA, Zhang Y, Zhang S, Courville AC, Bengio Y (2016) Professor
forcing: a new algorithm for training recurrent networks. In: Advances in neural information
processing systems, pp 4601–4609
10. Sajja RK, Killari V, Nimmakayala SA, Ippili V (2022) Machine learning algorithms for asl
image recognition with lenet5 feature extraction. Int J Adv Res Comput Sci 13(3)
11. Bahdanau D, Brakel P, Xu K, Goyal A, Lowe R, Pineau J, Courville A, Bengio Y (2016) An
actor-critic algorithm for sequence prediction. arXiv preprint arXiv:1607.07086
12. Mogren O (2016) C-rnn-gan: continuous recurrent neural networks with adversarial training.
arXiv preprint arXiv:1611.09904
13. Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:
1411.1784
14. Zhang Y, Gan Z, Carin L (2016) Generating text via adversarial training. In: NIPS workshop
on adversarial training, p 21
15. Zhang C, Kuppannagari SR, Kannan R, Prasanna VK (2018) Generative adversarial network
for synthetic time series data generation in smart grids. In: IEEE international conference
on communications, control, and computing technologies for smart grids (SmartGridComm).
IEEE Publications, pp 1–6
16. Chen Y, Wang Y, Kirschen D, Zhang B (2018) Model-free renewable scenario generation using
generative adversarial networks. IEEE Trans Power Syst 33(3):3265–3275. https://doi.org/10.
1109/TPWRS.2018.2794541
17. Ramponi G, Protopapas P, Brambilla M, Janssen R (2018) T-cgan: Conditional generative
adversarial network for data augmentation in noisy time series with irregular sampling. arXiv
preprint arXiv:1811.08295
18. Dai AM, Le QV (2015) Semi-supervised sequence learning. In: Advances in neural information
processing systems, pp 3079–3087
19. Lyu X, Hueser M, Hyland SL, Zerveas G, Raetsch G (2018) Improving clinical predictions
through unsupervised time series representation learning. arXiv preprint arXiv:1812.00490
20. Srivastava N, Mansimov E, Salakhudinov R (2015) Unsupervised learning of video represen-
tations using lstms. In: International conference on machine learning, pp 843–852
21. Miuccio L, Panno D, Riolo S (2022) A Wasserstein GAN autoencoder for SCMA networks.
IEEE Wirel Commun Lett 11(6):1298–1302
22. Li Y, Mandt S (2018) Disentangled sequential autoencoder. arXiv preprint arXiv:1803.02991
23. Hsu W-N, Zhang Y, Glass J (2017) Unsupervised learning of disentangled and interpretable
representations from sequential data. In: Advances in neural information processing systems,
pp 1878–1889
24. Larsen ABL, Sønderby SK, Larochelle H, Winther O (2015) Autoencoding beyond pixels using
a learned similarity metric. arXiv preprint arXiv:1512.09300
Keywords Big Data · Data mining · Heterogeneous sources · Big Data in higher
education
1 Introduction
Higher education institutions cannot avoid the effect of Big Data. Massive volumes
of educational data are gathered and created every day in the higher education system
from many sources and in various forms [1]. In higher education institutions, infor-
mation technology has been employed to automate the majority of the manual proce-
dures (from student admissions to tests to result declaration and physical file tracking).
Big Data basically refers to huge volumes of data that cannot be stored and processed
using traditional approaches within a given time frame. How huge does data need to be
in order to be classified as Big Data? There is a lot of confusion around this point.
Usually, the term Big Data is applied to data measured in gigabytes, terabytes,
petabytes, exabytes, or anything larger. However, this does not define the term
completely; even a small amount of data can be referred to as Big Data depending on the
context in which it is used. The size of the data will vary between megabytes and
petabytes, depending on the domain [5, 6]. Big Data is therefore context-specific and
can refer to various sizes and types from domain to domain, but the common challenge
facing all of these domains is to be able to make sense of the data by analysing it at
a high analytical level.
Structured data: Structured data can be defined as data that has a predefined format
and is usually stored in tabular form [11]. The sources for this type of data are flat
files (delimiter-separated values) and relational databases.
Unstructured data: Unstructured data is data that may not have any predefined format
or repeating patterns. It consists of data in different formats such as text, images,
audio, video, and emails [11]. Sources for this type of data are documents, logs,
survey results, feedback, social networking platforms, and mobile data.
Semi-structured data: Semi-structured data can be defined as data that does not follow
the strict structure of the data models used in relational databases [11]. This type of
data contains labels or mark-up components in order to separate elements and generate
hierarchies of records and fields in the given data; a small illustration is given below.
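The following contrast is purely illustrative; the record and its fields are hypothetical, not taken from any actual university database.

```python
# The same record as structured (fixed schema) vs. semi-structured (JSON).
import json

structured_row = ("2021CS042", "Aamir", "B.Tech CSE", 8.1)  # fixed column order

semi_structured = json.loads("""
{
  "enrollment_no": "2021CS042",
  "name": "Aamir",
  "programme": "B.Tech CSE",
  "feedback": ["good labs", "library timings"],
  "social": {"platform": "twitter", "posts": 42}
}
""")  # nested, self-describing fields of varying shape
print(structured_row, semi_structured["social"]["platform"])
```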
Figure 1 shows the percentage of structured, semi-structured, and unstructured
data [7].
2 Literature Survey
There is a lot of research on Big Data in academia, and we highlight some of the most
significant publications in this study. We believe there is not a single study employing
Big Data models for Artificial Intelligence solutions that predict academic behaviour,
including student performance, academic bias, and job market viability.
A Big Data-based novel knowledge teaching assessment method in universities
is reviewed by Xin [12]. The discussion of knowledge teaching evaluation systems
in colleges and universities over the last ten years is summarized in this paper. It also
examines the factors that are preventing further advancements in teaching evaluation
and draws on performance management theory. The evaluation system offers a struc-
tured framework for integrating assessment methods used in university knowledge
teaching theory and practice.
Brock et al. [13] suggested that the research model adds elements of organiza-
tional learning capacities to the classic technology adoption paradigm. This study
polled 359 information technology workers from 83 countries who are studying
at the University of Liverpool Online and work in a variety of sectors. This study
combines two technology adoption paradigms to give meaningful academic and
practical information.
Wen et al. [14] examine works on increasing Hadoop cluster energy efficiency
and group them into five categories: energy-aware cluster node management,
energy-aware data management, energy-aware resource allocation, energy-aware
job scheduling, and alternative energy-saving strategies. The authors briefly explain
each category's logic and compare and contrast the relevant works in terms of their
benefits and drawbacks. Furthermore, they present their findings and suggest future
research directions, including energy-efficient cluster partitioning, data-oriented
resource classification and provisioning, resource provisioning based on optimal
utilization, EE and locality aware task scheduling, machine learning-assisted job
profiling, elastic power-saving Hadoop with containerization, and efficient big data
analytics on Hadoop.
Rehman et al. [15], in their research, looked into the latest BDA technologies, methods,
and methodologies that can lead to the development of intelligent IIoT systems. The
authors created a taxonomy by classifying and categorizing the literature based on
key factors (e.g. data sources, analytics tools, analytics techniques, requirements,
industrial analytics applications, and analytics types). The foundations and case
studies of the many businesses that have profited from BDA were presented. The authors
also go through the several advantages that BDA brings to the IIoT. Furthermore, they
highlight and debate the critical concerns that must be addressed as future research
directions.
Williamson et al. [16] focus their paper on a large ongoing data infrastruc-
ture project in UK Higher Education. It looks at the infrastructure's sociotechnical
networks of organizations, software programmes, standards, dashboards, and visual
analytics technologies, as well as how these technologies are integrated with govern-
mental market reform imperatives. The study highlights how higher education is
being remade through the utopian goal of a "smarter university" while also being
transformed through the political agenda of marketization.
Cantabella et al. [17] present a case study conducted at the Catholic University of
Murcia, in which student behaviour was analysed over the course of four academic
years according to learning modality (on-campus, online, and blended), taking into
account the number of LMS accesses, tools used by students, and events associated
with them. Due to the difficulties of managing the vast number of data created by users
in the LMS (up to 70 GB in their research), statistical and association rule approaches
were used in conjunction with a Big Data framework to speed up statistical analysis
of the data. The gathered findings were presented and reviewed using visual analytics
tools in order to uncover patterns and shortcomings in students' use of the LMS.
Higher educational institutions such as universities and their affiliated colleges hold
very large quantities of data related to students, courses, and staff. Analysing this
data can allow us to obtain insights that can enhance the operational effectiveness of
an educational organization. By performing statistical analysis on this educational Big
Data, variables such as student course selection, examination results, and the career
prediction of each student can be processed [18].
Before we talk about this educational Big Data, let us first examine what higher
education institutions look like without Big Data. We take the example of the Univer-
sity of Kashmir (UoK) (Fig. 2), but the picture is very similar at other universities.
In a university, many operations take place during the life cycle of a student. These
operations are referred to as processes and take place at different stages, from the
registration of a student in a particular programme to the completion of the degree. In
2001, the first process automated at UoK was result compilation. For this purpose,
database management systems (dBase) with ".dbf" as the underlying file format were
used. As of now, this system is still in use for result preparation of a few small
programmes. In 2004, the registration section was automated. The registration section
is the backbone of the university: it keeps data
like student name, DOB, college, address, course opted for, etc.; in short, all the
information related to the student is available here. This process ran on a platform
built using SQL Server at the back end and VB as the front-end tool. Simultaneously,
another process handling degree preparation and printing was automated, with MS Access
as the back-end DBMS.
This is how organizations have been working for quite some time. The University of
Kashmir worked in a similar way before the automation of every process of the student
life cycle, which took place after 2008. Now the problem is that we have data
everywhere. Educational institutions have changed: today, all the operations/processes
in the student life cycle, as shown in the figures below, are online, and each process
has its own database; thus a huge volume of data is being generated rapidly at multiple
locations.
The identification of Big Data and the sources from which it originates is the
reference point from which one has to start. It is assumed that we are dealing with the
University of Kashmir (UoK) environment, but the situation is very similar at other
universities. This data is available in physical files (non-electronic format) at the
university; in the multiple databases of the different processes illustrated in the
figures above; in the form of logs from various processes and university servers; from
external educational boards such as the JK State Board of Education; from affiliated
colleges; and from the feedback forum (the university's feedback/grievance portal for
students). Today, all the processes involved in the student life cycle (at the
University of Kashmir) happen online, and each process has its own data source. The
pictures below (Fig. 3) give a detailed explanation of these processes.
The student life cycle demonstrated in Fig. 3 starts with the admission of a student to
the university. The next important step is the student verification and registration
process. Once enrolled, the student becomes eligible to appear in the internal exams
conducted by the concerned college/department and the external exams conducted by the
university. After conducting the internal exams/practicals, the colleges/departments
submit the awards to the university examination system, where these awards are recorded
for each student. Next, the university conducts the examinations, and the papers
written by students of the different colleges/departments are evaluated by the
university examination system. The results for both internal and external exams are
finally compiled and declared. Students who pass all subjects are awarded the degree by
the university. Students who fail one or more subjects are considered backlogs, and for
these students the examination process is repeated [19].
The examination conduct process is the core process in the student life cycle. As
demonstrated in Fig. 4, this process is all about the conduct of examination. There
is a repository of the student academic history.
From this repository, the internal/practical marks for the regular students are
pushed into the examination conduct system where they get recorded for the compi-
lation of the final results. The examination conduct system also initiates several
other tasks, listed below.
(1) The generation of admit cards for all students who are eligible for sitting in the
exams.
(2) The centre statements for the conduct of exams. In these statements, centres
are allocated to all the colleges and departments for the conduct of their exams.
(3) Attendance sheets for all designated centres, to track the record of present and
absent students for the different exams.
(4) Centre-wise/paper-wise statements, which are handled by the conduct and
paper-setting section.
The backlog students are also included in the same examination conduct process
after due process of examination form submission and fee payment by the concerned
students.
4 Technical Implementation
4.1 MapReduce
4.2 Hadoop
The Hadoop software library is a framework that uses simple programming models to allow
for the distributed processing of large data sets across clusters of computers [22].
Hadoop is designed in such a way that it can be scaled from a single server to
thousands of machines with very high fault tolerance. A minimal illustration of the
MapReduce programming model that Hadoop distributes follows.
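As a small, self-contained illustration (not part of the proposed framework itself), the classic word count can be written as a Hadoop Streaming mapper/reducer pair; the script name and the invocation shown in the comments are assumptions.

```python
# wc.py: word-count mapper and reducer for Hadoop Streaming.
# Run e.g. with: hadoop jar hadoop-streaming.jar \
#   -input /raw/logs -output /out \
#   -mapper "python3 wc.py map" -reducer "python3 wc.py reduce"
import sys
from itertools import groupby

def mapper():
    for line in sys.stdin:               # emit (word, 1) for every word
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop sorts mapper output by key before the reduce phase.
    pairs = (line.rstrip("\n").split("\t") for line in sys.stdin)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```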
4.3 NoSQL
NoSQL databases, also known as "not only SQL" databases, store data in a format other
than relational tables.
5 Proposed Framework
For academic enterprise analysis, we need to design a framework, and for that we must
first identify the problem the framework will address. For instance, suppose university
authorities or government officials want to know how many science-stream students from
a particular district received their degrees and completed their programme without any
backlog in particular years. To generate that information, we need the data from all
the identified sources and a data processing algorithm that makes it ready for
analysis. Such an algorithm will collect data from all known data sources and clean it
up as needed. The next stage is to move the unstructured data to Hadoop and perform any
required data transformations. Following the transformations, the algorithm will
populate a data warehouse using the transformed data as well as the other structured
data. Finally, we can analyse the data placed in the data warehouse; a sketch of such
a pipeline is given below.
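A hedged PySpark sketch of this algorithm follows; the connection URL, table, path, and column names are hypothetical placeholders, not a description of UoK's real systems.

```python
# Sketch: collect -> clean/transform on Hadoop -> warehouse -> analyse.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder.appName("uok-analytics")
         .enableHiveSupport().getOrCreate())

# 1. Collect: structured records from a relational source, logs from HDFS.
students = spark.read.jdbc(
    url="jdbc:mysql://regdb/uok", table="registrations",
    properties={"user": "reader", "password": "..."})
results = spark.read.json("hdfs:///raw/exam_results/")  # semi-structured

# 2. Clean/transform, then 3. load into the warehouse layer (Hive).
clean = (results.filter(F.col("backlogs") == 0)
                .join(students, "enrollment_no"))
clean.write.mode("overwrite").saveAsTable("warehouse.degrees_clean")

# 4. Analyse: the district-wise question posed above, over the warehouse.
spark.sql("""
    SELECT district, COUNT(*) AS clean_degrees
    FROM warehouse.degrees_clean
    WHERE stream = 'science' AND year BETWEEN 2015 AND 2019
    GROUP BY district
""").show()
```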
To design an analytics architecture using the above-mentioned algorithm, a few steps
need to be taken after the data from physical files, multiple databases, and other
(external) sources has been identified. The crucial step is to get all the data from
these sources/databases integrated or accessible in one place/platform. The
architecture will have a minimum of four layers (Fig. 6). The lowest layer covers the
data sources. Structured data can be stored in relational databases, or all of the
academic enterprise's relational databases can even be explicitly integrated at this
layer. NoSQL databases can be used to store unstructured data. Apache Hadoop forms the
next layer and processes the data from the lowest layer; this tool has the advantage of
working with both relational and NoSQL data. The Apache Hive tool can be used at the
same layer for working with NoSQL data and converting it to RDBMS form. The next layer
is the data warehouse. Once all data is in the warehouse, we can use any business
intelligence tool/platform (such as Tableau) to create visualizations.
6 Conclusion

Big Data analytics is employed less frequently in higher education than in other
industries, yet the need for it is growing in the education sector as well. We are
approaching a new era in which Big Data mining in the education sector will assist us
in discovering how learning happens in modern times. Big Data analytics is critical in
this era of data floods because it can bring unexpected insights and aid better
decision-making in the education sector. Big Data analysis of educational data is not
as easy as it may appear. We highlighted a very important asset in this research: the
identification of heterogeneous data sources in the University of Kashmir's educational
setting, together with the concerns associated with Big Data mining and the design of
a framework. The design of the framework is only one of the challenges associated with
it. The approach given in this paper incorporates Hadoop into existing systems and a
data warehouse to provide a solution for educational institutes' expanding data. The
suggested Big Data architecture for education will allow for data collection, storage,
and analysis, all of which have potential benefits for future initiatives. Academic
challenges will be well addressed by Big Data, because ideas such as job vacancy
prediction, course recommender systems for students, and student retention can only be
well designed and integrated with the existing system when Big Data is implemented in
the education setup.
References
1. Weiss SM, Indurkhya N, Introduction to big data in education and its contribution to the quality
improvement processes. https://doi.org/10.5772/63896
2. Daniel BK (2017) Big data in higher education: the big picture. Springer International
Publishing Switzerland, p 19. In: Daniel BK (ed), Big data and learning analytics in higher
education. https://doi.org/10.1007/978-3-319-06520-5_3
3. Prinsloo P, Archer E, Barnes G, Chetty Y, Van Zyl D (2015) Big(ger) Data as better data in
open distance learning. Int Rev Res Open Distrib Learn 16(1). https://doi.org/10.19173/irrodl.
v16i1.1948. https://id.erudit.org/iderudit/1065938ar
4. Tulasi B (2013) Significance of big data and analytics in higher education. Int J Comput Appl
(0975–8887) 68(14)
5. Imran RB, Majid Z, Quadri SMK, Muheet AB (2018) Big data mining: a literature review.
IJRECE 6(3). ISSN: 2393-9028 (PRINT) | ISSN: 2348-2281 (ONLINE)
6. Williams P (2017) Assessing collaborative learning: big data, analytics and university futures.
Assess Eval High Educ 42(6):978–989
7. Shamsuddin NT, Aziz NI, Cob ZC, Ghani NL, Drus SM (2018) Big data analytics framework
for smart universities implementations. In: International symposium of information and internet
technology. Springer, Cham, pp 53–62
8. Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data
Eng 26(1)
9. Chen H, Roger HL, Veda C (2012) Business intelligence and analytics: from big data to big
impact. MIS Q 36(4). Eller College of Management, University of Arizona, Tucson, AZ 85721
U.S.A.
10. Diebold F (2012) On the origin(s) and development of the term “Big Data”. Pier working
paper archive, Penn Institute for Economic Research, Department of Economics, University
of Pennsylvania
11. Fayaz SA, Altaf I, Khan AN, Wani ZH (2019) A possible solution to grid security issue using
authentication: an overview. J Web Eng Technol 5(3):10–14
12. Xin X, Shu-Jiang Y, Nan P, ChenXu D, Dan L (2022) Review on A big data-based innovative
knowledge teaching evaluation system in universities. J Innov Knowl 7(3):100197
13. Brock VF, Khan HU (2017) Are enterprises ready for big data analytics? A survey-based
approach. Int J Bus Inf Syst 25(2):256–277
14. Wu WenTai, Lin WeiWei, Hsu C-H, He LiGang (2018) Energy-efficient hadoop for big data
analytics and computing: a systematic review and research insights. Futur Gener Comput Syst
86:1351–1367
15. ur Rehman MH, Yaqoob I, Salah K, Imran M, Jayaraman PP, Perera C (2019) The role of big
data analytics in industrial Internet of Things. Future Gener Comput Syst 99:247–259
16. Williamson B (2018) The hidden architecture of higher education: building a big data
infrastructure for the ‘smarter university.’ Int J Educ Technol High Educ 15(1):1–26
17. Cantabella M, Martínez-España R, Ayuso B, Yáñez JA, Muñoz A (2019) Analysis of student
behavior in learning management systems through a Big Data framework. Future Gener Comput
Syst 90:262–272
18. Rasmitadila R, Humaira MA, Rachmadtullah R (2022) Student teachers’ perceptions of the
collaborative relationships between universities and inclusive elementary schools in Indonesia.
F1000Research 10:1289
19. Kenett RS, Prodromou T (2021) Big Data, analytics and education: challenges, opportunities
and an example from a Large University unit. Big Data Educ Pedagogy Res, pp 103–124
20. Durbin P, Duraisamy K, Alonso JJ: A framework for turbulence modeling using Big
Data: Phase II final report. University of Michigan, Ann Arbor, MI and Stanford
University, Stanford, CA
21. Phan A-C, Phan T-C, Cao H-P, Trieu T-N (2022) Comparative analysis of skew-join strategies
for large-scale datasets with MapReduce and spark. Appl Sci 12(13):6554
22. Zhang J, Wang F, Zhou J (2022) Research on the construction of university data platform
based on hybrid architecture. In: International conference on wireless communications and
applications. Springer, Singapore, pp 104–111
Software Complexity Prediction Model:
A Combined Machine Learning
Approach
Abstract The need for computers has increased quickly, and as a result software is
used in ever larger and more intricate ways. Software businesses are developing
increasingly complex systems. Additionally, customers expect high quality, but the
market requires them to finish their work faster. Software firms employ different
measuring methods, including customer feedback after delivery, software testing, and
stakeholder input. The objective of this project is to use a combination of machine
learning techniques to predict software bug states using the NASA MDP dataset. The
research process considered data preprocessing methods and applied single and combined
machine learning algorithms. To create the model, the single classifiers were combined
using the voting method. Accuracy, precision, and recall were used to evaluate the
model's effectiveness, along with tenfold cross-validation. The most promising result
was recorded by a combination of the J48 and SMO classifiers. Before attempting to test
the software product, the researcher retrieved attribute data from the source code; the
complexity of the software product is then ascertained using the constructed model. The
main contribution of this study is to improve software quality by incorporating a
machine learning framework into the present software development life cycle, between
implementation and testing.
1 Introduction

Software companies are building a variety of complex systems, and these grow more
complicated from time to time. Additionally, customers want high quality while the
market expects projects to be finished quickly. Most frequently, the quantity of faults
or defects discovered in software products is used to assess the software's quality.
Customer reviews may be one technique for judging the software quality of a product.
However, there are at least two issues with consumer feedback metrics: first, they do
not provide specific feedback regarding potential areas for improvement, and second,
they arrive too late to implement any changes. Software fault classification is used
both to discover and to prevent problems.
The need for computers has increased quickly, and therefore the use of software has
grown enormous and complex. Software consists of computer programs used to instruct
the computer to perform a given task. The Software Development Life Cycle (SDLC) [1]
describes the methodology by which any software is developed. The SDLC includes
feasibility analysis, requirement analysis and specification, design, implementation,
testing, and maintenance. In this study, the researchers focused only on software
testing and maintenance. Software testing [2] provides a way to reduce maintenance and
overall software costs, as well as faults. Various techniques are applied for testing
software to reduce errors, but software cannot possibly be error-free. For systems that
take inputs such as characters, numbers, and files, it is easy to test any small input,
but what if the code is complex and large? Software testing is a means of determining
the quality of the software, and it is among the most important and expensive stages of
the SDLC. Project managers must understand "when to stop testing?" and "which part of
the code should be tested?"; the answers to these queries directly impact resource
allocation (i.e., the experience of the test staff and how many people to allocate for
testing), cost, defect rates, and product quality. Manual software inspection gets more
difficult as software grows in size and complexity. Software defect prediction is an
alternative strategy utilized in this context to predict potential defects clearly.
Testing is an important component of software engineering since it often consumes
between 40% and 50% of development effort, and it requires more effort for systems that
need higher levels of reliability [2–4]. Software defects have an impact on the cost
and time of development in addition to the software's quality. Various factors, most of
which are connected to human mistakes, design or coding issues, data entry errors,
documentation faults, and communication breakdowns, can also cause the system to be
faulty. We use software fault prediction to account for the most common human factors
prior to putting the software systems through testing and maintenance.
Nowadays, several datasets are available that can be mined to discover useful knowledge
regarding software defects. We applied a variety of machine learning approaches to the
publicly accessible datasets CM1, JM1, KC1, KC2, and KC3 from the National Aeronautics
and Space Administration (NASA) software repository. The goal is to divide the software
modules into those that are fault-prone and those that are not.
2 Related Works
Machine learning has been developing quickly, resulting in a wide range of learning
algorithms for various purposes, and machine learning techniques are also used in
software engineering. The overall worth of these techniques is mostly determined by how
well they solve real-world problems. As a result, for the field to advance, reproducing
algorithms and applying them to new tasks is crucial. However, several machine learning
researchers have only recently released work on building models for predicting software
defects.
In this section, we examine the research conducted to date, divided into three
categories: software defect prediction using classification approaches [5–9],
clustering approaches [10, 11], and ensemble approaches [12, 13]. From these category
assessments, we have seen that combining two or more classifiers produces better
performance; hence, we used the voting method to combine two machine learning
techniques.
3 Research Methodology
This study is based on data collected from the NASA MDP, a dataset created by NASA for
research purposes. To fully comprehend the data, it is necessary to have a close
working relationship with domain experts such as software testers and developers. The
design and development of the software complexity prediction model were the main
emphases of this research. We started with the chosen data from the NASA Promise MDP,
which contains training data such as software metrics and their associated values.
Second, the input data was prepared by handling missing values and removing noise.
Third, we applied a tenfold cross-validation test to the developed model while using
the selected techniques (J48, MLP, SMO, and Vote). Finally, a performance report was
examined, and software complexity is predicted as either faulty or non-faulty. To
validate the design and development of the model, the researcher created a prototype
system with an input form that automatically feeds from the database when the user
clicks the load data button, assessing whether the software has flaws or not. The
architecture of the software complexity prediction model is given in Fig. 1.
Data on software faults are difficult to find, and commercial software companies
generally do not make such data available to the public for measuring software faults.
Because of its early availability and the fact that 60% of software defect studies
chose it as a priority, in this study we collected data from the NASA MDP Promise
repository [14]. Before applying machine learning techniques, we considered several
factors, which are covered in the following subsections.
There are currently 13 datasets targeting software metrics in the NASA Promise
repositories [15]. They cover distinct programming languages, code metrics (code size,
Halstead's complexity, and McCabe's cyclomatic complexity), and time limits. We
selected five datasets (CM1, JM1, KC1, KC2, and KC3) out of the 13 available.1
Tables 1 and 2 illustrate the attributes we utilized and describe the datasets.
In this section, we took two sample codes that are measured by software metrics tools
(Prest and Loc Metrics). We used Java source code to extract features using Prest and
Loc Metrics. The researcher extracted the features from the source code based on
mathematical counts, namely the numbers of operands and operators; a rough sketch of
this counting follows the list below.
1. Prest-based attribute extraction: Prest can simultaneously parse all files written
in C, C++, Java, JSP, and SQL using several parsers. We used the remote method
invocation source code of a scientific calculator written in Java. Table 3 provides
the full set of extracted attributes with their corresponding values.
2. Loc Metrics-based attribute extraction: we used the remote method invocation source
code to extract attributes. Loc Metrics was used to calculate physical lines,
executable logical lines, blank lines, total lines of code, McCabe VG complexity,
comments, and header comments. Since Prest does not support these properties, we only
considered IO code, comments, Halstead comments, and Halstead blank lines. In the Loc
Metrics tool, the user first browses to the source code whose attributes (useful
information about the source code) are to be extracted, then clicks on Count LOC.
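For intuition, the operand/operator counting that underlies such attributes can be sketched with Python's own tokenizer; the actual study used Prest and Loc Metrics on Java code, so this is only a rough analogy, and the file name is a placeholder.

```python
# Rough operand/operator counts in the spirit of Halstead's metrics.
import math
import tokenize

def halstead(path):
    operators, operands = [], []
    with open(path, "rb") as f:
        for tok in tokenize.tokenize(f.readline):
            if tok.type == tokenize.OP:
                operators.append(tok.string)
            elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
                operands.append(tok.string)
    n1, n2 = len(set(operators)), len(set(operands))   # unique counts
    N1, N2 = len(operators), len(operands)             # total counts
    vocabulary, length = n1 + n2, N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary else 0.0
    return {"n1": n1, "n2": n2, "N1": N1, "N2": N2, "volume": volume}

print(halstead("example.py"))  # hypothetical source file
```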
4 Experimental Results
The main goal of this study was to predict software faults using a machine learning
approach with single classifiers and a combination of two classification methods. We
used the Weka 3.6.9 machine learning software to develop the predictive model.
1 http://promise.site.uottawa.ca/SERepository/datasets-page.html.
The main objective of this study was to enhance the performance and accuracy of single
classifiers. To achieve this, we first measured the accuracy and performance of single
classifiers and recorded them using the chosen methods and performance measures. Three
algorithms were selected by the researchers: the decision tree (J48), the multi-layer
perceptron (MLP), and the support vector machine (SMO). Table 3, discussed below,
summarizes the experimental results for the selected machine learning methods.
Table 3 Result summary on single classifiers (tenfold cross-validation; "correctly
classified" is accuracy in %)

Dataset  J48 (Acc / Prec / Rec)     MLP (Acc / Prec / Rec)     SMO (Acc / Prec / Rec)
CM1      87.95 / 0.90 / 0.96        87.55 / 0.901 / 0.969      89.56 / 0.903 / 0.993
JM1      79.7  / 0.83 / 0.93        80.95 / 0.841 / 0.99       80.73 / 0.807 / 1
KC1      84.54 / 0.88 / 0.93        85.91 / 0.872 / 0.978      84.78 / 0.996 / 0.85
KC2      81.41 / 0.87 / 0.89        84.67 / 0.873 / 0.945      82.76 / 0.828 / 0.98
KC3      89.3  / 0.92 / 0.96        90.61 / 0.919 / 0.983      90.82 / 0.952 / 0.90
Researchers have shown that combining two methods improves the performance and accuracy
of single classifiers [16]. We combined two classifiers at a time in order to evaluate
the performance and accuracy of the combined classifiers, using the Vote algorithm with
the average-probability combination method; an analogous setup is sketched below.
Table 4, discussed below, summarizes the experimental results for the selected combined
methods.
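The study itself used Weka's Vote meta-classifier; the following scikit-learn sketch is only an analogous setup, where J48 corresponds roughly to a decision tree, SMO to a linear SVM, and "average of probabilities" to soft voting. The CSV name and label column are assumptions.

```python
# Soft-voting combination of a decision tree (~J48) and a linear SVM (~SMO),
# evaluated with tenfold cross-validation on accuracy, precision, and recall.
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("kc3.csv")  # hypothetical NASA MDP export
X, y = data.drop(columns="defects"), data["defects"]

vote = VotingClassifier(
    estimators=[("j48", DecisionTreeClassifier()),
                ("smo", SVC(kernel="linear", probability=True))],
    voting="soft")  # average of class probabilities

scores = cross_validate(vote, X, y, cv=10,
                        scoring=("accuracy", "precision", "recall"))
for metric in ("test_accuracy", "test_precision", "test_recall"):
    print(metric, scores[metric].mean())
```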
5 Performance Evaluation
In the two preceding subsections, the researcher observed the following basic research
outcomes:
• J48: As the dataset size increased, the number of leaves, the size of the tree,
and the time required to create the model all increased, while the model’s recall,
precision, and accuracy all decreased.
• When the size of the dataset is increased, the models for MLP, SMO, J48 & MLP,
J48 & SMO, and SMO & MLP perform worse in terms of recall, precision, and accuracy,
and take more time to develop.
6 Experimental Discussion
Single and combined classifiers were employed to create the software fault prediction
model, and six independent experiments were conducted for each classifier and dataset;
based on all criteria, 30 experiments were performed. The dataset was split into
training (90%) and testing (10%) sections using tenfold cross-validation. The
experiments were designed to determine whether growing the size of the dataset enhances
or degrades the performance of the algorithms, to assess the impact of the algorithms
on performance, and to compare the effectiveness of the algorithms in predicting
software faults. The model produced by the J48 classifier was chosen as the most
effective one for predicting software defects. The J48 decision tree generated 25 rules
for predicting software complexity, of which the researchers considered only 7. The
graphical user interface was designed and developed using Java NetBeans IDE 8.0.2
around these seven selected rules. The developed model extracts the source code
information before software testing and then applies the trained model to determine
whether the software product has faults or not.
Table 4 Result summary on combined methods (tenfold cross-validation; "correctly
classified" is accuracy in %)

Dataset  J48+MLP (Acc / Prec / Rec)   J48+SMO (Acc / Prec / Rec)   MLP+SMO (Acc / Prec / Rec)
CM1      88.5  / 0.903 / 0.978        89.5  / 0.901 / 0.993        89.5  / 0.901 / 0.998
JM1      80.7  / 0.808 / 0.999        80.72 / 0.807 / 1            80.72 / 0.807 / 1
KC1      85.4  / 0.881 / 0.957        84.6  / 0.848 / 0.997        84.6  / 0.845 / 1
KC2      81.8  / 0.874 / 0.901        82.7  / 0.828 / 0.998        79.5  / 0.795 / 1
KC3      90.6  / 0.921 / 0.981        90.8  / 0.908 / 1            90.82 / 0.906 / 1
The outcomes of the experiments demonstrate that increasing the data size affects how
well the machine learning algorithms work. To understand why performance declined as
dataset size increased, the researchers ran additional experiments: we used the J48
classifier on the CM1, JM1, KC1, KC2, and KC3 datasets with both pruned and unpruned
parameters (an analogous sketch follows). Table 5 illustrates how pruning affects the
effectiveness of the prediction models as data size grows.
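The pruned-versus-unpruned comparison can be sketched as follows; scikit-learn's cost-complexity pruning (ccp_alpha) stands in for J48's pruning switch, and the file name is again a placeholder.

```python
# Pruned vs. unpruned trees: leaf counts and cross-validated accuracy.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("jm1.csv")  # hypothetical NASA MDP export
X, y = data.drop(columns="defects"), data["defects"]

for label, alpha in (("unpruned", 0.0), ("pruned", 0.001)):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)
    acc = cross_val_score(tree, X, y, cv=10).mean()
    print(f"{label}: {tree.get_n_leaves()} leaves, CV accuracy {acc:.3f}")
```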
According to the findings of the experiments and the researcher's additional study,
machine learning algorithms perform worse as datasets grow because growth complicates
the models, produces more rules, and lowers model recall, accuracy, and precision. The
example in Table 6 makes this very apparent.
Out of all the approaches, the combination of the J48 and SMO methods produced the
result with the highest accuracy (90.82), on the KC3 data. Figure 2 depicts the impact
of the algorithms on performance for the CM1 dataset; compared to the other techniques,
J48 with SMO showed the highest performance, and the second-best performance for the
CM1 dataset was recorded for the MLP with SMO combination and the SMO method.
When compared to the other five classifiers in this study, the J48 classifier achieved
relatively high performance based on evaluators such as specificity and execution time.
As seen in the results of Experiment 1, the J48 decision tree produced 25 (twenty-five)
rules for predicting software defects. We chose the 7 (seven) rules that account for
the majority of instances in the provided dataset, and the researchers then held
in-depth discussions with domain experts to ensure that the chosen rules actually apply
to all instances.
In this study, an attempt was made to design and develop an operational application
prototype, named the software complexity classification system, that uses the
classification rules generated from the J48 classifier. The prototype classifies a
software product into one of the software labels (fault or not-fault). It lets the
user load extracted source code data from the database and, on clicking the predict
button, predicts whether the software product is defective or not based on the trained
model. As shown in Fig. 3, there are 22 inputs (attributes), and the output can be
defect or not defect; the inputs come from the extracted source code data. The Exit
button closes the graphical user interface. The prototype was developed based on the
rules generated by the J48 classifier. To demonstrate a predicted outcome of TRUE
(defective) based on rule 7 of experiment one, we fed the extracted source code data
into the prototype, as seen in Fig. 3. We used Java source code to predict software
faults according to the selected rules generated from the J48 classifier.

Fig. 3 Software complexity prediction prototype user interface with sample result
7 Conclusion

Owing to its advantages, software-based system development is now growing more than in
past years. Before a software system is made available to end users, quality assurance
is necessary. We have several quality measurements, including software testing, CMM,
and ISO standards, to improve the quality of software. Software testing is currently
becoming more and more crucial to the dependability of software. Using machine learning
techniques, software defect prediction can significantly increase the effectiveness of
software testing and direct resource allocation.
This study main objective is to use a mixed machine learning method for software
complexity prediction in software product. This study attempted to generate practical
predictive models from the NASA MDP dataset and develop a novel graphical user
interface for effective utilization of the developed model.
In the data exploration section, we had a total of 498, 10,885, 2,109, 522, and 458
records and 22 attributes. To get the dataset ready for the experiments, we removed noise
and handled missing values. To build the model, we used machine learning algorithms such
as a decision tree, a support vector machine, a multi-layer perceptron, and a combination
of single classifiers using the voting method. We also introduced a novel visual
representation method (Fig. 3) that conveys information about the software attributes
extracted from the source code using Prest and LOC. The performance of the model was
measured by accuracy, precision, and recall. This study helps software project managers
predict and fix bugs before the product is given to clients, ensuring the software's
quality.
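For illustration only, the described combination of single classifiers by voting can be
sketched with scikit-learn analogues of the Weka classifiers used in the study: a
hedged sketch, assuming J48 roughly corresponds to a decision tree and SMO to an SVM,
not the authors' exact setup.

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the NASA MDP data: 22 attributes, binary defect label.
X, y = make_classification(n_samples=500, n_features=22, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Hard voting combines the single classifiers by majority vote.
vote = VotingClassifier([
    ("tree", DecisionTreeClassifier()),      # analogue of J48
    ("svm", SVC()),                          # analogue of SMO
    ("mlp", MLPClassifier(max_iter=500)),    # multi-layer perceptron
], voting="hard")
vote.fit(X_tr, y_tr)
print("accuracy:", vote.score(X_te, y_te))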
In the future, we plan to extend the model to more software repositories (Eclipse,
JEdit, open-source software, and AR datasets). We also plan to examine the effect of
attribute reduction on the performance of machine learning algorithms and to apply other
software metrics, e.g., resource metrics.
References
1. Saini M, Kaur K (2014) A review of open source software development life cycle models. Int
J Softw Eng Appl 8:417–434
2. Zhang S, Zhang C, Yang Q (2003) Data preparation for data mining. Appl Artif Intell 17:375–
381
3. Sharma C, Sabharwal S, Sibal R (2014) A survey on software testing techniques using genetic
algorithm. arXiv preprint arXiv:1411.1154
4. Shivaji S (2013) Efficient bug prediction and fix suggestions. University of California, Santa
Cruz
5. Pal S, Sillitti A (2021) A classification of software defect prediction models. In: 2021
International conference nonlinearity, information and robotics (NIR), pp 1–6
6. Liu C, Sanober S, Zamani AS, Parvathy LR, Neware R, Rahmani AW (2022) Defect prediction
technology in software engineering based on convolutional neural network. In: Security and
communication networks, vol 2022
7. Ha TM, Tran DH, Hanh LT, Binh NT (2019) Experimental study on software fault predic-
tion using machine learning model. In: 2019 11th International conference on knowledge and
systems engineering (KSE), pp 1–5
8. Peng X (2022) Research on software defect prediction and analysis based on machine learning.
J Phys Conf Ser, p 012043
9. Jorayeva M, Akbulut A, Catal C, Mishra A (2022) Machine learning-based software defect
prediction for mobile applications: a systematic literature review. Sensors 22:2551
10. Marjuni A, Adji TB, Ferdiana R (2019) Unsupervised software defect prediction using median
absolute deviation threshold based spectral classifier on signed Laplacian matrix. J Big Data
6:1–20
11. Park M, Hong E (2014) Software fault prediction model using clustering algorithms
determining the number of clusters automatically. Int J Softw Eng Appl 8:199–204
12. Elahi E, Kanwal S, Asif AN (2020) A new ensemble approach for software fault prediction. In:
2020 17th International Bhurban conference on applied sciences and technology (IBCAST),
pp 407–412
13. Li R, Zhou L, Zhang S, Liu H, Huang X, Sun Z (2019) Software defect prediction based on
ensemble learning. In: Proceedings of the 2019 2nd international conference on data science
and information technology, pp 1–6
14. Malhotra R (2015) A systematic review of machine learning techniques for software fault
prediction. Appl Soft Comput 27:504–518
15. Gray D, Bowes D, Davey N, Sun Y, Christianson B (2012) Reflections on the NASA MDP
data sets. IET Softw 6:549–558
16. Wang T, Li W, Shi H, Liu Z (2011) Software defect prediction based on classifiers ensemble.
J Inf Comput Sci 8:4241–4254
Work with Information in an Outdoor
Approach in Pre-primary Education
1 Introduction
One of the innovative (although not new) approaches to education, especially in pre-
primary education, is outdoor education. “For the kindergarten teachers, the most
handy and most accessible environment is the nature. Children love nature, they are
interested in everything that surrounds them, and have an unstoppable curiosity”
[4]. Contemporary globalized society is characterized by, among other things, a lack of
exercise and of time spent in nature, a loss of physical activity, increasing stress, and
overall demands on the psyche, which manifest in decreased physical fitness, an increase
in various civilizational diseases, and thus a deterioration in the health status of the
entire population. Physical activity is an essential biological expression of life and is
necessary for a child's healthy growth and development [11]. To explain outdoor education
simply, we can generally imagine education in outdoor spaces outside the classroom.
Outdoor education can also be understood as the transfer of education from the
traditional classroom to the schoolyard. The question is, why would we do it? There are
several opinions. Whether people are discovering something new about nature, developing
skills, or simply learning how to be comfortable in the outdoors, they often experience a
wide range of emotions [6]. When teachers teach outside the classroom, they often observe
improvements in the children's behaviour, and the whole class enjoys learning; children
with specific needs are often more successful in the outdoor environment.
In pre-primary education, the most important document is the national curriculum for
pre-primary education, which sets out the state's essential requirements for providing
institutional education in kindergarten and defines the content of education. It is
structured into areas, two of which are mathematics and work with information, describing
the standards of informatics education and digital technologies. At first glance,
"outdoor" and technology seem to represent two incompatible domains. This challenged our
interest; in particular, we were interested in current students' perception of the
interconnection or intersection of those two domains.
There is a generation of young people, currently studying in higher education, referred
to as the Millennial Generation by Howe and Strauss [7], the Net Generation surrounded by
digital media [10], and the Digital Natives by Prensky [8], considered to be "native
speakers" of the digital language of computers and the Internet. They are also described
as Generations Y and Z replacing Generation X [13], the iGeneration [9], or Digital
Learners, a term which offers a more global vision of the twenty-first-century learner
[2]. Net Generation generally refers to the generation that has grown up in the Internet
age. The concept of the Net Generation or Digital Natives was first proposed by the
American educator Prensky [8], who made pro-and-con distinctions between Digital Natives
and Digital Immigrants [5].
Regardless of the label, the young people of this era are generally considered
universally able and skilled in working with digital technologies. Still, there is
research that speaks against this claim and demonstrates that this generation does not
automatically transfer its digital competencies and skills into, for example, academic
settings. In fact, there is no evidence that today's students want to use these
technologies for educational purposes either. Our experience gained in training future
primary and pre-primary education teachers at the Faculty of Education of the University
of Prešov also argues against the generally overestimated digital skills of today's
students. The results of testing conducted in 2016 using the IT Fitness Test on a sample
of 532 students showed significant gaps in the digital skills of our students [3].
Likewise, the direct experience of teachers of technology-enhanced subjects, in which it
is possible to directly observe the work of students, points to an absence of skills in
working with spreadsheets, graphic editors, presentation software, and cloud computing,
and especially in the use of technology for problem solving. Alghamdi et al. [1]
conducted a study aiming to explore kindergarten student teachers' readiness to integrate
technology into their future classrooms and the factors affecting their integration. The
first-phase results showed that participants were ready to implement technologies and had
positive attitudes towards technology integration. The second-phase results confirmed
that all participants were able to transfer their technical skills into professional
practice; however, few were ready to apply their pedagogies in practice. The results
indicate three main factors: technological resources, the school infrastructure, and the
number of students in the classroom. It is recommended to improve teacher preparation
programmes to develop teachers' technology readiness. In a study titled "Primary school
teachers' attitudes towards technology use and stimulating higher-order thinking in
students: a review of the literature," Wijnen et al. [12] present interesting results on
teachers' attitudes towards technology. They conducted two separate literature reviews on
teachers' attitudes towards (1) using technology (78 articles) and (2) stimulating
higher-order thinking in students (18 articles). To structure the potential underlying
constructs constituting teachers' attitudes in these two contexts, they used the Theory
of Planned Behaviour. They identified nine factors related to primary school teachers'
attitudes towards using technology in their teaching and four factors related to their
attitudes towards stimulating higher-order thinking. Furthermore, they found that it was
not always possible to establish the impact of each factor on teachers' intended or
actual use of technology and on behaviours stimulating higher-order thinking,
respectively.
Considering the above-mentioned characteristics of the students and all the research
mentioned, in the present research we focused our attention on students' perception of
the interconnection between the outdoor and computer science fields and on the
possibility of implementing computer science activities in the MOE through an outdoor
approach.
2 Participants
We electronically surveyed 419 students. All participants were students of the Faculty
of Education, University of Prešov, in the following fields of study: pre-school and
elementary pedagogy and pre-school and elementary pedagogy and pedagogy of
psychosocially disturbed in the academic year 2021/2022. The target group consisted
of students in the first (Y1) and third year (Y3) of bachelor’s studies. We received 153
completed questionnaires, representing a return rate of 37%. Of the 153 students, 99
(65%) were first-year students, and 54 (35%) were third-year students. The partici-
pants were deliberately selected; they were future educators who will work in pre-
primary settings. Therefore, the survey was administered at the beginning and end
of their studies.
3 Measurement
The research aimed to analyse knowledge of and attitudes towards outdoor education,
knowledge about information literacy, and their interrelationship in groups of future
kindergarten teachers in the first and last years of their studies. For this purpose, we
constructed a non-standardized research instrument, a questionnaire containing a total of
6 items. The first three items (p1–p3), in the form of a Likert scale (ranging from 1 to
6), focused on outdoor education. Items p4–p6 focused on information literacy education
in kindergartens, with two Likert-scale items and one multiple-choice item (p6).
The following hypotheses were formulated:
H1: There is a statistically significant difference in the scores of the two study
groups between items p1 and p2.
H2: There is a statistically significant difference between items p2 and p3 in the
ratings of the two treatment groups.
H3: There is a statistically significant difference between items p4 and p5 in the
scores of both treatment groups.
H4: There is a statistically significant difference in scores between items p3 and
p5 in both treatment groups.
Data collection was done electronically through the MS Forms tool; the link to the
research instrument was provided to the students electronically. Data collection was
conducted between 3 and 18 May 2021. Each Likert-type item used a six-point scale, and a
summary score was calculated for each item.
Statistical analysis was performed in the freely available programmes JASP and Jamovi.
Scores between the selected items were compared by paired t-test. These frequentist
techniques were used to assess whether the difference between the selected items was
statistically significant. In addition, Bayesian statistics were used to assess the
relative degree of empirical evidence in favour of the null (H0) or alternative
hypothesis (Ha). Thus, for each of the items tested, a Bayes factor was calculated to
indicate the extent to which our beliefs in favour of one of the hypotheses need to be
revised in light of the observed data. A Bayes factor greater than 3 can already be
considered significant evidence in favour of a given hypothesis. The research instrument
is available at link 1 (bit.ly/38wPbCP) and the complete data at link 2 (bit.ly/3taKYMj).
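For illustration only, the paired t-test together with the Bayes factor BF10 can be
computed in Python with the pingouin package; the score vectors below are invented
placeholders, not the study's data.

import numpy as np
import pingouin as pg

# Hypothetical paired ratings for items p2 and p3 (1-6 Likert scale).
p2 = np.array([6, 6, 5, 6, 5, 6, 6, 4, 6, 5])
p3 = np.array([5, 4, 5, 6, 3, 5, 6, 4, 5, 4])

# Paired t-test; the result includes the p-value, Cohen's d, and BF10.
res = pg.ttest(p2, p3, paired=True)
print(res[["T", "p-val", "cohen-d", "BF10"]])
# BF10 > 3 is read here as substantial evidence for the alternative hypothesis.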
3.1 Results
At the beginning of the analysis, we were interested in the summary results of the
Likert-type questionnaire items. Items p1, p2, and p3 focused on awareness of and
attitudes towards outdoor education, and their overall mean rating in both study groups
is high, at 5.29 on a 6-point scale, with the lowest rating for item p3 in Y1 (4.96) and
the highest rating for item p2, also in Y1 (5.65). Items p4 and p5 focused on information
literacy, and their overall mean rating in both groups is moderately high at 4.53 on the
6-point scale. Comparing all the outdoor (Øp1, p2, p3) and information (Øp4, p5)
Likert-type items, we detect a difference of 0.76 points in the ratings of all
participants. Items focused on outdoor education received more positive ratings than
those focused on computer science. All descriptive statistics for each item are presented
in Table 1. The summary ratings show that students rated the items more positively than
negatively overall, averaging 4.99 on the 6-point scale.
Hypothesis H1
To begin our analysis, we tested for differences in scores between items p1 and p2. Item
p1 read: "I know what outdoor education/the outdoor approach is in kindergarten." In
terms of rating level, both monitored groups (Y1 and Y3) are at an almost identical
level, with a difference of only 0.08 points. In terms of distribution, the data in both
groups are similarly inconsistent. In the first year, a larger number of participants
chose the highest level, while similar numbers are scattered across levels 5, 4, and 3,
and the lowest levels are occupied only sporadically. In Y3, the situation is similar,
but the lowest level is not occupied at all. In item p2, we asked the students whether
they consider spending more time outside to be beneficial for a child in kindergarten. Of
all the items, students rated it the most positively, at an average level of 5.62. The
two groups' ratings are almost identical (a difference of 0.05 points). Based on the
descriptive indicators (Table 2), the data in Y1 are consistent: the majority of students
chose grade 6, while no one chose the first three grades. In Y3, the data are more
scattered, but still consistent in the positive region.
A paired t-test was used due to the nature of the research data and population; the
results are presented in Table 3.
The calculated p-value is at the level of statistical significance with a medium effect
size. Based on this, we reject the null hypothesis. The Bayesian analysis demonstrates
that, assuming the validity of Ha, i.e. p(D|Ha), the observed data are 5220 times more
likely than would be expected under H0, i.e. p(D|H0). Thus, with robust empirical
support, we can conclude that item p2 was rated significantly better by students than
item p1.
In the next part of the evaluation, we statistically compare items p2 and p3 in both
study groups to see whether students who know that being outdoors more often is
beneficial for the child in kindergarten will, as future educators, implement outdoor
learning in kindergarten, and how this changes over the course of their studies.
It is evident from the data visualisation in Graphs 1 and 2 that first-year students
rated items p2 and p3 diametrically differently. This fact is confirmed by the descrip-
tive indicators in Table 4. The observed difference is 0.69 points from the 6-point
scale. Given the calculated standard deviations, the data for item p2 are consistent,
while the data for item p3 are scattered. Based on the visualized data on Graphs 1 and
2, for item p2, first-year students are unanimous in their belief that being outdoors
more often is beneficial for the child. For item p3, the data no longer suggest this is
the case, and we find a spectrum of ratings. Students in the first year are not unan-
imous in their approach to the subsequent implementation of outdoor education in
teaching practice.
Hypothesis H2
A paired t-test was calculated to test the statistical significance of the differences
between items p2 and p3 in the first year; the results are presented in Table 5.
The p-value reaches the level of statistical significance, and thus we reject the null
hypothesis with a large effect size (Cohen's d = 0.663). The calculated Bayes factor
clearly favours the alternative hypothesis (BF10 = 4.75 > 3), and the observed level of
evidence in favour of Ha is sufficient to declare its validity. Thus, it can be concluded
that the difference in scores between p2 and p3 in Y1 is confirmed; the observed data
constitute adequate empirical evidence to decide in favour of Ha.
The same statistical analysis was carried out in the third, final year of the Bachelor's
degree. The visualized data in Graphs 3 and 4 show similar characteristics in this case
as well: they are consistent for item p2 and scattered across the last three grades for
item p3, a slight change from the first year. The difference in mean scores between items
p2 and p3 in Y3 is half of that in Y1, at 0.32 points on the 6-point scale. From the
descriptive indicators (Table 6), we can see that, compared to the first year, this item
was rated slightly more positively.
To test the statistical significance of the differences between p2 and p3 in Y3, a paired
t-test was calculated, the results of which are presented in Table 7. The calculated
p-value reaches the level of statistical significance, and we thus reject the null
hypothesis with a small effect size (Cohen's d = 0.384). Furthermore, the calculated
Bayes factor shows that Ha describes the observed data better than H0 (BF10 = 5.14), and
the observed level of evidence in favour of Ha is sufficient to declare its validity.
The difference between p2 and p3 in Y3 is confirmed as the observed data provide
sufficient empirical evidence for the validity of Ha.
Hypothesis H3
In our analyses, we further investigated whether there is a statistically significant
difference between the scores of items p4 and p5, and thus whether there is a
relationship between the level of computer literacy represented by item p4 and the idea
of implementing computer science education activities in the MOE through an outdoor
approach. We investigated this separately in the first and third years and compared the
results.
Table 8 gives the basic descriptive statistics for the items. From the mean values, we
can see that first-year students rated item p4 more positively than item p5, with a
difference of 0.505 points. From the standard deviations, and also from the visualization
of the data in Graphs 5 and 6, we can see that there is also a difference in the variance
of the data: while the data for item p4 look consistent, the data for item p5 are
diversified. Thus, for item p4, more students rated positively, and the lowest rating
levels are not occupied at all.
A paired t-test was used to test the statistical significance of the detected difference
in scores between items p4 and p5 for first-year students; the results can be found in
Table 9. As the table shows, the p-value reaches the significance level, on the basis of
which we reject the null hypothesis and thus confirm the existence of a difference.
The Bayesian analysis shows that, assuming the validity of Ha, i.e. p(D|Ha), the observed
data are 407 times more probable than would be expected under H0, i.e. p(D|H0). Thus,
with robust empirical support, we can conclude that there is a statistically significant
difference between the first-year scores of items p4 and p5. Students in their first year
of study who have some level of information literacy (item p4 = 4.76/6) think they can
master information technology, but they cannot imagine using this technology to carry out
outdoor activities.
The same statistical analysis was carried out in the third year of study. The basic
descriptive statistics are presented in Table 10.
From the mean values, we can conclude that the difference between these items in year 3,
0.463 points, is almost at the same level as for the first-year students (the difference
of differences is only 0.042 points). A slight difference occurred in the density
distribution of the data; overall, the data density distribution shows similar
can be found in Table 13. The calculated p-value reaches the level of significance,
on the basis of which we are able to reject the null hypothesis and thus confirm the
existence of a certain difference.
The Bayesian analysis provides robust empirical support for the Ha model: assuming the
validity of Ha, i.e. p(D|Ha), the observed data are 165,751 times more likely than would
be expected under H0, i.e. p(D|H0). Thus, with robust empirical support, we can conclude
that there is a statistically significant difference between the first-year scores of
items p3 and p5.
We performed the same statistical analysis for the final year, Y3. Table 14 reports the
basic descriptive statistics for items p3 and p5 in the third year. Compared to Y1, we
observe more positive evaluations for both items under study. The detected difference
between the items is 0.899 (the difference in differences is +0.141 for year 3).
In terms of the scatter in the data visualized in Graphs 11 and 12, the data for item p3
in Y3 look consistent compared to Y1. Comparing p3 and p5 in Y3, we observe a significant
difference in the density distribution of the data: item p3 is rated significantly more
consistently in the positive domain by third-year students, with the first three rating
levels not occupied at all, whereas item p5 is rated significantly more diversely, with
all rating levels occupied comparably.
The calculated p-value is at the level of statistical significance with a large effect
size, formally demonstrating the existence of a statistically significant difference
between p3 and p5 in Y3 and thus formally rejecting H0 (Table 15).
The Bayesian analysis in Y3 also provides robust empirical support for the Ha model:
assuming the validity of Ha, i.e. p(D|Ha), the observed data are 5487 times more likely
than would be expected under H0, i.e. p(D|H0) (a Bayes factor greater than 3 being
sufficient as formal evidence for the validity of the hypothesis). Thus, with robust
empirical support, we can conclude that there is a statistically significant difference
between the scores of items p3 and p5 in the third year.
The study aimed to analyse knowledge of and attitudes towards outdoor education and
information literacy and to find out their interrelationship among prospective
pre-service teachers in the first and final years of their studies in the MOE. In
general, we can conclude that students rated the items more positively than negatively
overall.
Outdoor education represents a new theoretical area that several students encountered for
the first time at university. Being outdoors is generally beneficial for people
physically and psychologically, and it has been a stable part of the daily routine in
kindergartens since their inception, which, in our opinion, may explain the significantly
higher score on item p2, which deals with the relationship between the frequency of being
outdoors and the child's well-being.
It was interesting to compare items p2 and p3: on the one hand, students declare that
they know what outdoor education is and that being outdoors more often is beneficial for
the child in kindergarten (with a higher rating in the first year); on the other hand,
they rate significantly lower the item in which they, as future educators, would
implement this type of education, which we observe in both groups studied.
Students are proactive in implementing outdoor activities, but when technology enters the
picture, a problem with outdoor activities arises for them. The observed difference was
confirmed in both study groups and did not change over the course of the study. The
future teachers in the third, i.e. final, year also plan to implement outdoor education
in the MOE; however, they rate the implementation of outdoor educational activities in
the computer field in the MOE significantly lower.
From the results of the questionnaire research and statistical analysis, we can conclude
that students are familiar with the concept of outdoor education and generally consider
more frequent time outdoors beneficial for the child. However, students know this concept
mostly theoretically, and applying the acquired knowledge at the practical level is a
potential problem for them. A similar problem is observed in the field of technology:
students take the initiative in implementing outdoor activities, but when they have to
use technology in a practical way, a problem with outdoor activities arises. Our task,
therefore, is to provide opportunities to link these seemingly incompatible areas.
References
These human evaluations are also used to verify the accuracy of the available automatic
evaluation metrics.
1 Introduction
One of the initial objectives of computing was the automatic translation of text between
languages. Because human language is so flexible, MT is considered one of the most
challenging jobs in Artificial Intelligence. Historically, rule-based systems were
employed for this task, but in the 1990s statistical methods superseded them. Deep neural
network models have recently achieved state-of-the-art performance in the field of Neural
Machine Translation [1, 2]. Researchers have proposed different models in the NMT domain;
some of the popular NMT systems proposed in different phases are Bahdanau's attention
model, the transformer model, etc. [3, 4]. Google helped solve many of NMT's issues,
including the handling of rare words and a lack of robustness, with their NMT system [5].
The transformer model was presented for the first time in the research paper published in
2017 by Google researchers and titled "Attention Is All You Need." Its purpose was to
transform one sequence into another without making use of any recurrent neural networks
(RNNs). Transformer models are state of the art, and their design is based on a
self-attention mechanism (Fig. 1). The simple transformer model is a sequence-to-sequence
model having encoder and decoder blocks, each consisting of a stack of identical blocks.
Each encoder block has two important components: self-attention and a position-wise
feedforward network.
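As a sketch of the mechanism just described (illustrative, not the paper's
implementation), scaled dot-product self-attention can be written in a few lines of
NumPy; the dimensions below are arbitrary.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                           # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)             # (5, 8)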
Another challenge of MT is evaluating the performance of MT systems [6–8]. Although human
evaluation is considered the best, it is a time-consuming process. Many automatic
evaluation metrics have been proposed by researchers, such as BLEU [9], precision,
recall, F-measure, cosine similarity, and METEOR [10]. Human-like automatic metrics have
also been proposed by many researchers [11–13].
In this paper, we have experimented with a transformer-based model on a low-resource
Bengali-to-English language pair (a tourism dataset). Bengali is a morphologically rich
language spoken mostly in India and Bangladesh. We have evaluated the performance of our
model and of two more MT engines, Google and Bing Translate, with the automatic metrics
BLEU and WER and with human evaluation. We also compared the BLEU scores with
gold-standard human evaluation. The rest of the paper is organized as follows: Sect. 2
describes some previous work on NMT and its performance evaluation with automatic
metrics; Sect. 3 highlights our overall methodology; Sect. 4 outlines our experimental
setup; in Sect. 5, we present some analysis and discussion. Finally, we give conclusions
and future directions in Sect. 6.
2 Previous Work
In the paper [12], the authors primarily focused on evaluating the performance of MT
systems. They designed a framework for different types of errors and categorized them.
They performed a comparative analysis of the types of errors present in the raw text
generated by an MT engine and then performed post-editing to remove them. The aim was to
find the recurrent errors produced by the translation system, which may eventually help
future researchers produce robust translation systems. Human evaluation is considered the
best way to evaluate the performance of MT systems, but it is time-consuming and not
reproducible. In bilingual translation systems, if the human evaluator understands both
languages, he or she can correctly judge the quality of the translation generated by MT
engines. Automatic evaluations depend on human evaluation because they require one or
more reference translations against which to correlate the MT outputs. Sometimes,
automatic metrics even fail to generate correct scores when the languages are
morphologically rich. In the paper, the authors address this issue by proposing a
human-like evaluation metric that exploits scalar and multidimensional quality metrics.
Their transformer-based NMT model was trained on a low-resource English-to-Irish language
pair, and they presented a comparative analysis, with human evaluation, of the outputs
generated by RNN-based and transformer-based models on English-to-Irish language pairs
[6].
In another work, researchers developing MT evaluation focused mainly on the semantics of
the language. They proposed an approach in which the adequacy of the language is the
prime concern rather than fluency. This evaluation scheme mainly tries to find the
semantic similarity between the source and target languages and does not depend on a
reference translation [13].
3 Methodology
As stated before, to generate a human score we carried out a questionnaire-based survey.
Five individuals with a satisfactory to excellent level of linguistic expertise in both
languages (English and Bengali) participated in the survey. The questionnaire items
primarily addressed the adequacy and fluency of the translation, each evaluated on a
scale ranging from 0 to 5. Adequacy ensures that the meaning of the source text and the
translated output is the same, that the source and target sentences are both complete,
and that there is no distortion between the source and target sentences, among other
things. In a similar vein, fluency guarantees that both the source and the target
sentences have correct syntax. For adequacy, a score of 5 is given when all meaning
between the source and target languages is preserved, 4 when most of the meaning is
preserved, 3 when much of the meaning is preserved, 2 when little meaning is preserved,
and 1 when no meaning is preserved. When evaluating fluency, a score of 5 indicates
flawless language, a score of 4 indicates good language, a score of 3
WER = (S + D + I) / N,   (1)
where S is the total number of substitutions in the target sentence, D is the total
number of deletions in the target sentence, I is the number of insertions, and N is the
total number of words in the reference.
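A minimal sketch of Eq. (1): the edit counts S, D, and I come from a word-level
Levenshtein alignment, which can be computed with standard dynamic programming
(illustrative code, not the evaluation scripts used in the paper).

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                    # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                    # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)            # (S + D + I) / N

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words = 0.33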
The overall methodology is represented in Fig. 2. We trained our model with the processed
dataset and a default set of hyperparameters; hyperparameter tuning is essential in any
machine learning model [16–18]. We then tested our model on the test set. The model's
performance was evaluated with two automatic metrics, BLEU and Word Error Rate (WER). As
stated before, we also evaluated our model with human evaluators, whose scores were
averaged to compute the final score.
4 Experimental Setup
We used OpenNMT's transformer model and trained it on a Bengali–English dataset in an
NVIDIA Tesla V100 environment. We trained the model for up to 10,000 training steps,
after which it stopped improving. We split our entire dataset as follows: 70% training
set, 15% validation set, and 15% test set.
Fig. 2 Schematic representation of the model building and quality evaluation process
The model took around 3 hours for the entire training process. Proper hyperparameter
tuning is very important in any machine learning model; a few of the selected
hyperparameters of our transformer model are presented in Table 1. With the help of these
selected hyperparameters, we trained our model.
The purposes of the selected hyperparameters are as follows:
Table 2 Randomly picked test sets and their translation scores as generated by human and
automatic evaluation metrics
Test set | Translation engine | Human evaluation (averaged, scale 0–100) | BLEU (scale 0–1) | WER (%)
1 | Our model | 30 | 0.14 | 88
2 | Our model | 25 | 0.12 | 91
3 | Our model | 35 | 0.11 | 95
Table 3 Randomly picked test sets and their translation scores as produced by human and
automatic evaluation metrics (Google translation engine)
Test set | Translation engine | Human evaluation (averaged, scale 0–100) | BLEU (scale 0–1) | WER (%)
1 | Google | 90.5 | 0.11 | 73.3
2 | Google | 90.5 | 0.67 | 81.81
3 | Google | 100 | 1.0 | 0
Optimizer: it updates the model's parameters so that the loss is reduced and accuracy is
increased.
Learning rate: a tunable parameter that controls the step size at each iteration while
moving towards a minimum of the loss function.
Dropout: a regularization technique used in machine learning and deep learning.
Heads: the number of attention heads in the transformer model.
Decay_method: in machine learning, decreasing the learning rate over training is a type
of decay, while increasing it at the start is a warm-up. Noam is a decay method used in
deep learning in which both warm-up and decay exist, as sketched below.
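For illustration, the Noam schedule from the original transformer paper combines both
behaviours: a linear warm-up followed by inverse-square-root decay. The d_model and
warmup values below are that paper's defaults, not necessarily the ones used here.

def noam_lr(step, d_model=512, warmup=4000):
    # Linear warm-up for the first `warmup` steps, then step ** -0.5 decay.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

for step in (100, 4000, 10000):
    print(step, round(noam_lr(step), 6))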
We noticed that after around 10,000 training steps there was no further improvement in
training and validation accuracy. We verified the performance of our model with sentences
randomly picked from the test dataset, and the same sentences were fed to the online
engines Google and Bing. The performance of all translators was evaluated with automatic
and human evaluation metrics. The performance of our model, Google Translate, and Bing
Translate is reported in Tables 2, 3, and 4, respectively; the graphical representations
of these results are shown in Figs. 3, 4, and 5, respectively.
From the above experimentation, we can see that our model's performance is not
satisfactory (Table 2 and Fig. 3). The WER, BLEU, and human evaluation scores are
Table 4 Randomly picked test sets and their translation scores as produced by human and
automatic evaluation metrics (Bing translation engine)
Test set | Translation engine | Human evaluation (averaged, scale 0–100) | BLEU (scale 0–1) | WER (%)
1 | Bing | 90 | 0.12 | 66.6
2 | Bing | 90 | 0.14 | 68.18
3 | Bing | 60 | 0.75 | 50
Fig. 3 Translation quality scores as generated by human evaluation and automatic evaluation
metrics (BLEU and WER) for the transformer model
Fig. 4 Translation quality scores as generated by human evaluation and automatic evaluation
metrics (BLEU and WER) for Google Translate
Fig. 5 Translation quality scores as generated by human evaluation and automatic evaluation
metrics (BLEU and WER) for Bing Translate
presented in Table 2 and Fig. 3. There are several possible reasons for this. We used a
tourism dataset that is too small for deep learning models, and even the dictionary
vocabulary was very small for our model. In the case of morphologically rich languages,
the handling of rare words is a serious problem if the vocabulary is not large enough.
Another reason for the model's underperformance is that it was trained with the default
hyperparameter settings. Next, in the evaluation part, we evaluated the performance of
Google Translate and Bing Translate on three sentences picked randomly from the test set.
As we can observe from Table 3 and Fig. 4, in some instances the BLEU score is lower than
the human score, and the WER is significantly higher, whereas for the same test sentence
the human score is quite high. We notice the same for Bing Translate; its results are
presented in Table 4 and Fig. 5. In some cases, the scores generated by the automatic
metrics and the human scores differ significantly. The reason is that BLEU is based on
n-gram matching between the translated text and one or more reference texts; it does not
capture the overall semantics of the translated text. Similarly, for WER, if there are
many substitutions of words (even if the substituted words have the same meaning) and
many deletions, the WER percentage will be higher. For these reasons, although automatic
metrics are faster and reusable, their scores are sometimes not satisfactory.
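A small illustration of this limitation, using NLTK's sentence-level BLEU: a harmless
synonym substitution already lowers the score, even though the meaning is preserved (the
sentences are invented for the example).

from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "the hotel is close to the station".split()
hypothesis = "the hotel is near the station".split()   # synonym, same meaning

smooth = SmoothingFunction().method1   # avoids zero scores on short sentences
print(sentence_bleu([reference], hypothesis, smoothing_function=smooth))
# The score drops well below 1.0 despite semantic equivalence.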
From the above experimentation, we can conclude that proper tuning of hyperparameters is
important in designing any machine learning model. The use case we considered is an NMT
system. We have seen that NMT exploits deep learning and hence requires a huge corpus to
develop a better model. Furthermore, computing the scores of MT systems with automatic
evaluation metrics such as BLEU and WER is useful for evaluating the overall translation
quality of MT models. However, MT evaluation is a challenging task, since the same word
can convey different meanings in different contexts, and most automatic metrics are
precision- and recall-based and fail to capture the semantics of the words. Hence, MT
evaluation is still an interesting research area, and, as discussed before, intervention
of human evaluation is sometimes required because human evaluation is the best [19],
though it is time-consuming.
In future work, researchers will focus on strategies to capture the semantics of the
hypothesis and reference texts, along with n-gram-based matching, to enhance the accuracy
of automatic metric scores.
References
1. Softmax G, Softmax G, Comparative study of neural machine translation models for Turkish
Language
2. Stahlberg F (2020) Neural machine translation: a review. J Artif Intell Res 69:343–418. https://
doi.org/10.1613/JAIR.1.12007
3. Bahdanau D, Cho KH, Bengio Y (2015) Neural machine translation by jointly learning to align
and translate. In: 3rd International conference on learn represent ICLR 2015—conference track
proceedings. Published online 2015, pp 1–15
4. Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. In: Advances in neural
information processing systems, pp 5999–6009
5. Wu Y, Schuster M, Chen Z et al (2016) Google’s neural machine translation system: bridging
the gap between human and machine translation. Published online 2016, pp 1–23. http://arxiv.
org/abs/1609.08144
6. Fomicheva M, Specia L (2019) Taking MT evaluation metrics to extremes: beyond correla-
tion with human judgments. Comput Linguist 45(3):515–558. https://doi.org/10.1162/coli_a_
00356
7. Lin T, Wang Y, Liu X, Qiu X (2021) A survey of transformers. 1(1). http://arxiv.org/abs/2106.
04554
8. Datta G, Joshi N, Gupta K (2021) Empirical analysis of performance of MT systems and its
metrics for English to Bengali: a black box-based approach. In: Paprzycki M, Thampi SM,
Mitra S, Trajkovic L, El-Alfy E-SM (eds) Intelligent systems, technologies and applications.
Springer Singapore, pp 357–371
9. Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation
of machine translation. In: Proceedings of the 40th annual meeting of the association for
computational linguistics. Association for Computational Linguistics, pp 311–318. https://doi.
org/10.3115/1073083.1073135
10. Banerjee S, Lavie A (2005) METEOR: an automatic metric for mt evaluation with improved
correlation with human judgments. In: Proceedings of the Acl workshop on intrinsic and
extrinsic evaluation measures for machine translation and/or summarization, ACL 2005, pp
65–72
1 Introduction
Malware, sometimes known as "harmful software," is a catch-all term for any malicious
program or code that causes computer damage. Malware is malicious software created to
infiltrate, damage, or disable a computer system by obtaining partial control over the
operations of information and communication technology (ICT)-enabled devices. Malware
that has the power to spread is the most hazardous, since there is no central control,
making it difficult to defend against [1]. The types of malware usually found in any kind
of malware attack are worms, spyware, viruses, Trojans, ransomware, and adware [2].
Malware is designed to attack internet-based applications, as practically every aspect of
life now depends on them. Malware can be detected using different methods; the most
popular is using a classifier to determine whether a file is malware or not. In detecting
malware, several issues and concerns arise. There are various types of malware that
differ from each other in propagation mode, functionality, and performance. Malware
writers encrypt their malware so that earlier detection mechanisms fail to identify and
detect it. Key limitations of signature-based detection methods are that signatures vary
across antivirus vendors and that signature-based malware detection only works with known
malware. Malware authors employ a variety of ways to enable their malware to readily
evade and fool signature-based detection methods, including mutations such as encryption,
oligomorphism, polymorphism, and metamorphism, usually performed by mutation engines [3].
Mutation engines are computer programs that can change one program into another with
different code by encrypting the target software using various keys and a decryption
module that may be customized widely [4]. These mutations are described as follows:
(a) Encryption: malware developers do not want their malicious code to be noticed and
detected by any detection mechanism, and for that purpose encryption is a very simple
way. This type of mutation consists of encrypting the malware while escaping from
detectors and decrypting it when infecting the target [3] (see the sketch after this
list).
(b) Oligomorphism: another important technique for achieving camouflage behaviour in
malware; oligomorphic malware first emerged during the 1990s under the name "Whale" (a
DOS virus). Unlike the encryption technique, where the decryptor remains the same for
each malware sample, oligomorphism supports a unique decryptor for each infection. This
technique is considered an advancement over encryption.
(c) Polymorphism: this technique is a blend of encryption and oligomorphism, but such
malware is more infectious than the others. The first polymorphic virus was designed by
Mark Washburn in 1990 and was labeled virus 1260 [3]. It is very difficult for antivirus
software to detect this type of malware, as it changes its behaviour and appearance with
each new copy generated.
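A harmless illustration of why byte-signature matching fails against such mutations: the
same placeholder payload XOR-encoded with two different keys yields completely different
byte patterns and hashes (toy code, not an actual mutation engine).

import hashlib

payload = b"placeholder bytes standing in for a program body"

def xor_encode(data, key):
    return bytes(b ^ key for b in data)

for key in (0x21, 0x5A):
    encoded = xor_encode(payload, key)
    # Every key produces a different byte "signature" for identical behaviour.
    print(hex(key), hashlib.sha256(encoded).hexdigest()[:16])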
Malware detection methods are classified into several groups from various perspectives,
such as the signatures they produce, their behaviour when executed under controlled
scenarios, and heuristic analysis; based on these perspectives, the different malware
detection techniques are depicted in Fig. 1 [5].
1. Signature-Based Malware Detection:
(Fig. 1: Malware detection techniques: signature-based, heuristic-based, and
behaviour-based)
3. Heuristic-Based Detection:
Heuristic evaluation is a virus-spotting approach that involves looking for suspicious
features in code; the heuristic model was designed to discover suspicious qualities in
unknown, new viruses, modified versions of current threats, and known malware samples
[5].
Fig. 2 Taxonomy of different machine learning techniques for malware detection [6]
2 Literature Survey
Based on various machine learning approaches, different authors have proposed different
methods for the detection of malware, along with their objectives and accuracies. A brief
description of some of this research follows.
Souri et al. [5] give an analytical and elaborated survey of malware detection approaches
employing machine learning. The authors reviewed more than 50 papers, and according to
their detailed survey, detection techniques fall into two classes: signature-based and
behaviour-based detection. The research specifies an appropriate category for malware
detection techniques rather than scanning and statistical analysis.
Tahir [3] discusses the different types of malware in detail, in addition to different
malware detection techniques. According to the author, malware is the biggest threat to
the digital flow of information, as it can manipulate or perform obfuscation activity
inside a computer by corrupting important files and disabling the network system with
malicious attacks. The author classifies malware as Trojans, worms, rootkits, adware,
spyware, sniffers, robot networks (botnets), keyloggers, spamware, ransomware, etc. The
author also describes how malware creators have advanced their development approach,
building new tools to keep malware unidentified inside a computer or network, a technique
the author labels camouflage behaviour.
Stahlbock et al. [7] note that malware is an ever-changing hazard in the internet era.
The authors presented a deep learning architecture for intelligent malware detection. To
show how a deep learning architecture can be utilized, the article employs Windows API
calls extracted from portable executable (PE) files together with stacked autoencoders
(SAEs), a deep learning approach with two phases: unsupervised pretraining and supervised
backpropagation.
Liu et al. [8] presented a machine learning-based malware analysis system with three
modules: data processing, decision-making, and new malware detection. The data processing
module handles grayscale images, opcode features, and import functions, all used to
extract malware features. The detection module then searches for new malware families
using the shared nearest neighbor (SNN)-based clustering method.
Damodaran et al. [9] compare malware detection methodologies utilizing static, dynamic,
and hybrid analysis. Hidden Markov models (HMMs) are trained on both static and dynamic
feature sets to compare detection rates among virus families. They also include hybrid
cases, such as when dynamic analysis is utilized during training but static approaches
are used for detection, and vice versa. The authors thus examine static and dynamic
analyses as well as hybrid methodologies.
Shijo et al. [10] propose a static and dynamic analysis-based approach for assessing and
identifying an unknown executable file. Machine learning is employed, with training data
consisting of known viruses and benign applications; the feature vector is determined
after inspecting both the binary code and the dynamic behaviour. The proposed technique
employs both static and dynamic analyses, resulting in improved efficiency and
classification accuracy. The test results show that the static approach is 95.8%
accurate, the dynamic technique 97.1%, and the combined method 98.7%. In the experimental
setup and assessment, static analysis was performed on 997 virus files and 490 clean
files, with each file assessed using the strings program.
Saeed et al. [11] examine in depth the current status of malware infection and the effort
being made to build anti-malware or malware detection systems, providing malware
detection system developers with an up-to-date comparison reference. The major purpose of
this review article is to investigate the present state of malware and detection
technologies; in addition, the study examines the approaches and technologies utilized to
develop anti-malware. The authors provide a thorough comparison of major malware families
along with a summary of malware detection systems.
Garcia et al. [12] employed a method of converting a malware binary into an image and
then utilizing Random Forest to categorize malware families. The method's usefulness in
identifying malware is demonstrated by an accuracy of 0.9562. The authors evaluated their
ideas on the Malimg dataset, which comprises 9342 malware samples from 25 distinct
malware families.
Roseline et al. [13] present a complete machine learning-based anti-malware solution that
employs a visualization method depicting malware as 2D images. The proposed method uses a
layered ensemble approach that resembles the key features of deep learning while
outperforming it. The proposed system requires no hyperparameter tuning or
backpropagation and has reduced model complexity.
Chen et al. [14] examine the peculiarities of API execution and offer a double retrieval
approach based on semantics and structural data. Based on API data features and an
attention method, the researchers also designed and built a sliding local attention
detection system. The authors divided their work into three phases: in the first phase,
they analyzed the characteristics of the API execution sequence; in the second, they
proposed a novel feature extraction model based entirely on the API execution sequence;
and in the last phase, based on the API data characteristics and the attention mechanism,
they framed a detection framework called the sliding local attention detection system
(SLAMS), based on a local attention mechanism and a sliding window method. The results
show an accuracy of 97.23%.
Anderson et al. [15] describe the many attacks against machine learning models that have
been demonstrated in the field of information security. The researchers used a
gradient-boosted decision tree model, trained on 100,000 malicious and benign samples,
that achieved a 0.96 area under the receiver operating characteristic curve (ROC AUC).
Aafer et al. [16] performed a rigorous study to extract significant characteristics from
malware behaviour gathered at the API level and then used the feature set to test
multiple classifiers. Their results show that, using the KNN classifier, an accuracy of
99% and a false positive rate of only 2.2% can be attained. In this research, they
attempt to address the limitations of permission-based warning methods by developing a
robust and lightweight classifier for Android applications that can be used to detect
malware, following a generic data mining strategy.
Galen et al. [17] investigate how machine learning-based models perform in detecting
malware in the portable executable (PE) format. Given the extensive use of PE-format
executables (which include .exe files) on Windows-based machines, results for these files
may be of substantial practical use. Many machine learning-based malware detection
studies train and test malware detection models on large datasets of malware and goodware
samples.
Raman [17] proposed a list of seven key characteristics for differentiating between
malware and clean software. To discover these traits, the author used the assumption that
attributes from different regions of a PE file would be correlated less with one another
and more with the file's class, dirty or clean. These characteristics can be utilized as
raw data or as input to malware classification algorithms, and the classification data
can be used by antivirus software to increase detection rates.
Bekerman et al. [18] offer a supervised technique for detecting malware by analyzing
network data in great detail. At the network layer, the proposed technique extracts 972
behavioural characteristics from various protocols, and the researchers then utilize a
feature selection strategy to highlight the most important or relevant features. The
authors based their findings on an experimental investigation of real network traffic
from various circumstances (Table 1).
3 Proposed Framework
In this section, the methodology and the framework being implemented are discussed, and
the technique used, along with the dataset, is described in detail.
Table 1 (continued)
S. no. | References | Author/year | Objectives | Method | Dataset | Accuracy
07 | [12] | Garcia et al./2016 | Converting malware binaries into grayscale images | Random Forest method | Malimg dataset | 95%
08 | [13] | Bakerman et al./2015 | Network traffic packet analysis | Naïve Bayes, J48, and Random Forest | Network traffic capture collected by Verint and Emerging Threats | 90%
09 | [14] | Aafer et al./2013 | Malware activity recorded at the API level | KNN classifier | McAfee and Android Malware Genome Project | 99%
10 | [15] | Raman/2012 | Classifying malware using machine learning approaches | IBK, J48, PART, J48 Graft, Ridor, and Random Forest | Clean (benign) files from Windows XP and 7 and dirty (malware) files from VX Heaven's archive | J48 Graft and Random Forest, with accuracies of 98.55% and 98.21%, respectively
3.1 Technique
The focus of this research is to propose a method that can detect the latest malware
with maximum accuracy by using behavior analysis. The technique this research
uses is the Random Forest classifier for malware detection. Random Forest employs
a supervised learning approach and can be utilized for classification as well as
regression problems. It is based on ensemble learning, a strategy for solving
complicated problems by merging numerous classifiers to enhance the model's
performance. Random Forest combines a number of decision trees built on various
subsets of a dataset and averages their results to improve the model's predictive
accuracy. The important advantages of using a Random Forest classifier are that it
predicts output behavior with high accuracy, runs efficiently, and maintains its
accuracy even when a large chunk of data is missing. Next, we need to develop a
model that can classify the output into two classes, namely "Malware" and "Benign."
3.2 Dataset
Datasets are crucial in determining the requirements for malware detection based
on performance. The "Malware Detection" dataset used in this research was
collected from the "Kaggle" repository. The entire dataset consists of about 216,352
instances, of which 75,503 are malware files and 140,849 are benign (normal) files.
In our experiment, we extracted 36 characteristics or parameters that may be used
to determine whether a file is authentic or malicious, and the full dataset was split
into training and testing sets in a 75:25 ratio. During the experiment, the greatest
accuracy attained was 98.5%.
3.3 Methodology
The methodology is shown in Fig. 3. The following steps are performed to achieve
the results:
1. Importing dataset: The dataset mentioned above is imported into the model.
2. Data preprocessing: After collecting the mixed data containing both malware and
benign files from the dataset, the entire dataset is preprocessed. The preprocessing
includes removal of null values and, most importantly, preparing the dataset in such
a way that the proposed model does not overfit. The outputs from this process play
a vital role in analyzing the data.
3. Splitting dataset: The dataset is split into training and testing sets in the ratio of
75% and 25%, respectively.
4. Fitting Random Forest algorithm: We have used the Random Forest classifier
with default hyperparameter settings, except that Gini is used as the splitting
criterion in place of entropy. After the splitting phase, the Random Forest classifier
is applied on the training set.

Fig. 3 Proposed methodology: importing the dataset, data preprocessing, splitting the data into training and testing sets, fitting the Random Forest algorithm on the training set, parameter selection, and predicting the test results as Legitimate: Benign (1) or Malware (0)

Table 2 Parameter description for the model
Parameter label | Value range
1. Id | 1–216,352
2. Size of optional header | 224 or 240
3. Major linker version | 8–48
4. Section alignment | 4096 or 8192
5. File alignment | 512 or 4096
6. Major operating system version | 4–10
7. Minor operating system version | 0–2
8. Size of header | 512–4096
9. Section min entropy | 512–35,840
10. Minor image version | 0–20,512
11. Subsystem version | 2–9
12. Minor subsystem version | 0–20
13. Size of header | 512–4096
14. Size of stack commit | 0–8196
15. Loader flags | 0 or 1
16. Legitimate | 0 or 1
5. Feature selection: The important parameters or characteristics that are used by
the model to label any sample as malware or benign are depicted in Table 2 along
with the range of values. The parameters ranging from 1 to 15 are the important
input features for the model and the parameter with label “Legitimate” is the
output parameter; the model will learn the trends for the classification of the
samples based on the enlisted parameters.
6. Predicting the results: With the final output of the algorithm, we can determine
whether the given file under consideration is benign, i.e., 1, or malware, i.e., 0.
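As a concrete illustration of steps 1–6, the following minimal sketch uses scikit-learn; the file name malware_dataset.csv, the column name legitimate, and the exact preprocessing are assumptions for illustration, not the authors' published code.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Steps 1-2: import the dataset (hypothetical file/column names) and drop null values
data = pd.read_csv("malware_dataset.csv").dropna()
X = data.drop(columns=["legitimate"])   # input parameters 1-15 of Table 2 (step 5)
y = data["legitimate"]                  # output label: 1 = benign, 0 = malware

# Step 3: split into training and testing sets in a 75:25 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# Step 4: fit a Random Forest with default hyperparameters and the Gini criterion
clf = RandomForestClassifier(criterion="gini").fit(X_train, y_train)

# Step 6: predict the test results; the paper reports a best accuracy of 98.5%
print(accuracy_score(y_test, clf.predict(X_test)))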
5 Conclusions
References
16. Aafer Y, Du W, Yin H (2013) DroidAPIMiner: mining API-level features for robust malware
detection in android. In: Lecture notes of the institute for computer sciences, social
informatics and telecommunications engineering, LNICST, vol 127, pp 86–103.
https://doi.org/10.1007/978-3-319-04283-1_6
17. Galen C, Steele R, Performance maintenance over time of random forest-based malware
detection models
18. Raman K, Selecting features to classify malware
19. Al-Sammarraie NA, Al-Mayali YMH, Baker El-Ebiary YA (2018) Classification and diagnosis
using back propagation Artificial Neural Networks (ANN) algorithm. In: 2018 International
conference on smart computing and electronic enterprise ICSCEE 2018, pp 1–5. https://doi.
org/10.1109/ICSCEE.2018.8538383
20. Bekerman D, Shapira B, Rokach L, Bar A (2015) Unknown malware detection using network
traffic classification. In: 2015 IEEE conference on communications and network security, CNS
2015, pp 134–142. https://doi.org/10.1109/CNS.2015.7346821
Machine Learning-Based Intrusion
Detection of Imbalanced Traffic
on the Network: A Review
Abstract Cyber threats are a very widespread problem in today’s world, and because
there are an increasing number of obstacles to effectively detecting intrusions, secu-
rity services, such as data confidentiality, integrity, and availability, are harmed. Day
by day, attackers discover new sorts of threats. First and foremost, the type of attack
should be carefully assessed with the aid of Intrusion Identification Methods (IIMs)
for the prevention of these types of attacks and to provide the exact solution. IIMs
which are crucial in network security, operate in three main stages: first, they gather
data; then, they select features; and finally, they choose a detection engine. As the amount of data
produced grows every day, so does the number of data-related threats. As a result of
the growing number of data-related attacks, present security applications are insuf-
ficient. In this research, the Modified Nearest Neighbor (MNN) and the Technique
for Sampling Difficult Sets (TSDS) are suggested as two machine learning
techniques to detect assaults. It is intended to employ an IIM technique based on a
machine learning (ML) algorithm, drawing on a comparison of the literature and
expertise in intrusion detection and machine learning algorithms.
1 Introduction
The use of the internet has been steadily expanding recently. It offers a lot of possi-
bilities in applications, considering education, business, healthcare, and a variety of
other industries. Everyone has access to the internet. This is where the primary issue
arises. The information we obtain from the internet must be protected. This Intrusion
Identification Method (IIM) ensures data security over the network and system.
Traditionally, firewalls and authentication procedures have been implemented for
the sake of security [1]; these were considered the first level of protection for data,
and IIMs have been studied as the second level.
IIM is used to detect illegal or aberrant conduct; an attack on a network manifests
as unusual activity. Attackers take advantage of network flaws, such as poor security
procedures and practices, as well as program defects such as buffer overflows, to
cause network breaches [2]. The attackers may be on the lookout for less accessible
component services to gain more access control, or they may be black-hat attackers
scanning regular internet users for critical information. Methods for identifying
intrusion can be centered on detecting misuse or on detecting anomalies.
Misuse-based IIM examines traffic on the network and compares it to a database of
predefined malicious activity signatures; anomaly-based IIM instead identifies
attacks as deviations from normal behavior.
Faced with this unbalanced traffic on the internet, we suggest the Technique for
Sampling Difficult Sets (TSDS) algorithm, which compresses the majority-class
samples and, in difficult cases, augments the minority samples; this decreases the
training set's imbalance and allows the Intrusion Identification Method to improve
classification performance [5]. RF, SVM, k-NN, and AlexNet are employed as
classifiers for the classification models.
The intrusion identification model presented in Fig. 2 was proposed. In our
intrusion identification structure, data preprocessing, such as handling duplicates,
incomplete data, and missing data, is done first [6]. The test and training sets are
then partitioned, with the training set being rebalanced by our suggested TSDS
algorithm. We utilize StandardScaler to normalize and digitize the sample labels
and analyze the data before modeling to speed up convergence [7]. The processed
training set is then used to build the training model, which is subsequently
evaluated using the test set.
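A minimal sketch of this preprocessing stage is given below, assuming a pandas DataFrame df with a label column; the exact cleaning steps are assumptions based on the description above.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def preprocess(df: pd.DataFrame):
    # Handle duplicates, incomplete records, and missing data first
    df = df.drop_duplicates().dropna()
    X, y = df.drop(columns=["label"]), df["label"]
    # Partition the test and training sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    # Normalize with StandardScaler to speed up convergence before modeling
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test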
Several traffic data types have comparable patterns in imbalanced network traffic,
and minority attacks in particular might be hidden within a large volume of typical
traffic, making it tough for the classifier to learn the distinctions between them
during the training phase [8]. The comparable samples of the majority class in the
unbalanced training set constitute redundant noise. Because the majority class
substantially outnumbers the minority class, the predictor is not able to learn the
minority class's spread, while the majority class remains compact. Discrete traits
in the minority class remain constant, but continuous attributes vary [9]. As a
result, the continuous qualities of the minority class are magnified to provide data
that adheres to the genuine distribution, and we propose the TSDS algorithm as a
means of redressing the imbalance.
First, using the Modified Nearest Neighbor (MNN) technique, near-neighbor and
far-neighbor sets are created from the unbalanced dataset [10]. Because the samples
in the near-neighbor set are so similar, the classifier has a hard time recognizing the
distinctions between the groups; in the identification process, we refer to these as
"difficult instances" and extract them into a tough set. The minority samples in the
tough set are then augmented, and the augmented minority samples are merged
with the easy set to make a new training set. In the MNN method, the number of
neighbors K acts as the scaling factor for the complete algorithm [11]: as K
increases, the number of difficult samples grows, as does the compression.
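The sketch below is one possible reading of the MNN/TSDS procedure described above, not the authors' implementation: samples whose K nearest neighbors contain both classes form the tough set; majority samples there are dropped (compression), while minority samples there are duplicated with small noise (augmentation).

import numpy as np
from sklearn.neighbors import NearestNeighbors

def tsds_resample(X, y, majority_label, K=5, noise=0.01):
    # Find the K nearest neighbors of every sample (index 0 is the sample itself)
    _, idx = NearestNeighbors(n_neighbors=K + 1).fit(X).kneighbors(X)
    # Tough ("difficult") instances: neighborhoods that mix both classes
    tough = np.array([len(set(y[i[1:]])) > 1 for i in idx])
    # Compression: drop majority samples that fall in the tough set
    keep = ~(tough & (y == majority_label))
    X_new, y_new = X[keep], y[keep]
    # Augmentation: duplicate tough minority samples with small continuous noise
    min_mask = tough & (y != majority_label)
    if min_mask.any():
        aug = X[min_mask] + np.random.normal(0.0, noise, X[min_mask].shape)
        X_new = np.vstack([X_new, aug])
        y_new = np.concatenate([y_new, y[min_mask]])
    return X_new, y_new

As in the text, increasing K enlarges the tough set, so both the compression of the majority class and the augmentation of the minority class grow with it.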
See Table 2.
4 Discussions
The research trends in benchmark datasets for evaluating NIDS models are also
graphically illustrated. The KDD Cup '99 dataset is shown to be the most popular,
followed by the NSL-KDD dataset. However, the KDD '99 dataset is quite old and
does not resemble present traffic data flows. Other datasets are accessible as well,
but the research trend in these datasets is quite low due to the newer datasets' lack
of visibility in research. It is suggested that researchers be encouraged
Table 1 (continued)
S. No. | Author | Attack | Dataset | Accuracy (%)
17 | C. Liu, IEEE Access [18] | Distributed Denial of Service (DDoS) | NSL-KDD, CIC-IDS2017 | 99.87
18 | Y. Li, IEEE Access [19] | Denial of Service (DoS) | NSL-KDD | 94.25
19 | Y. Tang, IEEE Access [20] | Denial of Service (DoS) | UNSW-NB15 | 88.53
20 | G. Siewruk, IEEE Access [21] | Denial of Service (DoS) | NSL-KDD | 98
21 | W. Xu, IEEE Access [22] | Denial of Service (DoS) | NSL-KDD | 90.61
22 | A. G. Roselin, IEEE Access [23] | Distributed Denial of Service (DDoS) | NSL-KDD | 81.82
23 | A. R. Gad, IEEE Access [24] | Distributed Denial of Service (DDoS) | NSL-KDD, KDD-CUP99 | 80.65
24 | Z. Li, IEEE Journal [5] | Denial of Service (DoS) | NSL-KDD, CIC-IDS2017 | 93.12
25 | L. Le Jeune, IEEE Access [7] | Distributed Denial of Service (DDoS) | NSL-KDD | 94.7
26 | Y. D. Lin, IEEE Access [25] | Denial of Service (DoS) | CSE-CIC-IDS2018 | 97
27 | M. D. Rokade, (ESCI) [26] | Denial of Service (DoS) | NSL-KDD-CUP-1999 | 88.50
28 | P. F. Marteau, IEEE Transactions [27] | Denial of Service (DoS) | CIDDS | 80
29 | W. Wan, Z. Peng, (ICCEA) [28] | Denial of Service (DoS) | NSL-KDD | 80.49
30 | M. Lopez-Martin, IEEE Access [29] | Distributed Denial of Service (DDoS) | UNSW-NB15 | 91
to use modern datasets with more detailed attributes that are more relevant to today’s
environment.
5 Conclusion
In this review, we studied dataset assaults through machine learning techniques and
reviewed the ML models applied to the different assaults available in the datasets. As a result of
[Figure: accuracy (%) comparison of the reviewed NIDS models]
Table 2 (continued)
S. No. | Authors | Key findings | Techniques used | Dataset | Limitations
5 | M. Wang, IEEE Access [8] | An Improved Conditional Variational Autoencoder (ICVAE) with enhanced detection rates | Framework uses SHapley Additive exPlanations (SHAP) | NSL-KDD | Framework not in real time
6 | A. Kavousi, IEEE Transactions [10] | Anomaly detection model based on LUBE and SOS | Prediction intervals (PIs) are used to develop an intelligent anomaly attack detection approach | LUBE-SOS | Malicious attacks with different severities, data can easily
7 | Z. Chkirbene, IEEE Systems [13] | Unsupervised and supervised learning approaches are used to create triangle area-based nearest neighbors (TANN) | The Euclidean distance map (EDM) is a novel method for detecting anomalies using sequential algorithms | UNSW-NB, NSL-KDD | Compared to modern system procedures, the EDM technique has a lower warning rate
8 | M. A. Siddiqi, IEEE Access [6] | The detection rate of intrusion detection is high when guided ML methods are used | IDS approaches based on a random forest were utilized | CIC-IDS2017, ISCX-IDS2012 | The reinforcing procedure provided less efficiency
9 | G. De Carvalho Bertoli, IEEE Access [14] | AB-TRAP is used to identify attackers in both local (LAN) and global (internet) aspects | AB-TRAP organizes the process of designing and implementing NIDS systems | AB-TRAP | Applying machine learning algorithms to give fresh techniques is a key point in favor of not recycling old datasets
10 | Y. Uhm, IEEE Access [9] | To reduce the minority class problem, a service-aware partitioning method was developed | Random forest (RF) and decision tree (DT), as well as deep neural networks (DNNs), are used to build NIDS | CIC-IDS2017, Kyoto2016 | Improve the real-time intrusion prevention algorithm that has been presented
11 | D. Han, IEEE [4] | Network Intrusion Identification Methods based on anomaly also use machine learning (ML) techniques | Particle Swarm Optimization (PSO)-based algorithm for traffic mutation | Kitsune | The scalability of ML-focused NIDS is being improved
12 | L. Jeune, IEEE Access [7] | Intrusion Detection Expert System (IDES) and HIDS | The botnet was utilized in a large-scale (DDoS) effort on the (DNS) | DARPA1998, NSL-KDD | Real-world scenario is not synthesized in the datasets
13 | S. Wang, IEEE Access [15] | To protect networks against malicious access | Used firewalls, deep packet inspection systems, and intrusion detection systems | NSL-KDD, UNSW-NB15 | The performance validated by UNSW-NB15 cannot be clearly categorized
14 | M. Injadat, IEEE Transactions [16] | SMOTE is done to increase the training model's performance and decrease network traffic data class imbalance | In order to apply Z-score normalization and SMOTE, data preprocessing is required | CIC-IDS2017, UNSW-NB2015 | When compared to the CBFS approach, the IGBFS method had a higher detection accuracy
15 | W. Seo, IEEE Access [17] | In signature-based detection and anomaly detection, cyberattacks have made significant progress | Convolutional neural network (CNN) algorithm is used | UNSW-NB15 | To develop real-time IPSs and identify current network system vulnerabilities
16 | D. Gumusbas, IEEE Journal [11] | Artificial Neural Networks (ANNs) and Deep Belief Networks | Packet CAPture (PCAP) and the NetFlow protocol | AWID2018, CIC-IDS2017 | To do classification, another ML model is required
17 | C. Liu, IEEE Access [18] | Adaptive Synthetic Sampling (ADASYN) | Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) | NSL-KDD, CIC-IDS2017 | It takes a long time and has a low efficiency
18 | Y. Li, IEEE Access [19] | Domain Generation Algorithm (DGA) | Hidden Markov model (HMM) | NSL-KDD | DNN model classification should be improved
19 | Y. Tang, IEEE Access [20] | Randomly initializing weights and deviations increases the speed of an extreme learning machine (ELM) | Improved particle swarm optimized online regularized extreme learning machine (IPSO-IRELM) | NSL-KDD, UNSW-NB15 | To increase IRELM's capacity to classify data
20 | G. Siewruk, IEEE Access [21] | Context-aware software vulnerability classification system | Continuous Integration and Continuous Deployment (CICD) | NSL-KDD | Improve the vulnerability performance
21 | W. Xu, IEEE Access [22] | The network is recreated using Mean Absolute Error (MAE) | Autoencoder (AE)-based deep learning approaches | NSL-KDD | Improve the performance of the dataset
22 | A. G. Roselin, IEEE Access [23] | To identify malicious network traffic, BIRCH clustering technique is used | Optimized Deep Clustering (ODC) | NSL-KDD | ODC technique has a lower detection rate of anomalies
23 | A. R. Gad, IEEE Access [24] | Synthetic minority oversampling technique (SMOTE) | The Chi-square (Chi2) approach was used to pick features; ODC technique has a lower detection rate of anomalies | NSL-KDD, KDD-CUP99 | Less complexity
24 | Z. Li, IEEE Journal [5] | Gated Recurrent Unit and Long Short-Term Memory algorithms | Broad Learning System | NSL-KDD, CIC-IDS2017 | Less accuracy (BLS)
25 | L. Le Jeune, IEEE Access [7] | PCCN-based approaches are used | Intrusion Detection Expert System | NSL-KDD | IDES performance should be improved
26 | Y. D. Lin, IEEE Access [25] | Variational autoencoder and multilayer perceptron model are used | Range-based sequential search algorithm | CSE-CIC-IDS2018 | Improve the categorization of segmentation
27 | M. D. Rokade, (ESCI) [26] | SVM-IDS approach based on deep learning | Artificial Neural Network algorithm | KKDDCUP99, NSL-KDD | Classification and detection of high-class objects should be improved
28 | P. F. Marteau, IEEE Transactions [27] | One-class SVM classifier (1C-SVM) is used | Semi-supervised DiFF-RF algorithm | CIDDS | Inaccurate datasets
29 | W. Wan, Z. Peng, (ICCEA) [28] | All single DNN classifiers are integrated using the AdaBoost technique | Generative Adversarial Networks (GAN) | KDD99, NSL-KDD | Increase the sample deduction accuracy rate
30 | M. Lopez-Martin, IEEE Access [29] | Radial Basis Function (RBF) is implemented | Radial Basis Function Neural Networks (RBFNNs) | NSL-KDD, UNSW-NB15 | Improve the suggested dataset's performance metrics
the growing number of data-related assaults, present security applications are
insufficient. In this research, the Modified Nearest Neighbor (MNN) and the
Technique for Sampling Difficult Sets (TSDS) were suggested as two machine
learning techniques to detect assaults. More recent and updated datasets must be
utilized in future research to assess the deployed algorithms against more current
harmful intrusions and threats.
References
1. Liu L, Wang P, Lin J, Liu L (2021) Intrusion detection of imbalanced network traffic based on
machine learning and deep learning. IEEE Access 9:7550–7563. https://doi.org/10.1109/ACC
ESS.2020.3048198
2. Kim T, Pak W (2022) Robust network intrusion detection system based on machine-learning
with early classification. IEEE Access 10:10754–10767. https://doi.org/10.1109/ACCESS.
2022.3145002
3. Alikhanov J, Jang R, Abuhamad M, Mohaisen D, Nyang D, Noh Y (2022) Investigating the
effect of traffic sampling on machine learning-based network intrusion detection approaches.
IEEE Access 10:5801–5823. https://doi.org/10.1109/ACCESS.2021.3137318
4. Han D et al (2021) Evaluating and improving adversarial robustness of machine learning-based
network intrusion detectors. IEEE J Sel Areas Commun 39(8):2632–2647. https://doi.org/10.
1109/JSAC.2021.3087242
5. Li Z, Rios ALG, Trajkovic L (2021) Machine learning for detecting anomalies and intrusions
in communication networks. IEEE J Sel Areas Commun 39(7):2254–2264. https://doi.org/10.
1109/JSAC.2021.3078497
6. Siddiqi MA, Pak W (2021) An agile approach to identify single and hybrid normalization
for enhancing machine learning-based network intrusion detection. IEEE Access 9:137494–
137513. https://doi.org/10.1109/ACCESS.2021.3118361
7. Le Jeune L, Goedemé T, Mentens N (2021) Machine learning for misuse-based network intru-
sion detection: overview, unified evaluation and feature choice comparison framework. IEEE
Access 9:63995–64015. https://doi.org/10.1109/ACCESS.2021.3075066
8. Wang M, Zheng K, Yang Y, Wang X (2020) An explainable machine learning framework for
intrusion detection systems. IEEE Access 8:73127–73141. https://doi.org/10.1109/ACCESS.
2020.2988359
9. Uhm Y, Pak W (2021) Service-aware two-level partitioning for machine learning-based network
intrusion detection with high performance and high scalability. IEEE Access 9:6608–6622.
https://doi.org/10.1109/ACCESS.2020.3048900
10. Kavousi-Fard A, Su W, Jin T (2021) A machine-learning-based cyber attack detection model
for wireless sensor networks in microgrids. IEEE Trans Industr Inf 17(1):650–658. https://doi.
org/10.1109/TII.2020.2964704
11. Gumusbas D, Yıldırım T, Genovese A, Scotti F (2021) A comprehensive survey of databases
and deep learning methods for cybersecurity and intrusion detection systems. IEEE Syst J
15(2):1717–1731. https://doi.org/10.1109/JSYST.2020.2992966
12. Maseer ZK, Yusof R, Bahaman N, Mostafa SA, Foozy CFM (2021) Benchmarking of machine
learning for anomaly based intrusion detection systems in the CICIDS2017 dataset. IEEE
Access 9:22351–22370. https://doi.org/10.1109/ACCESS.2021.3056614
13. Chkirbene Z et al (2021) A weighted machine learning-based attacks classification to alleviating
class imbalance. IEEE Syst J 15(4):4780–4791. https://doi.org/10.1109/JSYST.2020.3033423
14. De Carvalho Bertoli G et al (2021) An end-to-end framework for machine learning-based
network intrusion detection system. IEEE Access 9:106790–106805. https://doi.org/10.1109/ACCESS.2021.3101188
15. Wang S, Balarezo JF, Kandeepan S, Al-Hourani A, Chavez KG, Rubinstein B (2021) Machine
learning in network anomaly detection: a survey. IEEE Access 9:152379–152396. https://doi.
org/10.1109/ACCESS.2021.3126834
16. Injadat M, Moubayed A, Nassif AB, Shami A (2021) Multi-stage optimized machine learning
framework for network intrusion detection. IEEE Trans Netw Serv Manage 18(2):1803–1816.
https://doi.org/10.1109/TNSM.2020.3014929
17. Seo W, Pak W (2021) Real-time network intrusion prevention system based on hybrid machine
learning. IEEE Access 9:46386–46397. https://doi.org/10.1109/ACCESS.2021.3066620
18. Liu C, Gu Z, Wang J (2021) A hybrid intrusion detection system based on scalable K-means+
random forest and deep learning. IEEE Access 9:75729–75740. https://doi.org/10.1109/ACC
ESS.2021.3082147
19. Li Y, Xiong K, Chin T, Hu C (2019) A machine learning framework for domain generation
algorithm-based malware detection. IEEE Access 7:32765–32782. https://doi.org/10.1109/
ACCESS.2019.2891588
20. Tang Y, Li C (2021) An online network intrusion detection model based on improved regularized
extreme learning machine. IEEE Access 9:94826–94844. https://doi.org/10.1109/ACCESS.2021.3093313
21. Siewruk G, Mazurczyk W (2021) Context-aware software vulnerability classification using
machine learning. IEEE Access 9:88852–88867. https://doi.org/10.1109/ACCESS.2021.307
5385
22. Xu W, Jang-Jaccard J, Singh A, Wei Y, Sabrina F (2021) Improving performance of auto
encoder-based network anomaly detection on NSL-KDD dataset. IEEE Access 9:140136–
140146. https://doi.org/10.1109/ACCESS.2021.3116612
23. Roselin AG, Nanda P, Nepal S, He X (2021) Intelligent anomaly detection for large network
traffic with optimized deep clustering (ODC) algorithm. IEEE Access 9:47243–47251. https://
doi.org/10.1109/ACCESS.2021.3068172
24. Gad AR, Nashat AA, Barkat TM (2021) Intrusion detection system using machine learning for
vehicular ad hoc networks based on ToN-IoT Dataset. IEEE Access 9:142206–142217. https://
doi.org/10.1109/ACCESS.2021.3120626
25. Lin YD, Liu Z-Q, Hwang R-H, Nguyen V-L, Lin P-C, Lai Y-C (2022) Machine LEARNING
with variational autoencoder for imbalanced datasets in intrusion detection. IEEE Access
10:15247–15260. https://doi.org/10.1109/ACCESS.2022.3149295
26. Rokade MD, Sharma YK (2021) MLIDS: a machine learning approach for intrusion detection
for real time network dataset. In: 2021 International conference on emerging smart computing
and informatics (ESCI), pp 533–536. https://doi.org/10.1109/ESCI50559.2021.9396829
27. Marteau PF (2021) Random partitioning forest for point-wise and collective anomaly detection-
application to network intrusion detection. IEEE Trans Inf Forensics Secur 16:2157–2172.
https://doi.org/10.1109/TIFS.2021.3050605
28. Wan W, Peng Z, Wei J, Zhao J, Long C, Du G (2021) An effective integrated intrusion detec-
tion model based on deep neural network. In: 2021 International conference on computer
engineering and application (ICCEA), pp 146–152. https://doi.org/10.1109/ICCEA53728.2021.00037
29. Lopez-Martin M, Sanchez-Esguevillas A, Arribas JI, Carro B (2021) Network intrusion detec-
tion based on extended RBF neural network with offline reinforcement learning. IEEE Access
9:153153–153170. https://doi.org/10.1109/ACCESS.2021.3127689
A Novel Approach to Acquire Data
for Improving Machine Learning Models
Through Smart Contracts
Abstract Despite the Big Data Revolution, critical aspects of improving machine
learning models have been overlooked, particularly the nature of the benchmark
datasets available to the model. In this work, we propose a blockchain-based,
decentralised, trustless platform using "Smart Contracts" that is tailored exclusively
for data collection from proficient developers and for machine learning model
improvement. Elastic Weight Consolidation is used to update this model to take
into account the characteristics of the incoming dataset(s) in order to avoid
catastrophic forgetting, which occurs when a model learns only from fresh data and
ignores its existing knowledge.
knowledge. A rewarding mechanism has been discussed, which ensures that low-
quality data is not rewarded and good information is compensated fairly based on
the improvement made to the model. It fosters a favourable environment for compe-
tition. This platform is conceived as a marketplace that provides monetary incentives
for developers to partake in improving and contributing to model development.
1 Introduction
The development of computer algorithms that can adapt to new data is the focus of
machine learning.
At the very heart of ML lies Data. Datasets are collections of instances that
all share particular attributes, where each sample is a single row of data. Machine
learning algorithms are given training datasets to help them “learn” and then vali-
dation datasets to ensure that the model interprets the data as it should. Datasets
govern the development, utility, and success of machine learning models. They are
the foundation for developing cutting-edge solutions but also serve as inhibitors of
the potential of machine learning models [2]. Machine learning has experienced
exponential growth, which can be attributed to three factors: (a) algorithms with
high efficiency, (b) improved computational power, and (c) the availability of vast
amounts of labelled data [3].
The ability of models to represent data and the processing power of GPUs have
significantly improved over the past few years, yet data is still being overlooked in the
quest to improve machine learning. Simple examples are the ImageNet benchmark for
visual object recognition and the GLUE benchmark for English textual understanding
[4, 5]. It is evident that the vastly increased availability of data has been a key driver
of AI/ML, coupled with faster processors, less expensive storage, and theoretical
advancement. We have produced and catalogued data exponentially more quickly in
recent years than ever before.
To support the notion of the importance of data for improving our results vastly,
we need to look no further than well-known and established benchmark datasets. The
direction of the aims, ideals, and research agendas of machine learning advancement
has been greatly influenced by benchmark datasets. According to reports, machine
learning algorithms perform astonishingly well when put to the test against these
benchmark datasets. Quality datasets are, therefore, crucial to the construction and
assessment of models in the field of machine learning. The desire to acquire newer and
more varied datasets is motivated by the realisation that a single group of developers
or institutions cannot, on their own, create or amass a dataset that best serves their
intended purpose, i.e. has a dataset that is highly representative of the data the systems
would encounter in real-world use.
However, recent developments reinforce the understanding that datasets limit the
capability of machine learning and artificial intelligence. Concerns have also
emerged about biases and societal prejudices creeping into the realm of computing
and AI as a result of shortcomings in the underlying data, leading to disturbing and
unfavourable trends in the domain. In particular, current data processes have a
propensity to abstract away the human labour, arbitrary assessments and biases,
and variable circumstances involved in dataset development [6]. To elaborate on
one facet of data concerns, Paullada et al. [7] imply that the relationship between
inputs and target labels seen in datasets is not necessarily significant and that the way
objectives and data are gathered might cause models to rely on unreliable heuris-
tics. The issues this raises go beyond false conclusions drawn from benchmarking
studies. When machine learning models can use fictitious cues to predict outcomes
well enough to surpass a baseline on test data, the resulting systems may give the
impression that fictitious tasks that do not correspond to real-world capabilities are
legitimate. Formally speaking, certain tasks have the ability to be specified but cannot
have an adequate extensional realisation frequently because the task’s underlying
theory is flawed.
The type of unique, skilled, and methodical annotation used in dataset collection
was found to be “slow and expensive to acquire,” so developers turned towards the
unrestricted collection of increasingly large amounts of data from the web, along-
side increased reliance on crowd workers and contributing developers. Recently,
the machine learning field has turned to approaches with much more robust data
requirements.
In order to address this issue, which is at the heart of data science and machine
learning, our solution—which is covered in more detail in Sect. 3—sets out to develop
better machine learning models and solutions. To do this, it will tap into the world’s
machine learning talent and enhance machine learning solutions. It will also create
a decentralised platform on the blockchain. In order to increase the performance of
current machine learning models, we thus make use of breakthroughs in the field
of blockchain technology and the theoretical underpinnings of federated machine
learning.
The remainder of the paper is structured as follows: the backdrop of present
systems is given in Sect. 2, along with background information on the fundamental
ideas upon which our solution is built. The summary of the proposal and the justi-
fication for the system’s architecture are included in Sect. 3. Key components and
the architecture are highlighted in Sect. 4. The proposal is restated and submitted for
confirmation in Sect. 5. Section 6 contains the paper’s conclusion as well as a list of
references.
2 Background
At the outset, we acknowledge prior research that shares the objectives of our proposal
and provides the groundwork for the combination of blockchain technology, smart
contracts, and machine learning. The most insightful work is presented by Harris
and Waggoner [8], who lay out their idea for a system that would allow users to
collaborate to build a dataset and host a continuously updated model using smart
contracts. We acknowledge their proposal and have worked to refine and build on its
foundations to create a functioning marketplace. We also give credit to the DanKu
protocol proposal, which was the first to advocate using blockchain technology to
establish contracts that provide compensation for a machine learning model that has
been trained for a specific dataset [9]. This proposal is a derivative of their initial
proposal, from soliciting machine learning models as solutions for certain datasets
to soliciting datasets to refine existing machine learning models. Similarly, Marathe
et al., in their proposal DInEMMo [10], suggest a complete marketplace for both ML
model and data sharing, which provides theoretical insight for this proposal.
For conceptual similarity, we also note Waggoner et al. [11], who put forward a
technique for buying data from a set of individuals. Participants are encouraged to
participate if they feel that their data points are representative or that the information
they provide will help the mechanism make more accurate predictions in the future
using a test set.
The goal to cultivate, encourage, and increase the engagement of experienced
developers in machine learning creation and refinement to advance the field has
served as the main inspiration for this idea. At the same time, the Open-Source
movement has greatly enabled and inculcated the values of learning, sharing, and
cooperation. However, it is still limited because this strategy depends on the other
contributors’ willingness to participate. Hence, we propose a blockchain-powered
marketplace designed to facilitate the improvement of machine learning models by
adequately incentivising collaboration. Before elaborating on this system, we briefly
examine the core technologies and concepts which serve as its foundation.
2.1 Blockchain
Blockchain is a decentralised and distributed ledger system that promotes trust and
dependability since data and transactions are only added to the chain once participants
have come to an agreement. We use its services in our system to make sure that such
agreements will happen smoothly, securely, and reliably [12].
In essence, smart contracts are computer programmes that are kept on a blockchain
and linked to a specific blockchain address that contains the contract's source
code [13]. A smart contract's code cannot be modified once it is published, and
anybody can interact with it [14]. Transactions are traceable and irreversible, and
the code governs their execution. Without the need for a centralised authority, a
legal system, or an external enforcement mechanism, smart contracts enable trusted
transactions and agreements between dispersed, anonymous parties. Smart
contracts operate through simple "if/when…then" statements that are written into
code and placed on a blockchain; when the predefined conditions have been
verified to have been met, a network of computers carries out the actions.
The main concept is that supplemental data contracts are provided, in which the
best model currently available is improved with additional data: the organisers
design a data contribution contract through which new data points can improve the
best available model. Participants are compensated if their data contribution
improves the performance of the best model, i.e. fills in the gaps in the training
data already available. For instance, newly contributed data can include fresh texts,
movie titles, images, etc. In a study, Chen et al. [15] make the claim that if we have
a test dataset, we can pay data providers based on how well the model trained on
their provided data performs on that test dataset. When a data buyer lacks access to
a test dataset, they also investigate the creation of incentive systems for obtaining
high-quality data from multiple data sources.
3 Proposed Solution
The submitted machine learning model is tested on the validation dataset, and base
accuracy is evaluated on the backend server. The validation dataset and submitted
machine learning model are uploaded onto the IPFS [18]. The hashes from the vali-
dation dataset and machine learning model uploading, as well as the base accuracy
assessed earlier, are then added to the smart contract. The Ethereum blockchain is
where the smart contract is stored and deployed. Initiation of the contract results in
creating a new competition on the application.
The “Developer” or “Data Contributor” views the contract and reads the model
and data definitions. The developer may upload the data if doing so will enhance
the model’s accuracy. The existing model is trained on the new data provided; after
training is completed, the new model is tested on the validation data uploaded earlier,
and a new accuracy is calculated. If the new accuracy is more than the base accuracy,
the submission is considered successful. Multiple such developers can participate in
the contract, and if the accuracy obtained by evaluating new models trained on their
datasets is more than the base accuracy, then those developers will be eligible for
some portion of the reward stated by the organiser.
When the organiser downloads the final model, which is an aggregation of all
the models with accuracy more than the base model on the validation data, then the
rewards will be distributed to all the eligible developers who contributed data to the
competition. A continual learning approach is proposed for averaging the weights
when the final model is created. The architecture diagram of the solution is illustrated
in Fig. 1.
4 Functional Overview
We propose a solution where the participants collaboratively improve a model and use
smart contracts to store the submissions and incentivise contributors that improve the
model and apply federated learning with a continual learning approach. The reward
mechanism is designed to receive good data only. The basic outline of the flow of
the application is illustrated in Fig. 2. Our framework has three phases:
An initiation phase in which an organiser stakes a token to be awarded to
contributors and shares the validation data and base model,
A contribution phase in which participants submit training data samples and
train their data on the base model,
A remuneration phase in which the provider pulls the model and the best models
are averaged, and the reward is distributed based on the improvement made by indi-
vidual submissions. The specifics needed to explain the proposal’s intricacies are
covered in the parts that follow.
The organiser initialises the smart contract by uploading the validation data and base
model. The validation data is used for assessing the quality of the data submitted:
the base accuracy is calculated first, and once the base model is trained on the data
submitted by a contributor, the new accuracy score acts as feedback on the quality
of that data. Additionally, the validation data is kept private; otherwise, data
providers may be encouraged to report data that is biased in favour of the validation
set, reducing the usefulness of the acquired data for subsequent learning or analysis
activities. The organiser then specifies the reward he is ready to stake; mechanisms
exist for regulating the reward price set by the organiser [19], to make sure that the
contributors are adequately compensated. In case the contract fails, the reward is
refunded. All the computations in these phases are done off-chain, both to reduce
gas costs on Ethereum and to work around the EVM's inability to handle
floating-point calculations. Once the organiser is satisfied with the contract,
the contract is submitted to be deployed. The Data Contribution Contracts Service
(DCCS) then deploys the contract in the organiser’s account, and the model file and
validation data are uploaded on IPFS to ensure privacy and availability. The hashes
generated from them are then stored in the contract as a struct instance, and the contract
is now publicly available on the portal for contributors. The gas cost of the whole
process is borne by the organiser. The code below is the Solidity snippet of a contract
entity.
struct Contract {
    uint256 reward;                    // total reward staked by the organiser
    address payable organiserAddress;  // account that deployed the contract
    string validationDataHash;         // IPFS hash of the validation dataset
    string baseModelHash;              // IPFS hash of the base model
    string modelDescription;           // problem and data definitions
}
Once the commitment phase is over, the contract is publicly available, and
contributors can understand the problem at hand through the problem description.
The organiser offers metadata, such as the range of values for each characteristic and
potential labels on the records, to help with interaction. In recent years, we have seen
the emergence of marketplaces like WorldQuant and Xignite, where data is available
as a commodity. These marketplaces’ main objective is to facilitate communica-
tion between data contributors and data consumers when they need data to carry
out certain tasks, such as developing new machine learning models, enhancing the
precision of existing ones, or doing statistical estimation. Recent works have demon-
strated methods that strike a balance between exploration (learning more about the
data that the provider has) and exploitation (putting that knowledge to use in allo-
cating the limited data acquisition budget) [20]. Concerns like data privacy, repeating
submissions to steal rewards, and data security are not addressed here. Various data
acquisition techniques are used by contributors, like data discovery, data acquisition,
and data generation (crowdsourcing or synthetic data) [21]. Once the user is satisfied
with the data collected, the user uploads the data. The data is received by the DCCS
service, and the base model is trained on this data. After the training is finished,
the validation data hash is fetched from the contract, the accuracy is computed, and
the contributed data is stored on IPFS. Since the EVM cannot handle floating-point
numbers, we propose storing the decimal part and the integer part separately; this
ensures fairness even in close-call scenarios. Again, the objective of storing on
IPFS rather than on the EVM is to reduce gas costs. All off-chain API calls are
handled by Oraclize, a service that allows smart contracts to access data from other
blockchains and the World Wide Web [22]. The submission details are stored on
the contract. The code below is the Solidity snippet of a Submission entity:
struct Submission {
    address payable contributor;     // contributor account for the reward payout
    uint256 accuracyInteger;         // integer part of the new accuracy score
    uint256 accuracyDecimal;         // decimal part (the EVM has no floating point)
    string contributedDataHash;      // IPFS hash of the contributed dataset
    string baseModelHash;            // IPFS hash of the model that was improved
    string improvementInAccuracy;    // improvement over the base accuracy
}
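A hedged sketch of how an accuracy score could be split into the accuracyInteger and accuracyDecimal fields above; the four-decimal precision is an assumption, as the paper does not specify one.

def split_accuracy(acc: float, precision: int = 4):
    # Separate the integer and decimal parts so both fit in EVM uint fields
    integer_part = int(acc)
    decimal_part = int(round((acc - integer_part) * 10 ** precision))
    return integer_part, decimal_part

assert split_accuracy(97.2345) == (97, 2345)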
Once the organiser is satisfied with the improvement in the metrics so far, the contract
can be closed. Here, the organiser will receive a newly trained model with a better
accuracy score on the validation data. Optionally, the organiser can also acquire
the data points that contributed to the improvement. This will be useful when the
organiser wants to create a new contract again in the future. We take inspiration
from the federated learning approach for combining the learnings from
all submissions that improved the base model. This phase is further divided into two
sub-phases.
Incentives are required to encourage people to contribute new data that will help
improve the model’s performance. Data contributors can earn points and badges
when other contributors validate their contributions, just like on websites like Stack
Overflow. Incentives must not be distorted or misaligned. Because the prize structure
is winner-take-all, second place is equivalent to not competing at all. This eventually
764 A. Raghani et al.
leads to an equilibrium, in which only a few teams compete, and potential new
teams never form because catching up appears highly unlikely. A structure like this
is anticollaboration. Competitors are strongly encouraged to keep their techniques
confidential. This is in stark contrast to many other crowdsourced projects, such as
Wikipedia, where participants must build on the work of others [23].
The one thing this proposal depends on is the willingness of the contributors to
improve the model, so there is a need to automate this process. Since we are storing
the new accuracy scores of all submissions, it makes sense to reward only those
who contribute "good data." As proposed in our previous work [24], we put forward
a fair and equitable incentive system. Let us assume the base metric score is B,
there are N contributors, and their metric scores are S_1, S_2, …, S_N; the incentive
R_i received by the i-th contributor is then

R_i = ( (S_i − B) / Σ_{k=1}^{N} (S_k − B) ) · R, (1)
where R denotes the total reward specified in the contract. The mechanism described
above ensures that low-quality data is not rewarded. Because there is no winner-take-
all scheme, the reward mechanism encourages more people to contribute. It fosters a
healthy level of competition. The whole mechanism for reward distribution is shown
in Algorithm 1.
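In Python, Eq. (1) can be transcribed directly; the clamp to zero for submissions that do not beat the base score is an assumption consistent with the statement that low-quality data is not rewarded.

def distribute_rewards(base_score, scores, total_reward):
    # Only the improvement over the base score B counts toward the reward
    gains = [max(s - base_score, 0.0) for s in scores]
    total_gain = sum(gains)
    if total_gain == 0.0:
        return [0.0] * len(scores)
    # Each contributor receives a share of R proportional to their gain
    return [g / total_gain * total_reward for g in gains]

# Example: B = 0.90, scores 0.92, 0.95, 0.89, and R = 100 tokens yield
# rewards of about 28.57, 71.43, and 0 tokens, respectively.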
Continual Learning
The ability to learn tasks sequentially is critical for artificial intelligence
development. In general, machine learning models are incapable of this, and it has
long been assumed that forgetting previously learned weights is an unavoidable
property of models trained on successive inputs; learning sequentially without such
forgetting is known as continual learning [25]. Since, in our approach, we want the
models to improve upon the existing learned parameters, we have used the Elastic
Weight Consolidation (EWC) algorithm [26]. The basic idea behind this algorithm
is a quadratic penalty introduced to constrain the network parameters to stay within
the low-error region for task A while learning to perform task B. The quadratic
penalty acts as a "spring" of sorts, anchoring the parameters to previously learned
solutions, hence the name Elastic Weight Consolidation. This iterative process is
executed over all the submissions that improved the base model.
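A minimal sketch of the EWC penalty term [26] is given below, assuming PyTorch; the fisher and anchor dictionaries (per-parameter importance estimates and previously learned weights) would be computed after each accepted submission.

import torch

def ewc_penalty(model, fisher, anchor, lam=1000.0):
    # Quadratic "spring" anchoring parameters to previously learned values;
    # fisher[name] weights each parameter by its importance to earlier tasks
    penalty = torch.zeros(())
    for name, param in model.named_parameters():
        penalty = penalty + (fisher[name] * (param - anchor[name]) ** 2).sum()
    return 0.5 * lam * penalty  # added to the new task's loss before backprop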
Federated Averaging
For all the submissions that have improved the base accuracy, we now have to combine
the learnings from all models into one global model.
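A minimal sketch of this aggregation step, assuming a plain, equally weighted average of the accepted models' weights in the spirit of federated averaging [16]; weighting by submission size or accuracy would be a straightforward extension.

import numpy as np

def federated_average(models):
    # models: one list of layer-weight arrays per accepted submission;
    # the global model averages each layer element-wise across submissions
    return [np.mean(layer_group, axis=0) for layer_group in zip(*models)]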
5 Validation
No deduction is made from the organiser's wallet at this time. Published contracts
are available on the marketplace and can be viewed by all the users, except the
organiser, from their developer dashboard, as shown in Fig. 3. Developers can upload
new datasets from their dashboards for a particular contract. The base model is
trained on the newly submitted data on the backend server using the elastic weight
consolidation (EWC) so that the model learns new data and also remembers old
data. If the performance of the new model on the validation dataset uploaded by the
organiser is better than the old model, the weights of the new model are stored on the
IPFS network and the hash for these weights is stored in the smart contract along with
the contributor address. The organiser can view the performance of the submissions,
as shown in Fig. 4. On contract termination, the organiser downloads the model, and
for all the models that have performed better than the base model, their weights will
be averaged. The amount of the reward is deducted from the organiser’s account and
deposited into the account of all the contributors whose data was accepted into the
model development process.
This proposal, however, is subject to certain technical limitations similar to some
of those presented by Harris and Waggoner [8], namely, Bad intent and response,
response, Illusionary and Ambiguous Data, and Overwhelming the Network, which
are briefly explained as follows:
Bad intent and response: this essentially refers to the ability of certain developers
to "game" the system in a manner where, in bad faith, they block other submissions
by filling up the submission limit without making any meaningful contribution.
A wealthy and determined agent can corrupt the contract. The incentive mecha-
nism should make it costlier for the contributor with every wrong submission or set
submission limits.
Illusionary and Ambiguous Data: on the other hand, the “organisers” who use
this framework must carefully evaluate the type of model and how providing unclear
data can affect the system, as doing so may merely give the impression of greater
performance. To avoid a bad submission penalty, the contributors should make sure
that they meet the data quality standards mentioned in the contract.
Overwhelming the Network: public blockchain-based applications have had
dependability problems as a result of network congestion. While we circumvent this
by using a private blockchain, its scalability is somewhat constrained, and resources
are relatively scarce. When adding data that necessitates making new transactions,
this can be problematic for this framework.
6 Conclusion
In this paper, we have introduced a system that intends to source better quality datasets
and improve the efficiency of existing machine learning models by incentivising
contributors. Contributed datasets are stored securely on IPFS, which also makes the
system fault-tolerant to data storage failures and highly scalable. The gas costs of
keeping enormous amounts of data on-chain are reduced by off-chain interactions;
instead, the data is kept on IPFS, and only the hashes of the data will be saved in the
smart contract. We use smart contracts to maintain a record of all contributors and
ensure fair and unbiased distribution of rewards among the contributors. Our novel
rewarding mechanism filters out bad-quality data. Since the model has to improve on
the existing benchmark, the EWC algorithm is used to avoid catastrophic forgetting of
prelearned weights of submissions that improve baseline metrics. Our system creates
a platform to improve machine learning models using blockchain technology. With
advances in technology, the solution can be scaled for the improvement of more
complex machine learning models. Also, we anticipate the use of blockchain in AI
not only for improving machine learning models but also in various other aspects of
machine learning.
References
1. IBM Cloud Education (2020) Machine learning, 15 July 2020. Retrieved from https://www.
ibm.com/cloud/learn/machine-learning
2. Halevy A, Norvig P, Pereira F (2009) The unreasonable effectiveness of data. IEEE Intell Syst
24(2):8–12
3. Sun C, Shrivastava A, Singh S, Gupta A (2017) Revisiting unreasonable effectiveness of data
in deep learning era. In: Proceedings of the IEEE international conference on computer vision,
pp 843–852
4. Deng J et al (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE
conference on computer vision and pattern recognition. IEEE
5. Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S (2018) GLUE: a multi-task benchmark
and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP
workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, Association for
Computational Linguistics, pp 353–355
6. Scheuerman MK, Denton E, Hanna A (2021) Do datasets have politics? Disciplinary values in
computer vision dataset development. In: Computer Supported Cooperative Work, CSCW
7. Paullada A et al (2021) Data and its (dis) contents: a survey of dataset development and use in
machine learning research. Patterns 2(11):100336
8. Harris JD, Waggoner B (2019) Decentralized and collaborative AI on blockchain. In: 2019
IEEE international conference on blockchain (Blockchain). IEEE
9. Kurtulmus AB, Daniel K (2018) Trustless machine learning contracts; evaluating and
exchanging machine learning models on the ethereum blockchain. arXiv preprint arXiv:1802.10185
10. Marathe A et al (2018) DInEMMo: decentralized incentivization for enterprise market-
place models. In: 2018 IEEE 25th international conference on high performance computing
workshops (HiPCW). IEEE
11. Waggoner B, Frongillo R, Abernethy JD (2015) A market framework for eliciting private data.
In: Advances in neural information processing systems, vol 28
12. NIST Blockchain Technology Overview Draft NISTIR8202, 23 Jan 2018
13. Pinna A, Ibba S, Baralla G, Tonelli R, Marchesi M (2019) A massive analysis of Ethereum
smart contracts empirical study and code metrics. IEEE Access 7:78194–78213
14. Bartoletti M (2020) Smart contracts. Front. Blockchain, 4 June 2020
15. Chen Y, Shen Y, Zheng S (2020) Truthful data acquisition via peer prediction. Adv Neural Inf
Process Syst 33:18194–18204
16. McMahan B, Moore E, Ramage D, Hampson S et al (2016) Communication efficient learning
of deep networks from decentralized data. arXiv preprint arXiv:1602.05629
17. Wood G (2014) Ethereum: a secure decentralized generalized transaction ledger. Ethereum
Project Yellow paper 151(2014):1–32
18. Benet J (2014) IPFS—content addressed, versioned, P2P file system
19. Chen L, Koutris P, Kumar A (2018) Model-based pricing for machine learning in a data
marketplace. arXiv preprint arXiv:1805.11450
20. Li Y, Yu X, Koudas N (2021) Data acquisition for improving machine learning models. arXiv
preprint arXiv:2105.14107
21. Roh Y, Heo G, Whang SE, A survey on data collection for machine learning: a big data-ai
integration perspective. IEEE
22. Liu X, Chen R, Chen YW, Yuan S-M (2018) Off-chain data fetching architecture for Ethereum
smart contract, pp 1–4. https://doi.org/10.1109/ICCBB.2018.8756348
23. Abernethy JD, Frongillo R (2011) A collaborative mechanism for crowdsourcing prediction
problems. In: Advances in neural information processing systems, vol 24
24. Ajgaonkar A et al (2022) A blockchain approach for exchanging machine learning solutions
over smart contracts. In: Science and information conference. Springer, Cham
25. Ring MB (1998) Child: a first step towards continual learning. In: Learning to learn. Springer,
pp 261–292
26. Kirkpatrick J et al (2017) Overcoming catastrophic forgetting in neural networks. Proc Natl
Acad Sci 114(13):3521–3526
27. Wang H et al (2020) Federated learning with matched averaging. arXiv preprint arXiv:2002.06440
28. Reyes J et al (2021) Precision-weighted federated learning. arXiv preprint arXiv:2107.09627
Blockchain Framework for Automated
Life Insurance
Abstract The insurance process is carried out manually in India. The insurance client has to depend on an insurance agent from purchase to claim filing, which leads to wrong entries, fraudulent claims, and the cost overhead of the agent's commission. An insurance client also has to maintain documents and make them available at the time of claim filing, which is very cumbersome. Blockchain smart contracts can provide a secure, automatic, cost-cutting, paperless, and real-time insurance process. This paper aims to provide a blockchain solution for life insurance claim processing, as life insurance is the most widely purchased and most needed form of insurance. An architectural illustration of a blockchain prototype is contributed in this paper. As a result, the authors found that fine-grained insurance applications can be developed using Hyperledger Composer.
1 Introduction
Blockchain development for insurance is in its early stages, and many researchers are interested in adopting blockchain for the insurance process, since it provides secure and automatic claim processing. In most cases, family members have no idea about any life insurance purchased by a deceased person, and claim processing is delayed while collecting all the documents required for an insurance
Raja Kulkarni and Shivkumar Kalsgonda—These authors contributed equally to this work.
V. Kalsgonda (B)
Department of Computer Science, Shivraj College, Gadhinglaj, Maharashtra 416502, India
e-mail: vpkalsgonda@gmail.com
R. Kulkarni
Department of Computer Studies, Chhatrapati Shahu Institute of Business Education Research,
University Road, Kolhapur, Maharashtra 416004, India
e-mail: drrvkulkarni@siberindia.edu.in
S. Kalasgonda
E-Commerce, Distributed Systems, Seattle, USA
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_59
claim; sometimes, the beneficiary never receives any benefit of the purchased insurance. If a blockchain solution is used for the insurance process, the client no longer needs to maintain documents and make them available at the time of claim processing, and the documents are kept secure in the blockchain. The insurance process becomes automatic through smart contracts; a smart contract is a small program written on top of the blockchain that is responsible for transaction processing under particular conditions. For implementing blockchain solutions in the Indian insurance industry, a consortium blockchain network has to be used, since multiple organizations are involved in the network and only authorized nodes should be given access permissions; this approach also does not require the use of cryptocurrencies. Hyperledger Fabric is therefore the best-suited solution. In this paper, the authors aim to design a blockchain solution for life insurance using the Hyperledger Composer tool of the Hyperledger Fabric framework.
2 Literature Review
In India, the banking, insurance, and card industries are coming together to form a consortium to realize the benefits of blockchain at an industry level. Insurers are focusing on using blockchain solutions to speed up claim processing; blockchain is also the best solution for avoiding the errors that are introduced during manual entry [1]. Blockchain integrated with IoT devices is used in domains such as the supply chain [2] and Unmanned Aerial Vehicles [3], and also for accessing and managing IoT devices [4, 5]. Using existing blockchain frameworks, we can develop applications whose logic reduces to binary conditions [6]. Papers dealing with implementation details are fewer in number, and in most papers, researchers prefer to use Ethereum [7]. Blockchain technology has many benefits for an insurance company, but we cannot fully automate all the steps in insurance processing; certain validations in the current claim process still require manual steps. Blockchain can be adopted for limited use cases where there is no requirement for complex regulatory processing [8]. Nath [9] reported that using blockchain technology to share fraud intelligence will make fraudulent activity by criminals harder, and further suggested adopting this technology step by step rather than in a big-bang fashion. An integrated use of Hyperledger Fabric and Ethereum has been applied to transportation insurance to gain the advantages of both private and public blockchains: Hyperledger Fabric is used for storing data, and Ethereum is used for modeling payments [10]. As the insurance industry suffers from fraud [11], researchers intend to use AI for fraud detection and to permanently save the results in blockchain to minimize claim refund losses. Mayank [12] proposed a blockchain framework for insurance that offers fine-grained access control, experimented by scaling up the network to test the robustness of the system, and concluded that confirmation time is directly proportional to network size: the more nodes, the longer the confirmation time and, in short, the slower the network.
Blockchain solutions will speed up insurance processing at reduced cost [13]; from an architectural standpoint, a consortium blockchain network is preferable for automatic processing, while a public blockchain is the solution for payment purposes. Though blockchain solutions have many advantages, there are issues such as scalability, network latency, and a lack of skills to write bug-free smart contracts. Once these drawbacks are overcome, blockchain solutions will successfully automate insurance processing. According to a systematic literature study [14], more improvements are needed before blockchain technology is accepted in the insurance sector, such as forming a consortium, prototyping the use cases, and reaching average users, but the insurance industry stands to gain more than other industries. MIStore [15] is a blockchain solution for storing medical insurance records securely in a distributed environment, implemented on the Ethereum platform. The efficiency of MIStore depends on the efficiency of the Ethereum platform; if another platform were used for the same implementation, it might provide better throughput. Blockchain-based crop insurance [16] results in automatic payouts in case of natural disasters, so farmers will not have to worry about climate change. Manual entry in medical insurance results in fraud of about ten billion dollars per year in the USA [17], partly because of a lack of endorsement by stakeholders. Blockchain solutions in the health insurance sector will provide security, immutability, and transparency.
3 Background Work
3.1 Death Registration in India
In India, it is mandatory under the law to register every death with the concerned State/UT Government within 21 days of its occurrence. If the death has taken place in a hospital, nursing home, or medical institution, it is to be reported by the head of the institution within 21 days to the concerned registrar. If the death has taken place at home, it is the responsibility of a family member to report it within 21 days to the sub-registrar. A death certificate is then issued after proper verification [20].
4 Gap Analysis
Most researchers provide novel architectures, models, and frameworks using blockchain, but there is a lack of technical detail about the blockchain elements used. Several papers present blockchain frameworks using Hyperledger Composer for the healthcare, supply chain, and banking sectors, but only one paper is available for the insurance sector, and it provides only a general framework for the insurance process with just two participants, the insurance company and the insurance client [12]. Further study shows that different types of insurance policies call for different participants, endorsement policies, consensus algorithms, ordering peers, and smart-contract algorithms. Implementing blockchain solutions for the insurance industry is a very challenging job [21]: firstly, insurers have to train or hire staff with proper technical knowledge, and secondly, insurers have to invest in adopting new practices. The system proposed in this paper aims to design a blockchain framework for life insurance with four types of participants: insurance company, death registry, verifier, and insurance client. The development language of the proposed work is Hyperledger Composer's modeling language, which supports a JavaScript API.
5 Hyperledger Fabric and Composer

Hyperledger Fabric is a project led by IBM under the Linux Foundation. It provides a modular architecture platform for developing enterprise blockchain solutions, and it supports building private (permissioned) business networks [22]. Composer Playground is a web-based tool for modeling and testing business networks; Playground communicates directly with the local Fabric runtime [23].
6 Proposed Framework
6.1 Methodology
For the proposed framework, the phases of analysis, design, and implementation are used. In the analysis phase, the authors analyze the requirements of the proposed work and identify participants, assets, and transactions. Based on this analysis,
the authors design the architecture of the proposed system, which is shown in Fig. 2. The blockchain network consists of three different organizations: an insurance organization, a death registry, and a verifier organization. In this architecture, death registry workers, insurance workers, claim verification workers, and insurance customers interact with the proposed blockchain network through their respective web apps, via a Node.js server. The Hyperledger framework provides Fabric SDKs for clients to interact with the blockchain network; here, a client refers to a worker. Each organization's participating nodes or clients are called peer nodes. Transactions made by these peers are submitted to the blockchain network only after validation by the ordering service. The detailed design of the smart contract is discussed in the next section, and the authors implement the design in the implementation phase. A minimal sketch of such a client interaction is given below.
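As an illustration of this client path, the following is a minimal sketch using the Node.js Fabric SDK (fabric-network); the channel name 'insurance-channel', chaincode name 'life-insurance', identity 'insuranceWorker', and connection profile are all assumptions for the example, not details from the paper.

```javascript
const { Gateway, Wallets } = require('fabric-network');

// Hypothetical connection profile exported by the network administrator.
const connectionProfile = require('./connection.json');

async function submitClaim(policyNumber) {
  // Identities live in a local wallet; 'insuranceWorker' is an assumed name.
  const wallet = await Wallets.newFileSystemWallet('./wallet');
  const gateway = new Gateway();
  await gateway.connect(connectionProfile, {
    wallet,
    identity: 'insuranceWorker',
    discovery: { enabled: true, asLocalhost: true },
  });
  try {
    const network = await gateway.getNetwork('insurance-channel');
    const contract = network.getContract('life-insurance');
    // Peers endorse the transaction; the ordering service validates it.
    await contract.submitTransaction('Claim', policyNumber);
  } finally {
    gateway.disconnect();
  }
}
```

The same gateway-channel-contract pattern would serve the death registry and verifier web apps with their own identities.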
6.2 Network Model

For the network model of the proposed work, the authors designed participants, assets, and transactions as follows.
6.2.1 Participants
6.2.2 Assets
In the proposed work, the asset is a policy, and the attributes of the asset are policy number, start date, status, premium, and beneficiary. Every insurance policy has a unique policy number, so this is the primary attribute through which transactions are made. Figure 7 demonstrates adding a new asset to the blockchain; the record of the new asset is stored in the asset registry.
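For illustration only, such a policy asset could be serialized in the JSON style Hyperledger Composer uses for resources; the namespace org.acme.insurance and all identifier values below are assumptions, not taken from the paper.

```javascript
// A sample Policy asset in Composer's JSON serialization style.
// Relationships to participants are encoded as 'resource:' URIs.
const policy = {
  $class: 'org.acme.insurance.Policy',  // assumed namespace
  policyNumber: 'POL-0001',             // unique primary attribute
  startDate: '2022-01-15',
  status: 'ACTIVE',                     // e.g., ACTIVE / CLAIM_PENDING / REFUNDED
  premium: 12000,
  beneficiary: 'resource:org.acme.insurance.Client#C-100',
};
```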
6.2.3 Transactions
From buying a policy to its refund, there are transactions such as client registration, buy policy, claim, pay premium, refund, and history; all of these are listed in Table 1. The most important transaction is automatic claim processing, triggered when an authorized person records a life status of death. The design of claim processing is therefore the main objective of the proposed work.
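A minimal sketch of what this claim-processing logic could look like as a Composer transaction processor function is given below; the RecordDeath transaction type, the namespace, and the status values are illustrative assumptions, not the paper's actual contract code.

```javascript
/**
 * Illustrative transaction processor: when an authorized death-registry
 * participant records a policyholder's death, the claim fires automatically.
 * @param {org.acme.insurance.RecordDeath} tx - assumed transaction type
 * @transaction
 */
async function recordDeath(tx) {
  const registry = await getAssetRegistry('org.acme.insurance.Policy');
  const policy = await registry.get(tx.policyNumber);
  if (policy.status !== 'ACTIVE') {
    throw new Error('A claim can be fired only on an active policy');
  }
  // The verifier organization later confirms the death reason; payout follows.
  policy.status = 'CLAIM_PENDING';
  await registry.update(policy);
}
```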
Access control rules define which participant can view which part of the system. In the proposed design, insurance clients can view only the details of their own policies. Figure 10 shows that the admin can access the data of all clients, Fig. 11 shows that a client can access only its own data, and Fig. 12 shows that a client does not have the authority to create a new participant.
Fig. 12 Client does not have the authority to create a new participant

Likewise, in the proposed system, authorized persons in the death registry can only change the life status of policyholders and cannot access other parts of the system. A person from the insurance industry verifies whether the reason for death is valid for the claim processing; after this confirmation, the verifier has the authority to make appropriate entries in the verification form. A sketch of such an access rule is given below.
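In Composer, such rules live in the network's permissions.acl file. The sketch below expresses the condition behind a "clients may read only their own policies" rule as a JavaScript predicate; the attribute names follow the earlier assumed model, not the paper's actual ACL.

```javascript
// The ACL rule's ALLOW/DENY decision reduces to a boolean condition over
// the requesting participant p and the resource r being accessed.
function clientMayReadPolicy(p, r) {
  // Allow READ only when the requesting client is the policy's beneficiary.
  return r.beneficiary.getIdentifier() === p.getIdentifier();
}
```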
In query files, we can write SQL-like queries. In the proposed work, the authors design queries for the admin and for policyholders: queries that perform all admin activities, and queries that let a customer access only its own activities.
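A hedged sketch of one such query and its invocation through the Composer client API follows; the query name and parameter are assumptions.

```javascript
// queries.qry (Composer's SQL-like query language) might contain:
//
//   query selectMyPolicies {
//     description: "Policies belonging to one client"
//     statement: SELECT org.acme.insurance.Policy WHERE (beneficiary == _$client)
//   }
//
// Invoked from a Node.js client (inside an async function):
const results = await businessNetworkConnection.query('selectMyPolicies', {
  client: 'resource:org.acme.insurance.Client#C-100',
});
console.log(`Found ${results.length} policies`);
```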
7 Test Result
In this paper, simple data is used for demonstration purposes. The authors show all implementation details of assets, participants, and transactions. A Hyperledger Composer-based implementation is used for testing, and the system was found to work well according to the ACL: only authorized persons can access or update specific data. The results thus show that Hyperledger Fabric is the best-suited solution for building a fine-grained permissioned blockchain.
References
13. Gatteschi V, Lamberti F, Demartini C, Pranteda C, Santamaría V (2018) Blockchain and smart contracts for insurance: is the technology mature enough? Future Internet 10(2):20. https://doi.org/10.3390/fi10020020
14. Kar AK, Navin L (2021) Diffusion of blockchain in insurance industry: an analysis through
the review of academic and trade literature. Telematics Inform 58:101532. https://doi.org/10.
1016/j.tele.2020.101532. ISSN 0736-5853
15. Zhou L, Wang L, Sun Y (2018) MIStore: a Blockchain-based medical insurance storage system.
J Med Syst 42:149. https://doi.org/10.1007/s10916-018-0996-4
16. Jha N, Prashar D, Khalaf OI, Alotaibi Y, Alsufyani A, Alghamdi S (2021) Blockchain based crop insurance: a decentralized insurance system for modernization of Indian farmers. Sustainability 13(16):8921. https://doi.org/10.3390/su13168921
17. Ismail L, Zeadally S (2021) Healthcare insurance frauds: taxonomy and blockchain-based
detection framework (Block-HI). IT Professional 23(4):36–43. https://doi.org/10.1109/MITP.
2021.3071534
18. Leka E, Lamani L, Selimi B, Deçolli E (2019) Design and implementation of smart contract:
a use case for geo-spatial data sharing. In: 42nd International convention on information and
communication technology, electronics and microelectronics (MIPRO), Opatija, Croatia
19. Tschorsch F, Scheuermann B (2016) Bitcoin and beyond: a technical survey on decentralized
digital currencies. IEEE Commun Surv Tutor, pp 2084–2123
20. News: Information, 20 May 2019. [Online]. Available: https://www.indiatoday.in/information/story/death-certificateprocedure-india-delhi-1529341-2019-05-20
21. Grima S, Spiteri J, Romānova I (2020) A STEEP framework analysis of the key factors
impacting the use of blockchain technology in the insurance industry. Geneva Pap Risk Insur
Issues Pract 45:398–425. https://doi.org/10.1057/s41288-020-00162-x
22. hyperledger-fabric, 2019. [Online]. Available: https://hyperledger-fabric.readthedocs.io/en/release-1.4
23. Welcome to Hyperledger Composer, [Online]. Available: https://hyperledger.github.io/composer/latest
Blockchain for IoT: A Comprehensive
Review for Precision Agricultural
Networks
Abstract The Internet of Things (IoT) has gained popularity in recent years due to the services it offers and its wide usage in science and technology, such as smart agriculture and smart home devices. We now live in a world surrounded by smart gadgets and utilize them almost daily. As usage increases, the amount of data being disseminated also increases, exposing a lot of data to threats. So, it must be ensured that the channels through which data is transferred deliver it securely to the endpoint without compromising integrity, confidentiality, and authentication, ensuring that the data received has not been modified or tampered with. In recent years, an important technique called "blockchain" has been developed, which can help in improving security, trust, speed, visibility, immutability, and traceability. Motivated by these facts, this work explores the potential of blockchain and the efficacy of using it to deal with the growing security and performance challenges of IoT in precision agriculture.
1 Introduction
We live in a world surrounded by smart devices and gadgets. One of the revolutionary innovations of the twentieth century is the Internet of Things (IoT). IoT is an
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_60
area of active research and is playing an important role in advancing the technical sector. The Internet of Things is defined as a new technology paradigm envisioned as a global network of machines and devices capable of communicating with each other [1]. It is also a smart network that connects everything to the internet for the purpose of exchanging data and interacting via information-detecting devices such as sensors and gateways [1]. These devices are resource constrained and have very little power and storage capacity [2]. The term IoT was coined by Ashton in 1998; since then, significant progress has been made in the IoT domain [3]. The number of smart connected devices is increasing every year; at this rate of growth, the number of connected devices is forecast to reach 23.5 billion by 2029 [4]. IoT offers a plethora of applications in science and technology: smart industry, smart logistics, smart agriculture, intelligent transportation, smart grid [5], smart environmental protection, smart safety, smart medical devices, smart wearables, and home gadgets are some of the use cases [6, 7].
Despite the execution of numerous approaches to secure IoT devices, new types of attacks will continually emerge due to variances in security standards. Even though IoT devices may be secure on their own, they become vulnerable to a variety of threats when linked to an insecure network, raising challenges such as device authentication, DoS/DDoS attacks [8], intrusion detection, and malware detection [9]. As per studies, 70% of smart devices are vulnerable to cyberattacks. Furthermore, attacks such as Mirai (2016), Persirai (2017), and BrickerBot (2017) have exploited IoT device security in the past [10]. Today, a large number of devices communicate with one another on a daily basis, resulting in a large amount of data transfer. Many crucial IoT systems such as the Internet of Drones (IoD) and the Internet of Robotic Things (IoRT) are used in surveillance and warfare activities [11]. As a result, security and privacy breaches can occur while transferring important data, which is a major concern.
In such a scenario, blockchain has quickly emerged as an important technology in recent years. Satoshi Nakamoto is widely regarded as the inventor of blockchain; he also introduced the electronic cash currency "Bitcoin", which sparked interest in blockchain research [12]. Blockchain is a public, trusted, shared, and immutable ledger of all transactions; it has decentralized storage and offers high transparency and security [10]. Blocks are formed from transactions and linked together in a chain using the hashes of adjacent blocks, and these records are protected from tampering and change by a variety of cryptographic techniques [12]. Blockchain has been integrated with a range of domains since its inception to handle a variety of problems and provide better solutions to existing ones; one such domain is the Internet of Things. In the realm of IoT, blockchain has the ability to tackle issues of privacy, security, and lack of trust. Thus, the use of blockchain in the IoT domain has given rise to a new domain known as the blockchain Internet of Things (BIoT) [13].
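To make the hash-linking concrete, here is a minimal sketch in JavaScript using Node's built-in crypto module; it is illustrative only and not tied to any particular blockchain platform.

```javascript
const crypto = require('crypto');

// Hash the serialized block so any change to its contents changes the digest.
const hashBlock = (block) =>
  crypto.createHash('sha256').update(JSON.stringify(block)).digest('hex');

const chain = [{ index: 0, transactions: [], prevHash: '0' }]; // genesis block

// Each new block stores the hash of its predecessor, so tampering with any
// earlier block breaks every later link in the chain.
function appendBlock(transactions) {
  const prev = chain[chain.length - 1];
  chain.push({ index: chain.length, transactions, prevHash: hashBlock(prev) });
}

appendBlock(['tx1', 'tx2']);
```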
The organization of the paper is as follows: Sect. 1 consists of the introduction, Sect. 2 of related work, and Sect. 3 of blockchain and its requirements. Sections 4 and 5 cover the IoT security challenges in precision agriculture. Lastly, Sect. 6 depicts the IoT performance challenges (Fig. 1).
2 Literature Review
Elhoseny et al. [14] presented a hybrid security model in healthcare services for securing diagnostic text data in medical images during transmission. The model was developed using 2D Discrete Wavelet Transformation 1 Level (2D-DWT-1L) steganography integrated with a hybrid encryption scheme consisting of both the AES and RSA algorithms. The model was applied to two datasets, DME and DICOM. The results exhibited a PSNR value of 57.02 and an MSE value of 0.1288, revealing higher performance than existing models. In another work, Khari et al. [15] focused on the security of data in IoT devices using cryptography and steganography for secure data transmission, via an Elliptic Galois Cryptography (EGC) protocol that encrypts data and then embeds it into low-complexity images using steganography. The proposed EGC showed 86% efficiency when compared to existing techniques. Aggarwal et al. [16] proposed a model for the Internet of Drones that provides secure communication, data collection, and transmission among drones and users by utilizing a public blockchain on the Ethereum platform, which includes selection of a forger node, creation and validation of blocks, and application of a proof-of-stake mechanism. The results revealed that the proposed model covers all security aspects, such as authentication, authorization, accountability (AAA), data integrity (DI), identity anonymity (IA), and verification and validation (VV), making the system more scalable, reliable, and superior in terms of computation cost and time. Another work, by Nikooghadam et al. [17], proposed a safe and lightweight authentication and key management protocol for IoT-based WSNs to provide a secure communication link between users and sensor nodes; forward verification was done with the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool. The results revealed better communication and storage costs compared to existing protocols.
Truong et al. [18] proposed a framework named SASH that integrates an IoT platform with blockchain, providing a plethora of advantages including security for data transmission among IoT devices. SASH essentially utilizes blockchain technology and consists of a data marketplace, two sharing schemes, and prefix encryption; Firmware and Hyperledger have been used as platforms to implement it. The proposed work showed a moderate overhead. In another work, Karati et al. [19] proposed a new generalized certificateless signcryption (gCLSC) technique to assure secure communication of data through IoT devices. It functions as both a digital signature and encryption and can be adopted where authenticity, confidentiality, and light weight are instrumental factors. The performance metrics revealed that the computational cost was far better, and gCLSC also required minimal storage, almost 50% less than CLSC.

Seyfollahi et al. [20] proposed reliable data dissemination for IoT using the Harris Hawks Optimization algorithm. The mechanism is equipped with a fuzzy hierarchical network model for Wireless Sensor Networks (WSNs) to provide reliable and secure data aggregation. The results showed that RDDI improved energy consumption, packet forwarding distance, and packet delivery ratio by 3.12%, 17.5%, and 43.5%, respectively, when compared to other methods. Gochhayat et al. [21] proposed a novel distributed key management scheme for IoT devices which provides security and privacy for user-sensitive data. The method delegates resource-consuming cryptographic operations to a local entity, which coordinates with other peer entities to provide an authentication mechanism. The scheme also exploits the merits of mobile agents by deploying them in subnetworks when required. The results revealed that it reduced communication overhead.
Li et al. [22] proposed a node-oriented secure data transmission (NOSDT) algorithm for securing transmission in IoT-based social networks, where nodes are highly vulnerable to attacks by malicious nodes. The scheme analyzes the behavior of malicious nodes and provides secure data transmission by selecting new, reliable nodes; an influence model is designed to reduce transmission through malicious nodes. Upon evaluation, NOSDT improved the performance of social networks by reducing the transmission impact of malicious nodes. In another work, Khan et al. [23] introduced a blockchain-based solution for secure and private IoT communication in smart home networks. The blockchain-based smart home architecture is empowered with a Deep Extreme Learning Machine (DELM), a fast-learning feedforward neural network. The proposed system uses a backpropagation approach during the learning phase, with the network adjusting its weights to achieve accuracy; the approach achieved an accuracy of 93.91%, outperforming other algorithms.

Mahdi et al. [24] proposed a new stream cipher procedure called Super ChaCha, built over standard ChaCha, for securing data communication in IoT devices. The procedure modifies the rotation and alters the input from column form to diagonal, zigzag, and alternate forms. The results showed that Super ChaCha successfully passed all five benchmarking and NIST tests; with a small increase in time, memory, and power and a small drop in throughput, the complexity level rises considerably, and a brute-force attack on Super ChaCha requires 2^512 likely keys. Also, Pampapathi et al. [25] proposed efficient and reliable data distribution and secure data transmission in IoT using an improved adaptive neuro fuzzy inference system (IANFIS) and modified elliptical curve cryptography (MECC). The proposed work consists of methods for registration and data communication, sending, data brokering, security analysis, and local calculations. On comparing ANFIS with IANFIS, the performance of IANFIS was better; on the other hand, MECC was 96% better at security when compared to existing ECC.
Manogaran et al. [26] proposed a blockchain-assisted secure data sharing (BSDS) model for industrial IoT. The goal is to maximize the response rate by reducing false alarm progression (FAP). The model is in charge of data capture and dissemination security, both inbound and outbound. BSDS was found to achieve a 5.67% higher response rate with an FAP of 4% and to reduce the failure rate by 2.14% with an FAP of 2%. Further, it achieved 3.12% less FAP at a response rate of 0.95, maximized the response rate by 6.63% at a failure rate of 0.06, and reduced delay by 11.91% with an FAP of 5.2%. In another work, Naresh et al. [27] proposed an identity-based online/offline signcryption (OOSC) scheme suitable for secure message communication among IoT devices, gateways, and servers. The approach is divided into two phases, online and offline, with the offline phase performing the heavy mathematical computations and the online phase performing minor ones. The experimental results revealed that the proposed scheme takes less computational time and provides security against IND-CCA2.
Miao et al. [28] proposed a federated learning-based secure data sharing mechanism for IoT, named FL2S. A federated learning (FL) framework is developed based on sensitive task decomposition, and deep reinforcement learning (DRL) technology is utilized to improve data sharing quality. The results revealed that FL2S achieved better privacy protection and data quality in secure data sharing. Table 1 shows the various techniques used for securing data in IoT devices, along with results and challenges.

From the above analysis, it is clear that many of the techniques utilized are based on cryptography, which can potentially be breached in IoT networks. So, it is clear that a robust solution such as blockchain is needed for the several issues described in Sect. 4.
Table 1 Security techniques used for securing data communication in different IoT domains in recent years, with results and challenges

Author | Year | Technique/method | Result/challenges | Domain
Elhoseny et al. [14] | 2018 | 2D-DWT-1L steganography integrated with AES and RSA | Better performance in hiding confidential data in the transmitted cover image and securing it | Healthcare-based IoT
Khari et al. [15] | 2019 | Elliptic Galois Cryptography | EGC showed 86% better efficiency | IoT
Aggarwal et al. [16] | 2019 | Public blockchain on Ethereum platform | Better computational cost and time; can also be performed on a private blockchain | Internet of Drones (IoD)
Nikooghadam et al. [17] | 2019 | Authentication and key management protocol (AVISPA tool) | Better communication and storage cost; no machine-to-machine security protocol in industrial IoT | IoT
Truong et al. [18] | 2019 | SASH, integrating blockchain with an IoT platform | Showed moderate overhead; SASH is yet to be implemented in global network policies | IoT
Karati et al. [19] | 2019 | gCLSC cryptography, certificateless signcryption | Minimal storage, i.e., 50% less compared with CLSC; can be enhanced by incorporating revocation and efficiently discarding the bilinear pairing | IoT
Seyfollahi et al. [20] | 2020 | Reliable Data Dissemination for the Internet of Things (RDDI) using Harris Hawks Optimization (HHO) | Improved energy consumption, packet forwarding distance, and packet delivery ratio by 3.12%, 17.5%, and 43.5%, respectively; yet to be evaluated against jamming attacks | IoT
Gochhayat et al. [21] | 2020 | Delegating resource-consuming cryptographic operations to a local entity | Reduced communication overhead; generation of extra certificates | IoT
Li et al. [22] | 2020 | Node-oriented secure data transmission (NOSDT) | Improved performance of social networks by reducing the transmission impact of malicious nodes; more work can be done on the forwarding path and on better methods for detecting malicious nodes | Social network in IoT
Khan et al. [23] | 2020 | Blockchain empowered with Deep Extreme Learning Machine (DELM) | Accuracy = 93.91%; exploring extensions through further datasets and architectures | Smart home IoT
Mahdi et al. [24] | 2021 | Super ChaCha | Increased complexity; a brute-force attack requires 2^512 keys to break it | IoT
Pampapathi et al. [25] | 2021 | Improved Adaptive Neuro Fuzzy Inference System (IANFIS) and modified elliptical curve cryptography (MECC) | Better performance than ANFIS; also, MECC is 96% better compared with ECC | IoT
Manogaran et al. [26] | 2021 | Blockchain-assisted secure data sharing (BSDS) | 5.67% higher response rate with FAP of 4%, and failure rate reduced by 2.14% with FAP of 2% | Industrial IoT
Naresh et al. [27] | 2021 | Identity-based online/offline signcryption scheme (OOSC) | Less computational time and secure against IND-CCA2; yet to be implemented in healthcare monitoring and industrial systems | IoT
Miao et al. [28] | 2021 | Federated Learning-based Secure Data Sharing (FL2S) | Better privacy protection and data quality in secure data sharing | IoT
3 Blockchain and Its Requirements

Blockchain is a distributed ledger that ensures there is one and only one version of the truth, agreed upon by all nodes of the blockchain [29]. A visualization of blockchain is given in Fig. 2.
Blockchain technology comprises several components on which the blockchain process depends and operates. There are protocols that must be followed for creating, adding, and storing data in blocks. Some of the major requirements of blockchain are:
• Smart contracts: A smart contract is a program stored on the blockchain that executes when certain conditions are met, making transactions transparent and irrevocable.
• Tokenization: It enables the digital representation of rights, goods, and services, letting users exchange value and trust without engaging any centralized authority.
• Data security: Data security is an important and necessary aspect of blockchain technology, protecting data from tampering.
• Decentralized data storage: It is a prerequisite for a distributed or scattered system.
• Immutability: In a distributed ledger, the stored transactions/records cannot be modified. This ensures that data is safe and secure.
• Consensus: It makes sure that every new block is integrated into the blockchain only when all the legitimate users in the network agree.
• Typed blocks: These are necessary for smart contracts as well as high-speed business transactions, so formatting differs for the different kinds of blocks, covering time, consensus algorithm, transactions per block, and the data types of the content.
• Sharding: It is required for dividing the content over subsets of nodes so that there is less burden on each node.
4 IoT Security Challenges in Precision Agriculture

IoT security is vital for the smooth functioning of the system, and several security parameters need to be taken into consideration. Hence, to make an IoT network more secure and reliable, we must meet the following security challenges [30]:
• Confidentiality: It essentially means hiding information from people who are not authorized to read it. IoT collects various kinds of sensitive information from devices such as medical healthcare equipment, heart rate monitors, and temperature and pressure detectors, and the important details from these devices can be traced. So, every bit of information from sensor devices must be transmitted confidentially.
• Integrity: It is defined as preventing the modification of data by unauthorized persons during the communication process. In IoT, a single changed data bit could potentially invert the entire meaning from NO to YES.
• Availability: The services running in the system should be reachable at any time by any authenticated entity. In a smart home, for example, actions such as turning off the smart lights via remote control (RC) should be performable at any time at the user's request [31].
• Trust and privacy: These are important aspects of IoT security. In IoT, trust management consists of reliable data gathering, reliable data fusion and mining, and enhanced user privacy. For
instance, the TrustChain protocol has been proposed for validating and administering trustworthy transactions in distributed IoT networks while retaining integrity. Each block in TrustChain records a transaction between two IoT participants, and the hash codes of earlier transactions are used to create new transactions. Aside from security, the fundamental benefit of TrustChain is that every agent in the system monitors the interactions of others and gathers records to calculate trust levels. In addition, blockchain can provide control functions for trustworthy, decentralized transactions as well, and it enables remote asset management and rapid end-to-end data verification among IoT devices.
• IoT transaction verification: The blockchain network has the potential to play a significant role in the authentication as well as authorization of IoT systems. All IoT transactions made by devices are recorded on the distributed or shared ledger and may be monitored and traced safely using blockchain. The valid sender, who has a unique private key (PK) and GUID, always cryptographically confirms each IoT transaction communicated to the blockchain system; as a result, it becomes easier to confirm the authentication and integrity of the triggered transaction. Blockstack is a popular blockchain approach that makes use of JSON Web Tokens (JWT) to easily authenticate IoT transactions; one application of Blockstack is authenticating access in a smart greenhouse.
• Securing IoT communications: Many classical protocols such as DTLS/TLS have limitations, particularly in computation time or memory needs. Furthermore, under the common PKI protocol, these methods have issues with centralized governance and control of key production and distribution. By giving each device a unique GUID and key pair once it is installed and connected to the blockchain network, blockchain could solve these challenges and improve key management among IoT devices. With the aid of blockchain, additional secure communication enhancements can be imagined, such as eliminating the handshake phase of the DTLS or TLS protocols used to exchange PKI certificates. Blockchain would thus be an ideal solution for creating secure interactions between IoT devices while covering runtime computation and memory management needs. Furthermore, IoT device firmware can be continuously hashed into a blockchain to identify IoT malware and alert device owners to take security actions against the identified malicious bot. The transmitter node hashes a message to be sent to another IoT node and stores the hash code in the blockchain network; the recipient node hashes the identical message. The verification protocol states that if the hash value of the received message matches the hash value on the blockchain, the message has not been modified or tampered with during transmission [34].
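As a minimal illustration of this check, the sketch below recomputes the digest on the receiving side and compares it with the digest assumed to have been read back from the ledger; the message contents are invented for the example.

```javascript
const crypto = require('crypto');

const sha256 = (msg) => crypto.createHash('sha256').update(msg).digest('hex');

// Sender side: hash the message and store the digest on the blockchain
// (the on-chain write itself is outside the scope of this sketch).
const sent = 'soil-moisture:0.31';
const onChainHash = sha256(sent);

// Receiver side: hash the received message and compare with the on-chain digest.
const received = 'soil-moisture:0.31';
console.log(sha256(received) === onChainHash); // true => not tampered in transit
```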
6 IoT Performance Challenges in Precision Agriculture Networks

Because of the rising number of linked IoT gadgets in precision agricultural networks, IoT systems will need to coordinate a large variety of network topologies in the future and evaluate massive amounts of data at a rapid rate. As a result, the performance of IoT networks in precision agriculture systems poses some further major challenges. Five obstacles can be characterized as IoT performance issues in precision agriculture networks, as shown in Fig. 3; blockchain technology can help with these issues as well.
• Blockchain and the sensing problem: This issue mainly arises in the perception layer of the IoT model. Many agricultural apparatuses such as tractors, irrigation machines, smart greenhouses, and farming devices have embedded sensors, which constantly generate data about operating status and allow IoT nodes to send and receive data via the IoT cloud. In this scenario, blockchain can be utilized to define communication protocols among these sensors and to keep track of all M2M transactions. For example, IOTA, a newer evolution of the blockchain platform, is designed to perform large volumes of IoT transactions using a directed acyclic graph (DAG), and one application of IOTA is eliminating the scalability issue in precision agriculture.
• Blockchain and the energy consumption problem: This issue relates mainly to the network layer in the IoT layer paradigm. Generally, IoT devices are resource constrained, i.e., they are expected to be low-power devices. As IoT gadgets in precision agriculture become more widespread, wireless devices have to be used, which consume much more energy than wired ones. However, because of the decentralized nature of blockchain, certain solutions to the energy consumption challenge may become possible: a private blockchain, for example, could be used to ensure that the ratio of high compute power to high-bandwidth connection for the IoT node is maintained. Blockchain will help in this regard as well.
7 Conclusion
The IoT network is growing at a rapid pace in terms of its size and its wide implementation in different sectors of society. Usage has become enormous, and a large amount of data is being transferred, which has made data more vulnerable to threats. The need of the hour is securing IoT devices to make communication safer and more reliable. Blockchain has an instrumental role in securing IoT device communication: it is an immutable, distributed, transparent, and traceable ledger providing enormous benefits. In this paper, we gave an overview of blockchain by defining it conceptually along with its required parameters. Choosing blockchain over traditional methods has several benefits. Some IoT issues were also discussed, such as address-space management, object identification, and transaction verification. Blockchain also plays a crucial role in providing solutions to IoT performance challenges in precision agriculture, such as sensing, network complexity, energy consumption, limited data storage, bandwidth, and latency issues. In future work, a reliable precision agricultural infrastructure framework can be developed, and the implementation of a blockchain-based technique to secure data dissemination in smart homes will be carried out.
References
1. Lee I, Lee K (2015) The Internet of Things (IoT): Applications, investments, and challenges
for enterprises. Bus Horiz 58(4):431–440. https://doi.org/10.1016/j.bushor.2015.03.008
2. Bodkhe U, Tanwar S (2021) Secure data dissemination techniques for IoT applications: research
challenges and opportunities. Softw Pract Exp 51(12):2469–2491. https://doi.org/10.1002/spe.
2811
3. Bhuvaneswari V, Porkodi R (2014) The internet of things (IOT) applications and communica-
tion enabling technology standards: an overview. In: Proceedings—2014 international confer-
ence on intelligent computing applications, ICICA 2014, pp 324–329. https://doi.org/10.1109/
ICICA.2014.73
4. Estimated data on number of IoT devices connected worldwide. https://www.statista.com/sta
tistics/1183457/IoT-connected-devices-worldwide/. Accessed 17 Jan 2022
5. Anand P, Singh Y, Selwal A, Singh PK, Felseghi RA, Raboaca MS (2020) IoVT: internet of
vulnerable things? threat architecture, attack surfaces, and vulnerabilities in internet of things
and its applications towards smart grids. Energies (Basel) 13(18). https://doi.org/10.3390/en1
3184813
6. Chen S, Xu H, Liu D, Hu B, Wang H (2014) A vision of IoT: applications, challenges, and
opportunities with China perspective. IEEE Internet of Things J 1(4): 349–359. Institute of
Electrical and Electronics Engineers Inc. https://doi.org/10.1109/JIOT.2014.2337336
7. Real world examples of IoT. https://www.edureka.co/blog/iot-applications/. Accessed 17 Jan
2022
8. Malhotra P, Singh Y, Anand P, Bangotra DK, Singh PK, Hong WC (2021) Internet of things:
evolution, concerns and security challenges. Sensors 21(5):1–35. https://doi.org/10.3390/s21
051809
9. Wu H, Han H, Wang X, Sun S (2020) Research on artificial intelligence enhancing internet of
things security: a survey. IEEE Access 8: 153826–153848. Institute of Electrical and Electronics
Engineers Inc. https://doi.org/10.1109/ACCESS.2020.3018170
10. Anand P, Singh Y, Selwal A, Alazab M, Tanwar S, Kumar N (2020) IoT vulnerability assess-
ment for sustainable computing: Threats, current solutions, and open challenges. IEEE Access
8:168825–168853. https://doi.org/10.1109/ACCESS.2020.3022842
11. Shah Y, Sengupta S (2020) A survey on classification of cyber-attacks on IoT and IIoT devices.
In: 2020 11th IEEE annual ubiquitous computing, electronics and mobile communication
conference, UEMCON 2020, pp 0406–0413. https://doi.org/10.1109/UEMCON51285.2020.
9298138
12. Kamran M, Khan HU, Nisar W, Farooq M, Rehman SU (2020) Blockchain and Internet of
Things: a bibliometric study. Comput Electr Eng 81. https://doi.org/10.1016/j.compeleceng.
2019.106525
13. Bhushan B, Sahoo C, Sinha P, Khamparia A (2021) Unification of Blockchain and Internet of
Things (BIoT): requirements, working model, challenges and future directions. Wireless Netw
27(1):55–90. https://doi.org/10.1007/s11276-020-02445-6
14. Elhoseny M, Ramírez-González G, Abu-Elnasr OM, Shawkat SA, Arunkumar N, Farouk A
(2018) Secure medical data transmission model for IoT-based healthcare systems. IEEE Access
6:20596–20608. https://doi.org/10.1109/ACCESS.2018.2817615
15. Khari M, Garg AK, Gandomi AH, Gupta R, Patan R, Balusamy B (2020) Securing data in
Internet of Things (IoT) using cryptography and steganography techniques. IEEE Trans Syst
Man Cybern Syst 50(1):73–80. https://doi.org/10.1109/TSMC.2019.2903785
16. Aggarwal S, Shojafar M, Kumar N, Conti M (2019) A new secure data dissemination model
in internet of drones. In: ICC 2019 - 2019 IEEE International Conference on Communications
(ICC), Shanghai, China, pp 1–6. https://doi.org/10.1109/ICC.2019.8761372
17. Ostad-Sharif A, Arshad H, Nikooghadam M, Abbasinezhad-Mood D (2019) Three party secure
data transmission in IoT networks through design of a lightweight authenticated key agreement
scheme. Fut Gener Comput Syst 100:882–892. https://doi.org/10.1016/j.future.2019.04.019
18. Truong HTT, Almeida M, Karame G, Soriente C (2019) Towards secure and decentralized
sharing of IoT data. In: Proceedings - 2019 2nd IEEE International Conference on Blockchain,
Blockchain 2019, pp 176–183. https://doi.org/10.1109/Blockchain.2019.00031
19. Karati A, Fan CI, Hsu RH (2019) Provably secure and generalized signcryption with public veri-
fiability for secure data transmission between resource-constrained IoT devices. IEEE Internet
Things J 6(6):10431–10440. https://doi.org/10.1109/JIOT.2019.2939204
20. Seyfollahi A, Ghaffari A (2020) Reliable data dissemination for the Internet of Things using
Harris hawks optimization. Peer-to-Peer Netw Appl 13(6):1886–1902. https://doi.org/10.1007/
s12083-020-00933-2
21. Gochhayat SP et al (2020) Reliable and secure data transfer in IoT networks. Wireless Netw
26(8):5689–5702. https://doi.org/10.1007/s11276-019-02036-0
22. Li X, Wu J (2020) Node-oriented secure data transmission algorithm based on IoT system
in social networks. IEEE Commun Lett 24(12):2898–2902. https://doi.org/10.1109/LCOMM.
2020.3017889
23. Khan MA et al (2021) A machine learning approach for blockchain-based smart home networks
security. IEEE Netw 35(3):223–229. https://doi.org/10.1109/MNET.011.2000514
24. Mahdi MS, Hassan NF, Abdul-Majeed GH (2021) An improved chacha algorithm for securing
data on IoT devices. SN Appl Sci 3(4). https://doi.org/10.1007/s42452-021-04425-7
25. Pampapathi BM, Nageswara Guptha M, Hema MS (2021) Data distribution and secure data
transmission using IANFIS and MECC in IoT. J Ambient Intell Human Comput. https://doi.
org/10.1007/s12652-020-02792-4
26. Manogaran G, Alazab M, Shakeel PM, Hsu CH (2021) Blockchain assisted secure data sharing
model for internet of things based smart industries. IEEE Trans Reliab. https://doi.org/10.1109/
TR.2020.3047833
27. Naresh VS, Reddi S, Kumari S, Divakar Allavarpu VVL, Kumar S, Yang MH (2021) Practical
identity based online/off-line signcryption scheme for secure communication in internet of
things. IEEE Access 9:21267–21278. https://doi.org/10.1109/ACCESS.2021.3055148
28. Miao Q, Lin H, Wang X, Hassan MM (2021) Federated deep reinforcement learning based
secure data sharing for Internet of Things. Comput Netw 197. https://doi.org/10.1016/j.com
net.2021.108327
29. Bodkhe U et al (2020) Blockchain for Industry 4.0: a comprehensive review. IEEE Access
8:79764–79800. https://doi.org/10.1109/ACCESS.2020.2988579
30. Malik R, Singh Y, Sheikh ZA, Anand P, Singh PK, Workneh TC (2022) An improved deep
belief network IDS on IoT-based network for traffic systems. J Adv Transp 2022:1–17. https://
doi.org/10.1155/2022/7892130
31. Anand P, Singh Y, Selwal A, Singh PK, Ghafoor KZ (2021) IVQFIoT: intelligent vulnerability
quantification framework for scoring internet of things vulnerabilities. Expert Syst. https://doi.
org/10.1111/exsy.12829
32. Anand P, Singh Y, Selwal A (2021) Internet of things (IoT): vulnerabilities and remediation
strategies. In: Singh PK, Singh Y, Kolekar MH, Kar AK, Chhabra JK, Sen A (eds) Recent inno-
vations in computing. ICRIC 2020. Lecture notes in electrical engineering, vol 701. Springer,
Singapore. https://doi.org/10.1007/978-981-15-8297-4_22
33. Patel C, Doshi N (2019) Security challenges in IoT cyber world, pp 171–191. https://doi.org/
10.1007/978-3-030-01560-2_8
34. Torky M, Hassanein AE (2020) Integrating blockchain and the internet of things in precision
agriculture: analysis, opportunities, and challenges. Comput Electron Agric 178. Elsevier B.V.
https://doi.org/10.1016/j.compag.2020.105476
35. Georgiou K, Xavier-De-Souza S, Eder K (2018) The IoT energy challenge: a software perspec-
tive. IEEE Embedded Syst Lett 10(3):53–56. Institute of Electrical and Electronics Engineers
Inc. https://doi.org/10.1109/LES.2017.2741419
The Proof of Authority Consensus
Algorithm for IIoT Security
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
Y. Singh et al. (eds.), Proceedings of International Conference on Recent Innovations
in Computing, Lecture Notes in Electrical Engineering 1011,
https://doi.org/10.1007/978-981-99-0601-7_61
1 Introduction
Machine-to-machine (M2M) connections are predicted to increase from 5.6 billion in 2016 to 27 billion by 2024 [1]. This increase shows that the Internet of Things (IoT) is an important emerging market that will be a cornerstone of the growing digital economy. The gadgets will be connected to the Internet and to local devices while also communicating with other devices on the Internet, so the security and privacy of IoT applications play a major role. Regions such as Western Europe, North America, and China are the primary drivers [1]. Upcoming IoT applications cannot meet high demand, and may lose all of their potential, without a trusted and interoperable IoT ecosystem. The Internet of Things has its own unique security challenges, including issues of privacy, authentication, management, and information storage. The immutable and tamperproof data security characteristics of blockchain make it a significant tool in a variety of fields, including health care, military, banking, and networking.
The Industrial Internet of Things (IIoT) is a subcategory of the Internet of Things (IoT). IIoT networks serve as a platform for a variety of applications and enable us to respond to customer needs, particularly in industrial settings like smart factories [2]. Owing to the benefits of blockchain technology, it is widely used in smart factories, smart homes, smart cities, and healthcare systems [2, 3].

Numerous items in current smart factories are connected to public networks, and smart systems such as temperature monitoring systems, Internet-enabled lights, IP cameras, and IP phones assist many activities. These gadgets store personal and sensitive information and provide life-saving services [2, 4]. The key difficulty is securely storing, collecting, and sharing data. Through rigorous authentication, blockchain technology assures data integrity in smart factories, as well as the availability of communication backbones. Users' data privacy should be protected during transmission, consumption, and storage in smart factories [5].

Stored data is vulnerable to tampering by fraudsters intending to access, edit, or use it for harmful purposes. Such attacks can be classified as anomalous events, as they deviate significantly from normal behavior [3, 6].

The major goal of this article is to identify suspicious parties and transactions in a blockchain-based IIoT network targeted at smart factories; abnormal conduct can also be used as a proxy for suspicious behavior [5].

The rest of the paper is organized as follows: Sect. 2 discusses related work in this area, Sect. 3 discusses foundations and methodology, Sect. 4 discusses challenges in IIoT, and Sect. 5 discusses blockchain to address challenges in IIoT, with conclusions and references at the end.
2 Related Works
The smart factory has become the center of interest as the IIoT develops. Research on securing access to industrial data in the smart factory is crucial, and it relies heavily on developing technologies like AI and machine learning [7]. There are basically three categories of related study, and various surveys on IoT security and privacy issues are already available.

Yuchen and colleagues compiled a list of security concerns in IoT applications. The authors of [8] discussed the security concerns of location-based services, focusing on the specific issues of IoT device localization and positioning. In [9], Anne et al. focus on IoT middleware security challenges and present a full assessment of related protocols and security issues. M. Guizani et al. examined several trust management strategies for IoT, as well as their benefits and drawbacks, in [10].

Jung et al. [10] proposed a smart factory web based on the IIoT concept. Shin et al. [11] merged intelligent computing technology for gathering information using ICT technology in industry and presented a network topology based on edge computing to satisfy the real-time needs of IIoT. Domova et al. [12] proposed a set of data mining alarm algorithms for alert processing in smart factories.

Lee et al. [13] presented the re-industrial architecture to boost intelligent production, sharing, and equipment management in smart factories, as well as to construct a reliable cloud platform. The use of blockchain as a trusted entity to ensure the security of IIoT has become widespread. Wan et al. [14] built and reshaped a distributed network using traditional IIoT architecture and blockchain, exploiting the flexibility and security capabilities of blockchain to create a consensus-based credit mechanism and a credit-based proof-of-work mechanism for dealing with IIoT devices.
3 Methodology
Blockchain and the Internet of Things are significant technologies that will have a huge impact on the IT and telecommunication industries. These technologies are focused on increasing transparency, visibility, comfort, and trust. IIoT devices collect real-time data from sensors, and blockchain, a distributed, decentralized, and shared ledger, is the key to data security [15]. The entries in a blockchain are time-stamped and chronologically arranged, and by means of cryptographic hash keys each entry is firmly tied to the prior entries. We discuss here a foundational consensus algorithm, called Proof of Authority, to be used for security in smart factories.
Proof of Authority (PoA) is a new family of consensus algorithms with high performance and fault tolerance. Blockchain platforms can be classed as permissionless, or as permissioned and restricted to a specific number of authority nodes. The PoA protocol is a novel BFT-style algorithm that permits only approved nodes to join consortiums and submit transactions [16].

Regarding the scalability and performance differences between Byzantine Fault Tolerant (BFT)-like algorithms and PoA algorithms, PoA algorithms require fewer message exchanges and therefore give greater performance, and they can also be deployed at a wider scale than standard BFT methods [15]. The crucial fact is that the PoA protocol can work normally with adversaries accounting for up to half of the total players, whereas BFT tolerates only a third [16]. However, the actual worst-case performance of the PoA protocol still lacks comprehensive examination.
PoA algorithm’s consensus mechanism is reliant on group of trusted nodes known
as authorities. Every authorized node has a unique ID, and at least N/2 + 1 of them
must be trusted for the network. The nodes demonstrating their authority in PoA
are given the rights of generating new blocks. Validators are the nodes that execute
the software that allows them to insert transactions into blocks. The process does
not require validators to continually monitor the systems, but it does necessitate
keeping the machine secure. Proof of Authority is suitable for private as well as
public networks where trust is spread, such as the POA Network. Because the PoA
consensus mechanism relies on the value of identities, block validators stake their
own reputation rather than coins. The faith in the identities chosen secures PoA.
PoA consensus may vary depending on the implementation; however, they are
commonly applied under the following conditions:
1. Validators must verify their true identity: a candidate must be prepared to
invest money and risk his or her reputation. A rigorous approach decreases the
danger of choosing shady validators and encourages long-term commitment to
the blockchain.
2. Validators must be chosen in the same way for all applicants: to keep the
blockchain's integrity, validators' identities must be validated.
PoA consensus has the following benefits:
1. Risk tolerance is high as long as at least 51% of the nodes are not malicious.
2. The duration between fresh blocks being generated is predictable.
3. Transaction volume is high.
4. Far more sustainable than computationally intensive algorithms like Proof of
Work.
Permissionless (e.g. Bitcoin, Ethereum) and permissioned consensus techniques can
be found on blockchain platforms such as Apla or Ethereum. The nodes in a permis-
sioned blockchain are pre-authenticated. This enables the employment of consensus
types that, in addition to other benefits, give a higher transaction rate.
Only validating nodes have the ability to generate new blocks. The blockchain
registry keeps track of validating nodes. The order of generating new blocks is
determined by the order in which the nodes appear in this list.
The node that acts as the current leader and generates the new block at the current
moment is determined by the formula below:

leader = ((time − first) / step) mod nodes

where
leader — current leader node, time — current time,
first — generation time of the first block,
step — number of seconds in a generation interval, nodes — number of current nodes.
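As a rough illustration, the round-robin schedule implied by this formula can be
computed in a few lines. The sketch below is illustrative Python, not part of any
particular PoA client; the function and parameter names are our own.

```python
import time

def current_leader(first, step, nodes, now=None):
    """Index of the validating node whose turn it is to generate a block,
    following leader = ((time - first) / step) mod nodes."""
    now = time.time() if now is None else now
    interval = int((now - first) // step)  # elapsed generation intervals
    return interval % nodes                # rotate through the validator list

# Example: 4 validators, 5-second intervals, genesis block at t = 0.
# After 23 s, interval 4 has begun, and 4 mod 4 = 0: the first node leads again.
assert current_leader(first=0, step=5, nodes=4, now=23) == 0
```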
• Generation of new blocks:
The current time interval’s leader node generates the new block. The leader’s role is
given to the next validating node from the list during each time interval.
The new block created by the leader node is processed as follows:
1. Accept the new block and verify the following:
i. The new block was created by the current interval's leader node.
ii. This leader node has generated no further blocks.
iii. The block has been produced successfully.
2. Execute the transactions from the block in sequential order, checking that all
transactions are completed appropriately and that block production limits are not
exceeded.
3. Accept or reject the block:
i. If block validation succeeds, add the new block to the blockchain
of the node.
ii. If the validation fails, reject the block and record an error transaction
for the block.
iii. If the validating node that generated the erroneous block continues to do so, it
may be blocked or removed.
iv. A transaction queue is used to collect new transactions.
• Verification of new blocks:
1. Accept the new block and verify the following:
i. The block was created by the current interval's leader node.
ii. The leader node has generated no further blocks, and the block has been
produced successfully.
2. Execute transactions from the block one by one. Check that all transac-
tions are completed appropriately and that block production limits are not
exceeded.
3. Accept or reject the block based on the previous step, as sketched below:
i. If block validation is successful, add the new block to the node's
blockchain.
ii. If block validation fails, reject the block and record a faulty block trans-
action. The validating node that generated this incorrect block may be
blocked or removed if it continues to do so.
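A compact sketch of this accept-or-reject procedure is given below. The Block
structure, the apply_tx callback, and all names are hypothetical stand-ins chosen
for illustration, under the simplifying assumption that signature checks happen
elsewhere.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    producer: int              # index of the node that signed the block
    transactions: list = field(default_factory=list)

def verify_block(block, expected_leader, blocks_this_interval, tx_limit, apply_tx):
    """Return True if the block should be appended to the node's chain."""
    # 1. The block must come from the current interval's leader node,
    #    and that leader must not have produced another block already.
    if block.producer != expected_leader or blocks_this_interval > 0:
        return False
    # 2. Enforce block production limits, then execute transactions in order;
    #    apply_tx returns True only if a transaction completes appropriately.
    if len(block.transactions) > tx_limit:
        return False
    if not all(apply_tx(tx) for tx in block.transactions):
        return False  # the caller records an error transaction for the block
    # 3. Validation succeeded; the caller adds the block to its blockchain.
    return True
```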
4 Challenges in IIoT
Because IIoT devices are low cost and resource constrained, there are a number of
challenges to address.
1. Device privacy: IIoT devices are vulnerable to disclosing personal data.
2. Cost and traffic: network cost and traffic must keep up with the exponential
increase of IIoT devices.
3. Cloud service overload and insufficiency: cloud services become unavailable due
to attacks, software faults, power outages, and other issues.
4. Defective architecture: the components of an IIoT device have a single point of
failure that affects the device and the network.
5. Data manipulation: data from IIoT devices is extracted and then manipulated
before being used in an inappropriate way.
The hash allows only authorized parties to access the data from the cloud. The
probability of storing faulty data from devices is reduced by using blockchain as
a solution.
4. To prevent unwanted access: IIoT systems require regular connectivity between
different nodes. Communication in blockchain is based on public and private
keys. Even if the data is accessed by an unwanted party, the contents will be
incomprehensible because the data is encrypted using these keys. Therefore, the
blockchain data structure addresses the security challenges of IIoT applications.
5. Blockchain-based proxy architecture for resource-constrained devices:
despite the fact that blockchain provides a variety of security characteristics for
a distributed environment, IIoT has a unique resource limitation.
IoT devices cannot keep large ledgers due to their limited resources. Various efforts
have been made to make the usage of blockchain in IIoT easier. One possible
method is a proxy-based architecture, sketched below. Proxy servers can be set
up in the network to store data in an encrypted format. The client can download
the encrypted resources from the proxy servers.
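The proxy idea can be expressed in a few lines. The example below assumes the
third-party Python `cryptography` package and uses an in-memory dictionary as a
stand-in for the proxy server; all names are illustrative, not part of any deployed
system.

```python
# The proxy only ever stores ciphertext; the constrained client downloads
# and decrypts on demand instead of keeping a full ledger.
from cryptography.fernet import Fernet

proxy_store = {}  # stand-in for the proxy server's encrypted storage

def publish(resource_id, payload, key):
    """Device side: encrypt a reading and hand it to the proxy."""
    proxy_store[resource_id] = Fernet(key).encrypt(payload)

def fetch(resource_id, key):
    """Client side: download the ciphertext from the proxy and decrypt it."""
    return Fernet(key).decrypt(proxy_store[resource_id])

key = Fernet.generate_key()            # shared with the client out of band
publish("sensor-42", b"temp=71.3C", key)
assert fetch("sensor-42", key) == b"temp=71.3C"
```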
Since smart factories handle sensitive data, storing all of it on the blockchain is both
monetarily and computationally burdensome. As a result, the smart factory itself
stores the actual smart device and sensor data. The smart factory data also provides
information on the data type and control states.
6 Conclusion
References
1 Introduction
The Internet of Things (IoT) is a rapidly evolving technology that offers a wide
range of services and has a significant influence on society and corporate networks.
IoT has become an important part
of modern social life, with applications in knowledge, industry, and health, where
it stores sensitive data about businesses and individuals, economic transaction data,
product development, and marketing.
As a result of its increased usage in critical systems, IoT has attracted greater atten-
tion from cyber-criminals. The success of IoT cannot be overlooked; nonetheless,
assaults and threats against IoT devices and facilities are increasing by the day.
Cyber-attacks have become a part of IoT, harming users' lives and societies; hence,
proactive measures to guard against cyber-attacks must be taken. Cybercrime is a
global threat to government and business infrastructure, and it can harm people in a
variety of ways. Cybercrime is expected to cost the global economy up to $6 trillion
every year, according to estimates. Cyber-attacks can be enabled by several factors,
including (a) inadequate cyber security in some countries, (b) cyber-criminals using
new technology to attack, and (c) services and other business models that can be
used to commit cybercrime.
As the Internet of Things is evolving at such a rapid pace, it is vital to identify
IoT security limitations, problems, and incursions on IoT infrastructure [1], and
the consequences of these risks and assaults should be extensively investigated.
Criminalization, investigative authorities and processes, digital evidence, threats and
norms, and international monitoring and collaboration are all concepts and mate-
rials covered by transmission methods, local legal systems, and national laws.
The scope (multilateral or regional) and application of these contracts vary [1]. We
can see the rise of technological advancements from the timeline in Fig. 1. These
advancements also grab the attention of malicious users.
Various traditional definitions have been offered for the field of machine learning.
In his founding work, Arthur Samuel defined machine learning as a "field of
study that gives computers the ability to learn without being explicitly
programmed". Depending on the nature of the data labelling, machine learning is
classed as supervised, unsupervised, or semi-supervised. In supervised learning the
output is labelled, and the model predicts an unknown (input, output) mapping using
known (input, output) examples (regression and classification). Only input samples
are given to the system in unsupervised learning (e.g. estimation of the probability
density function and clustering). Semi-supervised learning (e.g. image/text retrieval
systems) is a blend of supervised and unsupervised learning in which a portion of
the data is labelled and that portion is used to understand the unlabelled part [2].
Deep learning is a branch of machine learning that deals with artificial neural
networks (ANN), which are inspired by the structure and function of the brain [3].
Deep learning allows deep networks to be placed in a parameter-space region where
supervised fine-tuning avoids poor local minima. For jobs where a huge amount of
data is available, even if classification is done for a small number of instances, deep
learning approaches obtain very good accuracy [4]. Deep learning has since produced
multiple state-of-the-art successes in fields like speech recognition, image recogni-
tion, and language translation and is now applicable to a wide range of artificial
intelligence (AI) applications. Deep learning, as it is commonly known, is a statistical
technique for categorizing patterns based on sample data utilizing multiple-layer
neural networks. Corporations have spent billions of dollars to recruit deep learning
experts.
Currently, intelligent methods based on machine learning and deep learning have
shown encouraging outcomes in dealing with cyber-attacks. Specifically, deep
learning models possess a greater capability to deal with cyber-attacks considering
the larger flows of data. Also, ML and DL can deal with known as well as unknown
attacks. Artificial intelligence (AI) approaches that deal largely with deductive infer-
ence can be contrasted with machine learning's inductive inference, i.e. generaliza-
tion from a set of observed cases. One of the major applications of machine learning
techniques is ontology learning. Ontologies can be learned from scratch using
machine learning techniques, or they can be enhanced using existing ontologies.
Learning data comes from a variety of sources: linked data, social networks, tags,
and textual data. Another common machine learning application is the learning of
mappings from one ontology to another (for example, using association rules or
similarity-based approaches). A machine learning algorithm is a computational
approach that achieves a target without being expressly programmed (i.e. "hard
coded") to do so. These algorithms are "soft programmed" in the sense that they
automatically adjust or adapt their design as a result of repetition (i.e. experience)
to get better and better at achieving the target objective. Training is the process of
adapting, and it entails supplying samples of input data as well as desired outcomes,
as the sketch below illustrates.
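As a toy illustration of this "soft programming", the sketch below never states the
rule y = 2x + 1 explicitly; the parameters simply adapt through repetition over
(input, desired output) samples. It is a generic gradient-descent example, not a
method from the surveyed papers.

```python
# Samples pair inputs with desired outcomes; the rule itself is never coded.
samples = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b, lr = 0.0, 0.0, 0.01

for _ in range(2000):                 # repetition, i.e. "experience"
    for x, y_true in samples:
        err = (w * x + b) - y_true    # distance from the desired outcome
        w -= lr * err * x             # adjust the design a little
        b -= lr * err

print(round(w, 2), round(b, 2))       # converges to roughly 2.0 and 1.0
```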
Manipulation of the system: one of the most common attacks on machine
learning systems is to force high-volume algorithms to generate incorrect predic-
tions. Researchers created a platform that integrates human–computer interaction,
analytics, gamification, and deception to entice harmful individuals into specific
traps while piquing their curiosity.
When an adversary can feed erroneous data into your model's training pool,
causing it to learn something it shouldn't, this is known as a poisoning attack. The
most typical symptom of a poisoning attack is a shift in the model's decision
boundary. The flaws in DL's privacy protection have also been exposed. Owing to
privacy and security issues, the DL model may be copied or reverse engineered, a
personal data set can be deduced, and even a recognizable face picture of the victim
can be recreated. Furthermore, recent research has discovered that the deep learning
model is susceptible to adversarial samples altered by undetectable noise, causing
the DL model to forecast incorrectly with high confidence.
Many architectural frameworks have been proposed by researchers for CPS, as shown
in Fig. 2. Among them, one well-known architecture is based on three layers, namely
the perception execution layer, application control layer, and data transmission layer.
The perception execution layer is made up of physical components such as sensors
and actuators. The primary responsibility of the application control layer is to provide
services to users. The data transmission layer serves as a link between the application
control layer and the perception execution layer, allowing data to be sent [5]. The
other architectures are four-layer, five-layer, and seven-layer architectures. In the
seven-layer architecture, there are various layers: the physical layer, application layer,
data accumulation layer, connectivity layer, edge computing layer, data abstraction
layer, and collaboration and process layer. The connectivity layer guarantees that data
is transmitted reliably. The analysis and processing of data is the emphasis of the
edge computing layer. Data is stored in a variety of ways in the data abstraction layer
to construct performant applications. The application information is shared with
people and business processes through the collaboration and process layer [6].
Application layer: according to researchers in [7], to accomplish varied smart IoT
systems, the application layer connects IoT with various types of users (individuals
or systems), as well as their specialized requirements.
Data abstraction layer: this layer is aware of the many languages used to express
data where the information is stored. It is thereby able to map network requirements
to appropriate data resources. The layer allows multi-agent system entities
to access knowledge via Java calls regardless of the true information representation
language. The data abstraction layer is made up of many application programming
interfaces (APIs) as well as a new component known as the data accessing layer. The
application programming interfaces are simply a set of Java methods that connect
stored data in one area to the remainder of the network [8].
Collaboration and processes layer: to derive insights, it consumes and shares appli-
cation data with business owners' processes. It provides responses or actions to be
taken on the data provided. This action, for instance, can even be the actuation of
an electromechanical device upon an instruction from the controller [9].
Cyber-physical systems are devices that combine computation, memory, and inter-
connection to manipulate and connect with physical processes. A CPS is a device that
is managed or monitored by application algorithms and connected to the Internet,
with physical and software components that are tightly integrated. From the perspec-
tive of the IoT, cloud technology, portability, Big Data, and networks of networked
devices and sensors, software architectural concepts must function in an open and
highly interactive world [11].
Cyber-physical systems and the IoT are utilized in smart manufacturing to
constantly monitor operations and automate tasks that were previously handled by
people. Several kinds of services may be established, publicized, and given to clients
in the industrial and corporate environment that focuses on these notions. The motive
of Industry 4.0 is to improve the effectiveness of design and manufacturing processes,
as well as the number and quality of services while lowering prices. Industry 4.0 is
a step forward in establishing developed schemes by utilizing new technology and
organizational structures [12].
Cyber-attacks on the application layer, data transmission layer incursions, and appli-
cation control layer security problems are all types of network assaults, as per the
framework [5, 13–19]. A cyber-attack is an attempt to disable computers or breach
a computer system to launch additional attacks or steal data. Attackers use different
techniques to launch a cyber-attack. Various cyber and physical attacks are shown in
Fig. 3.
The terms "integrity, availability, and secrecy" are commonly used to characterize security. Many cyber systems place a
premium on confidentiality. Physical systems possess many problems that make them
vulnerable to physical assaults which can compromise their integrity and availability.
Confidentiality is a security feature of the cyber-physical systems as well [22].
1.3 Contribution
The road map of the article is as follows. In Sect. 2, we introduce the security
of machine learning and discuss the adversarial attack methods, defence methods,
adversarial threat model, and its use case. In Sect. 3, we discuss the use of generative
adversarial networks (GAN) for generating fake data samples to perform adversarial
attacks. In Sect. 4, we propose a security methodology for CPS based on adversarial
machine learning. Section 5 researches challenges and gaps. Finally, the paper is
concluded in Sect. 6.
The word “cyber security” refers to a set of methods and technology for restricting
access to, attacks on, changes to, or demolition of networks, computers, data, and
applications. Network security measures and computer (server) security systems
make up cyber security systems. Each of these entities must have at least one
antivirus program, firewall, and intrusion detection system (IDS) [23]. IDSs detect,
identify, and assess unlawful use, alteration, duplication, and damage to information
systems. Security breaches are defined as internal intrusions
dynamic queries to develop a proxy model that closely mimics the characteristics
of a particular model, essentially converting black-box assaults into white-box
assaults. Adversarial training is the most commonly researched way of increasing the
robustness of DNNs [29]. An adversarially trained framework learns to withstand
perturbations in the gradient direction of the error function after several tries.
Ensemble adversarial training augments the training set with hostile samples gener-
ated by other hold-out models as well as the model being evaluated. As a result,
ensemble adversarially trained models are resistant to single-step and black-box
assaults. A set of dynamic incremental gradient-based algorithms has been offered to
create adversarial instances that can deceive both black-box and white-box models.
Among numerous approaches, adversarial training is the most well-researched
method for improving the resilience of DNNs [28], as the sketch below illustrates.
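To make the idea concrete, the sketch below implements a one-step FGSM-style
perturbation and an adversarial training loop for a plain logistic model in NumPy.
It is a minimal toy, not the procedure of any specific paper cited here; for a logistic
model the input gradient of the loss is (p − y)·w, which stands in for the network
gradients used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.where(rng.random(200) < 0.5, 2, -2)[:, None]
y = (X.sum(axis=1) > 0).astype(float)        # toy binary labels
w = np.zeros(2)                              # logistic model parameters

def grad_w(w, X, y):
    """Parameter gradient of the logistic loss."""
    p = 1 / (1 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

def fgsm(X, y, w, eps=0.3):
    """One-step perturbation in the sign of the input gradient of the loss."""
    p = 1 / (1 + np.exp(-X @ w))
    g = (p - y)[:, None] * w[None, :]        # d(loss)/d(x) for this model
    return X + eps * np.sign(g)

for _ in range(300):                          # adversarial training loop
    X_adv = fgsm(X, y, w)                     # craft hostile samples
    X_mix = np.vstack([X, X_adv])             # augment the training set
    y_mix = np.concatenate([y, y])
    w -= 0.5 * grad_w(w, X_mix, y_mix)
```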
For attacking ensemble models, the authors in [28] suggest applying
the logits technique to numerous models with fused logit activation functions. The
logarithmic connections between probability predictions are captured by an array of
logits. The finely detailed outputs are aggregated using models fused by the logits of
all models, whose flaws can be found quickly to target an ensemble of models in
particular. They show that adversarially trained models are susceptible to such
black-box assaults after analysing three ensemble methodologies.
An adversarial machine learning approach should be used to examine the vulner-
abilities of machine learning. In IoT data fusion and aggregation, the procedure
they describe is a partial-model defence based on adversarial machine learning.
The Wireless Sensor Network (WSN) is frequently regarded as the foundation of
an IoT system. Although data interchange between multiple smart sensor objects
and connector centres is required, many Internet of Things (IoT) systems have tight
speed, privacy, and reliability requirements. In a wireless context, ensuring that the
true information collected by sensor systems is safely acquired by the hub/data centre
is critical for IoT system security and dependability. As the scope of IoT systems
increases rapidly with more sensors added, machine learning has started to play
a critical role in parsing and learning from the massive data sets produced by IoT
devices. Although machine learning helps IoT systems run more efficiently,
it may also be used by adversaries to conduct attacks against IoT technology. This
research looks at the safety of data condensation procedures in the IoT. Actuators,
RFID, switches, and sensors are some examples of IoT devices that collect orig-
inal data or information. In wireless architectures, deep learning has begun to be
utilized for tasks like waveform design, signal analysis, and security. Machine
learning is a method of extracting characteristics from data and making decisions
based on previous and real-time streaming data [30] (Table 1).
Table 1 (continued)
Method | References | Black box | White box | Description | Adversarial mechanism equation
Momentum iterative fast gradient sign method (MI-FGSM) | Dong et al. [14] | ✔ | ✔ | Dynamic incremental gradient-based technique for improving the chances of creating adversarial cases | …
Variant of natural evolution strategies (NES) | Ilyas et al. [33] | ✔ | ✖ | NES version to present the adversarial example generation strategy under search limits and restricted information | …
ZOO (zeroth order optimization) | Chen et al. [34] | ✔ | ✖ | Uses the zeroth-order (ZOO) arbitrary position climbing method in conjunction with image compression, hierarchy attack, and sampling methods to conduct black-box attacks | …
ATN (adversarial transformation network) | Baluja et al. [35] | ✖ | ✖ | The adversarial transformation network (ATN) generates adverse samples using a feed-forward neural network | …
DeepFool | Moosavi-Dezfooli et al. [36] | ✖ | ✖ | The minimal-norm adversarial perturbation iterative approach is used to calculate the minimum norm | K̂(x) = arg max_k f_k(x)
One-step method of a target class calculation | Kurakin et al. [37] | ✔ | ✖ | Getting adversarial samples of a target class from the formula [5] | X_adv = X − ε · sign(∇_X J(X, y_target))
IGS (iterative gradient sign method) | Wang et al. [38] | ✖ | ✔ | The huge data set's high dimensionality and complex data manifolds make it difficult for the defence to characterize adversarial samples | x_adv^i = x_adv^(i−1) − clip_ε(α · sgn(∇_x L(F(x_adv^(i−1)), y_true)))
Houdini | Cisse et al. [39] | … | … | Fools deep structured prediction models via a smoothed surrogate of the task loss | −C · e^(−|δg(y, ŷ)|²/2) · ℓ(y, ŷ), with g = g_θ(x, ŷ)
One pixel | Su et al. [43] | ✔ | ✖ | To create adversarial samples, only one pixel per image is changed | x_i(g + 1) = x_r1(g) + F · (x_r2(g) − x_r3(g)), r1 ≠ r2 ≠ r3
Researchers have studied adversarial attack approaches in light of CNN suscepti-
bility. Black-box and white-box assaults are the
two basic types of adversarial attacks. In white-box attacks, CNN models are believed
to be known, whereas, in black-box attacks, they are unknown. Physical attack
methods, in addition to algorithmic attacks, generate real-world objects that cause
CNN models to misclassify. Existing tracking-by-detection technologies entail two
basic stages in terms of online updates. In step 1, trackers use an offline pre-trained
CNN model and do not update online. In step 2, trackers collect samples from prior
frames online to update the model. It is important to keep in mind that model
adaptability, which is improved by gathering samples incrementally with online
updates, may help protect against hostile perturbations. In [44], an adversarial
attack and defence were presented and deployed on two state-of-the-art trackers.
The baseline performance was first assessed on the OTB100 data set, and all
experiments were built on top of DaSiamRPN and RT-MDNet [44].
A wide number of domains, including computer vision, natural language
processing, and anomaly detection, have seen tremendous success with machine
learning systems. With machine learning technologies being suggested for detecting
cyber-attacks, these attacks are a major source of concern. The security commu-
nity is particularly concerned about the emergence of adversarial machine learning,
particularly concerning the applicability of these attacks to intrusion detection
systems (IDS) [45, 46] (Table 2).
Table 2 (continued)
Defence name | References | Defence method | Description
Data randomization | Xie et al. [51] | … | Randomness defence approaches add randomness to the input (such as resizing or padding) or to the model parameters to reduce the impact of adversarial examples
Regularization | Lyu et al. [52] | Modifying model networks | Carefully regulating the global Lipschitz constant is a layer-wise regularization strategy for decreasing the network's sensitivity to tiny perturbations
Defensive distillation | Soll et al. [53] | … | The idea behind this strategy is that soft-label training delivers more information than hard-label training, since soft labels encode the relative distinctions across classes. As a result, it has been recommended that robust classifiers be trained using this strategy, known as defensive distillation
Feature squeezing | Xu et al. [54] | … | Reduces the colour bit complexity of each pixel and uses spatial smoothing as a feature squeezing approach. Adversarial perturbations can be reduced to a certain extent via variance minimization, which randomly selects a small set of pixel values and reconstructs the simplest picture compatible with those pixels
Deep contractive network | Wu et al. [55] | … | To reduce adversarial noise, the compression network, which is a deep network, employs a noise-reduction autoencoder; as a result, DCN applies a smoothness penalty, as in the contractive autoencoder (CAE), during training, which was proven to offer some protection against assaults like L-BFGS
Mask defence | Reddy Kalavakonda et al. [56] | … | By training on the original pictures and related adversarial samples, the mask layer captures the disparities between the actual pictures and the output attributes of the prior network model layer
Defence-GAN | Esmaeilpour et al. [57] | Using auxiliary tools | Defence-GAN is a type of defence that uses a WGAN trained on valid training samples to learn how to remove perturbations from adversarial examples
MagNet | Sayed et al. [58] | … | To distinguish between legitimate and adversarial data, MagNet employs a detector; if the deviation between a testing sample and the manifold exceeds a threshold, the detector rejects that sample
High-level representation guided denoiser (HGD) | Liao et al. [59] | … | A high-level representation guided denoiser (HGD) is used to create a strong adversarial model that can withstand both black-box and white-box attacks
U_A = Σ_{(x,y)∈X×Y} p(x, y) [U_A(C_A(x), y) − W(x, A(x))]   (2)
Recently, the cyber realm has begun to investigate adversarial machine learning.
Developers have used the DREBIN Android malware data set to produce adversarial
samples. Using a modified Jacobian saliency map approach, the authors could add
only 20 adversarial features. The authors were able to fool a neural network-based
classifier 50–80% of the time for a target network whose normal beginning perfor-
mance was over 95%. That research, however, required white-box access to the
detector. Other authors revealed how to attack malware classifiers when they could
only query them. An approximate training data set for a replacement malware clas-
sifier was labelled by repeated queries, as the sketch below illustrates. The substitute
detector was then employed as a stand-in for the real target, which was unknown. To
create stealthy attacks, a GAN was used, which functioned very well. A random
forest, which had 0.19% accuracy on the adversarial data, was the best-performing
classifier [45].
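The substitute-classifier procedure can be sketched as follows. The example uses
scikit-learn with a decision tree standing in for the unknown black-box detector
and a logistic substitute, so the FGSM-style input gradient (p − y)·w is available
in closed form; all data and names are synthetic stand-ins, not the setup of the
cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)

# Stand-in for the unknown black-box detector: we may only query its labels.
_target = DecisionTreeClassifier(max_depth=4).fit(
    rng.normal(size=(500, 8)), rng.integers(0, 2, 500))

def query_target(x):
    return _target.predict(x)

# 1. Label an approximate training set by repeatedly querying the black box.
x_syn = rng.normal(size=(300, 8))
y_syn = query_target(x_syn)

# 2. Train a substitute classifier on the query-labelled data.
substitute = LogisticRegression().fit(x_syn, y_syn)

# 3. Attack the substitute (sign of the input gradient, (p - y) * w) and
#    transfer the perturbed samples to the real, unknown target.
w = substitute.coef_[0]
x_adv = x_syn + 0.5 * np.where(y_syn[:, None] == 1, -1.0, 1.0) * np.sign(w)
print("target labels flipped:", np.mean(query_target(x_adv) != y_syn))
```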
Attacker modelling: when bearing in mind how attackers in cyber systems can act,
it is important to establish a consistent model for describing an intruder that can be
used to inspire adversarial research principles. Such principles do not hold the same
weight in the cyber world; hence, alternate attacker modelling is required. The
following are some of the qualities recommended.
Levels of perturbation: certain fields in network traffic cannot be modified; either
the attack will fail because a faulty packet will be formed, or specific areas will be
encoded, causing the attack to fail.
Attacker knowledge: the attacker's understanding of the destination system can
be measured in terms of data about the targeted neural network and computer vision
protections. It must also indicate how much of an IT system the invader is familiar
with in the cyber realm. In systems that are cyber-physical in nature, such as indus-
trial control systems, the attacker's grasp of dynamic behaviour is also crucial. If the
attacker has no idea how their changes will affect the evolution of the system, all
they can do is optimize greedily for the next time step.
Timing: defining the invader's capacity to decide the strike's initial point is impor-
tant. Some systems will become more susceptible to a stealthy attacker in future,
making it much easier to hack the system.
Human in the loop: finally, consider whether a person is required in addition to the
IDS [61]. Depending on the assault, the level of disturbance necessary to deceive an
intrusion detection system may be large enough to be perceptible to a human watch-
ing a human–machine interface. The significance of these changes depends on the
real system and human response times: for instance, if the system under assault is a
control source, variations might happen too rapidly for a person to react.
min_G max_D V(D, G) = E_{x∼p_data(x)} [log D(x)] + E_{z∼p_z(z)} [log(1 − D(G(z)))]   (3)
Fig. 5 Architecture of a GAN: the generator G maps noise z to reconstructed data, and the discriminator D separates generated samples (Y = 0) from the original data x (Y = 1)
There are various types of GANs, such as the deep convolutional GAN (DCGAN),
Conditional GAN, Least Squares GAN (LSGAN), Auxiliary Classifier GAN
(ACGAN), Dual Video Discriminator GAN, TabGAN, etc. The Conditional GAN is
widely used, and TabGAN is used to generate tabular synthetic data samples.
Conditional GAN networks may be built by simply feeding extra auxiliary informa-
tion (e.g. a label) into the GAN, transforming it into a CGAN, as shown in Fig. 6.
The CGAN generator uses the extra auxiliary information y (label, text, or images)
and a latent vector z to generate conditional real-looking data G(z|y), and the CGAN
discriminator uses the extra auxiliary information y (label, text, or images) and real
data x to distinguish between generator-generated samples D(G(z|y)) and real data
x. Data creation may be controlled using a CGAN; this is not doable with the
standard GAN [65]. The following is the updated CGAN loss function [66]:
min_G max_D V(D, G) = E_{x∼p_data(x)} [log D(x|y)] + E_{z∼p_z(z)} [log(1 − D(G(z|y)))]

Fig. 6 Conditional GAN: the label y is fed to both the generator (together with the noise z) and the discriminator (together with the original data x)
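A minimal sketch of this conditioning, assuming PyTorch, is shown below; the
layer sizes and batch are illustrative, and real samples are replaced by random
tensors. The point is only that the label y is concatenated onto the inputs of both
networks.

```python
import torch
import torch.nn as nn

Z, NUM_CLASSES, X_DIM = 16, 10, 32   # illustrative sizes

G = nn.Sequential(nn.Linear(Z + NUM_CLASSES, 64), nn.ReLU(),
                  nn.Linear(64, X_DIM))
D = nn.Sequential(nn.Linear(X_DIM + NUM_CLASSES, 64), nn.ReLU(),
                  nn.Linear(64, 1), nn.Sigmoid())

def one_hot(y):
    return torch.eye(NUM_CLASSES)[y]

def d_loss(x_real, y):
    """Discriminator side of the conditional value function: both the real
    data x and the generated data G(z|y) are concatenated with the label y."""
    y_oh = one_hot(y)
    z = torch.randn(len(y), Z)
    x_fake = G(torch.cat([z, y_oh], dim=1))
    real_score = D(torch.cat([x_real, y_oh], dim=1))
    fake_score = D(torch.cat([x_fake, y_oh], dim=1))
    return -(torch.log(real_score) + torch.log(1 - fake_score)).mean()

x_real = torch.randn(8, X_DIM)                 # stand-in real batch
y = torch.randint(0, NUM_CLASSES, (8,))
print(d_loss(x_real, y).item())
```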
Our methodology is based on evaluating learning model (ML and DL) perfor-
mance in non-adversarial scenarios and in adversarial scenarios with and without
adversarial learning. Our proposed methodology is depicted in Fig. 7. We evaluate
the performance of ML and DL models in our framework in three different scenarios,
i.e. on a normal data set, on an adversarial data set, and on an adversarial data set
with adversarial learning. Various adversarial attacks will be considered, such as data
modification, perturbation, and corruption [67], and assessed for attack severity.
We initially select a model and its structure by performing hyper-parameter tuning.
Also, the selected data set is pre-processed and split into a training set and a testing
set for performance evaluation purposes. Our methodology consists of three phases.
In Phase 1, the model is trained and tested on a normal data set without adversarial
attack consideration, and the performance is measured as P1. In Phase 2, we perform
adversarial attacks on the testing set, the model trained in Phase 1 is tested, and its
resulting performance is indicated as P2. In Phase 3, we consider the adversarial
learning of the model and train it to show resilience against adversarial attacks. The
adversarially learned model is then tested for performance on the Phase 2 testing set
containing adversarial attacks. The performance is measured and indicated as P3.
Consequently, we compare the performances P1, P2, and P3 in general, and P2 and
P3 in particular, to assess the improvement achieved through adversarial learning,
as sketched below.
(a) Effective and efficient defence against white-box attacks: to the authors'
knowledge, no defence has been developed that can strike a compromise
between efficacy and efficiency. Adversarial training is the most successful in
terms of efficacy, but it comes at a high computational cost. In terms of capa-
bility, several randomization and information and communications technology
defence/detection methods may be configured in a matter of seconds. Although
certified defences suggest a path to theoretically assured robustness, their
efficiency falls well short of the actual requirements.
Fig. 7 Proposed methodology: the data set is pre-processed and split into training and testing sets; the selected ML or DL model (after hyper-parameter tuning) is trained and tested on the normal data (performance P1), tested against an adversarially perturbed testing set (P2), and retrained with adversarial learning before being tested on the adversarial set again (P3)

(b) Causality behind adversarial samples: although many adversarial assaults
have been developed, the cause of adversarial examples is still a mystery. The
pervasiveness of adversarial examples was attributed to model structures and
learning procedures in the early research on the topic, which assumed that suitable tactics
and network design considerably improved adversarial resilience. However,
such efforts—particularly those that result in obscured gradients—create a false
sense of safety. Recent research, on the other hand, has discovered that adver-
sarial susceptibility is more likely to be caused by the geometry of the data
manifold and a lack of training information.
(c) Existence of a general robust decision boundary: because there are so many
distinct adversarial assaults described by different metrics, it is logical to wonder:
is there a general robust decision boundary that a specific type of DNN with a
specific training technique can learn? The answer to this question is currently
"no". Even though PGD adversarial training shows excellent resilience to a wide
variety of L∞ assaults, it has been demonstrated that it is still subject to adver-
sarial assaults as evaluated by other L_p norms such as EAD and CW2. It has
been established that for a two-concentric-sphere data set, the optimal L2 and
L∞ decision boundaries are distinct and that the discrepancy rises with the
similarities of the data sources.
(d) Model privacy: model extraction threats pose a major danger to the security of
learnt models through unlawful duplication. According to Behzadan and Hsu,
one way to mitigate this is to raise the cost of such assaults or to watermark
the policies. Some randomization may be introduced in the model to protect
against such threats, but this will result in an unreasonable loss of performance.
A potential subject of research is creating approaches that might induce restricted
randomization in the system to protect against such assaults.
6 Conclusion
In this survey, we discuss the security issues in cyber-physical systems (CPS) and
the use of ML and DL for their security. We discussed the difficulties faced by tradi-
tional security approaches in dealing with complex attacks. The ML and DL security
methods are vulnerable to adversarial attacks, and there is a need to ensure their secu-
rity before their utilization for security in CPS domains. Their security is evaded by
adversarial attacks both at testing time (evasion attack) and training time (poisoning
attack) to misclassify the data by performing data corruption, data perturbation, and
data modification. Depending on the knowledge of the attacker about the attacked
model, the capabilities of the attacker vary from black box to white box. In the
white-box case, the attacker possesses greater knowledge of the model and its param-
eters, while in the case of the black box, the attacker possesses no understanding of
the model being attacked and its parameters. We also discuss the general Adversarial
Threat Model (ATM), which could be used as a risk prevention technique. More-
over, we have covered the use of generative adversarial networks (GAN) for creating
fake data samples to perform adversarial attacks. The Conditional GAN (CGAN),
which considers the labels in a supervised data set, is also discussed in detail. Based
on adversarial attack and defence methods, we propose a methodology to evaluate
the performance of ML and DL models. We consider three scenarios for evaluating
performance, i.e. normal performance, performance against adversarial attacks, and
performance against adversarial attacks with adversarial learning.
References
1. Kagita MK, Thilakarathne N, Gadekallu TR A review on cyber crimes on the internet of things
2. El Naqa I, Murphy MJ What is machine learning? pp 3–11
3. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects, vol 349,
no 6245
4. Arnold L et al (2016) An introduction to deep learning to cite this version : HAL Id : hal-
01352061 an introduction to deep learning
5. Cao L, Jiang X, Zhao Y, Wang S, You D, Xu X (2020) A survey of network attacks on
cyber-physical systems. IEEE Access 8:44219–44227
6. Rana B (2020) A systematic survey on internet of things : energy efficiency and interoperability
perspective, no. August, pp 1–41
7. Zhong C, Zhu Z, Huang R (2017) Study on the IOT architecture and access technology
8. Zhu Y, Sampath RZ, Jennifer R (2008) Cabernet : connectivity architecture for better network
services, no. December
9. Ray PP (2017) Internet of robotic things: concept, technologies, and challenges. IEEE Access
4:9489–9500
10. Benchmarking EEC, Hao T, Huang Y, Wen X, Gao W, Zhang F Edge AIBench: towards
comprehensive end-to-end edge computing benchmarking, pp 1–8
11. Garcia-rodriguez J, Azorin-lopez J, Tom D, Fuster-guillo A, Mora-mora H (2021) IA-CPS :
intelligent architecture for cyber-physical systems management, vol 53, no June
12. Sacala IS, Pop E, Moisescu MA, Dumitrache I, Caramihai SI, Culita J (2021) Enhancing CPS
architectures with SOA for industry 4.0 enterprise systems enhancing CPS architectures with
SOA for industry 4.0 enterprise systems, no January 2022
13. Fung WW et al Protection of keys against modification attack HKUST theoretical computer
science center research report HKUST-TCSC-2001–04
14. Wilhelm M, Schmitt JB, Lenders V Practical message manipulation attacks in IEEE 802.15.4
wireless networks, pp 2–4
15. Mcdermott JP Attack net penetration testing, pp 15–21
16. Ashok A, Govindarasu M, Wang J (2017) Cyber-physical attack-resilient wide-area grid, no i,
pp 1–17
17. Luckett P (2016) Neural network analysis of system call timing for rootkit detection, pp 1–6
18. Mohammad AH (2020) Ransomware evolution, growth and recommendation for detection.
Technol World Inf Sci, no February
19. Tyagi AK, Aghila G (2011) A wide scale survey on botnet, vol 34, no 9, pp 9–22
20. Neuman C Challenges in security for cyber-physical systems
21. Safety and security in cyber-physical systems and internet-of-things systems, vol 106, no 1,
pp 9–20 (2018)
22. Akella R, Tang H, Mcmillin BM (2010) Analysis of information flow security in cyber—
physical systems. Int J Crit Infrastruct Prot 3(3–4):157–173
23. Importance of intrusion detection system (IDS). Cyber security and threat monitoring, vol 2,
no 1, pp 1–4 (2011)
24. Salloum SA, Alshurideh M Machine learning and deep learning techniques for cybersecurity
: a review, vol 2. Springer International Publishing
25. Introducing machine learning, no November (2014)
26. Jan STK, Messou J, Lin Y, Huang J, Wang G Connecting the digital and physical world :
improving the robustness of adversarial attacks
27. Taran O, Rezaeifar S, Voloshynovskiy S Bridging machine learning and cryptography in
defence against adversarial attacks, no 200021
28. Dong Y et al Boosting adversarial attacks with momentum, pp 9185–9193
29. Li J, Zhao R, Huang J, Gong Y (2014) Learning small-size DNN with output-distribution-based
criteria, no September, pp 1910–1914
30. Luo Z, Lu Z (2020) Adversarial machine learning based partial-model attack in IoT, pp 13–18
31. Adate A, Saxena R (2017) Understanding how adversarial noise affects single image classifi-
cation. In: Proceedings of the international conference on intelligent information technologies,
Chennai
32. Li P Query-efficient black-box attack by active learning
33. Ilyas A, Engstrom L, Athalye A, Lin J (2018) Black-box adversarial attacks with limited queries
and information
34. Chen P (2017) ZOO : zeroth order optimization based black-box attacks to deep neural networks
without training substitute models, pp 15–26
35. Baluja S, Fischer I (2017) Adversarial transformation networks: learning to generate adversarial
examples. arXiv:1703.09387 [cs.NE]
36. Fawzi A, Frossard P (2016) DeepFool : a simple and accurate method to fool deep neural
networks, pp 2574–2582
37. Goodfellow IJ (2017) Adversarial machine learning at scale, pp 1–17
38. Wang DD, Li C, Wen S, Xiang Y (2020) Defending against adversarial attack towards deep
neural networks via collaborative multi-task training, vol 5971, no AUGUST 2019, pp 1–12
39. Cisse M, Adi Y, Keshet J Houdini : fooling deep structured prediction models
40. Sarkar S, Mahbub U UPSET and ANGRI : breaking high performance image classifiers, vol
20742, no 1, pp 1–9
41. Fawzi O, Frossard P Universal adversarial perturbations, pp 1765–1773
42. Papernot N, Mcdaniel P, Jha S, Fredrikson M, Celik ZB, Swami A (2016) The limitations of
deep learning in adversarial settings
43. Su J, Vargas DV, Sakurai K One pixel attack for fooling deep neural networks, pp 1–15
44. Jia S, Ma C, Song Y, Yang X Robust tracking against adversarial attacks
45. Zizzo G, Hankin C, Maffeis S, Jones K (2019) INVITED : adversarial machine learning beyond
the image domain
46. Li J, Yang Y, Sun JS, Tomsovic K, Qi H (2021) ConAML: constrained adversarial machine
learning for cyber-physical systems, vol 1, no 1. Association for Computing Machinery
47. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing, pp 1–11
48. Papernot N, Mcdaniel P, Goodfellow I Practical black-box attacks against machine learning,
pp 506–519
49. Chen Y Blocking transferability of adversarial examples in black-box learning systems
50. Dziugaite GK, Roy DM (2016) A study of the effect of JPG compression on adversarial
images. arXiv:1608.00853 [cs.CV]
51. Xie C, Wang J, Zhang Z, Zhou Y, Xie L, Yuille A Adversarial examples for semantic
segmentation and object detection, vol 1, pp 1369–1378
52. Lyu C (2015) A unified gradient regularization family for adversarial examples
53. Soll M, Hinz T, Magg S, Wermter S (2019) Evaluating defensive distillation for defending text
processing neural networks against adversarial examples
54. Xu W, Evans D, Qi Y (2018) Feature squeezing : detecting adversarial examples in deep neural
networks, no February
55. Wu EQ, Zhou G, Zhu L, Wei C, Ren H, Sheng RSF (2019) Rotated sphere Haar wavelet
and deep contractive auto-encoder network with fuzzy Gaussian SVM for pilot’s pupil center
detection. IEEE Trans Cybern 1–14
56. Kalavakonda RR, Vikram N, Masna R, Bhuniaroy A (2020) A smart mask for active defense
against coronaviruses and other airborne pathogens, vol 2248, no c
57. Esmaeilpour M, Cardinal P, Koerich AL (2021) Multi-discriminator Sobolev defense-GAN
against adversarial attacks for end-to-end speech systems, vol X, no X, pp 1–10
58. Sayed E, Member S, Yang Y, Member S (2019) A comprehensive review of flux barriers in
interior permanent magnet synchronous machines. IEEE Access 7:149168–149181
59. Liao F, Liang M, Dong Y, Pang T, Hu X Defense against adversarial attacks using high-level
representation guided denoiser, pp 1778–1787
60. Ren K, Zheng T, Qin Z, Liu X (2020) Adversarial attacks and defenses in deep learning.
Engineering
61. Duque S, Nizam M (2015) Using data mining algorithms for developing a model for intrusion
detection system (IDS). Procedia Comput Sci 61:46–51
62. Hoang Q, Nguyen TD, Le T, Phung D (2017) Multi-generator generative adversarial nets, no
August
63. Wang K, Gou C, Duan Y, Lin Y, Zheng X, Wang F (2017) Generative adversarial networks :
introduction and outlook, vol 4, no 4, pp 588–598
64. Creswell A et al (2017) Generative adversarial networks : an overview, no April, pp 1–14
65. Jabbar A, Li X, Omar B A survey on generative adversarial networks : variants, applications,
and training, pp 1–38
66. Wang Y et al (2018) Accepted Manuscript
67. Chakraborty A, Alam M, Dey V, Chattopadhyay A, Mukhopadhyay D (2018) Adversarial
attacks and defences: a survey, vol x, no x
Legal Challenges of Digital Twins
in Smart Manufacturing
1 Introduction
The influence of digital twins is becoming more important as digitalization and the
Internet of Things (IoT) grow more ubiquitous. Nevertheless, the complicated nature
of digital twins raises legal, ethical, and social issues. Most notably, digital twins are
intangible concepts that connect several issues relating to intellectual property and
data protection. As a result, it is critical to first comprehend what digital twins are,
how they are used, and what the legal ramifications are in real-life scenarios.
R. Karim (B)
School of Business, Department of Business Law and Taxation, Monash University, Subang Jaya,
Malaysia
e-mail: ridoan.karim@monash.edu
S. Vyas
School of Computer Science, UPES, Dehradun, India
A. I. Kabir
School of Business and Economics, United International University, Dhaka, Bangladesh
At its most basic level, a digital twin may simply serve as a central store of data,
including information on how a certain asset, such as a building, was planned and
created [1]. The technology might be utilized for a building's creation, administra-
tion, operation, and maintenance. Nevertheless, the technology's utilization is not so
straightforward; rather, it is complex and diverse, incorporating virtual representa-
tions of physical entities at multiple levels of sophistication. This is the unique nature
of the technology: it provides a precise benchmark of framework developments to
inform choices, provide solutions, and establish a sustainable future for the smart
manufacturing industries. Nevertheless, issues pertaining to scalability make gover-
ning a model of this complexity hard. Additionally, issues pertaining to data owner-
ship and privacy protection are all potentially ambiguous in this early phase of the
technology.
As a result, there are certain legal, policy, and ethical constraints on using this
technology. Digital twins are linked systems in which changes to one data item have
an impact on other parts of the model. As more parties are able to access and depend
on data that may include inaccuracies, the fundamental issues of data ownership,
liability, and intellectual property remain unanswered. Existing laws may make it
difficult to govern the technological system. As multiple parties depend on the accu-
racy of information provided by each other on this technology platform, it might lead
to trust difficulties. As a result, this chapter will examine the current legal, ethical,
and regulatory difficulties of digital twins in the smart production business and
potential solutions for their future use.
Worldwide, health care and life sciences are also utilizing digital twins to deliver
tailored therapies [8]. This involves giving doctors the freedom to use clinical support
tools within the digital health care system [9]. However, there is still more potential.
Digital medicines, virtual reality-based treatments, enhanced human body
comprehension, hospital management, and disease modelling are other sectors where
research into the usage of digital twins should be further enhanced.
Digital twins are also being used to create digital cities [9]. Dynamism is a char-
acteristic of large cities that makes managing urban infrastructure more challenging.
One simply needs to consider the millions of people that reside in big cities, as
well as the numerous hospitals, schools, workplaces, and retail establishments—that
need the ongoing care of local government [10]. The National Research Founda-
tion’s (NRF) “Virtual Singapore” project offers 3D semantic modelling of the urban
environment, allowing users to directly relate the data to the real world by displaying
actual locations, the characteristics of various modes of transportation, or elements
of buildings and urban infrastructure [11]. The platform also provides a range of
dynamic real-time indicators, as well as data on demographics, temperature, and
traffic, in addition to the standard map [12].
Digital twins have also started playing a great role in the automotive industry [7].
Concepts for brand-new automobiles are created digitally. In contrast to a prac-
tical physical depiction, this offers an excellent visual portrayal of a future vehicle.
Things may eventually need to change as the vehicle advances in the development
process, based on how it responds or performs in the actual world. Digital twins are
being employed more frequently to expedite and enhance the development process of
autonomous vehicles [13]. Vehicle testing has also become exceedingly complicated
in recent times, and this complexity increases when we are talking about autonomous
cars. Through the use of real-world data, digital twins facilitate that process and aid
in the elaboration of AI judgments. As a result, automakers now have the technology
and equipment to evaluate car designs before they are put on a testing track or real
roads.
Digital twins also hold the promise to transform global supply chain [14]. There
may not be an appropriate way to test out new ideas, goods, or modifications to the
present supply chain because it is continuously in motion, at least not without causing
significant delays. The solution to this issue is digital twins, which let logistics teams
experiment while also gaining a precise picture of how it might impact operations
[15]. They are an integral part of the ever-evolving digital supply chain, which is
vastly improving operations through technologies like advanced analytics, robotics,
automation, rapid manufacturing, and more.
To discover possible issues before they are introduced to the market, new product
packaging or revised designs, for instance, can be virtualized by putting them through
a digital twin model. Teams may utilize the same technologies to optimize inven-
tory and positioning, evaluate environmental or transportation conditions, and more.
Imagine a prediction model that can be used to analyse almost any situation and
produce answers that are both extraordinarily accurate and realistic.
Digital twins are useful tools in the area of architectural design as well [17]. Most
of the time, designers constantly attempt to construct or produce models according
Fig. 1 Framework of the function structure behaviour control intelligence performance (FSBCIP)
consider, which may be gathered by deep learning and big data analytics approaches.
In summary, a digital twin model with intelligent capabilities may help smart manu-
facturing become smarter [43]. Most modern digital twin models are limited in their
understanding and basic reasoning about SMS design options. Integrating state-of-
the-art performance algorithms into digital twin models in the future could result in
the ability to conceive and produce new design solutions automatically.
Apart from blockchain, web services and cloud computing for assisting design,
analysis, and simulation in a distributed digital twin situation are another kind of
crucial enabling technology for realizing collaborative smart manufacturing. Both
designers and computing resources are spread globally in the context of collaborative
smart manufacturing [48]. Cloud computing and web service technologies enable
several comparable units in far-away places to share designs and work jointly [49].
Another reason for adopting cloud computing technology is the necessity for high-
performance computing for simulation work in smart manufacturing [50], which
might enable less costly computing resource allocation, dynamic distribution, and
growth. By combining web services and cloud computing, designers may work on
SMSD projects from anywhere.
Other than real estate and business, there are many digital twin use cases in the
healthcare sector today. In 2023, the digital twin market is predicted to be worth
$15.6 billion, with a yearly growth rate of above 40%. The possibilities of digital
twins for smart manufacturing are endless, and they are reshaping the real estate
industry's future. However, the digital twin's inventive potential should not over-
shadow the value of the legal framework. Indeed, the creation of the digital twin
necessitates the removal of certain legal stumbling blocks. The digital twin is built on
contextualized data that is then translated into knowledge. Data, algorithms, and AI
systems are therefore at the centre of the digital twin. As a result, the legal framework
for these notions is crucial to the growth of the digital twin. The digital twin creates
legal concerns about data, liability, algorithmic reliability, and intellectual property.
As a result, the digital twin becomes a scientific as well as a legal concern.
The legal ramifications of digital twins are many and may be difficult to navigate,
particularly if there is no regulatory framework that can establish norms and provide
a consistent, secure basis on which we can rely. Consequently, well-documented
contracts are essential to govern interactions with digital twin technology users.
Cyber-security is an essential part of digital twin technology. An uncontrolled digital
twin solution can have serious real-world consequences, especially if it is used to
control complex systems or to make or adapt a physical creation and use it or
sell it in the market. Laws must thus govern the unique characteristics of digital
twin solutions and must be detailed enough to address the most significant dangers
and possible ramifications, while also being flexible enough to adapt to the evolution
of the technology and its real-world linked counterparts.
There is no legal framework devoted to the digital twin until now. However, data
protection requirements are similarly applicable to the usage of personal data in
digital twins [51].
The smart manufacturing industry is made up of ideas, which is why intellectual
property generates value and deserves legal rights under patent protection. A patent
may be filed depending on how the technology is created, utilized, and how new
ideas are produced. In general, intellectual property rights and patents are difficult
to monitor. The use of intellectual property rights is conceivable if the algorithm
is included in a patentable invention [52]. For example, consider the scenario of a
discovery made possible by the modelling of a digital twin. The innovation might
therefore be legally protected, as long as the standards for novelty are met:
the invention must be original and capable of industrial application.
Actors in the digital world are putting considerable pressure on the scope of liability. The
traditional principles of civil and criminal culpability are being applied to robots and
autonomous technologies. Machines were formerly thought to be incapable of acting
on their own, but in the new virtual world they have closed the gap and become
autonomous. Do we need to adapt our paradigms and enable robots to represent us
in a legal structure that governs virtual communities [53]? Is it possible to
hold the actual person liable for the harm produced by his or her digital twin? Who
is to blame: the computer programmer, the industrialist, the owner of the robot, or the
machine itself? In its current state, the law on digital twins does not
answer these questions.
The legal effect of digital twins on liability is likely to be the most complicated.
To hold someone legally accountable for any kind of loss, you must be able to
demonstrate that the liable party failed to execute their obligation, which resulted
in the loss or harm for which they are liable. Because digital twins are a network
of interconnected technologies, changes in one system affect the whole model,
which makes proving responsibility challenging. Errors may be difficult
to detect and trace when several users and data sources are involved. One option
to address this issue is to establish a centralized command centre that monitors and
approves data updates; even then, mistakes may happen and be tough to track down.
Errors might also originate outside the digital twin ecosystem. For example, if
the physical twin’s sensors are not operating properly, it is natural for the whole
system to fail. To support the proper functioning of digital twins, it is necessary to specify
the goals, qualities, and intended functions of a task, and expressing these concepts
in digital twin applications helps to build trust and accountability.
The data quality will determine the digital twin’s virtual picture, which will mirror
the actual product. Moreover, data reliability is a significant and difficult problem.
The AI research community has only recently started to focus on approaches to identify
and eliminate bias in the training data sets of supervised automated learning systems.
It has been shown that the computerized and automatic evaluation of large
data increases the danger of bias [54]. As a result, data quality is critical: it
should be unbiased and comprehensive, that is, reflective of diverse developments
and consistent over time.
5 Conclusion
Because of their complexity, digital twins have substantial legal consequences. The
usage and models of digital twins will rise in tandem with digitalization. The legal
effect of digital twins on liability is likely to be the most complicated. To hold
someone legally accountable for any kind of loss, one must be able to demonstrate
that the liable party failed to execute their obligation, which resulted in the loss or
harm for which they are liable. The use of digital twins will grow and become more
complex in the coming years. Consequently, it is important that all groups involved
in the use of digital twins establish a clear identity, role, and accountability from
the beginning. If the models are prepared appropriately, this will ensure that all groups
are legally protected.
References
1. Lu Q, Xie X, Parlikad AK, Schooling JM (2020) Digital twin-enabled anomaly detection for
built asset monitoring in operation and maintenance. Autom Constr 118:103277
2. Bundin M, Martynov A, Shireeva E Legal issues on the use of “digital twin” technologies for
smart cities. In: International conference on electronic governance and open society: challenges
in Eurasia. Springer, pp 77–86
3. Butt J (2020) Exploring the interrelationship between additive manufacturing and Industry 4.0.
Designs 4:13
4. Jones D, Snider C, Nassehi A, Yon J, Hicks B (2020) Characterising the digital twin: a
systematic literature review. CIRP J Manuf Sci Technol 29:36–52
5. Semeraro C, Lezoche M, Panetto H, Dassisti M (2021) Digital twin paradigm: a systematic
literature review. Comput Ind 130:103469
6. Millwater H, Ocampo J, Crosby N (2019) Probabilistic methods for risk assessment of airframe
digital twin structures. Eng Fract Mech 221:106674
7. Grieves M, Vickers J (2017) Digital twin: mitigating unpredictable, undesirable emergent
behavior in complex systems. Transdisciplinary perspectives on complex systems. Springer,
pp 85–113
8. Elayan H, Aloqaily M, Guizani M (2021) Digital twin for intelligent context-aware IoT
healthcare systems. IEEE Internet Things J 8:16749–16757
9. Erol T, Mendi AF, Doğan D The digital twin revolution in healthcare. In: 2020 4th international
symposium on multidisciplinary studies and innovative technologies (ISMSIT). IEEE, pp 1–7
10. Deng T, Zhang K, Shen Z-JM (2021) A systematic review of a digital twin city: a new pattern
of urban governance toward smart cities. J Manage Sci Eng 6:125–134
11. Ketzler B, Naserentin V, Latino F, Zangelidis C, Thuvander L, Logg A (2020) Digital twins
for cities: a state of the art review. Built Environ 46:547–573
12. Shahat E, Hyun CT, Yeom C (2021) City digital twin potentials: a review and research agenda.
Sustainability 13:3386
13. Yun H, Park D (2021) Virtualization of self-driving algorithms by interoperating embedded
controllers on a game engine for a digital twining autonomous vehicle. Electronics 10:2102
14. Augustine P (2020) The industry use cases for the digital twin idea. Adv Comput 117:79–105.
Elsevier
15. Barricelli BR, Casiraghi E, Fogli D (2019) A survey on digital twin: definitions, characteristics,
applications, and design implications. IEEE Access 7:167653–167671
16. Bécue A, Maia E, Feeken L, Borchers P, Praça I (2020) A new concept of digital twin supporting
optimization and resilience of factories of the future. Appl Sci 10:4482
17. Ruohomäki T, Airaksinen E, Huuska P, Kesäniemi O, Martikka M, Suomisto J Smart city
platform enabling digital twin. In: 2018 International conference on intelligent systems (IS).
IEEE, pp 155–161
18. Sepasgozar SM, Hui FKP, Shirowzhan S, Foroozanfar M, Yang L, Aye L (2020) Lean practices
using building information modeling (Bim) and digital twinning for sustainable construction.
Sustainability 13:161
19. Nujoom R, Mohammed A, Wang Q (2018) A sustainable manufacturing system design: a fuzzy
multi-objective optimization model. Environ Sci Pollut Res 25:24535–24547
20. De Paolis LT, Bourdot P (2019) Augmented reality, virtual reality, and computer graphics:
6th international conference, AVR 2019, Santa Maria al Bagno, Italy, June 24–27, 2019,
Proceedings. Springer, Part II
21. Zakoldaev D, Korobeynikov A, Shukalov A, Zharinov I Digital forms of describing industry
4.0 objects. In: IOP conference series: materials science and engineering. IOP Publishing, p
012057
22. Negri E, Fumagalli L, Macchi M (2017) A review of the roles of digital twin in CPS-based
production systems. Proc Manuf 11:939–948
23. Shao G, Kibira D Digital manufacturing: requirements and challenges for implementing digital
surrogates. In: 2018 winter simulation conference (WSC). IEEE, pp 1226–1237
24. Valckenaers P (2020) Perspective on holonic manufacturing systems: PROSA becomes ARTI.
Comput Ind 120:103226
25. Leng J, Wang D, Shen W, Li X, Liu Q, Chen X (2021) Digital twins-based smart manufacturing
system design in industry 4.0: a review. J Manuf Syst 60:119–137
26. Tao F, Zhang H, Liu A, Nee AY (2018) Digital twin in industry: State-of-the-art. IEEE Trans
Industr Inf 15:2405–2415
27. Tao F, Qi Q, Wang L, Nee A (2019) Digital twins and cyber–physical systems toward smart
manufacturing and industry 4.0: correlation and comparison. Engineering 5:653–661
28. Nativi S, Mazzetti P, Craglia M (2021) Digital ecosystems for developing digital twins of the
earth: the destination earth case. Remote Sensing 13:2119
29. Haag S, Anderl R (2018) Digital twin—proof of concept. Manuf Lett 15:64–66
30. Konstantinov S, Ahmad M, Ananthanarayan K, Harrison R (2017) The cyber-physical e-
machine manufacturing system: virtual engineering for complete lifecycle support. Proc CIRP
63:119–124
31. Hasan HR, Salah K, Jayaraman R, Omar M, Yaqoob I, Pesic S, Taylor T, Boscovic D (2020)
A blockchain-based approach for the creation of digital twins. IEEE Access 8:34113–34126
32. Zheng P, Sang Z, Zhong RY, Liu Y, Liu C, Mubarok K, Yu S, Xu X (2018) Smart manufacturing
systems for industry 4.0: conceptual framework, scenarios, and future perspectives. Front Mech
Eng 13:137–150
33. Sanna A, Giacalone G (2021) Digital twin and machine learning solutions for the manufacturing
environment
34. Wu C, Zhou Y, Pessôa MVP, Peng Q, Tan R (2021) Conceptual digital twin modeling based on
an integrated five-dimensional framework and TRIZ function model. J Manuf Syst 58:79–93
35. Glaessgen E, Stargel D The digital twin paradigm for future NASA and US air force vehicles. In:
53rd AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference
20th AIAA/ASME/AHS adaptive structures conference 14th AIAA, pp 1818
36. Qi Q, Tao F, Hu T, Anwer N, Liu A, Wei Y, Wang L, Nee A (2021) Enabling technologies and
tools for digital twin. J Manuf Syst 58:3–21
37. Bao J, Guo D, Li J, Zhang J (2019) The modelling and operations for the digital twin in the
context of manufacturing. Enterprise Inf Syst 13:534–556
38. Friedland B (2012) Control system design: an introduction to state-space methods. Courier
Corporation
39. Peruzzini M, Pellicciari M (2017) A framework to design a human-centred adaptive manufac-
turing system for aging workers. Adv Eng Inform 33:330–349
40. Scacchi A, Catozzi D, Boietti E, Bert F, Siliquini R (2021) COVID-19 lockdown and self-
perceived changes of food choice, waste, impulse buying and their determinants in Italy:
QuarantEat, a cross-sectional study. Foods 10:306
41. Taylor C, Murphy A, Butterfield J, Jan Y, Higgins P, Collins R, Higgins C (2018) Defining
production and financial data streams required for a factory digital twin to optimise the
deployment of labour. Recent Adv Intel Manuf 3–12. Springer
42. Moyne J, Iskandar J (2017) Big data analytics for smart manufacturing: case studies in
semiconductor manufacturing. Processes 5:39
43. Gao RX, Wang L, Helu M, Teti R (2020) Big data analytics for smart factories of the future.
CIRP Ann 69:668–692
44. Danilczyk W, Sun Y, He H Angel: an intelligent digital twin framework for microgrid security.
In: 2019 North American power symposium (NAPS). IEEE, pp. 1–6
45. Gupta N, Tiwari A, Bukkapatnam ST, Karri R (2020) Additive manufacturing cyber-physical
system: supply chain cybersecurity and risks. IEEE Access 8:47322–47333
46. Leng J, Jiang P, Xu K, Liu Q, Zhao JL, Bian Y, Shi R (2019) Makerchain: a blockchain
with chemical signature for self-organizing process in social manufacturing. J Clean Prod
234:767–778
47. Spellini S, Chirico R, Lora M, Fummi F Languages and formalisms to enable EDA techniques
in the context of industry 4.0. In: 2019 Forum for specification and design languages (FDL).
IEEE, pp 1–4
48. Leng J, Jiang P (2018) Evaluation across and within collaborative manufacturing networks: a
comparison of manufacturers’ interactions and attributes. Int J Prod Res 56:5131–5146
49. Avventuroso G, Silvestri M, Pedrazzoli P (2017) A networked production system to implement
virtual enterprise and product lifecycle information loops. IFAC-PapersOnLine 50:7964–7969
50. Cohen Y, Faccio M, Pilati F, Yao X (2019) Design and management of digital manufacturing
and assembly systems in the Industry 4.0 era, vol 105. Springer, pp 3565–3577
51. Lkhagvasuren G Ensuring rights of the data subject in non-EU countries. In: Proceedings of
the 12th international conference on theory and practice of electronic governance, pp 465–467
52. Mulligan DK, Kluttz D, Kohli N (2019) Shaping our tools: contestability as a means to promote
responsible algorithmic decision making in the professions. Available at SSRN 3311894
53. Bourcier D (2001) De l’intelligence artificielle à la personne virtuelle: émergence d’une entité
juridique? Droit et société 847–871
54. Schmid PC, Amodio DM (2017) Power effects on implicit prejudice and stereotyping: the role
of intergroup face processing. Soc Neurosci 12:218–231
LSTM-Based Encoder–Decoder
Attention Model for Text Translation
and Simplification on the Constitution
of India
Abstract Natural language processing techniques can be used on judicial and leg-
islative documents like the Constitution to make them more accessible to the gen-
eral audience. Various approaches, such as Neural Machine Translation (NMT) and
text simplification of complex sentences, can be applied to the Constitution.
The model proposed in this paper can be used for the translation and
simplification of the Constitution of India. The model is an LSTM variant form-
ing an encoder–decoder network integrated with an attention layer and is trained
using the teacher forcing method. The data set used for the translation task consists
of the IIT Bombay English–Hindi parallel corpus concatenated with our own
curated data set of 300 English articles of the Constitution of India translated to
Hindi, whereas for the simplification task it consists of our own curated data set of
300 complex English articles of the Constitution of India paired with simplified
sentences. The proposed model for translation has a BLEU score of 15.34, and the
research elaborates on the performance analysis of the generated outputs and
the BLEU score. For simplification, the results show some inconsistencies, which
can be improved by increasing the data set for the simplification task.
1 Introduction
In the following Sects. 2 and 3, we discuss the related works on English to Hindi
machine translation. Section 4 elaborates on the proposed methodology of our
research. Section 5 gives a brief description of the corpus used, and Sect. 6 expounds
on the experimental designs and set-up for the model. Finally, in Sect. 7, we analyse
the results and performance of the model and conclude the paper in Sect. 8.
2 Literature Review
As there is a lot of potential in using NLP models for machine translation and sen-
tence simplification, recent research has shown a myriad of methods that provide
a cohesive end-to-end structure, rather than conventional methods that utilise vari-
ous submodels and long pipelines. Cho et al. [1] propose a neural machine transla-
tion model consisting of an encoder and a decoder, the architecture that neural
machine translation models usually employ. The features of this technique were
investigated using two models: the RNN encoder–decoder and a newly created
gated recursive convolutional neural network. However, performance started to
degrade with a rise in sentence length and an increase in unknown words.
The model suggested by He et al. [2] improves translation diversity and quality
by using a mixture of specialised translation models rather than a single translation
model. Each mixture component chooses its own training data set, leading to a soft
clustering of the parallel corpus. Encoder–decoder architectures have dominated
the field of sequence modelling in NLP since Sutskever et al. [3] released their
“sequence to sequence” learning model. The resulting translations from Seq2Seq
models sometimes lack diversity. The reason for this is differences in style,
genre, subject, or ambiguity in the translation process, which cause semantic
and syntactic variation in a corpus that Seq2Seq models are unable to capture.
Other approaches include DNNs, which have proven to be powerful models.
However, they cannot be used to map sequences to sequences, which is a signif-
icant limitation. Sutskever et al. [3] present a sequence to sequence (seq2seq)
model that uses a multilayered LSTM as the encoder and another deep LSTM as
the decoder. It was observed that the LSTM did not have difficulty with long
sentences and outperformed the phrase-based SMT system. Escolano et al. [4]
suggest an alternative approach based on language-specific encoders and decoders,
which can flexibly incorporate a new language by learning only its corresponding
modules. The method successfully added new languages without the need for
retraining, outperforming the universal encoder–decoder by 3.28 points on average.
Reference [5] introduces mT5, a multilingual variant of T5, which is pretrained on
a new Common Crawl-based data set covering 101 languages.
Mahmud et al. [6] implemented a GRU-based unidirectional RNN model with
an encoder–decoder attention which is a neural machine translation (NMT) model.
It is focused on English–Bangla translation and a small-sized balanced data set
is used for the same. A high BLEU score of 50.07 was achieved. The Statistical
Machine Translation (SMT) proposed by Brown et al. [7] makes use of the rules of
probability for language translation.
Martin et al. [8] build upon the Seq2Seq model. Additional inputs have been
added to the original sentences at the time of training, in the form of control
tokens. The aim is to make the user in charge of how the model simplifies sen-
tences on four important features of sentence simplification. Four control tokens
have been introduced namely NbChars, LevSim, WordRank and DepTreeDepth.
ACCESS scores best with SARI score of 41.87, a substantial advancement
over previous SOTA (40.45), and third to best FKGL (7.22). With respect to
the SARI score, the second and third models, DMASS + DCSS (40.45) and
SBMT + PPDB + SARI (39.96), both make use of the external resource Simple
Paraphrase Database (Pavlick and Callison-Burch, 2016) that was derived from
data that was 1000 times larger than the training set used in this paper.
Narayan et al. [9] present a hybrid approach to sentence simplification. It
combines a machine translation module that handles reordering and replacement
with a model that encodes probabilities for splitting and deletion. The approach
is based on semantics. The SARI score is 28.61, 31.40, 30.46 on Newsela, Turk
Corpus, PWKP/WikiSmall data sets, respectively.
A unique end-to-end neural programmer–interpreter (Reed and de Freitas
2016) that learns to generate edit operations directly, similar to a human editor,
was introduced by Dong et al. [10]. The suggested architecture consists of a pro-
grammer and an interpreter; the programmer predicts a simplifying edit operation,
such as add, delete or keep, and the interpreter executes the edit operation while
keeping track of a pointer. It gives a SARI score of 38.22, 32.35,
31.41 on WikiLarge, WikiSmall, Newsela data sets, respectively.
As opposed to prior work that uses source-side similarity search to retrieve
memory and bilingual corpora as translation memory (TM), Cai et al. [13] suggest
using monolingual memory with a crosslingual learnable memory retrieval in their
proposed paradigm. In the proposed framework, abundant monolingual data can
serve as TM thanks to the crosslingual memory retriever. Also, the memory retriever
and the NMT model can be jointly optimised for the final translation objective.
The randomness of neural networks means that existing neural machine translation
models cannot effectively capture linguistic dependencies and yield unsatisfactory
results when dealing with long sentence sequences. Xu et al. [14] propose a machine
translation model with entity tagging improvement. It implements an LSTM with an
attention mechanism to tune the extent to which the context of the source language
influences the target language sequence. The mean BLEU score achieved is 24.7%.
Recurrent Neural Networks (RNNs) [11] were among the first architectures applied to
NMT. RNNs rely on the idea of recurrent units, which makes it easier to learn
and maintain context owing to their cyclic structure. However, in a sentence like:
“Goalkeepers have to always look at the attackers’ footwork and on-field position to
determine whether the ball will be passed or shot. So, I believe in a game of football
instinctive play is best suited for ___ position”, the blank here can have a variety
of answers: Goalkeeper, RightFullback, LeftFullback, CenterBack.
RNNs find it difficult to learn such dependencies and relationships and to preserve
context for long sentences. Therefore, for languages with such complex
contexts, RNNs are generally avoided in encoder and decoder designs. Long
Short-Term Memory (LSTM) models were introduced for encoding and decod-
ing to address the drawbacks of RNNs with better context preservation capabilities.
The performance of the seq2seq LSTM model declined with the increase in the
length of the input sequence. By using the attention mechanism, a deep neural
network could focus on just a few important parts of the input instead of storing
the entire context and then passing it to the decoder. An attention mechanism that
simultaneously learnt to align and translate was introduced by Bahdanau et al. [15] and
is depicted in Fig. 2. The Bahdanau attention could alleviate the performance bot-
tleneck of traditional encoder–decoder systems. Cho et al. [1] and Sutskever et al.
[3] used an RNN encoder–decoder framework for neural machine translation along
with the attention layer. The encoder and decoder states were combined linearly
using this additive technique. Contrary to the seq2seq concept without attention,
the context vector was created from all hidden states of the encoder (forward as
well as backward) and decoder. To help attention focus on the most crucial infor-
mation, the input and output sequences were aligned using an alignment score
parameterized by a feed-forward network. Based on the context vector associated
with the source position and the previously generated target words, the model would
predict the target word.
The main components of the Bahdanau encoder–decoder architecture are:
1. The hidden decoder state at time step $t-1$ is represented by $s_{t-1}$.
2. The context vector at time step $t$ is denoted by $c_t$. Each decoder step generates
it in a different way to produce the target word, $y_t$.
3. $h_i$ is an annotation that concentrates heavily on the $i$th word out of $T$ total
words while capturing the information found in the words making up the com-
plete input sentence, $\{x_1, x_2, \ldots, x_T\}$.
4. Each annotation, $h_i$, is assigned a weight value $a_{t,i}$ at the current time step, $t$.
5. An alignment model, $w(\cdot)$, produces an attention score called $e_{t,i}$ that rates how
well $s_{t-1}$ and $h_i$ match.
The Bahdanau attention algorithm works as follows:
1. A collection of annotations, $h_i$, is produced by the encoder from the input
sentence.
2. These annotations and the prior hidden decoder state are fed into an align-
ment model, which uses this data to produce the attention scores, $e_{t,i}$.
3. The attention scores are normalised into weight values, $a_{t,i}$, in a range between
0 and 1, by applying a softmax function to them.
4. A context vector, $c_t$, is produced by applying these weights to the previously
computed annotations, creating a weighted sum of the annotations.
5. To compute the final output, $y_t$, the context vector is supplied to the decoder
together with the previous hidden decoder state and the previous output.
6. Steps 2 through 5 are repeated until the end of the sequence.
$$a_{t,s} = \frac{\exp(\mathrm{score}(h_t, \bar{h}_s))}{\sum_{s'=1}^{S} \exp(\mathrm{score}(h_t, \bar{h}_{s'}))}, \qquad c_t = \sum_{s} a_{t,s} \bar{h}_s$$
4 Proposed Methodology
Figure 3 depicts the various steps involved in the proposed methodology for
both text translation and simplification. Translation and simplification models
have been trained separately but have the same underlying methodology. First,
we pre-process the original sentences from our corpus by removing the special
characters, punctuations and extra white spaces. Then, the pre-processed text is
tokenized and fed to the input layer which transforms the tokenized text to 256
embedding vectors. The embeddings are passed to the LSTM layer of the encoder.
The decoder receives the weights from the attention layers connecting the encoder
and the decoder. The decoder works at the word-level generating one word at a
time which stops when the <END> special token is generated or maximum length
is reached.
Firstly, for the English data set, we convert all letters to lowercase. Then, the
punctuation marks (!.,) and special characters ($#) are replaced by a single space
character, and multiple consecutive spaces are collapsed into a single space. To
indicate the beginning and conclusion of a statement during training, a “start” and
an “end” token are inserted at the front and back of the text, respectively. After
performing these actions, we finally tokenise the corpus. Similarly, for the training
process, we perform the above sequence of actions on the parallel Hindi data set;
a minimal sketch of this pipeline is shown below.
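The following sketch illustrates the pre-processing steps just described. The original work does not specify an implementation language; C# is used here for consistency with the rest of this volume, and the exact token spellings are assumptions.

```csharp
using System.Text.RegularExpressions;

static class Preprocessing
{
    // Lowercase, strip punctuation/special characters, collapse whitespace,
    // and add sentence-boundary tokens for training.
    public static string Clean(string sentence)
    {
        var s = sentence.ToLowerInvariant();
        s = Regex.Replace(s, @"[!.,?$#]", " ");    // punctuation and special characters -> space
        s = Regex.Replace(s, @"\s+", " ").Trim();  // multiple spaces -> single space
        return $"<start> {s} <end>";               // assumed spelling of the start/end tokens
    }
}
```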
The LSTM-based encoder–decoder network is used for mapping the input and tar-
get sentences in the sequence-to-sequence model. The proposed model leverages
teacher forcing in order to achieve faster convergence. Initially, learning is fairly
slow during training, so using teacher forcing we feed back the actual expected
results rather than what the model has predicted, leading to a faster update of
the weights in the correct direction. Figure 4 depicts our base seq2seq model
for English to Hindi translation.
Fig. 4 Word-level Seq2Seq model using LSTM for text translation with English tokens as input
to the Encoder and predicted Hindi output tokens by the Decoder
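Formally, during training with teacher forcing, the decoder input at step $t$ is the ground-truth token rather than the model’s own previous prediction:

$$x_t^{\mathrm{dec}} = \begin{cases} y_{t-1} & \text{training (teacher forcing)} \\ \hat{y}_{t-1} & \text{inference} \end{cases}$$

where $y_{t-1}$ is the reference token and $\hat{y}_{t-1}$ is the previously predicted token.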
4.4 Attention
Our proposed model stacks an attention layer bridging all the encoder units
to each decoder unit to improve the existing seq2seq models. Attention offers a
learning method where the decoder can learn from a weighted context over the
encoder states and interpret which part of the encoding network requires
more “attention” when predicting the output. Although the attention-based seq2seq
model requires higher computational resources, the results are quite convincing
compared to the traditional seq2seq. The attention model creates a context vector
that is filtered for each output time step instead of just encoding the input sequence
into a single fixed context vector.
Figure 5 depicts the proposed model with the attention layer. Both the encoder and
decoder consist of an LSTM layer of 256 units with a dropout of 0.2. Initially, the input
layer feeds in the input text and passes it to the embedding layer, which transforms the
tokenized sentence into a 256-dimensional embedding. One embedding layer each for
the encoder and decoder is initialised at the start. All the encoder units are connected to
the Bahdanau attention layer, which passes the context vector to the decoder one unit at a time.
5 Corpus Description
The prerequisite for a machine translation model is the availability of parallel cor-
pora for source and target languages. In terms of the availability of parallel data,
Hindi is a less-resourced and atypical language when compared to its European
analogues. Also, our major focus on the Judicial and Legal domain narrowed
down the choices. Thus, we had to create our own data set for the Constitution of
India. Our proposed translation parallel corpus includes these two sources:
1. The IIT Bombay English–Hindi parallel corpus. It contains data from
multiple disciplines, out of which only the Judicial domain and various Indian
Government website sources were selected.
2. Our own curated custom English-to-Hindi data set on the Constitution of India,
created with reference to the official Hindi version of the Indian Constitution.
The corpus is exhaustive with an eclectic diversity in the vocabulary. Table 1 illus-
trates the sentence distribution with respect to each domain in the data set.
90% of the data set’s sentences were used for training, 9% for testing, and 1%
for validation of the translation model.
Alongside, another parallel data set for text simplification was curated, consisting
of articles from the Constitution of India, illustrated in Table 2. Original articles
and their corresponding simplified sentences were paired to form the data set
records. The simpler versions of the sentences are easily comprehensible to a lay-
man’s understanding.
6 Experimental Set-Up
BLEU-4 [14] is a metric used to evaluate the similarity between a predicted
sentence and its corresponding reference parallel. In our case, it is the translated or
simplified sentence compared to its reference sentence. The BLEU score ranges from
1.0, indicating a perfect match, down to 0, indicating a complete mismatch.
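For reference, BLEU-4 combines the modified $n$-gram precisions $p_n$ for $n = 1, \ldots, 4$ with a brevity penalty $\mathrm{BP}$ [14]:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{4} w_n \log p_n\right), \qquad \mathrm{BP} = \min\left(1, e^{1 - r/c}\right)$$

where $w_n = 1/4$, $c$ is the total length of the candidate translations and $r$ is the effective reference length.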
7.2 Analysis
The BLEU score of our model with different configurations, on the validation data set
(1% holdout from the corpus), is presented in Table 3.
As observed in Table 4, our proposed model performed well for shorter sen-
tences. The translated sentence must be similar to the reference sentence in the
data set. The model is capable of capturing the context behind the judicial and leg-
islative jargon and translating them correctly.
However, as observed in Table 5, the model’s performance degrades as the sen-
tence length increases. It starts off correctly but then begins repeating words
until it is cut off at the maximum length. This is mainly because our model uses
only a single LSTM layer. The model’s performance for longer sentences can be
enhanced by stacking more LSTM layers or even using bidirectional LSTM lay-
ers. The simplification model is not included in the tables due to its low performance,
which results in repeated words. This is mainly because the current simplification
corpus has only 300 records.
8 Conclusion
In the current work, we have focused on different NLP techniques like text trans-
lation and text simplification on legislative and judicial text records, in order to
reduce their complexity and make them more comprehensible to the common people.
Performance and results of the proposed LSTM-based seq2seq model incorporat-
ing teacher forcing have been expounded in Table 3. Both the translation and sim-
plification models have been trained on our custom curated data set on the Constitution
of India. The seq2seq LSTM model with teacher forcing gives a BLEU score of
12.53; however, adding an attention layer enhances the model, helping it
better understand the context of the sentences. The NMT model for English–Hindi
language pairs, with a BLEU score of 15.34, works well for short sentences, as
seen in Table 4. The NMT model successfully captures the judicial and legislative
context in the given sentence. However, we observe a decrease in performance
for longer sentences, as seen in Table 5. The simplification model has a relatively
lower performance, mainly because we only have around 300 records
for simplification; the accuracy can further be increased by expanding the
data set. Moreover, the performance can be improved further by increasing the
epochs and the encoder–decoder units, which are currently limited due to hardware
constraints.
The research work can further be extended by increasing the simplification and
translation data set with more judicial and legislative sentences. Our proposed
model uses teacher forcing; other methods like beam search and state-of-the-
art transformers can also be experimented with to increase the performance. The
accuracy of the text translation model can further be improved by augmenting the
data set with more judicial specific English–Hindi pairs.
References
1. Cho K, Merriënboer BV, Bahdanau D, Bengio Y (2014) On the properties of neural machine
translation: encoder-decoder approaches. arXiv:1409.1259
2. He X, Haffari G, Norouzi M (2018) Sequence to sequence mixture model for diverse
machine translation. arXiv:1810.07391
3. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks.
Adv Neural Inf Process Syst 27
4. Escolano C, Costa-jussà MR, Fonollosa JAR, Artetxe M (2020) Multilingual machine
translation: closing the gap between shared and language-specific encoder-decoders.
arXiv:2004.06575
5. Xue L, Constant N, Roberts A, Kale M, Al-Rfou R, Siddhant A, Barua A, Raffel C (2020)
mT5: a massively multilingual pre-trained text-to-text transformer. arXiv:2010.11934
6. Mahmud A, Al Barat MM, Kamruzzaman S (2021) GRU-based encoder-decoder attention
model for English to Bangla translation on novel dataset. In: 2021 5th international confer-
ence on electrical information and communication technology (EICT). IEEE (pp 1–6)
7. Brown PF, Cocke J, Della Pietra SA, Della Pietra VJ, Jelinek F, Mercer RL, Roossin P
(1988) A statistical approach to language translation. In: Coling Budapest 1988 volume 1:
international conference on computational linguistics
8. Martin L, Sagot B, de la Clergerie E, Bordes A (2019) Controllable sentence simplification.
arXiv:1910.02677
9. Narayan S, Gardent C (2014) Hybrid simplification using deep semantics and machine
translation. In: The 52nd annual meeting of the association for computational linguistics, pp
435–445
10. Dong Y, Li Z, Rezagholizadeh M, Cheung JCK (2019) EditNTS: a neural programmer-in-
terpreter model for sentence simplification through explicit editing. arXiv:1906.08104
11. Medsker L, Jain LC (eds) (1999) Recurrent neural networks: design and applications. CRC
press
12. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput
9(8):1735–1780
13. Kunchukuttan A, Mehta P, Bhattacharyya P (2017) The IIT Bombay English-Hindi parallel
corpus. arXiv:1710.02855
14. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of
machine translation. In: ACL
15. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to
align and translate. arXiv:1409.0473
Compiling C# Classes to Multiple Shader
Sources for Multi-platform Real-Time
Graphics
1 Introduction
Nowadays, with the .NET Ecosystem, especially with the newly arrived .NET 5
and 6 [1] we can reach multiple platforms and operating systems from a single
source code. A standard C# class library can be compiled for Windows, Linux,
Mac, to mobile platforms like Android or iOS and even to the Web as a webservice
or webpage. Using multi-platform packages and organizing our code into libraries
we can accelerate our development for multiple systems. Only the system-specific
features must be reimplemented for every platform; our application-specific models
can be reused across all platforms. In our research we are merging this multi-platform
C# area and technology with the world of low-level real-time graphics and rendering.
To implement a rendering algorithm or graphics engine, you need to use graphics
APIs to access the Graphics Processing Unit (GPU) in your device. Using these
APIs you can program GPUs, allocate resources on them and issue state modification
and drawing commands to render sequences of images in real time. Such graphics
APIs are OpenGL, Vulkan, DirectX and Metal. These are C/C++ interfaces which can
be used to communicate with the GPU’s device driver; therefore, most applications
using these APIs are implemented in C or C++. While the APIs have so-called binding
libraries for C# (extern PInvoke static methods and structures generated from the
C/C++ interfaces), these libraries do not fit the previously presented multi-platform
C# environment perfectly. Every graphics API supports different platforms. DirectX
supports only Windows, Vulkan supports only modern devices, and OpenGL in
combination with OpenGL ES and WebGL supports every platform (excluding Mac and
iOS), but these are older APIs. The combined use of most of these APIs is needed to
support as many platforms and devices as possible. Our goal is to provide a possibility
for .NET C# developers to use graphics APIs through our low-level abstraction layer.
Our library presents a way to develop multi-platform real-time rendering
applications using C#. The library provides a set of classes and functions for
implementing low-level rendering algorithms in a cross-platform .NET application.
The focus is on implementing the rendering code only once in a shared
library, which all supported platforms can reference and consume as a single codebase.
Therefore, the library is an abstraction of graphics APIs (currently Vulkan with vk
.net [2] and OpenGL with OpenTK [3]). When the application initializes, the most
fitting graphics API can be selected based on the current platform, device
and capabilities. Furthermore, the real-time rendered images can be embedded into
existing .NET applications’ user interfaces, just like any other regular UI component.
In this paper, after a brief introduction of our library’s structure, the focus is
on the GPU-side code implementation: shader programming with the library.
The custom algorithms that are executed by the GPU during our drawing commands
are implemented in shader languages and these differ between graphics APIs. To
hide this difference, we have implemented C# abstractions not just for the Graphics
APIs, but for their supported shading languages as well. The goals, advantages and
implementation are detailed in later sections of this paper.
Our library is a structure of libraries, not just a single package. There is a base library,
the graphics API abstraction, which the user will use for development most
of the time. There are the implementation layer packages, which are the individual
graphics APIs (currently Vulkan, modern OpenGL and OpenGL ES 3.0) imple-
menting the base library [4]. Lastly, there are the view layer packages, which add
support for embedding and presenting the rendered images in multiple .NET UI
frameworks like WPF, Xamarin, Avalonia, etc.
Fig. 1 CPU-side C# implementation of rendering the hello triangle sample application using our
library (SharpGraphics)
Using the base library, it is possible to implement a custom rendering algorithm
in a multi-platform C# class library. All the rendering features like using shaders,
buffers, pipelines, etc. can be coupled in this module, regardless of what platform or
which graphics API will be targeted. The decoupled architecture makes it possible
to reuse most of the code implemented with our library, because the only platform
specific part is the initialization of the Management, Device and UI surface. The rest
of the rendering application can be shared between platforms.
To render something, a Render Pass must be created with a single Step and a Pipeline.
The Render Pass specifies the output images’ types and their role in the rendering,
when and how it wants to render into them. A Pipeline is a set of configurations for
drawing commands. A Pipeline specifies the used shaders, the type of shader inputs
and resources and other drawing specifications.
For more complex solutions it is possible to utilize Buffers, Pipeline Resources
(Uniforms in OpenGL or Descriptor Sets in Vulkan) and Textures [5] (Fig. 1).
Shaders are programs executed by the GPU in specific drawing stages. Currently,
the Vertex Shader and Fragment Shader stages are supported by our library. In
OpenGL these shaders are implemented in GLSL (OpenGL Shading Language);
however, GLSL has multiple versions, and OpenGL ES and WebGL use a different
versioning. Vulkan uses SPIR-V as its shader language, which is an extended GLSL
460 compiled into an intermediate bytecode. Both DirectX and Metal use their own
shader languages.
Currently our library supports generating GLSL 420, 430, 440, 450, 460, 300
ES and SPIR-V bytecode. For modern hardware, GLSL 460 or SPIR-V would have
been enough, but the current role of OpenGL in our library is to maintain compatibility
with older and simpler platforms; therefore, we support older GLSL code generation
as well.
In our library it is possible to promote a C# class into a “shader class”. This will
make the build system run our Source Generator (see the next section) on the class,
and it will generate the shader sources and bytecodes from the class in the correct
format to let us use it during Pipeline creation. To promote a class to a shader class, it
shall be inherited from the proper ShaderBase class from our library and a Shader
Attribute must be added to the class (presented in Fig. 2). This way we can also
specify what kind of shader (Vertex or Fragment) we are implementing with this
class. Inside this class the regular C# variables and programming constructs can be
used, just like in the implementation of a normal C# class. We can even use types or
functions from outside of the class; those will be generated into the shader as well.
However, there are some rules to be careful about; for example, it is not allowed to
use any reference types (shaders support only value types). Also, the class must be
a partial class because our generator will generate additional partial classes storing
the compiled shader source codes in fields.
In the class the abstract main method must be overridden. This will be the entry
point of the shader which will be called for every vertex/fragment. Let us start with
a simple Passthrough Vertex and Fragment Shader combination to understand the
semantics of shader development in our library. These shaders are just receiving
input and sending it forward without any modification. Implementation of much
more complex shaders is possible of course, but this is a great starting point.
To implement a simple Passthrough Vertex Shader (sets the incoming position
from the vertex input data as the final position for the vertex without any transfor-
mations or modifications) we must create input variables to receive the vertex input
data. We can declare a local private variable in our shader class as a Vector4 (Vector4
and other mathematics structures are in the official C# System.Numerics namespace,
our shader generator is using these types) and we shall add the in attribute from our
library to this variable. This will indicate that during shader generation this will be
generated with an in storage qualifier into the shader source (similarly for output
variables we have an out attribute). In the Vertex Shader we have some built-in
variables like vPosition, which stands for the built-in gl_Position variable in GLSL,
or vID, which is gl_VertexID in GLSL. To finish our Vertex Shader class, we shall
assign the input position to vPosition, stating that this value is going to be the final
vertex position. Also, we can create an in and an out Vector4 variable for sending
the incoming color value of the vertex to the Fragment Shader.
Fig. 2 Vertex passthrough shader implemented in C# and compiled into GLSL and SPIR-V using
our library
The Fragment Shader can be implemented analogously to the Vertex Shader. We shall
declare an in Vector4 variable for receiving the color from our Vertex Shader and an
out Vector4 for stating the final fragment color. In the overridden main function, we
shall assign the input color to the output color (Fig. 3). A condensed sketch of this
shader pair follows.
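The sketch below condenses the shader pair described above. The attribute spellings, base-class names and member casing are assumptions reconstructed from the text and Fig. 2, not the library’s exact API; minimal stand-ins for the library types are included so the sketch is self-contained.

```csharp
using System;
using System.Numerics;

// Assumed stand-ins for the library types described in the text.
class VertexShaderAttribute : Attribute { }
class FragmentShaderAttribute : Attribute { }
class InAttribute : Attribute { }
class OutAttribute : Attribute { }
abstract class VertexShaderBase { protected Vector4 vPosition; public abstract void Main(); }
abstract class FragmentShaderBase { public abstract void Main(); }

[VertexShader]
partial class PassthroughVertexShader : VertexShaderBase
{
    [In] Vector4 position;   // generated as: in vec4 position;
    [In] Vector4 color;
    [Out] Vector4 vs_color;  // forwarded to the Fragment Shader

    public override void Main()
    {
        vPosition = position; // vPosition maps to gl_Position in GLSL
        vs_color = color;
    }
}

[FragmentShader]
partial class PassthroughFragmentShader : FragmentShaderBase
{
    [In] Vector4 vs_color;
    [Out] Vector4 fragmentColor;

    public override void Main() => fragmentColor = vs_color;
}
```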
Fig. 3 3D transformation vertex shader implemented in C# and compiled into GLSL and SPIR-V
using our library
Uniform variables are constants inside the shader, but before the drawing
command is issued in the CPU-side code that will eventually execute our shader,
we can set the values of these Uniform variables. Conveniently, in our library the
same C# structures can be used both in the CPU-side code when we are setting the
value and in the GPU shaders where we use it (mind the StructLayout attribute! We
will explain memory layouts later in this section). In the C# shader class, we can use
the Uniform attribute from our library to indicate that a field is a uniform variable.
We need to assign the set, and the binding inside this set, to this variable, because these
are the ids which we can use in the CPU-side code to select which of the Uniform
variables (in case we have multiple) we want to assign a value to.
For vector and matrix operations we are consuming the C# System.Numerics
namespace with its types and functions. By implementing a valid calculation with
this API, the formula will be compiled into the correct GLSL formula.
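Combining the Uniform attribute and the System.Numerics operations, a minimal transformation vertex shader might look like the following sketch (reusing the stand-in types from the previous listing; attribute parameter names such as Set and Binding are assumptions, and Fig. 3 shows the library’s actual version).

```csharp
using System;
using System.Numerics;

// Assumed stand-in for the library's Uniform attribute with set/binding ids.
class UniformAttribute : Attribute { public int Set; public int Binding; }

[VertexShader]
partial class TransformVertexShader : VertexShaderBase
{
    // The set/binding ids let the CPU-side code select this uniform when assigning a value.
    [Uniform(Set = 0, Binding = 0)] Matrix4x4 world;

    [In] Vector4 position;

    // Vector4.Transform from System.Numerics is compiled into the corresponding GLSL formula.
    public override void Main() => vPosition = Vector4.Transform(position, world);
}
```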
Fragment Shader for Phong Shading. Phong Shading [6] is a straightforward way
of simulating light effects in 3D graphics. An example of a Phong shader using our
library could be implemented as shown in Fig. 4.
In this shader we are using multiple Uniform variables for multiple purposes.
Material stores the current object’s material information (how it reacts to light, how
shiny it is, etc.), Light stores the directional light’s information for the virtual world’s
sun (its color and direction) and Scene presents other scene-dependent data like
the camera (viewer) position.
When using structures as uniform variables in a shader language, we must consider
the memory layout of the fields inside a structure. Every device (GPU) favours a
different memory layout as its optimal layout. While we would expect that the fields
of a structure are stored in memory in the order of their declarations, the device
may expect them in a different order, simply because that order is more optimal for its
internal architecture. Also, fields may have padding between them, leaving unused
bytes among the bytes of our actual variables in memory. When we see a structure
with two float fields, we would expect it to be stored in 8 bytes of memory (2
× 4, because a float is stored in 4 bytes). The device may introduce padding between
the fields to match a particular memory layout, for example organizing the structure
to have all its fields start at every 16th byte [7]. To overcome this inconsistency,
shader languages introduced standard layouts, which are device-independent memory
layout options for declaring structures. Our library uses the GLSL std140 layout
for Uniform variables, which means the C# structures must obey the rules of this
layout. As in the example of our LightData structure, you may need to explicitly
set the memory offset (location within the structure) of all fields to match the shader
language’s expectations. Currently our shader generator has no means of checking or
enforcing the correct layout, but this is something that we are planning to add later.
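As a sketch of what such an explicitly laid-out structure could look like in C# (field names are assumptions in the spirit of the LightData example; the offsets follow the std140 rule that a vec3/vec4 slot starts on a 16-byte boundary):

```csharp
using System.Numerics;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit, Size = 32)]
public struct LightData
{
    [FieldOffset(0)]  public Vector4 Color;     // occupies bytes 0..15
    [FieldOffset(16)] public Vector3 Direction; // a vec3 still takes a 16-byte slot in std140
}
```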
In the Phong shader we are using a Texture Sampler (a resource to read pixels
from an image in memory) to colorize our geometry. For these purposes we have
introduced some types in our library which you can use in the C# shader classes to
access certain shader features like texture samplers and sampler operations.
3 C# Source Generators
Source Generators are part of the Roslyn Compiler Platform SDK [8], a C#
development library that enables developers to access the syntax tree and semantic
model of their compiled C# application either at development time or at runtime.
Using a Source Generator, we can implement custom code checking and refactoring
tools, or we can also generate C# source files, which are added to the compilation
as it runs. For our Shader Generators we have used the C# source file
generation feature of the API.
While it is possible to use Source Generators to create a standalone appli-
cation, like a command-line tool, we have created a Source Generator project
which can be referenced and used from any C# project. These kinds of
projects are called Analyzers: .NET Standard 2.0 class library projects where
the <IsRoslynComponent>true</IsRoslynComponent> option must be added to
the project file, and which need references to the Microsoft.CodeAnalysis.CSharp and
Microsoft.CodeAnalysis.Analyzers packages. For both references we need to add the
PrivateAssets = “all” attribute, to indicate that projects referencing our analyzer should
not see these references; otherwise, such a project might be treated as an Analyzer
project as well, which would result in inconsistency errors with the project type.
An Analyzer project created this way can be referenced from another C# project of
the solution by adding OutputItemType = “Analyzer” to the project reference.
During compilation of the referencing project the compiler system will invoke
the Analyzer project’s compiled dll. The Analyzer is executed, and the implemented
custom code will traverse the current state of the compilation and optionally add additional
C# sources to it. All of this happens entirely at compilation time; no
runtime reflection or other runtime performance penalties are incurred.
Visual Studio itself uses the Analyzer’s DLL as well, but it only loads the DLL when
Visual Studio launches and loads the project. This means that every time you
modify your Analyzer project’s source code and recompile it, you need to restart
Visual Studio to force it to use the newly compiled DLL.
While an Analyzer project can be used for a lot of purposes, we are currently
focusing on source generation. To get started we need a class in the Analyzer project
implementing the IIncrementalGenerator interface (ISourceGenerator is available
as well, but Incremental Generators are more efficient to use). In the implemented
Initialize method we can access the context of the compilation of the project that
is referencing our Analyzer. We can filter the classes, structures, functions, variable
declarations, or all other programming constructs of this compilation to collect only
the parts of the code which we want to work on. It is important to filter early and to
filter well, because on larger projects a non-performant Analyzer can greatly slow
down the compilation and the responsiveness of Visual Studio.
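A minimal sketch of such an incremental generator is shown below; the Roslyn API calls are standard, while the generator name and the attribute-checking details are simplified assumptions about our actual Shader Generator.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

[Generator]
public class ShaderSourceGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Filter early and cheaply: only class declarations carrying attributes.
        var candidateClasses = context.SyntaxProvider.CreateSyntaxProvider(
            predicate: static (node, _) =>
                node is ClassDeclarationSyntax c && c.AttributeLists.Count > 0,
            transform: static (ctx, _) => (ClassDeclarationSyntax)ctx.Node);

        // Emit one generated source file per collected class.
        context.RegisterSourceOutput(candidateClasses.Collect(), static (spc, classes) =>
        {
            foreach (var cls in classes)
                spc.AddSource(
                    cls.Identifier.Text + ".Shaders.g.cs",
                    "// the generated shader source fields would be emitted here");
        });
    }
}
```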
Fig. 5 Project file of a Roslyn source generator using other package references
After successful filtering we can schedule the execution of our code generator
method with the filtered data. The filtered data is usually some kind of Syntax
information (ClassDeclarationSyntax in our Shader Generator’s case because we
are filtering for the classes which are promoted with our Shader Attribute). Using
these Syntax classes, we can traverse the whole syntax tree if needed. We can query the
fields and methods inside the found classes. We can get the statements inside a method
and the expressions that are constructing those statements. To get more detailed type
information about expressions we can get the semantic model for syntax trees. Using
these models, the type of the fields and the declaration or other information of these
types can be found. Based on this information it is possible to issue warnings, perform
refactorings, or generate completely new C# sources and add these to the compilation.
All this parsing and generation happens during development and compile time.
We are creating a library for developers to use; therefore, we need to be able to
package this analyzer project into a NuGet package. Any .NET Standard project can
be set up for packing using the <GeneratePackageOnBuild>true</GeneratePackageOnBuild>
option in the project file; however, we need additional options to properly
pack the Analyzer with its references and DLLs. For all other NuGet package
references we shall add the PrivateAssets = “all” and GeneratePathProperty = “true”
attributes. Thanks to the latter, the path to the compiled DLL of the reference can be
accessed in the project file later for packaging. In Fig. 5 we present the modifications
to the project file that are required to pack the DLLs of referenced packages into our
Analyzer’s package.
4 Shader Generation
After the compilation has been filtered for C# classes annotated with the Shader
Attribute, our generator prepares the GLSLangValidator (explained later), then it
decides about the type and stage of the shader based on the Shader Attribute. Currently
only Vertex and Fragment Shaders are supported, so the generator tries to find either
of these two Attributes. We collect information about our C# shader class into a
ShaderClassDeclaration class. The fields of the class are filtered by our in, out and
uniform attributes, and the main function of the class is searched for as well. On
this prepared information we start our Builders. We have a Builder for each supported
shader language, whose task is to traverse the syntax tree of the C# shader class and
generate the corresponding shader source as a string. Lastly, the generated strings are
added to the compilation as fields in a C# class, so the library will be able to get these
shader source strings during Pipeline creation (Fig. 6).
We can determine the stage of the shader by the type of our Shader Attribute, our
library has a separate VertexShaderAttribute and FragmentShaderAttribute for this
role (both are inheriting from a ShaderAttribute base class so after filtering we can do
further analysis based on the subclass types). Later this system can be extended with
support for Compute Shaders or other Graphics Shader stages as well. We collect all
this information to the previously mentioned ShaderClassDeclaration class which is
provided to the Builders.
The generated shader sources are emitted into the compilation as partial classes.
The ShaderBase class, which our C# classes need to inherit from, provides properties
for getting the generated shader source for every supported shader language and
version. For each shader language and version, a partial class is generated, holding
the generated source in a private field and providing access to it through a property
getter. To properly emit these partial classes, we also need to get the namespace of
the C# shader class which the generator is currently parsing. First, we collect all the
generated sources into a List, with the name of their shader language (to know which
property and field the source should be emitted into) and the source string itself. For the
string-based GLSL the source string is the multi-line string of the generated source
itself. For the bytecode-based SPIR-V the source string is the instantiation of a byte
array with the content of the compiled bytes. We are not entirely satisfied with emitting
the compiled shaders into the code this way, and we are experimenting with ways to emit
the generated shaders as Embedded Resource files into the compilation.
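The emitted partial classes could be imagined roughly as follows; member names and the exact shape are assumptions, only the partial-class mechanism and the string/byte-array storage follow the text.

```csharp
// Hypothetical shape of a generated partial class for one shader class.
public partial class PassthroughVertexShader
{
    private static readonly string glsl460Source =
@"#version 460
// ... generated GLSL 460 source ...";

    private static readonly byte[] spirvSource = { /* compiled SPIR-V bytes */ };
}
```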
Builders are the soul of our whole Shader Generation package, because Builders are
responsible for deeply inspecting the C# shader class’s syntax tree and generating the
source string statement by statement. To be able to extend the system in the future
for supporting more shader languages (for supporting DirectX and Metal), we have
defined a ShaderBuilderBase class which is responsible for exploring the syntax-
tree. Specialized Builders for GLSL or other shader languages are inherited from
this base class to implement the language specific generation steps. Multiple GLSL
builders are also inheriting from each other. While the basic keywords and syntax
of the most modern GLSL 460 are the same in the older GLSL ES 300 on mobile
hardware, some modern features are not supported, and those parts of the source
generation must be overridden to fall back to the older feature set.
ShaderBuilderBase defines the main structure of the generated shader source by first
generating the preprocessor keywords, then the shader-level variables, and lastly the
main function. Generation is done with a StringBuilder, which is an efficient way of
concatenating many strings into one in C#. The base class keeps track of the current
indentation so that it always emits the proper number of tabulators at the beginning
of each line. It also stores the line number at which structure definitions or functions,
discovered during the exploration of the main function, must later be inserted.
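The indentation bookkeeping could be sketched like this (a simplified stand-in, not the library’s actual implementation):

using System.Text;

public class IndentedEmitter
{
    private readonly StringBuilder sb = new StringBuilder();
    private int indent;

    public void Open() { indent++; }    // entering a block: '{'
    public void Close() { indent--; }   // leaving a block: '}'

    // Prefix every emitted line with the proper number of tabulators.
    public void EmitLine(string line)
        => sb.Append(new string('\t', indent)).AppendLine(line);

    public override string ToString() => sb.ToString();
}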
Variables. We generate the in, out, uniform, and local shader-level variables in this
order. The declarations have been grouped into these storage categories before the
Builders are started, during the preparation. For every category, we generate the
declaration of each variable in declaration order. First, we need the shader type of the
declared variable. Our library distinguishes the C# types used in the implemented C#
shader classes from the types supported by the shader language, and we need to map
these types to each other. For primitive types, the mapping is simple (int to int, float
to float, etc.), but for more complex types we need to emit a different string into the
shader code (Vector3, i.e., System.Numerics.Vector3, becomes vec3 in the shader
language). Structures are even more complex (see later). Once the type of the variable
is known, we can begin generating the declaration. We need to generate the proper
storage qualifier (in, out, uniform) based on the Attributes attached to the declaration;
the GLSL Builders specialize this to emit the qualifier keyword before the type.
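A possible shape for this type mapping (only the int, float, and Vector3 rows are named in the text; the remaining entries are plausible assumptions):

using System.Collections.Generic;

internal static class ShaderTypes
{
    // Primitive C# types map directly; vector/matrix types map to the
    // shader language's built-ins. Types missing from the table are
    // treated as user-defined structs (or reported as generator errors).
    public static readonly Dictionary<string, string> GlslTypeMap =
        new Dictionary<string, string>
        {
            ["int"] = "int",
            ["float"] = "float",
            ["System.Numerics.Vector2"] = "vec2",
            ["System.Numerics.Vector3"] = "vec3",
            ["System.Numerics.Vector4"] = "vec4",
            ["System.Numerics.Matrix4x4"] = "mat4",
        };
}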
For uniform variables, struct types are supported, so this must be handled at the
declaration to generate the struct inlined into the declaration. If the uniform variable
is an array of structs, the inlined struct will have only a single struct-array field
holding the actual struct instances. Older GLSL versions do not support explicit
location settings for uniform variables, so for GLSL ES 300 we override this
generation with a specialized behavior that consistently renames these uniform
variables during shader generation using the location settings. This way, the CPU-side
code can still deterministically obtain the locations of these variables in our library.
The generator supports renaming variables during generation, so every
time a renamed variable is used in the shader code, the new name is emitted.
Statements. The main function (or any function) is built up from statements. After
generating the interface of the function (void main() {…}), the statements of the
function must be parsed and generated in order. For every supported statement kind
(Expression, Declaration, If, For, etc.) we have a function that builds the corresponding
source for that statement kind in the shader language. Statements are built up from
expressions, which are in turn built up from other expressions, so we have builder
methods for expressions too. To better understand this, consider the syntax-tree, shown
in Fig. 3, of an Expression Statement which assigns the transformed input vertex
position to the output position in the Vertex Shader. The Expression Statement is a
Simple Assignment Expression, which has a Left Expression (the output position, an
Identifier Name Expression), an operator (equals), and a Right Expression that is an
Invocation Expression (calling the static Transform extension function of Vector4).
The parameters of this invocation are also Expressions, and so on, until we reach the
bottom of the syntax-tree, where we usually find an Identifier Name (the name of a
variable) or a Literal (like a float token).
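Written out as a hypothetical C# shader class, the statement in question could look like this (the attribute and variable names are our assumptions; Vector4.Transform is the static method referred to above):

using System.Numerics;

[VertexShader]
public partial class MyVertexShader : ShaderBase
{
    [In] private Vector4 inPosition;
    [Uniform] private Matrix4x4 modelViewProjection;
    [Out] private Vector4 outPosition;

    public void Main()
    {
        // Simple Assignment Expression: the left side is an Identifier Name,
        // the right side is an Invocation Expression (Vector4.Transform).
        outPosition = Vector4.Transform(inPosition, modelViewProjection);
    }
}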
Our generator is built with the structure presented in Fig. 7, enabling it to discover
the syntax-tree and generate the equivalent shader source from it. All the methods are
virtual; therefore, their behavior can be overridden in the derived GLSL or other
shader builders whenever special behavior is needed for a shader language.
Types. Whenever the generator needs to find the type of an expression (variable
declarations, invocations, member access), it checks whether that type has a corre-
sponding type in the shader language (int for int, vec3 for Vector3, etc.). If there is no
corresponding type, it assumes the type must be a custom structure and tries to obtain
its declaration syntax to generate the struct. If no such structure is defined in the C#
compilation, the type is unknown, and this results in a generator error.
Generating structures is done in a separate StringBuilder, because the declarations
must be added at the beginning of the shader source code (after the preprocessor
keywords and before the variable declarations). Generating the struct itself is
straightforward, much like generating the variables of the shader. We record in a
Dictionary that the struct has been generated, so the next time the generator
encounters the same struct type, it knows that it has already been generated.
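A sketch of this caching behavior (the names are ours), keyed on the struct’s type name:

using System.Collections.Generic;
using System.Text;

public class StructEmitter
{
    private readonly StringBuilder structSource = new StringBuilder();
    private readonly Dictionary<string, bool> emitted = new Dictionary<string, bool>();

    public void EmitStructOnce(string typeName, string declaration)
    {
        // Skip structs that have already been generated.
        if (emitted.ContainsKey(typeName)) return;
        structSource.AppendLine(declaration);
        emitted[typeName] = true;
    }
}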
So far, the C# compiler can detect errors made against the C# specification, and our
Generator can detect some of the errors that render the implemented shaders
incompatible with the target shader languages. Before linking the build into a final
compilation, we use the GLSLangValidator tool to catch any other error that has
passed these first two lines of defense.
Fig. 7 Internals of our shader generator. For all statements in a method, it builds the statement and
the expressions inside the statement
Running the Validator on every compilation can slow down Visual Studio; we are
experimenting with other ways of speeding up the process.
We can get the Validator executable from its official GitHub repository [9]. We have
added the executables (for multiple platforms) to the Generator project as Embedded
Resources, so we can access the executable files while the Generator is executing.
During the preparation phase of our Shader Generator, we copy the Validator
executable to the system’s Temp folder if it is not already there (on Mac and Linux,
we also need to execute the chmod command from our Generator to add execute
rights to this file). Visual Studio or build systems may launch multiple instances of a
Generator/Analyzer at once, so we need to handle simultaneous access to the Temp
folder from multiple threads. We create a temporary folder for the current session and
write the completely generated shader string into a text file in this folder.
Validation. Once the Validator executable and the shader file to validate are in place,
we launch the Validator as a new process with the path to the shader file in its launch
arguments. For validation, the only argument the Validator needs is the path to the
shader file. For SPIR-V compilation, we additionally need to pass -V to compile and
-o {output_file_path} to set the compiled file’s path.
We start the Validator process with the built-in System.Diagnostics.Process class in
C#. On Windows, we can capture the output of the Validator, so we can report a
compile-time error carrying the error received from the Validator.
After starting the Validator process, we need to wait for it to finish. We do an active
polling wait on the process, because asynchronous waiting on processes is not
available in .NET Standard 2.0. If the compilation has been cancelled (manually by
the user or automatically by the IDE), we terminate the process so that our generator
finishes as soon as possible.
If the Validator exits with a non-zero exit code, then errors occurred during the compi-
lation; otherwise, the shader is correct. For SPIR-V, we read the compiled binary file
back into our Generator to emit the byte array into the source code.
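Putting the pieces of this section together, launching and awaiting the Validator could be sketched as follows (paths and names are placeholders; -V and -o are the flags described above):

using System.Diagnostics;
using System.Threading;

internal static class ValidatorRunner
{
    public static bool RunValidator(string validatorPath, string shaderPath,
                                    string outputPath, CancellationToken token)
    {
        var psi = new ProcessStartInfo
        {
            FileName = validatorPath,            // copied to the Temp folder earlier
            // -V compiles to SPIR-V, -o sets the output file path.
            Arguments = $"-V \"{shaderPath}\" -o \"{outputPath}\"",
            RedirectStandardOutput = true,       // lets us surface Validator errors
            UseShellExecute = false,
        };

        using (var process = Process.Start(psi))
        {
            // .NET Standard 2.0 has no asynchronous process waiting,
            // so we poll actively and honor compilation cancellation.
            while (!process.WaitForExit(10))
            {
                if (token.IsCancellationRequested)
                {
                    process.Kill();
                    return false;
                }
            }
            return process.ExitCode == 0;        // non-zero exit code means errors
        }
    }
}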
5 Conclusion
With our forthcoming library, we provide .NET C# developers with the possibility of
implementing low-level real-time rendering algorithms and solutions in C# and of
embedding the rendered frames into the surfaces of .NET UI frameworks such as WPF
and Xamarin. Using our library, both the CPU and GPU graphics algorithms can be
implemented in C#: the CPU rendering code is an abstraction layer over the Graphics
APIs, and the native GPU shaders are generated from C# classes at compilation time.
We present a new kind of development environment for both regular C# programmers
and graphics programmers, and we recommend this paper and our approach to .NET
and graphics developers alike: .NET developers could power
their applications with multi-platform real-time graphics modules, and graphics
developers could reach other platforms and new, interesting development environments
with our approach.
References
An Enhanced GLCM and PCA Algorithm for Image Watermarking

1 Introduction
In image processing, we first convert the image to digital form, and then various
image processing operations are performed, such as visualization, recognition,
sharpening, restoration, and retrieval. These operations improve the quality of the
image or extract useful information from it. The input of an image processing
algorithm is an image; the output can be an image or the features of that image.
There are three essential components in the generic model of digital image
watermarking:
i. Watermark generation
ii. Watermark embedding
iii. Watermark extraction (Fig. 1).
Fig. 1 Generic model of digital image watermarking, showing the cover image, the watermark, the watermarked image, and watermark detection
There are various threats that degrade the watermark and hinder data security. With
the advancement in watermarking techniques, attackers are also coming up with new
ideas and techniques to destroy the watermark. Some of the attacks are:
2 Present Work
With growing Internet usage, there has been serious concern regarding the security
of confidential content. Many new techniques have emerged for this purpose, and one
of them is digital watermarking. The two major categories of watermarking techniques
are the spatial domain and the frequency domain. In the spatial-domain watermarking
technique, we insert the watermark into the cover image by modifying the lower-order
bits of the cover image. This technique minimizes complexity and computational cost;
however, its robustness against specific types of attack is low.
Frequency-domain watermarking includes techniques such as the Discrete Cosine
Transform (DCT), the Discrete Fourier Transform (DFT), and the Discrete Wavelet
Transform (DWT), which make use of inverse transformations. These techniques
provide security against more attacks, but their complexity and computational cost
are very high. Therefore, we need to design a technique that is robust against the
maximum number of attacks while having minimum complexity and computational
cost.
In the base paper, we observed that the OS-ELM approach is applied to generate a
semi-blind watermarked image, and the DWT algorithm is deployed to extract and
analyze the features of the image.
The technique suggested in this paper uses the GLCM and PCA algorithms to generate
a blind watermark. This technique is less complex, and the watermarked image is
more robust against various security attacks (Fig. 2).
Steps followed in this research methodology
1. Initially, all the confidential and non-confidential images are considered. The
confidential image is hidden within the non-confidential image. The keys used
for encryption are generated from non-confidential images.
2. In the next step, we apply the GLCM algorithm to extract features such as
correlation and homogeneity from the confidential image.
3. Then we apply the PCA algorithm to select features from the already extracted
features. The similarity between pixels is analyzed, and the image is compressed
using various mathematical formulae and statistics.
4. Finally, the watermark created using the non-confidential data is embedded into
the cover image using OS-ELM, a machine learning technique.
Gray-level co-occurrence matrix (GLCM)
It is a statistical method [9] for understanding the spatial relationship among pixels
in an image: it calculates how often pairs of pixels with specific values occur in a
specified spatial relationship in an image. The various statistics derived using GLCM
are:
1. Contrast: It measures the local variations.
2. Correlation: It measures the probability of occurrence of specific pixel pairs.
$$\text{Entropy} = \sum_{i,j=0}^{N-1} -\ln(P_{i,j})\, P_{i,j} \tag{1}$$

$$\text{Energy} = \sum_{i,j=0}^{N-1} P_{i,j}^{2} \tag{2}$$

$$\text{Contrast} = \sum_{i,j=0}^{N-1} P_{i,j}\,(i-j)^{2} \tag{3}$$

$$\text{Homogeneity} = \sum_{i,j=0}^{N-1} \frac{P_{i,j}}{1+(i-j)^{2}} \tag{4}$$

$$\text{Correlation} = \sum_{i,j=0}^{N-1} \frac{P_{i,j}\,(i-\mu)(j-\mu)}{\sigma^{2}} \tag{5}$$

In PCA, the data set is first standardized; each value is converted to a z-score:

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \tag{6}$$
2. Covariance matrix computation: We calculate the covariance matrix to see how
the variables in the input data vary from the mean with respect to each other. It is
a p × p symmetric matrix consisting of covariances associated with all possible
pairs of initial variables.
3. Eigenvector and eigenvalue computation: To calculate the principal components
of the data, we compute the eigenvalues and eigenvectors of the covariance matrix,
a concept from linear algebra. This helps reduce the dimensionality without much
loss of information, which is achieved by discarding the components that carry
little information.
4. Feature vector: It is a matrix whose columns are the eigenvectors we choose to
keep and consider relevant. This is the first step toward reducing the dimensions.
We arrange the eigenvectors in decreasing order of their eigenvalues to find the
most significant principal components.
5. Recast the data along the principal component axes: In the previous steps, the
only change made to the data set itself was standardization; apart from that, we
only computed the principal components, so the input data remains expressed in
terms of the initial variables. In this step, we use the feature vector formed from
the eigenvectors of the covariance matrix to reorient the data from the original
axes to the axes defined by the principal components, as shown below.
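In matrix form (a compact restatement of the steps above, in our own notation): if $Z$ is the standardized $n \times p$ data matrix, the covariance matrix is $C = \frac{1}{n-1} Z^{\top} Z$, and the recast data is

$$Y = Z\,W, \qquad W = \begin{bmatrix} v_{1} & v_{2} & \cdots & v_{k} \end{bmatrix},$$

where $v_{1}, \dots, v_{k}$ are the eigenvectors of $C$ with the $k$ largest eigenvalues (the feature vector), and the columns of $Y$ are the data expressed along the principal component axes.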
3 Result
We have used MATLAB to simulate our proposed algorithm. MATLAB is a
matrix-based programming language that is very user-friendly, making the workflow
fast and easy. It is prevalent in various fields of engineering and science, as it provides
an interactive environment for developing algorithms, analyzing and visualizing data,
and performing numerical computations. First, the user enters a scale factor ranging
from 0 to 7, as the input image is 8 bits. The watermarked image is generated
according to this entered scale factor value (Figs. 3 and 4).
PSNR—Peak signal-to-noise ratio is the ratio of the maximum possible power of an
image to the power of the corrupting noise that affects its quality. In image processing,
PSNR is generally used to quantify the error introduced by compression, in the form
of noise, relative to the original signal.
MSE—Mean square error measures the cumulative squared error between the
compressed and the original image. It is mainly used in statistical models to measure
the difference between observed and predicted values (Fig. 5 and Table 1).
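For reference, the standard definitions (our notation, not given explicitly in the text): for an $m \times n$ cover image $I$ and its processed version $K$, with $\mathrm{MAX}_I = 255$ for 8-bit images,

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \bigl(I(i,j) - K(i,j)\bigr)^{2}, \qquad \mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}.$$

A higher PSNR (equivalently, a lower MSE) indicates that the watermarked image is closer to the original.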
4 Conclusion
The proposed method described in this paper helps in securely embedding the water-
mark into the cover image to obtain a more robust watermarked image. The GLCM
and PCA algorithms generate a blind watermark that can be safely transmitted over
any secured or unsecured channel. In today’s world of widespread Internet usage, the
risk of unauthorized access keeps increasing: attackers can easily tamper with data
over digital channels, posing a severe security problem.
Unauthorized sources can access sensitive data, modify it, and distribute it illegally.
Therefore, it is vital to have proper security measures that make the transmission of
sensitive data secure. Watermarking is one such measure for introducing security into
digital data.
5 Future Work
There is always room to improve completed work. The following points should be
considered to improve the proposed algorithm in the future:
(a) The algorithm’s elapsed time can be reduced.
(b) The efficiency of the proposed algorithm can be verified in the presence of other
attacks, to see how well it performs or where it needs improvement.
(c) The complexity of the algorithm can be reduced further.
References
1. Vleeschouwer CD, Delaigle JF, Macq B (2002) Invisibility and application functionalities in perceptual watermarking: an overview. Proc IEEE 90:64–77
2. Jun X, Ying W (2008) Towards a better understanding of DCT coefficients in watermarking. In: PACIIA 2008, pp 206–209
3. Jun S, Alam MS (2008) Fragility and robustness of binary phase only filter based fragile/semi-fragile digital image watermarking. IEEE Trans Instrum Meas 57:595–606
4. Gonzalez RC, Woods RE (2009) Digital image processing, 3rd edn. Prentice Hall of India
5. Zeki AM, Abdul Manaf A (2009) A novel digital watermarking technique based on ISB. World Acad Sci Eng Technol 50:989–996
6. Bamatraf A, Ibrahim R, Salleh MNBM (2010) Digital watermarking algorithm using LSB. In: ICCAIE 2010, pp 155–159
7. Charles Fung AG, Walter G (2011) A review study on image digital watermarking. In: 10th international conference on networks, St. Maarten, The Netherlands Antilles
8. Zeki AM, Manaf AA, Foozy CFM, Mahmod SS (2011) A watermarking authentication system for medical images. Presented at CET, Shanghai, China
9. Sheikh R, Patel M, Sinhal A (2020) Recognizing MNIST handwritten data set using PCA and LDA. In: International conference on artificial intelligence: advances and applications, pp 169–177
10. Khan S, Sharma AK, Patel M (2020) Performance analysis on modified hybrid DB scan clustering technique to enhance average total execution time. In: 3rd international conference on intelligent sustainable systems (ICISS), pp 1603–1605. https://doi.org/10.1109/ICISS49785.2020.9316086
11. Agrawal S, Patel M, Sinhal A (2021) An enhance security of the color image using asymmetric RSA algorithm. In: Purohit S, Singh Jat D, Poonia R, Kumar S, Hiranwal S (eds) Proceedings of international conference on communication and computational technologies. Algorithms for intelligent systems. Springer
12. Shekhawat VS, Tiwari M, Patel M (2021) A secured steganography algorithm for hiding an image and data in an image using LSB technique. In: Singh V, Asari VK, Kumar S, Patel RB (eds) Computational methods and data engineering. Advances in intelligent systems and computing, vol 1257. Springer
Author Index
A
Abdala, Mohammed A., 271
Abhiudhaya Upadhyaya, 787
Adamu, Birtukan, 681
Ademola, Omolola Faith, 225
Adil Husain, 491
Aditya Ajgaonkar, 755
Aditya Thaker, 25
Ajay Kumar Sharma, 885
Akshat Agrawal, 37, 225
Akshath Mahajan, 25
Aliya Katyetova, 585
Alwahab, Dulfiqar A., 271
Amit Verma, 257
Anchal Garg, 437
Ankit Garg, 169
Anuj Raghani, 755
Anunay, 109
Anurag Sinha, 241
Apurva Tiwari, 479
Aruna Kumari Kakumani, 401
Arvind Jamwal, 533
Ashima Rani, 545
Ashulekha Gupta, 411

B
Bakonyi, Viktória, 559
Beshah, Tibebe, 681
Bharti Rana, 329, 363
Bhavna Arora, 209, 727
Bhavya Sheth, 755
Birihanu, Ermiyas, 681
Burgerová, Jana, 695

C
Chaman Verma, 95, 313
Chander Prabha, 257, 299
Charu Saxena, 61
Chetashri Bhadane, 25

D
Daljit Singh, 377
Dávid Szabó, 869
Deepika Sood, 149
Deepti Malhotra, 181, 491
Devendra Narayan, 241
Dharani, M. K., 453
Dhiren Patel, 755, 803
Dineshkumar, S., 453
Drlik, Martin, 121
Dyuwan Shukla, 755

E
Eneacu, Florentina Magda, 95
Enescu, Florentina Magda, 313

G
Ganesh Reddy Karri, 635
Garima Srivastava, 571
Gerard Deepak, 467
Ghanashyam, K. J., 3
Goutam Datta, 51, 715
Gurpreet Singh, 257

H
Harishchander Anandaram, 411
K
Kabir, Ahmed Imran, 843
Kaleem Ullah Bhat, 313
Kanchana, M., 741
Kapil Joshi, 411
Karim, Ridoan, 843
Kaushal Binjola, 647
Kefie, Hailemichael, 681
Khushboo Tripathi, 169
Kirti Sharma, 199
Kshitiz Gahlot, 299
Kusum Gupta, 51, 715

L
Logeswaran, K., 453

M
Madhulika Bhadauria, 437
Majeed, Rand S., 271
Majid Zaman, 669
Malini Mittal Bishnoi, 613
Mandeep Kaur, 285, 377
Manju Sharma, 505
Mayank Patel, 885
Meith Navlakha, 135, 421, 855
Mitali Chugh, 627
Mohammad Islam, 521
Mohammad Shafeeq, 521
Mohan Chandru, G., 453
Mohialden, Yasmin Makki, 411
Mohsin Manzoor, 727
Monica Madan, 545
Mopuru Bhargavi, 241
Muheet Ahmed Butt, 669
Muskan, 257

N
Nafees Akhter Farooqui, 521
Namit Garg, 241

O
Ogundokun, Roseline Oluwaseun, 37

P
Padma Sree, L., 401
Pardeep Singh, 15, 533, 603
Parul Bansal, 15
Pecuchova, Janka, 121
Piskura, Vladimír, 695
Poladi Harshitha, 341
Pooja Anand, 351, 787
Pooja Joshi, 411
Pooja Khanna, 479
Poonam Kashtriya, 603
Pranit Bari, 421
Pratyaksha Ranawat, 885
Praveen Kumar, 109
Priya Mittal, 299

Q
Quadri, S. M. K., 669

R
Raboaca, Maria Simona, 95, 313
Rahul Raheja, 135
Rajadevi, R., 453
Raja Kulkarni, 771
Raj Gaurav, 169
Rajni Bhalla, 199
Rishabh Bhargava, 855
Rishabh Mittal, 437
Ritesh Rastogi, 411
Roopa Devi, E. M., 453
Ruhina B. Karani, 855
Russel Lobo, 855
Ruxana Jabeen, 813

S
Sachin Kumar, 479, 571