Advances in IoT and Security with Computational Intelligence
Anurag Mishra
Deepak Gupta
Girija Chetty Editors
Advances in IoT
and Security
with Computational
Intelligence
Proceedings of ICAISA 2023, Volume 2
Lecture Notes in Networks and Systems
Volume 756
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Anurag Mishra · Deepak Gupta · Girija Chetty
Editors
Girija Chetty
Faculty of Science and Technology
University of Canberra
Bruce, ACT, Australia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
We are particularly grateful to Dr. Rajendra Pratap Gupta, Mr. Animesh Mishra,
Mr. M. S. Bala and Prof. Balram Pani who blessed us in the inaugural session. We
are also thankful to Mr. N. K. Goyal for his presence in the valedictory session. We
are extremely grateful to Springer Nature, especially Dr. Aninda Bose who agreed to
publish two volumes of conference proceedings in the prestigious series of Lecture
Notes in Networks and Systems.
Editors and Contributors
Prof. Anurag Mishra has a bachelor's and a master's degree in Physics from the University
of Delhi. He completed his M.E. in Computer Technology and Applications and
Ph.D. in Electronics, also from the University of Delhi. He has extensive experience
teaching B.Sc. (Hons.), M.Sc., B.Tech. and M.Tech. programs in Electronics
and Computer Science. He has about 28 years of experience as a teacher and as an
active researcher. He has been a consultant for offshoot agencies of the Ministry of
Education, Government of India. Presently, he is nominated as a visitor’s nominee
in a central university by the Government of India. He has 65 refereed papers in
highly cited journals, international conferences and book chapters, three authored,
one edited book and two patents to his credit. He has recently entered into devel-
oping medical applications using deep convolutional neural networks. He is an active
reviewer of papers for Springer, Elsevier and IEEE Transactions. He is a member
of IEEE and also holds membership of the Institute of Informatics and Systemics
(USA).
many scientific societies like IEEE SMC, IEEE CIS, CSI and many more. He has
served as a reviewer of many scientific journals and various national and interna-
tional conferences. He was the general chair of the 3rd International Conference on
Machine Intelligence and Signal Processing (MISP-2021) and associated with other
conferences like IEEE SSCI, IEEE SMC, IJCNN, BDA 2021, etc. He has supervised
three Ph.D. students and guided 15 M.Tech. projects. He is currently the principal
investigator (PI) or a co-PI of two major research projects funded by the Science and
Engineering Research Board (SERB), Government of India.
Dr. Girija Chetty has bachelor's and master's degrees in Electrical Engineering and
Computer Science from India and Ph.D. in Information Sciences and Engineering
from Australia. She has more than 38 years of experience in industry, research and
teaching at universities and research and development organisations in India
and Australia, and she has held several leadership positions including head of Soft-
ware Engineering and Computer Science, the program director of ITS courses, and
the course director for Master of Computing and Information Technology Courses.
Currently, she is a full professor in Computing and Information Technology at School
of Information Technology and Systems at the University of Canberra, Australia, and
leads a research group with several Ph.D. students, post-docs, research assistants and
regular international and national visiting researchers. She is a senior member of
IEEE, USA; a senior member of the Australian Computer Society; and an ACM member,
and her research interests are in multimodal systems, computer vision, pattern recog-
nition, data mining and medical image computing. She has published extensively
with more than 200 fully refereed publications in several invited book chapters,
edited books, and high-quality conferences and journals, and she is on the editorial
boards and technical review committees of, and a regular reviewer for, several Springer,
IEEE, Elsevier and IET journals in areas related to her research interests. She
is highly interested in seeking wide and interdisciplinary collaborations, research
scholars and visitors in her research group.
Contributors
Kirti Jain Department of Computer Science, University of Delhi, New Delhi, Delhi,
India
Pooja Jain Jayoti Vidyapeeth Women’s University, Jaipur, Rajasthan, India
Amit Joshi Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Abhay Juvekar IT Consultant, Mumbai, India
Vani Venkata Durga Kadavala Department of CSE, Koneru Lakshmaiah Educa-
tion Foundation, Vaddeswaram, Andhra Pradesh, India
Kapil National Institute of Technology, Kurukshetra, India
Yogita Kapse Electronics and Telecommunication, College of Engineering Pune,
Pune, India
Gauri M. Karve Electrical Engineering Department, PVG’s COET & GKPIM,
Pune, India
Nancy Kaur Faculty of Science and Technology, University of Canberra, Bruce,
ACT, Australia
Sharanjit Kaur Acharya Narendra Dev College, University of Delhi, New Delhi,
Delhi, India
Manju Khari School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Savara Murali Krishna Department of CSE, Koneru Lakshmaiah Education
Foundation, Vaddeswaram, Andhra Pradesh, India
Anshul Kulkarni Department of Computer Engineering and IT, COEP Technolog-
ical University (COEP Tech), Pune, Maharashtra, India
Priyesh Kulshrestha School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Anil Kumar Galgotias University, Greater Noida, India;
Deen Dayal Upadhyaya College, University of Delhi, Delhi, India
Ravi Kumar Shaheed Rajguru College of Applied Sciences for Women, University
of Delhi, Delhi, India
Sunil Kumar Shaheed Rajguru College of Applied Sciences for Women, University
of Delhi, Delhi, India
Suyash Kumar USICT, GGSIPU, New Delhi, India;
Department of Computer Science, Hansraj College, University of Delhi, New Delhi,
India
Shobha Lal Jayoti Vidyapeeth Women’s University, Jaipur, Rajasthan, India
Gunjan Rani Acharya Narendra Dev College, University of Delhi, New Delhi,
Delhi, India
Ravi Rayappa Electronics and Communication Engineering, Jain Institute of
Technology, Davanagere, Karnataka, India
Sudeshna Sani Koneru Lakshmaiah Education Foundation, Vijayawada, Andhra
Pradesh, India
Namdev Sawant St. Francis Institute of Technology, Mumbai, India
Suraj Sawant Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Pratibha Shingare College of Engineering, Pune, India
Geetika Singh KIET Group of Institutions, Dr. A.P.J. Abdul Kalam Technical
University, Ghaziabad, Uttar Pradesh, India
Khoirom Motilal Singh Department of CSE, Koneru Lakshmaiah Education Foun-
dation, Vaddeswaram, India
Sharat Singh Department of Electronics, Deen Dayal Upadhyaya College, Univer-
sity of Delhi, New Delhi, India
Pranamya Sinha Shaheed Rajguru College of Applied Sciences for Women,
University of Delhi, Delhi, India
P. N. V. L. S. Sneha Sree Department of CSE, Koneru Lakshmaiah Education
Foundation, Vaddeswaram, Andhra Pradesh, India
Vivek Prakash Srivastava National Institute of Technology, Kurukshetra, India
K. M. Swaroopa Faculty of Science and Technology, University of Canberra,
Bruce, ACT, Australia
Mangesh S. Thakare Electrical Engineering Department, PVG’s COET &
GKPIM, Pune, India
Geetanjali A. Vaidya Electrical Engineering Department, PVG’s COET &
GKPIM, Pune, India
T. Veerendra Subramanya Kumar Department of CSE, Koneru Lakshmaiah
Education Foundation, Vaddeswaram, India
Carlos E. Ventura The University of British Columbia, Vancouver, BC, Canada
Aditya Verma Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Shiv Kumar Verma Galgotias University, Greater Noida, India
T. V. Vijay Kumar School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Abstract Cloud computing (CC) has gained huge popularity in the recent era by
providing the feature of sharing a pool of computing resources on demand among
various cloud users over the Internet. It provides the benefits of scalability, flexibility,
and a pay-per-use facility to its clients using virtualization technology, which attracts
large enterprises that work on distributed computing. One important research issue
in cloud computing is task scheduling, which means that cloud tasks need to be
appropriately mapped to the existing cloud resources to optimize single or multiple
objectives. The complexity and large search space of task scheduling classify it as
an NP-hard problem. A brief analysis of existing heuristic and metaheuristic strategies
and their application to scheduling in cloud environments is presented in this paper,
followed by a comparative study of a few metaheuristic algorithms. Heuristic
algorithms cannot produce an exact optimal solution in an acceptable time. To solve
this problem, metaheuristic algorithms based on swarm intelligence and bio-inspired
techniques, such as Particle Swarm Optimization (PSO), the Genetic Algorithm (GA),
and the Ant Colony Optimization (ACO) algorithm, are a good choice for finding
near-optimal solutions. These have been implemented to run in cloud scenarios, and
their performance has been compared on the parameters makespan, average resource
usage, and average response time. The PSO algorithm is found to outperform ACO
and GA on these optimization metrics under various test conditions in the cloud
environment.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_1
1 Introduction
CC offers a standard platform for cheap and convenient hosting and delivering
computing resources as a utility on demand through the Internet [1]. Cloud providers
rent out physical and logical computing resources on demand from their large data
centers to different cloud users having dynamic needs on pay-per-use basis [2]. Cloud
services are distinctly classified into three kinds: Software as a Service (SaaS), Plat-
form as a Service (PaaS), and Infrastructure as a Service (IaaS). IaaS is a substantial
and fast-growing field that provides maximum benefit to small and medium-
sized organizations [3]. However, with several benefits, there are some major issues
and challenges that need to be addressed in CC such as automated resource provi-
sioning, interoperability, virtualization, privacy and security, data management, load
balancing, network management, application programming interfaces (APIs), and
many more [3, 4].
Virtualization is a primary notion underlying the CC technology concept, which makes
possible the isolated execution of several cloud users’ tasks at the same time using a
software layer termed a hypervisor or VM monitor [5].
Various cloud users request virtualized resources by specifying a set of resource
instances at any instant to run their task. It is the cloud provider’s responsibility to
allocate resources efficiently and effectively to the given set of tasks at any instant
without any delay which is called resource management. Resource management
includes challenges regarding resource allocation, resource mapping, and modeling,
resource adaptation, resource finding, provisioning, and scheduling of resources.
Both under-provisioning and over-provisioning of resources must be avoided, as the cloud
services and resources are shared among various cloud clients who use them on a
subscription basis [3]. The main aim of cloud providers is to maximize their profit and
revenue, leading to high performance of the cloud. Hence, cloud providers have to
allocate resources efficiently to reduce energy usage, improve resource utilization, and
manage bandwidth efficiently. However, cloud users expect the simplest inter-
face to use Quality of Service (QoS) with minimum expenses, high throughput, and
quick response time [2]. The cloud providers can achieve the objective of maximum
resource usage by minimizing the makespan, task transferring time, task execution
time, energy usage and costs, etc. Cloud users can achieve the objective of reducing
expenses and satisfying QoS by minimizing the average response time.
There should be an efficient and well-managed scheduling mechanism to schedule
the cloudlets to attain maximum resource usage. An efficient scheduling scheme can
be achieved with the appropriate mapping of tasks to the required resources called
task scheduling. Hence, a scheduling problem consists of several cloud consumers’
tasks that need to be scheduled on the existing VMs subject to a few constraints
to optimize an objective function. The goal is to construct a schedule
specifying which task will be allocated to which resource [6].
The scheduling methods can be categorized into three classes:
resource-based scheduling, dependent task-based or workflow-based scheduling, and
independent task scheduling. The tasks are scheduled independently of each other
Comparative Study of Metaheuristic Algorithms for Scheduling … 3
in independent task scheduling, whereas the tasks are bounded with each other via
interdependencies in workflow-based scheduling. Task scheduling methods can be
centralized or distributed. There is only one scheduler for mapping tasks in the
centralized scheduling-based method, whereas the scheduling decisions are decen-
tralized among all available VMs in distributed scheduling. There is one more way
of categorizing job scheduling: static and dynamic scheduling. In static scheduling,
every task is assumed to arrive at the same time. Hence, all the tasks or VMs are
mapped and scheduled based on a priori information. While in dynamic scheduling,
no prior information is there about the task’s arrival, execution time, and VMs. Hence,
all scheduling decisions like resource allocation to incoming cloudlets, execution
time, etc. are made in real time only. The cloud tasks can be handled immediately
when they arrive, called immediate mode, or can be collected in a batch and then the
whole batch can be scheduled, called batch mode [7, 8]. The traditional exhaustive
and deterministic scheduling strategies are simple and easy to understand and imple-
ment but do not give any guarantee of getting the optimal solutions in an acceptable
amount of time [9–11]. The traditional heuristic algorithms suffer from drawbacks
such as local optimum traps, slow convergence, and additional computational time; they
have complex operators, are framed only for binary or real search domains, and are not
suitable for complex scientific optimization problems and large solution spaces.
To solve this problem, metaheuristic algorithms based on swarm intelligence
and bio-inspired techniques like Particle Swarm Optimization (PSO), Ant Colony
Optimization (ACO), and Genetic Algorithm (GA) are a good choice for finding the
near-optimal solution [6, 7, 9].
Recently, various task scheduling schemes have been proposed for cloud
computing environments, but despite that, no comprehensive performance study has
been done to compare existing task scheduling algorithms. A comparative study of
existing metaheuristics algorithms and their application in scheduling in cloud envi-
ronments has been presented in this paper. These three have been implemented to run
in cloud scenarios and their performance has been compared to optimize the parame-
ters makespan, average resource usage, and average response time. The experimental
results show that PSO outperforms the ACO and GA algorithms in
optimizing the objective function.
1.1 Contributions
The following are some of the major contributions made by this work.
• A system model is presented, including a task model and a virtual machine model.
• It demonstrates an adaptive task allocation to virtual machines that dynamically
adjusts task execution time.
• It proposes a model based on PSO for cloud computing and examines the inverse
relationship between makespan and average resource utilization.
• It looks into the impact of several scenarios on the heterogeneous cloud system
in terms of makespan, response time, and average resource utilization of the system.
• It performs a comparative study of the existing metaheuristic techniques
PSO, ACO, and GA and finds that PSO outperforms GA and ACO.
2 Literature Review
The authors of [1, 4] gave an extensive analysis of cloud computing, emphasizing its key
models, architectural principles, state-of-the-art implementations, and advantages, along
with research challenges. Serving handy cloud resources to the cloud
users, termed scheduling, is the main theme in the research of cloud resource
management, primarily its task scheduling section [2, 12]. The global research
community focusing on cloud computing has developed an increasing interest in
its resource scheduling issue. The categorization of resource allocation methods
has been discussed in [2, 12]. Researchers proposed various heuristic algorithms
for independent task scheduling such as Min–Min, Max–Min, round-robin, First
Come First Serve, and many more to overcome the drawbacks of traditional exhaus-
tive and deterministic strategies [13–18]. The authors of [13] compared the performance of
various heuristic approaches such as Min–Min, Max–Min, and Duplex based on the
metrics Minimum Execution Time (MET) and Minimum Completion Time (MCT).
Researchers have conducted extensive review of dependent job-centered strategies
modeled with Directed Acyclic Graph (DAG) for task scheduling problem described
in [5, 19]. Scheduling strategies for dependent jobs have been presented in [20–22].
Due to its complexity and large search space, researchers classify task scheduling as
an NP-hard problem. The heuristic strategies generally suffer from slow convergence,
and the solutions generated by heuristic approaches may get stuck in local optima,
making it difficult to find the exact solution. Thereby, to improve solution quality and
computing time, metaheuristic techniques have gained vast attention over
the past many years for NP-hard problems. Metaheuristic approaches provide
near-optimal solutions within an acceptable timespan and make task scheduling algo-
rithms more effective and efficient. A lot of review literature has been produced by various
researchers on metaheuristic techniques adopted for task scheduling in distributed
environments, i.e., cloud computing, cluster, and grid environments, including
ACO, PSO, GA, the League Championship Algorithm (LCA), and the BAT algorithm [6,
11, 23]. In this direction, Tsai and Rodrigues [11] gave an extensive review of the litera-
ture discussing metaheuristic techniques for cloud task scheduling and presented the
major issues and challenges faced by metaheuristic algorithms. Researchers have
studied and analyzed the performance of metaheuristic techniques in cloud system
[13, 24]. An ACO algorithm in case of independent task scheduling was proposed in
[25, 26] to optimize QoS parameters in cloud computing. Various GA-based algo-
rithms and their modifications have been proposed by researchers to optimize the QoS
parameters for task scheduling in cloud system in references [27, 28] to be outper-
formed compared to traditional PSO and GA algorithms [29]. Researchers found that
PSO provides fast task scheduling and better solution quality than existing heuris-
tics and other metaheuristics in grid, homogeneously distributed, and cloud
computing environments [23, 30–33]. Researchers have proposed various modified forms
of the PSO algorithm, which were found to outperform standard PSO and other
metaheuristics, as discussed in references [34–36]. Although all compared algorithms
show satisfactory results in simulation outcomes, the new modified PSO algorithm is
much better than the compared algorithms in cloud computing, and its performance is
improved by using the load balancing technique proposed in references [37, 38], which
minimizes QoS parameters such as makespan, execution time, resource utilization,
cost, transmission time, and round trip time to perform load balancing between cloudlets
and VMs.
3 Scheduling Approaches
• First Come First Serve Algorithm: Resources are assigned to the tasks according
to their order of arrival. The earlier a task arrives, the earlier it gets the resources,
which it releases after completing its execution [24, 39].
• Round-Robin Algorithm: The tasks are assigned the resources in an FCFS
manner, but they get the resource only for a small time quantum. The resource is
pre-empted when the allotted time slot expires and is given to the next waiting task
in the ready queue. The pre-empted task is directed to wait at the tail of
the ready queue if its execution is not complete [39].
• Min–Min: The notion of the Min–Min algorithm is to first select the shortest job, i.e.,
the one having the Minimum Completion Time (MCT) over the given task set, and then
allocate the selected shortest task to the resource having the minimum expected
completion time. The algorithm computes the expected completion time C_ij of any ith
task from the cloudlet set T = {t_1, t_2, t_3, …, t_n} on any jth resource from a resource
set R = {r_1, r_2, r_3, …, r_m} using Eq. (1) given below:
C_{ij} = E_{ij} + re_j. \qquad (1)
Here, re_j denotes the ready (preparation) time of resource r_j, and E_ij denotes the
time taken by the ith task to execute on the jth resource. The expected completion time of
all tasks is calculated using Eq. (1), and then the task having the shortest
expected completion time is selected, mapped to the respective resource, and
detached from the task set. This step is reiterated for all subsequent tasks in the set
until all tasks have been mapped to the respective resources [17] (see the sketch after this list).
• Max–Min: This algorithm prioritizes the longer tasks, having maximum MCT,
over the shorter tasks. It first selects the longer tasks from the given task set
for resource assignment. This algorithm proves superior to the Min–Min
algorithm when the count of shorter tasks is greater than that of longer tasks [17].
• RASA (Resource Awareness Scheduling Algorithm): The Max–Min and Min–
Min approaches can be applied alternately to enjoy their benefits and overcome their
drawbacks, resulting in an efficient hybrid scheduling scheme known as RASA.
• Best Fit: This scheduling policy assigns resources to the job that requires the
maximum number of resources from the given task set. When multiple resources
of different types are required by VMs, one kind of resource can
be taken as a “reference resource,” and the best fit is then chosen according to the
reference resource.
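The selection loop of the Min–Min heuristic above can be made concrete with a short sketch. This is a minimal Python rendering assuming Eq. (1) as the completion-time model; the matrix E and the ready times in the example are illustrative inputs, not values from the paper.

```python
# Minimal sketch of the Min-Min heuristic (illustrative inputs, not the
# paper's data). E[i][j] is the execution time E_ij of task i on resource j;
# ready[j] is the ready time re_j of resource j, as in Eq. (1).
def min_min(E, ready):
    n, m = len(E), len(E[0])
    unmapped = set(range(n))
    schedule = {}                       # task index -> resource index
    while unmapped:
        best = None                     # (completion time, task, resource)
        for i in unmapped:
            # Eq. (1): C_ij = E_ij + re_j; best resource for this task
            j = min(range(m), key=lambda j: E[i][j] + ready[j])
            c = E[i][j] + ready[j]
            if best is None or c < best[0]:
                best = (c, i, j)
        c, i, j = best                  # task with shortest completion time
        schedule[i] = j
        ready[j] = c                    # resource j is now busy until time c
        unmapped.remove(i)
    return schedule

# Example: three tasks on two resources
print(min_min([[4, 6], [3, 8], [9, 2]], [0.0, 0.0]))  # {2: 1, 1: 0, 0: 0}
```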
The traditional heuristic algorithms suffer from drawbacks such as local optimum
traps, slow convergence, additional computational time, and complex operators,
and they are framed only for binary or real search domains. Hence, heuristic algorithms
are not suitable for complex scientific optimization problems and large solution spaces.
This motivates the researchers to enhance the heuristic approaches to overcome their
drawbacks leading to metaheuristic algorithms.
• Genetic Algorithm (GA): The concept of the GA method was first given by
Holland in 1975 and has proved effective for complex and large search
problems. GA is a probabilistic, population-based, evolutionary optimization
technique motivated by the natural evolutionary process of chromo-
somes, in which the notion of survival of the fittest is used; i.e., recombination of the
chromosomes provides new, better solutions via genetic crossover,
mutation, and inversion [40, 41].
• Ant Colony Optimization (ACO): Ant Colony Optimization (ACO) is used in
computer science and Operations Research for solving complex combinatorial opti-
mization problems. Dorigo originally introduced this novel ant system
approach in his 1992 Ph.D. thesis. Since 1992, various ACO algorithms have been
proposed, almost all sharing the same idea. The prime idea of ACO is motivated
by the searching behavior of real ants locating the shortest path from their
colonies to their food source [42, 43].
• Particle Swarm Optimization (PSO): PSO is regarded as a powerful optimiza-
tion and computational technique for obtaining optimal solutions to multimodal
continuous optimization problems. PSO is a swarm-intelligent, evolutionary,
population-based metaheuristic technique developed in 1995 by Kennedy
and Eberhart to perform global search. Its idea was originally motivated by the
social behavior and movement of particles such as bird flocks and fish
schools [23, 24]; a sketch of one discrete encoding for task scheduling follows.
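Since the paper applies PSO to a discrete mapping problem, a sketch of one common discretization may help: each particle keeps one continuous dimension per task, and rounding a dimension yields the VM index for that task. This round-and-clamp encoding is an assumption for illustration, not necessarily the variant used in the experiments below; fitness() stands for any of the objectives discussed later (e.g., makespan).

```python
import random

# Hedged sketch: discrete PSO for task-to-VM mapping. Each particle holds one
# continuous dimension per task; round() of a dimension gives the VM index.
def pso_schedule(n_tasks, n_vms, fitness, swarm=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    decode = lambda x: [min(n_vms - 1, max(0, round(d))) for d in x]
    X = [[random.uniform(0, n_vms - 1) for _ in range(n_tasks)]
         for _ in range(swarm)]
    V = [[0.0] * n_tasks for _ in range(swarm)]
    pbest = [list(x) for x in X]
    pval = [fitness(decode(x)) for x in X]
    g = min(range(swarm), key=lambda k: pval[k])
    gbest, gval = list(pbest[g]), pval[g]
    for _ in range(iters):
        for k in range(swarm):
            for d in range(n_tasks):       # standard velocity/position update
                V[k][d] = (w * V[k][d]
                           + c1 * random.random() * (pbest[k][d] - X[k][d])
                           + c2 * random.random() * (gbest[d] - X[k][d]))
                X[k][d] = min(n_vms - 1.0, max(0.0, X[k][d] + V[k][d]))
            f = fitness(decode(X[k]))
            if f < pval[k]:                # update personal and global bests
                pbest[k], pval[k] = list(X[k]), f
                if f < gval:
                    gbest, gval = list(X[k]), f
    return decode(gbest), gval

# Toy usage: minimize makespan of 5 tasks (lengths L) on 3 unit-speed VMs
L = [4, 3, 9, 2, 7]
makespan = lambda mp: max(sum(L[i] for i in range(len(L)) if mp[i] == j)
                          for j in range(3))
print(pso_schedule(len(L), 3, makespan))
```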
The Task Assignment Problem (TAP) can be described as follows. A set of tasks or
cloudlets is represented by the set T = {t_1, t_2, t_3, …, t_n}, where n is the total number
of independent tasks in a batch, which differ in length. All available VMs
are represented by the set VM = {VM_1, VM_2, VM_3, …, VM_m}, where m is the total
number of available VMs, which differ in MIPS rating. This implies that tasks
executed on different machines have different execution times and execution costs.
The number of cloudlets is always greater than the number of VMs. The processing
time of any cloudlet T_i on VM_j is denoted by PT_ij, and the completion time of VM_j
by CT_j. The finishing time and submission time of any cloudlet T_i are denoted by FT_i
and SubT_i, respectively. The response time of the ith task is denoted by RTT_i, and the
average response time by AvRT. Our objectives of minimizing the overall makespan
and average response time and maximizing the average resource utilization
(LBR) are described by Eqs. (2), (3), and (5) [38, 44].
Each task in T is bounded by T_max and T_min, i.e., T_min ≤ T_i ≤ T_max, and each VM
in the VM set is bounded by VM_max and VM_min, i.e., VM_min ≤ VM_j ≤ VM_max [45].
VMs are considered to be available all the time. The tasks cannot be interrupted or
pre-empted during processing on a VM. Each VM can process only one cloudlet at a
time, and cloudlets cannot run on more than one VM at a time. When cloudlet i
is allocated to machine j, X_ij becomes 1; otherwise it is 0. Two basic conditions are
imposed to satisfy the above-specified constraints. Condition (6) ensures that each
task is assigned to only one VM [24]:
\sum_{j=1}^{m} X_{ij} = 1 \quad \forall i \in T, \qquad (6)

X_{ij} \in \{0, 1\} \quad \forall j \in M, \; i \in T. \qquad (7)
Task scheduling aims to perform appropriate mapping of the cloudlets to the available
VMs so that computing resources can be utilized efficiently and cloud users’ expenses
can be minimized.
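Because Eqs. (2), (3), and (5) are not reproduced here, the sketch below evaluates the three metrics from their standard definitions, which is an assumption on our part: makespan as the largest VM completion time, AvRT as the mean of finishing minus submission times, and LBR as the mean VM busy time divided by the makespan.

```python
# Hedged sketch of the three objectives, using standard definitions (assumed):
# makespan = max_j CT_j, AvRT = mean_i(FT_i - SubT_i),
# LBR = (mean_j CT_j) / makespan.
def evaluate(mapping, PT, SubT):
    """mapping[i] = VM index of cloudlet i (so Eq. (6) holds by construction);
    PT[i][j] = processing time PT_ij; SubT[i] = submission time of cloudlet i."""
    m = max(mapping) + 1
    CT = [0.0] * m                       # completion time CT_j per VM
    FT = [0.0] * len(mapping)            # finishing time FT_i per cloudlet
    for i, j in enumerate(mapping):      # one cloudlet at a time per VM
        CT[j] += PT[i][j]
        FT[i] = CT[j]
    makespan = max(CT)
    avrt = sum(f - s for f, s in zip(FT, SubT)) / len(FT)
    lbr = sum(CT) / (m * makespan)       # average resource utilization
    return makespan, avrt, lbr

# Example: three cloudlets mapped onto two VMs, all submitted at time 0
print(evaluate([0, 1, 0], [[2, 5], [4, 3], [6, 1]], [0, 0, 0]))
```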
The aim is to find the best metaheuristic approach for task scheduling which
minimizes makespan and average response time for cloud users and maximizes the
average resource utilization for cloud providers in highly distributed and dynamic
multiprocessing environments, i.e., the cloud computing environment.
The authors have performed various experiments, increasing the number of
cloudlets for heterogeneous systems, to carry out a comparative analysis of the existing
metaheuristic algorithms PSO, ACO, and GA for the task scheduling problem under
the given parameter settings of VMs and cloudlets. Ten datacenters are created with
two hosts and 50 VMs each in the experiment, and the cloudlet count is varied from
100 to 1000 under the simulation environment. The task length is taken in the range
of 1000–20,000 Million Instructions (MIs). The cloudlets are assigned to heterogeneous
VMs whose MIPS ratings vary between 500 and 2000 and whose bandwidths vary
between 500 and 1000. The stopping criterion is set at 100 iterations. The results of
ten experiments are taken over 100 iterations for the task range 100–1000, and the
average of the optimization parameter values is taken.
The algorithms are compared based on the following parameters, i.e., makespan,
average response time and average resource utilization. The average of ten repetitions
is taken to obtain the average makespan for PSO, ACO, and GA as shown in Fig. 1.
The PSO algorithm shows a lower makespan than ACO and GA. PSO takes
less time to execute a given task set on the available VMs than ACO and GA, which
indicates its superiority in minimizing the makespan. Cloud users also wish
for a quick response time from the cloud system to satisfy their QoS requirements. The
average response times of the PSO, ACO, and GA algorithms are evaluated in Fig. 2,
which shows that PSO takes less time to respond than ACO and GA.
The average resource utilization is calculated using Eq. (5). It is found that PSO uses
resources more efficiently and effectively as per the cloud providers’ desire to gain
more profit and revenue from cloud computing. The comparison of average resource
utilization is shown in Fig. 3. Based on the experimental or simulation outcomes, it
is clearly visible that few of the scheduling algorithms are very much favorable to
be adopted in cloud computing. From experimental results, it is clearly visible that
PSO found to be outperforming ACO and GA for optimization metrics makespan,
average response time, and LBR.
In this paper, a brief analysis of the existing heuristic and metaheuristic approaches
for task scheduling has been presented. As the task scheduling problem is NP-hard in
nature, and heuristic approaches suffer from slow convergence and traps in local optima,
the metaheuristic approaches have gained popularity over the heuristic ones. This paper
finds that PSO outperforms GA and ACO for makespan, average response time, and
average resource utilization when scheduling a batch of independent tasks in a heterogeneous
cloud computing environment. There is no single metaheuristic algorithm that
performs best on all problems. Their performance varies with the complexity of
the problem. Researchers find PSO an interesting metaheuristic algorithm because
of its various advantages over other metaheuristic techniques: it
can be written in a few lines of code and implemented with only basic math-
ematical operators. PSO is capable of escaping from local optima and shows faster
convergence than other metaheuristic techniques by sustaining a balance between
exploitation and exploration. In most of the less complex and continuous search
space problems, PSO performs better than ACO and GA in terms of its success
rate and quality of the solution as observed in the considered task problem in this
paper. For complex and large search space problems, GA or ACO may perform better
than PSO. PSO is fast gradient, more robust, and stable algorithm. Its mathematical
implementation is easier than ACO and GA as it has few parameters to be adjusted.
This may be the reason for outperforming PSO over GA and ACO in optimizing
the specified QoS parameters. The authors are working on a modified PSO approach
that improves the other QoS parameters such as fault tolerance and reducing the cost
involved in the CC for scheduling workflow-centered scientific applications in cloud
computing.
References
10. Madni SHH, Latiff MSA, Coulibaly Y, Abdulhamid SM (2016) An appraisal of meta-heuristic
resource allocation techniques for IaaS cloud. Indian J Sci Technol 9. https://doi.org/10.17485/
ijst/2016/v9i4/80561
11. Tsai CW, Rodrigues JJPC (2014) Metaheuristic scheduling for cloud: a survey. IEEE Syst J
8:279–291. https://doi.org/10.1109/JSYST.2013.2256731
12. Madni SHH, Latiff MSA, Coulibaly Y, Abdulhamid SM (2016) Resource scheduling for infras-
tructure as a service (IaaS) in cloud computing: challenges and opportunities. J Netw Comput
Appl 68:173–200. https://doi.org/10.1016/j.jnca.2016.04.016
13. Braun TD, Siegel HJ, Beck N et al (2001) A comparison of eleven static heuristics for mapping a
class of independent tasks onto heterogeneous distributed computing systems. J Parallel Distrib
Comput 61:810–837. https://doi.org/10.1006/jpdc.2000.1714
14. Thomas A, Krishnalal G, Jagathy Raj VP (2015) Credit based scheduling algorithm in cloud
computing environment. Procedia Comput Sci 46:913–920. https://doi.org/10.1016/j.procs.
2015.02.162
15. Elzeki OM, Reshad MZ, Elsoud M (2012) Improved max-min algorithm in cloud computing.
Int J Comput Appl 50:22–27. https://doi.org/10.5120/7823-1009
16. Parsa (2009) RASA: a new grid task scheduling algorithm. Int J Digit Content Technol Appl.
https://doi.org/10.4156/jdcta.vol3.issue4.10
17. Devipriya S, Ramesh C (2013) Improved max-min heuristic model for task scheduling
in cloud. In: Proceedings 2013 international conference on green computing, communica-
tion and conservation of energy, ICGCE 2013, pp 883–888. https://doi.org/10.1109/ICGCE.
2013.6823559
18. Maguluri ST, Srikant R, Ying L (2012) Stochastic models of load balancing and scheduling in
cloud computing clusters. In: Proceedings—IEEE INFOCOM, pp 702–710. https://doi.org/10.
1109/INFCOM.2012.6195815
19. Kaur S, Bagga P, Hans R, Kaur H (2019) Quality of service (QoS) aware workflow scheduling
(WFS) in cloud computing: a systematic review. Arab J Sci Eng 44:2867–2897. https://doi.
org/10.1007/s13369-018-3614-3
20. Alam T, Raza Z (2018) Quantum genetic algorithm based scheduler for batch of precedence
constrained jobs on heterogeneous computing systems. J Syst Softw 135:126–142. https://doi.
org/10.1016/j.jss.2017.10.001
21. Shahid M, Raza Z, Sajid M (2015) Level based batch scheduling strategy with idle slot reduction
under DAG constraints for computational grid. J Syst Softw 108:110–133. https://doi.org/10.
1016/j.jss.2015.06.016
22. Zhang Y, Koelbe C, Cooper K (2009) Batch queue resource scheduling for workflow applica-
tions. Proceedings—IEEE international conference on cluster computing. https://doi.org/10.
1109/CLUSTR.2009.5289186
23. Attiya I, Zhang X (2017) A simplified particle swarm optimization for job scheduling in cloud
computing. Int J Comput Appl. https://doi.org/10.5120/ijca2017913744
24. Mathew T, Sekaran KC, Jose J (2014) Study and analysis of various task scheduling algo-
rithms in the cloud computing environment. In: Proceedings 2014 international conference on
advances in computing, communications and informatics, ICACCI 2014, pp 658–664. https://
doi.org/10.1109/ICACCI.2014.6968517
25. Tawfeek M, El-Sisi A, Keshk A, Torkey F (2015) Cloud task scheduling based on ant colony
optimization. Int Arab J Inf Technol 12:129–137
26. Srikanth GU, Maheswari VU, Shanthi P, Siromoney A (2012) Tasks scheduling using ant colony
optimization. J Comput Sci 8:1314–1320. https://doi.org/10.3844/jcssp.2012.1314.1320
27. Jabreel M. The study of genetic algorithm-based task scheduling for cloud computing
28. Safwat A, Fatma A (2016) Genetic-based task scheduling algorithm in cloud computing
environment. Int J Adv Comput Sci Appl 7. https://doi.org/10.14569/ijacsa.2016.070471
29. Almezeini N, Hafez A (2017) Task scheduling in cloud computing using lion optimization
algorithm. Int J Adv Comput Sci Appl 8. https://doi.org/10.14569/ijacsa.2017.081110
30. Agarwal M, Srivastava GMS (2019) A PSO algorithm based task scheduling in cloud
computing. Int J Appl Metaheuristic Comput 10:1–17. https://doi.org/10.4018/IJAMC.201910
0101
31. Masdari M, Salehi F, Jalali M, Bidaki M (2017) A survey of PSO-based scheduling algorithms
in cloud computing. J Netw Syst Manag 25:122–158. https://doi.org/10.1007/s10922-016-
9385-9
32. Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment
problem. Microprocess Microsyst 26:363–371. https://doi.org/10.1016/S0141-9331(02)000
53-4
33. Zhang L, Chen Y, Yang B (2006) Task scheduling based on PSO algorithm in computational
grid. Proc - ISDA 2006 Sixth Int Conf Intell Syst Des Appl 2:696–701. https://doi.org/10.1109/
ISDA.2006.253921
34. Al-Maamari A, Omara FA (2015) Task scheduling using PSO algorithm in cloud computing
environments. Int J Grid Distrib Comput 8:245–256. https://doi.org/10.14257/ijgdc.2015.8.
5.24
35. Beegom ASA, Rajasree MS (2019) Integer-PSO: a discrete PSO algorithm for task scheduling
in cloud computing systems. Evol Intell 12:227–239. https://doi.org/10.1007/s12065-019-002
16-7
36. Guo L, Zhao S, Shen S, Jiang C (2012) Task scheduling optimization in cloud computing based
on heuristic algorithm. J Networks 7:547–553. https://doi.org/10.4304/jnw.7.3.547-553
37. Awad AI, El-Hefnawy NA, Abdel-Kader HM (2015) Enhanced particle swarm optimization for
task scheduling in cloud computing environments. Procedia Comput Sci 65:920–929. https://
doi.org/10.1016/j.procs.2015.09.064
38. Ebadifard F, Babamir SM (2018) A PSO-based task scheduling algorithm improved using a
load-balancing technique for the cloud computing environment. Concurr Comput 30
39. Salot P (2013) A survey of various scheduling algorithm in cloud computing environment. Int
J Res Eng Technol 2(2):131–135
40. Kaur S, Verma A (2012) An efficient approach to genetic algorithm for task scheduling in cloud
computing environment. Int J Inf Technol Comput Sci 4:74–79. https://doi.org/10.5815/ijitcs.
2012.10.09
41. Konar D, Sharma K, Sarogi V, Bhattacharyya S (2018) A multi-objective quantum-inspired
genetic algorithm (Mo-QIGA) for real-time tasks scheduling in multiprocessor environment.
Procedia Comput Sci 131:591–599. https://doi.org/10.1016/j.procs.2018.04.301
42. Gupta A, Garg R (2017) Load balancing based task scheduling with ACO in cloud computing.
In: 2017 International conference on computing applications, ICCA 2017, pp 174–179. https://doi.
org/10.1109/COMAPP.2017.8079781
43. Introduction I (2011) Improved ant colony optimization for grid scheduling. 1:596–604
44. Alworafi MA, Dhari A, El-Booz SA et al (2019) An enhanced task scheduling in cloud
computing based on hybrid approach. Springer Singapore
45. Alsaidy SA, Abbood AD, Sahib MA (2020) Heuristic initialization of PSO task scheduling
algorithm in cloud computing. J King Saud Univ—Comput Inf Sci. https://doi.org/10.1016/j.
jksuci.2020.11.002
Impact of Spatial Distribution of Repeated Samples on the Geometry of Hyperplanes
Abstract Support vector machines (SVMs) and their uses in various scientific
domains have been the subject of extensive research in recent years. SVMs are among
the most potent and reliable classification and regression algorithms in various appli-
cation areas. In the proposed work, the impact of location and multiple occurrences
of support vectors on SVM has been studied by noticing the geometrical differences.
Multiple occurrences, or repetitions, of data points are generally introduced in the case of
imbalanced classes to balance the data; otherwise, results will be biased toward the
majority class. Multiple occurrences of the same data points will result in a change
of behavior and orientation of the hyperplane. The hyperplane will change if the
support vectors are deleted or added.
1 Introduction
One of the most popular techniques for classification problems, such as disease
detection [1, 2], text recognition [3], emotion detection [4] and face detection [5],
is the support vector machine (SVM). For the optimization problem, SVM provides
a globally optimal solution by employing a maximum margin strategy. The notion
of structural risk minimization is incorporated into SVM. Vapnik presented SVM as
a machine learning model for applications including classification and regression.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_2
Support vector machines were created by Vladimir Vapnik in 1979. As seen in Fig. 2,
an SVM is a hyperplane that, with the largest feasible margin, separates a set of
positive samples from a set of negative samples. The distance between the hyperplane
and the closest positive and negative examples in the linear instance determines the
margin. Various versions of SVM are available such as Twin SVM, Least squares twin
SVM, L1-norm-based TSVM, Fuzzy SVM, and SVM for multi-view and multi-class
learning. Jayadeva and Chandra [7] produce two non-parallel hyperplanes by tackling two
quadratic programming problems in the twin support vector machine. Kumar et al. [8],
in least squares twin SVM, replace the inequality constraints with equality constraints.
Wang et al. [9] present L1-norm-based TSVM to increase the robustness of the TSVM
model. To reduce the effects of outliers, fuzzy support vector machines came into the
picture [10]. Multi-view learning further enhances the generalization of the SVM-
based models [11]. Richhariya and Tanveer [12] proposed a reduced Universum
twin support vector learning to address the issue of class imbalance by employing
a tiny rectangular kernel matrix to shorten the computation time of their Universum-
based approach. Ganaie and Tanveer [13] take into account the neighborhood that is
included in the objective function’s weight matrix.
In its linear form, SVM is a hyperplane to distinguish between sets of positive and
negative data samples. Numerous hyperplanes might be used to divide the two classes,
but the one that generates the greatest margin is picked. The margin is determined by
calculating the distance between the hyperplane and the nearest positive or negative
data sample [14]. Let the training data be denoted by T d .
T_d = \{(A_1, D_1), (A_2, D_2), \ldots, (A_m, D_m)\}, \qquad (1)
where A_i ∈ R^n are the observations and D_i ∈ {1, −1} their labels, for i = 1, 2, …, m.
In linear SVM, the following primal QPP is to be solved:
\min_{w,b,\xi} \; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \; D(\phi(A)w + eb) \ge e - \xi, \qquad (2)
where ξ is the slack variable and C is a penalty parameter. The objective is to identify
the best separating hyperplane.
\max_{\alpha} \; -\frac{1}{2}\alpha^t D\phi(A)\phi(A)^t D\alpha + e^t\alpha
\quad \text{s.t.} \; Ce \ge \alpha \ge 0, \; D^t\alpha = 0, \qquad (4)
\min_{\alpha} \; \frac{1}{2}\alpha^t D K(A, A^t) D\alpha - e^t\alpha
\quad \text{s.t.} \; Ce \ge \alpha \ge 0, \; D^t\alpha = 0, \qquad (5)
Here, K(A, A^t) = \phi(A)\phi(A)^t represents the linear kernel function. To find the values
of b and ξ, we look for the support vectors: the data points where α ≠ 0 as per the KKT
conditions:
D(\phi(A)_i w + e_i b) + \xi_i - e_i = 0, \qquad (6)
w = A^t D\alpha. \qquad (8)
Here, n_+ and n_− are the numbers of support vectors belonging to the positive and negative
classes, and x_i denotes the data points of both classes. For each support vector belonging to
the positive class, the value of ξ_i is calculated by Eq. (10); for each support vector
belonging to the negative class, the value of ξ_i* is calculated by Eq. (11).
\xi_i = 1 - x_i \phi(A)^t D\alpha - b, \qquad (10)
Here, x_i belongs to the positive class data samples for ξ_i, and for ξ_i* it belongs to the
negative class data points.
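A minimal sketch of how Eqs. (5)–(11) can be carried out numerically is given below, using cvxopt's generic QP solver for the dual problem. The choice of cvxopt, the recovery of b from margin support vectors, and the hinge form of the slacks are standard modelling choices assumed here, not details stated in the paper.

```python
import numpy as np
from cvxopt import matrix, solvers

# Hedged sketch of Eqs. (5)-(11): solve the dual QPP for a linear kernel and
# recover w, b and the slacks. A is the m x n data matrix, y the +1/-1 labels;
# C = 1 and the support-vector threshold 1e-10 follow the algorithm below.
def linear_svm(A, y, C=1.0, tol=1e-10):
    m = len(y)
    K = A @ A.T                                   # linear kernel K(A, A^t)
    P = matrix(np.outer(y, y) * K)                # dual quadratic term
    q = matrix(-np.ones(m))
    G = matrix(np.vstack([-np.eye(m), np.eye(m)]))
    h = matrix(np.hstack([np.zeros(m), C * np.ones(m)]))  # 0 <= alpha <= C
    Aeq = matrix(y.reshape(1, -1).astype(float))  # equality D^t alpha = 0
    solvers.options["show_progress"] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, Aeq, matrix(0.0))["x"])
    sv = alpha > tol                              # support vectors (alpha != 0)
    w = A.T @ (y * alpha)                         # Eq. (8): w = A^t D alpha
    on_margin = sv & (alpha < C - tol)            # assumes some alpha < C exist
    b = np.mean(y[on_margin] - A[on_margin] @ w)  # b averaged over margin SVs
    xi = np.maximum(0.0, 1.0 - y * (A @ w + b))   # slacks, as in Eqs. (10)-(11)
    return w, b, alpha, xi
```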
For the proposed work, a two-cluster, normally distributed dataset having two features,
X1 and X2, is generated and divided into two classes. The dataset is imbalanced, with
500 and 1000 data samples for the positive and negative classes, respectively; the
imbalance ratio is 1:2, as shown in Fig. 1. A few data samples from each class lie in the
overlapped region. The dataset dimensions are 1500 × 2.
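A sketch of how such a dataset can be generated is shown below; the cluster means and spread are assumed for illustration, while the 500/1000 split and the 1500 × 2 shape follow the description above.

```python
import numpy as np

# Hedged sketch of the synthetic data: two normally distributed clusters with
# a 1:2 imbalance and some overlap. Cluster means/spread are assumptions.
rng = np.random.default_rng(0)
pos = rng.normal(loc=[0.0, 0.8], scale=0.4, size=(500, 2))    # class +1
neg = rng.normal(loc=[0.0, -0.2], scale=0.4, size=(1000, 2))  # class -1
A = np.vstack([pos, neg])                        # 1500 x 2 feature matrix
y = np.hstack([np.ones(500), -np.ones(1000)])    # labels D in {+1, -1}
print(A.shape)  # (1500, 2)
```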
The SVM classifier is implemented on the artificially generated dataset, and the
support vectors are calculated and plotted on the classifier as shown in Fig. 2. We then
note the geometrical differences after repeating support vectors at different
locations. First, the values of ξ and ξ* are divided into ten bins for the posi-
tive and negative classes, respectively. The distribution of the positive and negative
classes across the bins is shown in Figs. 3 and 4. Two bins of size (0 to 0.5) and (0.5
to 1.0) over the values of ξ and ξ* are created, and their data samples are repeated
for both the positive and negative classes. After repeating the data samples of the posi-
tive class, the entire hyperplane shifts in the upward direction, as shown in
Figs. 5 and 6. Similarly, when data samples are repeated for the negative class, the
entire hyperplane shifts in the downward direction. Most ξ and ξ* values obtained
are positive, but some are negative because some points lie very close to the line.
3.2 Algorithm
Step 2. Implement the SVM classifier from Eq. (5), considering C = 1
and threshold = 10^−10.
Step 3. Calculate the support vector data points where α ≥ threshold and plot them.
Step 4. Calculate the values of b and w from Eqs. (7) and (8), respectively.
Step 5. Calculate the values of ξ and ξ* from Eqs. (10) and (11), respectively.
Step 6. Divide the values of ξ and ξ* into bins, repeat the data points of a
particular bin n times at a time, and note the geometrical differences in the SVM.
Step 7. Now, repeat the support vectors at different locations and note the
geometrical differences in the SVM. The various locations at which support vectors are
repeated are arranged in four cases; a sketch of the bin-repeat-and-refit step is given below.
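As a concrete rendering of Steps 6–7, the sketch below oversamples the points whose slack falls in a chosen bin and refits a linear SVM. scikit-learn's SVC is used here as a stand-in solver, and the bin edges and repeat count n are the assumed experimental knobs, not values dictated by the paper.

```python
import numpy as np
from sklearn.svm import SVC

# Hedged sketch of Steps 6-7: repeat (oversample) the samples whose slack
# falls in a chosen bin, refit, and compare the hyperplane with the original.
def repeat_bin_and_refit(A, y, xi, lo, hi, n, C=1.0):
    mask = (xi >= lo) & (xi < hi)            # e.g. the (0, 0.5) or (0.5, 1.0) bin
    A_rep = np.vstack([A] + [A[mask]] * n)   # each selected point occurs n+1 times
    y_rep = np.hstack([y] + [y[mask]] * n)
    clf = SVC(kernel="linear", C=C).fit(A_rep, y_rep)
    return clf.coef_.ravel(), clf.intercept_[0]   # shifted w and b
```

Comparing the returned (w, b) with the original hyperplane reproduces the upward and downward shifts reported in the four cases below.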
Case 1: Left side SV of the Positive class.
Consider Fig. 2 for the original data. In this case, data points of the positive
class with input features in the range X1(−0.6 to −0.2) and X2(0 to 0.5) are repeated.
In total, eight data points fall in this range. As the data points of the
positive left class are repeated, the contour shifts in the upward direction from that
particular location, as shown in Fig. 7. Similarly, data points of the negative class with
input features in the range X1(−0.6 to 0) and X2(0.4 to 0.6) are repeated. In total, 21
data points fall in this range. As the data points of the negative left class
are repeated, the contour shifts in the left downward direction from that particular
location, as shown in Fig. 9.
Consider Fig. 2 for the original data. In this case, data points of the positive class
with input features in the range X1(0.2 to 0.4) and X2(0 to 0.5) are repeated. In total,
eight data points fall in this range, and the geometrical difference is observed. As
the right-side data points of the positive class are repeated, the contour
shifts in the upward direction from that particular location, as shown in Fig. 8. Similar
behavior is observed for the negative class: data points of the negative class
with input features in the range X1(0 to 0.4) and X2(0.4 to 0.6) are repeated. In total,
21 data points fall in this range, and the hyperplane shifts in the right downward
direction from that particular location, as shown in Fig. 10. All the
positive and negative class cases are summarized in Table 1.
In this paper, we presented a novel point of view on the SVM by discussing the impact
of the spatial distribution of repeated samples on the geometry of hyperplanes. As
seen in the proposed work, by repeating samples at a particular location a specified
number of times, the hyperplane can shift its position. This means that the average
error can be reduced, which can further reduce the misclassification of data
samples. In the future, the spatial distribution of repeated samples can be implemented
on variants of SVM.
Table 1 Impact of repeating SVs at different locations of the positive and negative classes on the SVM classifier

| Case   | Range to repeat (X1 feature) | Range to repeat (X2 feature) | No. of repeated points | Result                                                                   |
| Case 1 | −0.6 to −0.2                 | 0 to 0.5                     | 8                      | The hyperplane moved in the left upward direction as shown in Fig. 6     |
| Case 2 | 0.2 to 0.4                   | 0 to 0.5                     | 8                      | The hyperplane moved in the right upward direction as shown in Fig. 7    |
| Case 3 | −0.6 to 0                    | 0.2 to 0.4                   | 21                     | The hyperplane moved in the left downward direction as shown in Fig. 8   |
| Case 4 | 0 to 0.4                     | 0.4 to 0.6                   | 21                     | The hyperplane moved in the right downward direction as shown in Fig. 9  |
References
1. Richhariya B, Tanveer M (2018) EEG signal classification using universum support vector
machine. Expert Syst Appl 106:169–182. https://doi.org/10.1016/j.eswa.2018.03.053
2. Eke CS, Jammeh E, Li X, Carroll C, Pearson S, Ifeachor E (2021) Early detection of Alzheimer’s
disease with blood plasma proteins using support vector machines. IEEE J Biomed Health
Inform 25(1):218–226. https://doi.org/10.1109/jbhi.2020.2984355
3. Liu Z, Lv X, Liu K, Shi S (2010) Study on SVM compared with the other text classification
methods. In: 2010 Second international workshop on education technology and computer
science. https://doi.org/10.1109/etcs.2010.248
4. Sepúlveda A, Castillo F, Palma C, Rodriguez-Fernandez M (2021) Emotion recognition from
ECG signals using wavelet scattering and machine learning. Appl Sci 11(11):4945. https://doi.
org/10.3390/app11114945
5. Raji ID, Fried G (2021) About face: a survey of facial recognition evaluation. ArXiv: Computer
Vision and Pattern Recognition. https://arxiv.org/pdf/2102.00813
1 Introduction
The agriculture sector is an indispensable sector of every country, and it becomes even
more important for a developing country like India. Agriculture is the
primary source of livelihood for nearly 58% of India’s population and contributes
about 17% to Gross Value Added (GVA) [1]. India is among the world’s leading
producers of rice and wheat in terms of net production volume; agriculture has a
vital role in import and export as well. Many industries depend on agriculture as
it is the primary source of raw materials like cotton, jute, sugar, tobacco, oils, etc.
According to the Department for Promotion of Industry and Internal Trade (DPIIT), a
cumulative Foreign Direct Investment (FDI) equity inflow of about US$ 9.08 billion
was achieved from April 2000 to 2019 in the agriculture sector alone [2]. A significant
contribution toward any country’s growth is derived from agriculture.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_3
With the increasing global population estimated to touch 9.6 billion by 2050,
advancement in the agriculture sector is a must to feed the growing population [3].
However, farmers in India still use manual methods for crop monitoring, irrigation,
and other activities. These manual methods take time and sometimes cannot detect
the exact situation, leading to poor crop yield. Therefore, food security is a crucial
issue in India. According to the Food and Agriculture Organization of the UN (FAO),
it is estimated that over 189.2 million people go hungry every day in the country [4].
Adopting sustainable farming practices can both increase productivity and reduce
ecological harm, as it helps produce greater agricultural output while using less
land, water, and energy, ensuring profitability for the farmers. Sustainable agriculture
is defined as a system that helps conserve resources and reduces agricultural practices
that pose a threat to the environment [5].
The use of innovations like the Internet of Things (IoT) in farming could have the
best results against the challenges (like adverse environmental conditions, climate
change, increasing expenses, wastage of resources, etc.) in the future [6]. IoT is a
system of interrelated networks of physical tools with sensors, software, and other
technological equipment that can collect and transfer data to other devices or
systems over the Internet without requiring human intervention [7]. Smart farming
with big data and advanced analytics technology includes automation, adding senses
and analytics to modern agriculture. The use of technology will not only help provide
better yield and less labor effort, but it will also revolutionize agriculture for farmers
in India. The potential of IoT in the agricultural sector motivated us to explore the
same in this research work.
The significant contributions of this paper are as follows:
• We first present a review of Internet of Things (IoT) as an intelligent farming
solution that has the potential to overcome the problems faced in Indian agriculture
and stimulate sustainable agriculture.
• Then, we analyze and validate mathematically how the agricultural factors on
which IoT works affect the productivity of various crops using available agricul-
tural datasets. To validate the role of IoT in agriculture mathematically, we have
used RStudio's [8] "agridat" package [9].
The rest of the paper is organized as follows. In Sect. 2, we describe the method-
ology of this work. Then, Sect. 3 presents the results, and Sect. 4 presents the
discussion. Lastly, we conclude in Sect. 5.
2 Methodology
We started our work by collecting information about the role and need for IoT in
Indian agriculture and its applications. Then, we experimented using RStudio's
“agridat” package and selected some of the available datasets to statistically prove
the benefits of using IoT devices for sustainable agriculture.
IoT-Based Smart Farming for Sustainable Agriculture 29
This work was divided into two stages. In the first stage, we analyze the role of
IoT in agriculture. In the second stage, we use the “agridat” package available in
RStudio to analyze the effect of various factors on crop yield using the available
datasets.
In this section, we explore how IoT can be beneficial for sustainable agriculture and
how it has the potential to overcome various problems in the agricultural sector.
Agricultural problems in India: The success of the agricultural sector depends
on various factors such as climate, irrigation, soil quality, humidity, seeds, and pesticides.
The problems associated with these factors thus affect agricultural production too.
Some of the significant factors are discussed below:
• Climatic conditions: Climate change harms agricultural produce. A rise in India's mean temperature has been observed, and the frequency of heavy rainfall has increased over the last three decades. These climatic changes are likely to affect agricultural yield negatively, and these changing circumstances make it necessary to monitor climatic conditions [10].
• Irrigation: Irrigation is an essential input for agriculture in every country. The yield of a crop depends on how it is watered. In a tropical country like India, where the rainfall pattern is uncertain and irregular, irrigation is often the only way to sustain agriculture. However, over-irrigation has its own ill effects: large areas of land in Punjab and Haryana have become unusable due to faulty irrigation that led to salinity, alkalinity, and water-logging [11].
• Soil Quality: Soil quality is one of the most essential components of good crop health. Soil mismanagement and land misuse adversely affect soil health. Farming practices such as in-field burning of crop residues, excessive digging or tillage, flood-based irrigation, and indiscriminate use of chemicals often degrade soil health [12]. This degradation shows the dire need to monitor soil health.
• Humidity: Humidity refers to the amount of water vapor present in the air and is often expressed in terms of Relative Humidity (RH), the percentage of water vapor in the air at a given temperature and pressure. Both very high and very low RH reduce grain yield and can lead to greater usage of pesticides, which has its own ill effects [13].
• Seeds, Fertilizers, and Pesticides: Seeds, fertilizers, and pesticides constitute the three pillars of modern agriculture; their main task is to enhance agricultural productivity. Seeds are the most essential input as far as agriculture is concerned. It has been observed that many farmers still use common grain saved from the previous crop as seed and cannot distinguish between common grain and seed; using common grain as seed affects productivity. Judicious and optimal use of fertilizers is necessary to meet the future demand for food with the increasing population. Based on the study reports of the National Institute of Agricultural Economics and Policy (NIAP), one-third of the major states apply excess nitrogen and two-thirds apply nitrogen below the optimum level [14]. There are similar regional imbalances in the use of Potassium (K) and Phosphorus (P). This further stresses the use of modern technology for the right mix of crops. In India, pests, including weeds, insect pests, diseases, nematodes, and rodents, have been found to reduce crop yield by 15 to 25%, causing a loss of 0.9 to 1.4 lakh crore rupees annually [15].
IoT as a solution: Crop yield is the measure of grain produced from a given plot of land. It is the most important factor in agriculture, as it measures the performance of the farmer and reflects in totality the efforts and resources invested in growing plants in the fields. Increasing crop yield is the main aim of every farmer, and one of the common ways to do so is to improve crop management, which includes preparation of soil, sowing of seeds, addition of manures and fertilizers, irrigation, protection from weeds, harvesting, and storage. These management decisions should be made efficiently to reduce losses and improve quality. Using IoT to control and monitor devices at the farm, collecting data from the sowing of seeds through harvesting, makes it easier to improve crop yield without wasting any resource.
IoT plays a very important role in smart agriculture, as IoT sensors are capable of providing information about agricultural fields. An IoT agricultural monitoring system uses sensor networks that collect data from sensors deployed at various nodes and send it over a wireless protocol. The primary data flow mechanism allows sensors to sense, store, present, evaluate, decide, and control by delivering real-time data feeds to a variety of gadgets, such as smartphones and tablets [16]. The main function of IoT gadgets is live monitoring of environmental data such as temperature and moisture, depending on the sensors integrated with them; farmers can then implement smart farming by receiving live data feeds on devices like smartphones and tablets. The data generated by the sensors can be easily shared with and viewed by agriculture consultants via cloud computing technology [17].
Various sensors used in agricultural IoT devices [15] to gather information are discussed below:
• Temperature Sensor: The DS18B20 temperature sensor [18] provides 9-bit to 12-bit Celsius temperature readings and has an alarm function with non-volatile, user-programmable upper and lower trigger points. Soil temperature typically varies between 0 and 40 °C, and the optimum average range required for plant growth is 20–30 °C [19]. Each DS18B20 has a 64-bit serial code, which allows multiple DS18B20s to operate on the same 1-wire bus (a minimal reading sketch follows this list).
• Soil Moisture Sensor: The soil moisture sensor has two large exposed conductors that function as probes, together acting as a variable resistor. When the water level in the soil is low, the conductivity between the electrodes is low, and thus the analog voltage is low; this analog voltage increases as the conductivity in the soil increases. In this way, soil moisture is detected by the sensor [17].
• Light Intensity Sensor: All crops react differently and have different physiologies for dealing with light intensity. Farmers therefore need to provide sufficient light, at least 8–10 h per day, for healthy growth. Smart farming techniques that include a sensor system to control light intensity can be a better option, as they are time-efficient. There are several types of light sensors, namely photoresistors, photodiodes, and phototransistors. They are used for automated light intensity monitoring, as they measure the light level in a growth chamber and increase or decrease the brightness of the light to maintain an accurate level [20].
The use of IoT in the Indian agricultural sector has been widely promoted by the
Government of India as well. The Government of India has introduced new schemes
to help Indian farmers in the advancement of Indian agriculture utilizing the concept
of smart farming. Some of the Government initiatives and schemes are described
below:
• Mobile apps
The Government of India has launched several mobile applications for farmers which
provide information on agriculture activities, free of cost, for the benefit of farmers
and other stakeholders [21].
Crop Cutting Experiments (CCE) Agri Mobile App: This app collects crop cutting experiment data and has the special quality of working in both online and offline modes. Internet access is required only for installing the app and for registration. After this, data can be added to the CCE app without internet, and when connectivity is available, the data can be pushed to the server [22].
Kisan Suvidha: This app provides information related to weather (humidity, temperature, etc.), market prices, plant protection techniques, cold stores, and godowns, and has an agro-advisory section that shows messages for farmers in different local languages. The app also connects farmers directly with the Kisan call center, where technical graduates answer farmers' queries [23].
• Agriculture events
Following are some of the events and projects organized by the Government of India
to promote smart farming.
Agri India Hackathon: The Agri India Hackathon is organized by Pusa Krishi, the ICAR-Indian Agricultural Research Institute (IARI), the Indian Council of Agricultural Research (ICAR), and the Department of Agriculture, Cooperation and Farmers' Welfare, Ministry of Agriculture and Farmers' Welfare. It is the largest virtual gathering for boosting advancements in agriculture. The Agri India Hackathon discussed precision farming, including the application of sensors, WSN, ICT, AI, IoT, and drones. Precision livestock and aquaculture are also goals of this initiative [24].
SENSAGRI Project for Drone-Based Agriculture Technology: SENSAGRI ("SENsor-based Smart AGRIculture") was formulated by the Indian Council of Agricultural Research (ICAR) through the Indian Agricultural Research Institute (IARI). The main objective is to develop an indigenous prototype of a drone-based crop and soil health monitoring system using hyperspectral remote sensing (HRS). The drone can smoothly scout over farm fields, gathering precise information and transmitting the data in real time. This will benefit the farming sector at the regional/local scale for assessing land and crop health, including the extent, type, and severity of damage, besides issuing forewarnings, supporting post-event management, and settling compensation under crop insurance schemes [25].
3 Results
In this section, we present and discuss the results from our analysis. We deduced the
following results after doing the statistical analysis on the datasets “gregory.cotton”
and “gumpertz.pepper” [9].
• The "gregory.cotton" dataset records a factorial experiment on cotton conducted in Sudan and includes 144 observations on the following six variables: yield (a numeric vector), year, nitrogen (nitrogen level), date (sowing date), water (irrigation amount), and spacing (spacing between plants). We analyzed the effect of water and nitrogen level on the yield of these crops, and the findings are explained below:
(a) Yield and water—Cotton yield is strongly dependent on the amount and frequency of irrigation water [26]. The effects on yield were studied at three irrigation levels: I1 = Light, I2 = Medium, and I3 = Heavy. The yield was found to be maximum under heavy irrigation, as depicted in Figs. 1 and 2.
(b) Nitrogen level and yield—Nitrogen is a crucial nutrient that plays an important role in the photosynthesis, growth, and yield of cotton crops [27]. The effects on yield were studied at two nitrogen levels: L = none/control and H = 600 rotls/feddan. The yield was found to be maximum under the H nitrogen level, as shown in Figs. 3 and 4.
Various other factors affect the presence of disease, but this can serve as an initial warning sign for the same.
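The kind of analysis reported above can be reproduced with a short Python sketch, assuming the gregory.cotton data has first been exported from R to CSV; the file name is an assumption.

import pandas as pd

# Assumes the dataset was exported from R beforehand, e.g.:
#   R> write.csv(agridat::gregory.cotton, "gregory_cotton.csv", row.names = FALSE)
df = pd.read_csv("gregory_cotton.csv")

# Mean yield per irrigation level (I1 = light, I2 = medium, I3 = heavy)
print(df.groupby("water")["yield"].mean())

# Mean yield per nitrogen level (L = none/control, H = 600 rotls/feddan)
print(df.groupby("nitrogen")["yield"].mean())

# Joint effect of irrigation and nitrogen on yield (cell means)
print(df.pivot_table(values="yield", index="water", columns="nitrogen"))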
4 Discussion
In this section, we discuss our findings on the merits of IoT for sustainable and advanced agriculture and conclude that IoT is the need of the hour for a good crop yield, though it brings some challenges.
Key findings from the agridat package datasets: Crop yield depends on irrigation, as depicted in Figs. 1 and 2, and if proper irrigation is not provided, crop production suffers. IoT devices such as soil sensors, combined with cloud-based data analytics, can monitor the water requirement of the soil and thereby allow farmers to determine when to irrigate their farms. This will not only help conserve water but also prevent over-irrigation, which can adversely affect yield.
To minimize losses and increase efficiency in cotton plants, Nitrogen (N) fertilizer should be applied as close as possible to the time it will be taken up by the plant, indicating that cotton requires varying amounts of N throughout its growth, as depicted in Figs. 3 and 4. By using smart devices, we can automate multiple processes across the production cycle, increasing yield efficiency; for example, smart devices can monitor the plant's requirements for nutrients in the soil.
If any plant was wilted, dead, or had lesions, phytophthora disease was considered present in the plot, as depicted in Fig. 5. IoT devices help in crop management, as they can monitor crop growth and detect anomalies early, effectively preventing diseases or infestations that could harm the yield.
Challenges for IoT in the Agriculture Sector: Farmers cannot take full advantage of this technology due to poor infrastructure. Internet accessibility is a problem for farms located in remote areas; in such cases, the monitoring systems these farmers use become unreliable and useless. The machinery used in implementing an IoT system is also costly: the sensors themselves are the least expensive component, but fitting out an agricultural field with the complete system is still too costly for many farmers.
5 Conclusion
Acknowledgements This research follows from the project work done as part of Summer Intern-
ship Programme (SIP) 2020–21 organized by Centre for Research, Maitreyi College, University of
Delhi.
References
1. Annual Report 2020. ICAR, Government of India, Ministry of Agriculture & Farmers Welfare.
https://icar.gov.in/sites/default/files/ICAR-AR-2020-English.pdf. Last accessed 12 Apr 2022
2. The emerging scope of agri-tech in India. https://www.investindia.gov.in/team-india-blogs/eme
rging-scope-agri-tech-india. Last accessed 12 Apr 2022
3. Balakrishna G, Nageshwara Rao M (2019) Study report on using IoT agriculture farm moni-
toring. Lect Notes Networks Syst 74:483–491. https://doi.org/10.1007/978-981-13-7082-3_
55
4. IFBN: hunger in India. https://www.indiafoodbanking.org/hunger. Last accessed 12 Apr 2022
5. D’souza G, Cyphers D, Phipps T (1993) Factors affecting the adoption of sustainable agri-
cultural practices. Agric Resour Econ Rev 22:159–165. https://doi.org/10.1017/s10682805000
04743
6. Salecha M (2022) Smart farming: IoT in agriculture. https://analyticsindiamag.com/smart-far
ming-iot-agriculture/. Last accessed 12 Apr 2022
7. Ministry of Electronic and Information Technology: IoT Policy Document. https://meity.gov.
in/sites/upload_files/dit/files/Draft-IoT-Policy%281%29.pdf. Last accessed 12 Apr 2022
8. RStudio: RStudio: Integrated development environment for R. www.rstudio.com. Last accessed
12 Apr 2022
9. Wright K (2022) “agridat”: agricultural datasets. R package version 1.20. https://cran.r-project.
org/package=agridat. Last accessed 12 Apr 2022
10. Effect of climate change on agriculture. Press Information Bureau Government of India
Ministry of Agriculture & Farmers Welfare. https://pib.gov.in/Pressreleaseshare.aspx?PRID=
1696468. Last accessed 12 Apr 2022
11. Krar P. Parts of Haryana have salty groundwater and rains add to the salt
content. https://economictimes.indiatimes.com/news/economy/agriculture/parts-of-haryana-
have-salty-groundwater-and-rains-add-to-the-salt-content/articleshow/71342070.cms
12. Soil health is degraded in most regions of India. https://www.livemint.com/news/india/-soil-health-is-degraded-in-most-regions-of-india-11595225689494.html. Last accessed 12 Apr 2022
13. Agrometeorology: relative humidity and plant growth. https://agritech.tnau.ac.in/agriculture/
agri_agrometeorology_relativehumidity.html. Last accessed 12 Apr 2022
14. Raising agricultural productivity and making farming remunerative for farmers. https://www.
niti.gov.in/sites/default/files/2019-08/RaisingAgriculturalProductivityandMakingFarmingR
emunerativeforFarmers.pdf. Last accessed 12 Apr 2022
15. Vennila S, Lokare R, Singh N, Ghadge SM, Chattopadhyay C (2022) Crop pest surveil-
lance and advisory project of Maharashtra. https://farmer.gov.in/imagedefault/handbooks/Boo
KLet/MAHARASHTRA/20160725144307_CROPSAP_Booklet_for_web.pdf. Last accessed
12 Apr 2022
16. Meera SN, Kathiresan C (2022) Internet of Things (IoT) in agriculture industries. https://aphrdi.ap.gov.in/documents/Trainings@APHRDI/2017/8_aug/IOT/ShaikNMeera1.pdf. Last accessed 12 Apr 2022
17. Nayyar A, Puri V (2017) Smart farming: IoT based smart sensors agriculture stick for live
temperature and moisture monitoring using Arduino, cloud computing & solar technology.
In: Communication and computing systems—proceedings of the international conference on
communication and computing systems, ICCCS 2016, pp 673–680. https://doi.org/10.1201/
9781315364094-121
18. DS18B20+T&R. https://in.element14.com/maxim-integrated-products/ds18b20-t-r/temper
ature-sensor-0-5deg-c-to/dp/2515605. Last accessed 12 Apr 2022
19. Aniley AA, Kumar N, Kumar A (2017) Soil temperature sensors in agriculture and the role of
nanomaterials in temperature sensors preparation. Int J Eng Manuf Sci 7:2249–3115
20. Lakhiar IA, Jianmin G, Syed TN, Chandio FA, Buttar NA, Qureshi WA (2018) Monitoring and
control systems in agriculture using intelligent sensor techniques: a review of the aeroponic
system. J Sensors 2018. https://doi.org/10.1155/2018/8672769
21. Mobile apps empowering farmers. https://www.manage.gov.in/publications/edigest/dec2017.
pdf. Last accessed 12 Apr 2022
22. Crop cutting experiment. http://krishi.maharashtra.gov.in/Site/Upload/Pdf/CCE_App_Tut
orial_Primary_Worker_Maharashtra.pdf. Last accessed 12 Apr 2022
23. Kisan Suvidha. http://mkisan.gov.in/aboutmkisan.aspx. Last accessed 12 Apr 2022
24. Agri India Hackathon. https://innovateindia.mygov.in/agriindiahackathon/. Last accessed 12
Apr 2022
25. Agricultural situation in India. https://eands.dacnet.nic.in/PDF/August2016.pdf. Last accessed
12 Apr 2022
26. Hunsaker DJ, Clemmens AJ, Fangmeier DD (1998) Cotton response to high frequency surface
irrigation. Agric Water Manag 37:55–74. https://doi.org/10.1016/S0378-3774(98)00036-5
27. Nitrogen fertility and abiotic stresses management in cotton crop: a review
28. Bowers JH (1990) Effect of soil-water matric potential and periodic flooding on mortality
of pepper caused by Phytophthora capsici. Phytopathology 80:1447. https://doi.org/10.1094/
phyto-80-1447
ELM-Based Liver Disease Prediction
Model
1 Introduction
The word 'Hepar' is Greek for the liver, the largest gland in the human body. It is a cone-like structure that sits on top of the stomach, protected by the rib cage. As a vital digestive organ, it is necessary to maintain a healthy liver; a healthy liver is key to a healthy body. Unfortunately, changes in lifestyle and dietary patterns, involving the intake of junk and canned food, tend to impact the liver and cause it to lose its ability to work efficiently.
C. Agarwal (B)
Ajay Kumar Garg Engineering College, Dr. A.P.J. Abdul Kalam Technical University, Ghaziabad,
Uttar Pradesh, India
e-mail: agarwalcharu@akgec.ac.in
G. Singh
KIET Group of Institutions, Dr. A.P.J. Abdul Kalam Technical University, Ghaziabad, Uttar
Pradesh, India
A. Mishra
Deendayal Upadhyay College, University of Delhi, Delhi, India
Liver disease can be classified into four stages: inflammation, fibrosis, cirrhosis, and liver failure [1]. Inflammation, also known as hepatitis, is the initial stage, in which the tissue tends to swell. Viral hepatitis is of five types, i.e., A, B, C, D, and E. The second stage is fibrosis, in which mild scarring of tissue starts to appear. The third stage is cirrhosis, in which late-stage, permanent scarring of liver tissue occurs. Cirrhosis can be further classified into compensated and decompensated stages: compensated cirrhosis is asymptomatic, whereas decompensated cirrhosis is symptomatic and the liver is unable to function well. Liver failure, the last stage, is life-threatening, and the only available treatment is an expensive liver transplant.
Rapid recognition of liver illness improves a person's ability to live a healthy life. A Liver Function Test (LFT) and imaging can be done to diagnose the disease. A blood sample is collected and analyzed, and a report is generated that includes parameters such as SGPT, SGOT, and total bilirubin. Based on these parameters, the hepatologist prescribes medication and precautionary measures to treat the individual.
Liver disease is the tenth leading cause of death in India and the second major cause of death in the USA. It has been found that approximately two million people die every year due to one liver illness or another [2].
From the above facts, it can be inferred that manual analysis by hepatologists is a tedious, difficult, and error-prone task. To help the medical community, fully automated analytical systems can be built using a variety of advanced technologies to deliver efficient and accurate results. Various machine learning algorithms can be used to develop such models, and many researchers have worked on their development, as described below.
Geetha and Arunachalam [3] used the ILPD dataset to evaluate the effectiveness of SVM and LR algorithms in diagnosing liver disease. The authors examined SVM and LR for accuracy, precision, sensitivity, and specificity and found that SVM achieved higher accuracy (75.04%) than LR (73.23%).
Various researchers have examined different machine learning classification algorithms for liver disease prediction [4–10], applying algorithms such as SVM, LR, KNN, RF, and DT and computing classification accuracy. After a thorough study of the literature, it is clear that there is a need for computer-based models that can predict liver disease more accurately with less computational effort.
In this paper, we use an extreme learning machine (ELM) as the classifier to propose a more computationally efficient and accurate machine learning-based model for liver disease prediction. ELM is a fast single-hidden-layer feedforward neural network with good generalization capabilities. ELM has already been used successfully for various other classification tasks such as ECG classification [11], fingerprint recognition [12], watermarking [13], and face mask detection [14]. The proposed model is trained and tested on the ILPD dataset. We also examined ELM performance using different activation functions. Activation functions help the network model complex data by taking the input from the previous layer and transforming it into a format used as input to the next layer; ELM applies a nonlinear activation function in the hidden layer and then solves a linear system for the output weights. The activation functions used in this model are Sigmoid, Relu, Leaky_Relu, Tanh, Sin, TanHRe, and Swish, with different numbers of hidden neurons (8, 16, 32, 64, 128, 512, and 1024). The main contribution of the current experimental work is to explore the possibility of detecting liver disease using the ELM algorithm. This paper is organized as follows. Section 2 presents the mathematical formulation of ELM and its activation functions. Section 3 presents details of the ILPD dataset. Section 4 presents the proposed methodology. Section 5 presents results and discussion. Finally, Sect. 6 concludes the work.
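As a concrete illustration of the formulation just described, the following is a minimal NumPy sketch of an ELM for a binary 0/1 target: random, untrained hidden-layer weights, a nonlinear activation, and output weights obtained analytically via the pseudoinverse. It is an illustrative sketch, not the authors' exact implementation.

import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM for binary classification."""

    def __init__(self, n_hidden=32, activation=None, seed=0):
        self.n_hidden = n_hidden
        # Default activation: sigmoid, one of the functions compared in this paper.
        self.activation = activation or (lambda z: 1.0 / (1.0 + np.exp(-z)))
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear transform of the input by the (fixed) random hidden layer.
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights solve the linear system H @ beta = y (least squares).
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        # Threshold the real-valued output at 0.5 for 0/1 labels.
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)

Because only the output weights are computed, and analytically at that, training is fast, which matches the sub-second training times reported later.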
3 Dataset
For our study, we have used the Indian Liver Patient Dataset (ILPD) [17] from the Kaggle repository, as it is the only such dataset freely available to the research fraternity.
This dataset has 11 attributes. This is a standard dataset which is used for liver disease
prediction by many researchers [18, 19]. This dataset contains 583 records of which
167 were records of healthy patients and 416 were suffering from liver disease.
4 Proposed Methodology
We compile the performance parameter values for different numbers of hidden neurons and activation functions. The present experiment is carried out in three categories, as mentioned below:
In the first experiment, we considered 80% of the data for training the ELM and 20% for testing. Figure 2 depicts the accuracies of the various activation functions with respect to different numbers of hidden neurons. The Relu activation function with 32 neurons and TanHRe with 128 neurons give a maximum accuracy of 77.77%. An accuracy of 77.19% is obtained using 512, 512, and 32 neurons with Sigmoid, Leaky_Relu, and Swish, respectively.
The precision score, recall, and F1-score for all the activation functions for
different hidden neurons for the 80:20 data split are shown in Tables 1, 2, and 3,
respectively.
We also analyzed the model based on training time taken by ELM using different
activation functions and concluded that training time is less than 1 s for all the cases
for an 80:20 data split.
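As a worked illustration of this evaluation protocol, the following sketch reproduces the metric computation with scikit-learn. It assumes the minimal ELM class sketched earlier, and the file and column names follow the common Kaggle ILPD release; all of these are assumptions rather than the authors' actual code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# File and column names follow the common Kaggle ILPD release (assumptions).
df = pd.read_csv("indian_liver_patient.csv").dropna()
df["Gender"] = (df["Gender"] == "Male").astype(int)  # encode the categorical field
y = (df["Dataset"] == 1).astype(int).to_numpy()      # 1 = liver patient
X = df.drop(columns="Dataset").to_numpy(dtype=float)

# 80:20 split, as in the first experiment; use test_size=0.3 for the 70:30 split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

for n in (8, 16, 32, 64, 128, 512, 1024):
    pred = ELM(n_hidden=n).fit(X_tr, y_tr).predict(X_te)
    print(n, accuracy_score(y_te, pred),
          precision_score(y_te, pred, zero_division=0),
          recall_score(y_te, pred), f1_score(y_te, pred))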
In the second experiment, we considered 70% of the data for training the ELM and 30% for testing. Figure 3 depicts the accuracies of the various activation functions with respect to different numbers of hidden neurons.
The highest accuracy for the 70:30 data split is 78.36% shown by the sigmoid
function for 32 hidden neurons. The precision score, recall, and F1-score for all the
activation functions for different hidden neurons for the 70:30 data split are tabulated
in Tables 4, 5, and 6, respectively.
For the 70:30 data split, training time was also analyzed and concluded that
training time is less than 1 s for all the cases.
Table 7 showcases the comparison of our experimental results with the results of
work done by other authors based on accuracy.
From the table above, we can see that the proposed methodology based on the ELM classifier achieves an accuracy of 78.36%. This is the highest accuracy achieved compared to other published studies in the same field. Therefore, we can conclude that liver disease detection models designed with ELM as a classifier are well suited for prediction and can be used in the healthcare field. The current work can be further extended by applying the proposed model to other datasets as well.
6 Conclusion
As we know, early detection of liver disease can help a person live longer. Manual analysis of liver disease is a laborious task, so medical departments can use machine learning models to predict liver disease. In this paper, we used an Extreme Learning Machine (ELM) classifier to build a liver disease prediction model. It is a fast learning algorithm compared to other neural networks because it computes its output weights analytically instead of using backpropagation. ELM is generally preferred over other methods for AI-related challenges due to its high speed, good generalization, and ease of implementation. Our work comprises training the ELM classifier using various activation functions and analyzing the performance of the model with different numbers of neurons on the ILPD dataset, first with an 80:20 training:testing data ratio, then with a 70:30 ratio, and lastly comparing with other authors' work. We conclude that the highest accuracy, 78.36%, was obtained by the sigmoid activation function with 32 neurons for a data split of 70:30.
References
1. https://www.healthline.com/health/liver-failure-stages
2. Asrani SK, Devarbhavi H, Eaton J, Kamath PS (2019) Burden of liver diseases in the world. J
Hepatol 70(1):151–171
3. Geetha C, Arunachalam A (2021) Evaluation based approaches for liver disease prediction
using machine learning algorithms. In: International conference on computer communication
and informatics (ICCCI), pp 1–4
4. Thirunavukkarasu K, Singh AS, Irfan M, Chowdhury A (2018) Prediction of liver disease using
classification algorithms. In: 4th International conference on computing communication and
automation (ICCCA), pp 1–3
5. Nahar N, Ara F (2018) Liver disease prediction by using different decision tree techniques. Int
J Data Min Knowl Manage Process:1–9
6. Kumar S, Katyal S (2018) Effective analysis and diagnosis of liver disorder by data mining.
In: International conference on inventive research in computing applications (ICIRCA), pp
1047–1051
7. Hashem S et al (2018) Comparison of machine learning approaches for prediction of advanced
liver fibrosis in chronic Hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinf 15(3):861–
868
8. Sontakke S, Lohokare J, Dani R (2017) Diagnosis of liver diseases using machine learning. In:
International conference on emerging trends & ınnovation in ICT (ICEI), pp 129–133
9. Alfisahrin SNN, Mantoro T (2013) Data mining techniques for optimization of liver disease
classification. In: International conference on advanced computer science applications and
technologies, pp 379–384
10. Gogi VJ, Vijayalakshmi MN (2018) Prognosis of liver disease: using machine learning
algorithms. In: International conference on recent ınnovations in electrical, electronics &
communication engineering (ICRIEECE), pp 875–879
11. Kim J, Shin H, Shin K, Lee M (2009) Robust algorithm for arrhythmia classification in ECG using extreme learning machine. BioMedical Engineering OnLine
12. Yang J, Xie S, Yoon S, Park D, Fang Z, Yang S (2013) Fingerprint matching based on extreme learning machine. Neural Comput Appl:435–445
13. Mishra A, Agarwal C, Chetty G (2018) Lifting wavelet transform based fast watermarking of
video summaries using extreme learning machine. In: 2018 International joint conference on
neural networks (IJCNN), Rio de Janeiro, Brazil, pp 1–7. https://doi.org/10.1109/IJCNN.2018.
8489305
14. Agarwal C, Itondia P, Mishra A (2023) A novel DCNN-ELM hybrid framework for face mask detection. Intell Syst Appl 17:200175. https://doi.org/10.1016/j.iswa.2022.200175
15. Zhang R, Lan Y, Huang G-B, Xu Z-B (2012) Universal approximation of extreme learning
machine with adaptive growth of hidden nodes. IEEE Trans Neural Netw Learn Syst 23(2):365–
371
16. Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
17. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng in
Med and Biol 20(3):45–50
18. Singh G, Agarwal C (2023) Prediction and analysis of liver disease using extreme learning machine. In: Shakya S, Du KL, Ntalianis K (eds) Sentiment analysis and deep learning. Advances in intelligent systems and computing, vol 1432. Springer, Singapore. https://doi.org/10.1007/978-981-19-5443-6_52
19. Singh G, Agarwal C, Gupta S (2022) Detection of liver disease using machine learning
techniques: a systematic survey. https://doi.org/10.1007/978-3-031-07012-9_4
20. Grandini M, Bagli E, Visani G (2020) Metrics for multi-class classification: an overview. ArXiv
abs/2008.05756
Intercompatibility of IoT Devices Using
Matter: Next-Generation IoT
Connectivity Protocol
Sharat Singh
Abstract The market for IoT devices is massive, with hundreds of companies devel-
oping a variety of IoT devices; however, due to different methods and software tech-
nologies used to develop these products, all of these devices do not necessarily work
together in a seamless manner. The Connectivity Standard Alliance (CSA) came up
with the concept and created “Matter,” an open standard for all Internet of Things
(IoT) devices. This serves as a universal connectivity standard, making it easier to
use and manage IoT devices. This paper examines the implementation, necessity,
and impact of this new protocol as the next generation of IoT connectivity protocol.
1 Introduction
S. Singh (B)
Department of Electronics, Deen Dayal Upadhyaya College, University of Delhi, New
Delhi 110078, India
e-mail: therealsharat@ieee.org
At present, the following approaches can make different, natively incompatible devices work together.
• Home automation hubs: One popular approach is to use a central hub that connects
to all of your smart devices and allows you to control them from a single app or
interface [3]. Examples of popular home automation hubs include Amazon Echo,
Google Home, and Apple HomeKit.
• APIs: Many smart devices come with APIs that allow developers to interact with
them programmatically. This means that one can use code to connect different
devices together and create custom automation. For example, you could use an
API to connect a smart thermostat to a smart lighting system, so that when the
temperature drops below a certain level, the lights automatically turn on.
• If–This–Then–That (IFTTT): IFTTT is a web-based service that allows us to
create custom “applets” that connect different devices and services together [4].
For example, you could create an applet that automatically turns off your smart
lights when you leave your home, as determined by your phone’s location.
• Zigbee and Z-Wave: Zigbee and Z-Wave are wireless communication protocols
specifically designed for home automation [5, 6]. These protocols allow devices to
communicate with one another, making it possible to create complex automation
and control all devices from a central hub.
The standard for managing and controlling smart home IoT devices is the smart home ecosystem, such as Apple HomeKit, Google Home, Amazon Alexa, and Samsung SmartThings. These ecosystems connect, consolidate, group, and manage smart devices with great ease thanks to agreements with manufacturers on a common protocol and the development guides provided by each ecosystem.
Each ecosystem needs its own application, and every smart device needs a device application of its own, built against a particular ecosystem for connectivity and management. The same physical device may therefore require different applications depending on the ecosystem, and a device from one ecosystem cannot be detected by or connected to another ecosystem.
Fig. 1 Setup of different smart home ecosystems, each with its own set of compatible smart devices
2.3 Security
Security is a critical concern in IoT because these devices often handle sensitive data and may be used to control critical systems.
There are a number of important factors to take into account when securing IoT systems. Because there is no local, secure network, using the cloud to store data from IoT devices and to perform smart analysis and value-added services is unavoidable, which raises data security concerns: IoT devices frequently gather and transmit private data, such as identifying information or system control information, and the transmission and storage of this data must be protected.
3 What is Matter
The Matter open standard (formerly known as Project CHIP) is an open, royalty-free networking protocol designed for low-power, low-bandwidth devices in the Internet of Things (IoT) [7]. It is based on the Internet Protocol (IPv6) and is designed to be simple, secure, and scalable, enabling devices to connect and communicate with each other and with the cloud.
By using a mesh networking architecture, the Matter open standard eliminates the need for a central hub or server and enables direct device-to-device communication. This provides greater reliability, because devices can still communicate even if some of them are offline or out of range.
The protocol is intended to be low power and low bandwidth, making it suitable
for use in battery-powered devices and devices with constrained resources.
In addition to its technical capabilities, the Matter Open Standard is intended to
be open and interoperable, enabling seamless communication between devices made
by various manufacturers. Additionally, it is supported by a sizable and expanding
ecosystem of businesses and developers, ensuring that it will proceed to develop and
advance over time.
3.1 Architecture
Matter is not a lower-layer protocol like WiFi, ZigBee, or Thread; it is an application layer that acts as a standard. The application layer was developed in accordance with Matter guidelines by the Connectivity Standard Alliance (formerly known as the Zigbee Alliance), building on previously successful ecosystem technologies such as Google Home and Apple HomeKit (Fig. 2).
It is constructed using the IPv6 architecture and currently only supports WiFi,
Thread, and Ethernet, with Thread aimed at low-power, resource-constrained IoT
devices like sensors, locks, etc. [9], whereas WiFi is best for high-bandwidth, active
powered smart appliances like cameras, smart hubs, etc.
All devices must onboard this application layer to be Matter-certified.
3.2 Security
Matter is an open standard for the Internet of Things (IoT) that aims to make it easier to connect and control smart devices. It is designed to be secure, scalable, and, most importantly, to communicate locally. This means that Matter-certified smart devices do not need to upload or share any device data to the cloud, except for add-ons or features provided by the manufacturer that require the cloud. This functionality is similar to that of Apple HomeKit.
One of the key security features of Matter is the use of secure communication
protocols. The standard defines a set of mandatory and optional security protocols
that devices must implement in order to be Matter-compliant. These protocols include
transport layer security (TLS) for encryption, as well as secure key exchange and
device authentication mechanisms.
Another important aspect of Matter’s security is the concept of a “security
domain,” which is a group of devices that share the same security credentials and
are trusted to communicate with one another. This allows for secure communica-
tion between devices within a security domain while preventing unauthorized access
from outside the domain.
Matter also includes a mechanism for device provisioning, which is the process of
securely onboarding new devices to a network. This includes securely provisioning
the device’s initial credentials using Bluetooth Low Energy (BLE) [10], as well as
any subsequent updates to those credentials.
Another feature of the standard is its support for automatic software and firmware updates, which can help prevent device vulnerabilities from being exploited.
Overall, Matter aims to provide a robust and secure foundation for IoT devices
to communicate and interact with one another. However, it is important to note that
security is a continuous process and implementing standard alone is not enough. A
good security practice includes regular software updates, monitoring of the device,
and an incident response plan.
• Commissioning Devices: Commissioning Devices are devices that help add new
devices to the network and manage network security. They are typically used to
securely provision new End Devices with network keys and also to manage the
security of the network.
• Sleepy End Devices: Sleepy End Devices are low-power End Devices that spend
most of their time in a low-power sleep state and wake up periodically to commu-
nicate with the network. They are designed to be highly power-efficient and are
often used in battery-powered applications where long battery life is critical.
• Intermediate Devices: Intermediate Devices are devices that support the Thread
protocol but do not fully implement all its features. They are typically used to
provide additional functionality or to act as bridges between different types of
networks.
• Network Co-processor (NCP): An NCP is a device that provides an IP interface
to a Thread network, enabling other devices to communicate with the network
without implementing the Thread protocol themselves. NCPs are typically used
in more complex systems where the main processor does not have the resources to
handle the Thread protocol directly. They provide a convenient way to add Thread
capability to existing devices and can also help to reduce the power consumption
of the main processor by offloading some of the network processing.
Figure 3 [14] shows the working of a Thread-based mesh network using the most prominent types of devices found in a common smart home environment.
Thread also supports over-the-air updates, which enable devices to update their firmware or software automatically and thereby improve the security and stability of the network.
When a device joins a Thread network, it first goes through a process called
“commissioning” to establish secure communication with the Thread Routers. Once
a device has been commissioned, it can participate in the network and communicate
with other devices on the network.
To sum up, Matter and Thread work together to provide a secure and reliable
networking solution for IoT devices. The Matter standard provides a set of rules for
how devices can interact with one another, while Thread provides the underlying
communication infrastructure to make those interactions possible.
5 Outcome
Matter provides a solution for IoT devices to increase the compatibility of smart
appliances belonging to multiple ecosystems and also simplifies development for
manufacturers. This also means that a company has to develop and focus on one
common “standard.” Additionally, the market for Matter-certified IoT devices will
grow, which will reduce user confusion because all devices will be interoperable and
able to be controlled by a single common application regardless of the supported
ecosystem.
As can be seen from Fig. 4, all smart devices are compatible with any smart home
ecosystem, thus making all devices work seamlessly together over a suite of software
applications.
The Matter IoT Protocol, formerly known as Project CHIP (Connected Home over
IP), has the potential to bring a new level of interoperability and security to the
Internet of Things (IoT) industry.
The launch of Matter is considered to be a huge step forward for IoT, especially in
the consumer market. Cross-platform smart device integration and interoperability
will be possible, making the selection of smart devices easier and more convenient
for IoT customers all over the world, and will increase the range of features that can
be integrated on a smart device.
Matter will also enable the creation of smarter and more efficient ecosystems over a much larger geographic region, such as a smart security manager for residential societies, with smart smoke, gas, and motion sensors spread across the area using Thread, and actively powered devices like cameras and security gates on WiFi/Ethernet, all integrated for residents.
Fig. 4 Setup of different smart home ecosystems with Matter-certified smart devices
Here are a few possible future implementations of the Matter IoT protocol:
• Smart Homes: The Matter IoT protocol could become the backbone for smart
homes, enabling seamless integration of different smart devices from different
manufacturers. This would allow homeowners to easily control and automate
their homes, regardless of the brand of their devices.
• Industrial IoT: The use of Matter in industrial settings could enable the integration of industrial equipment and machinery from various manufacturers. As a result, industrial systems would operate more effectively and safely overall.
• Health care: Matter could be used in health care, providing a secure and reliable
way for different medical devices and sensors to communicate with each other.
This would improve patient care and make it easier for healthcare professionals
to access and analyze patient data.
• Automotive: The automotive sector could use the Matter IoT protocol to integrate various in-car devices from different manufacturers. The overall driving experience would be enhanced, and drivers' access to and control over vehicle data would become simpler and remotely accessible as well.
• Agricultural IoT: The use of Matter in agriculture could enable the integration of various sensors and devices for crop management and livestock monitoring. This would increase agriculture's productivity and sustainability while also assisting farmers in making better decisions with their devices and smart system environments.
These are just a few examples of the potential future implementations of the
Matter IoT protocol. With its focus on security, interoperability, and open standards,
the Matter IoT protocol has the potential to revolutionize the IoT industry and bring
new levels of efficiency and convenience to people’s lives.
Acknowledgements The author would like to thank Anshuman Singh (Roll Number: 20HEL2111)
from the Department of Electronics, Deen Dayal Upadhyaya College, University of Delhi, for his
contribution to designing the figures presented in this paper.
References
1. Internet of things (IoT) market growth, trends, COVID-19 impact, and forecasts (2022–2027). Available: https://www.mordorintelligence.com/industry-reports/internet-of-things-iot-market. Accessed 15 Dec 2022
2. Ashton K. That 'internet of things' thing. Available: http://www.rfidjournal.com/articles/view?4986. Accessed 15 Dec 2022
3. Setz B, Graef S, Ivanova D, Tiessen A, Aiello M. A comparison of open-source home automation systems. https://doi.org/10.1109/ACCESS.2021.3136025
4. Ovadia S. Automate the internet with “if this then that” (IFTTT). https://doi.org/10.1080/016
39269.2014.964593
5. Ergen SC (2004) ZigBee/IEEE 802.15. Available: https://pages.cs.wisc.edu/~suman/courses/
707/papers/zigbee.pdf. Accessed 10 Jan 2023
6. Unwala I, Taqvi Z, Lu J (2018, April) IoT security: ZWave and thread. In: 2018 IEEE green
technologies conference (GreenTech). IEEE, pp 176–182. https://doi.org/10.1109/GreenTech.
2018.00040
7. Thread applications. Available https://www.threadgroup.org/BUILT-FOR-IOT/Smart-Home#
Application. Accessed 10 Jan 2023
8. Matter security and privacy fundamentals connectivity standards alliance documen-
tation. Available: https://csa-iot.org/wp-content/uploads/2022/03/Matter_Security_and_Pri
vacy_WP_March-2022.pdf. Accessed 10 Jan 2023
9. What is thread? Available https://www.threadgroup.org/What-is-Thread/Thread-Benefits.
Accessed 10 Jan 2023
10. How thread can work seamlessly with Bluetooth for commissioning and operation. Available:
https://www.threadgroup.org/news-events/blog/ID/196. Accessed 10 Jan 2023
11. Unwala I, Taqvi Z (2018) Thread: an IoT protocol. In: IEEE green technologies conference.
https://doi.org/10.1109/GreenTech.2018.00037
12. Kim HS, Kumar S, Culler DE (2019) Thread/OpenThread: a compromise in low-power wireless multihop network architecture for the Internet of Things. IEEE Commun Mag. https://doi.org/10.1109/MCOM.2019.1800788
13. Gregersen C. An expert guide to the thread and matter protocols in IoT. Available: https://
nabto.com/matter-thread-protocols-iot/. Accessed 10 Jan 2023
14. Thread in homes. Available https://www.threadgroup.org/BUILT-FOR-IOT/Smart-Home.
Accessed on 4 Feb 2023
Role of Node Centrality for Information
Dissemination in Delhi Metro Network
Kirti Jain, Harsh Bamotra, Sakshi Garg, Sharanjit Kaur, and Gunjan Rani
Abstract The topological structure of a network and the role of node centrality need to be investigated and evaluated for information dissemination related to transport services and their functionalities, for promoting government policies and products, etc., with vast reachability. Additionally, it is important to study network reliability in extreme situations like halting, power failure, or overcrowding at a node. This paper investigates the role of three standard measures of node centrality for information dissemination in the Delhi Metro Network. We simulate the process of information diffusion using the Susceptible-Infected-Removed (SIR) spread model through seed nodes identified as vital metro stations. Not only are the identified central stations ideal for advertising products and disseminating vital information, but they can also lead to chaotic situations that disrupt the metro's functionality.
1 Introduction
With the rapid development of the urban metro network, especially in Delhi, the
spatial structure of the metro network has evolved gradually from a simple line
crossing and grid to complex patterns [1]. To accommodate the increasing population
and congestion, metro infrastructure requires maintenance and extension [2]. The
metro network is not only a solution to traffic congestion but also a potent medium
for information dissemination to popularize vital government policies, products,
K. Jain
Department of Computer Science, University of Delhi, New Delhi, Delhi, India
e-mail: kjain1@cs.du.ac.in
H. Bamotra · S. Garg · S. Kaur · G. Rani (B)
Acharya Narendra Dev College, University of Delhi, New Delhi, Delhi, India
e-mail: gunjanrani@andc.du.ac.in
schemes, etc., among a vast fraction of the population through commuters [3]. Metro
systems, even though useful for spreading information, are also prone to power
failures, natural disasters, accidents, and malicious attacks, which entail appropriate
measures to guarantee their safety and reliability.
Complex network theory has recently gained popularity in ecology, finance, social
networks, social sciences, transport systems, etc., for its capabilities to model com-
plex relationships [4]. We make use of network theory to model the complex network
of the Delhi metro, in which a node represents a metro station, and an edge between
two nodes marks a direct route between two stations. Commuters make use of the
metro to reach different destinations and meet people along the way. They propagate
the perceived information along the way and fuel the information depending upon the
topology of the network and the positioning of the nodes (stations) within a network
[1].
Information prevalence depends on the source nodes, which need to be care-
fully selected. Several studies have been conducted to identify the central nodes that
maximize the spread of information [5–7]. Quantifying node importance through
centrality provides a means to rank nodes based on their significance and influence
on others [8]. There exist well-known measures such as degree centrality, between-
ness centrality, closeness centrality, and eigenvector centrality that identify central
nodes considering the topological structure of the network [9, 10].
Existing studies on the metro network focus on its topological characteristics and
its evolution [4, 11]. Recently, Kanwar et al. carried out a complex network-based
comparative analysis between the operational Delhi Metro Network (DMop) and the
extended Delhi Metro Network (DMext). They found similar degree distributions
for both networks [12]. Although an increase in local connectivity in DMext seems
efficient for tackling congestion and managing higher transport loads, it comes at
the cost of increased vulnerability. Another case study on the metro transit system
of Shenzhen City explored the effectiveness of a node using an entropy-based multi-
measure metric [13].
Motivated by this, we investigate information diffusion in the Delhi Metro Network
using different centrality measures to understand the pace of information spread and
its disruption under the failure of central nodes.
We construct a simple, undirected, and unweighted network for the Delhi metro
system (referred to as DMN) to investigate the following:
(i) Determine the network model for the constructed DMN based on its topological
properties (Sect. 3.1).
(ii) Identify and investigate important metro stations pivotal for information dis-
semination using three centrality measures (Sects. 4.1 and 4.2).
(iii) Report effective centrality measure for the transport network (Sect. 4.3).
3 Methodology
The Delhi metro rail consists of 10 color-coded lines serving 230 stations in its current operational state. It is by far the largest and busiest metro rail system in
India and the second oldest after the Kolkata Metro. The station and track lines are
the basic elements of the metro network, and the stations are connected via tracks,
resulting in a complex network.
The Delhi Metro Network (DMN) is modeled as an undirected and unweighted
network and is represented as G = (V, E), where V is a set of nodes (metro stations),
and E is a set of edges (metro tracks) connecting two successive metro stations.
DMN is a sparse network with 244 links between 230 stations; its visualization, plotted using the PyViz tool, is shown in Fig. 1. Note that the exact placement and positioning of the stations are not taken into account. Basic topological features of DMN are given in Table 1, along with its degree distribution plot in Fig. 2.
Fig. 1 Graph representing the Delhi Metro Network (DMN) using PyViz
The average clustering coefficient of DMN is 0.0, which affirms that neighboring stations are not interconnected. A high average shortest path length between nodes, a small clustering coefficient, and a straight line on the log-log plot of the truncated degree distribution make the constructed network a candidate for a scale-free network rather than a random or small-world network.
4 Results
In this section, we describe the experiment settings and observations covering top-
ranking stations (nodes), the analysis of information diffusion through prominent
nodes, and finally, changes in the process of information diffusion after the removal
of significant nodes.
We create the network using the NetworkX library and implement the SIR model using the NdLib package in Python (64-bit, version 3.7.2). Programs are executed on an Intel(R) Core(TM) i7 CPU @ 1.80 GHz with 16 GB RAM. In the rest of the paper, unless otherwise specified, transmission probability β = 0.5 and recovery rate γ = 0.1 are used for all diffusion simulations (Sect. 3.2). The reported results are averaged over 20 simulations.
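A single SIR run under these settings can be sketched with NdLib as follows; this is an illustrative sketch consistent with the stated parameters, not the authors' actual script.

import ndlib.models.epidemics as ep
import ndlib.models.ModelConfig as mc

def sir_reach(G, seed_station, beta=0.5, gamma=0.1, steps=50):
    """Run one SIR simulation seeded at a single station and return
    the number of stations ever informed (infected + removed)."""
    model = ep.SIRModel(G)
    cfg = mc.Configuration()
    cfg.add_model_parameter("beta", beta)    # transmission probability
    cfg.add_model_parameter("gamma", gamma)  # recovery (forgetting) rate
    cfg.add_model_initial_configuration("Infected", [seed_station])
    model.set_initial_status(cfg)
    final = model.iteration_bunch(steps)[-1]["node_count"]  # {0: S, 1: I, 2: R}
    return final[1] + final[2]

# Average over 20 runs, as done for the reported results:
# reach = sum(sir_reach(G, "Rajiv Chowk") for _ in range(20)) / 20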
Since metro stations are interdependent, information originating from one station reaches the whole system due to a cascading effect. Identifying a source node using centrality is based on the premise that a central station transmits information faster to other stations because of its position in the network. In this experiment, we systematically assess seed (source) node selection with respect to its impact on the speed and reach of information diffusion.
Fig. 3 Number of stations informed when information disseminates from the most central station with varying diffusion parameters
Figure 3 shows the number of stations informed when the diffusion process begins at the most central station (Rank 1) as the seed node based on BC, CC, and EVC (Table 2). Note that Rajiv Chowk is the most central station identified by CC and EVC, whereas Kashmere Gate is the top-ranked node by BC. We also vary the diffusion parameters, β ∈ {0.5, 0.8} and γ ∈ {0.1, 0.3}, to understand the variation in information spread originating from the same seed node.
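The three rankings reported in Table 2 can be computed directly with NetworkX; a sketch, assuming G is the DMN graph built earlier:

import networkx as nx

bc = nx.betweenness_centrality(G)
cc = nx.closeness_centrality(G)
evc = nx.eigenvector_centrality(G, max_iter=1000)  # iterate until convergence

for name, scores in [("BC", bc), ("CC", cc), ("EVC", evc)]:
    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    print(name, "top-5 stations:", top5)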
It is evident from Fig. 3 that information dissemination starting at Rajiv Chowk leads to the highest final reach, with the maximum number of informed stations. In contrast, the spread from Kashmere Gate reaches relatively fewer stations. This observation also holds across the varying diffusion parameters (β and γ), with distinct numbers of informed stations. A high value of β results in higher information dissipation among stations, whereas a high value of γ eliminates the received information from a station, resulting in fewer informed stations. The inset table in Fig. 3 shows the total number of stations informed for the two central stations, Kashmere Gate and Rajiv Chowk, for varying diffusion parameters.
We also diffused information from the fifth-ranked node and compared the maximum number of informed stations (M) with that of the first-ranked station to affirm the importance of the latter (Fig. 4). The higher spread from the top-ranked nodes compared to the lower-ranked stations, viz. Rajouri Garden (BC), Mandi House (CC), and Patel Chowk (EVC), endorses this claim.
It is revealed that diffusion from stations with a high CC and EVC score signifi-
cantly contributes to the spread of information. Moreover, the speed of diffusion and
recovery from information are two opposing forces that are decisive for information spread. High information dispersion among stations with low eradication accelerates the spread. Hence, identifying the top central node is crucial for maintaining the overall prevalence of the information.
Fig. 4 Most informed stations (M) at a time for ranked first (R1) versus fifth (R5) stations
Since the centrality of the seed node is a key factor in how information spreads through a network, we examine the relationship between the centrality score and the number of informed stations.
Information is diffused from each station with a centrality score (CS), and the
maximum (M) and total number (T) of stations informed are noted. The top row
in Fig. 5 shows a relationship between M and CS. The bottom row displays the
dependency between T and CS, with each subfigure corresponding to a centrality
metric. It is clear from the scatter plots that both M and T escalate with an increase
in centrality score. We also plotted a regression line to analyze the relationship
and observed that the CC score has a strong linear relationship with both M and T
(R² = 0.83 and R² = 0.77, respectively).
The stated observations are in tandem with earlier results showing that central nodes promote the spread of information and that closeness centrality is an effective measure for information propagation.
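The R² values can be obtained with an ordinary least-squares fit; a minimal sketch, where cs and informed are illustrative arrays holding each seed station's centrality score and its M (or T) count:

import numpy as np
from scipy.stats import linregress

cs = np.array([0.10, 0.15, 0.22, 0.31])   # illustrative centrality scores
informed = np.array([40, 55, 70, 95])      # illustrative informed-station counts

fit = linregress(cs, informed)             # ordinary least-squares regression
print("slope:", fit.slope, "R^2:", fit.rvalue ** 2)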
Fig. 5 Relationship of centrality scores (BC, CC, and EVC) with the maximum stations informed (M) and the total number of stations informed (T)
If a station becomes dysfunctional, its load cascades through the network; hence, it is important to study the effect of station failure in this scenario. We removed the most prominent station for each of BC, CC, and EVC sequentially from the network and recomputed the topological characteristics of the updated network and the top-ranked nodes by the three centrality measures (Table 3). Diffusion is then simulated from the newly identified central node in the DMN, and comparative results for M are shown for the three centrality metrics in Fig. 6.
Results show that the number of informed stations (M) is reduced in the absence of the top-ranked station for all centrality measures, asserting its importance in maintaining maximum functionality within the network.
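A sketch of this removal experiment, continuing the NetworkX snippets above:

H = G.copy()
H.remove_node("Rajiv Chowk")    # drop the top-ranked station (per CC/EVC)

# Recompute the rankings on the updated network
bc_new = nx.betweenness_centrality(H)
new_seed = max(bc_new, key=bc_new.get)
# ...the SIR simulation is then re-run from new_seed and M compared before/after.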
Table 3 Properties of the updated DMN after removing the top-ranked station in the original DMN
Fig. 6 Maximum number of stations informed (M) before and after the removal of first ranked
stations
5 Conclusion
This paper studies the role of node centrality in information dissemination through
the Delhi Metro Network. The network is sparse, with an average degree of two, and has a near-zero clustering coefficient, reflecting no connectivity among the neighbors of a node. Three well-known centrality measures, namely between-
ness centrality, closeness centrality, and eigenvector centrality, are used to deduce
prominent stations. Rajiv Chowk and Kashmere Gate are identified as prominent
interchange stations and are crucial for the information dissemination required for
the proper functioning of the network.
Acknowledgements This work was supported by the DBT Star College Scheme at Acharya Naren-
dra Dev College, DU.
References
1. Kandhway K, Kuri J (2016) Using node centrality and optimal control to maximize information
diffusion in social networks. IEEE Trans Syst, Man, Cybern: Syst 47(7):1099–1110
2. Frutos Bernal E, Martín del Rey A, Galindo Villardón P (2020) Analysis of Madrid metro
network: from structural to hj-biplot perspective. Appl Sci 10(16):5689
3. Yadav S, Rawal G (2016) The novel concept of creating awareness about tuberculosis at the
metro stations. The Pan Afr Med J 23
4. Chen S, Zhuang D, Zhang H (2018) Urban metro network topology evolution and evaluation
modelling based on complex network theory: a case study of Guangzhou, China. MATEC web
Conf 232:01034
R. M. Patil (B)
Electrical Engineering Department, SKNS College of Engineering Korti, Affiliated to Solapur
University, Pandharpur, Maharashtra, India
e-mail: rajesh.m.pati1972@gmail.com
B. G. Nagaraja
Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru,
Karnataka, India
M. R. Prasad
Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka,
India
T. C. Manjunath
Electronics and Communication Engineering Department, Dayananda Sagar College of
Engineering, Bengaluru, India
R. Rayappa
Electronics and Communication Engineering, Jain Institute of Technology, Davanagere,
Karnataka, India
1 Introduction
In recent years, iris recognition systems have achieved impressive recognition rates in controlled environments. The study and implementation of iris recognition technologies have been the focus of various research communities for the past 50 years. However, most earlier research on iris recognition was limited to clear and well-captured iris images, and the system's effectiveness was thought to be highly dependent on image quality. Images with lower quality and resolution, captured from a distance, or containing dynamic motion can significantly reduce the performance of iris recognition systems that are already limited in scope. A non-ideal iris image is one that suffers from issues such as poor acquisition angles, occlusions, pupil dilation, image blurriness, and low contrast. Most research has been confined to restricted environments, but we have focused on addressing this gap by studying the iris recognition problem in unrestricted settings [1–5].
2 Methodology
The generalized DFD for iris detection in an unconstrained environment for the three proposed methodologies is shown in Fig. 1.
Fig. 1 Generalized DFD for the iris detection using unconstrained environment for the three
proposed methodologies
Numerous attempts have been made to create iris biometric recognition systems
for secure authentication. Many researchers have focused on developing recognition
systems in constrained environments, where the camera must be aimed directly at
the subject’s eye, the subject must look directly into the camera, there must be no
parallax, the subject’s eyes must be open for iris capturing, and adequate lighting must
be present [6–10]. However, only a few have developed iris identification systems in
unrestricted settings, which is the focus of the proposed work presented in this article.
While these algorithms are effective in constrained situations, they may not perform
well in unconstrained environments. It is important to note that for unconstrained
situations, certain limitations must be considered for the system to function properly
and accurately. We have previously covered this topic in earlier articles [11–15].
The standard procedure for the enrollment segment starts with the acquisition of an image of the iris from a high-resolution iris camera, followed by the identification of the region of interest (ROI) from the entire image of the subject's face. Care must be taken that the ROI includes only the iris portion, and that this portion alone is used for detection [6–10]. The preprocessed original image is used for analysis and to improve system performance. The preprocessed image is next subjected to a normalization approach to reduce noise and improve effectiveness with respect to the recognized rule sets, carried out along with enhancement of the iris portion. The subject can then be identified once the iris features have been retrieved; a feature vector is saved for comparison against the vectors in the collected iris database. The six contributed works that employ the ideas of image preprocessing, edge detection, segmentation, normalization, feature extraction, and classification for the identification of a human iris are given in Fig. 1 [11–15].
Currently, a complete iris scan of a human eye is performed using an iris detection system (IDS) consisting of multiple block sets. Each block has a specific function and is used in our research [5–10]. The procedural aspects of the digital image processing (DIP) are shown in Fig. 2, while Fig. 3 depicts the functional data flow diagram (DFD) of the approach presented in this paper.
In this section, we will discuss the contributions made to the iris recognition process
using the LabVIEW tool, which is a product of National Instruments NI®. LabVIEW
is a programming language and development environment that offers an interactive
environment for designing and solving problems in various application-dependent
tasks. It includes a workspace, a command window, a primary program editor’s
window, and a location for program files, as shown in Fig. 2. Additionally, LabVIEW
Fig. 4 DFD for the iris detection system for election purpose
Fig. 5 Developed IRS block diagram with the help of artificial neural net and LabVIEW software
offers built-in mathematical functional modules that are essential for tackling scien-
tific and engineering challenges. Figure 4 provides a diagrammatic representation
of the data flow diagram (DFD) of the iris recognition system that can be used for
electioneering, while Fig. 5 shows the developed block diagram of the iris detection
system using artificial neural networks and the LabVIEW software.
7 Proposed Contributions
This section presents the contributions of the research as three distinct entities denoted
by C1–C3, which were developed in the LabVIEW environment. The proposed block
diagrams were converted into LabVIEW scripts (.vi files) and executed, resulting in
a binary response of either yes or no indicating the recognition of the iris. The main contribution is an interactive GUI-based system that is user-friendly and automates the biometric recognition process. The system is designed to operate in unconstrained environments,
including poor lighting conditions, images taken from angles, and long distances.
The iris recognition system has been developed using the NI LabVIEW and
NI Vision software platforms. NI LabVIEW is a popular graphical programming
language for scientific and engineering tasks. The Vision Development Module
provides a library of LabVIEW VIs called NI Vision for LabVIEW, which can
be used to develop applications for scientific imaging and machine vision. This
contribution includes a GUI that demonstrates how various processes involved in iris
recognition systems, such as feature extraction and preprocessing, are implemented
using LabVIEW. Additionally, an example of an iris detection module created with
LabVIEW for electronic voting is presented.
This section describes the development of an automated GUI and the hardware
implementation of iris recognition using an ATMEL microcontroller connected to
LabVIEW. The main focus of this section is the step-by-step implementation of
the algorithm and its integration with a well-developed GUI. One methodology is
suggested in this chapter, along with the creation of a LabVIEW GUI. Additionally,
this section includes a real-time implementation of an iris identification module
used for voting procedures and one of its applications. The chapter presents the
essential observations and justifications in the form of discussions, along with the
varied outcomes achieved for all of the test images. The algorithm for the hardware
implementation is shown in Figs. 12 and 13, respectively.
8 Conclusions
References
1. Kaur N, Juneja M (2014) A novel approach for iris recognition in unconstrained environment. Journal of Emerging Technologies in Web Intelligence 6(2):243–246
2. Tsai Y-H (2014) A weighted approach to unconstrained iris recognition. World Academy of Science, Engineering and Technology International Journal of Computer and Information Engineering, vol 8, no 1, pp 30–33. ISSN:1307-6892
3. Roy K, Bhattacharya P, Suen CY (2010) Unideal iris segmentation using region-based active contour model. In: Campilho A, Kamel M (eds) ICIAR 2010, Part II, LNCS 6112. Springer, Berlin, pp 256–265
4. Raffei AFM, Asmuni H, Hassan R, Othman RM (2013) Feature extraction for different
distances of visible reflection iris using multiscale sparse representation of local Radon
transforms. Pattern Recogn 46:2622–2633
5. Jan F (2017) Segmentation and localization schemes for non-ideal iris biometric systems.
Signal Process 133:192–212
6. Shin KY, Nama GP, Jeong DS, Cho DH, Kang BJ, Park KR, Kim J (2012) New iris recognition
method for noisy iris images. Pattern Recogn Lett 33:991–999
7. Nagaraja BG, Jayanna HS (2013) Multilingual speaker identification by combining evidence
from LPR and multitaper MFCC. J Intell Syst 22(3):241–251
8. Haindl M, Krupička M (2015) Unsupervised detection of non-iris occlusions. Pattern Recogn
Lett 57:60–65
9. Karakaya M (2016) A study of how gaze angle affects the performance of iris recognition.
Pattern Recogn Lett 82:132–143
10. Barpanda SS, Majhi B, Sa PK (2015) Region based feature extraction from non-cooperative
iris images using triplet half-band filter bank. Opt Laser Technol 72:6–14
11. Proença H, Neves JC (2016) Visible-wavelength iris/periocular imaging and recognition surveillance environments. Image Vis Comput 55:22–25
12. Hu Y, Sirlantzis K, Howells G (2017) A novel iris weight map method for less constrained iris
recognition based on bit stability and discriminability. Image Vis Comput 58:168–180
13. Liu J, Sun Z, Tan T (2014) Distance metric learning for recognizing low-resolution iris images.
Neurocomputing 144:484–492
14. Alvarez-Betancourt Y, Garcia-Silvente M (2016) A key points—based feature extraction
method for iris recognition under variable image quality conditions. Knowl-Based Syst
92:169–182
15. Hajaria K, Gawandeb U, Golharc Y (2015) Neural network approach to iris recognition in noisy environment. In: International conference on information security & privacy (ICISP2015), Procedia Computer Science, vol 78 (2016), 11–12 Dec 2015, Nagpur, India, pp 675–682
A Unique Method of Detection of Edges and Circles of Multiple Objects in Imaging Scenarios Using Line Descriptor Concepts
Abstract This paper proposes a novel method for detecting the edges and circles
of multiple objects in imaging scenarios using line descriptor concepts. The method
involves analyzing the intensity gradient and the orientation of the pixels in the image,
and using this information to identify lines and circles that are likely to correspond
to object boundaries. The proposed approach is compared with existing methods
and is shown to provide superior performance in terms of accuracy and computa-
tional efficiency. The method is particularly useful for applications such as object
recognition and tracking, where accurate detection of object boundaries is essential.
The experimental results demonstrate the effectiveness of the proposed method in
detecting edges and circles of multiple objects in different imaging scenarios.
R. M. Patil (B)
Electrical Engineering Department, SKNS College of Engineering Korti, Affiliated to Solapur
University, Pandharpur, Maharashtra, India
e-mail: rajesh.m.pati1972@gmail.com
B. G. Nagaraja
Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru,
Karnataka, India
M. R. Prasad
Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka,
India
T. C. Manjunath
Electronics and Communication Engineering Department, Dayananda Sagar College of
Engineering, Bengaluru, India
R. Rayappa
Electronics and Communication Engineering, Jain Institute of Technology, Davanagere,
Karnataka, India
1 Introduction
In computer vision, the detection of object boundaries is an essential task for various
applications, such as object recognition, tracking, and segmentation. Detecting edges
and circles of multiple objects in imaging scenarios is a challenging problem due to
variations in object shapes, sizes, orientations, and lighting conditions. Many existing
edge and circle detection methods suffer from limitations such as sensitivity to noise,
computation time, and detection of false positives.
The detection of edges and circles in images is a fundamental task in computer
vision, with numerous applications in areas such as object recognition, tracking, and
segmentation. In this literature survey, we review some of the existing methods for
detecting edges and circles in images. The Canny edge detector is one of the most
widely used methods for edge detection. It involves convolving the image with a
Gaussian filter to reduce noise, computing the gradient magnitude and orientation,
and applying non-maximum suppression and hysteresis thresholding to obtain the
final edge map. While the Canny detector can produce high-quality edge maps, it is
computationally expensive and sensitive to the choice of parameters.
The Hough transform is a popular method for detecting circles in images. It
involves converting the image to a parameter space, where circles are represented as
curves, and then detecting peaks in the parameter space to identify the circles. While
the Hough transform can produce accurate results, it is computationally expensive and
sensitive to the choice of parameters. Edge detection is used for data segmentation and feature extraction in fields including image processing, computer vision, and machine vision. The term "edge detection" refers to a group of mathematical methods for identifying regions in digital images where there are discontinuities or, more precisely, where the brightness of the image changes suddenly [1].
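To make the two classical baselines concrete, a minimal OpenCV sketch in Python; the file name and parameter values are illustrative assumptions:

import cv2

img = cv2.imread("objects.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
blurred = cv2.GaussianBlur(img, (5, 5), 1.5)            # suppress noise first

# Canny: gradient + non-maximum suppression + hysteresis thresholding
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Hough transform: vote in (center, radius) parameter space for circles
circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=150, param2=30, minRadius=5, maxRadius=100)
if circles is not None:
    print("detected", circles.shape[1], "circle(s)")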
The study in [2] presents a comprehensive framework for designing and imple-
menting augmented reality (AR) guidance systems in industrial settings. It also
provides a valuable contribution to the field of AR guidance systems by offering
a comprehensive framework that considers various aspects of designing and imple-
menting AR guidance systems in industrial settings. The case studies and evaluation
demonstrate the effectiveness of the proposed framework and provide insights into
the potential benefits of AR guidance systems in improving industrial processes. The
authors in [3] introduce the Retina U-Net, which is a modification of the popular U-
Net architecture. The Retina U-Net combines the segmentation and detection tasks,
making use of the segmentation supervision to detect objects. The proposed method
achieves state-of-the-art performance on several medical object detection datasets,
including lung nodule detection and polyp detection. The Retina U-Net is compu-
tationally efficient and requires less training data compared to other state-of-the-art
methods. The paper also demonstrates the potential of combining segmentation and
detection tasks for medical object detection, which can lead to more accurate and
efficient detection systems.
Three different categories of edge exist:
• Straight outlines (horizontal in nature)
2 Block-Diagrams/Flow-Charts
The typical edges of an object in a gray-scale image, as stored in a PC's memory, are shown in Figs. 2, 3 and 4, respectively, whereas the types of operators that can be utilized for detecting the edges of objects in an image are shown in Fig. 1 [6].
Viewpoint-dependent or viewpoint-independent edges can be retrieved from a two-dimensional picture of a three-dimensional scene. The fundamental qualities of three-dimensional objects, such as surface marks and shape, are often reflected by a viewpoint-independent edge. A viewpoint-dependent edge, which changes as the point of view changes, frequently reflects the scene's geometric features, for example, the occlusion of objects one above the other (hidden objects) [7].
The line separating a red block from a yellow block, for example, is a typical edge. A line, on the other hand, could be a small number of image pixels of a different color on an otherwise constant background (as can be retrieved by a ridge detector). As a result, there may be an edge on either side of a line in the majority of cases. The edges obtained from natural photos are rarely perfect step edges. In practice, they are typically affected by a few important parameters such as [8]:
• Focal-type blurs produced by the finite depth of field and the finite point spread function.
• Penumbral blurs produced by the shadowing effects created by light sources.
One important parameter, the scale value sigma, determines the blur scale of the edge. To avoid destroying the image's actual edges, this scale parameter should ideally be adapted to the quality of the image [10].
In this section, we present the concepts used in circle detection based on digital image processing fundamentals: a three-stage procedure for determining the location, circumference, and radius of circles [11].
Some notions from the Canny edge detection operator are used in the algorithm that implements this procedure. We consider the input image to be a noise-free gray-scale image, i.e., one with no random variations in intensity. The block diagram of the suggested algorithm, a three-stage process for detecting the location of circles, is discussed in the next section [12].
4 Approaches
In the second stage, the thin edges determined in the first stage are processed further: the arcs that satisfy the conditions of being a component of a circle are contour-traced using the direction of each pixel. In this stage, any spurious points or edge pixels that do not meet the criteria for being a component of a circle are discarded. As a result, the arcs that have a higher chance of being part of a circle are kept [14].
5 Introductory Remarks
When objects are not polyhedral, shape analysis is a method of determining the shape of irregular objects using two types of descriptors, viz., line descriptors and area descriptors. Objects amenable to shape analysis include circles, spheres, ellipses, boundaries, curves, arcs, and objects of irregular shape. The first method, the line descriptors method, is explained as follows [15].
6 Line Descriptors
This is the first method of performing shape analysis (SA); it is used to find the length, in pixels, of the boundary or curve of a regular or irregular object and uses an encoding scheme called chain coding. The process of chain coding is illustrated in Table 2 [16].
Chain coding is a technique for finding the length of a closed or open curve in pixels by representing the curve as a sequence of chain codes a = (a1, …, an), each code in {0, …, 7}, with n being the length of the curve in pixels; it is a relative representation. The steps are as follows (a short code sketch follows the list):
• p is a pixel on the boundary of an object.
• Start from the right most pixel marked with a dot, ‘.’.
• Write down the relative position of the next adjacent neighboring pixel forming
the boundary/curve.
• Repeat the process till you reach the starting position (for all the pixels on the
curve).
• The vector thus formed is called the chain code of the curve.
• |C(a)| gives the length of the curve in pixels.
• a = [2, 2, 3, 4, 5, 4, 5, 6, 7, 7, 0, 1, 1]T
• |C(a)| = 13 pixels.
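A small Python sketch of the chain-code bookkeeping described above; the direction-to-step mapping assumes the standard Freeman 8-neighbor convention, and the example vector is the one from the text:

# Freeman chain code: 8 directions around a pixel, 0 = east, counted counter-clockwise.
MOVES = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
         4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}

a = [2, 2, 3, 4, 5, 4, 5, 6, 7, 7, 0, 1, 1]   # chain code from the text
print("|C(a)| =", len(a), "pixels")           # length of the curve in pixels

# A curve is closed if following the codes returns to the start pixel.
x, y = 0, 0
for code in a:
    dx, dy = MOVES[code]
    x, y = x + dx, y + dy
print("closed curve" if (x, y) == (0, 0) else "open curve")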
The developed chain coding process (CCP) is invariant to translation but variant to rotation. Since chain coding is a relative representation, the chain code does not change for a curve of identical shape at a different location, translated by some amount. Under rotation, the length of the curve remains the same, but the chain code of the curve will differ since the starting point changes. If the number of transitions equals the number of pixels, the curve is a closed curve; if the number of transitions is one less than the number of pixels, i.e., nT = np − 1, it is an open curve. Chain coding is used in signature verification in banks and in character recognition. A numerical example is shown in Table 2, whereas the central pixel surrounded by its 8 neighbors is shown in Table 1. The three stages of the developed circle detection approach are given below in the form of a program [17].
7 Program/Algorithm Developed
Program:
%% Read image
Inputimage = imread('a.jpg');
%% Show the image
figure(1)
imshow(Inputimage);
title('I/P THE IMAGE CONSIDERING THE NOISEs')
%% Convert to gray scale
if size(Inputimage,3) == 3   % RGB image
    Inputimage = rgb2gray(Inputimage);
end
%% Convert to binary image (Otsu threshold, then invert)
threshold = graythresh(Inputimage);
Inputimage = ~im2bw(Inputimage, threshold);
%% Remove all objects containing fewer than 30 pixels
Inputimage = bwareaopen(Inputimage, 30);
pause(1);
%% Label connected components
[L, Ne] = bwlabel(Inputimage);
%% Draw a bounding box around each labeled object
propied = regionprops(L, 'BoundingBox');
imshow(~Inputimage);
hold on
for n = 1:size(propied,1)
    rectangle('Position', propied(n).BoundingBox, 'EdgeColor', 'g', 'LineWidth', 2)
end
hold off
pause(1);
%% Objects extraction: crop and display each labeled object
figure
for n = 1:Ne
    [r, c] = find(L == n);
    n1 = Inputimage(min(r):max(r), min(c):max(c));
    imshow(~n1);
    pause(0.5)
end
Output:
The outputs of the program for one, two, three, four, and five fingers of a human hand are shown in Figs. 6, 7, 8, 9 and 10, respectively.
8 Conclusion
In conclusion, this paper proposes a unique method for detecting edges and circles of
multiple objects in imaging scenarios using line descriptor concepts. The proposed
method uses a combination of line detection and descriptor techniques to detect edges
and circles, which is then refined using a clustering algorithm to identify individual
objects. The experimental results demonstrate the effectiveness of the proposed
method in detecting edges and circles of multiple objects in various scenarios,
including natural scenes and industrial environments. The proposed method is shown
to outperform existing state-of-the-art methods in terms of accuracy and efficiency.
Furthermore, the proposed method is computationally efficient and can process large
volumes of images quickly. This makes it suitable for real-time applications, such as
industrial inspection and surveillance.
References
1. Lindeberg T (1998) Edge detection and ridge detection with automatic scale selection. Int J Comput Vis 30:117–154
2. Zubizarreta J, Aguinaga I, Amundarain A (2019) A framework for augmented reality guidance in industry. Int J Adv Manuf Technol 102:4095–4108
3. Jaeger PF, Kohl SA, Bickelhaupt S, Isensee F, Kuder TA, Schlemmer HP, Maier-Hein KH (2020)
Retina U-Net: embarrassingly simple exploitation of segmentation supervision for medical
object detection. In: Machine learning for health workshop. PMLR, pp 171–183
4. Park JM, Lu Y (2008) Edge detection in grayscale, color, and range images. In: Wah BW (ed)
Encyclopedia of computer science and engineering
5. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8:679–714
6. Haralick R (1984) Digital step edges from zero crossing of second directional derivatives. IEEE Trans Pattern Anal Mach Intell 6(1):58–68
Abstract To create a robot that can navigate its environment, a map of that environment is needed. SLAM is the problem of building and updating a map of an unknown environment while simultaneously tracking the robot's location within it. LiDAR SLAM is a type of SLAM in which light detection and ranging, a common remote sensing method, is used to determine the precise distance between an object and a sensor, which helps to draw the map more accurately. A pulsed laser is used in LiDAR to determine an object's fluctuating distance. The scanner estimates the range measurement by recording the time delay between the transmission and receipt of the laser pulse. The position of the system connected to the LiDAR sensor is also indicated through GPS. We propose a front-end agnostic LiDAR system and provide a variety of qualitative results. In addition to SLAM, we also introduce YOLO v4, i.e., "You Only Look Once," a new approach to detecting multiple objects in real time within a single frame. The whole image frame is processed by a single neural network in YOLO: the image is divided into regions, and the probability of each region is used to forecast bounding boxes. This article introduces YOLO and modifies the YOLO v4 network for real-time object detection.
1 Introduction
Robotic vision systems are a crucial piece of technology that allow robots to interact
with the environment and comprehend their surroundings. This involves analyzing visual input and creating a three-dimensional model of the environment surrounding
the robot using cameras, sensors, and software algorithms [1]. Several sectors,
including industrial automation, autonomous cars, medical robotics, and many more,
can benefit from using robotic vision systems. Robotic vision systems are getting
more sophisticated and are able to carry out difficult tasks with increased accuracy
and efficiency as machine learning and computer vision techniques progress.
The creation of 3D point cloud maps is an important challenge for a variety of robotic applications. Two different techniques for doing so are described below.
1. LiDAR SLAM.
Robotics and autonomous systems employ the LiDAR Simultaneous Localization
and Mapping (SLAM) technology to simultaneously map the environment and
localize the robot inside it [2]. It makes use of a LiDAR sensor, which sends out
laser beams to detect its surroundings and generate a 3D point cloud of the area.
This point cloud data is used by the LiDAR SLAM algorithm to identify and
track environmental elements including walls, objects, and landmarks while also
determining the location and orientation of the robot on the map. As a result, the
robot can maneuver and avoid obstacles in real time with accuracy.
In applications like autonomous cars, drones, and mobile robots where precise
and effective mapping and localization are crucial, LiDAR SLAM is frequently
employed. Robots are useful instruments for a number of jobs, including surveillance,
search and rescue, and transportation, since they can work independently in complex
and dynamic situations by employing LiDAR sensors to generate and update a map
of the surroundings.
2. Visual sensor-based SLAM.
Simultaneous Localization and Mapping, or SLAM, is a robotics and computer vision
technology that uses visual sensors to map an uncharted area while also detecting
the location of the robot inside it.
In visual sensor-based SLAM, the robot takes pictures of its surroundings with
one or more cameras. To construct a 3D map of the environment, these photos are
then processed to extract visual elements like corners, edges, and blobs.
The robot’s location and orientation are also calculated by observing how these
visual elements change over time. A comprehensive map of the environment is
created by fusing data from the camera(s) and the robot’s movement, and the position
of the robot inside that environment is continually updated.
Applications of visual sensor-based SLAM may be found in a number of industries, including robotics, autonomous driving, virtual reality, and augmented reality. It offers the benefit of being inexpensive and lightweight, and of being free from the need for pricey sensors like LiDAR [3].
The creation of a rich and exact 3D point cloud map has been made feasible thanks to recent breakthroughs in LiDAR technology. Since standalone odometry is subject to oscillations in motion estimates, the integration of the modules is very important for the accuracy of maps. Despite several advances in LiDAR odometry techniques, this motion estimation error is unavoidable. A front-end agnostic LiDAR system is therefore developed by building the system in modules and integrating it with Scan Context++ for loop closure.
2 Literature Survey
Fast SLAM 2.0 incorporating scan matching and loop closure detection has been demonstrated and reviewed [4], together with one of the top deep learning methods, which makes use of convolutional neural networks to assist the robot in detecting its surroundings and identifying items; the experiments were validated with the help of the YOLOv3 algorithm.
A cutting-edge method for visual sensor data in open spaces works with sparse point clouds and allows simultaneous SLAM and object recognition (OR). Unlike deep neural networks, which can only recognize and classify items in the current frame, ORB-SLAM determines the observer's location and generates a cloud of points that symbolizes the environment's objects by combining previous and present monocular visual sensor video frames [5]. The collected point cloud is contrasted with the region that the OR network recognized. Points that match the region indicated by the OR algorithm are filtered, because every point has a counterpart in the current frame on the 3D map. A clustering method then discovers regions in which points are densely distributed to pinpoint the locations of objects detected by OR. A following step estimates the bounding boxes of the detected objects using a heuristic based on principal component analysis [5].
Due to the poor resolution and background-like appearance of objects in aerial photos, small-target detection is still a challenging task. Effective and high-performance detection approaches have been created with the recent advancements in object detection technology; the YOLO series is an example of an efficient, lightweight object identification technique among them. In [6], a technique for tweaking YOLOv4 to enhance the performance of small-target recognition in aerial photos is suggested: an efficient channel attention module is used to change the structure of the network, and a channel attention pyramid approach, an efficient channel attention pyramid YOLO, is provided.
The creation of accurate 3D point clouds is essential for data-driven urban studies and many robot tasks. To achieve this, SLAM based on light detection and ranging (LiDAR) sensors has been developed. Numerous odometry and place recognition techniques have been independently presented in academia to make up a complete SLAM system. However, they have not been sufficiently integrated or merged, making it difficult to upgrade a single place recognition or odometry module. Each module's performance has significantly increased recently, so it is essential to create a SLAM system that can seamlessly combine them and quickly swap out older modules for the newest. SLAM has been successfully combined with Scan Context++ and several different free LiDAR odometry alternatives for building accurate maps.
Mathematical framework is used for merging SLAM with object tracking. Two
approaches are outlined: SLAM for generic objects and SLAM for tracking and
recognizing moving things. A joint posterior is computed for the robot and all gener-
alized objects in SLAM with generalized objects. SLAM systems are now in use,
but with a more organized methodology that makes motion modeling of generic
objects easier. Sadly, it is computationally expensive and usually impractical. The
estimation issue is divided into two distinct estimators using SLAM with DATMO.
Because discrete posteriors are preserved for stationary and moving objects, the
ensuing estimation problems are substantially less dimensional than in SLAM with
generalized objects. It is challenging to do SLAM and object tracking from a moving vehicle in congested cities. Workable techniques that address problems with perception modeling are offered. The recognition of moving objects and data association are carried out using the SLAM with DATMO framework. The CMU Navlab11 car's data was used to demonstrate the use of SLAM with DATMO while it sped through crowded metropolitan areas at high speeds. A wide range of experimental findings demonstrate the viability of the suggested theory and methods [2].
3 LiDAR SLAM
process. Furthermore, the laser’s difference in return time and wavelength is utilized
to create exact digital 3D representations and surface details of the target, as well as
to visually map its distinct properties [8]. As seen, LiDAR technology can generate
accurate and precise information about road structure and identify obstacles to avoid
collision.
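The range computation implied here is the standard time-of-flight relation; a one-line Python illustration (the delay value is illustrative):

# Time-of-flight ranging: the pulse travels to the target and back.
c = 299_792_458.0          # speed of light, m/s
dt = 66.7e-9               # illustrative round-trip delay in seconds
distance = c * dt / 2      # ~10 m, on the order of the X4 sensor's rated range
print(distance)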
LiDAR is analogous to radio-wave and sound-based technologies such as radar and sonar, but uses light. LiDAR is more exact than these, since they can only map the location of a distant object, whereas LiDAR can build precise digital 3D representations. This qualifies it for up-close dynamics in a variety of applications, including driverless cars.
4 YOLO_v4
A cutting-edge real-time object recognition technique that makes use of deep learning
is called You Only Look Once (YOLO) v4. YOLO v4 is an upgrade over earlier
iterations of YOLO, with faster speed and more accuracy.
The technique splits an image into a grid and predicts the bounding boxes, objectness score, and class probabilities for each grid cell. The cross-stage partial (CSP) architecture-based backbone network used by YOLO v4 aids in increasing the precision of object recognition. A spatial pyramid pooling (SPP) module is also included, which enables the model to learn features at various scales.
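As a generic illustration of YOLO v4 inference (not the paper's own modified network), a minimal OpenCV DNN sketch in Python; the config/weights file names follow the public Darknet release and are assumptions:

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # assumed files
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("frame.jpg")   # hypothetical input frame
# YOLO expects a square, normalized blob; 416x416 is a common input size
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

for out in outputs:
    for det in out:              # det = [cx, cy, w, h, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(det[4]) * float(scores[class_id])
        if confidence > 0.5:
            print("class", class_id, "confidence", confidence)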
5 Hardware Used
In this study, a personal computer was chosen. It features an Intel Core(R) i7-10510U CPU clocked at 1.80/2.30 GHz, 8 GB of DDR4 RAM, and an NVIDIA GeForce MX350 graphics card with 4 GB of DDR5 RAM to expedite CNN training. For YOLO operations, the Mi Webcam HD 720p camera was used as a 3D camera sensor. The SmartFly info LIDAR-053 EAI YDLIDAR X4 LiDAR laser radar sensor module is utilized for LiDAR, with a range of 10 m and a frequency of 5 kHz (Fig. 5).
6 Results
LiDAR
SC-PGO is the fundamental core of SC-LiDAR-SLAM. Open-source LiDAR odometry methods have been combined with SC-PGO. For ease of use, the entire LiDAR SLAM system is available through the repository.
The SLAM output of an example image is shown in Fig. 6, providing a real-time 3D map of an interfaced environment in a specific graphical interface.
YOLO_v4
YOLOv4 achieves cutting-edge results in real-time object detection and is capable of running at 60 FPS on the GPU. The model is trained to detect 81 object classes. The real-time object detection situation is depicted in Fig. 7, where it detects 3 different objects simultaneously.
7 Conclusion
A front-end agnostic LiDAR system has been developed, and it gave qualitative results. Easy interaction between several LiDAR (or even radar) odometry technologies is possible, and exact point cloud maps were successfully built thanks to our modular architecture and Scan Context++'s excellent loop-closing features [10]. In subsequent work, we will provide several quantitative evaluations of the performance of the recommended LiDAR system.
The YOLO series was studied, and the YOLOv4 network was enhanced in this study to better recognize indoor micro targets [11].
References
1. https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
2. Wang C-C, Thorpe C, Thrun S, Hebert M, Durrant-Whyte H (2007) Simultaneous localiza-
tion, mapping and moving object tracking. The International Journal of Robotics Research
26(9):889–916
3. Maolanon P, Sukvichai K, Chayopitak N, Takahashi A (2019) Indoor room identify and
mapping with virtual based SLAM using furnitures and household objects relationship based on
CNNs. In: 2019 10th international conference of information and communication technology
for embedded systems (IC-ICTES). IEEE, pp 1–6
4. Chehri A, Zarai A, Zimmermann A, Saadane R (2021) 2D autonomous robot localization using
fast SLAM 2.0 and YOLO in long corridors. In: International conference on human-centered
intelligent systems. Springer, Singapore, pp 199–208
5. Mazurek P, Hachaj T (2021) SLAM-OR: simultaneous localization, mapping and object recog-
nition using video sensors data in open environments from the sparse points cloud. Sensors
21(14):4734
6. Kim M, Jeong J, Kim S (2021) ECAP-YOLO: efficient channel attention pyramid YOLO for
small object detection in aerial image. Remote Sensing 13(23):4851
7. Chan S-H, Wu P-T, Fu L-C (2018) Robust 2D indoor localization through laser SLAM and
visual SLAM fusion. In: 2018 IEEE international conference on systems, man, and cybernetics
(SMC). IEEE, pp 1263–1268
8. Bavle H, De La Puente P, How JP, Campoy P (2020) VPS-SLAM: visual planar semantic
SLAM for aerial robotic systems. IEEE Access 8:60704–60718
9. Garcia-Rodriguez J (ed) (2020) Robotic vision: technologies for machine learning and vision
applications: technologies for machine learning and vision applications. IGI Global
10. Kim G, Yun S, Kim J, Kim A (2022) SC-LiDAR-SLAM: a front-end agnostic versatile
LiDAR SLAM system. In: 2022 international conference on electronics, information, and
communication (ICEIC). IEEE, pp 1–6
11. Cai Y, Alber F, Hackett S (2020) Path markup language for indoor navigation. International
conference on computational science. Springer, Cham, pp 340–352
Optimum Value of Cyclic Prefix (CP) to Reduce Bit Error Rate (BER) in OFDM
Keywords Cyclic prefix (CP) · Bit error rate (BER) · Inter-symbol interference
(ISI) · Inter-carrier interference (ICI)
1 Introduction
can be used as a modulation system, the most prevalent being BPSK, QPSK, and
64-QAM [1].
The signal to be broadcast is the sum of the outputs of all the modulators. Compared with alternative modulation techniques, an OFDM system with channel coding and BPSK modulation produces the lowest BER value; as a result, a BPSK-based OFDM system using the FFT and a cyclic code gives the lowest BER. The OFDM system divides the spectrum into many orthogonal carriers, each modulated by a low-rate data stream. It is possible to utilize the spectrum more efficiently than with frequency division multiple access because carrier-spacing overhead is removed. Each carrier's bandwidth is limited; it has a low symbol rate and, as a result, a high tolerance for multipath delay spread [2]. The delay spread must be enormous to cause significant inter-symbol interference; as a result, inter-symbol interference is a significant issue when evaluating the signal's performance during the various phases of transmission. BPSK, 64-QAM, and other modulation constellations are used to modulate and map a wide-band stream of binary digits to a symbol stream. Inverse multiplexing is used to de-multiplex these symbols into several parallel streams, possibly with a different constellation per stream; as a result, the bit rate of some streams may be higher than that of others. An inverse FFT is performed on each set of symbols, resulting in a collection of complex time-domain samples that are quadrature-mixed with the passband in the typical manner [3].
2 Related Work
To attain the highest level of data transfer dependability, OFDM was initially developed in the communication industry as a technique for encoding digital data on multiple carrier frequencies. From previous research, one point is common across the literature: as the cyclic prefix value increases, inter-carrier interference and inter-symbol interference reduce. One of the papers also showed that increasing the cyclic prefix value reduces the bit error rate [4]. However, previous work did not address the optimum value, because as the cyclic prefix percentage increases, transmitted data is also lost; earlier results were obtained using MATLAB Simulink simulations. In this work, we go beyond simulation and implement the same system in MATLAB code, determining the optimum value that reduces the bit error rate without excessive data loss [5].
In this paper, we calculate the optimum value of the cyclic prefix: previous work simply increased the cyclic prefix to reduce ICI, ISI, and the BER, but since increasing the cyclic prefix also sacrifices data, we determine the optimum cyclic prefix value [6].
3 Proposed Methodology
A. OFDM Transmitter
Figure 1 depicts the OFDM system’s basic model, i.e., the transmitter part uses the
BPSK modulation technique to modulate digital data to be transmitted, after which
the data is transformed into many parallel streams. The modulated signals are then
given to the IFFT block, which converts the spectrum representation of the data into
the time domain, which is significantly more computationally efficient and employed
in all practical systems [7]. The signals are then prefixed with a cyclic prefix: the end of the OFDM symbol is copied into the guard interval that precedes the symbol. Because the guard interval comprises a copy of the end of the OFDM symbol, when the receiver conducts OFDM demodulation, each multipath component integrates across an integer number of sinusoid cycles. OFDM thus exhibits exceptional resilience in multipath scenarios. The cyclic prefix keeps the subcarriers orthogonal, and the receiver can capture more multipath energy with a cyclic prefix. The signals are then translated to serial form and sent through the transmitter. After that, the digital data is sent over the channel [8].
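A minimal NumPy sketch of the IFFT plus cyclic-prefix step; the ~10% CP fraction anticipates the optimum argued for later, and the array names are illustrative:

import numpy as np

n_fft = 256
cp_len = n_fft // 10             # ~10% cyclic prefix

bits = np.random.randint(0, 2, n_fft)
symbols = 2 * bits - 1           # BPSK mapping: 0 -> -1, 1 -> +1

time_signal = np.fft.ifft(symbols)        # spectrum -> time domain
cp = time_signal[-cp_len:]                # copy the symbol's tail
tx = np.concatenate([cp, time_signal])    # prepend it as the guard interval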
B. AWGN Channel
The AWGN channel model is commonly utilized in OFDM research. In this model, the amplitude distribution is Gaussian, and there is just a linear addition of white noise with a constant spectral density. The model does not account for fading, frequency selectivity, interference, and so on [9]. Even though it is therefore unsuitable for most terrestrial links, it is still utilized as a controlled mathematical model to investigate the fundamental behavior of the system elements in the absence of the effects above, which is the reason we choose the AWGN channel over the Rayleigh channel.
C. OFDM Receiver
Serial data received from the AWGN channel is converted into parallel form, i.e., into the number of subcarriers, so that the cyclic prefix can be removed from each of the subcarriers [10].
Cyclic Prefix Removal Block: The cyclic prefix, which is added to eliminate inter-symbol interference (ISI), is removed first to retrieve the original data [11].
Fast Fourier Transform (FFT): The FFT block on the receiver side reverses the function of the IFFT block on the transmitter side. The FFT of each subcarrier is determined individually, in parallel with the serial converter. For our results, we use FFT lengths of 256 and 1024 [12].
All the subcarriers are then merged in serial form, i.e., converted from parallel to serial form, to retrieve the original data. The serial data is then demodulated by the corresponding demodulation technique to get the original data back [13].
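The matching receiver-side steps, continuing the transmitter sketch above; the noise level is illustrative:

rx = tx + 0.05 * (np.random.randn(tx.size) + 1j * np.random.randn(tx.size))  # AWGN

rx_no_cp = rx[cp_len:]                    # cyclic prefix removal block
rx_symbols = np.fft.fft(rx_no_cp)         # time domain -> spectrum (per subcarrier)
bits_hat = (rx_symbols.real > 0).astype(int)   # BPSK demodulation

ber = np.mean(bits_hat != bits)           # compare against the transmitted bits
print("BER =", ber)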
4 Simulation Flowchart
Figure 2 shows the flowchart of the MATLAB code. The main focus of the flowchart is the cyclic prefix adder on the transmitter side and the cyclic prefix remover on the receiver side. After completing all the steps, we obtain the bit error rate versus signal-to-noise ratio graph for different values of the cyclic prefix.
A. Simulation Models
For different modulation techniques, see Figs. 3, 4, 5 and 6.
Using a Bernoulli distribution, the Bernoulli binary generator block produces random binary numbers (in the above simulation, only eight binary bits are generated). This block is used to mimic digital communication networks and produce random data bits to acquire performance metrics such as bit error rate. The Bernoulli distribution with parameter p produces zero with probability p and one with probability 1 − p. The mean value of this Bernoulli distribution is 1 − p and the variance is p(1 − p). Any real value in [0, 1] can be used as the probability-of-zero parameter, which determines p.
A column or row vector, a two-dimensional matrix, or a scalar might be the
output signal. The samples per frame parameter determines the number of rows in
the output signal, which corresponds to the number of samples in a frame. The number
of elements in the probability of zero parameter determines how many columns there
are in the output signal, which is equal to the number of channels.
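A NumPy equivalent of this block, under the same zero-probability convention; the frame size and p are illustrative:

import numpy as np

p_zero = 0.5
frames, channels = 8, 1    # samples per frame, number of channels
# Bernoulli: zero with probability p_zero, one with probability 1 - p_zero
bits = (np.random.rand(frames, channels) >= p_zero).astype(int)
print(bits.ravel())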
AWGN Channel (or) Rayleigh Channel: This is the path by which the data is
transmitted. The presence of noise in this medium has an impact on the signal and
produces data content distortion. Additive white Gaussian noise (AWGN) is a funda-
mental noise model used in information theory to simulate the effect of many random
processes seen in nature.
Rayleigh fading is a statistical model that explains how the propagation environ-
ment affects a radio signal, such as the one used by wireless devices. The underlying
premise of Rayleigh fading models is that the strength of a signal traveling through
such a transmission medium (also referred to as a communication channel) will
randomly change, or fade, in accordance with a Rayleigh distribution, which is the
radial component of the sum of two uncorrelated Gaussian random variables.
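A quick NumPy illustration of that definition: the magnitude of a complex Gaussian sample is Rayleigh-distributed.

import numpy as np

n = 100_000
# Two uncorrelated zero-mean Gaussians as the real and imaginary parts
h = (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
envelope = np.abs(h)    # radial component -> Rayleigh distributed
print("mean envelope:", envelope.mean())   # ~0.886 for unit-power taps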
Under QAM, 16-QAM, and 64-QAM modulation schemes, the AWGN channel has the best performance of all channels because it yields the lowest bit error rate (BER); the impact of noise on BER in this channel is much lower than in fading channels. Rayleigh fading has the worst performance of all channels, since its BER is heavily influenced by noise under QAM, 16-QAM, and 64-QAM modulation schemes.
The evaluation of various cyclic prefix lengths allowed us to establish the optimal value, which lowered the bit error rate while not causing excessive data loss. Figures 7, 8, 9, and 10 depict the various cyclic prefix values for different modulation techniques; for clarity, we present only two modulation techniques, BPSK and 64-QAM, with FFT lengths of 256 and 1024. From Figs. 7, 8, 9, and 10, we can see that as the cyclic prefix value increases, the bit error rate decreases. However, since a larger cyclic prefix also sacrifices data, we take a value that is low yet still reduces the bit error rate: approximately 10%, because the data loss at that level is small compared with a 40–50% cyclic prefix (CP).
Fig. 7 Different cyclic prefix values for BPSK modulation of 256 FFT length
Fig. 8 Different cyclic prefix values for BPSK modulation of 1024 FFT length
Fig. 9 Different cyclic prefix values for 64-QAM modulation of 256 FFT length
Fig. 10 Different cyclic prefix values for 64-QAM modulation of 1024 FFT length
References
1. Lim C, Chang Y, Cho J, Joo P, Lee H (2005) Novel OFDM transmission scheme to overcome ISI caused by multipath delay longer than cyclic prefix. In: 2005 IEEE 61st vehicular technology
conference, vol 3. IEEE, pp 1763–1767
2. Subotic V, Primak S (2007) BER analysis of equalized OFDM systems in Nakagami, m < 1
fading. Wireless Pers Commun 40(3):281–290
3. Chang Y-P, Lemmens P, Tu P-M, Huang C-C, Chen P-Y (2011) Cyclic prefix optimization
for OFDM transmission over fading propagation with bit-rate and BER constraints. In: 2011
Second international conference on innovations in bio-inspired computing and applications.
IEEE, pp 29–32
4. Mišković B, Lutovac MD (2012) Influence of guard interval duration to interchannel interfer-
ence in DVB-T2 signal. In: 2012 Mediterranean conference on embedded computing (MECO).
IEEE, pp 220–223
5. Lorca J (2015) Cyclic prefix overhead reduction for low-latency wireless communications in
OFDM. In: 2015 IEEE 81st vehicular technology conference (VTC Spring). IEEE, pp 1–5
6. Waichal G, Khedkar A (2015) Performance analysis of FFT based OFDM system and DWT
based OFDM system to reduce inter-carrier interference. In: 2015 international conference on
computing communication control and automation. IEEE, pp 338–342
7. Jadav NK (2018) A survey on OFDM interference challenge to improve its BER. In: 2018
second international conference on electronics, communication and aerospace technology
(ICECA). IEEE, pp 1052–1058
8. Gowda NM, Sabharwal A (2018) CPLink: interference-free reuse of cyclic-prefix intervals in
OFDM-based networks. IEEE Trans Wireless Commun 18(1):665–679
9. Farzamnia A, Hlaing NW, Mariappan M, Haldar MK (2018) BER comparison of OFDM with
M-QAM modulation scheme of AWGN and Rayleigh fading channels. In: 2018 9th IEEE
control and system graduate research colloquium (ICSGRC). IEEE, pp 54–58
10. Athaudage CRN, Dhammika A, Jayalath S (2004) Delay-spread estimation using cyclic-prefix
in wireless OFDM systems. IEE Proceedings-Communications 151(6):559–566
11. Bandele JO (2019) A Matlab/Simulink design of an orthogonal frequency division multiplexing
system model. International Journal of Engineering Inventions 8(4)
12. Alkamil ADE, Hassan OTA, Hassan AHM, Abdalla WFM (2020) Performance evaluations
study of OFDM under AWGN and Rayleigh channels. In: 2020 international conference on
computer, control, electrical, and electronics engineering (ICCCEEE). IEEE, pp 1–6
13. Mohseni S (2013) Study the carrier frequency offset (cfo) for wireless OFDM
Optimum Sizing of Solar/Wind/Battery
Storage in Hybrid Energy System Using
Improved Particle Swarm Optimization
and Firefly Algorithm
Abstract The integration of hybrid energy systems (HES) combining solar photovoltaic (PV), wind turbine (WT), and battery energy storage (BES) is increasing rapidly to enhance the performance of microgrids or power systems while mitigating local energy crises and environmental pollution concerns. In this work, the performance of an HES with PV-WT-BES is improved by minimizing the total annual cost of these components using improved particle swarm and firefly optimization algorithms. These two algorithms are compared and analyzed for three system configurations (PV-BES, WT-BES, and PV-WT-BES) to determine the optimum capacity sizing of PV, WT, and BES to fulfill the load of a particular place. The study also highlights the impact of the state of charge (SOC) of the BES on the optimum sizing of the system components and on the overall cost. The effect of SOC variation is analyzed for two battery chemistries, lithium-ion and lead acid.
1 Introduction
Hybrid energy systems (HES) with PV and WT have proven to be a boon for addressing the crises caused by fossil-fuel depletion and power shortages over the past few decades. However, these PV and WT energy sources have the limitations of uncertain generated power output and high installation cost, hence battery energy storage (BES) should be integrated with them. This addition of BES can play a multi-functional role in the electrical power system, such as reducing operating costs or capital expenditures
when used as a generator in the utility sector, facilitating the integration of RES
into the electric power system, load leveling, peak shaving, stabilizing voltage, and
frequency and maintaining uninterrupted power supply [1]. Despite certain advan-
tages, the market maturity of BES is slow due to its high cost incurred in the expensive
cell material and lack of a corresponding legal framework [2]. Though the combina-
tion of solar-wind-BES is beneficial for mitigating environmental pollution concerns
and local energy crises, this hybrid renewable energy system (HRES) faces challenges
regarding its optimum operation, optimum sizing, system security, system reliability,
and cost of the system [3]. From the literature review, it is observed that researchers have addressed the performance of HRES in several ways: by optimizing the system cost in terms of total annual or project cost, by taking into account the loss of power supply probability (LPSP), by minimizing the cost of energy (COE), or by applying various optimization techniques to real-time case studies. The literature on optimum sizing of HRES components can be categorized as minimizing COE only [3–5] or LPSP only [6, 7]. Some of the literature shows improvement in the
system performance by combining both—COE and LPSP [8–12] and some research
articles proposed HRES optimization studies using techniques like PSO [4, 8–10,
13–15], FA [11, 12, 16], GA [6, 14] or hybrid optimization techniques like PSO-GSA
[10], PSO-GWO [8], SA-PSO [13], etc. The case study of a university campus on
a Mediterranean island by considering economic metrics as net present cost (NPC)
and levelized COE of a PV/WT/BES hybrid system is proposed in [3]. The analysis
of the PV/WT/diesel/BES hybrid system for minimum COE for homes in Morocco,
Spain, and Algeria is carried out using PSO [4]. In [5], an HRES is designed for the remote island of Jiuduansha, Shanghai, by considering the effects of saturation of RES on various parameters like system reliability, NPC, BES size, and the repayment
period. In [6], the Nigerian case study based on optimum sizing of a hybrid system
consisting of PV/WT and storage system is presented to fulfill the load demand based
on the LPSP using enhanced genetic algorithm. In [7], the comprehensive review
of the optimum sizing of HRES in Oman is described using various optimization
methods. In [8], the enhanced whale optimization algorithm (EWOA) is applied for
the optimization and operation of HRES for electrifying a rural city in Algeria. It also
compared EWOA with various optimization algorithms like PSO, gray wolf opti-
mizer (GWO), and modified GWO to resolve the COE problem considering LPSP.
In [9], optimal planning of the microgrid consisting of solar/wind/bio-generator/
BES is presented with a real-time case study of Bihar, India. It analyzed the tradeoff
between COE and LPSP using hybrid PSO-GWO. In [10, 11], novel optimization
techniques are employed for the optimal sizing of hybrid microgrids of Egypt and
Manipur, India, respectively. In [12], power reliability with COE and load dissatisfaction criteria for PV/WT systems is proposed using FA. Reference [13] proposed a hybrid optimization method combining SA and PSO (SA-PSO) for determining the optimal size of a microgrid located in Egypt. The iterative complexity of optimum HRES sizing that arises in GA is reduced by PSO [14]. The optimum sizing of HRES is discussed by comparing the fuel cell with the battery using harmony search, tabu search, SA, and PSO [15]. That paper concluded that the PV/WT/BES combination is economically better than the fuel cell system and that PSO performed better than the other algorithms. Reference [16] analyzed the optimal design of HRES using FA to achieve profitable operation of RES to supply the load.
The objective of this paper is to analyze the performance of the PV-WT-BES
system by minimizing total annual cost (TAC) using IPSO and FA. The objective
also includes the analysis of the impact of variation in SOC on BES sizing with two
BES chemistries lithium-ion batteries and lead acid batteries.
The rest of the paper is arranged as follows: Sect. 2 describes the system components under consideration mathematically, along with the objective function and constraints. Section 3 discusses the size optimization algorithms. The results are discussed in Sect. 4, and Sect. 5 concludes the paper with key contributions.
The HRES under consideration along with the specifications is given in Fig. 1. The
HRES comprises PV, WT, inverters/converters, BES, and the load of a particular
location at Rafsanjan, Iran [15].
The mathematical modeling of components of HRES like PV, WT, and BES is
referred from [15, 17].
Fig. 2 Solar irradiance (W/m2) for a specific location versus time (hour)
Figure 2 shows the average daily solar irradiance of Rafsanjan, Iran [15]. The output
power of each solar PV panel is derived as per Eq. (1) and total PV power output can
be found by using Eq. (2).
$$
P_{pv}(t) =
\begin{cases}
P_{rs}\,\dfrac{r^2}{R_{srs}\,R_{cr}}, & 0 \le r \le R_{cr} \\[4pt]
P_{rs}\,\dfrac{r}{R_{srs}}, & R_{cr} \le r \le R_{srs} \\[4pt]
P_{rs}, & R_{srs} \le r
\end{cases}
\tag{1}
$$
and the total PV power output for $N_{pv}$ identical panels is
$$
P_{pv}(t) = N_{pv} \times p_{pv\text{-}Each}(t) \tag{2}
$$
where,
P_rs: rated PV power in watt,
r: solar radiation in W/m2,
R_cr: a certain solar radiation in W/m2,
N_pv: number of PV panels,
P_pv: power output of all PV panels,
p_pv-Each(t): power rating of a single PV panel, and
R_srs: solar radiation under the standard environment, 1000 W/m2.
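A direct Python transcription of the piecewise model in Eqs. (1) and (2) could look as follows; the parameter values are placeholders for illustration, not those of the case study.

def pv_power_each(r, P_rs=250.0, R_cr=150.0, R_srs=1000.0):
    """Output power (W) of one PV panel for solar radiation r (W/m2), per Eq. (1)."""
    if r <= R_cr:
        return P_rs * r**2 / (R_srs * R_cr)   # quadratic region at low irradiance
    if r <= R_srs:
        return P_rs * r / R_srs               # linear region up to standard irradiance
    return P_rs                               # rated power beyond R_srs

def pv_power_total(r, N_pv):
    """Total PV output for N_pv identical panels, per Eq. (2)."""
    return N_pv * pv_power_each(r)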
The average daily wind velocity of Rafsanjan, Iran [15] is depicted in Fig. 3. The
output power of each WT is derived as given in Eq. (3) and the total power output of
WT can be found by using Eq. (4).
Fig. 3 Velocity of wind (m/s) for a specific location versus time (hour of day)
$$
P_w(t) =
\begin{cases}
P_{wn}, & V_{wr} \le V_w(t) \le V_{co} \\[2pt]
P_{wn}\,\dfrac{V_w(t) - V_{ci}}{V_{wr} - V_{ci}}, & V_{ci} \le V_w(t) \le V_{wr} \\[2pt]
0, & V_w(t) \le V_{ci} \ \text{or} \ V_w(t) \ge V_{co}
\end{cases}
\tag{3}
$$
and the total WT power output for $N_{wt}$ identical turbines is
$$
P_{wt}(t) = N_{wt} \times P_w(t) \tag{4}
$$
where;
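Equation (3) can be sketched the same way in Python; the cut-in (V_ci), rated (V_wr), and cut-out (V_co) speeds below are illustrative assumptions, not the turbine data of [15].

def wt_power_each(v, P_wn=1000.0, V_ci=3.0, V_wr=12.0, V_co=25.0):
    """Output power (W) of one wind turbine for wind speed v (m/s), per Eq. (3)."""
    if v <= V_ci or v >= V_co:
        return 0.0                                 # below cut-in or above cut-out
    if v < V_wr:
        return P_wn * (v - V_ci) / (V_wr - V_ci)   # ramp between cut-in and rated speed
    return P_wn                                    # rated power up to cut-out

def wt_power_total(v, N_wt):
    """Total WT output for N_wt identical turbines, per Eq. (4)."""
    return N_wt * wt_power_each(v)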
Depending upon the charge available in the battery (SOC), the BES either can fulfill
the load demand (discharge) or store the excess power if the generated power by
RES (PV/WT) is greater than the load (charge). The energy of the BES at the time
of charging and discharging can be derived from Eqs. (5) and (6) respectively. All
these equations are referred from [15, 17] explicitly (Fig. 4).
Figs. 4 and 5 Battery power (watt) and load (watt) for the location [15, 17] versus time (hour)
B. Discharging Mode
$$
E_{batt}(t) = E_{batt}(t-1)\,(1-\sigma) - \left[\frac{E_{load}(t)}{\eta_{inv}} - \bigl(E_{pv}(t) + E_{wt}(t)\bigr)\right] \eta_{batt}
\tag{6}
$$
where;
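As an illustration, the discharging update of Eq. (6) maps to a short function; the self-discharge rate σ and the efficiencies below are assumed values, not the study's parameters.

def battery_discharge(E_prev, E_load, E_pv, E_wt,
                      sigma=0.0002, eta_inv=0.95, eta_batt=0.90):
    """Battery energy after one time step of discharging, per Eq. (6)."""
    deficit = E_load / eta_inv - (E_pv + E_wt)   # energy the RES cannot cover this step
    return E_prev * (1 - sigma) - deficit * eta_batt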
2.1.4 Load
The real-time load data of a particular area (Rafsanjan, Iran) is obtained by averaging one year of data into a single day of 24 h [15, 17] (Fig. 5).
This work aims to find out the optimum capacity size of the hybrid energy system
(HES) components, which is carried out by minimizing the TAC of these components.
The TAC is the sum of the capital cost (C_Cpt) and the maintenance cost (C_Mtn) of every component of the HES, i.e., solar PV, wind turbine, and BES, on an annual basis. The capital cost is incurred at the time of project installation, while the maintenance cost is incurred during the operation of the project. The minimum TAC is attained by reducing C_Cpt and C_Mtn of the solar PV, wind turbine, and BES annually.
Equations (7)–(9) depict these costs, together with the inequality constraint ΔP = (P_gen − P_dem) ≥ 0.
$$
C_{Cpt} = C_{Cpt_{pv}} + C_{Cpt_{wt}} + C_{Cpt_{batt}} \tag{7}
$$
$$
C_{Cpt} = (N_{pv} \times C_{pv}) + (N_{wt} \times C_{wt}) + (N_{batt} \times C_{batt}) \tag{8}
$$
$$
C_{Mtn} = C_{Mtn_{pv}} + C_{Mtn_{wt}} + C_{Mtn_{batt}} \tag{9}
$$
where C_Cpt is the capital cost of solar PV (C_Cpt_pv), wind turbine (C_Cpt_wt), and battery (C_Cpt_batt), and C_Mtn is the maintenance cost of solar PV (C_Mtn_pv), wind turbine (C_Mtn_wt), and battery (C_Mtn_batt).
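Equations (7)–(9) translate directly to code; all unit costs below are placeholder assumptions, not the cost data used in this study.

def total_annual_cost(N_pv, N_wt, N_batt,
                      C_pv=150.0, C_wt=600.0, C_batt=170.0,   # assumed annualized capital costs ($)
                      M_pv=10.0, M_wt=40.0, M_batt=8.0):      # assumed annual maintenance costs ($)
    """TAC = capital cost (Eqs. (7)-(8)) + maintenance cost (Eq. (9))."""
    c_cpt = N_pv * C_pv + N_wt * C_wt + N_batt * C_batt       # Eq. (8)
    c_mtn = N_pv * M_pv + N_wt * M_wt + N_batt * M_batt       # per-unit form of Eq. (9), assumed
    return c_cpt + c_mtn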
Optimization is the process of finding the best solution, making something as feasible and efficient as possible by minimizing or maximizing the problem variables. In this study, an attempt is made to determine the optimum capacity size of PV/wind/BES to fulfill the load demand of a particular area by minimizing the system TAC using two optimization techniques, IPSO and FA, for three system configurations.
PSO and its variants were introduced by Kennedy and Eberhart in 1995. It is a heuristic algorithm for determining an optimized solution to a problem. In the IPSO algorithm, every viable solution of the optimization problem is represented by a 'particle', which is characterized by a position and a velocity vector. The mathematical model of this optimization method is given by Eqs. (10) and (11), which are referred from [15, 17, 18].
$$
V_{i+1} = V_i + C_1 r_1 (P_{best} - X_i) + C_2 r_2 (G_{best} - X_i) \tag{10}
$$
$$
X_{i+1} = X_i + V_{i+1} \tag{11}
$$
where,
V Velocity of particle,
X Position of particle,
C1, C2 Acceleration constants,
r1, r2 Randomly generated numbers between 0 and 1, and
Pbest , Gbest Local and global best positions of particles.
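One IPSO iteration over a swarm, following Eqs. (10) and (11), might be sketched as below; the acceleration constants are typical textbook values, assumed here.

import numpy as np

def pso_step(X, V, pbest, gbest, C1=2.0, C2=2.0, rng=np.random.default_rng()):
    """Update particle velocities and positions, per Eqs. (10) and (11)."""
    r1 = rng.random(X.shape)                                     # random numbers in [0, 1]
    r2 = rng.random(X.shape)
    V_new = V + C1 * r1 * (pbest - X) + C2 * r2 * (gbest - X)    # Eq. (10)
    X_new = X + V_new                                            # Eq. (11)
    return X_new, V_new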
The firefly algorithm (FA) was introduced by Xin-She Yang and is inspired by the flashing behavior of fireflies. The idealized rules followed in this algorithm are given below.
1. All fireflies are unisex, hence any firefly can be attracted to any other firefly.
2. The brightness of a firefly determines its attractiveness, so the less luminous firefly is attracted to the more luminous one. Both attractiveness and brightness decrease as the distance increases. If there is no firefly brighter than a given firefly, it moves randomly.
3. The brightness of the firefly changes with the objective function.
The mathematical modeling of this optimization method is based on the above
three rules and is given by Eqs. (12)–(16). The attractiveness (β) of a firefly is the
function of the distance ‘r’. Hence the relation between attractiveness and distance
of two fireflies is mentioned as
$$
\beta = \beta_0\, e^{-\gamma r^2} \tag{12}
$$
The distance of separation of firefly ‘i’ and firefly ‘j’ which have their positions
as ‘X i ’ and ‘X j ’ can be expressed by Eq. (13).
$$
r_{ij} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2} \tag{13}
$$
$$
X_{i+1} = X_i + \beta_0\, e^{-\gamma r_{ij}^2}\,(X_j - X_i) + \alpha\,(\mathrm{rand} - 0.5) \tag{15}
$$
where,
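The firefly move of Eqs. (12), (13), and (15) can be sketched compactly; β0, γ, and α below are common default values, assumed here rather than taken from the paper.

import numpy as np

def firefly_move(Xi, Xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=np.random.default_rng()):
    """Move firefly i toward a brighter firefly j, per Eqs. (12), (13), and (15)."""
    r_ij = np.linalg.norm(Xi - Xj)                # separation distance, Eq. (13)
    beta = beta0 * np.exp(-gamma * r_ij**2)       # attractiveness, Eq. (12)
    return Xi + beta * (Xj - Xi) + alpha * (rng.random(Xi.shape) - 0.5)   # Eq. (15)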
1. Read the input parameters for solar PV, i.e., solar irradiance, panel efficiency, and the power rating of a single PV panel, for the 1st and 3rd system configurations.
2. Read the input parameters for the wind system, i.e., wind speeds (cut-in, cut-out, nominal) and the power rating of a single WT, for the 2nd system configuration.
3. For all three system configurations, read the load demand data over 24 h [15].
4. Compute the average annual load demand (kWh) over 24 h [15, 17].
5. Set the number of PV panels (for system configurations 1 and 3) or wind systems (for system configurations 2 and 3) to one. Calculate the power generated by PV using Eqs. (1) and (2).
6. Find the differential power ΔP = (P_gen − P_dem).
7. If ΔP < 0, follow step 8; otherwise, follow step 9.
8. Increase the number of PV panels by one and repeat the process from step 3 (see the sizing sketch after this list). Get the optimum quantity of PV panels (N_PV). Estimate the total PV power.
9. Calculate ΔP over a period of 24 h using N_PV.
10. Find the ΔP curve over the period and convert it into the energy curve (ΔW).
11. From the energy curve, ΔW = ∫ΔP dt = ∫(P_gen − P_dem) dt, estimate the battery capacity. Considering the SOC, the self-discharge rate of the battery, and inverter losses, compute the optimum capacity size of the battery.
12. Find the optimum quantity of batteries (N_Batt) by taking into account the 1.35 kWh capacity of a single BES and the SOC for lithium-ion and lead acid batteries.
13. Repeat steps 2 to 12 to calculate the power generated by the wind turbine (WT) using Eqs. (3) and (4). Find the optimum number of wind turbines (N_WT) and batteries (N_Batt) for the 2nd system configuration.
14. Repeat steps 1 to 13 to find the optimum N_PV, N_WT, and N_Batt for the 3rd system configuration.
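As a sketch of the inner loop of steps 5–9, the panel count is incremented until the generated power covers the demand. The function below reuses the pv_power_each sketch given earlier and assumes the hourly demand and irradiance profiles are available as arrays; it is an illustration, not the MATLAB code of the study.

import numpy as np

def size_pv(P_dem_hourly, r_hourly):
    """Find the optimum number of PV panels (steps 5-9) and the resulting ΔP curve."""
    N_pv = 1
    while True:
        P_gen = np.array([N_pv * pv_power_each(r) for r in r_hourly])  # Eqs. (1)-(2)
        if P_gen.sum() - np.sum(P_dem_hourly) >= 0:    # constraint ΔP ≥ 0 over 24 h
            return N_pv, P_gen - P_dem_hourly          # optimum N_PV and hourly ΔP
        N_pv += 1                                      # step 8: add one panel and retry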
By following the above steps 1 to 14, IPSO and FA are implemented and coded in MATLAB (2020 version) for the objective function of minimum TAC for the three system configurations. The results obtained are organized in Tables 2, 3, and 4 for all system components and both optimization methods.
Though IPSO and FA are both nature-inspired, swarm intelligence-based algorithms, they have some significant differences. FA can exhibit better characteristics than IPSO due to its nonlinear and dynamic nature; these differences are briefly summarized in Table 1.
Tables 2 and 3 give the optimum numbers of PV panels, WT, and batteries required to fulfill the load demand for the three system configurations (PV-BES, WT-BES, and PV-WT-BES) with minimum TAC using the two optimization algorithms, IPSO and FA. Table 2 considers the lithium-ion battery and compares the results obtained with those given in [15]. Table 3 gives the optimum system components with the lead acid battery for the same system configurations using IPSO and FA.
Table 4 gives the impact of a change in the % SOC of both battery chemistries (lithium-ion and lead acid) on their optimum number and on the overall annual cost of the system to fulfill the same load demand. The results are shown only for the third system configuration, PV-WT-BES, with the two optimization algorithms, IPSO and FA.
From Table 2, it is seen that the results obtained by implementing IPSO and FA match reasonably well with the results in [15]. It is observed that the time taken by the FA method to reach the optimum solution is considerably shorter than that of the IPSO method. In this particular case (Rafsanjan, Iran), it is also seen that out of the three system configurations, the WT-BES configuration gives the optimum solution. However, this may not be generalized, as the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES.
The mismatch in the number of system components may be due to inexact mapping of the wind velocity and solar irradiance data relative to [15], while the mismatch in the system's overall cost relative to [15] is due to changes in the costs of PV, WT, and BES.
Table 3 gives the results obtained for lead acid batteries using IPSO and FA. Again, the time taken by the FA method to reach the optimum solution is considerably shorter than that of the IPSO method, and for this case (Rafsanjan, Iran) the WT-BES configuration gives the optimum solution out of the three system configurations. As before, this may not be generalized, since the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES. Comparing Tables 2 and 3, the cost of the system with lead acid batteries is higher than that with lithium-ion batteries. Again, this may not be a general statement, as battery costs change with advances in battery materials and chemistry. So, this work
Table 3 Optimum number of PV, WT, and batteries (lead acid) by IPSO and FA methods

Sr. no. | System configuration | IPSO | FA
1 | PV + BES + load | N_PV = 61, N_Batt = 110 | N_PV = 61, N_Batt = 110
  | TAC of the system ($) | 8742.54 | 8742.54
  | Time elapsed (s) | 15.12 | 4.21
2 | WT + BES + load | N_WT = 8, N_Batt = 12 | N_WT = 8, N_Batt = 12
  | TAC of the system ($) | 5341.76 | 5341.76
  | Time elapsed (s) | 16.12 | 5.13
3 | PV + WT + BES (1.35 kWh) + load | N_PV = 12, N_WT = 7, N_Batt = 14 | N_PV = 12, N_WT = 7, N_Batt = 14
  | TAC of the system ($) | 7856.28 | 7856.28
  | Time elapsed (s) | 22.92 | 8.95
has given the user a choice to select a particular type of battery chemistry (either lithium-ion or lead acid) as per requirements, feasibility, or economy.
Table 4 presents the effect of the % SOC of both battery chemistries (lithium-ion and lead acid) on the optimum number required for the PV-WT-BES system configuration with both optimization algorithms, IPSO and FA.
Table 4 Results of the IPSO and FA methods for the number of lithium-ion (LI) and lead acid (LA) batteries for the 3rd system configuration (PV-WT-BES)

Sr. no. | % SOC of BES (Wh) | N_PV | N_WT | LI BES (N_Batt) | LA BES (N_Batt) | IPSO convergence time (s) | FA convergence time (s)
1 | 1350 (100%) | 12 | 7 | 8 | 15 | 22.9 | 8.95
2 | 1080 (80%) | 13 | 7 | 11 | 20 | 20.42 | 7.61
3 | 810 (60%) | 13 | 11 | 42 | 80 | 14.61 | 8.32
4 | 675 (50%) | 16 | 13 | 60 | 98 | 15.32 | 15.32
5 | 270 (20%) | 19 | 16 | 75 | 120 | 17.96 | 6.24
Tables 2, 3, and 4 show odd numbers of PV panels, wind turbines, and batteries; in practice, even numbers may be preferred after checking technical and economic feasibility. From Table 4, it is seen that if the SOC of both battery chemistries (lithium-ion and lead acid) is reduced from 100% (1350 Wh) to 20% (270 Wh), then the numbers of batteries, PV panels, and WT required to fulfill the same load demand increase for the same system configuration. This observation is valid for both optimization algorithms, IPSO and FA, but FA converges faster than IPSO. Though Table 4 covers the ideal operating % SOC range of both batteries, from 20% (270 Wh) to 100% (1350 Wh), practically the % SOC range of the lead acid battery is from 50% (675 Wh) to 100% (1350 Wh) and that of the lithium-ion battery is from 20% (270 Wh) to 100% (1350 Wh). This is shown in Fig. 6a and b, respectively.
Figure 6 shows the variation in % SOC of the lead acid and lithium-ion batteries over 24 h of a day. It indicates that the range of % SOC for the lithium-ion battery is between 20 and 100%, while that for the lead acid battery is between 50 and 100%.
Fig. 6 a % SOC for lead acid battery (50–100%). b % SOC for lithium-ion battery (20–100%)
5 Conclusions
The paper discussed the optimum sizing of PV, WT, and BES by minimizing the TAC of the hybrid energy system using IPSO and FA. After comparing the results of IPSO and FA with those of the reference paper, it is seen that both methods are reasonably accurate, but FA converges faster than IPSO. For the particular case of Rafsanjan, Iran, it is also seen that out of the three system configurations (PV-BES, WT-BES, PV-WT-BES), the WT-BES configuration gives the optimum numbers of system components with minimum system cost. However, this may not be generalized, as the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES. The impact of battery SOC variations on the optimum sizing of the system components is also analyzed with these two methods. It is observed that as the % SOC of the battery decreases, the optimum battery size increases, which increases the overall system cost. This effect is analyzed for two battery chemistries, lithium-ion and lead acid. The results show that more lead acid batteries than lithium-ion batteries are required to fulfill the same load demand.
References
1 Introduction
Power electronics plays an important role in managing the power between two energy sources [1, 2].
Single inductor-based topologies have been proposed in [3–6]: single inductor-based multiport DC–DC converters of the two main types, step-up and step-down. By reducing the number of conducting devices in each stage, the size of the converter is reduced and the converter becomes more efficient. Single inductor-based multiport topologies, in cascade and parallel multiport converter forms, use multiple power sources to supply a wide range of applications. They have minimal losses and simple control compared with multiple separate converters [5, 7–9].
The purpose of this research is to provide a novel battery management system and
an MPPT controller based on fuzzy logic for an isolated PV system. A new method
is used to develop the energy management system with fuzzy-based MPPT control.
Under varying irradiance, the MPPT always tracks the maximum power point [10, 11].
In this work, a multiple-input/output DC–DC converter is designed for power management between solar PV and an energy storage system. The control technique used is a time-sharing closed-loop control, which maintains the power flow between the solar PV, the hybrid energy storage, and the load. A maximum power point tracking algorithm is used to extract maximum power from the solar PV. The power converter manages the power flow between the solar PV and the energy storage system to give a continuous supply to the load. It also monitors the power generation of the solar PV and the charge–discharge of the energy storage system accordingly. The proposed system is simulated in the MATLAB Simulink environment.
2 Proposed System
Figure 1 displays the block diagram of the proposed work, in which two different sources give a constant supply to the load. The solar PV with energy storage feeds the DC load via a multiport boost converter. The main source for the standalone DC load is solar PV, with batteries operating as energy storage. The control block implements the MPPT algorithm to track the maximum power point of the solar PV panel and the voltage controller to keep a continuous output voltage. The power modulation block is used to generate the required duty signals for the three switches. The generated PWM signals go to the gate driver circuit, which provides isolation between the two voltage levels and supplies the gate pulses required to turn the power converter switches on and off. The power management between solar, battery, and load is done using a state flow-based modified time-sharing control scheme. Solar power depends on external parameters, so energy storage is required for an uninterrupted power supply to the load. The multiport non-isolated boost converter is used for power management between the solar PV, the ESS, and the DC load [12, 13].
3 Solar PV
A solar photovoltaic system uses solar energy to generate the required amount of electrical energy, and a power electronics stage is needed to track the maximum power. Photovoltaic cells with MPPT algorithms are employed to continuously capture the maximum solar energy. At a specific temperature and level of irradiation, the solar PV module's output is determined by the PV voltage and the current drawn by the load. Solar panels can be designed by arranging the solar arrays in series and parallel combinations. From Fig. 2 we can determine the power-to-voltage relation. For hardware purposes, a solar simulator can emulate rated solar panels.
Maximum power point algorithms help to track the MPP in a solar system. Different methods exist, e.g., modified perturb and observe (P&O) and incremental conductance (I&C). The fuzzy logic-based method is simple and efficient. Multirule-based resolution and multivariable consideration for both linear and nonlinear fluctuations of parameters are two characteristics of fuzzy logic control. Additionally, it can function with imprecise inputs. Fuzzification, rule base, inference engine, and defuzzification are the four parts of a fuzzy logic system. Figure 3 shows the fuzzy-based MPPT algorithm [14].
When the solar power is more than the load requirements, the solar PV panel serves as the primary source of supply for both the energy storage and the DC load. Figure 5 shows the multiport converter in double mode operation; when the solar power increases or the load decreases, the system goes into DO mode. In this mode, charging switch S2 is on and switch S3 is off. Through charging and discharging of the inductor, the proposed converter operates in three stages. Switch S1 is turned on during the initial step, and the solar PV charges the inductor L; power flows through this switch via D1, which passes the current, while D2 and D3 block the current. Here, ip and Vp represent the primary source current and voltage, and the inductor current and output load voltage are represented as iL and Vo, respectively. By switching the operation of the switches, we perform this operation.
In this mode, the solar irradiance increases and the solar power becomes more than the load demand, so the proposed converter acts in double output mode, with solar PV as the primary source for both the load and the battery. In this mode only switch 1 and switch 2 operate; the third switch remains off during this stage. In the first stage, the primary source supplies the inductor and the inductor fully charges. In the next stage, the inductor supplies the load while only the first switch is on. Then, by turning the third switch on, the inductor supplies power to the battery. Figure 6 indicates the different stages of the converter in this mode.
5 Proposed Controller
The MPPT algorithm is set to extract the maximum power from the primary source, i.e., solar PV. Solar power is not constant throughout the day. By adjusting the impedance, MPPT helps operate the solar PV close to its maximum power point under varying conditions such as solar irradiance and temperature. Here the simple I&C method is used. The mode selection is important in this control strategy. For the mode selection in Fig. 7, the PV power and load power are taken as inputs, and Msel is the mode selection
signal. The state flow control logic is used to select the appropriate mode according to the excess or deficit of solar power. In the state chart, conditions are attached to the transitions to select the mode. When there is excess solar PV power, Msel gives signal 0, which is DOBM; when there is a deficit of solar power, it gives signal 1, which is DIBM mode. By using state flow, the control becomes easy and we can observe the transitions live. Battery charge control is added to manage the charging and discharging of the battery in the different modes. Lifecycle is the main issue in lithium-ion batteries; by limiting over-charge and over-discharge we can protect the battery's lifecycle. Here the battery can charge only below 80% SOC and can discharge only above 20% SOC.
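The mode-selection and charge-limit logic just described reduces to a few conditions. The sketch below mirrors the state-flow rules (signal 0 for DOBM, 1 for DIBM) and the 20–80% SOC window, with SOC expressed as a fraction; it is an illustration, not the Simulink state chart itself.

def select_mode(P_pv, P_load):
    """Msel: 0 = DOBM (excess solar power), 1 = DIBM (deficit solar power)."""
    return 0 if P_pv > P_load else 1

def battery_action_allowed(soc, mode):
    """Charge only below 80% SOC; discharge only above 20% SOC."""
    if mode == 0:            # excess solar: the battery may charge
        return soc < 0.80
    return soc > 0.20        # deficit solar: the battery may discharge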
6 Results
In this instance, when the solar irradiance rises from 400 to 800 W/m2 and vice versa, the change in solar irradiation results in an increase in solar power from 40 to 70 W. With the system entering DO mode, the solar power is more than the load power, and the mode selection block's output is 1. When the solar power is not sufficient to fulfill the load requirement, both sources supply the load. For system power management, the MPPT duty signal and the voltage control signal serve as inputs to the controller. The battery holds the remaining energy. By charging and discharging the inductor, the solar energy in this case provides power to the load as well as the battery. The mode changes, but the output voltage remains constant (Fig. 8).
7 Conclusion
The modified fuzzy-based MPPT with time-sharing control is used for power flow control between the solar panel and the battery. In this work, a fuzzy-based algorithm is used for MPPT, which is more efficient than the conventional one. The suggested converter provides a constant supply to the load through operation with multiple inputs and outputs. The mode selection logic increases system effectiveness and slows down battery degradation. Simulation under different conditions with the modified control method provides better results.
References
1. Oosthuizen C, Van Wyk B, Hamam Y, Desai D, Alayli Y, Lot R (2019) Solar electric vehicle energy optimization for the Sasol Solar Challenge 2018. IEEE Access 7:175143–175158. https://doi.org/10.1109/ACCESS.2019.2957056
2. Elshafei M, Al-Qutub A, Saif AA (2016) Solar car optimization for the world solar challenge.
In: 2016 13th international multi-conference on systems, signals & devices (SSD), pp 751–756.
https://doi.org/10.1109/SSD.2016.7473675
3. Jiang W, Fahimi B (2011) Multiport power electronic interface—concept, modeling, and
design. IEEE Trans Power Electron 26(7):1890–1900
4. Ki W-H, Ma D (2001) Single-inductor multiple-output switching converters, vol. 1, pp 226–231.
https://doi.org/10.1109/PESC.2001.954024
Ch. Bhavya Sri, Sudeshna Sani , K. Naga Bavana, and Syed. Hasma
Abstract For many decades, several problems, including image identification, image detection, picture categorization, etc., remained unresolved until the growth of machine learning. The most fundamental, traditional, and important subject of research in the area of machine learning has always been image recognition. Image recognition software progresses in society at a faster rate than technology. The protection of personal information, for instance when using mobile phones, depends on picture recognition. For picture recognition, we used the GAN algorithm and the CNN algorithm. To categorize, segment, and recognize images, machine learning-based image preprocessing technology is used. Nevertheless, because of the intricacy of video images and the varied nature of objects in different application scenarios, accurate categorization becomes vital and difficult. Image recognition technologies will be very useful for future generations.
1 Introduction
With the advent of technology, real-time facial gesture detection is becoming increas-
ingly important in the field of human–computer interaction. We employ contact-free and affordable face detection-based approaches [1, 2] to identify face gestures, while the vision-based approaches require only one or more cameras taking images or videos to recognize face movements. Numerous vision-based static approaches for recognizing postures or specific poses, as well as dynamic methods for recognizing a series of postures and facial gestures, have been proposed. Machine learning is a significant and challenging subject for image processing [3, 4], particularly in the field of large-scale image processing, where
machine learning approaches can be used to analyze complex data [5]. Machine learning techniques can extract knowledge from complex data, for example, disease identification from plant leaf image processing [6]. To enable the fair application of image recognition in many domains and industries, the primary features of an image are split [7]. Machine learning-based image processing techniques have been extensively employed in picture classification, segmentation, and recognition [8]. Biometrics is a technique used to measure and examine a person's physical and behavioral traits.
An intriguing innovation in machine learning recently is a method called gener-
ative adversarial networks (GANs). GANs, or generative models, create new data
instances that resemble your training data. For example, GANs have the ability to
create images that resemble photographs of faces with human traits even when the
corresponding faces don’t actually belong to any living thing. The primary input for
a GAN algorithm is random noise. The generator then transforms this noise into
a useful output. By introducing noise and sampling from various points across the
target distribution, we may make the GAN provide a broad range of data.
CNN is a well-liked and efficient pattern detection and image processing approach.
It has many benefits, such as adaptability, a simple structure, and reduced training
requirements. Spatial correlations found in the input data are used by CNN. Each
concurrent layer of the neural network is coupled to certain input neurons. The area
is referred to as the “local receptive field”. The focal point of the local receptive field
is a hidden neuron. CNNs, often referred to as convolutional neural networks, are
a class of artificial neural networks used in deep learning and are frequently used
for object and image recognition and categorization [9]. As a result, deep learning
employs a CNN to recognize objects in a picture.
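As a concrete illustration of the structure described above (local receptive fields feeding pooled feature maps and a classifier), a minimal PyTorch CNN might look as follows; the layer sizes and the assumed 64 × 64 RGB input are arbitrary choices for illustration, not the architecture used in this paper.

import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x3 local receptive fields over RGB
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve the spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 inputs

    def forward(self, x):
        x = self.features(x)                   # convolutional feature extraction
        return self.classifier(x.flatten(1))   # flatten and classify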
One of the best image analysis tools that have recently achieved prominence in our
surveillance and security-related applications is face recognition [10]. It requires
verifying someone’s identity by looking at their face. Based on the subject’s face
features, such as their eyes and nose, it captures, assesses, and contrasts patterns.
System access is granted, and authentication is put into place. It uses a human face’s
biometric patterns as part of its biometric identification mechanism.
Efficient real-time data is safer than data derived from a static image.
is a commonly used method that uses biometrics to map facial attributes from our
database. Face verification is a method for comparing two faces to find the correct
person.
2 Literature Review
Zhu et al. in [11] used data complexity for the generation of contextual synthetic data. In this work, a length-of-class-boundary calculation technique is used to compute the data complexity (DC); the length of the class boundary decides the complexity of the data. The dimensionality may also influence classifier accuracy. The use of synthetic datasets is useful for analyzing an algorithm in a controlled scenario. Several geometrical descriptors have been defined in several studies by identifying characteristics of the datasets. These descriptors were found useful for understanding classifier performance. This approach can be useful in identifying the performance of the algorithm under different degrees of class imbalance. The authors arrived at the expression for data complexity calculation given in Eq. (1) and took the example of generating minimum spanning trees with different data complexities.
p = b × (n − 1) (1)
where
n: number of instances,
b ∈ [0, 1]: length of class boundary (desired complexity), and
p: number of edges connecting different classes.
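For instance, with n = 101 instances and a desired boundary complexity b = 0.2, Eq. (1) gives p = 0.2 × (101 − 1) = 20 edges connecting instances of different classes in the minimum spanning tree.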
Yuan in [12] applied visual attention-based networks for the synthesis of the
image. Here the main objective is to convert magnetic resonance (MRI) images
to computed tomography (CT) images using the fully convolutional networks, to
reduce the side effects for the patient because of the radiation due to CT scan. An
MR input picture is first divided into overlapping areas, and the generator is then
used to forecast the associated CT patch for each patch. The first method proposed in
this paper is supervised GANs, which consist of a network that contains a generator
for predicting the CT and a discriminator for separating the genuine CT from the
generated CT. Usually, GANs have two network-generators (G) and discriminators
(D). In order to reduce the binary cross entropy (BCE) between D’s decisions and the
appropriate label (real or synthetic), we minimize the BCE between D’s decisions and
the correct label (real or synthetic). Generators are FCNs that generate images, and
discriminators are CNNs that calculate the likelihood that the input image was created
from real images (real or synthetic). The second approach presented is auto-context
model (ACM) for refinement.
Wang et al. in [13, 14] used synthetic data for image segmentation. Synthetic data
is vital because it can be generated to meet specific requirements or conditions that
are not available in existing (real) data. The technique for synthetic data generation
is GAN. GAN is an unsupervised task in machine learning. Generative adversarial
networks consist of two models that automatically discover and learn the patterns
from the input data. The generator and discriminator models run in competition with each other to generate new records, examine records, and classify the variances within a dataset.
The self-attention generative adversarial network (SAGAN), which provides attention-driven, long-range dependency modeling for image tasks, was employed by Zhang, Han, and colleagues [15]. Generative adversarial networks (GANs) with traditional convolutional architectures generate high-resolution details only from spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations, and the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Recent studies have also revealed that the performance of GANs is impacted by the conditioning of the generators; building on this insight, spectral normalization is applied to the GAN generator to check whether the training dynamics are improved. The proposed SAGAN [16] performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and decreasing the Fréchet inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers demonstrates that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
3 Experimental Investigation
After a literature review, we identified the most effective technique for producing the necessary high-quality photographs. According to our research, the optimal method for producing as many images as needed is GANs, or generative adversarial networks [17]. To create new, artificial instances of data that can be mistaken for genuine data, the algorithmic structures known as GANs use two neural networks in competition with one another. There are typically two networks in a GAN: a generator G and a discriminator D, which can distinguish between genuine and synthetic images, both trained simultaneously. Whereas the generator G is an FCN that creates images, the discriminator D is a CNN that estimates the likelihood that an input image is taken from a real image. D is trained to distinguish between actual and artificial data, whereas G is trained to create realistic visuals that will deceive D [18].
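A minimal PyTorch rendering of this adversarial training setup is sketched below; the network shapes, learning rates, and flattened 28 × 28 images are placeholder assumptions, not the configuration used for the celebrity dataset.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())        # generator: noise -> flat image
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())       # discriminator: image -> P(real)
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    """One adversarial step; real is a batch of flattened images scaled to [-1, 1]."""
    b = real.size(0)
    fake = G(torch.randn(b, 100))                        # generator turns random noise into images
    # D learns to assign label 1 to real data and 0 to synthetic data
    loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # G learns to make D output "real" (label 1) on its samples
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()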
We need to identify the criteria used to obtain the necessary quality photographs
from our experimental research. With the celebrity dataset, we initially trained the
model. It includes pictures of famous people. We can produce the photos as needed.
The dataset of photos of celebrities for this article is taken from Kaggle. The sample
photos that are included in the collection are described in Fig. 3. The number of
photos provided in the dataset affects how well this training model performs.
4 Result Analysis
The main criterion used to evaluate the effectiveness of any face identification algo-
rithm is the obtained accuracy of the match. The accuracy is calculated using the
algorithm’s ability to recognize face input. The percentage of the match is displayed.
It is essential for the algorithm to show the closest percentage of matches.
Pie Chart and Bar Chart. Pie charts are one of the most well-known and often-
used ways of data visualization and are utilized in a wide range of applications. This
pie chart shows how well-known each celebrity is because of their movies. This pie
chart is also easy to understand and apply when studying the case study. It can be an
effective tool for communicating with even the most ignorant audiences because it
graphically represents data as a small portion of a larger total. It makes it possible for
the audience to easily understand information or compare data in order to undertake
analysis. The bar chart’s bar lengths display how each group stacks up against the
value. When there are too many categories present, the bar’s labeling and clarity can
become a problem. This bar graph displays the celebrity’s level of online popularity
(Fig. 5).
Correlation Matrix. A correlation matrix is a table that exhibits correlation coefficients among several variables. Each cell's color represents whether and to what degree two variables are related to one another. Correlation matrices can be used to summarize huge datasets and
find patterns. A correlation matrix may be used in business to investigate the connec-
tions between different product-related data items, such as the launch date, etc. This
matrix displays the facial expressions of famous people. Additionally, the celebrity’s
name is displayed (Fig. 6).
Scatter Plot Matrix. The dataset’s scatter plot matrix displays all pairwise scatter
between different variables as a matrix for k sets of variables or columns, such as
(x1, x2…xk), along with their names in this scatter plot matrix. Many relationships
Image Classification Model Based on Machine Learning Using GAN … 155
between variables can be examined in one chart by scatter plots. We may generally
assess if there is a linear link between several variables by using scatterplot matrices.
This is especially useful for identifying particular variables that might correlate with
your genomic or proteomic data. To display bivariate correlations between different
combinations of variables, scatter plots are arranged in a grid (or matrix). Numerous
associations can be investigated in a single chart thanks to the scatter plots in the
matrix, which each show the link between a pair of variables. As a result, the scatter
plot matrix has k rows and k columns for each of the k variables in the dataset. Each
row and column represents a scatter plot. Additionally, this scatter matrix graph
demonstrates how the celebrity’s facial emotions are displayed together with their
names (Fig. 7).
On the vertical axis, variable xj.
On the horizontal axis, variable xi.
6 Conclusion
out physical data growth. Although data is all around us, tagged data is uncommon.
Similar to other fields, collecting data for picture recognition is simpler, but doing
so manually requires a lot of time and effort.
References
1. Sukmandhani AA, Sutedja I (2019) Face recognition method for online exams. In: International
conference on information management and technology (ICIMTech), Jakarta/Bali, Indonesia,
pp 175–179
2. Venkateswar Lal GR, Nitta AP (2019) Ensemble of texture and shape descriptors using support
vector machine classification for face recognition. Ambient Intell Humaniz Comput
3. Fayyoumi A, Zarrad A (2014) Novel solution based on face recognition to address identity
theft and cheating in online examination systems. Adv Internet Things 4(3):5–12
4. Bah SM, Ming F (2020) An improved face recognition algorithm and its application in
attendance management system. Array 5
5. Kranthikiran B, Pulicherla P (2020) Face detection and recognition for use in campus
surveillance. Int J Innovative Technol Exploring Eng 9(3)
6. Mitra D, Gupta S (2022) Plant disease identification and its solution using machine learning.
In: 2022 3rd international conference on intelligent engineering and management (ICIEM),
London, United Kingdom, pp 152–157. https://doi.org/10.1109/ICIEM54221.2022.9853136
7. Kamencay P, Benco M, Mizdos T, Radil R (2017) A new method for face recognition using
convolutional neural network. Digital Image Process Comput Graphics 15(4):663–672
8. Traoré YN, Saad S, Sayed B, Ardigo JD, de Faria Quinan PM (2017) Ensuring online exam
integrity through continuous biometric authentication
9. Sani S, Bera A, Mitra D, Das KM (2022) COVID-19 detection using chest X-Ray images based
on deep learning. Int J Softw Sci Comput Intell (IJSSCI) 14(1):1–12. https://doi.org/10.4018/
IJSSCI.312556
10. Traoré I, Awad A, Woungang I (eds) (2017) Information security practices. Springer, Cham
11. Zhu C, Zheng Y, Luu K, Savvides M (2017) CMS-RCNN: contextual multi-scale region-
based CNN for unconstrained face detection. In: Bhanu B, Kumar A (eds) Deep learning for
biometrics. Advances in computer vision and pattern recognition. Springer, Cham
12. Yuan Z (2020) Face detection and recognition based on visual attention mechanism guidance
model in unrestricted posture. In: Scientific programming towards a smart world
13. Wang B, Chen LL (2019) Novel image segmentation method based on PCNN. Optik 187:193–
197
14. Wang K, Zhang D, Li Y et al (2017) Cost-effective active learning for deep image classification.
IEEE Trans Circ Syst Video Technol (99):1–1
15. Zhang H et al (2019) Self-attention generative adversarial networks. International conference
on machine learning. PMLR
16. Merrigan A, Smeaton AF (2021) Using a GAN to generate adversarial examples to facial image
recognition. ArXiv. https://doi.org/10.2352/EI.2022.34.4.MWSF-210
17. Cheng F, Hong Z, Fan W et al (2018) Image recognition technology based on deep learning.
Wireless Pers Commun C:1–17
18. Lin BS, Liu CF, Cheng CJ et al (2018) Development of novel hearing aids by using image
recognition technology. IEEE J Biomed Health Inf 99:1-1
19. Zhang XB, Ge XG, Jin Y et al (2017) Application of image recognition technology in census of national traditional Chinese medicine resources. China J Chinese Materia Medica 42(22):4266
20. Sun D, Gao A, Liu M et al (2015) Study of real-time detection of bedload transport rate using
image recognition technology. J Hydroelectric Eng 34(9):85–91
21. Aggarwal A, Mittal M, Battineni G (2021) Generative adversarial network: An overview of
theory and applications. Int J Inf Manag. 100004. https://doi.org/10.1016/j.jjimei.2020.100004
Role of Natural Language Processing
for Text Mining of Education Policy
in Rajasthan
Abstract The knowledge of education policy will bring an array of new growth, but it necessitates an improved type of human–machine intercommunication, in which the machine exhibits thoughtful and interactive intelligence. Natural language processing (NLP), a part of artificial intelligence (AI), is the competence of a computer program to comprehend spoken and written human language (https://www.linguamatics.com/what-text-mining-text-analytics-and-natural-language-processing; Zhang and Segall in IJITDM 7(4):683–720, (2008)) [1, 2]. In mining, one should have the sagacity to anticipate the intent of a policy (Bhardwaj in Int J Eng Res Technol (IJERT) 1(3), 2012; Maes in Commun ACM 7:30–40, 1994) [3, 4]. Using NLP, this work provides a quick way of extracting information about education policy. This paper focuses on manipulating NLP commands after data collection using unstructured interviews about attitudes toward NLP, and then filling out a website questionnaire form to collect the satisfaction result. Coding is executed to get the required data using Python and NLP. During the analysis of feedback at colleges in Jaipur, Rajasthan, satisfaction with using NLP commands was revealed, so it is observed that NLP creates a convenient way of mining. The goal behind this text mining is to identify the importance of NLP in getting data into an integrated form. Lastly, in the execution phase, the paper narrates the process of obtaining the cognition needed to extract data about policies.
1 Introduction
Text mining is one of the AI techniques. It enlists NLP to convert unstructured text into a format suitable for data analysis. Data on the web is mainly in unstructured format [5, 6]; unstructured data is inputted into models to get predictions. NLP is a sub-part of data science that consists of processes for intelligently processing, interpreting, and extracting knowledge from text data. NLP and its components can be used to organize large amounts of data, perform various automated tasks, and solve a variety of problems. Important tasks of NLP are text classification, text matching, and co-reference resolution. Text mining is a technique for reviewing the records of a large group to find knowledge in the data, and it is broadly useful for knowledge discovery [7–10]. It uncovers interrelationships within large document collections. To process the text, text mining can be used together with NLP. Text mining produces structured data that can be incorporated into databases [11–15].
big data visualization, etc., are handled using dimensionality reduction. This is an unsupervised technique in which unlabeled groups of similar entities are processed; problems such as image compression, recognizing fake newscasts, filtering unsolicited messages, advertising mechanisms, systematizing web marketing, detecting fraudulent or delinquent activities, and recording surveys are solved by it [18].
National Education Policy 2020 includes nearly 2 lakh suggestions from 2.5 lakh gram panchayats, 6600 blocks, 6000 urban local bodies, and 676 districts. By 2030, this new policy aims to universalize education from pre-school to the secondary level. There is a strong emphasis on foundational literacy. Vocational education will begin in Grade 6 with internships, and until Grade 5, teaching will be in the student's native language. NEP 2020 divides the 10 + 2 system into the 5 + 3 + 3 + 4 format. Flexibility will be added to the higher education curriculum [19–21]. Medical education will be integrated with Ayurveda, Naturopathy, Unani, Homoeopathy, and Siddha, and vice versa, at the undergraduate level, according to the education policy [22].
2 Methodology
NLP is applied for cleaning and summarizing text, tokenizing sentences and words,
getting the frequency of words, etc. There are some steps in text mining for deriving
meaningful information when manipulating NLP with Python code [23] (Figs. 2, 3,
4, 5, 6, 7, 8, 9).
#Installing NLTK (Natural Language Toolkit)
C:\Users\HP\AppData\Local\Programs\Python\Python39>python
>>> import nltk
>>> nltk.download()
Showing info https://raw.githubusercontent.com/nltk/nltk_data/
gh-pages/index.xml
#Working with tokenization in NLP
Fig. 1 Flowchart of the approach: if the text is purposeful, apply NLP with Python; otherwise, stop execution
>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> tokens = word_tokenize(text)   # text: the policy passage loaded earlier (assumed)
>>> fdist = FreqDist(tokens)
>>> fdist1 = fdist.most_common(9)
>>> fdist1
[('the', 2), ('.', 2), ('According', 1), ('to', 1), ('NEP', 1), ('2020', 1), (',', 1), ('it', 1), ('has', 1)]
# Opening a jupyter notebook
Fig. 5 Classifying words using POS-tagging, tagged token and Brown Corpus
Fig. 9 Importing sent_tokenize() and word_tokenize() from nltk.tokenize package using Beautiful
Soup
C:\Users\HP\AppData\Local\Programs\Python\Python39>jupyter
notebook
[W 14:47:40.293 NotebookApp] Terminals not available (error was No
module named ’winpty.cywinpty’)
[I 14:47:40.543 NotebookApp] Serving notebooks from local direc-
tory: C:\Users\HP\AppData\Local\Programs\Python\Python39
[I 14:47:40.543 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 14:47:40.543 NotebookApp] http://localhost:8888/
?token=85319cedbe702cff61e821a7e71b767c23e5c6db032d48ef
[I 14:47:40.559 NotebookApp] or http://127.0.0.1:8888/
?token=85319cedbe702cff61e821a7e71b767c23e5c6db032d48ef
[I 14:47:40.559 NotebookApp] Use Control-C to stop this server and
shut down all kernels (twice to skip confirmation).
[C 14:47:40.637 NotebookApp]
To access the notebook, open this file in a browser: file://
/C:/Users/HP/AppData/Roaming/jupyter/runtime/nbserver-1700-
open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=85319cedbe702cff61e821a7e71b767c
23e5c6db032d48ef
or http://127.0.0.1:8888/?token=85319cedbe702cff61e821a7e71b767c
23e5c6db032d48ef
[W 14:49:57.733 NotebookApp] 404 GET /undefined/undefined (::1)
22.060000ms referer=None
[I 14:53:45.992 NotebookApp] Creating new file in
[I 14:53:46.054 NotebookApp] Creating new notebook in
[I 14:53:46.443 NotebookApp] Creating new notebook in
[I 14:53:46.683 NotebookApp] Creating new notebook in
[W 14:53:46.939 NotebookApp] 404 GET /undefined/undefined (::1)
29.570000ms referer=None
3 Results
4 Conclusion
References
Multilingual and Cross Lingual Audio
Emotion Analysis Using RNN
Abstract Speech and language are necessary for a person's development of their
emotional and social skills. Reading, writing, and verbal comprehension are vital
components of the overall learning process, and hence instruction or some linguistic
representation is necessary for academic achievement. This study examines several
audio files based on various linguistic methodologies and compares multilingual
and cross-lingual approaches. For our experiments with the English, German, Italian,
and Canadian-English emotion datasets, we chose CNN and LSTM as our models.
We obtained the most outstanding results on Canadian English in the single-language
trial: 99.12% with CNN and 98.96% with LSTM. In the multilingual experiment, the
English test data achieved 98.19% with CNN and 93.41% with LSTM. Lastly, in the
cross-lingual setting for Canadian English, we obtained 93.25% with the CNN model
and 98.12% with the LSTM model.
1 Introduction
Research into multilingual and cross-lingual audio emotion analysis seeks to provide
methods and algorithms for detecting and interpreting emotional information in voice
data across various languages. There are many potential uses for emotion analysis
in fields including medicine, education, customer service, and even the entertain-
ment industry. Recent years have seen substantial advancements in the accuracy and
robustness of emotion detection in spoken language, thanks to deep learning models,
particularly recurrent neural networks (RNNs) [1]. Models that can reliably detect
and interpret emotions in voice data across many languages are the goal of multilin-
gual and cross-lingual audio emotion analysis. It is difficult because some languages
may not have enough training data, while others may have emotional expressions
and cultural conventions that differ significantly from English. Emotions can be
represented in many ways, including intonation, vocabulary choice, and grammat-
ical constructions, but these vary from language to language. One form of deep
learning valuable model in voice emotion recognition is recurrent neural networks
(RNNs) [2]. RNNs capture the temporal relationships and long-term dependencies
in sequential data because of their architecture for handling such data. It includes
speech signals and text. While emotions are generally expressed over time and may
be impacted by earlier events or context, this is especially crucial for speech emotion
identification. Several methods using RNNs to analyse emotional content in audio
have been developed for use with languages other than English. Using a single model
that can process different languages is one option. A multilingual RNN is one such
model. A model of this type may be taught to recognise emotions in any language by
being exposed to data in those languages. Another strategy involves creating indi-
vidual models for each language and developing ways to share information, like
transfer learning or multitask learning. It allows the models to learn from each other
and share characteristics, increasing their performance across languages [3].
The scarcity of annotated data in some languages is a problem for multilingual
and cross-lingual audio emotion analysis. Training and assessing machine learning
models need annotated data, which can be challenging and expensive. Using unsuper-
vised or semi-supervised learning methods, which can make use of either completely
unlabelled data or only a small quantity of labelled data, is one way to tackle this
problem. Other strategies include unsupervised pre-training, in which the model
is initially trained on a large body of unlabelled data before fine-tuning on a smaller
amount of labelled data. Consideration of cultural differences in emotional expres-
sion is another obstacle in multilingual and cross-lingual audio emotion analysis [4].
Cultural norms about the display of emotion vary from one society to the next. It is
possible to tackle this issue by incorporating cultural knowledge or creating models
that can adapt to varied cultural contexts.
Finally, cross-cultural and multilingual audio emotion analysis is a promising
field with numerous practical applications. There are several methods for creating
models that can process many languages, and RNNs are a potent tool for speech
emotion recognition. However, several obstacles must be overcome, such as the
need for more annotated data across languages and the requirement to consider
cultural variations in affect. More thorough and reliable models for assessing emotions in spoken
language across languages and cultures will require further study in this field.
Detecting emotion is one of today’s most critical marketing strategies. We could
see a person’s feelings from their speech. Speech emotion recognition was a tech-
nology that extracted emotional features from speech signals and compared and
2 Related Work
In 2021, Saad et al. [7] used the TESS database to analyse language-independent
vocal features in Bengali and English. They made verbatim translations between the
two languages; for example, the sample "Say the word Read" becomes "Poro ti bolo"
in Bengali. In this work, a support vector machine is incorporated, with 50 audio
samples across six different emotions: happy, angry, neutral, sad, disgusted, and fear.
The overall recognition rate for Bangla, English, and Canadian English TESS was 88.3%, 85%,
and 93.3%, respectively. Dupuis et al. [8] conducted this experiment with 56 under-
graduate students from the University of Toronto in 2011. They listened to speeches
delivered by both young and old speakers and identified the emotion of the speakers.
Overall, the accuracy was 82%. The Ravdess database includes eight emotions:
happy, sad, angry, calm, fearful, surprised, neutral, and disgusted. The following
research has been conducted using the same. In 2016, Shegokar and Sircar [9] used
the Ravdess database to build a quadratic SVM with a five-fold cross-validation
technique. Their accuracy rate was 60.1%, limited by the use of male-only sample
speech and of selected features in the SVM. In 2016, Zhang et al. [10] showed
that accuracy could reach 57.14% when the dataset uses a song-to-speech relationship
with group multitask features on only four emotions: angry, happy, neutral, and sad.
In 2017, Zhang [11] fed spectrograms from songs and speech into multitask gated
residual networks (GResNets), claiming the model was task-specific, with an accuracy
of 65.97%.
In 2017, Popova [12] used convolutional neural network VGG-16 as a classifier;
they obtained 71% accuracy when Mel-spectrogram was taken from the speech. The
German Emo-DB database presents the emotions of boredom, anger, sadness, fear,
disgust, neutrality, and happiness, and contains 535 audio files. In 2010, Luengo
et al. [13] used the German Emo-DB database with spectral, intonation, and intensity
regression features, sentence-end features, voice quality features, statistics, and
speech rate. The accuracy rate of their research was 78.30%, despite
the database having 535 utterances at 8 kHz with 16 bits per sample, rather than
the original 16 kHz. According to Wu et al. [14], modulation spectral features and
prosodic features with multiclass linear discriminant analysis (LDA) classifiers
achieve an accuracy rate of 85.8%. On the other hand, in 2012 Lampropoulos and
Tsihrintzis [15] found that combining MPEG-7 descriptors, MFCCs, and timbral
features using an SVM with an RBF kernel under leave-one-out evaluation results
in 83.93% accuracy. In 2014, Pohjalainen et al. [16] used the first and second
derivatives of MFCCs; their GMM classifier produced an accuracy rate of 68.49%,
and they claim improvement is possible if the training data is selected when training
the GMM-based model. In 2014, Huang et al. [17] used a CNN with an SVM classifier
in the last layer, utilising spectrograms, and reported 88.3% accuracy when including
the speaker and 85.2% accuracy when the speaker is excluded.
In 2019, Latif et al. [18] worked with Emo-Vo using eGeMAPS features and an SVM
classifier; accuracy was found to be 61.8% without data augmentation.
Haider et al. [19] worked with emobase and eGeMAPS features and an SVM
algorithm for classification and found 80% accuracy.
3.1 Database
The TESS dataset consists of 2800 audio files based on 200 target utterances,
recorded by two female actors aged 26 and 64 years across seven emotions: neutral,
pleasant surprise, anger, disgust, happiness, sadness, and fear. The TESS database
language is Canadian English, and all audio files are in wav format.
The Emo-Db database is a German emotional database comprising five male and five
female actors with 535 unique utterances. This database covers seven emotions:
neutral, disgust, sadness, boredom, anger, joy, and fear. The data was recorded at
48 kHz and downsampled to 16 kHz, and all files are in wav format.
The RAVDESS database consists of audio and video files, 2880 in total, from 24
actors (12 male and 12 female). We chose only the audio files and worked on 1440
of them in our experiment, with 60 trials per actor. It contains eight emotions
(neutral, calm, happy, sad, angry, fearful, disgusted, and surprised) with two levels
of intensity (normal and strong).
Several data augmentation techniques have been applied in our work, including
adding white noise, shifting the sound, stretching the sound, and changing the sound
pitch. Using all these methods, the database size increases fourfold. We trained on
augmented data and tested on original data in every mono-lingual, multilingual,
and cross-lingual experiment.
The effectiveness of machine learning models can be enhanced by data
augmentation, an effective method for expanding the amount and variety of
a dataset. Methods for improving audio recordings are as follows.
Adding white noise to sound data can create the impression of traffic noise or other
forms of background noise. Adding a randomly generated noise signal to the source
audio would accomplish this. When we talk about “time shifting,” we’re referring
to the practice of modifying the beginning or end of sound transmission, e.g., a new
start or ending time can be chosen randomly, and the audio signal can be trimmed or
padded as needed. Sound stretching is altering the length of an audio transmission
without altering its pitch. Time stretching algorithms, which change the playback
speed of the audio signal, can be used for this purpose.
A signal’s frequency content must be modified to modify the pitch of a sound.
Pitch-shifting algorithms can be applied to the audio stream to alter its pitch without
affecting its pace or timing.
These methods can be used singly or in tandem to provide a significant and varied
dataset for use in machine learning. Sound data can also be improved using other
ways, including filtering, equalisation, and modulation.
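A minimal sketch of these four augmentation methods using librosa and NumPy; the parameter values and the input file name are illustrative assumptions, not the exact settings used in our experiments:
# Audio augmentation sketch (assumed parameters)
import numpy as np
import librosa

def add_white_noise(y, noise_factor=0.005):
    # Add a randomly generated noise signal to the source audio
    return y + noise_factor * np.random.randn(len(y))

def time_shift(y, sr, max_shift_s=0.2):
    # Move the start of the signal by a random offset (a cyclic roll
    # stands in for choosing a new start time and trimming/padding)
    shift = np.random.randint(-int(sr * max_shift_s), int(sr * max_shift_s))
    return np.roll(y, shift)

def stretch(y, rate=1.1):
    # Alter the duration of the signal without altering its pitch
    return librosa.effects.time_stretch(y, rate=rate)

def pitch_shift(y, sr, n_steps=2):
    # Alter the pitch without affecting pace or timing
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

y, sr = librosa.load("sample.wav", sr=None)  # hypothetical input file
augmented = [add_white_noise(y), time_shift(y, sr), stretch(y), pitch_shift(y, sr)]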
3.3.1 MFCC
Mel-frequency cepstral coefficients (MFCCs) represent the short-term spectrum of
speech on the Mel scale, which approximates human pitch perception, by mapping
the resulting spectral information onto this scale. The coefficients are calculated by
a discrete cosine transform (DCT) of the Mel-spectrum, translating it into the cepstral
domain. The initial step in computing MFCCs is to pre-emphasise the signal by
amplifying its high-frequency components to increase the signal-to-noise ratio of the
spoken signal. After that, a window function is applied to each overlapping signal
frame to lessen spectral leakage in the representation of the spectral properties of
the spoken stream.
Speech recognition, speaker identification, and music genre categorisation are just
some of the audio processing tasks that have significantly benefitted from using MFCCs.
In addition, they are helpful for classification tasks that rely on machine learning
because they offer a compact and robust representation of the spectral features of the
speech signal.
Overall, in speech and audio signal processing, Mel-frequency cepstral coeffi-
cients (MFCCs) are a standard feature extraction method. They are helpful for anal-
ysis and classification because they offer a condensed representation of a signal’s
spectrum characteristics.
MFCC features perform very well in identifying monosyllabic words and spoken
sentences. Moreover, the MFCC spectrogram reveals patterns by which words,
emotions, and the like can be quickly identified. Speech signals commonly contain
tones of varying frequencies, each with an actual frequency f (Hz), while the
subjective pitch is measured on the Mel scale; the signal is therefore converted into
a Mel-spectrum. The processing chain is as follows: first, noise reduction is applied;
then framing and windowing; the speech is then passed through a fast Fourier
transformation, followed by log-energy computation, conversion to the Mel scale,
and finally a discrete cosine transformation. Fast Fourier transform (FFT)
calculations determine each frame's power spectrum, which is filtered through a
series of triangular filters to produce a Mel scale representation. The filter bank
mimics the frequency resolution of the human ear, which is finer at lower
frequencies. Once the spectrum has been converted to the Mel scale, the logarithm
of each filter bank output is calculated, and the discrete cosine transform (DCT)
is then applied to the result, giving the estimated cepstral coefficients [22, 23].
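A minimal sketch of this extraction with librosa, which internally performs the framing, windowing, FFT, Mel filtering, log compression, and DCT described above; the file name, sample rate, and time-averaging are illustrative assumptions (40 coefficients match the 40×1 features used later in this chapter):
# MFCC extraction sketch (assumed file and settings)
import numpy as np
import librosa

y, sr = librosa.load("speech_sample.wav", sr=16000)  # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)   # shape (40, n_frames)
features = np.mean(mfcc, axis=1)                     # fixed-length 40x1 vector
print(features.shape)  # (40,)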
Convolutional neural networks (CNNs) excel at the visual recognition of still and
moving images. Convolution detects graphic patterns: a filter or kernel extracts
edges, textures, and shapes, and pooling layers reduce the size of the extracted
feature maps. The output is then passed through a nonlinear activation function
such as ReLU. CNNs use convolutional, pooling, and fully connected layers:
convolutional layers extract picture features, while fully connected layers map the
extracted features to the output classes. During training, the CNN optimises filter
weights and bias settings to produce the most accurate output for any input image.
Backpropagation propagates the difference between the predicted and actual output
from layer to layer, updating weights and biases.
CNNs excel at object detection, facial recognition, and handwritten digit recognition.
They are also employed in natural language processing, speech recognition, and
drug discovery. CNNs' ability to learn features automatically, without manual
feature engineering, is advantageous, and they can also handle hazy, distorted, and
differently sized photographs.
CNNs are effective, but they require a lot of data and computing. In addition,
adversarial attacks, in which even minor changes to a picture might fool the network,
may also affect them. These reasons make convolutional neural networks a popular
and successful deep learning model for image and video recognition. They excel in
finding objects and patterns in dense visual data by automatically learning valuable
attributes from input pictures.
Convolutional neural networks have three fundamental elements besides the input
layer. The convolution layer produces a feature map of the extracted features; the
pooling layer's principal task is downsampling, with max pooling keeping the
strongest pixel; and the flattening layer flattens the previous layer's output into a
vector, as sketched below.
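A minimal Keras sketch of these three elements (convolution, pooling, flattening) applied to a 40×1 MFCC input; the layer sizes and the seven-class output are illustrative assumptions, not the exact architecture used in our experiments:
# 1D-CNN sketch with the three fundamental elements (assumed sizes)
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(40, 1)),                          # 40 MFCC features
    layers.Conv1D(64, kernel_size=5, activation="relu"),  # feature map
    layers.MaxPooling1D(pool_size=2),                     # down-sample, keep strongest values
    layers.Flatten(),                                     # flatten to a vector
    layers.Dense(64, activation="relu"),
    layers.Dense(7, activation="softmax"),                # one unit per emotion class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])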
LSTM is a particular version of the RNN model which solves the short-term memory
problem. An LSTM combines long-term and short-term memory through three gates:
a forget gate, an input gate, and an output gate. The forget gate applies a sigmoid
function to the previous hidden state h(t−1) and the current input x(t) to decide what
to discard from the cell state. The input gate decides what to add to the cell state:
a tanh layer produces candidate values in the range −1 to +1, which are multiplied
by a sigmoid regulatory filter before being added. The output gate passes the filtered
cell state to the hidden state for further use, and it operates in three steps:
• Creating a filter
• Multiplying with the filter
• Transferring the result from the cell state to the next hidden state
Figure 1 represents a diagram of the LSTM model.
The design of our proposed LSTM model is shown in Fig. 2. The initial layer
consists of 512 filters with kernel size 5 and stride 1, followed by batch normalisation
and ReLU; a dropout rate of 0.1 is applied. The subsequent layers narrow to 256,
128, and 64 units. We consider 1D data with three dense layers, and the optimiser
used is stochastic gradient descent (SGD) with a momentum of 0.9 and a decay of
1e−6. The dense layers are set according to our classification goal. Each layer
consists of filters that transform the input data; the hyperparameters are the filter
size (F) and stride (S), and the output is called a feature map. In this work, we have
used 40×1 hand-picked MFCC features.
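A hedged Keras sketch of the configuration described above (Conv1D front end with 512 filters, kernel size 5, stride 1, batch normalisation, ReLU, dropout 0.1, dense layers of 256, 128, and 64 units, and SGD with momentum 0.9); the LSTM width, learning rate, and class count are assumptions not specified in the text:
# Proposed-model sketch (assumed LSTM width and learning rate)
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(40, 1)),                      # 40x1 MFCC input
    layers.Conv1D(512, kernel_size=5, strides=1, padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.1),
    layers.LSTM(128),                                 # assumed width
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(7, activation="softmax"),            # emotion classes
])
# The text specifies a decay of 1e-6; newer TF releases express this as a
# learning-rate schedule (or via tf.keras.optimizers.legacy.SGD).
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])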
Figure 3 represents the proposed model framework, in which we show only the
RNN (LSTM) model. In the first step, we process the audio and extract features
from it; we used only MFCC features. After that, we applied k-fold cross-validation
to form the train and test sets. Finally, we fed the train and test data into the selected
LSTM model and classified the emotions.
In the TESS LSTM model, accuracy was 93% on the original and 98.96% on the augmented database.
In the Emo-Db emotional database, the total number of utterances is 535; after
augmentation the total number of phrases is 2140, the training data is 1712, and
the test set with original data is 107. In Emo-Db we experimented with CNN,
obtaining 70.37% on the initial database and 92.96% on the augmented database.
An LSTM experiment produced 71.27% on the original database and 85.38% on
the expanded database. In the RAVDESS database, the total number of speech
utterances is 1440; after augmentation it was 5760, the training data for augmentation
was 4608, and the testing data was 288 statements taken from 20% of the original
data. We experimented with CNN on the RAVDESS dataset, obtaining 70.46% on
the original dataset and 91.80% on the augmented dataset; with the LSTM model,
we secured 75.35% on the original dataset and 90.38% on the expanded dataset.
There are 585 utterances in the Emo-Vo emotional database; after data augmentation,
the total number of files is 2340, and the test data is 117 files from 20% of the
original dataset. We experimented with the Emo-Vo dataset and found 73.47% with CNN on the original dataset and
90.87% on the augmented dataset. In the multilingual experiment, training used only
English while testing used all languages. The number of utterances used for training
was 4608 in the augmented dataset, and testing used 20% of the original data from
all languages, i.e., 1172 utterances. We ran experiments with the CNN and LSTM
models on both the original and augmented data: the CNN achieved 72.59% on the
original data and 98.19% with data augmentation, while the LSTM achieved 87.88%
and 93.41% on the original and augmented datasets, respectively. In the cross-lingual
experiments, training used all the linguistic databases and testing used individual
linguistic datasets. The augmented training data comprised 16,866 utterances, and
testing used individual language databases: 107 German utterances, and similarly
100 French, 117 Italian, and 560 Canadian English utterances. Many experiments
were performed on the original cross-lingual dataset using CNN and LSTM, training
with all languages of the emotional databases and testing with individual languages
(German, English, Italian, and Canadian English); the accuracy found for the CNN
was 68.25%, 70.38%, 67.35%, and 90.25%, respectively, and for the LSTM 52.87%,
65.25%, 70.21%, and 87.88%, respectively. We also ran augmented experiments on
the cross-lingual database with both CNN and LSTM models, with accuracies of
78.12%, 78.04%, 72.25%, and 93.25% for the CNN model and 72.18%, 87.88%,
92.24%, and 98.12% for the LSTM, respectively. Table 2 presents the results of the
mono-lingual, multilingual, and cross-lingual experiments.
We have chosen mono-lingual databases for our general studies and compared our
results with prior work on the different (mono-lingual) emotional databases.
Table 2 All observations based on the mono-lingual, multilingual, and cross-lingual experiments

| Linguistic type | Database | Data | Train | Test | Model | Epochs | Accuracy (%) |
|-----------------|----------|------|-------|------|-------|--------|--------------|
| Mono-lingual | RAVDESS | Original | English | English | CNN | 100 | 70.46 |
| Mono-lingual | RAVDESS | Augmented | English | English | CNN | 100 | 91.80 |
| Mono-lingual | RAVDESS | Original | English | English | LSTM | 150 | 75.35 |
| Mono-lingual | RAVDESS | Augmented | English | English | LSTM | 150 | 90.38 |
| Mono-lingual | Emo-Db | Original | German | German | CNN | 100 | 70.37 |
| Mono-lingual | Emo-Db | Augmented | German | German | CNN | 100 | 92.76 |
| Mono-lingual | Emo-Db | Original | German | German | LSTM | 150 | 71.27 |
| Mono-lingual | Emo-Db | Augmented | German | German | LSTM | 150 | 85.38 |
| Mono-lingual | EMO-Vo | Original | Italian | Italian | CNN | 100 | 73.47 |
| Mono-lingual | EMO-Vo | Augmented | Italian | Italian | CNN | 100 | 90.87 |
| Mono-lingual | EMO-Vo | Original | Italian | Italian | LSTM | 150 | 70.56 |
| Mono-lingual | EMO-Vo | Augmented | Italian | Italian | LSTM | 150 | 92.62 |
| Mono-lingual | TESS | Original | Canadian English | Canadian English | CNN | 100 | 92 |
| Mono-lingual | TESS | Augmented | Canadian English | Canadian English | CNN | 100 | 99.12 |
| Mono-lingual | TESS | Original | Canadian English | Canadian English | LSTM | 150 | 93 |
| Mono-lingual | TESS | Augmented | Canadian English | Canadian English | LSTM | 150 | 98.96 |
| Multilingual | Multilingual database | Original | English | ALL | CNN | 100 | 72.59 |
| Multilingual | Multilingual database | Augmented | English | ALL | CNN | 100 | 98.19 |
| Multilingual | Multilingual database | Original | English | ALL | LSTM | 150 | 87.88 |
| Multilingual | Multilingual database | Augmented | English | ALL | LSTM | 150 | 93.41 |
| Cross-lingual | Cross-lingual database | Original | ALL | Italian | CNN | 100 | 67.33 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Italian | CNN | 100 | 72.25 |
| Cross-lingual | Cross-lingual database | Original | ALL | Italian | LSTM | 150 | 70.21 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Italian | LSTM | 150 | 92.24 |
| Cross-lingual | Cross-lingual database | Original | ALL | Canadian English | CNN | 100 | 90.25 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Canadian English | CNN | 100 | 93.25 |
| Cross-lingual | Cross-lingual database | Original | ALL | Canadian English | LSTM | 150 | 87.88 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Canadian English | LSTM | 150 | 98.12 |
In the TESS database, Saad et al. performed slightly better than our proposed LSTM
model; Fig. 4 represents the comparison study for the mono-lingual TESS experiment.
On RAVDESS data, our proposed model outperforms the other models; Fig. 5
represents the comparison study for the RAVDESS database. On the Emo-DB
database, our model outperforms all models except Huang et al.; Fig. 6 illustrates
the comparison study across prior work on the Emo-DB database. Finally, Fig. 7
represents the comparison study of models on the Emo-Vo database, where our model
outperforms the other models.
Fig. 7 Comparison on the Emo-Vo database: our LSTM model (92.62%) versus Latif et al. [18] (61.80%)
In the cross-lingual database analysis, we trained with all databases and tested with
different individual databases. Our proposed LSTM model was trained on all
emotional databases with an original training set of 4288 files and tested on 117
original Emo-Vo (Italian database) files. Similarly, in the augmented cross-lingual
experiment, we chose 17,152 files as training data and 117 for testing purposes.
Tables 5 and 6 represent the original and augmented experiments for the cross-lingual
setting (training with all databases and testing with the Emo-Vo database). Our
LSTM model outperforms CNN, as shown in Table 2.
Another cross-lingual experiment was done with the TESS database, training on
all databases with 3952 original training files and testing on 560 original files; for
the augmented experiment, training used 17,152 files and testing 560 utterances.
Tables 7 and 8 represent the classification reports of the original and augmented
cross-lingual experiments based on the TESS database.
Table 7 Classification report of the cross-lingual experiment (testing with TESS) on original data

| Emotion | Precision | Recall | F1-score | Support |
|---------|-----------|--------|----------|---------|
| Anger | 0.88 | 0.89 | 0.85 | 89 |
| Disgust | 0.82 | 0.81 | 0.80 | 108 |
| Fear | 0.85 | 0.81 | 0.87 | 117 |
| Sad | 0.83 | 0.85 | 0.84 | 126 |
| Happy | 0.85 | 0.84 | 0.85 | 120 |
| Avg/total | | | 0.87 | 560 |
6 Conclusions
References
1. Le XH, Ho HV, Lee G, Jung S (2019) Application of long short-term memory (LSTM) Neural
network for flood forecasting. Water 11(7):1387. https://doi.org/10.3390/w11071387
2. Sherratt F, Plummer A, Iravani P (2021) Understanding LSTM network behaviour of IMU-
based locomotion mode recognition for applications in prostheses and wearables. Sensors
21(4):1264. https://doi.org/10.3390/s21041264
3. Janse PV, Magre SB, Kurzekar PK, Deshmukh RR (2014) A comparative study between MFCC
and DWT feature extraction technique. Int J Eng Res Technol 3
4. Sen S, Dutta A, Dey N (2019) Speech processing and recognition system. In: Audio processing
and speech recognition. Springer, Singapore, pp 13–43
5. Bhattacharya S, Borah S, Mishra BK, Das N (2022) Deep analysis for speech emotion recog-
nization. In: 2022 second international conference on computer science, engineering and
applications (ICCSEA), Gunupur, India, 2022, pp 1–6. https://doi.org/10.1109/ICCSEA54677.
2022.9936080
6. Bhattacharya S, Das N, Sahu S, Mondal A, Borah S (2021) Deep classification of sound:
a concise review. In: Patil VH, Dey N, Mahalle P, Shafi Pathan M, Kimbahune VV (eds)
Proceeding of first doctoral symposium on natural computing research. Lecture notes in
networks and systems, vol 169. Springer, Singapore. https://doi.org/10.1007/978-981-33-407
3-2_4
7. Saad F, Mahmud H, Ridwan Kabir M, Alamin Shaheen M, Farastu P, Kamrul Hasan M (2021) A
case study on the independence of speech emotion recognition in Bangla and English languages
using language-independent prosodic features. ArXiv E-Prints, arXiv:2111.10776. https://doi.
org/10.48550/arXiv.2111.10776
8. Dupuis K, Pichora-Fuller MK (2014) Intelligibility of emotional speech in younger and older
adults. Ear Hear 35(6):695–707
9. Shegokar P, Sircar P (2016) Continuous wavelet transform based speech emotion recogni-
tion. In: 2016 10th international conference on signal processing and communication systems
(ICSPCS). IEEE, pp 1–8
10. Zhang B, Provost EM, Essl G (2016) Cross-corpus acoustic emotion recognition from singing
and speaking: a multi-task learning approach. In: 2016 IEEE international conference on
acoustics, speech and signal processing (ICASSP). IEEE, pp 5805–5809
11. Zhang S, Zhang S, Huang T, Gao W, Tian Q (2017) Learning affective features with a hybrid
deep model for audio–visual emotion recognition. IEEE Trans Circuits Syst Video Technol
28(10):3030–3043
12. Popova OV (2017) To the issue of culturological approach to professional speech training
targeted for the future translators of Chinese 2017
13. Luengo I, Navas E, Hernáez I (2010) Feature analysis and evaluation for automatic emotion
identification in speech. IEEE Trans Multimedia 12(6):490–501
14. Wu S, Falk TH, Chan W-Y (2011) Automatic speech emotion recognition using modulation
spectral features. Speech Commun 53:768–785
15. Lampropoulos AS, Tsihrintzis GA (2012) Evaluation of MPEG-7 descriptors for speech
emotion recognition. In: 2012 eighth international conference on intelligent information hiding
and multimedia signal processing (IIH-MSP). IEEE, pp 98–101
16. Pohjalainen J, Alku P (2014) Multi-scale modulation filtering in automatic detection of
emotions in telephone speech. In: 2014 IEEE international conference on acoustics, speech,
and signal processing (ICASSP). IEEE
17. Huang Z, Dong M, Mao Q, Zhan Y (2014) Speech emotion recognition using CNN. In:
Proceedings of the 22nd ACM international conference on multimedia (2014), pp 801–804
18. Latif S, Qadir J, Bilal M (2019) Unsupervised adversarial domain adaptation for cross-lingual
speech emotion recognition. In: 2019 8th international conference on affective computing and
intelligent interaction (ACII), pp 732–737
19. Haider F, Pollak S, Albert P, Luz S (2020) Emotion recognition in low-resource settings: an
evaluation of automatic feature selection methods. Comput Speech Lang 65:101119
20. Bhattacharya S, Borah S, Mishra BK, Mondal A (2022) Emotion detection from multilingual
audio using deep analysis. Multimedia Tools Appl
21. Jiang D-N, Lu L, Zhang H-J, Tao J-H, Cai LH (2002) Music type classification by spec-
tral contrast feature. In: 2002 IEEE international conference on multimedia and Expo, 2002
(ICME’02), vol 1. IEEE, pp 113–116
22. Dey N, Ashour AS (2018) Sources localization and DOAE techniques of moving multiple
sources. In: Direction of arrival estimation and localization of multi-speech sources. Springer,
Cham, pp 23–34
23. Sen S, Dutta A, Dey N (2019) Audio indexing. Audio processing and speech recognition, pp
1–11
Multi-modality Brain Tumor
Segmentation of MRI Images Using
ResUnet with Attention Mechanism
Abstract Brain tumors occur when abnormal cells grow within the brain. They
can put pressure on healthy parts of the brain or spread into those areas. Early and
prompt disease detection and diagnosis boost these individuals’ life expectancy. The
most popular technique for visualizing important brain areas is magnetic resonance
imaging (MRI). There are different modalities of magnetic resonance images, and they
differ in contrast and function. The four modalities are: T1, T1-CE (contrast enhanced),
T2 (spin–spin relaxation), and FLAIR. In MRI, the separation of different tumor
tissues from normal tissues is termed the segmentation process. Manually segmenting
a tumor in an MRI takes substantial time and effort and can produce inaccurate
results. Consequently, segmenting brain tumors requires the use
of automated procedures. In this study, we propose a pipeline of preprocessing tech-
niques and a deep learning model for segmenting brain tumors, thereby enhancing
the capability of automated algorithms to support doctors in clinical diagnosis.
1 Introduction
The human brain is a hugely complicated organ. Brain tumors arise when normal
healthy cells develop mutations in their DNA structures. These affected cells then
keep multiplying and survive even when all healthy cells die. They are far more
noticeable in children and old people. Having said that, they are one of the most diffi-
cult diseases to treat and have more chances of being cured if detected and segmented
early on. By employing a deep learning method, the model keeps learning continuously
and hence can potentially support decision-making that is more accurate than that of
an experienced neurosurgeon, making it of great clinical value. Brain tumors are
very heterogeneous in size, shape, location, etc. MRI also involves a lot of noise and
boundaries are also very irregular. Furthermore, there is a class imbalance problem
which makes it even more challenging to segment the brain tumor.
There have been numerous attempts to address this issue over the last few years.
Babu et al. [1] proposed using LSM and CV model with active contour segmentation
method. Unet is a very common and useful approach which could either be used
after skull stripping [2] or after data augmentation [3] or with a multiscale module
[4]. Unet architecture could also be modified using CNN [5] or the number of layers
could be decreased to reduce complexity of the model [6]. Abou Elenein et al.
[7] employed an encoder–decoder algorithm using pyramid pooling network. Wang
et al. [8] used a transformer to provide feature embeddings for the CNN decoder
(TransBTS). Brain tumor segmentation can also be done using the genetic algorithm
[9] or by transforming the 3D brain into a solid unit ball and then a cube followed
by CNN [10].
We wish to achieve two main objectives. The first objective is to handle class
imbalance in the dataset. The second objective is to segment a 3D volume into a
mask where labels represent different regions such as whole tumor, tumor core, and
enhancing tumor. Both objectives are explained in the proposed method section.
This paper includes five sections in total: literature review in Sect. 2, proposed
methodology in Sect. 3, results and discussion in Sect. 4, and conclusion and future
scope in Sect. 5.
2 Literature Review
The last several decades have seen a rise in research on computerized brain tumor
segmentation, indicating a growing interest in this area of still-developing study.
This section discusses a few of the current techniques for segmenting brain tumors.
Manually segmenting brain tumors is an arduous, tedious, and error-prone process.
Researchers have suggested numerous automated approaches to address these issues.
Magadza et al. [11] conducted a study of advanced deep learning techniques
for segmenting brain tumors, emphasizing their key components, and different
approaches along with critical review of open challenges in medical image analysis.
Atiyah et al. [12] suggested using Unet encoder with EfficientNet-B7 architecture.
In their proposed method, they made use of four Nvidia P40 GPUs and attained high
accuracy. In addition to a fusion loss function, Zhou et al. [13] used a powerful 3D
residual neural network (ERV-Net). Shan et al. [14] proposed a depth-wise CNN to
save computational resources and to combine features from various receptive fields.
Mlynarski et al. recommended utilizing both fully and weakly labeled training data,
whereas Díaz-Pernas et al. used a deep convolutional method with a multiscale
approach that could analyze three types of tumors [15, 16]. Tiwari et al. [17] performed a
review on different segmentation techniques for brain tumors and gave a detailed
comparison between them. A preprocessing strategy was put forward by Ranjbarzadeh
et al. to reduce time complexity and solve the overfitting issue. Along with it, they
included a distance-wise attention (DWA) mechanism, although it was limited for
tumors that covered more than one-third of the entire brain [18]. Naser et al. [19]
utilized Unet, transfer learning, and a fully connected classifier. In 2021, Khan et al. [20] made use of K-means clustering for brain tumor
classifier. In 2021, Khan et al. [20] made use of K-means clustering for brain tumor
segmentation using the BraTS 2015 benchmark datasets with better accuracy than
previously reported methods. Long short-term memory (LSTM) and ConvNet are
combined which enhances the results by using edge enhancement, noise reduction,
histogram equalization, and Laplacian of Gaussian filtering [21–23].
A lot of research has been performed by using the CNN and RCNN architectures
along with decoder blocks and UNet encoders [24–26]. Yogananda et al. created a
three-group framework using the 2019 BraTs dataset and each group consisted of
three 3D-dense-Unets [27]. Sajid et al. [28] suggested a preprocessing stage in which
3D MR images are converted to 2D slices to keep dimensions constant; they made
use of two-path, three-path, and hybrid CNNs and controlled the overfitting problem.
Yang et al. [29] developed an autonomous segmentation technique (RF and SK-TP
CNN) to improve the capability of nonlinear mapping. Using the BraTs 2015 and
2021 datasets, Elmezain et al. [30] developed a method in 2022 by combining the
deep capsule network with the latent-dynamic conditional random field.
Wang et al. [31] introduced a 2.5D network that bridges the gaps of memory
consumption, model complexity, and receptive field. The analysis was done on BraTs
2017 and 2018 datasets. Chen et al. [32] introduced a number of layers with a
perceptron-based method to enhance the performance. Wu et al. [33] put out a deep
CNN neural network fusion support vector machine approach in which the model
was run on the BraTs and a custom dataset in three phases. With deep learning-based
selective attention, Akil et al. [34] developed a method employing contiguous regions
and multiclass weighted cross-entropy. Zhao et al. reviewed different methods with
DNN on the BraTs 2019 dataset whereas Zhang et al. in their research, came up
with a powerful hybrid clustering technique coupled with morphological procedures
to reduce noise sensitivity and enhance segmentation stability using fuzzy C-means
algorithm to segment images [35, 36]. Biratu et al. [37] suggested an enhanced region-
growing technique using skull stripping for efficient seed point initialization. Jiang
et al. [38] proposed a novel edge extraction algorithm and self-adaptive balancing
class weight coefficient to solve the class imbalance problem which further achieved
better performance.
Some of them have used the FLAIR MRI data whereas most of the work is done
on the BraTs dataset [39, 40].
The above review shows that the Unet model was predominant compared with
other models but has certain limitations which can be overcome.
3 Proposed Methodology
Figure 1 depicts the stages of the experiment which contains five steps. It includes
preprocessing of MRI volumes as the first stage followed by data augmentation. The
next stage explains the patching strategy to which ResUnet-A model was applied.
The trained model is assessed using a variety of performance measures in the last
phase.
The range of pixel intensity values is altered through normalization; all values were
normalized to the range 0–1. Resampling resizes an image according to the desired
voxel spacing and was used to change the number of voxels per mm: each input
volume was resampled with a voxel spacing of (1.62, 1.62, 1.62). Patching refers
to dividing a large volume into a smaller set of volumes; the size of each 3D patch
is (64, 64, 64). The concept of overlapping patches is borrowed from Akil et al.
[34]. Overlapping patches produce five predictions per patch, where the first
pixel's prediction is influenced by the predictions of the remaining four pixels and
vice versa for each voxel. Therefore, even when using small patches, architectures
are still able to categorize and determine the overall context. The size of the overlap
between patches in our experiment was (32, 32, 32), as sketched below.
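A minimal NumPy sketch of this overlapping patching strategy, with (64, 64, 64) patches and a (32, 32, 32) step; the random volume is an illustrative stand-in for a preprocessed MRI modality:
# Overlapping 3D patch extraction sketch
import numpy as np

def extract_patches(volume, patch=64, stride=32):
    # Slide a cubic window over the volume; a stride smaller than the
    # patch size makes neighbouring patches overlap.
    patches = []
    z, y, x = volume.shape
    for i in range(0, z - patch + 1, stride):
        for j in range(0, y - patch + 1, stride):
            for k in range(0, x - patch + 1, stride):
                patches.append(volume[i:i + patch, j:j + patch, k:k + patch])
    return np.stack(patches)

vol = np.random.rand(128, 128, 128)   # stand-in for a cropped modality
print(extract_patches(vol).shape)     # (27, 64, 64, 64): 3 positions per axis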
There were six augmentation methods used: scaling, rotations, elastic deforma-
tion, mirroring, brightness, and Gaussian noise. The probability of data augmentation
per sample was 0.8.
Figure 2 depicts the overall architecture of the model with dimensions of the feature
map in each layer. Since patch size of (64, 64, 64) was used with two modalities, the
input layer has dimension of (64, 64, 64, 2). The final layer has dimension of (64,
64, 64, 4) since there are four labels to be predicted.
Figure 3a depicts the residual convolution block which consists of three sets of
(Conv. + Batch Norm.) layers. The first two convolution operations are computed
with kernel size of (3, 3, 3) while the third convolution operation is computed with
a kernel size of (1, 1, 1). A skip connection is used to merge the feature map of
convolution with different kernel sizes, so that both larger and smaller features are
accounted for in the feature map. The three sets are followed by a ReLU activation function.
Using a max pooling layer with a pool size of 2, the dimension of the feature map is
then decreased. The middle layer, which connects the encoder and decoder layers, is
shown in Fig. 3b. It consists of two sets of (Conv. + Batch Norm.) layers with a kernel
size of (3, 3, 3). Figure 3c depicts the gating signal which contains a convolution
layer with kernel size of (1, 1, 1) followed by batch normalization layer. The aim of
the gating signal is to return the gating feature map with the same dimension as of
the upper layer feature map.
Figure 4a depicts the attention block which is used to focus more on important
features rather than non-useful background information. It follows two pathways. The
first path involves the use of the gating output which then undergoes a convolution
operation with a kernel size of (1, 1, 1) and stride 1. Only a convolution operation
with a kernel size of (3, 3, 3) with a stride of 2 is used in the second path. The
two-path feature map is then concatenated. As part of the decoder, the feature map
is upsampled by a factor of 2. After feature extraction, the dimensions of the volume
must be reduced to a size the attention block can accept. Figure 4b depicts the
upsampling block which concatenates the previous downsampled layer feature map
with the attention feature map followed by two sets of (Conv. + Batch Norm.) layers
with a kernel size of (3, 3, 3).
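A hedged Keras sketch of this attention block: the gating signal passes through a (1, 1, 1) convolution with stride 1, the skip feature through a (3, 3, 3) convolution with stride 2, the two paths are combined, and the result is upsampled by 2. Where the text is unspecific, the additive combination with ReLU and sigmoid weighting follows the usual Attention U-Net recipe and is an assumption:
# Attention-gate sketch (additive variant; channel counts assumed)
from tensorflow.keras import layers

def attention_block(skip, gating, inter_channels):
    theta = layers.Conv3D(inter_channels, (3, 3, 3), strides=2,
                          padding="same")(skip)      # second path, stride 2
    phi = layers.Conv3D(inter_channels, (1, 1, 1), strides=1,
                        padding="same")(gating)      # gating path, stride 1
    combined = layers.Activation("relu")(layers.Add()([theta, phi]))
    psi = layers.Conv3D(1, (1, 1, 1), padding="same")(combined)
    alpha = layers.Activation("sigmoid")(psi)        # attention coefficients
    alpha = layers.UpSampling3D(size=2)(alpha)       # back to skip resolution
    return layers.Multiply()([skip, alpha])          # weighted skip features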
The dataset consisted of 369 folders, where each sample includes four modalities, i.e.,
t1, t2, t1ce, and flair, in the NIfTI format. Classes in the mask comprise the enhancing
tumor: 4; necrotic and non-enhancing tumor core: 1; peritumoral edema: 2; and
non-tumor: 0. After preprocessing, 189 samples were retained.
In this experiment, two modalities, i.e., t1ce and flair, were used for the segmentation
process. Both modalities are cropped to (128, 128, 128), since the majority of the
useful volume lies in the centrally cropped region of the slices. The two modalities
were concatenated, resulting in a shape of (128, 128, 128, 2). The segmented mask
was cropped to (128, 128, 128). Volumes consisting of less than 1% useful mask
were discarded to save computational resources. The total number of volumes after
preprocessing was 189.
The MRI data was split into training and testing samples, where 84% of the volumes
were used for training and 16% for testing. The Adam optimizer with an initial
learning rate of 10⁻⁴ was used, and the batch size was 2. Training used two-fold
cross-validation with a total of 200 epochs. The implementation was done in Python
using the MIScnn package and Keras, and an Nvidia Tesla V100 GPU was used for
training. The regions for evaluation were the whole tumor (labels 1, 2, 4), tumor
core (labels 2, 4), and enhancing tumor (label 4).
The loss function is the sum of the dice coefficient term and the cross-entropy,
called the dice cross-entropy loss [41], and was used to achieve the best results.
The dice term is

$$\ell_{dc} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k} \tag{1}$$

where u represents the softmax output and v represents the GT's one-hot encoding;
I is the number of voxels in the training batch, and K is the total number of labels.
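A minimal TensorFlow sketch of this combined loss; the smoothing epsilon and channel-last tensor layout are implementation assumptions not specified in the text:
# Dice cross-entropy loss sketch, following Eq. (1) plus cross-entropy
import tensorflow as tf

def dice_cross_entropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot ground truth, y_pred: softmax output, shape (..., K)
    axes = tuple(range(1, len(y_pred.shape) - 1))    # sum over the voxels I
    intersect = tf.reduce_sum(y_true * y_pred, axis=axes)
    denom = tf.reduce_sum(y_true + y_pred, axis=axes)
    dice = -tf.reduce_mean(2.0 * intersect / (denom + eps))  # Eq. (1)
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    return dice + ce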
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Dice Score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{4}$$

Sensitivity: It is a measure of how many true positives are predicted out of all actual
positives.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{5}$$
In this experiment, the ResUnet-A model along with the dice cross-entropy loss
function was used. The residual block allows the architecture to extract both smaller
and larger features using deep layers without degrading performance. A good score
was achieved in the enhancing tumor region, i.e., the minority class, because only
those samples with more than 1% useful mask were chosen. Since ResUnets generally
take more time and effort to train, two modalities were experimentally chosen to
reduce training and inference time and make the solution practically efficient.
Table 1 shows that a high accuracy of 98.93% was achieved, with a precision of
86.73%, while the sensitivity of predicting all labels was 88.56%. Table 2 depicts
the region-wise scores for semantic segmentation.
Table 3 compares existing models trained and evaluated on the BraTs dataset
with the proposed method. In comparison, the proposed method yields a greater
dice score for the enhancing tumor region, while yielding a decent score for the
tumor core and whole tumor regions. A higher score could have been achieved if
the number of folds for cross-validation were increased. Figure 5 below illustrates
a sample of the predicted tumor area.
In this paper, a method to segment brain tumors into four classes and evaluate the
results based on three regions was proposed. We selected only those samples which
were meaningful for this problem and which resulted in less training time for the
model. The proposed pipeline for volumetric segmentation trains on smaller but
important samples and gives satisfactory results. The ResUnet-A model with dice
cross-entropy loss function was used for training. We also used the concept of overlap-
ping patches to get better results. The results could have been further enhanced if the
number of folds in cross validation were increased. It should be noted that the experi-
ment results were not validated on the online platform. We have shown that reducing
the volume size with only two modalities and selecting only important samples
does not affect the results significantly. This approach performs memory-efficient
segmentation that can help radiologists with faster diagnosis and treatments.
For future work, we intend to perform 3D tumor classification which will predict
whether the tumor is non-cancerous or cancerous.
Acknowledgements We are thankful to the Department of Computer Engineering and IT, COEP
Tech. for providing GPU server facility to implement this work. This facility was established under
TEQIP-III (A World Bank Project).
References
1. Babu KR, Indira ND, Prasad KV, Shameem S (2021) An effective brain tumor detection from
t1w MR images using active contour segmentation techniques. J Phys Conf Ser
1804(1):012174. https://doi.org/10.1088/1742-6596/1804/1/012174
2. Rao N, Reddy DLS, Gujja H (2022) Brain MRI segmentation binary u-net based architecture
using deep learning algorithm. https://doi.org/10.21203/rs.3.rs-1916275/v1
3. Ottom MA, Rahman HA, Dinov ID (2022) Znet: deep learning approach for 2d MRI brain
tumor segmentation. IEEE J Transl Eng Health Med 10:1–8. https://doi.org/10.1109/jtehm.
2022.3176737
4. Zhang F, Wu L, Wang Y, Yang Y, Li M, Li J, Xu Y (2022) A multi-scale brain tumor segmen-
tation method based on u-net network. J Phys Conf Ser 2289(1):012028. https://doi.org/10.
1088/1742-6596/2289/1/012028
5. Kajal M, Mittal A (2022) A modified u-net based architecture for brain tumour segmentation
on BRATS 2020. https://doi.org/10.21203/rs.3.rs-2109641/v1
6. Jena B, Jain S, Nayak GK, Saxena S (2022) Analysis of depth variation of u-NET architecture
for brain tumor segmentation. Multimedia Tools Appl. https://doi.org/10.1007/s11042-022-
13730-1
7. AboElenein NM, Piao S, Zhang Z (2022) Encoder–decoder network with depthwise atrous
spatial pyramid pooling for automatic brain tumor segmentation. Neural Process Lett. https://
doi.org/10.1007/s11063-022-10959-7
8. Wang W, Chen C, Ding M, Yu H, Zha S, Li J (2021) TransBTS: multimodal brain tumor
segmentation using transformer. In: Medical image computing and computer assisted inter-
vention—MICCAI 2021. Springer International Publishing, pp 109–119. https://doi.org/10.
1007/978-3-030-87193-2
9. Arif M, Jims A, Ajesh F, Geman O, Craciun MD, Leuciuc F (2022) Application of genetic
algorithm and u-net in brain tumor segmentation and classification: a deep learning approach.
Comput Intell Neurosci 2022:1–11. https://doi.org/10.1155/2022/5625757
10. Lin WW, Juang C, Yueh MH, Huang TM, Li T, Wang S, Yau ST (2021) 3d brain tumor
segmentation using a two-stage optimal mass transport algorithm. Sci Rep 11(1). https://doi.
org/10.1038/s41598-021-94071-1
11. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-of-
the-art. J Imag 7(2):19. https://doi.org/10.3390/jimaging7020019
12. Atiyah AZ, Ali KH (2022) Segmentation of human brain gliomas tumour images using u-net
architecture with transfer learning. Diyala J Eng Sci 17–29. https://doi.org/10.24237/djes.2022.
15102
13. Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X (2021) ERV-net: an efficient 3d residual neural
network for brain tumor segmentation. Expert Syst Appl 170:114566. https://doi.org/10.1016/
j.eswa.2021.114566
14. Shan C, Li Q, Wang CH (2022) Brain tumor segmentation using automatic 3d multichannelfea-
ture selection convolutional neural network. J Imaging Sci Technol
66(6):060502-1–060502-9. https://doi.org/10.2352/j.imagingsci.technol.2022.66.6.060502
15. Mlynarski P, Delingette H, Criminisi A, Ayache N (2019) Deep learning with mixed supervision
for brain tumor segmentation. J Med Imaging 6(03):1. https://doi.org/10.1117/1.jmi.6.3.034002
16. Díaz-Pernas FJ, Martínez-Zarzuela M, Antón-Rodríguez M, Gonzàlez-Ortega D (2021) A
deep learning approach for brain tumor classification and segmentation using a multiscale
convolutional neural network. Healthcare 9(2):153. https://doi.org/10.3390/healthcare9020153
17. Tiwari A, Srivastava S, Pant M (2020) Brain tumor segmentation and classification from
magnetic resonance images: review of selected methods from 2014 to 2019. Pattern Recogn
Lett 131:244–260. https://doi.org/10.1016/j.patrec.2019.11.020
18. Ranjbarzadeh R, Kasgari AB, Ghoushchi SJ, Anari S, Naseri M, Ben-dechache M (2021)
Brain tumor segmentation based on deep learning and an atten- tion mechanism using MRI
multi-modalities brain images. Sci Rep 11(1). https://doi.org/10.1038/s41598-021-90428-8
19. Naser MA, Deen MJ (2020) Brain tumor segmentation and grading of lower-grade glioma
using deep learning in MRI images. Comput Biol Med 121:103758. https://doi.org/10.1016/
j.compbiomed.2020.103758
20. Khan AR, Khan S, Harouni M, Abbasi R, Iqbal S, Mehmood Z (2021) Brain tumor segmentation
using k-means clustering and deep learning with synthetic data augmentation for classification.
Microsc Res Tech 84(7):1389–1399. https://doi.org/10.1002/jemt.23694
21. Iqbal S, Khan MUG, Saba T, Mehmood Z, Javaid N, Rehman A, Abbasi R (2019) Deep learning
model integrating features and novel classifiers fusion for brain tumor segmentation. Microsc
Res Tech 82(8):1302–1315. https://doi.org/10.1002/jemt.23281
22. Thillaikkarasi R, Saravanan S (2019) An enhancement of deep learning algorithm for brain
tumor segmentation using kernel based CNN with m-SVM. J Med Syst 43(4). https://doi.org/
10.1007/s10916-019-1223-7
23. Kumar MJ, Sai NR, Chowdary CS (2020) RETRACTED: an efficient deep learning approach
for brain tumor segmentation using CNN. IOP Conf Ser Mater Sci Eng 981(2):022012. https://
doi.org/10.1088/1757-899x/981/2/022012
24. Hossain T, Shishir FS, Ashraf M, Nasim MAA, Shah FM (2019) Brain tumor detection using
convolutional neural network. In: 2019 1st international conference on advances in science,
engineering and robotics technology (ICASERT). IEEE. https://doi.org/10.1109/icasert.2019.
8934561
25. Pitchai R, Praveena K, Murugeswari P, Kumar A, Bee MKM, Alyami NM, Sundaram RS,
Srinivas B, Vadda L, Prince T (2022) Region convolutional neural network for brain tumor
segmentation. Comput Intell Neurosci 2022:1–9. https://doi.org/10.1155/2022/8335255
26. Chang J, Zhang L, Gu N, Zhang X, Ye M, Yin R, Meng Q (2019) A mixpooling CNN architec-
ture with FCRF for brain tumor segmentation. J Vis Commun Image Represent 58:316–322.
https://doi.org/10.1016/j.jvcir.2018.11.047
27. Yogananda CGB, Wagner B, Nalawade SS, Murugesan GK, Pinho MC, Fei B, Madhuran-
thakam AJ, Maldjian JA (2020) Fully automated brain tumor segmentation and survival predic-
tion of gliomas using deep learning and MRI. In: Brainlesion: Glioma, multiple sclerosis, stroke
and traumatic brain injuries. Springer International Publishing, pp 99–112. https://doi.org/10.
1007/978-3-030-46643-510
28. Sajid S, Hussain S, Sarwar A (2019) Brain tumor detection and segmentation in MR images
using deep learning. Arab J Sci Eng 44(11):9249–9261. https://doi.org/10.1007/s13369-019-
03967-8
29. Yang T, Song J, Li L (2019) A deep learning model integrating SK-TPCNN and random forests
for brain tumor segmentation in MRI. Biocybernetics Biomed Eng 39(3):613–623. https://doi.
org/10.1016/j.bbe.2019.06.003
30. Elmezain M, Mahmoud A, Mosa DT, Said W (2022) Brain tumor segmentation using deep
capsule network and latent-dynamic conditional random fields. J Imaging 8(7):190. https://doi.
org/10.3390/jimaging8070190
31. Wang G, Li W, Ourselin S, Vercauteren T (2019) Automatic brain tumor segmentation based on
cascaded convolutional neural networks with uncertainty estimation. Front Comput Neurosci
13. https://doi.org/10.3389/fncom.2019.00056
32. Chen S, Ding C, Liu M (2019) Dual-force convolutional neural networks for accurate brain
tumor segmentation. Pattern Recogn 88:90–100. https://doi.org/10.1016/j.patcog.2018.11.009
33. Wu W, Li D, Du J, Gao X, Gu W, Zhao F, Feng X, Yan H (2020) An intelligent diagnosis
method of brain MRI tumor segmentation using deep convolutional neural network and SVM
algorithm. Comput Math Methods Med 2020:1–10. https://doi.org/10.1155/2020/6789306
34. Naceur MB, Akil M, Saouli R, Kachouri R (2020) Fully automatic brain tumor segmentation
with deep learning-based selective attention using overlapping patches and multi-class weighted
cross-entropy. Med Image Anal 63:101692. https://doi.org/10.1016/j.media.2020.101692
35. Zhao YX, Zhang YM, Liu CL (2020) Bag of tricks for 3d MRI brain tumor segmenta-
tion. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries. Springer
International Publishing, pp 210–220. https://doi.org/10.1007/978-3-030-46640-420
36. Zhang C, Shen X, Cheng H, Qian Q (2019) Brain tumor segmentation based on hybrid clustering
and morphological operations. Int J Biomed Imaging 2019:1–11. https://doi.org/10.1155/2019/
7305832
37. Biratu ES, Schwenker F, Debelee TG, Kebede SR, Negera WG, Molla HT (2021) Enhanced
region growing for brain tumor MR image segmentation. J Imaging 7(2):22. https://doi.org/
10.3390/jimaging7020022
38. Jiang M, Zhai F, Kong J (2021) A novel deep learning model DDU-net using edge features to
enhance brain tumor segmentation on MR images. Artif Intell Med 121:102180
39. Zeineldin RA, Karar ME, Coburger J, Wirtz CR, Burgert O (2020) DeepSeg: deep neural
network framework for automatic brain tumor segmentation using magnetic resonance FLAIR
images. Int J Comput Assist Radiol Surg 15(6):909–920. https://doi.org/10.1007/s11548-020-
02186-z
40. Jun W, Haoxiang X, Wang Z (2021) Brain tumor segmentation using dual-path attention u-net in
3d MRI images. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries.
Springer International Publishing, pp 183–193. https://doi.org/10.1007/978-3-030-72084-1
41. Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, Wasserthal J, Koehler G,
Norajitra T, Wirkert S, Maier-Hein KH (2019) Abstract: nnU-net: self-adapting framework
for u-net-based medical image segmentation. In: Informatik aktuell. Springer Fachmedien
Wiesbaden, pp 22–22. https://doi.org/10.1007/978-3-658-25326-4
CPF Analysis for Identification of Voltage Collapse Point and Voltage Stability of an IEEE-5 Bus System Using STATCOM
1 Introduction
In recent years, the use of FACTS devices has increased extensively owing to their ability to minimize losses in power system operation. The continuation power flow (CPF) is an effective tool for tracing power flow solutions from the base load onwards until the steady-state voltage stability limit is reached. The power flow process remains well conditioned around the critical point, so divergence due to ill-conditioning of the system is not observed even when single-precision computation is used. The loading parameter also helps in identifying the weakest
bus of a system of buses. Voltage stability has to be maintained for all buses of the system under normal operating conditions as well as when the system is subjected to a disturbance. Voltage stability is classified, according to the nature of the disturbance that causes instability, into large-disturbance voltage stability, small-disturbance voltage stability, short-term voltage stability, and long-term voltage stability. Voltage collapse is typically associated with the reactive power demand of the load not being met due to a shortage in reactive power production and transmission. The term voltage collapse is also often used for voltage instability conditions: it is the process by which a sequence of events accompanying voltage instability leads to abnormally low voltages, or even a blackout, in a large part of the system. PSAT supports a wide range of device models, both conventional and non-conventional, and several static and dynamic analyses can be completed with it [1]. Using different load flow techniques in PSAT, steady-state analysis of the IEEE-6 bus system has been presented, with the line losses tracked under changing loading in order to establish a better result [2]. By studying eigenvalues and PV curves, it has been shown that solar photovoltaic generation at peak demand conditions helps boost the loading margin, enhancing system stability without a detrimental impact on voltage stability [3]. The voltage stability margin has also been improved by maximizing the effective generator reactive power reserve with PV and PQ generators, using optimized one-stage and two-stage approaches of preventive control action [4]. Power flow results have also been compared and studied to identify the most sensitive node of the IEEE-14 bus system using PSAT, in order to prevent blackouts or voltage collapse of the transmission system [5]. Locating FACTS devices such as the SVC and STATCOM at the midpoint of a transmission line increases the power transfer capability [6]. HVAC systems with different frequencies are interconnected to deliver more power over longer distances with fewer losses [7]. Load flow analysis is carried out using the Gauss–Seidel and Newton–Raphson methods [8]. A modified two-area, four-generator system with a parallel HVDC link has also been simulated in the PSAT toolbox of MATLAB for load flow analysis [9]. A MATLAB-based power system analysis tool (PSAT) that is freely distributed online is described in [10].
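To make the CPF idea concrete, the following minimal Python sketch traces the PV curve of a hypothetical two-bus system (a slack bus feeding a single PQ load through a line reactance) by stepping a loading parameter until the power flow solution ceases to exist at the nose of the curve. The line reactance, base load, and step size are illustrative assumptions rather than values from the IEEE-5 bus study, and a production CPF uses a predictor-corrector scheme instead of the simple stepping shown here.

```python
import math

def load_bus_voltage(p, q, v1=1.0, x=0.25):
    """Load-bus voltage of a two-bus system (per unit), upper branch.

    Solves V2^4 - (V1^2 - 2*Q*X)*V2^2 + X^2*(P^2 + Q^2) = 0 for V2.
    Returns None past the nose (voltage collapse) point, where no real
    solution exists.
    """
    a = v1 ** 2 / 2.0 - q * x
    disc = a ** 2 - x ** 2 * (p ** 2 + q ** 2)
    if disc < 0:  # past the voltage collapse point
        return None
    return math.sqrt(a + math.sqrt(disc))  # stable (upper) branch

# Trace the PV curve: scale an assumed base load by the loading parameter
p0, q0 = 0.8, 0.3  # illustrative base load in p.u.
lam = 0.0
while True:
    v = load_bus_voltage(p0 * (1 + lam), q0 * (1 + lam))
    if v is None:
        print(f"voltage collapse near loading parameter {lam:.2f}")
        break
    print(f"lambda = {lam:.2f}  V = {v:.4f} p.u.")
    lam += 0.05
```

The loading parameter at which the solution disappears plays the same role as the maximum of the loading parameter plot discussed later, and the bus whose voltage falls fastest along the curve is the weakest bus.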
2 System Structure
Figure 1 shows the 5-bus test system, where bus 1 is the slack bus, bus 2 is the generator (PV) bus, and buses 3, 4, and 5 are exclusively load (PQ) buses. Figure 2 shows the circuit diagram of the 5-bus system compensated with STATCOM at bus-3. Similarly, Figs. 3 and 4 depict the circuit model for the 5-bus system compensated with STATCOM at bus-4 and bus-5, respectively.
The normal power flow is first carried out for the 5-bus test system shown in Fig. 1. Table 1 shows that the voltage is stable at buses 1 and 2 because they are the slack bus and the generator bus, respectively, whereas buses 3, 4, and 5 incur voltage drops as they are exclusively load buses.
The shunt compensating device STATCOM is then inserted in the 5-bus system, first at bus-3 as shown in Fig. 2. The power flow is run again, and the voltage at bus-3 rises to 1.00 p.u., as shown in Table 2.
Next, the line losses are compared without and with STATCOM. Table 8 shows that the line losses are reduced by the STATCOM. The load is then increased by a given percentage individually at bus-3 and bus-4, and the change in line losses is observed, once without and once with STATCOM. The simulation results of varying load against line losses are shown in Tables 9 and 10 for bus-3 and bus-4, respectively. Losses decrease with STATCOM at both buses from a 60% load increase onwards.
Table 8 Losses comparison with and without STATCOM at bus-3, bus-4, and bus-5
Bus number Without STATCOM With STATCOM
3 6.778 6.058
4 6.778 6.091
5 6.778 6.189
Table 9 Comparison of losses with loading with and without STATCOM at bus-3
% Loading Loss without STATCOM Loss with STATCOM
20 7.469 8.345
40 10.832 11.255
60 14.947 14.812
80 19.230 19.043
100 24.245 23.977
Table 10 Comparison of losses with loading with and without STATCOM at bus-4
% Loading Loss without STATCOM Loss with STATCOM
20 7.469 8.4
40 10.832 11.337
60 14.947 14.925
80 19.230 19.197
100 24.245 24.179
Table 11 Continuation power flow results for the 5-bus test system
Bus V (p.u.) Phase Pgen (p.u.) Qgen (p.u.) Pload (p.u.) Qload (p.u.)
1 1.06 0 9.5773 2.1043 0 0
2 1 −0.44023 1.9884 1.05931 0.9942 1.2428
3 0.6398 −0.74959 0 0 2.237 0.4971
4 0.62994 −0.81906 0 0 1.9884 0.24855
5 0.52175 −1.0531 0 0 2.9826 0.4971
The continuation power flow is also carried out for the test system, and the resulting power flow values are displayed in Table 11. The global power flow report is shown in Table 12, which gives the per-unit real and reactive power values of the total generation, total load, and total loss. The CPF analysis also yields a plot of the loading parameter (Fig. 5).
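From the CPF voltages of Table 11, the weakest bus can be picked out directly, since it is the bus with the lowest voltage magnitude at the CPF solution; a trivial check over the tabulated values:

```python
# Bus voltage magnitudes (p.u.) from Table 11
voltages = {1: 1.06, 2: 1.0, 3: 0.6398, 4: 0.62994, 5: 0.52175}
weakest = min(voltages, key=voltages.get)
print(f"weakest bus: {weakest} (V = {voltages[weakest]} p.u.)")  # bus 5
```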
4 Conclusion
The above results show that voltage stability is obtained by using the shunt compensating device STATCOM. The line losses with STATCOM were also checked: they decrease with STATCOM once the load increase reaches 60%. CPF analysis was also performed to determine the weakest bus, and the loading parameter plot shows that bus-5 is the weakest bus.
Acknowledgements Without all of the team members’ invaluable advice, input, and inspiration,
this effort would not have been possible. I consider it an honour to collaborate with them to effectively
finish this paper.
References
1. Milano F (2005) An open source power system analysis toolbox. IEEE Trans Power Syst 20(3)
2. Nitve B, Naik R (2014) Steady state analysis of IEEE-6 bus system using PSAT power toolbox.
Int J Eng Sci Innovative Technol [IJESIT] 3(3). ISSN: 2319-5967
3. Tamimi B, Canizares C, Bhattacharya K (2011) Modelling and performance analysis of large
solar photo-voltaic generation on voltage stability and inter-area oscillations. IEEE PES General
Meeting, July 2011
4. Mousavi AO, Bozorg M, Cherkaoui R (2013) Preventive reactive power management for
improving voltage stability margin. Electr Power Syst Res 96:36–46
5. Mishra P, Udupa HN, Ghune P (2013) Calculation of sensitive node for IEEE-14 bus system
when subjected to various changes in load. In: Proceedings of IRAJ international conference,
21 July 2013, Pune, India. ISBN: 978-93-82702-22-1
6. Wadhwa CL (2009) Electrical power systems, 4th edn. New Age International Publishers
7. Kundur P (1997) Power system stability and control. McGraw-Hill, Inc.
8. Gupta JB (2009) A course in power systems. S. K. Kataria & Sons
9. Chow J (2003) Power system toolbox version 2.0 load flow tutorial and functions manual
10. Bagchi S, Goswami S, Bhaduri R, Ganguly M, Roy A (2016) Small signal stability anal-
ysis and comparison with DFIG incorporated system using FACTS devices. In: 2016 IEEE
1st international conference on power electronics, intelligent control and energy systems
(ICPEICES)
Analysis of Various Blockchain-Based Solutions for Electronic Health Record System
Abstract Digitization has helped most countries in the world adopt electronic health record (EHR) systems to store and access sensitive patient medical records easily. Developing a secure IT environment for EHR is a challenging task due to repeated cyber attacks on healthcare systems. Recently, the use of blockchain in healthcare applications has increased due to its inherent features, and blockchain addresses various security issues faced by healthcare systems. The main objective of this paper is to review and analyze the use of blockchain-based solutions for EHR in the healthcare domain. A comparative analysis is carried out based on the type of blockchain used and the pros and cons of each solution, along with various security and performance parameters. Based on the analysis, future research directions are suggested.
1 Introduction
Healthcare is one of the largest sectors in the world, and its global market has reached $11,908 billion. It is segmented into pharmaceutical medicine, medical equipment, healthcare services, and biologics. Healthcare service is one of the largest and most important segments of the healthcare industry. To improve the quality and decision-making processes of healthcare services, the data generated through the interaction between patients and doctors, called healthcare data, are an important parameter. Data scientists predict that more than 2,000 exabytes of such data are generated per year [1]. Healthcare data are categorized into three groups based on who maintains and handles them: the personal health record (PHR), the electronic medical record (EMR), and the electronic health
record (EHR) [2]. A PHR is electronic health data, along with other information, maintained and managed privately and securely by the patient. An EMR is an electronic record created, gathered, and managed by a single hospital, whereas an EHR is an electronic health record exchanged between different healthcare providers. By storing and exchanging data between providers, the EHR offers various benefits over the paper-based health record (PBHR) [3]:
• Cost: A PBHR is costlier to manage and maintain over time than an EHR. The initial cost to set up infrastructure is higher for an EHR, but it reduces gradually over time.
• Storage: Storing data on a decentralized network or cloud makes an EHR more accessible. A PBHR requires big warehouses to store data and becomes costlier when handled by many providers.
• Security: Records can be lost, damaged, or misused because of human error in a PBHR, whereas stronger security mechanisms against unauthorized access can be provided in an EHR.
• Diagnosis: An EHR helps to improve the diagnosis of diseases, as well as their prevention, compared with a PBHR.
• Access: Exchanging or accessing accurate records is a tedious and time-consuming process with a PBHR, whereas data stored in electronic form are readily shared.
• Readability and Accuracy: In an EHR, a standard procedure is followed to make documents more readable and accurate and to avoid confusion, whereas a PBHR often suffers from medical errors because there is insufficient space to record information in detail [3].
Advancement in the IT sector helps to manage and maintain EHRs. Due to the sensitive nature of healthcare data, confidentiality, authentication, access control, non-repudiation, interoperability, transparency, accountability, and privacy become requirements of the IT infrastructure [1]. To improve the security of information technology (IT) systems, various healthcare standards have been proposed. Patient safety, evidence-based care, process improvement, easy exchange of information, and cost reduction are some benefits of these standards, which are listed in Table 1 [4].
Work done in the healthcare domain with the help of blockchain is presented next. With the advancement of the healthcare domain, blockchain has been incorporated to provide user privacy, data security, authentication, and data sharing. The authors of [11] propose a blockchain-based architecture to control and share healthcare data but fail to show an implementation of the system. The scattered data of a patient is an issue solved by Roehrs et al. [12] by providing a unified view; interoperability and scalability issues are handled, but without testing the security and privacy of the data. Reference [13] presents how to share medical data among custodians in a trustless environment through the Ethereum platform, and also discusses the security and auditing of data. Another blockchain-based solution is given for secure and scalable medical data sharing [14], where IPFS is used as an off-chain database to store large amounts of data. Healthcare information exchange through the approach of off-chain storage and on-chain verification for privacy and authenticity is provided in [15], although its logic implementation for access control is rather complex. A blockchain-based service framework is proposed to manage personal medical data [16], in which complete control of the data is given to the patient; however, only the framework is proposed, and no implementation is carried out. An attribute-based signature scheme with multiple authorities is proposed to preserve patient privacy [17]. The performance cost of that system increases linearly as the number of authorities and patient attributes increases, but the scheme helps to resist collusion attacks. Decentralized blockchain technology can help to find missing EMRs from distributed replica nodes; such a system implements smart contracts to automate actions on the EMR, but it fails to work when the patient is in an emergency situation [18]. Another decentralized attribute-based signature scheme is proposed for healthcare systems using blockchain, enabling secure data sharing, easy access to the EHR, and non-repudiation; in this system, the owner of the data has no control over write operations [19]. To access and retrieve EMR data efficiently, a Hyperledger-based blockchain solution is provided whose access control protocol helps to hide signature information; this system slows down as the amount of data to be ordered increases [20]. The work presented in [21] proposes a new blockchain-based architecture for an access model along with an authorization scheme in which users are given control of the system at a granular level; however, all users have the same encryption key, which weakens non-repudiation. Smartphones can also be used to handle a blockchain-based system remotely for personal health data sharing and collaboration; the application is user-centric for sharing data among various doctors [22, 23]. The work in [24] proposes an approach that uses blockchain to store transactional information about e-Health records and access control policies (ACPs), where ACPs are defined at the user as well as the resource level; access policies and individual authorizations are stored on the blockchain, which may lead to data leakage. Sharing of EHRs among healthcare providers helps the proper diagnosis of patients. A decentralized IPFS-based EHR sharing framework is discussed in [25], in which data are stored on the cloud and users can also share medical data through mobile devices. The system prevents unauthorized access and allows sharing of medical data in a reliable way, but it allows all authorized doctors to access patient data without the patient's permission. Wang et al. [26] have implemented a blockchain-based EHR sharing protocol that focuses on the security and privacy of records; results show that the proposed protocol is computationally efficient, although the authors designed it without using a standard blockchain platform. Recently, one more blockchain-enabled patient-centric framework has been proposed for healthcare applications [27]; for the experimental analysis, the Hyperledger Caliper tool is used to measure performance parameters such as latency, throughput, and resource utilization.
This research work reviews and analyzes various blockchain-based solutions for EHR on the basis of security parameters such as authentication, access control, privacy, interoperability, and confidentiality, and of performance parameters such as throughput, latency, and speedup. Prospective research areas in this domain have also been identified.
The rest of the paper is organized as follows: Sect. 2 introduces blockchain fundamentals to present an overview of blockchain technology. Section 3 presents blockchain development tools, followed by the blockchain-for-EHR methodology in Sect. 4. Section 5 presents the analysis of blockchain work done in the EHR domain, and finally, the conclusion is presented.
2 Blockchain Technology/Fundamentals
The main components of a block are as follows.
Hash of the Previous Block: Because the hash of the previous block is contained in the hash of the new block, the blocks of the blockchain all build on each other. Without this component, there would be no connection and chronology between blocks. Bitcoin uses the SHA-256 hash algorithm.
Root Hash of the Merkle Tree: All transactions contained in a block can be aggregated into a single hash, the root hash of the Merkle tree.
Timestamp: Each block carries a timestamp, given in seconds since 1.1.1970.
The Nonce: The nonce, i.e., the number used only once, in a Bitcoin block is a 32-bit (4-byte) field whose value is adjusted by miners so that the hash of the block will be less than or equal to the current target of the network.
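A minimal Python sketch of how these four components combine into a block hash is given below. For readability, it applies a single SHA-256 to a JSON-serialized header, whereas Bitcoin applies double SHA-256 to a binary header; the transaction contents are hypothetical.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Pairwise-hash transaction hashes until a single root remains."""
    level = [sha256(json.dumps(tx, sort_keys=True).encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def block_hash(prev_hash, transactions, nonce):
    header = {
        "prev_hash": prev_hash,                # links the block to its predecessor
        "merkle_root": merkle_root(transactions),
        "timestamp": int(time.time()),         # seconds since 1.1.1970
        "nonce": nonce,                        # adjusted by miners
    }
    return sha256(json.dumps(header, sort_keys=True).encode())

txs = [{"patient": "P-001", "action": "EHR read"},
       {"patient": "P-002", "action": "EHR update"}]
print(block_hash("00" * 32, txs, nonce=0))
```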
The emerging healthcare industry is adopting the latest innovative models for faster and more efficient communication between different stakeholders. In today's Internet age, it is easy to obtain and share information through smart devices, but this brings the risk of malicious attacks and of sensitive information being compromised. There are therefore some basic requirements associated with healthcare: system security (authentication, access control, confidentiality), interoperability, privacy, and data sharing [32].
Agbo et al. [33] found that blockchain for EHR is a major research topic in the literature due to its properties such as decentralization, immutability, smart contracts, and an open and transparent nature. Some of the benefits of using blockchain in EHR are
• Decentralization: The same copy of the healthcare records is available to all stakeholders, and all of them have the same access and control privileges. No single entity has control over the data.
This section briefly analyzes and compares the various security and performance parameters handled by blockchain-based solutions for EHR.
• Authentication: All healthcare providers need to authenticate before using the EHR system. Most blockchain-enabled frameworks provide authentication through a public key infrastructure (PKI) in which separate private and public keys are issued to each user (a minimal signing sketch is given after this list).
• Data Storage on Blockchain: Medical information stored on a blockchain becomes immutable, i.e., it cannot be modified. This enhances the security and trustworthiness of the system.
• Access Control: A security service that prevents unauthorized use of data by controlling its accessibility under certain conditions.
• Privacy: The right of individuals to keep their EHR private. Privacy ensures that only authorized users of the system access the EHR.
• Interoperability: The ability of the system to exchange the EHR between two or more stakeholders of healthcare systems. It ensures the accuracy of shared data and helps to increase the efficiency of diagnostic testing.
• Confidentiality: Any identifiable information taken from a patient by a doctor is considered confidential. To achieve confidentiality, such private EHR data are not made available to unauthorized users.
• Patient Centric: An EHR system is considered patient-centric if consent is taken from the patient for operations such as storage, access, modification, or exchange of the medical record.
• Latency: Evaluated as the average response time required for a client to access or store data on the blockchain and obtain a response.
• In [12], the average time taken for blocks to travel from a source node to a destination node is reported: within a 100-node network, 1,490 blocks require 0.216 s to travel. This time can change with respect to block size.
• Average time is also calculated with respect to the number of requests received by the system. Results show that handling 10 simultaneous requests requires 145 s in [13], whereas the method proposed in [20] takes 122 s.
• The framework proposed in [22] is evaluated based on the time taken for the data validation process and integrity proof generation with respect to the number of blocks.
• Paper [27] uses the Hyperledger Caliper benchmark tool to analyze the developed blockchain-based application. Performance parameters are analyzed with respect to configuration parameters such as block size, endorsement policy, channels, resource allocation, and the ledger database. The number of organizations and peers and the block size are the factors most affecting the results.
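As a concrete illustration of the PKI-based authentication mentioned in the list above, the sketch below signs the hash of an EHR record with a user's private key and verifies it with the matching public key. The use of the Python cryptography package and of Ed25519 keys is an assumption, since the reviewed systems do not prescribe a particular key type, and the record content is hypothetical.

```python
# pip install cryptography
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Each user (doctor, patient, provider) would hold such a key pair,
# issued through the PKI of the EHR system.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

record = b'{"patient": "P-001", "diagnosis": "..."}'
digest = hashlib.sha256(record).digest()  # sign the hash of the record
signature = private_key.sign(digest)

try:
    public_key.verify(signature, digest)  # raises if record or signature changed
    print("record authentic")
except InvalidSignature:
    print("record or signature tampered")
```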
Findings:
• Most of the research works only propose a framework without an actual implementation.
• Research is also needed in the area of designing dynamic blockchain-enabled frameworks that ensure secure data sharing with present healthcare systems.
6 Conclusion
References
11. Yue X, Wang H, Jin D, Li M, Jiang W (2016) Healthcare data gateways: found healthcare
intelligence on blockchain with novel privacy risk control. J Med Syst (Springer Link) 40(218)
12. Roehrs A, Da Costa CA, da Rosa Righi R (2017) OmniPHR: a distributed architecture model
to integrate personal health records. J Biomed Inform (Science Direct) 71:70–81
13. Xia QI, Sifah EB, Asamoah KO, Gao J, Du X, Guizani M (2017) MeDShare: trust-less medical
data sharing among cloud service providers via blockchain. IEEE Access 5:14757–14767
14. Rifi N, Rachkidi E, Agoulmine N, Taher NC (2017) Towards using blockchain technology for
health data access management. In: 4th international conference on advances in biomedical
engineering (ICABME). IEEE, pp 1–4
15. Jiang S, Cao J, Wu H, Yang Y, Ma M, He J (2018) BlocHIE: a blockchain-based platform for
healthcare information exchange. In: IEEE international conference on smart computing, pp
49–56
16. Chen Y, Ding S, Xu Z, Zheng H, Yang S (2018) Blockchain based medical records secure
storage and medical service framework. J Med Syst (Springer) 43(5):1–5
17. Guo R, Shi H, Zhao Q, Zheng D (2018) Secure attribute-based signature scheme with multiple
authorities for blockchain in electronic health records systems. IEEE Access 6:11676–11686
18. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data
access and permission management. In: Proceeding of 2nd international conference on open
and big data. IEEE, pp 26–30
19. Sun Y, Zhang R, Wang X, Gao K, Liu L (2018) A decentralizing attribute-based signature
for healthcare blockchain. In: 27th international conference on computer communication and
networks (ICCCN), pp 1–9
20. Fan K, Wang S, Ren Y, Li H, Yang Y (2018) Medblock: efficient and secure medical data
sharing via blockchain. J Med Syst (Springer Link) 42(8)
21. Zhang X, Poslad S (2018) Blockchain support for flexible queries with granular access control
to electronic medical records (EMR). In: IEEE international conference on communications
(ICC), pp 1–6
22. Liang X, Zhao J, Shetty S, Liu J, Li D (2017) Integrating blockchain for data sharing and
collaboration in mobile healthcare applications. In: IEEE 28th annual international symposium
on personal, indoor, and mobile radio communications (PIMRC). IEEE, pp 1–5
23. Cao S, Wang J, Du X, Zhang X, Qin X (2020) CEPS: a cross-blockchain based electronic health
records privacy-preserving scheme. In: IEEE international conference on communications
(ICC), pp 1–6
24. Dias JP, Reis L, Ferreira HS, Martins A (2018) Blockchain for access control in e-health
scenarios. arXiv preprint arXiv:1805.12267v1
25. Nguyen DC, Pathirana PN, Ding M, Seneviratine A (2019) Blockchain for securing EHR
sharing on mobile cloud based e-health system. IEEE Access 7:66792–66806
26. Wang Y, Zhang A, Zhang P, Wang H (2019) Cloud-assisted EHR sharing with security and
privacy preservation via consortium blockchain. IEEE Access 7:136704–136719
27. Singh AP, Pradhan NR, Luhach AK, Agnihotri S, Jhanjhi NZ et al (2021) A novel patient-
centric architectural framework for blockchain-enabled healthcare applications. IEEE Trans
Ind Inf 17(8):5779–5789
28. Solidity—Solidity 0.8.15 documentation (soliditylang.org)
29. https://trufflesuite.com
30. https://www.trufflesuite.com/ganache
31. https://docs.metamask.io/guide/
32. McGhina T, Choo K-KR, Liu CZ, He D (2019) Blockchain in healthcare applications: research
challenges and opportunities. J Netw Comput Appl (Elsevier) 135:62–75
33. Agbo CC, Mahmoud QH, Eklund JM (2019) Blockchain technology in healthcare: a systematic
review. Healthcare (MDPI) 7(56):1–30
Coordinated Network of Sensors Over 5G for High-Resolution Protection of City Assets During Earthquakes
Abstract This paper introduces a novel concept in earthquake early warning for
the smart city (EEW-SC) environment based on a finely spaced network of seismic
sensors with a wireless 5G backbone network. The sensors are spaced 50–100 m apart so that, a few seconds before the earthquake arrives, the damage to each structure can be predicted depending on the structure's resonances and the shape of the seismic waves striking it. The capability of performing predictive real-time performance-based damage assessment (PBDA) is unique and not possible with
existing sensor technologies. Depending on the expected severity of the damage,
automated actions, such as disconnecting electricity and gas, can be triggered to
protect the asset and prevent fires and explosions.
1 Introduction
and local reflections in a city distort the seismic waveforms at different geographical
coordinates. The assets’ frequency response and the particular frequency content
of the arriving waves determine the damage suffered by the asset, and automatic
response actions, such as disconnecting electrical circuit breakers, closing gas lines,
and others, can be performed to protect the equipment, buildings, and lives. Currently,
earthquake early warning systems have limited capability for automated actions since
they only predict the intensity of the arriving earthquake waves and not the frequency
content that will be applied to the structure [4].
Conventional networks of sensors for earthquake early warning are regional systems based on expensive (about $30 K) high-sensitivity seismic sensor stations separated by about 30 km that can measure the small-amplitude P-waves preceding the large destructive S-waves by 10–20 s for an earthquake 50–100 km away. From the intensity of the P-waves, the intensity of the S-waves is predicted
based on historical earthquakes, and an alarm is issued to the entire city. Since the
number of earthquakes in a particular region is relatively small, the historical data
is normally assembled from all over the world. However, earthquake propagation
is highly dependent on the particular characteristics of a local region in terms of
propagation parameters and local wave reflections, and even though the earthquakes
may originate from a similar fault, by the time the waves arrive in a city, they may
have suffered multiple distortions.
There are many limitations to the conventional approach for predicting damage
to the city’s assets [4]. (1) Since the sensors are separated by 30 km or so, the spatial
resolution is limited to about 30 km. (2) Since the small magnitude of the P-waves
is used to predict the magnitude of the large S-waves, and the relationship between
these two waves is not a constant physical parameter, this prediction has a large
margin of uncertainty. (3) The prediction is based on historical earthquakes, yet no two faults are identical and no two regions have the same propagation characteristics, so earthquakes that arrive at a city located tens of kilometres away from
the epicentre will present waveforms that can be vastly different from one location
to another. In addition, the effects of climate change in recent years are making
historical records even less reliable.
Even though traditional networks are working towards placing their sensors closer
to each other (e.g. every 10 km), the other aspects of the prediction (mostly the estimation of the S-waves from the P-waves) are subject to the same limitations described above. In general, the accuracy of the traditional methods is in the order of 50% in magnitude and location. Due to these limitations, when assets need to be protected individually, they are provided with their own sensors, increasing the spatial resolution, but the accuracy limitation of predicting the S-waves from the P-waves
remains. To increase the frequency resolution in determining the frequency response
of the structures, travelling wave methods are being introduced [5]; however, these
methods are not fast enough to protect thousands of assets in a city.
Figure 1 illustrates a dense network of sensors across the city, forming a fine grid.
The sensors (nodes) are separated by 50–100 m. Each sensor is connected wirelessly
to the underlying 5G communications network, which presents a low latency at any
given node. This uniformly low latency is equivalent to having all the sensors next
to each other (except for the small latency) “in the same workbench”. Since waves
are propagating along the grid, comparing the measurements at all nodes point by
point allows us to see how the waves propagate. Superimposing these measurements
on an underlying wave propagation model built with transmission line segments, we
can consider the differences in the subsoil paths and predict where the waves will
be in the next few seconds. The method’s accuracy is high, and the sources of error
are predictable. Accounting for errors in the model, in the sensors, and in the subsoil
values, the expected accuracies are in the order of 95%.
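A highly simplified sketch of the grid idea follows: given the time-aligned detection instants of the wavefront at two neighbouring nodes, the apparent propagation speed along that path can be estimated and the arrival at downstream nodes predicted. The positions, times, and straight-line propagation at constant speed are illustrative assumptions; the actual method superimposes the measurements on a transmission-line propagation model that accounts for differences in the subsoil paths.

```python
# Hypothetical node positions (metres along one grid line) and the
# time-aligned instants (seconds) at which the wavefront was detected.
positions = [0.0, 100.0]   # two sensors 100 m apart
arrivals = [0.000, 0.025]  # detection times at those sensors

# Apparent propagation speed along this path (~4 km/s for S-waves)
speed = (positions[1] - positions[0]) / (arrivals[1] - arrivals[0])

# Predict when the front reaches downstream nodes on the same grid line
for d in (200.0, 500.0, 1000.0):
    eta = arrivals[1] + (d - positions[1]) / speed
    print(f"node at {d:6.0f} m: predicted arrival {eta * 1000:6.1f} ms")
```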
With the proposed method, for a city of about 2.5 M people, placing a sensor every
100 m, we need about 300,000 sensors for a total cost of about $30 M. With traditional
technology, this number of sensors would cost about $9B, which is prohibitively
expensive. The main reason for this high cost is that conventional seismic sensors
have to be very sensitive to detect the very small P-waves to predict the S-waves,
while inexpensive MEMS sensors are used in the proposed solution to measure and
predict the full waveforms, which include the strong S-waves.
Figure 2 shows the integration of the mini-seismic (MSe) stations with the 5G
network through a Raspberry Pi board and a wireless communication module. As
indicated, the cost for 300,000 stations is $30 M, which is quite reasonable compared
to the large savings expected in assets and lives.
The main requirement of the proposed solution is an extensive telecommunica-
tions infrastructure with high-performance requirements in terms of latency, band-
width, and processing speeds due to the very short time frames for making decisions
before the earthquake arrives.
The expansion of IoT technologies in recent years is transitioning from using
conventional wireless technologies such as Bluetooth, Zigbee, and LoRa, to commu-
nicating over extensive networks over 4G and 5G. Particularly, 5G networks offer
high throughput, high reliability, wide coverage, and full IP support for IoT devices.
In the context of critical IoT applications, such as ours, they fulfil the need for low
latency, guaranteed bandwidth, and added security. They support data transmission
over large areas and are much cheaper than satellite communication. Another benefit
of using 5G technologies is that an application that relies on the massive deployment
of sensors can use the existing telecommunications infrastructure instead of building
and deploying new infrastructure (gateways, access points, etc.).
The 5G wireless technology is an enabler technology that can accommodate
massive-scale IoT deployments with the consistency of quality of service required
for mission-critical applications with a high impact on lives and property. For our
2 Mini-Seismic Stations
We have developed MSe stations (Fig. 3) centred around an easily available Raspberry
Pi 4 board equipped with a general-purpose input/output (GPIO) interface. This inter-
face allows control of external components, such as control circuits and communica-
tions modules connected to its outputs, and input information from sensor modules
and control circuits. The GPIO standard allows interaction using both synchronous
(I2C and SPI interfaces) and asynchronous (UART interface) serial communica-
tions. The GPIO also contains programmable pins that can be used to meet specific
application needs.
The operating system of the Raspberry Pi is highly optimized for the ARM CPU.
The OS is based on the Debian Linux distribution and allowed us to use several
available tools/libraries during the development and testing of the MSe software.
The sensor selected for this application is a 3-axis MEMS accelerometer, the ADXL355 [6]. This sensor provides an ultra-low noise density (22.5 µg/√Hz) on all
axes. The ADXL355 comes with an integrated 20-bit ADC, and it supports digital
SPI and I2C interfaces that can be used for communication with the processing unit
of the seismic station.
In our tests, we used different IoT modules to test the network’s access and
performance impact on the overall application. The modules tested included:
• Quectel BG96 for LTE-M network access,
3 Sensors Coordination
A large deployment (in the tens or hundreds of thousands) of MSe stations requires
a carefully integrated data collection and management system. Each sensor records
the seismic wave at its location node, which is then coordinated in time with the
data collected in the other nodes. The objective is to accurately reassemble in time
the waves propagating across all nodes at about 4 km/s and perform all calculations
needed to predict how the waves will be propagating across the grid in the next few
seconds.
To compare the waves at different nodes, we need to have an almost exact common
time reference, despite the latency and jitter of the communications network. In the
network, the latency changes from location to location in space and time (jitter).
Conventional timekeeping methods, such as clocks in the boards that provide a
time stamp when a data packet is sent, are not accurate enough for synchronizing
the phase angles of the travelling waves over very short distances of 50–100 m.
Also, maintaining the clocks of thousands of sensors synchronized just before the
earthquake begins is logistically very difficult. GPS synchronization is subject to
non-coverage locations, and onboard hardware clocks are susceptible to temperature
changes and drifts. In addition, these clock synchronization technologies add to the
cost of the devices.
Operationally, constant synchronization to keep the application awake would unnecessarily load the network during normal times, before it is actually needed when an earthquake occurs. Also, having to wake up possibly hundreds of thousands of devices would subtract valuable early warning time, which is very critical for earthquake early warning.
To satisfy the strict synchronization requirements of our application and start the
process whenever the earthquake begins, we developed an asynchronous sensor coordination (ASC) scheme. ASC gives very accurate results, and we also examined the behaviour of the system and
the data collection mechanism in different wireless technologies.
Because multiple data points are sent together every time a request for a data
packet is made to the board, there is a source of error if the agent’s request arrives
in the interval between two neighbouring samples. This sampling error is very small
as long as the sampling frequency is at least ten times the maximum frequency
we want to measure in the earthquake waveform. In our application, we sample at 200 Hz, which captures earthquake waveform content up to a frequency of 20 Hz. This frequency range is sufficient for accurately predicting damage to most structures [7].
ASC was very consistent in its estimates and produced very accurate timestamps.
Figure 5 shows the results obtained with ASC’s estimation and measured with a
highly controlled testing environment. The average error was about 0.002 ms in the
estimated timestamps of two devices detecting the wave at the same location and at
the same time. The requests are sent in parallel to two devices sampling in parallel. In
this case, the estimated timestamps should be the same, except for the network errors
plotted in Fig. 5. The very low error achieved allows us to trace earthquake waveforms
with a range of frequencies up to 20 Hz very accurately. This wide frequency range
is adequate to predict damage to most structural assets (Fig. 6).
The tests to verify the ASC synchronization algorithm were conducted in a highly controlled laboratory environment. In these tests, we injected artificial uplink and downlink delays and externally added jitter. The accuracy of ASC was then estimated by comparing the estimated times against the sample times recorded by NTP-synchronized MSe stations.
We measured the round-trip time (RTT) for each packet and from it estimated the downlink time from server to sensor. We first assumed that the travelling time was equal in the uplink and downlink directions, that is, ½ of the RTT. This assumption was then checked against other assumed downlink/uplink ratios, but the ½ ratio was overall very accurate.
In a cellular network, downlink and uplink times are not equal, but their ratio is
not fixed because the network delay changes continuously due to the scheduling of
the traffic in the network. Since the grid solution is based on following the wave
propagation from sensor to sensor, we need to minimize the error between devices.
Assuming ½ RTT for all devices resulted in a very accurate synchronization time
when reconstructing the waveforms regardless of the ratio at a given moment.
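The essence of this timestamping step can be sketched as follows: the server measures the RTT of each polling request, assumes the downlink took ½ RTT, and anchors the newest buffered sample at the estimated arrival time of the request. The sensor.poll() call and the buffer layout are hypothetical placeholders, not the actual MSe protocol.

```python
import time

def timestamp_samples(sensor, fs=200.0):
    """Assign common-time timestamps to one station's sample buffer.

    sensor.poll() is a hypothetical call returning the samples buffered
    since the previous request; all names here are illustrative.
    """
    t_send = time.monotonic()
    samples = sensor.poll()  # one network round trip
    t_recv = time.monotonic()

    rtt = t_recv - t_send
    t_request = t_send + rtt / 2.0  # assume downlink time = RTT / 2

    # The request lands between two samples; anchor the newest sample at
    # t_request and space the earlier ones at the 1/fs sampling period.
    n = len(samples)
    return [(t_request - (n - 1 - k) / fs, s) for k, s in enumerate(samples)]
```

Comparing the timestamped streams of neighbouring stations then lets the server reassemble the travelling wavefront as described above.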
Even though ASC compensates for the network's latency and jitter, the achievable error limits are affected by higher latencies. Table 1 shows our field test results for the different network technologies tested. The minimum technology for useful results was 4G-LTE, giving an earthquake frequency response bandwidth of 5 Hz. With 5G FR1, the bandwidth was up to 10 Hz, while with 5G FR2 (mmWave) we can capture up to 20 Hz of the seismic waves; a bandwidth of 20 Hz is sufficient to characterize the fragility of most structures. It should be noted that, in addition to the bandwidth that can be captured using the ASC algorithm with each technology, the absolute value of the latency limits the compute time available to process the data streams in real time.
The primary focus of most modern building codes is to ensure human life safety.
However, the degradation of life quality and the economic losses in the days after the
earthquake can be considerable and should be part of the design process. Integrated
design methodologies for the building owner, architect, and engineer, to choose
the desired level of seismic performance for buildings and nonstructural components
when subjected to a specified level of ground motion should be added to the standards
[8].
Real-time performance-based damage assessment (PBDA) is not possible with
current earthquake early warning systems because they cannot predict wide-band
waveforms, and therefore, their prediction accuracy is very limited. The PBDA index
determines, in a probabilistic manner, the level of risk to which the infrastructure
is exposed, driving decisions for seismic reinforcement, backup of critical data, and
safety of people, as well as the cost of insurance to cover this risk. This risk depends
on the sensitivity of the infrastructure to the soil site conditions and the dynamic
characteristics of the earthquake in the specific region. Figure 6 [9] illustrates the
wide range of frequencies in the frequency response of the 121-storey Shanghai
tower that may be excited by the earthquake waveforms.
With the state of damage of each critical asset given by the PBDA, we can make
disaster recovery decisions to preserve the city’s integrity. Local responses consist
of automatic actions that can be applied in a couple of seconds before the earthquake
strikes. An important local application is to open the circuit breaker feeding wires
or equipment that will collapse and produce short circuits that result in fires. Global
responses prioritize the responders’ actions to restore the most critical services as fast
as possible. UBC’s infrastructures interdependencies simulator (i2SIM) [10] is a tool
that allows us to coordinate these actions within our earthquake early warning (EEW-
SC) disaster response environment. Figure 7 shows a diagram of i2SIM where the
main infrastructures of a city’s downtown area are represented for disaster response.
The information from the PBDA subsystem is fed into the i2SIM management
console to determine where the responders should prioritize their responses. i2SIM
coordinates the supply of services that each critical infrastructure needs from each
other and, given the damage level of each infrastructure, determines how to distribute
the available resources to maximize the speed of recovery and restore the city’s
wellbeing. For example, immediately after an earthquake, there will be a degradation
in the capacity of electrical feeders and water pipes to supply electricity and water
to the emergency units (EU) of the hospitals. In addition, major roads and highways
may have severe structural damage and are no longer functional. i2SIM will inform
the responders how many people can be treated at each available EU, depending on
the structural damage and availability of resources in each EU.
In many earthquakes, achieving a "recovered" minimum functional state of a building or infrastructure can take from months to a year. With the information and knowledge provided by our proposed EEW-SC system, a minimum operating state with the essential services for survival could be achieved in weeks rather than months.
6 Conclusion
In this paper, we have introduced a new approach to protection and response for large earthquake disasters across a smart city. The main premise is that considerable life and property can be saved during earthquakes by having better knowledge of the damage expected from the earthquake shaking.
In many cities that suffer strong damage during earthquakes, an extensive 5G or 4G
communication network already exists that allows the implementation of the solution
proposed in this paper. In this project, we have designed a network of economical
mini-seismic stations that can crisscross the area of the city. This resolution in space
allows the development of more comprehensive damage assessment and disaster
response strategies to save human lives and quickly restore the city’s wellbeing.
References
1. Martí J (2021) The EQZ transmission line model for earthquake wave propagation. The
University of British Columbia
2. Moehle J, Deierlein G (2004) Framework methodology for performance-based earthquake
engineering. In: WCEE, Vancouver, B.C., Canada
3. Ericsson—Deutsche Telekom (2021) Enabling time-critical applications over 5G with rate
adaptation
4. Wald D (2020) Practical limitations of earthquake early warning. Earthq Spectra 36:1412–1447
5. Hoshiba M (2021) Real-time prediction of impending ground shaking: review of wavefield-
based (ground-motion-based) method for earthquake early warning. Front Earth Sci 9
6. Analog Devices Inc (2020) ADLX354/ADXL355 data sheet
7. Arnold C (2006) FEMA 454: designing for earthquakes. In: A manual for architects. Providing
protection to people and buildings. Oakland, California
8. Haselton CB, DeBock et al (2019) Resilient design for functional recovery—expectations for
current California buildings and approaches to resilient design. Haselton Baker Risk Group,
LLC Seismic Performance Prediction Program (SP3)
9. Ventura C (2020) Joint time-frequency analysis in OMA. In: IMAC XXXVIII conference on
structural dynamics
10. I2SIMRT (2022) I2SIM-RT software and user’s manual. I2SIM-RT Technologies Inc.,
Vancouver, Canada
Detection of COVID-19 Using Medical Image Processing
Abstract Since the outbreak of COVID-19, human life has been affected in many aspects, and huge numbers of deaths have been witnessed. Ultimately, the World Health Organization declared COVID-19 a pandemic, which has created massive losses all over the world, especially in countries with poor health hygiene and slower financial capacity to respond. Medical image processing has been implemented in various healthcare applications, such as cancerous cell detection, lung nodule classification, thyroid diagnosis, diabetic retinopathy detection, and fetal localization. The sources for such studies are medical images, e.g., X-ray, CT, and MRI, and these numerous sources of medical images have enabled medical image processing techniques to tackle the COVID-19 outbreak. A huge body of research has been proposed and implemented in response to the outbreak to combat the deadly disease using the healthcare technology available to us. Therefore, in this paper, we are motivated to analyze and summarize several state-of-the-art research works related to COVID-19 medical image processing. Further, we also give an overview of deep learning and its applications to healthcare found in the last decade.
1 Introduction
In human history, one of the outbreaks that has created a worldwide health crisis is the COVID-19 pandemic, affecting humans of all age groups. Initially, few people were affected, the disease was confined to one geographical region, and it posed little potential threat to the human race; however, in later stages, the outbreak became an immensely high-risk pandemic, as declared by the World Health Organization (WHO). It has the potential of infecting millions of lives in all geographical regions,
especially ones with weaker health systems. The newly discovered virus is deadly mainly because no vaccine was initially available and because it is transmitted through direct or indirect contact with an affected individual.
Several medical image resources, such as CT and MRI scans, have made deep learning a powerful technique for detecting the virus in the human body; we also review deep learning and its healthcare applications of the last decade. To detect whether a patient has COVID, the patient first undergoes scans (CT or MRI), from which the level of infection can also be determined. Since this virus emerged in China with the first reported case, it has cost over 4 million lives. Several methods are available, such as medical imaging and X-ray sessions. Doctors base the diagnosis mainly on the RT-PCR test, but it is not the final report; it only indicates whether the person has SARS-CoV-2 or not. To know how severely the patient is infected with the virus, an X-ray session is conducted, and for a more detailed report, a CT (computed tomography) scan is taken (Fig. 1).
In these scans, doctors mainly observe the lungs through the X-ray or CT images of the patient to identify the symptoms. Medical image processing is the process of exploring image datasets of the body, commonly obtained from a computed tomography (CT) scan or magnetic resonance imaging (MRI) scan. This technique is mainly used by radiologists to understand more about the symptoms of the infected patient.
Several organizations and governments have started to invest abundantly and ardently in the COVID-19 vaccine and related research. Several related symptoms were identified and listed so that the general public is aware of them and can seek investigation and treatment at the earliest to reduce the mortality rate. Further, tremendous research work is being carried out in connection with the COVID-19 outbreak. The medical image processing approach has gained huge momentum in several health sectors, especially in cancer detection [1]. Further, deep learning and machine learning techniques are popular choices for the detection of several diseases, as reported in [2].
The objective of this paper is to emphasize the contribution of machine learning, deep learning, and medical image processing techniques in countering the COVID-19 outbreak all over the world, along with a review of the state-of-the-art techniques designed using these methods.
Table 1 (continued)
[15] Dataset: X-ray images obtained from Dr. Adrian Rosebrock and Dr. Joseph Cohen. Algorithm: COVIDX-Net, consisting of deep CNN architectures. Remark: a larger dataset with varied medical images can be considered for more detailed analysis.
[16] Dataset: chest X-ray images from Dr. Joseph Cohen's GitHub repository. Algorithm: deep CNN architectures ResNet50, InceptionV3, and InceptionResNetV2. Remark: implementation of the CNN models on larger datasets to enhance classification performance is not considered.
normal people's X-ray images, contributing a total of 178 images for analysis. An overall accuracy of 99.5% is reported for the proposed CNN model.
The paper [22] describes a deep learning technique for the detection of COVID-19 patients using CT scan images. For the experimental study and analysis, chest X-ray images were used as the sample dataset, keeping in mind the economic features of X-ray equipment, its time efficiency, and its availability in the majority of hospitals and clinics. The authors claim that the developed technique can detect the presence of the virus in the shortest time possible, which lessens the pressure on RT-PCR testing when hospitals or diagnostic centers need to run huge volumes of tests. For the study, several pre-trained networks were employed, such as MobileNet-V2, VGG-16, ResNet-50, and InceptionV3, modified as required by placing a head model on a base model. The model is reported to achieve 92% and 98% validation accuracy at epochs 8 and 10, respectively. The presented method can be extended to improve accuracy by training on a larger dataset.
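The head-model-on-base-model pattern described above can be sketched with Keras as shown below. This is a generic transfer-learning template rather than the exact architecture of [22]; the input size, dropout rate, learning rate, and two-class softmax head are assumptions.

```python
import tensorflow as tf

# Base model: ImageNet-pretrained MobileNetV2 without its classifier
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained feature extractor

# Head model: a small classifier for COVID / normal chest images
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Fine-tuning with data augmentation, as suggested in [24], would then unfreeze some top layers of the base model and continue training at a lower learning rate.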
Another transfer learning model is presented for the identification and detection of COVID-19. In this paper [23], the authors utilize the well-organized CT scan image datasets known as COVIDx CT-2A and COVIDx CT-2B, containing as many as 194,992 CT scan images taken from 3,745 patients in the age group of 0–93 years. The authors model a revised version of ResNet-V2 and change the parameters of all the CNN layers as per their requirements. It provides 99.2% accuracy in detecting COVID-19 cases. However, the work remains limited to theoretical grounds and needs to be validated by experts and physicians for clinical use.
In [24], the authors propose the detection of COVID and the prediction of its severity using optimization methods and a deep learning architecture. The extraction of features from CT scans and chest X-ray images based on a CNN is presented in a feature-learning stage. The authors suggest fine-tuning the CNN model with data augmentation to improve the performance of the developed model. The presented work reports an accuracy of 97.36%. The authors also suggest generalizing the proposed model by adding newly labeled images.
Another deep learning approach for the accurate identification of COVID-19 from chest X-ray images is presented in [25]. The proposed model is reported to detect any abnormal structure in CT scan or X-ray images. The developed methodology is broken down into three phases: dataset preparation, preprocessing, and finally training and classification. The authors suggest that with a larger amount of data, the results can be improved.
Several deep learning models have been reported in the literature to identify the
accurate outcome and consistent results by standardizing and digitizing medical
images dataset with the techniques of medical image processing. Further, it has been
observed that radiographic patterns on CT chest scans in the early phase of COVID
infection have a higher positive detection rate as compared to the RT-PCR in detecting
COVID-19. Popular universities such as Stanford have provided data, models, tools,
research studies, and funding opportunities for COVID-19 research. The research
effort combined with COVID-19 datasets has helped to build comprehensive medical
image processing and DL models for identification, virus diagnosis, treatment, and
even potential vaccine development.
In the fields of drug and vaccine research, medical image processing combined with deep learning has the capacity to contribute to combating the COVID-19 outbreak. Enormous datasets of chemical compounds are available on which machine learning and deep learning models can be trained to identify compounds that can boost human immunity and protect from infection. The patterns of such compounds can thus be studied and learned using machine learning techniques in less time. Further, researchers, with the input of medical experts, can test whether newly composed compounds in a medicine can be used as vaccines or not.
A virus generally presents a characteristic part of the antigen that induces the disease. When a vaccine is introduced into the body, the immune system of the host is activated, which helps in the generation of particular antibodies for the identification and neutralization of the virus. The virus multiplies rapidly, and its antigen is likely to undergo mutation, which can prevent identification or detection by the developed antibodies. Hence, the vaccine generation effort devoted to classifying the T-cell epitopes of the COVID-19 virus needs to be discussed. That is why several researchers have used the CNN model as a deep learning method for the prediction of cross-immunoreactivity (CR) in heterogeneous epitope vaccines, with the help of experts and physicians.
Several applications following the concepts of machine learning, medical image processing, and computer vision have been developed to control and monitor the transmission of COVID-19. For the identification, detection, inspection, and guidance related to COVID-19, several smart devices based on these techniques were deployed among the public. For instance, ventilators, automatic sanitizers, respirators, and protective gear are employed for attending to patients while protecting healthcare specialists, as an assurance of virus containment. The temperature of individuals is measured using thermal screeners to check for a temperature rise, since high temperature is a primary symptom of COVID-19. Vision-guided robots have been used to ensure that social distancing is practiced strictly among COVID-infected patients and those near them. Several governments have launched drones to detect COVID-19 infections among people in remote areas. Other ML and image processing techniques are being employed for the extensive manufacturing of gear and healthcare products to be used by all who work in hospitals and offices. These devices assist in avoiding the spread of the virus by minimizing human contact. These technologies have proved their roles in diagnosing and reducing airborne virus particles, which have the possibility of infecting a large number of people.
Manually determining whether a patient has COVID-19 is a demanding and time-consuming process. Therefore, applying medical image processing with deep learning or machine learning approaches is strongly advocated for predicting the disease using publicly available datasets. The approach is effectively supported by health organizations such as the WHO and ICMR.
However, the availability of datasets and their accuracy are questionable for determining the efficiency of diagnostic systems developed using machine learning techniques. Further, it is difficult to gather medical data such as X-ray and CT scan images from wider demographics and healthcare organizations. Moreover, access to other patient information such as family history, work, education, and other behavioral characteristics is required for determining the technical implementation using machine learning or deep learning approaches. Also, how some people got infected without showing any symptoms needs to be studied.
4 Conclusions
Medical image processing with a deep learning approach has been treated as an effective method to furnish rational and reliable solutions for the diagnosis and identification of the COVID-19 outbreak. Several applications of the deep learning approach have been reported in the medical diagnosis literature with the objective of providing potential solutions. In this paper, we have summarized some of the recent diverse publications on COVID-19 that have similar objectives. The paper has presented the relevance of medical image processing with deep learning methods in the identification and detection of COVID-19. Further, the study also highlighted challenges associated with such applications, which will be helpful for future research directions that remain unexplored.
However, coordinated discussion and processes among government, industry, and academic organizations are still required, demanding extensive effort and time. Further, we have listed the applications of medical image processing with a deep learning approach to COVID-19 diagnosis and vaccine preparation. Moreover, the challenges associated with computer-aided diagnosis and its impact on the medical field are also discussed.
References
23. Zhao W, Jiang W, Qiu X (2021) Deep learning for COVID-19 detection based on CT images.
Sci Rep 11(1):1–12
24. Syarif A, Azman N, Repi VVR, Sinaga E, Asvial M (2022) UNAS-Net: a deep convolutional
neural network for predicting COVID-19 severity. Inf Med Unlocked 28:100842
25. Xue Y, Onzo BM, Mansour RF, Su SB (2022) Deep convolutional neural network approach
for COVID-19 detection. Comput Syst Sci Eng 201–211
Text Encryption Using ECC and Chaotic
Map
Abstract With the advancement of technology in modern society, there is also a need for advanced security systems. The public-key encryption scheme known as elliptic curve cryptography (ECC) is considered more secure than traditional methods such as Rivest-Shamir-Adleman (RSA) for the same key size. In ECC, a key pair is generated by selecting a point on an elliptic curve and a secret integer, called the private key. In this research, we propose an ECC-based chaotic map text encryption scheme. A chaotic sequence is generated, and a bitwise XOR is performed with the input ASCII characters. The XORed data is grouped, and ECC encryption is applied to each group to generate the cipher text. The chaotic system provides better randomness, while ECC provides efficient security. Experimental analysis shows that the proposed method can efficiently encrypt and decrypt the input text data.
1 Introduction
other applications where data security is important [1]. It is important to keep secret keys secure and protect them from unauthorized access, as a compromise could undermine the security of the encrypted data [2]. A literature review of various existing methods provides an overview of the key findings, trends, and developments in the field. Different variants and variations of ECC have been proposed, as well as analyses of known vulnerabilities and attacks on the algorithm. These studies show that ECC surpasses RSA in terms of operational effectiveness and security, and they raise the possibility that ECC may be best suited for memory-constrained devices like smartphones. An elliptic curve analog of the Diffie-Hellman key exchange (DHKE) protocol also exists. With a smaller key size than RSA, ECC offers more security.
protocol. With a smaller key size than RSA, ECC offers more security. In [1], the
authors used ECC in electrical devices since that have less memory and consume less
power. The mapping technique’s security will give twofold security for text encryp-
tion [2], and the method suggested in this work has the significant benefit of avoiding
the need to pad the grouped hexadecimal with an extra bit when the integer is odd
since it is expected that the length zero group is NULL. The encryption and decryp-
tion of a matrix-based message for an ElGamal-based message were reported by the
authors in [3]. AES uses a variable-length key, with sizes ranging from 128 bits to
256 bits, and is considered much more secure than DES. In [4], the modified AES
is more effective than the original AES since it encrypts text and picture data more
quickly. Homomorphic Encryption is a trusted and private method for cloud data cen-
ter processing and storage. Currently, HE has been used by the authors, and it appears
conceivable that its first extensive use will be in ML applications that enable private
AI [5]. Through this research, authors achieved access control limitation with the use
of attribute-based encryption, which is both secure and trustworthy. This platform is
known as CloudSim [6]. By tackling the clustering problem, the developers of this
mechanism take the first step into the field of unsupervised learning, a significant area
of machine learning that has numerous practical applications. In order to conduct
a privacy-preserving assessment, fully homomorphic encryption (FHE) techniques
are used [7]. A new encryption technique [8] that provides high-level security with a small key size is proposed, in which the traditional method of converting letters into affine points on the elliptic curve has been removed. Here, the input text is converted into ASCII values, which are grouped according to a chosen size; each group is converted into a long number, and then key generation is done, followed by generation of the cipher text. This approach eliminates the operational cost associated with mapping characters to elliptic curve co-ordinates while also eliminating the requirement for a standard lookup table. A new and efficient steganography method [9] for hiding a biomedical image in an ordinary file or message has been proposed; to maintain continuous communication over an insecure channel, the ElGamal cryptosystem is adopted. A new symmetric-key encryption technique [10] based on scan and cycle-shift operations with a chaotic map is introduced, in which the input is refined as required using the respective processes; the Henon map plays a key role, and finally a doubly scrambled, encrypted image is generated. A new algorithm [11] is proposed that protects medical images against attacks using chaotic systems. This algorithm has two
main parts: high-speed permutation and adaptive diffusion. The algorithm is so effective that the image cannot be decrypted if there is any small change in the produced key, and a key space of 2^100 is reported to be required.
The authors of [8] found that end-to-end key generation and image encryption based on deep learning provide benefits in enormous key spaces and automatic generation, with a lessened need for complex cryptographic architectures. In [12], the authors proposed a unique mechanism for image security with the aid of a deep neural network, emphasizing that it should not require heavy cryptographic operations; the use of a stacked encoder overcomes the iterative problem of the feed-forward approach, while an enhanced chaotic map leads to better key generation. The authors of [13] used a new bit reversion, a chaotic log map, and a deep CNN to generate keys for encryption operations; permutation, DNA encoding, diffusion, and bit reversion, which scramble and modify the pixels, are used to securely encrypt images. The framework in [14] mainly uses dynamic DNA encoding, a hyper-
chaotic system, and elliptic curve cryptography. The color image is encrypted into a DNA sequence using randomly selected row-level encoding rules, and the hyper-chaotic system is utilized to produce pseudo-random sequences that arrange the image information at two levels, bit-level and block-level. The method in [15] concentrates on the encryption of color and gray images using chaotic systems of different dimensions. The proposal is very sensitive to initial conditions, because the important chaotic sequences are generated from them; finally, the algorithm performs encryption using a permutation table. A new parallel-mode image encryption and transmission algorithm [16] has been proposed; to increase the security level, a sequence signal generator and chaotic cryptography are combined, so that the cryptographic properties of chaotic signals can be fully exploited by a flexible digital logic circuit. The overall contribution of the proposed
encryption scheme can be summarized as follows:
1. A new text encryption scheme is proposed based on ECC and the logistic map.
2. The proposed method can successfully encrypt and decrypt any input text data.
The rest of the paper is structured as follows. Section 2 describes the preliminaries. Section 3 describes the data grouping. Section 4 describes the proposed methodology. Section 5 presents the experimental simulation and Sect. 6 the experimental analysis, followed by the conclusion in Sect. 7.
2 Preliminaries
The following calculations depict the mathematical operations on the co-ordinates of an elliptic curve over a finite field.
1. Point addition: Any two distinct points S(x1, y1) and T(x2, y2), on point addition, return a new point U(x3, y3) that satisfies the equation of the elliptic curve. The mathematical representation is as follows:

x3 = (λ² − x1 − x2) mod p    (1)

y3 = (λ(x1 − x3) − y1) mod p    (2)

If S ≠ T,

λ = ((y2 − y1) / (x2 − x1)) mod p    (3)

otherwise (point doubling),

λ = ((3x1² + a) / (2y1)) mod p    (4)

Point addition and point doubling operations are combined to perform point multiplication efficiently [13].
Logistic Map The logistic map is a widely used point of entry into the study of chaos. It is a polynomial map of degree 2 and is regularly cited as an archetypal example of how complex, chaotic behavior can emerge from simple non-linear dynamic equations. Figure 1 shows the bifurcation diagram of the logistic map. The logistic map (LM) is defined as

a_{x+1} = t · a_x (1 − a_x)    (6)

where the common values of the parameter t are in the range [0, 4], so that a_x remains in [0, 1].
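As an illustration of how such a sequence can serve as a keystream, here is a minimal sketch that iterates the logistic map from a secret seed and XORs the result with ASCII values; the quantization of the chaotic values to bytes and the burn-in length are our assumptions, as the paper does not specify them.

```python
# Sketch: logistic-map keystream XORed with ASCII values. Mapping the
# chaotic values in [0, 1] to bytes via int(a * 256) and the burn-in of
# 100 iterations are illustrative assumptions.

def chaotic_keystream(seed, t, n, burn_in=100):
    a = seed
    for _ in range(burn_in):       # discard transient iterations
        a = t * a * (1 - a)
    stream = []
    for _ in range(n):
        a = t * a * (1 - a)        # a_{x+1} = t * a_x * (1 - a_x)
        stream.append(int(a * 256) % 256)
    return stream

def xor_with_keystream(text, seed=0.3, t=3.99):
    ascii_vals = [ord(c) for c in text]
    keystream = chaotic_keystream(seed, t, len(ascii_vals))
    return [v ^ k for v, k in zip(ascii_vals, keystream)]

xored = xor_with_keystream("Koneru Lakshmaiah Education foundation")
```

Since XOR is its own inverse, the receiver regenerates the same keystream from the shared seed and applies the identical operation to recover the data.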
3 Data Grouping
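The grouping algorithm itself follows [19] and is not reproduced in full here. Purely as an illustration of the idea — packing each group of up to 31 XORed values into one large integer and unpacking it reversibly — here is a minimal sketch assuming base-256 packing; the paper's actual encoding evidently differs in detail, as the worked example in Sect. 5 shows.

```python
# Sketch: reversible grouping of byte values into large integers.
# Base-256 packing is an illustrative assumption; group size 31 matches
# the ECC512 worked example in Sect. 5.

GROUP_SIZE = 31

def group_to_integers(values, size=GROUP_SIZE):
    ints = []
    for i in range(0, len(values), size):
        group = values[i:i + size]
        n = 0
        for v in group:
            n = n * 256 + v            # pack bytes as base-256 digits
        ints.append((n, len(group)))   # keep the length for exact unpacking
    return ints

def integers_to_values(ints):
    values = []
    for n, length in ints:
        group = []
        for _ in range(length):
            n, v = divmod(n, 256)
            group.append(v)
        values.extend(reversed(group))
    return values

assert integers_to_values(group_to_integers([78, 25, 67, 236])) == [78, 25, 67, 236]
```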
4 Proposed Methodology
The communicating parties adopt the DHKE method over the elliptic curve to share the produced secret key. The proposed encryption and decryption algorithms are
illustrated in Fig. 2.
1. The input text data is taken and converted to its corresponding ASCII values.
2. Using a secret key as the seed value, a chaotic sequence is generated whose length is equal to the length of the input.
Fig. 2 Block diagram of the proposed scheme. Encryption: input text → ASCII values → XOR with the logistic-map chaotic sequence → data grouping → integer form → ECC encryption → cipher text. Decryption reverses these steps using the shared key
5 Experimental Simulation
The simulation is carried out using Mathematica on a system with a Ryzen 7 5000-series processor and 16 GB RAM. The ECC Brainpool parameters for 512-bit curves (the ECC512 curve) [18] are used. Any text data can be used for encryption.
Encryption process As an example, let the input be: Koneru Lakshmaiah Education foundation Guntur, Andhra Pradesh 522302. Its corresponding ASCII values are {75, 111, 110, 101, 114, 117, 32, 76, 97, 107, 115, 104, 109, 97, 105, 97,
104, 32, 69, 100, 117, 99, 97, 116, 105, 111, 110, 32, 102, 111, 117, 110, 100,
97, 116, 105, 111, 110, 32, 71, 117, 110, 116, 117, 114, 44, 32, 65, 110, 100, 104,
114, 97, 32, 80, 114, 97, 100, 101, 115, 104, 32, 53, 50, 50, 51, 48, 50}. The XORed data is obtained as {78, 25, 67, 236, 248, 230, 115, 200, 29, 25, 104,
78, 40, 34, 106, 80, 64, 91, 103, 87, 114, 75, 13, 225, 77, 43, 13, 57, 124, 56,
8, 39, 69, 105, 230, 248, 254, 34, 81, 117, 94, 89, 75, 87, 26, 8, 165, 92, 53, 9,
113, 20, 237, 83, 223, 85, 245, 38, 245, 72, 31, 53, 61, 111, 50, 18, 109, 125}. With the ECC512 curve, 31 ASCII characters can be grouped [19] using the data grouping algorithm. After data grouping, the grouped ASCII values are generated as
(234136428739045330699128408015630808708125583027459807202164636369605
1964706311736867755528131649211506053168719863324161623481619564754351
92615029, 3433971040155273319344750108422875121163765540091608785014053
9015657813159510245160576271572969789227080230947761550347425550890901
7135687624294432, 64073990790852999619018802). The input (Pm) for the ECC encryption is taken as Pm[1] = (2341364287390453306991284080156308087081255
830274598072021646363696051964706311736867755528131649211506053168719
86332416162348161956475435192615029, 343397104015527331934475010842287
512116376554009160878501405390156578131595102451605762715729697892270
802309477615503474255508909017135687624294432) and Pm[2] = (64073990790852999619018802, 32). As the group length is odd, 32 is padded. After performing the encryption operation and mapping to the corresponding ASCII characters, the cipher text is generated as shown in Fig. 3.
Decryption process On the receiver's side, the secret seed value is obtained through the Diffie-Hellman key exchange methodology. The cipher text is also obtained, and
its corresponding ASCII value is generated as (5713, 9933, 7552, 2032, 61898,
51902, 2129, 64847, 18986, 46567, 11406, 22448, 3623, 33762, 13639, 23347,
61355, 8546, 35491, 53060, 62795, 12998, 19758, 18365, 27639, 59146, 27837,
7240, 55565, 18301, 266, 23758, 6059, 5448, 60237, 22395, 35904, 34552, 34643,
62697, 50282, 7939, 33091, 21466, 9806, 54530, 16759, 11541, 6495, 27776, 2473,
36629, 2101, 61940, 56653, 34610, 7985, 34171, 7914, 14258, 43804, 4131, 11407,
48397, 28328, 7756, 13897, 3002, 49230, 31412, 32085, 18834, 219, 13498, 47050,
35891, 64995, 29259, 2112, 16930, 33157, 24114, 16805, 27949, 19678, 44486,
12601, 26367, 38128, 48349, 10148, 42166, 33352, 28755, 42228, 24219, 33945,
36586, 53019, 10272, 23688, 20614, 52782, 38289, 50240, 37789, 61036, 33175,
54140, 44924, 19202, 37383, 20944, 35767, 14662, 57554, 46006, 1582, 17501,
44470, 59259, 56649, 4214, 32943, 25415, 21629, 7027, 22764). After data grouping, we obtain the values (692617089956388824536608628841288590053
1964574752970775831636177001632709041223471323091848676365986807520925
230661579701384920024739891072163904960051489, 275497454792548792130339
0100395977366319657773641180457277490430902626674895543682695733337911
698764303589142167768931883714591950914869678122269099179592, 452481156
0089991239981546387384157036172096688426413904760445949592200527912764
3481249620738781923038159740598752199893078388652562410730865346101119
25208, 3107720616474291426241574354539899713886556055364482199927592282
7420416028906147387437477029825154145297225727292079201506705861792497
32595930757060914281). These values are taken as input, and ECC decryption is
performed. Then, reverse data grouping is applied, yielding {78, 25, 67,
236, 248, 230, 115, 200, 29, 25, 104, 78, 40, 34, 106, 80, 64, 91, 103, 87, 114, 75, 13,
225, 77, 43, 13, 57, 124, 56, 8, 39, 69, 105, 230, 248, 254, 34, 81, 117, 94, 89, 75, 87,
26, 8, 165, 92, 53, 9, 113, 20, 237, 83, 223, 85, 245, 38, 245, 72, 31, 53, 61, 111, 50,
18, 109, 125}. Finally, using the secret key as the seed value, XOR is performed
to obtain {75, 111, 110, 101, 114, 117, 32, 76, 97, 107, 115, 104, 109, 97, 105, 97,
104, 32, 69, 100, 117, 99, 97, 116, 105, 111, 110, 32, 102, 111, 117, 110, 100, 97,
116, 105, 111, 110, 32, 71, 117, 110, 116, 117, 114, 44, 32, 65, 110, 100, 104, 114,
97, 32, 80, 114, 97, 100, 101, 115, 104, 32, 53, 50, 50, 51, 48, 50}. These values are
converted back to their ASCII characters to obtain the decrypted text, which is the same as the input text data.
6 Experimental Analysis
The security of a cryptographic procedure is often affected by the size of the key used. Because there are more potential keys to attempt, a bigger key size typically means that it will be harder for an attacker to guess or break the key. However, it is crucial to take the computational load into account when raising the key size, because it might affect the system's performance. The proposed method uses a key space of 2^512.
Data is encrypted and decrypted using keys in cryptographic systems, and the security of the system depends on keeping the keys private. Key sensitivity is therefore a crucial factor to take into account when developing and implementing a cryptographic system, and precautions should be taken to guarantee that keys are handled and maintained securely. When the key is changed by a single bit, the entire output exhibits an avalanche effect. Figure 4 shows the key sensitivity effect during decryption with a 1-bit difference in the key.
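A simple way to quantify the avalanche effect is to compare the outputs produced with two keys that differ in a single bit and count the differing bits; here is a minimal sketch, assuming the outputs are equal-length byte sequences.

```python
# Sketch: fraction of differing bits between two equal-length byte
# sequences produced with keys differing in a single bit.

def bit_difference_ratio(out1, out2):
    assert len(out1) == len(out2)
    diff = sum(bin(a ^ b).count("1") for a, b in zip(out1, out2))
    return diff / (8 * len(out1))   # a value near 0.5 indicates avalanche
```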
Data are shown using a histogram according to their frequencies. The histogram of the cipher text, shown in Fig. 5, exhibits a uniform frequency distribution.
A cipher text only attack (COA) is a type of cryptographic attack in which the attacker has access only to cipher text and is unaware of the corresponding plain text or encryption key. By examining the cipher text, the attacker seeks to ascertain the plain text or the key. The COA is a typical attack used to evaluate the security of cryptographic systems: both the strength of the encryption technique and the security of the key may be assessed with it. In general, the more difficult it is to successfully conduct a COA, the stronger the encryption technique and the longer the key. Because our method adopts ECC512 parameters, this attack will fail.
The proposed method is compared with the method in [1]. The inputs are taken as word counts of 10,000 for different text inputs. When using data grouping, the proposed method can group more ASCII characters, as the ECC prime modulus P is 512 bits. Table 1 shows the performance comparison with the existing methods, where the proposed method shows better performance.
7 Conclusion
A new and efficient text encryption algorithm has been proposed based on ECC and a chaotic system. The input text data is converted to its corresponding ASCII values, and a chaotic sequence is generated using the secret key as the seed value. An XOR operation is performed between the input data and the chaotic sequence, which is later grouped to a fixed group size. Each group is then converted into a single large integer. ECC encryption is then performed on the large integers generated. Further, the encrypted integers are converted to ASCII values, and finally, the cipher text is generated. The proposed method can successfully encrypt and decrypt the input text data.
References
1. Keerthi K, Surendiran B (2017) Elliptic curve cryptography for secured text encryption. In:
International conference on circuit, power and computing technologies (ICCPCT). IEEE, pp
1–5
2. Agrawal E, Ram Pal P (2017) A secure and fast approach for encryption and decryption of
message communication. Int J Eng Sci 11481
3. Laiphrakpam DS, Rohit T, Singh KM, Awida MA (2022) Encrypting multiple images with an
enhanced Chaotic map. IEEE Access 10:87844–87859
4. Gamido HV, Sison AM, Medina RP (2018) Modified AES for text and image encryption. Indonesian J Electr Eng Comput Sci 11(3):942–948
5. Lauter K. Private AI: machine learning on encrypted data. In: Recent advances in industrial and applied mathematics, pp 97–113
6. Almuzaini KK, Kumar SA, Raju R, Vikas G, Shrivastava R, Halifa A (2022) Key aggrega-
tion cryptosystem and double encryption method for cloud-based intelligent machine learning
techniques-based health monitoring systems. In: Computational intelligence and neuroscience,
2022
7. Jaschke A, Armknecht F (2018) Unsupervised machine learning on encrypted data. In: 25th
international conference, Calgary, AB, Canada, Aug 15–17
8. Bao Z, Xue R (2021) Survey on deep learning applications in digital image security. Opt Eng
60(12):120901
9. Laiphrakpam DS, Khumanthem MS (2017) Medical image encryption based on improved
ElGamal encryption technique. Optik 147:88–102
10. Shahna KU, Mohamed A (2020) A novel image encryption scheme using both pixel level and
bit level permutation with chaotic map. Appl Soft Comput 90:106162
11. Moafimadani SS, Chen Y, Tang C (2019) A new algorithm for medical color images encryption
using chaotic systems. Entropy 21(6):577
12. Maniyath SR, Thanikaiselvan V (2020) An efficient image encryption using deep neural net-
work and chaotic map. Microprocess Microsyst 77:103134
13. Erkan U, Toktas A, Enginoğlu S, Akbacak E, Thanh DNH (2022) An image encryption scheme
based on chaotic logarithmic map and key generation using deep CNN. Multim Tools Appl
81(5):7365–7391
14. Jasra B, Moon AH (2022) Color image encryption and authentication using dynamic DNA
encoding and hyper chaotic system. Expert Syst Appl 206:117861
15. Lagmiri SN, Elalami N, Elalami J (2018) Color and gray images encryption algorithm using
chaotic systems of different dimensions. Int J Comput Sci Netw Secur 18(1)
16. Yu J et al (2020) Image parallel encryption technology based on sequence generator and chaotic
measurement matrix. Entropy 22(1):76
17. Khoirom MS, Laiphrakpam DS, Tuithung T (2021) Audio encryption using ameliorated ElGa-
mal public key encryption over finite field. Wireless Pers Commun 117:809–823
18. Elliptic curve parameter. http://www.ecc-brainpool.org/download/Domainparameters.pdf.
Accessed 19 Aug 2022
19. Singh KM, Singh LD, Tuithung T (2023) Improvement of image transmission using chaotic
system and elliptic curve cryptography. Multimed Tools Appl 82:1149–1170
20. Singh KM, Singh LD, Tuithung T (2022) Improvement of image transmission using chaotic
system and elliptic curve cryptography. Multim Tools Appl, 1–22
Plant Leaf Disease Detection
and Classification: A Survey
Abstract Yields are impacted by climate and temperature, making them susceptible
to pathogen infection during growth. Progressive disease detection and prevention
in crops are compulsory to avoid disease-induced damage during growth, harvest-
ing and post-harvesting, enhance productivity, and ensure yield sustainability. In
the past decade, researchers have contributed several research articles on detecting disease locations and identifying complex disease patterns using leaf images. The leaf is the
most prominent organ that shows the most distinct features that plant pathologists
can identify through visual inspection. This article analyzes the principal aspects that
affect the design and effectiveness of disease detection and classification frameworks
using current technologies. An in-depth analysis of the various findings, highlighting
advantages and shortcomings, has been discussed, leading to more realistic conclu-
sions about the subject. The assessment centers on providing a thorough study of the factors involved in evolving AI-based techniques to support plant disease detection and provide disease oversight support to agriculturalists.
R. Bansal · R. K. Aggarwal
National Institute of Technology, Kurukshetra, India
e-mail: r_k_a@nitkkr.ac.in
R. Bansal (B)
JMIT Radaur, Radaur, India
e-mail: rajiv_62000071@nitkkr.ac.in; rajivbansal@jmit.ac.in
N. Goyal
M.M. Institute of Computer Technology & Business Management, Maharishi Markandeshwar
Deemed to be University, Mullana, Ambala, Haryana, India
1 Introduction
1.1 Motivation
Researchers can identify foliar diseases using computer vision, machine learning, and deep learning techniques. An effective disease diagnosis system must incorporate disease identification [3, 4], identification of multiple diseases in multiple crops and multiple diseases in a single crop [5], estimation of the various disorders [6], assessment of the right volume of pesticide to be spread [7, 8], and other appropriate measures for restricting the spread of disease [4].
2 Related Work
Researchers have proposed several techniques in recent decades using computationally intelligent practices, soft computing, and image processing to improve detection and classification systems that facilitate crop field monitoring. The disease detection and classification task is divided into various modules, i.e., image acquisition, image pre-processing, image segmentation, and classification. The first and most important phase is selecting the input organs and taking pictures of the plant, including the leaf, stem, root, and branches. The performance of any recognition system depends entirely on the training data. Therefore, the image acquisition phase is foundational and challenging. The requirements of plant disease analysis can be divided into three levels: what, where, and how.
1. "what" corresponds to the classification task: identifying the label of the category to which a disease belongs.
2. "where" corresponds to the localization task: identifying the types of diseases that exist in the image together with their specific locations.
3. "how" corresponds to the segmentation task.
After acquiring leaf images with disease patterns, the image pre-processing phase deals with noise removal and content enhancement. In this phase, various image pre-processing techniques like colour space conversion, threshold segmentation, rotation, transformation, contrast stretching, smoothening, and many more are explored. The next phase deals with the segmentation process. This phase aims to partition the given images to obtain the region of interest, i.e., the spotted or lesion region, the infected area in the input images. The resulting image is easier to analyze and more meaningful for separating infected and non-infected regions.
The segmented images are then forwarded to the next phase, feature extraction. This phase converts an image to a feature vector. The features represent the relevant and discriminating attributes associated with the objects that differentiate them from other objects. Various discriminating characteristics like shape, colour, and texture are discussed to make classification efficient. In the last phase, the machine learning model comes into the picture, classifying plants as healthy or non-healthy. This phase depends entirely on the earlier stages, i.e., pre-processing, segmentation, and feature extraction. A model is trained with a set of training images, and the trained model then categorizes new samples into healthy or diseased plants.
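To make the pipeline concrete, here is a minimal sketch of the classical route — GLCM texture features extracted from gray-scale leaf images and fed to an SVM. It assumes the images are already loaded as 8-bit arrays with labels available; it illustrates the generic approach, not any particular surveyed system.

```python
# Sketch of the classical pipeline: GLCM texture features + SVM.
# Assumes gray-scale leaf images are preloaded as uint8 arrays.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(img):
    # Co-occurrence matrix at distance 1 for two directions.
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_leaf_classifier(images, labels):
    X = np.array([glcm_features(im) for im in images])
    clf = SVC(kernel="rbf")   # radial basis kernel, as used in [13]
    clf.fit(X, labels)
    return clf
```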
Machine learning models for disease detection and classification Joshi et al. [9] used 115 self-selected leaf images to classify four disease patterns. The authors employed colour moments, shape descriptors, eccentricity, and orientation and reported 88.15% accuracy using a K-NN classifier. In the study [10], the authors explored a vegetation index classification model to identify sheath blight patterns from leaf images captured with a multi-spectral camera. Coulibaly et al. [11] deployed transfer learning using the VGG-16 model to classify two disease patterns and healthy leaf images. The authors used only 99 images of a self-captured dataset for training the classifier; the model produced 89% accuracy on 25 test samples. Rothe et al. [12] utilized an EBPNN with 85.52% accuracy on images captured with a digital camera. The model was designed to classify three disease types: bacterial blight, Alternaria, and Myrothecium. In another study [13], a self-collected dataset captured using mobile phones under controlled conditions was analyzed with texture features and gradient values; an SVM with a radial basis kernel produced 82% classification accuracy. A comparative analysis of various research studies focused on image processing techniques and machine learning models is given in Table 1.
Deep learning-based disease detection and classification To overcome the arbitrary selection of plant disease spots and features, deep learning applications make feature extraction more objective, resulting in efficient research and faster technology transfer [11, 14–16]. In [17], a transfer learning approach is used to implement the EfficientNet architecture with pre-trained noisy-student weights; experiments are performed using 14 plant cultures with 39 background and augmented image sub-categories. An optimal mobile network-based CNN (OMCNN) was implemented across the phases of disease detection [18]: image segmentation is done using bilateral filtering, Kapur's thresholding is used to identify the affected portion of the leaf image, MobileNet is deployed at the feature extraction phase, an emperor penguin optimizer is used to tune the hyperparameters, and an extreme learning machine is implemented for the classification phase. In [19], transfer learning is used to identify diseases in multiple crops, and a CNN-based visual geometry group (VGG) network is used to improve performance.
However, deep learning approaches suffer from several issues, including dependence on large amounts of training data (which increases computational cost), label noise, a tendency to overfit, and degraded performance.
See Table 1.
Table 1 Comparative analysis for plant disease detection and classification using leaf images with machine learning models
References | Diseases | Image data | Count of images | Features | Classifier | Accuracy (%)
[20] | Brown spot, Leaf blight, False smut | APS image data & RRI | 400 images, RTT 70:30 | Local features and vocabulary using SIFT descriptors and the BoW technique | SVM | 94.16
[10] | Sheath blight | Captured using UAV (multi-spectral camera) | — | Color features (red, green, and hue) | Vegetation index method | 63
[21] | Gray leaf spot, Common rust, Northern blight, Healthy | Plant Village | 3823 images | Local texture features | SVM, DT, RF, Naïve Bayes | 87
[5] | Common rust, Northern blight, both diseases, Healthy leaf | Self-captured using a Samsung digital camera PL200, from Agriculture University Dharwad | 50 for each category | Six first-order histograms and GLCM features | KNN (distance metric), SVM (kernel) | KNN: 85, SVM: 88
[22] | Downy mildew, Frog eye, Septoria leaf blight | Plant Village | 4775 leaf images | Fusion (color, texture, shape) | SVM | 84
[23] | 18 cultures, four disease patterns | — | 46 images | LBP (texture) | One-class SVM | 83.3–95
[24] | Canker, Melanose, Scab, Greening, Black spot, Anthracnose | Image Gallery dataset | 1000 citrus fruit images | Color, geometric, and texture features | Multiclass SVM, DT, LDA, KNN, and Ensemble Boosted Tree | 95.8
[25] | Powdery mildew | Plant Village dataset | 50 healthy and 50 infected | Quantification of diseased area using calculated lesion area ratio | — | 99
[26] | Brown spot, Powdery mildew | Collected from University of Florida & Plant Village dataset | 100 images (training: 60, testing: 40) | GLCM features | SVM | 100
[27] | Healthy, Fusarium, Mycorrhizal fungus, Both | Captured from ARI, Turkey | 80 leaf images | Wavelet transform (min, max, mean, SD) | ANN, NB, KNN | 84
[11] | Yellowing, Malformation of ear, Plantule, Partial green ear, Healthy | Self-captured and some downloaded from the Internet | 124 images (training & validation: 99; testing: 25) | Transfer learning | VGG-16 model with early stopping | 89
[14] | 25 plants & 58 diseases | Combined (PlantVillage and self-captured) | 87,848 | Transfer learning | AlexNet, Overfeat, GoogLeNet, VGG, AlexNetOWTBn | 33.27–99.53
[15] | 14 plants and 59 diseases | Self-prepared (augmented) | 1575 (original), 46,409 (augmented) | Transfer learning | GoogLeNet | 25–100
[16] | Grapevine yellows | Self-captured and Plant Village datasets | 272 (self-captured), 3400 (Plant Village) | — | AlexNet, GoogLeNet, Inception-v3, ResNet-50 & 101, SqueezeNet | 98, 96, 98, 99, 99, 94
[28] | Esca, Black rot, Healthy | Plant Village dataset | 2986 | — | Siamese network | 92
[29] | 9 tomato disease patterns | Captured (natural background) | 5000 | DWT, Haar wavelets | ResNet-50, VGG-16, ResNet-101, ResNet-152, ResNeXt-50 | 83.06
[30] | 6 disease patterns | Plant Village dataset | 9000 | Transfer learning | CNN | 99.84
[31] | 17 classes of corn, grapevine, and tomato | Plant Village dataset | 15,873 | Fuzzy color histogram and fuzzy GLCM for color and texture | PNN | 95.68
[32] | 38 disease patterns and 44 healthy patterns | Plant Village dataset & own dataset | 54,305 & 10,851 | Attention dense learning | DADCNN-5 | 99.93, 97.93
[33] | 5 apple leaf diseases | Apple dataset | 2141 | Transfer learning | RegNet (Adam) | 99.23
3 Findings
In this article, we surveyed more than 25 articles on plant disease detection and classification. The articles in the survey cover multiple disease patterns and images acquired under multiple conditions with different devices, including smartphone cameras, digital cameras, and UAV-mounted multi-spectral cameras, as well as augmented datasets and images downloaded from the Google search engine. Some researchers captured images against natural backgrounds; others collected leaf images in proper sunlight and controlled settings. From the survey, it is inferred that the quality of the leaf images and the mode of collection greatly influence the efficiency of the pattern recognition system; good acquisition also reduces the overhead of high pre-processing costs in the disease detection phase. The entire recognition system aims to answer three levels, i.e., "where", "how", and "what". Several researchers focused on cost-effective machine learning and image processing-based approaches, while others focused on efficient deep learning-based models. The comparative analysis presented in Table 1 confirms that the number of training images plays a significant role. Most research studies focused on a single plant with two or more disease types. Few works address large-scale disease detection and identification systems, owing to the high computational complexity and the non-scalability of machine learning algorithms. The similarity in the patterns of diseased spots across various cultures demands discriminating attributes, which increases computation cost and classification time. Deep learning-based algorithms are more accurate than image processing tools but are computationally expensive and require vast training data. Studies using transfer learning approaches remain inefficient on self-captured and augmented datasets.
4 Conclusion
The survey concentrates on deep learning and machine learning approaches for foliar disease detection and identification using leaf images, while addressing several challenges. The performance of plant disease detection and recognition depends on the quality of the acquired images. Experimental results are greatly influenced by the availability of datasets, i.e., real-time versus controlled conditions and limited size. The framework must be both effective and efficient: improved accuracy at a lower computational cost is highly desirable.
References
1. Bera T, Das A, Sil J, Das AK (2019) A survey on rice plant disease identification using
image processing and data mining techniques. In: Emerging technologies in data mining and
information security. Springer, Singapore, pp 365–376
2. Kaur S, Pandey S, Goel S (2018) Semi-automatic leaf disease detection and classification
system for soybean culture. IET Image Process 12(6):1038–1048
3. Johannes A, Picon A, Alvarez-Gila A, Echazarra J, Rodriguez-Vaamonde S, Navajas AD, Ortiz-
Barredo A (2017) Automatic plant disease diagnosis using mobile capture devices, applied on
a wheat use case. Comput Electron Agric 138:200–209
4. Syed-Ab-Rahman SF, Hesamian MH, Prasad M (2022) Citrus disease detection and classifi-
cation using end-to-end anchor-based deep learning model. Appl Intell 52(1):927–938
5. Deshapande AS, Giraddi SG, Karibasappa KG, Desai SD (2019) Fungal disease detection in
maize leaves using Haar wavelet features. In: Information and communication technology for
intelligent systems. Springer, Singapore, pp 275–286
6. Wang G, Sun Y, Wang J (2017) Automatic image-based plant disease severity estimation using
deep learning. Comput Intell Neurosci
7. Upadhyay SK, Kumar A (2021) Early-stage Brown spot disease recognition in paddy using
image processing and deep learning techniques. Trait du Signal 38(6)
8. Wang C, Du P, Wu H, Li J, Zhao C, Zhu H (2021) A cucumber leaf disease severity classification
method based on the fusion of DeepLabV3+ and U-Net. Comput Electron Agric 189:106373
9. Joshi AA, Jadhav BD (2016) Monitoring and controlling rice diseases using image processing
techniques. In: 2016 International conference on computing, analytics and security trends
(CAST). IEEE, pp 471–476
10. Zhang D, Zhou X, Zhang J, Lan Y, Xu C, Liang D (2018) Detection of rice sheath blight using
an unmanned aerial system with high-resolution color and multispectral imaging. PloS ONE
13(5):e0187470
11. Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with
transfer learning in millet crop images. Comput Ind 108:115–120
12. Rothe PR, Kshirsagar RV (2015) Cotton leaf disease identification using pattern recognition
techniques. In: 2015 International conference on pervasive computing (ICPC). IEEE, pp 1–6
13. Hallau L, Neumann M, Klatt B, Kleinhenz B, Klein T, Kuhn C, Oerke EC (2018) Automated
identification of sugar beet diseases using smartphones. Plant Pathol 67(2):399–410
14. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput
Electron Agric 145:311–318
15. Barbedo JGA (2019) Plant disease identification from individual lesions and spots using deep
learning. Biosyst Eng 180:96–107
16. Cruz A, Ampatzidis Y, Pierro R, Materazzi A, Panattoni A, De Bellis L, Luvisi A (2019) Detec-
tion of grapevine yellows symptoms in Vitis vinifera L. with artificial intelligence. Comput
Electron Agric 157:63–76
17. Hanh BT, Manh Van H, Nguyen NV (2022) Enhancing the performance of transferred EfficientNet models in leaf image-based plant disease classification. J Plant Dis Protect 129(3):623–634
18. Ashwinkumar S, Rajagopal S, Manimaran V, Jegajothi B (2022) Automated plant leaf disease
detection and classification using optimal MobileNet based convolutional neural networks.
Mater Today: Proc 51:480–487
19. Paymode AS, Malode VB (2022) Transfer learning for multi-crop leaf disease image classifi-
cation using convolutional neural network VGG. Artif Intell Agric 6:23–33
20. Bashir K, Rehman M, Bari M (2019) Detection and classification of rice diseases: an automated
approach using textural features. Mehran Univ Res J Eng Technol 38(1):239–250
21. Kusumo BS, Heryana A, Mahendra O, Pardede HF (2018) Machine learning-based for auto-
matic detection of corn-plant diseases using image processing. In: 2018 International confer-
ence on computer, control, informatics and its applications (IC3INA). IEEE, pp 93–97
22. Kaur S, Pandey S, Goel S (2018) Semi-automatic leaf disease detection and classification
system for soybean culture. IET Image Process 12(6):1038–1048
23. Pantazi XE, Moshou D, Tamouridou AA (2019) Automated leaf disease detection in different
crop species through image features analysis and one class classifiers. Comput Electron Agric
156:96–104
24. Sharif M, Khan MA, Iqbal Z, Azam MF, Lali MIU, Javed MY (2018) Detection and classifi-
cation of citrus diseases in agriculture based on optimized weighted segmentation and feature
selection. Comput Electron Agric 150:220–234
25. Sengar N, Dutta MK, Travieso CM (2018) Computer vision based technique for identification
and quantification of powdery mildew disease in cherry leaves. Computing 100(11):1189–1201
26. Esmaeel AA (2018) A novel approach to classify and detect bean diseases based on image pro-
cessing. In: 2018 IEEE symposium on computer applications & industrial electronics (ISCAIE).
IEEE, pp 297–302
27. Karadağ K, Tenekeci ME, Taşaltın R, Bilgili A (2020) Detection of pepper fusarium disease
using machine learning algorithms based on spectral reflectance. Sustain Comput: Inform Syst
28:100299
28. Goncharov P, Ososkov G, Nechaevskiy A, Uzhinskiy A, Nestsiarenia I (2018) Disease detection
on the plant leaves by deep learning. In: International conference on neuroinformatics. Springer,
Cham, pp 151–159
29. Fuentes A, Yoon S, Kim SC, Park DS (2017) A robust deep-learning-based detector for real-
time tomato plant diseases and pests recognition. Sensors 17(9):2022
30. Ashqar BA, Abu-Naser SS (2018) Image-based tomato leaves diseases detection using deep
learning
31. Nagi R, Tripathy SS (2023) Plant disease identification using fuzzy feature extraction and PNN.
Signal, Image Video Process, pp 1–7
32. Pandey A, Jain K (2022) A robust deep attention dense convolutional neural network for plant
leaf disease identification and classification from smart phone captured real world images. Ecol
Inform 70:101725
33. Li L, Zhang S, Wang B (2022) Apple leaf disease identification with a small and imbalanced
dataset based on lightweight convolutional networks. Sensors 22(1):173
Performance Evaluation of K-SVCR
in Multi-class Scenario
Abstract The support vector classification regression machine for K-class classification (K-SVCR), based on the "1-versus-1-versus-rest" structure, is a unique multi-class classification method that generates ternary output {−1, 0, 1}. Since it generates ternary output, the training data requires corresponding labels. In this article, we have evaluated the performance of K-SVCR to explore the impact of (1) the labeling of the classes and (2) the relative location of class clouds. Several artificially generated datasets and a real-world dataset are considered to understand the impact of the choice of labels and the relative location of the class distributions. Accuracy is used to evaluate K-SVCR with respect to the choice of class labels and the relative location of data clouds. We found that the change in the class labels — positive +, negative −, and neutral 0 — affects the accuracy of classification significantly.
1 Introduction
The support vector machine (SVM) for binary class classification problems was put forward by Vapnik [1]. The SVM solves a convex optimization problem, a quadratic programming problem (QPP). It has some important advantages, such as a globally optimal [2] and unique [3] solution. SVMs have shown promising results for different kinds of applications like identification of fingerprints [4], recognition
of different facial expressions [5], detection and discovery of drugs [6], biomedicine [7], Alzheimer's disease detection [8], plant identification [9], and so on. SVMs are specific to binary class classification problems. During the learning process, the SVM classifier builds a decision function of the standard form f(x) = sign(wᵀx + b), which assigns each input to one of the two classes.
2 Literature Review
Many researchers have used K-SVCR in different contexts. Tian et al. [14] proposed the K-LSVCR algorithm, which solves a linear program instead of a quadratic program. Ma et al. [15] proposed K-RLSSVCR, a robust least squares version of K-SVCR; according to that paper, K-SVCR is sensitive to outliers and also time-consuming, so K-RLSSVCR uses a truncated square loss and a squared ξ-insensitive ramp loss to partly minimize the effect of outliers. Other researchers [16, 17] have utilized K-SVCR in many different aspects. The labels of the classes are usually chosen as +1, −1, and 0. In the K-LSVCR technique, a decision function is constructed through which a given input is partitioned into three classes. When the decision function value is +1, a positive vote is added to the +1 class and no vote is added to the other classes. Similarly, when the value of the decision function is −1, a positive vote is added to the −1 class and no vote to the other classes; when the value of the decision function is 0, a negative vote is added to both the +1 and −1 classes and no votes to the others, and it provides information about the other classes, labeled 0, as "1-versus-rest" does.
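As an illustration of this voting rule, here is a minimal sketch that tallies the votes over all class pairs; `classifiers[(i, j)]` is an assumed callable returning the ternary output of the machine trained with class i as +1, class j as −1, and the rest as 0.

```python
# Sketch of "1-versus-1-versus-rest" voting over ternary outputs.
from itertools import combinations

def ksvcr_predict(x, classifiers, n_classes):
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        out = classifiers[(i, j)](x)   # assumed to return +1, -1, or 0
        if out == 1:
            votes[i] += 1              # positive vote for class i
        elif out == -1:
            votes[j] += 1              # positive vote for class j
        else:
            votes[i] -= 1              # negative vote for both
            votes[j] -= 1              # focused classes
    return max(range(n_classes), key=lambda c: votes[c])
```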
Motivated by Vapnik's SVM theory [18], the support vector classification regression machine for K-class classification (K-SVCR) [12] is a relatively newer method for multi-class classification. It gives output in the format {−1, 0, 1}, i.e., ternary output. The K-SVCR thus combines regression and classification in the same machine and, throughout the decomposition stage, maintains the "1-versus-1-versus-rest" organization for all the training data instances (Fig. 1).
The formulation of K-SVCR as a convex QPP is given as

min_{w, b, ξ1, ξ2, φ, φ*}  (1/2)‖w‖² + c1(e1ᵀξ1 + e2ᵀξ2) + c2 e3ᵀ(φ + φ*)    (3)

subject to

Aw + e1 b ≥ e1 − ξ1,
−(Bw + e2 b) ≥ e2 − ξ2,
−δe3 − φ* ≤ Cw + e3 b ≤ δe3 + φ,
ξ1, ξ2, φ, φ* ≥ 0.

Fig. 1 Geometrical representation of K-SVCR: multi-class classification with ternary {−1, 0, +1} output

Here, ξ1, ξ2, φ, and φ* are positive slack variables, c1 and c2 are the penalty parameters, and e1, e2, and e3 are vectors of ones of suitable dimensions. To avoid overlapping, the positive parameter δ must be less than 1.
The dual of the above primal problem can be stated as

max_α  qᵀα − (1/2)αᵀHα    (4)

subject to

0 ≤ α ≤ F,

where Q = [Aᵀ, −Bᵀ, Cᵀ, −Cᵀ], H = QᵀQ, F = [c1e1; c1e2; c2e3; c2e3], and w = Qᵀα.
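Since the dual is a box-constrained quadratic program, any off-the-shelf QP solver can be used; as an illustration only (not the solver used by the authors), here is a minimal projected-gradient sketch.

```python
# Sketch: solve  max_a  q^T a - 0.5 a^T H a  s.t. 0 <= a <= F
# by projected gradient ascent; illustrative, not the authors' solver.
import numpy as np

def solve_ksvcr_dual(H, q, F, lr=1e-3, iters=5000):
    alpha = np.zeros_like(q)
    for _ in range(iters):
        grad = q - H @ alpha                        # dual gradient
        alpha = np.clip(alpha + lr * grad, 0.0, F)  # project onto the box
    return alpha   # the primal weights are then recovered from alpha
```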
4 Experimental Design
Fig. 5 Classification performance of K-SVCR for Iris dataset (sepal length and sepal width)
K-SVCR's performance is evaluated on the basis of the neutral class chosen. Therefore, each of the three classes, Iris Setosa, Iris Versicolor, and Iris Virginica, is in turn considered as the neutral class. The remaining two classes are labeled positive and negative arbitrarily, so in total, 18 datasets are considered to understand the impact of the choice of labeling in K-SVCR. The accuracy diagrams are generated by conducting experiments on these datasets for K-SVCR performance evaluation. These datasets consist of 150 samples with corresponding labels in each category, 450 in all. The model's performance is evaluated on 10% of the data samples selected from all classes, and the remaining 90%, i.e., 405 samples, are used to train the model.
Experiments are performed with random values of c1 and c2, while the value of δ is taken from 0 to 1. We tuned the values of c1 and c2 from 1 to 10 and the value of δ from 0 to 1 for better accuracy, with the preferred values being c1 = 0.1, c2 = 1, and δ = 0.3. By conducting the experiments on the 18 datasets mentioned above, the accuracy diagrams are generated as given in Figs. 5, 6, 7, 8, 9, and 10. For example, consider Fig. 9, the sepal length petal width dataset: the accuracies are 93%, 80%, and 73% when the neutral class is Iris Versicolor, Iris Setosa, and Iris Virginica, respectively. Also, from Table 1 it is explicit that when we keep Versicolor as the neutral class, the accuracy is remarkably higher in 5 cases, while for the petal length petal width dataset the accuracy is lower than in the Setosa-neutral case. For the sepal width petal length label (SW PW Label) dataset, the minimum training data required for training the model is 70%, and the accuracy generated in this case is remarkably good at 91%; for the other cases, 90% training data holds good and generates better accuracy (Table 1).
In this article, the support vector classification regression machine for K-class classification (K-SVCR) is implemented on two artificially generated datasets and on 18 different sub-datasets of the real-world Iris dataset. The model is based on the "1-versus-1-versus-rest" structure. It solves a three-class classification problem
Fig. 6 Classification performance of K-SVCR for Iris dataset (sepal length, petal length)
Fig. 7 Classification performance of K-SVCR for Iris dataset (sepal length, petal width)
Fig. 8 Classification performance of K-SVCR for Iris dataset (sepal width, petal length)
by solving one QPP, representing a unique multi-class classification method that generates ternary output {−1, 0, 1}. Since it generates ternary output, the performance of K-SVCR is analyzed with respect to the class labels. From the experiments conducted, it is noted that labeling the classes is a challenging task that significantly affects performance. For a real-world classification problem, it is essential to know which class should be the neutral class, because the neutral class plays a significant role in achieving better
Fig. 9 Classification performance of K-SVCR for Iris dataset (sepal width, petal width)
Fig. 10 Classification performance of K-SVCR for Iris dataset (petal length, petal width)
performance. It is explicit that when we keep Versicolor as the neutral class, the accuracy is remarkably higher in 5 cases, while for the sepal length sepal width dataset the accuracy is equal to the Virginica-neutral case. Methods for assigning labels to classes while solving multi-class classification problems may be explored in future work.
References
M. Imam · S. Adam
Department of Computer Science, ARSD College, University of Delhi, New Delhi, India
N. Agrawal (Garg)
Department of Physics, University of Allahabad, Prayagraj, U.P., India
S. Kumar (B) · A. Gosain
USICT, GGSIPU, New Delhi, India
e-mail: suyashgarg@gmail.com
S. Kumar
Department of Computer Science, Hansraj College, University of Delhi, New Delhi, India
1 Introduction
2 Literature Survey
Numerous studies have been conducted utilizing diverse machine learning and deep learning methodologies for classifying heart disease and its stages. In this part, we review some of the significant research on the prediction of CVD, including classic and ensemble-based machine learning and deep learning techniques, as summarized in Table 1.
The limitations and advantages of the proposed methods for CVD diagnosis have been outlined in Table 1 to better convey the significance of the proposed approach. To address the limitations, new methods are needed to accurately detect CVD.
3 Methodology
Figure 1 shows the workflow of our methodology, which includes data preprocessing for the treatment of outliers and skewness of the attributes. Each of these techniques is covered in detail as follows:
3.1 Dataset
In this study, we obtain the heart disease dataset from the UCI machine learning repository. There are a total of 303 instances, 164 of which belong to healthy subjects and 139 to those with cardiac disease, with 14 clinical features collected for each data record [13].
In addition to the methods employed, the quality of the dataset and the preprocessing techniques also influence the performance and precision of the prediction model. Preprocessing prepares the dataset and transforms it into a format that the algorithm can interpret. Datasets may contain errors, missing data, redundancy, noise, and other issues that render the data unsuitable for direct use by a machine learning algorithm. The size of the dataset is an additional issue: some datasets have a large number of attributes, which makes it more difficult for the algorithm to examine the data, detect patterns, and generate correct predictions.
Moreover, many ML models work better when the data is normally distributed and worse when it is skewed. It is crucial to recognize the skewness present in the features and to carry out appropriate transformations and mappings in order to convert the skewed distribution into a normal distribution. In our dataset, we apply a logarithmic transformation to the skewed attributes, excluding those whose skewness value is minimal. As a result, the majority of the data for each log-transformed feature shifts closer to its respective mean, which has a substantial effect on the skewness value, as can be seen in Fig. 2.
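A minimal sketch of this step follows, assuming the records sit in a pandas DataFrame; the skewness cutoff of 0.5 is an illustrative assumption, since the paper does not state its exact threshold.

```python
# Sketch: log-transform skewed numeric attributes. The 0.5 cutoff is an
# illustrative assumption; log1p assumes non-negative attribute values.
import numpy as np
import pandas as pd

def reduce_skew(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    out = df.copy()
    for col in out.select_dtypes(include=[np.number]).columns:
        if abs(out[col].skew()) > threshold:
            out[col] = np.log1p(out[col])   # log(1 + x) avoids log(0)
    return out
```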
Apart from this, a dataset may contain outlier values that deviate from the rest of the data and exceed the expected range. Figure 3 shows the outliers in the dataset. We perform outlier removal on the basis of the attributes trestbps and chol, dropping records whose values do not lie in the expected range.
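A minimal sketch of the range-based removal on trestbps and chol follows; the bounds used here are illustrative assumptions, not the paper's exact cutoffs.

```python
# Sketch: drop records whose trestbps/chol fall outside an accepted
# range. The bounds are illustrative, not the paper's exact ones.
import pandas as pd

def remove_outliers(df: pd.DataFrame, bounds=None) -> pd.DataFrame:
    bounds = bounds or {"trestbps": (90, 180), "chol": (100, 400)}
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in bounds.items():
        mask &= df[col].between(lo, hi)
    return df[mask]
```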
This is the most critical step, during which a model for predicting the disease class, i.e., whether a person has heart disease or not, is constructed. For this, we have implemented a number of machine learning methods. Our problem statement is a binary class classification problem, and each algorithm is a supervised learning method [15] for classifying incoming observations according to previously established criteria.
The models used to determine whether or not a person has cardiovascular disease are discussed in the section that follows.
Ensemble-based classification algorithms are among the most extensively used classification techniques for class-imbalanced problems [16]. Their popularity is a result of their superior performance compared to single-learner systems and their ease of deployment in real-world healthcare applications [17]. Our work focuses on classification using various ensemble models and compares them with traditional machine learning methods. We specifically used 10 machine learning models to classify the presence of CVD, including ensemble models such as AdaBoost, GradientBoost, XgBoost, LightGBM, and random forest, and classical machine learning models such as the support vector classifier, decision tree, and K-nearest neighbors.
We have used fivefold cross-validation to evaluate the efficacy (or accuracy) of
machine learning models. It protects against overfitting in prediction models, partic-
ularly when the amount of data is limited. Further, we have also used GridSearchCV
to select the hyperparameters that fit the estimator model with the best score
on our training dataset. Table 2 displays the hyperparameters that led to the highest
prediction score.
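A minimal scikit-learn sketch of this evaluation setup is shown below; the random forest estimator, the parameter grid, and the variable names X_train and y_train are illustrative assumptions, while the fivefold cross-validation mirrors the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid; the tuned values reported by the paper are in Table 2.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                 # fivefold cross-validation, as in the text
    scoring="accuracy",
)
search.fit(X_train, y_train)  # X_train, y_train: preprocessed features/labels
print(search.best_params_, search.best_score_)
```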
For class 0, i.e., the person is healthy, the models generally achieved high precision,
indicating that they have low false positive rates for class 0 samples. The SVC model
had a precision of 0.92 for class 0, which suggests that it may have a higher false
positive rate compared to the other models.
In terms of class 1, i.e., the person has heart disease, all the models except
the SVC model had a precision and F1-score of 0.94 or higher, indicating that they
perform well in identifying class 1 samples. The SVC model had a precision
and F1-score of 0.67 and 0.92, respectively, for class 1, which suggests that it may
not be as effective in identifying class 1 samples as the other models.
ROC Curve
When the threshold for classifying a sample is changed, the performance of a binary
classifier system is graphically represented by a ROC curve. The true positive rate
(TPR) and false positive rate (FPR) at various classification thresholds are plotted on
the ROC curve. The percentage of positive samples that are correctly categorized
as positive is known as the true positive rate, whereas the percentage of negative
samples that are wrongly classified as positive is known as the false positive rate.
A perfect classifier will have a ROC curve that is a step function from (0, 0)
through (0, 1) to (1, 1), while a classifier that is randomly guessing will have a ROC
curve that is a diagonal line from the bottom left to the top right of the figure.
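The sketch below plots such a curve with scikit-learn; y_test and probs (the predicted probabilities for class 1) are assumed to come from one of the fitted models above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, _ = roc_curve(y_test, probs)   # rates at every threshold
auc = roc_auc_score(y_test, probs)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "--", label="random guess")  # diagonal baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```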
ROC curves are helpful since they are insensitive to class imbalance and give a
visual representation of the trade-off between the true positive rate and the false
positive rate. Figure 5 displays the combined ROC curve for each selected model,
and it shows that XgBoost and AdaBoost outperform the other models with 95.5%
accuracy.
The models evaluated in this study show promising results for the task of predicting
heart disease. The AdaBoost and XgBoost models showed particularly strong perfor-
mance and could be recommended for use in similar binary classification tasks. The
optimization of the hyperparameters for each model played a crucial role in achieving
the high-performance results, highlighting the importance of tuning these parameters
for improved model performance.
Our study summarizes the latest developments in the area of cardiovascular disease
classification through the use of ensemble methods. In this paper, we have offered a
comparative analysis of ensemble machine learning-based computational models for
predicting cardiovascular diseases. Our findings demonstrate that boosting-based
ensemble learning algorithms significantly outperform a single classic machine
learning algorithm. To build a robust model for classifying heart disease based on
the given attributes in the dataset, we employed careful feature selection and
evaluation techniques such as cross-validation and GridSearchCV for optimal
hyperparameter tuning.
References
1. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M (2021) Ethical machine learning
in healthcare. Ann Rev Biomed Data Sci 4:123–144
2. Qayyum A, Qadir J, Bilal M, Al-Fuqaha A (2020) Secure and robust machine learning for
healthcare: a survey. IEEE Rev Biomed Eng 14:156–180
3. Austin PC, Tu JV, Ho JE, Levy D, Lee DS (2013) Using methods from the data-mining and
machine-learning literature for disease classification and prediction: a case study examining
classification of heart failure subtypes. J Clin Epidemiol 66(4):398–407
4. Kumar S, Kaur P, Gosain A (2022) A comprehensive survey on ensemble methods. In: IEEE
7th international conference for convergence in technology (I2CT), Mumbai, India, 2022, pp
1–7. https://doi.org/10.1109/I2CT54291.2022.9825269
5. Dehkordi SK, Sajedi H (2018) Prediction of disease based on prescription using data mining
methods. Health Technol 9(1):37–44
6. Jan M, Awan AA, Khalid MS, Nisar S (2018) Ensemble approach for developing a smart heart
disease prediction system using classification algorithms. Res Rep Clin Cardiol 9:33–45
7. Venkatalakshmi B, Shivsankar M (2014) Heart disease diagnosis using predictive data mining.
Int J Innov Res Sci Eng Technol 3(3):1873–1877
8. Miao F, Cai Y-P, Zhang Y-X, Fan X-M, Li Y (2018) Predictive modeling of hospital mortality
for patients with heart failure by using an improved random survival forest. IEEE Access
6:7244–7253
9. Lakshmi MS, Haritha D, SRKIT V (2016) Heart disease diagnosis using predictive data mining.
Int J Comput Sci Inf Secur
10. Javeed A, Zhou S, Yongjian L, Qasim I, Noor A, Nour R (2019) An intelligent learning system
based on random search algorithm and optimized random forest model for improved heart
disease detection. IEEE Access 7:180235–180243
11. Soni J, Ansari U, Sharma D, Soni S (2011) Predictive data mining for medical diagnosis: an
overview of heart disease prediction. Int J Comput Appl 17(8):43–48
12. Islam HM, Elgendy Y, Segal R, Bavry AA, Bian J (2017) Risk prediction model for inhospital
mortality in women with ST-elevation myocardial infarction: a machine learning approach. J
Heart Lung 1–7
13. Brahmi B, Shirvani MH (2015) Prediction and diagnosis of heart disease by data mining
techniques. J Multi Eng Sci Technol 2:164–168
14. Benesty J, Chen J, Huang Y (2008) On the importance of the Pearson correlation coefficient
in noise reduction. IEEE Trans Audio Speech Lang Process 16(4):757–765
15. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning
algorithms. In: Proceedings of the 23rd international conference on Machine learning. pp
161–168
16. Rokach L (2019) Ensemble learning: pattern classification using ensemble methods
17. Che D, Liu Q, Rasheed K, Tao X (2011) Decision tree and ensemble learning algorithms with
their applications in bioinformatics. In: Software tools and algorithms for biological systems.
pp 191–199
18. Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification
evaluations. Int J Data Min Knowl Manage Process 5(2)
Intrusion Detection System for Internet
of Medical Things
1 Introduction
Humanity’s curiosity to know more has unearthed immense amounts of data, which
attract multiple adversaries for benefits. In recent years, the Internet of things (IoT)
exposed such data to the Internet and made it more vulnerable. Although intrusion
detection systems exist to cope with vulnerabilities, the data-sensitive healthcare
industry is increasingly facing cybersecurity issues [1]. Internet of medical things
(IoMT) is a subset of IoT that relates to the medical, healthcare, and personal well-
being of an individual and their loved ones. As exposure to the Internet makes people
more conscious of their health, the number of monitoring
devices over the Internet has increased immensely [2, 3], resulting in an increase in
IoMT cyberthreats. The primary threat is unknown cyber-attacks, whose signatures
are unavailable and which therefore remain unrecognizable by detection mechanisms
[4].
Industries like manufacturing, finance, professional services, energy, retail,
healthcare, transportation, government, education, and media are facing cyber-
security issues. According to IBM's Security X-Force report, the healthcare industry
has jumped from 10th in 2019 to 7th in 2020 and up to 6th in 2021 [5], which
draws major attention to this industry. People follow the latest market trends but
are not fully aware of how their gadgets work. They usually take protocols for
granted, which could lead to hazardous situations. Although no system can be fully
secure, manufacturers and service providers remain responsible for preserving the
greatest possible degree of security and privacy.
Cyber-attacks can have both financial and non-financial impacts. People often
underestimate theft and take it casually, especially when such thefts have no
monetary loss associated with them. Devices in the IoMT environment are close to
one another, which leads to people unknowingly providing windows of opportunity
to adversaries. The problem lies with the devices at the edge that are unable to
perform rigorous computations and therefore require data to be transferred to the
cloud. Sniffing personal data and injecting undesirable data into and out of the
system occur while data is communicated over the network. Addressing these flaws
in existing mechanisms is the focus of this paper.
The paper is organized as follows. Section 2 gives an overview of the IoMT with
some use-cases. It defines the components of IoMT and its 3-layered architecture
and functioning. A brief taxonomy of intrusion detection systems (IDS) is also
given, with details of previous works related to healthcare cybersecurity. In Sect. 3,
the possible integrative technologies with IoMT are briefly discussed. Subsequently,
in Sect. 4, the challenges and limitations, such as the inability of devices to compute,
missing signatures of unknown malware, improper/insufficient datasets, high false
alarm rates of IDS, low intrusion detection rates, non-generic IDS or platform-
independent malware detectors, etc., are briefly discussed to give an insight into the
present-day picture of malware detection in IoMT. Section 5 concludes this paper.
2 IoMT
IoMT is an application of IoT that inter-relates medical equipment used for
monitoring and assessing an individual's health. Everything focuses on an individual's
health awareness, monitoring, and treatment. Streamlining services by targeting
each such service can generate many use-cases where IoMT can help, such as
remote patient monitoring, smart hospital management, self-health management,
real-time data analytics, fast emergency services, the use of ingestible sensors, etc.
The subsequent subsections provide an insight into its components, architecture,
and intrusion detection systems.
Wearable sensors play a vital role in capturing sensitive data. The term wireless body
area network (WBAN) was coined by Van Dam in 2001 and received IEEE
standardization in February 2012 [6]. A WBAN contains sensor nodes that are attached
to the living body to measure bio-signals like heart rate, blood pressure, SpO2, brain
signals, etc. These
sensor nodes communicate data in two ways, namely the in-body communication
and the on-body communication. These sensors, which are built for short-range
communication, are always in a slave mode to feed raw data to the master. Through the
Internet, the data is synced back to the servers of service providers like Amazon web
services (AWS), Apple HealthKit, Android HealthKit, etc., for analytical purposes.
In hospitals, the medical devices directly send the Electronic Health Record (EHR)
to their respective authorized (ideally) masters for storage and analysis.
The next subsection provides the detailed architecture of IoMT along with the
workings of each layer.
Intrusion is unauthorized access to a digital system with the intention to damage it
or gain sensitive information from it. An attack that can compromise any of
confidentiality, integrity, or availability is considered an intrusion. Systems that
detect intrusions are called IDSs. Many types of software-based IDSs exist, which
can be broadly classified according to their methodology, input data source, or
behavior [8], as shown in Table 1.
A HIDS runs on an independent device/host and monitors the traffic from that device
only, whereas a NIDS is set up in a network to monitor traffic from every device on the
network. A passive IDS primarily just logs and notifies of a possible threat, while an active
IDS suitably changes the environment to block the threat. A signature-based IDS (SIDS)
detects on the basis of previously stored patterns, while an anomaly-based IDS (AIDS)
tries to detect unknown malware attacks.
Many types of malware intrusions corrupt systems, namely denial of service
(DoS), distributed DoS (DDoS), SQL injection, malware attacks with both known
and unknown signatures, botnet attacks, etc. Various intrusion detection models exist
in the healthcare domain to counter such attacks. These are presented in Table 2.
The ML- and DL-based IDSs are efficient and capable of detecting such attacks
with accuracy. IDS capabilities can be enhanced by merging new and innovative
technologies. The next section briefly mentions the upcoming technologies
pertaining to IoMT cybersecurity that can be integrated to provide the appropriate
features they are proficient in.
Table 2 (continued)

Objective: Integrating a smart detection engine into a firewall or Web filter or intrusion detectors [12]
Used approach: Multilayer perceptron (MLP) and wavelet neural network
Outcomes: Accuracy = 93% with two hidden layers; accuracy = 90% with one hidden layer
Dataset(s): Synthetic dataset
Limitations: Not all metrics are used

Objective: Highly scalable hybrid (deep learning) DL-driven software defined network (SDN)-enabled framework [13]
Used approach: 3 hybrid deep learning algorithms; convolutional neural network long short-term memory (CNN-LSTM) shows the best performance
Outcomes: Accuracy = 99.83%; precision = 99.43%; recall = 99.73%; F1-score = 99.77%
Dataset(s): Publicly available IoT dataset (name not mentioned)
Limitations: The framework is prone to a single point of failure

Objective: Deep neural network-based IDS [14]
Used approach: Compared KNN, deep neural network (DNN), Naïve Bayes (NB), RF, and SVM, modified with principal component analysis (PCA) and grey wolf optimization
Outcomes: Accuracy = 99.9%; DNN-PCA with grey wolf gave the best result
Dataset(s): Benchmark dataset from Kaggle (name not mentioned)
Limitations: The dataset is not designed for IoMT; overhead is not calculated

Objective: Fog-based attack detection (FBAD) framework [15]
Used approach: Design of an FBAD framework using an ensemble of online sequential extreme learning machines
Outcomes: Accuracy = 98.19%; detection rate = 97.09%; false positive rate = 2.04%
Dataset(s): NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases)
Limitations: Overhead is not calculated; not all metrics are used
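As a rough illustration of the CNN-LSTM family referenced in Table 2 [13], the Keras sketch below stacks a 1-D convolution over per-flow traffic features and feeds it into an LSTM; the layer sizes, input shape, and feature count are illustrative assumptions, not the cited architecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dropout, Dense

n_features = 100  # assumed number of traffic features per sample

model = Sequential([
    Conv1D(64, kernel_size=3, activation="relu", input_shape=(n_features, 1)),
    MaxPooling1D(pool_size=2),
    LSTM(64),                        # sequence summary of the convolved features
    Dropout(0.2),
    Dense(1, activation="sigmoid"),  # binary output: attack vs. benign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```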
3 Integrative Technologies

Integrating IoMT with technologies like AI has multiplied its strength in terms of
security and smartness. Some of the technologies that can be integrated into IoMT
are briefly discussed in the following subsections.
3.1 Blockchain

Blockchain is believed to have been invented by Satoshi Nakamoto in 2008 for
the digital currency bitcoin; it provides an immutable ledger of transactions.
It is a peer-to-peer technology for sharing data and its computation in a decentralized
manner [16]. The features of blockchain can be used for handling vulnerabilities in
EHRs. For instance, if three stakeholders (hospital, government, insurance provider)
are involved in a task, then every peer node hosts an instance of a distributed ledger
consisting of EHRs. This ensures tamper-proof digital EHRs. Additionally,
decentralized control deprives adversaries of a single point of attack. A minimal
sketch of such hash-linked records follows.
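The sketch below only illustrates the tamper-evidence idea behind hash-linked EHR entries, not a full distributed ledger with consensus among the three stakeholders; record contents are made up.

```python
import hashlib
import json
import time

def make_block(ehr_record, prev_hash):
    """Link an EHR entry to the previous block via a SHA-256 hash."""
    block = {"timestamp": time.time(), "ehr": ehr_record, "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block({"patient": "P-001", "note": "baseline"}, "0" * 64)
follow_up = make_block({"patient": "P-001", "note": "follow-up"}, genesis["hash"])
# Any change to genesis["ehr"] changes its recomputed hash and breaks the chain,
# so every peer holding the ledger can detect the tampering.
```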
3.2 Physical Unclonable Functions (PUFs)

A physical unclonable function (PUF) is a physical entity that, for a given condition
and input, produces a physically defined digital fingerprint that serves as a unique
identifier. It works by implementing challenge-response authentication. For instance,
to check the legitimacy of a doctor, an individual, or a sensor node before establishing
a communication session between them, one can use PUFs for physically secure
authentication [17]. A software-level imitation of this exchange is sketched below.
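In this sketch, a keyed HMAC stands in for the device-unique physical fingerprint, which is purely an illustrative assumption: a real PUF derives its response from physical randomness rather than a stored key.

```python
import hashlib
import hmac
import os

DEVICE_SECRET = os.urandom(32)  # stand-in for the physical fingerprint

def puf_response(challenge: bytes) -> bytes:
    return hmac.new(DEVICE_SECRET, challenge, hashlib.sha256).digest()

# Enrollment: the verifier records challenge-response pairs (CRPs).
challenge = os.urandom(16)
enrolled_response = puf_response(challenge)

# Authentication: the device proves legitimacy by reproducing the response.
assert hmac.compare_digest(puf_response(challenge), enrolled_response)
```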
3.3 Software Defined Networking (SDN)

SDN is the physical separation, or abstraction, of the network control plane from the
data forwarding plane. The data plane forwards network traffic to its destination,
while the control plane manages the tasks involved in making forwarding decisions,
with a protocol defined between the two planes. SDN can be used to establish
communication between IoMT-connected devices and sensors [18].
3.4 5G Technology
Since no system is fully foolproof, the emergent flaws in these systems provide
opportunities to evolve further as technologies advance. The next section discusses
the challenges and limitations associated with the established IDSs in the
development of the emerging integrative technologies.
4 Challenges and Limitations

The challenge is to make users fully aware of the threats and teach them the remedies.
Furthermore, this section discusses the challenges and limitations from the
perspective of resource providers and developers, which are open for future research
directions.
The intrusion detection mechanisms discussed in the literature are not adequately
capable of detecting the various types of malware. Mostly, an IDS is attack-specific
or specific to the environment of the IoMT test bed. So, there is a need for robust
systems which can handle different types of attacks simultaneously and effectively.
Since WBAN sensors and devices are meant to be portable and wearable, their size
restricts their in-house power capacity. This, in turn, limits their efficiency and
computing strength [6, 20].
IoMT comprises a wide range of sensors and devices, from fully capable server
machines to small actuators, and every such device works with different protocols.
Hence, IDSs should be designed so that they are able to support every component
effectively [21].
4.4 Scalability
IoMT is a heterogeneous network with devices of different sizes and communication
protocols that have varying resource consumption. Since, for every block
A review of the literature makes it evident that most IDSs are validated on datasets
obtained from IoMT test beds. For better intrusion detection, generic and published
datasets in the domain of IoMT need to be researched [24–26].
Attacks with known signatures are easy to detect and rectify. But today, unknown
attacks, having no previous signature, are taking place. Zero-day attacks exploit
software vulnerabilities that are completely unknown to the stakeholders [27, 28].
In healthcare organizations worldwide, IoMT is experiencing such zero-day attacks,
glimpses of which are reported by the U.S. Department of Health and Human
Services (HHS) [29].
Integration with different technologies strengthens a product and adds quality to it,
even as the product inherits the negative consequences of those technologies. With
blockchain-enabled systems, scalability is the issue [16, 30]: the number of EHRs
will saturate for a particular framework of hospitals, and the cost of mining the
day-to-day interaction logs of wearables will not be efficient. Similarly, not every
device can be made SDN-enabled, which makes data gathering difficult [18].
According to Masud et al. [17], in PUF systems there are authentication bottlenecks
at the satellite broadband pivot point. In the case of 5G in IoMT, the major problems
are the high cost of telesurgery devices and the attached legal requirements [31].
5 Conclusion
AI is rapidly changing the healthcare industry, and smart healthcare systems are
achievable through IoMT technology. The interconnectivity of electronic and digital
devices that serve efficiently and smartly has generally come at a trade-off with
security. Security challenges include hacking into EHRs or using AI-powered
malware to disrupt the operation of medical devices. This paper provides an insight
into IoMT and its security. It enunciates the 3-layered architecture of IoMT,
comprising the device, fog, and cloud layers. Further, some intrusion detection
mechanisms based on ML and DL were discussed. Based on the literature review,
the DL-based models have shown high accuracy in detecting intrusions. The
problems with existing IDSs are a high number of false alarms, insufficiently robust
systems, and the heterogeneity of IoMT devices and communication protocols.
Some of the challenges encountered relate to the limited availability of published
datasets for healthcare network traffic analysis and the scalability of IDS solutions.
The analyzed problems can possibly be resolved using integrative technologies such
as blockchain, PUFs, and SDN-enabled devices for more purposeful intrusion
detection, and upgraded 5G technology may prove to be a game changer for more
operational and enhanced security capabilities.
References
11. Newaz AI, Sikder AK, Rahman MA, Uluagac AS (2019) HealthGuard: a machine learning-
based security framework for smart healthcare systems. In: 6th international conference on
social networks analysis management and security (SNAMS). pp 389–396
12. Al-Shaher MA, Hameed RT, Ţăpuş N (2017) Protect healthcare system based on intelligent
techniques. In: 4th international conference on control, decision and information technologies
(CoDIT). pp 421–426
13. Khan S, Akhunzada A (2021) A hybrid DL-driven intelligent SDN-enabled malware detection
framework for Internet of Medical Things (IoMT). Comput Commun 170:209–216
14. Swarna Priya RM, Maddikunta PKR, Parimala M, Koppu S, Gadekallu TR, Chowdhary CL,
Alazab M (2020) An effective feature engineering for DNN using hybrid PCA-GWO for
intrusion detection in IoMT architecture. Comput Commun 160:139–149
15. Alrashdi I, Alqazzaz A, Alharthi R, Aloufi E, Zohdy MA, Ming H (2019) FBAD: fog-based
attack detection for IoT healthcare in smart cities. In: IEEE 10th annual ubiquitous computing,
electronics and mobile communication conference UEMCON. pp 0515–0522
16. Dilawar N, Rizwan M, Ahmad F, Akram S (2019) Blockchain: securing internet of medical
things (IoMT). Int J Adv Comput Sci Appl 10:82–89
17. Masud M, Gaba GS, Alqahtani S, Muhammad G, Gupta BB, Kumar P, Ghoneim A (2021)
A lightweight and robust secure key establishment protocol for internet of medical things in
COVID-19 patients care. IEEE Internet Things J 8:15694–15703
18. Liaqat S, Akhunzada A, Shaikh FS, Giannetsos A, Jan MA (2020) SDN orchestration to combat
evolving cyber threats in internet of medical things (IoMT). Comput Commun 160:697–705
19. Mishra L, Vikash, Varma S (2021) Seamless health monitoring using 5G NR for internet of
medical things. Wireless Pers Commun 120
20. Khan FA, Haldar NAH, Ali A, Iftikhar M, Zia TA, Zomaya AY (2017) A continuous change
detection mechanism to identify anomalies in ECG signals for WBAN-based healthcare
environments. IEEE Access 5:13531–13544
21. Aldhaheri S, Alghazzawi D, Cheng L, Alzahrani B, Al-Barakati A (2020) DeepDCA: novel
network-based detection of IoT attacks using artificial immune system. Appl Sci 10
22. Begli M, Derakhshan F, Karimipour H (2019) A layered intrusion detection system for critical
infrastructure using machine learning. In: Proceedings of 7th international conference on smart
energy grid engineering (SEGE). pp 120–124
23. Salem O, Alsubhi K, Mehaoua A, Boutaba R (2021) Markov models for anomaly detection
in wireless body area networks for secure health monitoring. IEEE J Sel Areas Commun
39:526–540
24. Rbah Y, Mahfoudi M, Balboul Y, Fattah M, Mazer S, Elbekkali M, Bernoussi B (2022) Machine
learning and deep learning methods for intrusion detection systems in IoMT: a survey. In: 2nd
international conference on innovative research in applied science, engineering and technology
(IRASET)
25. Sun Y, Lo FPW, Lo B (2019) Security and privacy for the internet of medical things enabled
healthcare systems: a survey. IEEE Access 7:183339–183355
26. Si-Ahmed A, Al-Garadi MA, Boustia N (2022) Survey of machine learning based intrusion
detection methods for internet of medical things
27. Roumani Y (2021) Patching zero-day vulnerabilities: an empirical analysis. pp 1–13
28. Tang R, Yang Z, Li Z, Meng W, Wang H, Li Q, Sun Y, Pei D, Wei T, Xu Y, Liu Y (2020)
ZeroWall: detecting zero-day web attacks through encoder-decoder recurrent neural networks.
In: Proceedings—IEEE INFOCOM. pp 2479–2488
29. Razdan S, Sharma S (2021) Internet of medical things (IoMT): overview, emerging technolo-
gies, and case studies. IETE Tech Rev (Institution Electron Telecommun Eng India)
30. Esposito C, De Santis A, Tortora G, Chang H, Choo KKR (2018) Blockchain: a panacea for
healthcare cloud-based data security and privacy? IEEE Cloud Comput. 5:31–37
31. Li J, Yang X, Chu G, Feng W, Ding X, Yin X, Zhang L, Lv W, Ma L, Sun L, Feng R, Qin J,
Zhang X, Gou C, Yu Z, Wei B, Jiao W, Wang Y, Luo L, Yuan H, Chang Y, Cai Q, Wang S,
Giulianotti PC, Dong Q, Niu H (2022) Application of improved robot-assisted laparoscopic
telesurgery with 5G technology in urology. Eur Urol 83:41–44
Veracity Assessment of Big Data
Abstract The inconsistent nature of Big Data drives the interest of the research commu-
nity in devising new techniques for assessing, predicting and computing the veracity
of Big Data. The inconsistencies in Big Data are primarily due to the limited number of
authorized sources. Moreover, social media platforms and web-based applications
are the main sources of such inconsistencies in Big Data. If our data is not reliable,
then the very purpose of analyzing Big Data would be compromised, and outcomes
derived from such analysis would be of virtually no relevance. So, before analyzing
the data and finding new insights, one needs to compute its veracity in terms of
its correctness, consistency, reliability, trustworthiness, credibility and authenticity.
This paper focuses on the research that has been carried out to compute the veracity
of Big Data and outlines the research gaps and challenges associated with computing
such data veracity.
1 Introduction
In this age of computing, Big Data is at the heart of all widely established large enter-
prises, social networking websites and IoT/web-based applications. From economic
evolutions to smartly growing business models, Big Data has become the game
changer capable of tackling the challenges that beset business uncertainties. Big Data
caters to the needs of all big companies seeking to survive the rapidly evolving trends
of today's global market. No company can survive without analyzing its customer
data, which continuously grows with time. According to research, Facebook handles
500 TB of data each day [1]. This data is in structured, unstructured and semi-
structured form. It would not be possible to handle such a huge amount of data using
traditional methods. For example, analyzing the buying patterns of customers can
help a business model predict changes in customer demand in order to optimize
inventories accordingly. Big Data has numerous applications such as stock market
prediction, weather forecasting, education, customer behaviour prediction, etc. In
the past few years, out of the four V's of Big Data, researchers have delved into the
importance of the veracity of Big Data. Veracity needs to be addressed urgently
because social networks and IoT/web-based networks are generating data at
humongous rates, and this data is full of inconsistencies and uncertainties. So, the
veracity of data is a huge challenge in comparison with the other V's of Big Data.
The veracity of Big Data is discussed next.
The original 3 V’s of Big Data were originally defined by [2]. Then an IBM employee
was the first source found to coin the term veracity as the fourth V [3]. After veracity
was adopted by IBM, it began appearing in Big Data research in 2013. The amount
of research work done in this area is very limited. Data veracity does not have
a unified definition. Some researchers have defined data veracity in terms of data
uncertainty due to data inconsistency and incompleteness [4]. McArdle and Rob
defined veracity in terms of authenticity, reliability and precision of collected data
[5]. Others have defined it as data correctness [6]. Some other researchers have
defined it as trustworthiness, completeness, consistency, integrity between the data
and its resources.
Thus, veracity of Big Data can be defined as trustworthiness, correctness, relia-
bility, consistency, authenticity and completeness of data. Veracity is somewhat of
a broader domain to work on which may include credibility of information, authen-
ticity of information, consistency, precision of data, trust computation of information
and reliability of information. Among these, trust is a multidisciplinary topic for
research. Several fields, such as sociology, philosophy, automation, and computing
and networking, have defined trust as follows:
• Sociology: Subjective probability of the trustor that the trustee will not indulge
in an action that hurts the trustors’ interest under uncertainty [7].
• Philosophy: Trust is a moral phenomenon and the violation of moral behaviour
leads to distrust [8].
• Automation: Trust is basically a situation where one agent will try to achieve
another agent’s goal under vulnerability and uncertainty [9].
• Computing and Networking: An agent trust is a subjective probability that
another agent or human will exhibit behaviour in a reliable manner under certain
risk [10].
According to research, around 80% of Big Data is uncertain [11]. The U.S. spends
$3.1 trillion every year because of poor data quality [12]. In 2016, 62% of
U.S. adults got news from social media, an increase from 49% in 2012
[13]. Every smartphone user generates and consumes data through social media and
web-based applications. Social networking platforms like Facebook or Twitter can
be used to spread misinformation and/or disinformation just to mislead the mindset
or opinion of an individual or a group of individuals. Several scenarios have been
observed in which misinformation has caused a depreciation in the share values
of an enterprise and has even impacted the presidential election in the U.S. Various
other incidents have also been recorded in the past which have led to inconclu-
sive investigations and unwanted hassle, just to mislead people or spread
propaganda. These incidents can harm an individual, a group of individuals, or even
humanity as a whole. So, the quality of data needs to be assured. Since
the quality of data decides the quality of the analysis, veracity can be said to be
the most important V among all the V's of Big Data.
This paper focuses on the research that has been carried out to compute the veracity
of Big Data and outlines the research gaps and challenges associated with computing
data veracity.
The organization of this paper is as follows. The first section is the introduction.
The second section elaborates the literature review. The third section discussed the
research gaps and challenges in the field of data veracity followed by tentative solu-
tions provided in the fourth section. The fifth section emphasizes the future research
directions. The last section is the conclusion.
2 Literature Review
Several attempts have been made by researchers to address the challenge of predicting
and assessing data veracity. The real-time scenario where data veracity is assessed is
in the case of social networking websites. Attempts have been made to quantify the
veracity of Twitter microblogs and the datasets available for the veracity domain such
as Liar, Facebook Hoax, Buzz Face, Fake News Net, etc. Researchers have worked on
the sentiment analysis of the news content dataset available using machine learning
techniques. Computational statistics, along with artificial intelligence-based machine
learning models, can be used to assess the veracity of Big Data, which is considered
to be an NP-hard problem. The veracity of data is a somewhat broader notion to work
on, which includes assessing the reliability, credibility, trustworthiness, authenticity,
precision and completeness of the collected data. Most researchers have worked
on the computation of trust, which is one of the important aspects of determining
veracity.
Crowdsourcing techniques have been used to assess Big Data veracity [6]. Crowd-
sourcing is basically a technique in which a group of people share their efforts towards
a common goal to solve a problem. In [6], an app called "TAG ME" is
developed through which people tag tweets from their smartphones based
on three categories, i.e. positive, negative and neutral. This information is saved and
passed to the Bayesian predictor to train the classifier. Other than this, a different
Bayesian predictor has been used on a verified dataset using a trinomial function.
It was shown that crowdsourcing techniques perform reasonably well in assessing
veracity of Big Data. In [14], a method that computes the emotional weights of
the news content from “The Star and The Onion” news dataset is proposed. First,
emotions from the news were identified using Emolex followed by finding the weight
of the emotions of each news content using the following expression [14]:
W = (1.0 × v) / max(v)    (1)

where v is the value of an emotional state and max(v) is the value of the highest
emotional state conveyed in the news. The computed weights become the input to the
input layer of the multilayer perceptron, which classifies the news as true or false. It
was observed that fake news is dominated by emotions such as anger, sadness, joy, etc.
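A small sketch of Eq. (1) in code, with made-up Emolex-style emotion counts for one news item:

```python
def emotion_weight(v, values):
    """Eq. (1): W = 1.0 * v / max(values) for one emotional state."""
    return 1.0 * v / max(values)

# Illustrative counts of emotional states in a single news item (made up).
counts = {"anger": 7, "sadness": 4, "joy": 2, "trust": 1}
weights = {emo: emotion_weight(v, counts.values()) for emo, v in counts.items()}
# weights["anger"] == 1.0; the weights feed the MLP's input layer
```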
In [15], a web-based application called Backdrop is presented with which a user
can interact with different knowledgebases with the aim of finding out which particular
claims can be considered more trustworthy. Backdrop was designed to annotate infor-
mation and semantics found online to check how the veracity of a statement differs
with claims. In [16], three approaches for assessing the veracity of data have been
outlined, namely the implicit, the explicit and the authoritative approach. The implicit
approach takes into consideration the fact that implicitly the truthful statement differs
from the untruthful statement not only in terms of its content and expression, but also
in the context they have been used for assessing data veracity. The explicit approach
needs an outside data to assess the veracity of data. Finally, the authoritative approach
needs legitimate authority to verify the claim with the objective to find out its credi-
bility. In [17], veracity has been defined using three dimensions, namely objectivity/
subjectivity; truthfulness/deception and credibility/implausibility. Using these three
dimensions, the veracity index is computed. The veracity index gives a systematic
view in assessing the quality of Big Data especially in the case of textual data. In
[18], assessing, predicting and improving data veracity has been presented in three
different contexts, namely for social media networks, for web applications and for
Internet of Things applications. Semantic analysis and Twitter influence have been
used to assess data veracity. In [10], all the factors and constructs affecting the trust
between the trustor and the trustee are discussed. Further, it emphasizes the impor-
tance of considering scaling of trust at different levels, in contrast to binary levels,
thereby laying a sound foundation for quantifying the veracity of Big Data using
the trust computation. In [19], machine learning models for computing trust either
as a two-class classification problem or a continuous target variable class problem
for social and ad-hoc networks is presented. Also, the various properties of trust
like privacy protection and context awareness, and various attacks that can occur in
reputation-based models like Amazon, which include white washing, bad mouthing,
on-off attack, collusion and conflict behaviour attacks, are presented. In [20], how
the information can be classified into three categories, namely disinformation, misin-
formation and unverified information is presented. These three types of information
are distinguished based on their intention. Disinformation typifies bad intention in
comparison with misinformation and unverified information. Rumours are a type of
misinformation and fake news is a type of a hoax, which is a type of disinformation.
Also, the relationship between the two problems, i.e. disinformation detection and
truth discovery, is presented. Further, various approaches to tackle disinformation
detection and truth discovery, both traditional (feature-based, kernel-based,
graph-based, iteration-based, etc.) and neural network-based (DNN-based, RNN-
based, CNN-based), are discussed. In [21],
the four steps for analyzing rumours have been suggested. These are data annota-
tion, rumour classification and prediction, rumour diffusion and rumour visualization.
Data annotation is mainly concerned with how the rumours are spreading through
the network and how one can verify and/or validate/invalidate the rumours. Rumour
classification and prediction basically addresses the task of how one can analyze and
predict the source tweet as rumours by looking at the retweets, i.e. veracity prediction
using the stance classification. Rumour diffusion basically refers to how the infor-
mation is being spread out through the network and how it can be tackled. Finally,
rumour visualization is concerned with visualizing the rumour using the GRIP system
and dynamic graph drawing-based methods. In [22], a probability-based solution
for tackling the problem of trust computation is proposed. Fifty samples were taken
at random from a database, and trust was computed for different weights of the
predefined trust factors like "personal information sharing, known in person, mutual
friends, common interest, past conversation history". The authors simply computed
the weighted arithmetic mean of these trust factors to obtain the trust value for a user
in the network, and the total average trust value for the whole network using the
binomial probability. In [23], user-based and item-based collaborative filtering have been used
to compute similarity to enhance the trust of users towards the network. User-based
collaborative filtering prefers items that have been preferred by similar users, whereas
item-based collaborative filtering prefers items which have been preferred earlier.
In the new heuristic similarity model (NHSM), three factors, namely similarity,
proximity and significance, have been used to compute similarity between users for
the MovieLens dataset. In [24], a more specific domain in truth discovery that computes
the hardness of claims using the maximum likelihood estimation is presented. For
computing the source reliability and correctness of a claim, three real world-based
scenarios Oregon shooting, Paris attacks and Baltimore riots on the Twitter social
networking platform were used. In [25], rumour detection and stance classification
were considered together. RNN with multi-task architecture (MTUS, MTES) has
been used to compute veracity. The next section introduces detailed study on the
research gaps and challenges regarding the literature review.
3 Research Gaps and Challenges

Research gaps are presented in Table 1, based on the study of papers in groups
of three or four. Only a few studies have highlighted research gaps that lay down
a root-level foundation for future research work and the limitations of the studies
found in the literature review.
The following challenges exist pertaining to assessing data veracity:
• Lack of models for veracity assessment of data from disparate data sources.
• Multiple definitions and interpretations of data veracity.
• Limited number of datasets available for assessing veracity.
• Lack of scalable solutions for veracity assessment.
4 Tentative Solutions
This section introduces tentative solutions for the above-mentioned research chal-
lenges. These solutions are proposed by taking into account the multiple definitions
of the veracity of data, and the factors which can help to compute the veracity of data,
in the most efficient way. Only a few studies have used advanced AI tools to address
the problem of assessing the veracity of Big Data.
• To address the heterogeneity of data sources as a problem for assessing the veracity
of data, there is a need to come up with a single conceptual framework which suits
all domains for computing veracity. Such a conceptual framework can lead to
an environment where the computational model of veracity can deal with
different sources of data to a certain extent.
• To cope with the multiple interpretations of veracity as a research challenge, the
research community should define veracity in such a way that it highlights
and touches every aspect of veracity at a fundamental level. These multiple inter-
pretations need to be put concisely in a manner that avoids any domain-specific
interpretation of veracity.
• Most of the work has been done on the textual data of the Twitter microblog using
supervised machine learning techniques [16]. To address the unavailability of
datasets as a research challenge, there should be enough datasets available for
computing and comparing the veracity of data among the different approaches.
The availability of datasets can be improved by using the APIs of social networks
and IoT/web-based networks, and different graph-based approaches can be used,
such as extracting the retweet graph and interaction-based networks from Twitter
and other social networks.
• The scalability of solutions as a research challenge can be addressed by finding
more robust solutions using machine learning tools and computational statistics.
Machine learning, computational statistics and probability can lay the foundation
for achieving scalable solutions to the problem of veracity assessment.
Veracity Assessment of Big Data 311
Table 1 (continued)

References: [20], [27]
Description: [20] addressed the problem of disinformation detection and truth discovery from a single reference using traditional and neural network-based techniques. [27] suggests that around 51% of tweets come with images, which comprise visual and statistical features.
Gap analysis:
• Most of the work has been done using implicit features, i.e. features containing user profile information, emotions, linguistics and information propagated on a network
• Social networking websites contain not only text data but also images, which make the information appear real or trustworthy
• In certain scenarios fake images are spread, resulting in fake information, which is a severe issue

References: [24], [25]
Description: [24] explored a somewhat more specific domain in truth discovery, computing the hardness of claims using maximum likelihood estimation to determine the source reliability and correctness of claims, using three real-world scenarios: the Oregon shooting, the Paris attacks and the Baltimore riots. [25] identifies that approaching rumour detection and stance classification separately yields quite different veracity outcomes compared with considering them jointly.
Gap analysis:
• More work on datasets that contain text and images needs to be carried out so that veracity assessment, especially in the case of social networks, is more precise and accurate
• The unavailability of social network datasets makes the quantification of trust more complex, as it requires techniques that involve human intervention, like crowdsourcing or fact checking

References: [18], [15], [23]
Description: [18] divides the literature review into three parts, i.e. the veracity of data in social networks, in web-based applications and in IoT applications. [15] presents a web-based application called Backdrop, used to annotate information and semantics found online to check how the veracity of statements differs with claims. [23] uses collaborative filtering, which is basically of two types, i.e. user-based and item-based; user-based collaborative filtering prefers items that have been preferred by similar users, while item-based collaborative filtering prefers items which have been preferred earlier.
Gap analysis:
• Not much work has been carried out on quantification of veracity that preserves the properties of trust like context dependency, dynamicity, asymmetry and subjectivity
• Only a few studies have proposed solutions that are reliable for handling trust attacks in recommendation or reputation-based systems, such as bad mouthing, white washing, collusion and conflict behaviour
• Most of the work has been carried out on graph-based approaches for quantification of trust in social networks that consider neighbour node recommendations, which are vulnerable to trust attacks
• There is no standard way of scaling trust, which is one of the measures for computing the veracity of data
5 Future Research Directions

Future work can be done in plenty of ways, of which a few important areas are
mentioned below. Most of the work so far has been done on the computation of trust,
which is one of the important constituents of veracity, using different approaches for
quantifying data veracity.
• Veracity is a multidisciplinary research topic and has multiple domain-specific
interpretations. Research needs to work out a domain-independent conceptual
framework for veracity.
• Feature extraction and feature selection can be the next domains in social networks
for analyzing which features are platform independent, i.e. features that can be
used to compute the veracity of data for any social network.
• Only a few works have dealt with datasets containing images. Most posts
on social network forums include images and, therefore, veracity computation
for them becomes relevant. So, there is a need to define approaches for computing
the veracity of datasets that include images.
• Future work may focus on finding more scalable solutions by using
advanced machine learning techniques and tools to address the problem of data
veracity.
• Privacy protection needs to be addressed while dealing with, or collecting,
datasets which contain sensitive user information.
• Some standard datasets need to be published for further analysis and comparison of
results regarding the computation of data veracity, in order to find the most
efficient approach.
6 Conclusion
Veracity is going to be the most important area in the field of data science and Big Data
in comparison with the other three V's of Big Data. This paper focused on the motivation
for computing the veracity of data and its various interpretations in the literature. Veracity
deals with the trustworthiness and consistency of information, without which the sole
purpose of analyzing Big Data is defeated. If the data under analysis is not certain,
then the insights obtained after performing the analysis would be of no relevance. Data
veracity can help in sorting, aggregating and filtering information. Veracity can
help people by enriching the credible content for users who are connected through
online social networks, enabling them to take efficient decisions based on the
credible/trustworthy opinions of their friends. Various approaches for computing
the veracity of data have been discussed in this paper, and their research gaps have
been emphasized in detail. This paper discussed challenges in the field of data
veracity and the tentative solutions mentioned in the literature. Future research
directions have also been mentioned for further research in the field of data veracity.
Veracity can give us more promising results
and bring about positive changes in people's lives by enriching the social capital of a
social network. There is a need to make the computation of veracity scalable, for which
advanced AI and machine learning tools and techniques can be used. After surveying
the literature, it can be gleaned that data-veracity-assessing techniques and
tools will be required to meet the challenges posed by the astonishingly increasing
rate of Big Data creation, which may help shape decision-making in an
effective way. Feature extraction and feature selection can be the next domains in
social networks for identifying platform-independent features for computing the
veracity of data.
References
20. Fan XU (2021) A unified perspective for disinformation detection and truth discovery in social
sensing: a survey. ACM Comput Surv 55(6):33
21. Devi PS, Karthika S (2018) Veracity analysis of rumors in social media. In: 2nd international
conference on computer, communication, and signal processing: special focus on technology
and innovation for smart environment, ICCCSP
22. Yadav P, Gupta S, Venkatesan S (2014) Trust model for privacy in social networking using prob-
abilistic determination. In: International conference on recent trends in information technology,
ICRTIT
23. Garakani MR, Jalali M (2014) A trust prediction approach by using collaborative filtering
and computing similarity in social networks. In: International congress on technology,
communication and knowledge, ICTCK
24. Marshall J, Syed M, Wang D (2016) Hardness-aware truth discovery in social sensing applica-
tions. In: Proceedings 12th annual international conference on distributed computing in sensor
systems, DCOSS. pp 143–152
25. Liu X, Gao J, He X, Deng L, Duh K, Wang YY (2015) Representation learning using multi-
task deep neural networks for semantic classification and information retrieval. In: NAACL
HLT 2015 conference of the North American chapter of the association for computational
linguistics: human language technologies, proceedings of the conference. pp 912–921
26. Zhao K, Pan L (2015) A machine learning based trust evaluation framework for online social
networks. In: Proceedings 2014 IEEE 13th international conference on trust, security and
privacy in computing and communications, TrustCom. pp 69–74
27. Jin Z, Cao J, Zhang Y, Zhou J, Tian Q (2017) Novel visual and statistical image features for
microblogs news verification. IEEE Trans Multimedia 19(3):598–608
The Role of Image Encryption
and Decryption in Secure
Communication: A Survey
Abstract Information security is a vital tool for protecting the confidentiality and
integrity of digital information. In this paper, we consider different image encryption
techniques based on the advanced encryption standard (AES), chaotic systems, RSA,
elliptic curve cryptography (ECC), the data encryption standard (DES), and hybrid
encryption schemes. Further, the discriminative capability of each encryption scheme
is examined. Various security analyses were also considered to show the
effectiveness of the respective models proposed by different researchers.
1 Introduction
In the modern world, the exchange of digital information has become an integral part
of daily life. Digital data is essential for many aspects of society, from online banking
and e-commerce to social media and communication between government agencies.
However, this reliance on digital information also exposes it to potential threats, such
as unauthorized access, tampering, or interception. To protect the confidentiality
and integrity of digital information, encryption and decryption techniques have been
developed to secure the transmission and storage of data. In this paper, we present a
theoretical analysis of the role of encryption and decryption in secure communication.
We begin by reviewing the basic concepts of encryption and decryption, including
symmetric and asymmetric key algorithms, cryptographic protocols, and protocols
for secure key exchange. We then examine the strengths and limitations of various
encryption and decryption techniques, including their ability to resist attacks like
brute force, and their performance under analyses such as PSNR, SSIM, key space,
histogram analysis, etc. The rest of the paper is organized as follows. Section 2
presents the literature survey. Section 3 presents the comparative analysis, followed
by the conclusion in Sect. 4 (Fig. 1).
2 Literature Survey
AES is a popular symmetric key block cipher standardized by NIST and designed
by Joan Daemen and Vincent Rijmen [1]. It is known for its security, low cost, and
versatility in hardware and software implementations. Alsaffar et al. [2] proposed
two methods for securely transferring medical images, one using AES-GCM
combined with the Whirlpool hash function and ECDSA, and the other using only
AES-GCM and ECDSA. Faragallah [3] developed a secure cryptosystem that
combines hashed-image LSB watermarking with AES or RC6 encryption to protect
audio data. In this system, plain audio is transformed into 4 × 4 blocks, XORed with
a private image, and then embedded using LSB watermarking before being
encrypted with AES or RC6.
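A minimal sketch of the AES-GCM step using the Python cryptography package is shown below; the file name is hypothetical, and the hash/signature parts of the cited schemes (Whirlpool, ECDSA) are omitted.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # 96-bit nonce, standard for GCM; never reuse per key

plaintext = open("medical_image.png", "rb").read()   # hypothetical input file
ciphertext = aesgcm.encrypt(nonce, plaintext, None)  # output includes the auth tag

assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```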
To increase the security of the DES algorithm, Yun-Peng et al. [4] proposed incorpo-
rating chaos into the encryption process. Experimental evaluation revealed that the
suggested encryption algorithm offers good security and can successfully preserve
the secrecy of digital photos. Dang et al. [5] use a unique encryption method based
on chaotic systems and conventional encryption techniques. Before encrypting the
image data, the approach uses chaos to randomize it, thus offering very high security
for online image transmission.
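One common way chaos enters such schemes is as a keystream generator; the sketch below XORs image pixels with bytes drawn from the logistic map x_{k+1} = r * x_k * (1 - x_k), where the seed x0 and parameter r act as the secret key. The exact constructions in [4, 5] differ; this is only a generic illustration.

```python
import numpy as np

def logistic_keystream(n, x0=0.3141, r=3.99):
    """n chaotic bytes from the logistic map; (x0, r) act as the secret key."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
ks = logistic_keystream(image.size).reshape(image.shape)
cipher = image ^ ks                        # diffusion by XOR with the keystream
assert np.array_equal(cipher ^ ks, image)  # XOR again to decrypt
```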
Encryption schemes based on chaotic systems are popular due to their properties,
including ergodicity, simplicity, high key sensitivity, and large key space. In
their research [6], Guesmi et al. proposed an image encryption scheme that combines
DNA masking, SHA-256, and the Lorenz system. The plain image is split into
RGB planes, encoded using DNA encoding, and scrambled using chaotic sequences
generated from the Lorenz system. The SHA-256 hash value is changed to a binary
sequence and encoded using DNA encoding. The resulting cipher image is obtained
through DNA decoding. Brindha and Ammasai [7] proposed an image encryption
scheme that uses the Henon map and Lorenz equation with multiple levels of diffu-
sion. This scheme involves dividing the input image into square blocks, confusing
them using the Henon chaotic map, and performing two diffusion operations using the
Lorenz equation and a matrix generated from a complex function applied to the input
image. The final encrypted image is obtained by scrambling the confused-diffused
image using Arnold’s transformation.
Hybrid encryption, proposed by Brindha et al. [8], is a technique used to secure digital
communication by combining symmetric and asymmetric key algorithms. In this
approach, a symmetric key is used to encrypt the data, while an asymmetric key is used
to protect the symmetric key. The use of hybrid encryption offers several advantages
over the use of either symmetric or asymmetric key algorithms alone, including
increased security and efficiency. For example, if an attacker were to compromise the
symmetric key, they would still need to obtain the asymmetric key in order to decrypt
the data. This added layer of protection can be further enhanced by using multiple
keys and algorithms, as proposed by Robshaw and Seurin [9].
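The sketch below shows the hybrid pattern with the Python cryptography package: AES-GCM encrypts the bulk data while an RSA-OAEP key wraps the session key. Key sizes and the sample payload are illustrative.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

receiver_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
session_key = AESGCM.generate_key(bit_length=256)

# Bulk data is encrypted symmetrically ...
nonce = os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, b"image bytes ...", None)

# ... while only the small session key is encrypted asymmetrically.
wrapped_key = receiver_key.public_key().encrypt(session_key, oaep)

# Receiver: unwrap the session key, then decrypt the data.
recovered_key = receiver_key.decrypt(wrapped_key, oaep)
assert AESGCM(recovered_key).decrypt(nonce, ciphertext, None) == b"image bytes ..."
```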
Elliptic curve cryptography (ECC) is a popular method for public key encryption
(PKE), with security based on the elliptic curve discrete logarithm problem (ECDLP).
ECC has been demonstrated to provide high security with smaller key sizes, espe-
cially in constrained environments, as shown in the research of Koblitz [10]. Maria
and Muneeswaran [11] proposed a method for encrypting both text and images using
ECC. To generate a private key and a random integer k, they used a connected linear
congruential generator, and they applied elliptic curve point multiplication to each
ASCII value of the text or pixel value of the image to map it to an elliptic curve point.
Breaking the scheme requires solving the elliptic curve discrete logarithm
problem (ECDLP), but the small order of the elliptic curve used in this scheme may
not provide sufficient security in practical implementations.
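For illustration only, the following toy sketch shows double-and-add scalar multiplication, the point-multiplication operation such schemes apply to each ASCII or pixel value; the tiny curve y^2 = x^3 + 2x + 3 over F_97 is an assumption chosen for readability and offers no security (requires Python 3.8+ for pow(x, -1, p)).

```python
p, a = 97, 2      # toy curve y^2 = x^3 + 2x + 3 over F_97 (insecure)
O = None          # point at infinity

def ec_add(P, Q):
    # Standard affine point addition, covering doubling and inverse points.
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                   # P + (-P) = O
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    # Double-and-add: maps a scalar (e.g. a pixel value) to the point k*P.
    R = O
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

G = (3, 6)            # 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97), so G lies on the curve
print(ec_mul(65, G))  # e.g. encode ASCII 'A' (65) as the point 65*G
```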
Micciancio and Peikert [12] have written about the security of lattice-based cryptography,
which is believed to be more secure than RSA. In today's world, safeguarding
data during civilian satellite missions has become crucial, and the use of
encryption techniques is a necessity. However, traditional encryption methods like
DES, RSA, and AES are not suitable for multimedia data transmission. To counter
this, [12] presents a technique that combines chaotic maps and AES, specifically
designed to secure satellite imagery against illegal use and unauthorized access.
Saranya et al. [13] proposed using a large public exponent in the RSA algorithm
to improve security. RSA has become the algorithm of choice for functions such
as authenticating phone calls, encrypting credit-card transactions, securing e-mail,
and providing other Internet security functions.
3 Comparative Analysis
The different encryption techniques previously discussed show how encryption algo-
rithms could protect the data. However, every algorithm has its pros and cons. The
most important criteria for an encryption algorithm are its security and effectiveness.
Considering security, the relevant factors include the peak signal-to-noise
ratio (PSNR) and the structural similarity index (SSIM), which are applied to
assess the fidelity of the image. Key space: a brute-force attack can be avoided
if the minimum key-space requirement is fulfilled.
Histogram Analysis: for an ideal algorithm, the output cipher produces a uniform
pixel-value distribution. Histogram analysis for different encryption schemes using the pepper
image is shown in Fig. 2. Some of the existing state-of-the-art methods are simulated
in Mathematica on a system with an Intel(R) Core(TM)
i5-1035G1 CPU @ 1.00 GHz. Color images of size 512 × 512 from the SIPI image
database [25] are taken as the input for the comparative analysis, which is shown in
Table 1. For a fair comparison, the key size for all the methods in Table 1 is taken as 512
bits. Figures 3, 4, 5, and 6 show the NPCR, UACI, entropy, and PSNR comparisons
for the images—house, pepper, and baboon using different encryption schemes.
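A minimal NumPy sketch of the metrics compared in Figs. 3–6 follows; it assumes 8-bit images, with c1/c2 denoting two cipher images whose plain images differ in a single pixel (the usual NPCR/UACI setup).

```python
import numpy as np

def psnr(img, ref):
    # Peak signal-to-noise ratio for 8-bit images (undefined if images match).
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def entropy(img):
    # Shannon entropy of the pixel histogram; an ideal cipher approaches 8 bits.
    hist = np.bincount(img.ravel(), minlength=256) / img.size
    hist = hist[hist > 0]
    return float(-np.sum(hist * np.log2(hist)))

def npcr_uaci(c1, c2):
    # NPCR: % of pixels that differ; UACI: mean absolute intensity change (%).
    npcr = 100.0 * np.mean(c1 != c2)
    uaci = 100.0 * np.mean(np.abs(c1.astype(np.float64)
                                  - c2.astype(np.float64))) / 255.0
    return npcr, uaci
```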
References [14–24] cover encryption schemes based on chaotic systems, AES, RSA,
ECC, and DES. In [14], the authors proposed a multiple image encryption
scheme in which k images are combined to form one big image, and encryption is
performed using a chaotic system. The schemes in [14, 15] are recommended for
image encryption with low pixel correlation, since they use the chaotic logistic map
and particle swarm optimization. Due to some shortcomings, traditional encryption standards such
Fig. 2 a Pepper image, b, d, f, h, j, l encrypted image of (a) using AES, DES, chaotic system,
hybrid system, ECC, and RSA scheme, respectively, c, e, g, i, k, m corresponding histogram of b,
d, f, h, j, l, respectively
as DES, AES, international data encryption algorithm (IDEA), and blowfish are not
the most suitable options for image encryption. In [16], complex cryptographic
algorithms cannot be used directly because of their demand for resources; since
IoT devices are small and have limited processing power, less complex (lightweight)
algorithms are required to secure data communication. In [19, 20], the proposed method
incorporates security and quality-focused data transfer through the application of
cryptographic and steganographic elements. They used hybrid encryption (RSA and
AES), and the entire security level was increased through the strategic application
of data encryption and data decryption. In parallel, an adaptive genetic algorithm
based on OAPA was developed after discovering the quality-preserving component
of steganography, which is most frequently utilized in safeguarding medical
data. This algorithm optimizes least-significant-bit embedding over image blocks.
The proposed method is appropriate for a wide range of communication needs,
including cloud communication, the transmission of healthcare data, and communi-
cation between IoT devices. In [21], the author proposed a medical image security
model; innovative and secure algorithms are proposed for the security of medical
imaging. In [22], the authors improved Rayot's method, in which converting
large values using a base converter can produce a value larger than the prime modulus P, which then
cannot be decrypted. The authors avoided this flaw by using the inverse modulo
operator to keep the generated integer always less than the prime modulus. In [23],
the primary concept is an effective chaotic encryption of images based on the Arnold
transform (AT) and singular value decomposition (SVD): a plain image is subjected to
AT confusion and SVD diffusion, and the results show the efficiency of the suggested
AT-SVD image cipher. In [24], the author proposed an asymmetric image encryption
algorithm based on ECC and a chaotic system, which makes key transmission
and management simple and clear. Analyzing the algorithms in the above table,
ECC is considered the best encryption method because its security rests on the
mathematically hard ECDLP. In [26], on the basis of chaotic systems and permutation-
substitution (SP) networks, the author proposed a novel image encryption method.
Diffusion, substitution, diffusion, and permutation are the four cryptographic phases
that make up this process. By analyzing the results, we can say that the suggested
encryption solution has more security, sensitivity, and speed than earlier methods.
In [27], the paper studies an algorithm for full recovery of the plain image from
a ciphered image using permutation and diffusion. A new spatiotemporal chaotic
system is introduced in which a permutation-diffusion mechanism is used to design
an image encryption scheme. A hash function SHA-2 is employed to compute the
hash values used as the initial conditions of a chaotic system. A new pre-modular,
permutation, and diffusion (PPD) cipher is proposed to solve the problem of two bits
being changed at the same time, while the pixel summation is kept unchanged. In
[28], encryption of digital information is necessary to protect it from security threats.
Some of the most widely used encryption algorithms are designed specifically for text
encryption, such as the international data encryption algorithm (IDEA), triple-DES
(3DES), advanced encryption standard (AES), and data encryption standard (DES).
In [29], one of the benefits of chaotic systems is that they are relatively secure when it
comes to communication. This paper proposes two modifications that will enhance
the security of an image cryptosystem, which suffers from some drawbacks. The
first modification introduces a P-box for permutation, and the second introduces an
S-box for substitution. Both modifications are tested in simulations and found to be
successful.
4 Conclusion
This paper gives a brief overview of different image encryption schemes. Symmetric
key encryption schemes such as AES provide fast encryption and strong security,
but exchanging the private key remains an issue. At the same time, asymmetric
encryption schemes like RSA need larger key sizes and take more computation time. Further,
ECC outperforms other public key encryption schemes regarding cipher size, key
size, and computation cost. Finally, researchers are working on hybrid or modified
algorithms that can provide better security, smaller key size, and smaller cipher data.
References
1. Daemen J, Rijmen V (2002) Advanced encryption standard. US National Institute of Standards
and Technology
2. Alsaffar DM, Almutiri AS, Alqahtani B, Alamri RM, Alqahtani HF, Alqahtani NN, Ali AA
et al (2020) Image encryption based on AES and RSA algorithms. In: 2020 3rd international
conference on computer applications & information security (ICCAIS). IEEE, pp 1–5
3. Faragallah OS (2018) Secure audio cryptosystem using hashed image LSB watermarking and
encryption. Wireless Pers Commun 98(2):2009–2023
4. Zhang Y-P, Liu W, Cao S-P, Zhai Z-J, Nie X, Dai W-D (2009) Digital image encryption
algorithm based on chaos and improved DES. In: 2009 IEEE international conference on
systems, man and cybernetics. IEEE, pp 474–479
5. Dang PP, Chau PM (2000) Image encryption for secure internet multimedia applications. IEEE
Trans Consum Electron 46(3):395–403
6. Guesmi R, Farah MAB, Kachouri A, Samet M (2016) A novel chaos-based image encryption using
DNA sequence operation and secure Hash Algorithm SHA-2. Nonlinear Dyn 83(3):1123–1136
7. Brindha M, Ammasai G (2016) Image encryption scheme based on block-based confusion and
multiple levels of diffusion. IET Comput Vision 10(6):593–602
8. Brindha M, Ammasai G (2016) A chaos based image encryption and lossless compression
algorithm using hash table and Chinese Remainder Theorem. Appl Soft Comput 40:379–390
9. Robshaw MJB, Seurin Y (2007) The design and analysis of hybrid encryption schemes. J
Cryptol 20(4):361–396
10. Koblitz N (1987) Elliptic curve cryptosystems. Math Comput 48(177):203–209
11. Maria S, Muneeswaran K (2012) Nonce based elliptic curve cryptosystem for text and image
applications. Int J Netw Secur 14(4):236–242
12. Micciancio D, Peikert C (2020) Satellite image encryption based on aes and discretised chaotic
maps. Autom Control Comput Sci 54(5):446–455
13. Saranya et al (2014) Int J Comput Sci Inf Technol (IJCSIT) 5(4):5708–5709
14. Encoding and chaotic system (2019) Multimedia Tools Appl 78(6):7841–7869
15. Ahmad M, Alam MZ, Umayya Z, Khan S, Ahmad F (2018) An image encryption approach
using particle swarm optimization and chaotic map. Int J Inf Technol 10(3):247–255
16. Nayak MK, Swain PK (2020) MSIT: a modified lightweight algorithm for secure Internet of
Things. In: 2020 IEEE international symposium on sustainable energy, signal processing and
cyber security (iSSSC). IEEE, pp 1–6
17. Ahmad M, Alam MZ, Umayya Z, Khan S, Ahmad F (2018) An image encryption approach
using particle swarm optimization and chaotic map. Int J Inf Technol 10:1–9
18. Mir UH, Singh D, Lone PN (2022) Color image encryption using RSA cryptosystem with a
chaotic map in Hartley domain. Inf Secur J Glob Perspect 31(1):49–63
19. Denis R, Madhubala P (2021) Hybrid data encryption model integrating multi-objective adap-
tive genetic algorithm for secure medical data communication over cloud-based healthcare
systems. Multim Tools Appl 80(14):21165–21202
20. Denis R, Madhubala P (2020) Evolutionary computing assisted visually-imperceptible hybrid
cryptography and steganography model for secure data communication over cloud environment.
Int J Comput Netw Appl 7:208–230
21. Shankar K, Elhoseny M, Dhiravida Chelvi, Lakshmanaprabu SK, Wu W (2018) An efficient
optimal key based chaos function for medical image security. IEEE Access 6:77145–77154
22. Singh KM, Dolendro Singh L, Tuithung T (2022) Improvement of image transmission using
chaotic system and elliptic curve cryptography. Multim Tools Appl 1–22
23. Malladar R, Kunte S (2016) Selective video encryption using Sattolo’s encryption technique.
In: 2016 International conference on electrical, electronics, communication, computer and
optimization techniques (ICEECCOT). IEEE, pp 268–273
24. Afifi A (2019) Efficient Arnold and singular value decomposition based chaotic image
encryption. Int J Adv Comput Sci Appl 10(3)
Abstract The COVID-19 pandemic has made face recognition and identification a
complex task, as people often cover a significant portion of their face with masks as
a precautionary measure. This creates difficulties for biometric devices and secure
authentication systems, as masks obstruct facial key points that are necessary for
face detection. The presence of masks also presents challenges for face identifica-
tion. There is a shortage of paired and aligned face images that show faces both with
and without masks. This study proposes a framework for reconstructing the occluded
part of the face that is covered by a mask. The GAN-based unpaired image trans-
lation method is used to translate masked face images into unmasked face images
as the reconstructed faces. A synthetic paired face dataset is created to evaluate the
performance of the model in reconstructing the unmasked face from a masked face
and is used to train the proposed GAN-based face reconstruction model. The model
is based on transfer learning and the pix2pix cGAN architecture, and the results of
the comparative analysis show that our model outperforms other state-of-the-art face
reconstruction models both qualitatively and quantitatively.
1 Introduction
Face recognition is being used more and more for security and human interaction
with machines [1, 2]. The COVID-19 outbreak affected the whole population, and
since the prevention protocol includes wearing a mask, identifying masked faces has
become a challenge for face recognition systems. Most facial recognition methods used in
human-computer interaction applications fail to recognize masked faces: traditional
face recognition systems identify faces based on facial landmark detection, and the
facial features needed to identify a masked face are missing. The issue of occluded
face photos, including masks, has not been fully resolved despite the exponential
development in research studies on face recognition. There is also a lack of a dataset
for masked faces with complex mask sizes and face variation. As a result, recog-
nizing and verifying the identity of individuals wearing masks has become a widely
researched topic, and the need for more advanced facial recognition methods has
arisen.
In the proposed work, we demonstrate the reconstruction of an unmasked face
from a masked face using Generative Adversarial Networks (GANs). The experi-
ments are carried out using the image inpainting or image completion mechanism
for reconstructing the face using Generative Adversarial Network (GAN) [3–7]. To
remove the mask and synthesize the affected regions in detail while maintaining the
overall consistency of the facial structure, we use image interpolation with GAN [8]
on the masked face. The model we have selected for our work is the pix2pix cGAN
[8]-based model. We have experimented with three models in our paper: generative
image inpainting with contextual attention (GC) [6], pix2pix cGAN [8], and a
customized GAN.
The key contributions of the work are:
• Development of a new approach that uses GANs to automatically remove masks
from faces and precisely reconstruct the concealed areas.
• Creation of a synthetic masked face dataset called “MaskedFace-CelebA-HQ,”
which consists of 29,571 images and is based on the benchmark face dataset
“CelebA-HQ.”
• Presentation of a qualitative and quantitative comparison study of three GAN
models (GAN with contextual attention, Pix2Pix GAN, and the proposed model)
for face reconstruction, with an in-depth analysis of the reconstructed faces in
terms of face recognition accuracy.
The remaining sections of the paper are structured as follows. A review of related
studies is presented in Sect. 2, followed by a detailed description of the proposed
model in Sect. 3. The experimental setup and results are discussed in Sects. 4 and
5, respectively. The paper concludes with a discussion of the findings and potential
future work in Sect. 6.
2 Related Work
Recently, deep learning GAN-based methods have become popular for a range of
applications, including data augmentation [9], pose estimation [10], object removal,
and image inpainting [11]. The success of this methodology can be credited to its
use of unsupervised learning, its production of highly detailed and realistic images,
and the robustness of adversarial training.
Object removal techniques that don’t rely on learning [5, 12] have attempted
to tackle occlusions such as sunglasses and other objects by synthesizing the absent
content through finding similar patches from other regions of the image. An approach
was proposed [13] to eliminate occlusion objects from facial photos by modifying
the path priority function with a regularized factor. However, these techniques have
limitations and only work for small holes with limited color and texture variations.
Iizuka et al. [14] presented a learning model based on GANs that can remove
objects and repair the impacted regions. The model has two discriminators (local and
global) to ensure that the reconstructed image is both locally and globally realistic. A
post-processing technique called Poisson blending [15] was applied to avoid visible
seams. This method can handle random damage, but struggles with producing high-
quality photos and creates artifacts when the damage is near the edge of the image.
Zeng et al. [16] developed a controlled image inpainting system by integrating a
deep generative model with closest neighbor-based global matching.
In their work, Boutros et al. [17] created a novel embedding unmasking model
which removes masks by creating a new feature embedding that resembles an
unmasked face using the feature embedding of a masked face as input. Another
study by Din et al. [4] applied image-to-image translation with GAN-based image
inpainting to automatically remove masks from photos. Farahanipad et al. [7]
presented a GAN-based approach to reconstruct masked faces using cycle GAN,
which generated the occluded parts of the face in a realistic manner with promising
results.
In this study, a novel framework for automatically removing masks and recon-
structing masked faces is proposed using image-to-image translation. This approach
addresses the limitations of traditional methods in restoring missing parts of facial
images, taking into consideration different facial angles and expressions.
3 Proposed Method
Equation 1 in the paper represents two parts, one for the discriminator and the
other for the generator. The discriminator outputs 1 when the input (x, y) is real and
outputs 0 when the input is a fake sample generated by the generator (x, G(z)). The
objective of the generator is to learn to produce samples that resemble real samples,
and it is trained against the discriminator: the generator tries to drive D(x, G(z))
from 0 toward 1, i.e., to make the discriminator judge its samples as real, which in
turn contributes to learning the original distribution. Along with fooling
the discriminator, the generator also creates images that are close to the ground truth
by combining the adversarial loss with an L1 loss. Equation 2 shows the
generator's additional L1 loss that is added to the loss function:
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z)\rVert_1\big] \quad (2)
It’s crucial to note that the L1 loss only captures low-frequency image features
and lacks the ability to preserve high-frequency details. To address this issue, the use
of PatchGAN in combination with the Adam optimizer is implemented to enhance
the hazy output [18]. The parameter lambda determines the relative significance of
the two objectives.
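For reference, the two-part conditional objective that Eq. 1 refers to, and the combined objective weighted by lambda, take the standard pix2pix form [8]:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big],
\qquad
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
```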
Fig. 4 MaskTheFace [21]: a tool designed to generate a paired face dataset made up of masked and
unmasked images by warping the mask template according to the crucial face landmarks to attain
believable masked faces
4 Experimentation
The proposed work for unmasking the masked face uses a modified Pix2PixGAN
model [8]. The architecture for the generative networks is adapted from [8], which
consists of one generator and one discriminator. The generator in the proposed model
is a modified U-Net and includes an encoder with eight downsampling layers and a
decoder with eight upsampling layers.
The encoder is comprised of a series of blocks, where each block consists of a
convolution operation followed by batch normalization (except for the first block)
and leaky ReLU activation.
The architecture of each block in the decoder consists of a Transposed Convolution
layer followed by Batch Normalization and ReLU activation, with Dropout applied
to the first three blocks.
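A minimal Keras sketch of these encoder and decoder blocks follows (in the style of the public pix2pix reference implementation); the filter counts and kernel size are illustrative assumptions, not the exact values used by the authors.

```python
import tensorflow as tf

def downsample(filters, size=4, apply_batchnorm=True):
    # Encoder block: Conv (stride 2) -> BatchNorm (skipped in the first
    # block) -> LeakyReLU, halving the spatial resolution.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, size, strides=2,
                                     padding='same', use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, size=4, apply_dropout=False):
    # Decoder block: TransposedConv (stride 2) -> BatchNorm -> Dropout
    # (applied to the first three decoder blocks) -> ReLU.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                              padding='same', use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block
```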
The last layer of the network produces an output of shape (batch_size, 30, 30, 1),
where each 30 × 30 image patch is used to classify a 70 × 70 portion of the input
image. The discriminator is given two inputs, both having dimensions [256, 256,
3], which are joined together and used to make a judgment on whether the image
is authentic or generated. The discriminator classifies the input and target images as
real while classifying the input and generated images as fake. To calculate losses, the
model is trained over 150,000 epochs with 5862 unpaired, 256 × 256 facial photos
with and without masks at a learning rate of 0.003.
5 Results
We now go over the qualitative and quantitative results of our method on real-world
images with masks, as well as how it compares to other prior state-of-the-art image
editing techniques.
We conducted a qualitative comparison between our proposed method and two other
approaches, Yu et al. [6] and pix2pix cGAN [8], using real-world test images, as
shown in Fig. 5. Our comparison showed that while Yu et al. [6] was able to reduce
artifacts at the margins, it was not capable of fully recovering complex face structures.
The pix2pix cGAN is a conditional GAN that uses real data, noise, and labels
to generate images. It trains using a paired image-to-image translation method on
a provided dataset. The cGAN consists of one generator and one
discriminator. We modified the pix2pix cGAN by changing the downsampling layers
of the discriminator and adjusting the hyperparameters.
Additionally, despite the presence of face masks that cover significant facial
features in each test image, our proposed model is capable of effectively removing
the mask and generating output images with a natural appearance and structural
integrity, surpassing the results of other leading image manipulation techniques.
The performance of the proposed method, along with the models by Yu et al. [6] and
Pix2Pix cGAN [8], was assessed using a synthetic masked face dataset of 29,571
images. This dataset was derived from the publicly accessible CelebA-HQ [20]
celebrity face image collection. We evaluated the generated output images using
Fig. 5 Visual comparison between the proposed method and representative image interpolation
methods. From left to right: input image, a ground truth, b masked face, c GAN with contextual
attention, d pix2pix cGAN, e our proposed method
Structural Similarity (SSIM) [22], PSNR [23], and MAE. SSIM is a full reference
metric that measures the perceptual difference between two similar images. PSNR
[23] is a metric that quantifies the ratio between the maximum possible power of
a signal and the power of any disturbance or noise. As there is no corresponding
ground truth for real images containing masks, we used the synthetic test dataset
created from CelebA-HQ for evaluating image quality metrics. The results of the
comparison between our proposed method, Yu et al. [6], and pix2pix cGAN [8] are
shown in Table 1, which demonstrates the better performance of our model.
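As a hedged sketch, the three image-quality metrics can be computed with scikit-image (>= 0.19 for the channel_axis argument) and NumPy as follows; pred and truth are assumed to be aligned uint8 RGB images of identical shape.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def image_quality(pred, truth):
    # SSIM over color images, PSNR (data range inferred from uint8), and MAE.
    ssim = structural_similarity(truth, pred, channel_axis=-1)
    psnr = peak_signal_noise_ratio(truth, pred)
    mae = float(np.mean(np.abs(truth.astype(np.float64)
                               - pred.astype(np.float64))))
    return ssim, psnr, mae
```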
Table 2 Face recognition results in terms of accuracy, precision, recall, and F1-score for
reconstructed faces using the two mentioned models
Model Accuracy Precision Recall F1-score
Pix2pix cGAN 0.787 0.727 0.787 0.738
Proposed model 0.824 0.767 0.824 0.784
We used VGGFace as the face descriptor and MTCNN as the face detector for the
face recognition task on the reconstructed faces. The quantitative assessment is conducted using four
metrics: accuracy, precision, recall, and F1-score. The accuracy metric calculates
the number of instances that were correctly classified out of the total instances. Preci-
sion, on the other hand, represents the model’s accuracy in terms of the proportion
of true positive predictions out of all positive predictions made by the model. Recall
assesses a classifier’s capacity to identify all positive examples. The F1-score, also
known as the F-score, evaluates a model’s accuracy by combining the precision and
recall into a single metric and is employed to categorize samples as “positive” or
“negative.”
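A minimal scikit-learn sketch of these four metrics follows; the identity labels are hypothetical placeholders, and weighted averaging is one plausible choice for a multi-class identity task.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical predicted vs. true identities for five reconstructed faces.
y_true = ["id1", "id2", "id3", "id1", "id2"]
y_pred = ["id1", "id2", "id1", "id1", "id2"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(acc, prec, rec, f1)
```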
Table 2 demonstrates the better performance of our proposed model over the
existing framework of pix2pix cGAN.
6 Conclusion
References
1. Manogaran G, Thota C, Lopez D (2022) Human-computer interaction with big data analytics.
In: Research anthology on big data analytics, architectures, and applications. IGI Global, pp
1578–1596
2. Sardar A, Umer S, Rout RK, Wang SH, Tanveer M (2022) A secure face recognition for
IoT-enabled healthcare system. ACM Trans Sensor Netw (TOSN)
336 C. Agarwal et al.
3. Khan MKJ, Ud Din N, Bae S, Yi J (2019) Interactive removal of microphone object in facial
images. Electronics 8(10):1115
4. Din NU, Javed K, Bae S, Yi J (2020) Effective removal of user-selected foreground object from
facial images using a novel GAN-based network. IEEE Access 8:109648–109661
5. Criminisi A, Pérez P, Toyama K (2004) Region filling and object removal by exemplar-based
image inpainting. IEEE Trans Image Process 13(9):1200–1212
6. Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with
contextual attention. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 5505–5514
7. Farahanipad F, Rezaei M, Nasr M, Kamangar F, Athitsos V (2022) GAN-based face recon-
struction for masked-face. In: Proceedings of the 15th ınternational conference on PErvasive
technologies related to assistive environments, pp 583–587
8. Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adver-
sarial networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 1125–1134
9. Luo M, Cao J, Ma X, Zhang X, He R (2021) FA-GAN: face augmentation GAN for deformation-
invariant face recognition. IEEE Trans Inf Forensics Secur 16:2341–2355
10. Chou CJ, Chien JT, Chen HT (2018) Self adversarial training for human pose estimation. In:
2018 Asia-Pacific signal and ınformation processing association annual summit and conference
(APSIPA ASC). IEEE, pp 17–30
11. Liu G, Reda FA, Shih KJ, Wang TC, Tao A, Catanzaro B (2018) Image inpainting for irregular
holes using partial convolutions. In: Proceedings of the European conference on computer
vision (ECCV), pp 85–100
12. Darabi S, Shechtman E, Barnes C, Goldman DB, Sen P (2012) Image melding: combining
inconsistent images using patch-based synthesis. ACM Trans Graphics (TOG) 31(4):1–10
13. Park JS, Oh YH, Ahn SC, Lee SW (2005) Glasses removal from facial image using recursive
error compensation. IEEE Trans Pattern Anal Mach Intell 27(5):805–811
14. Iizuka S, Simo-Serra E, Ishikawa H (2017) Globally and locally consistent image completion.
ACM Trans Graphics (ToG) 36(4):1–14
15. Zhang L, Wen T, Shi J (2020) Deep image blending. In: Proceedings of the IEEE/CVF winter
conference on applications of computer vision, pp 231–240
16. Zeng Y, Gong Y, Zeng X (2020) Controllable digital restoration of ancient paintings using
convolutional neural network and nearest neighbor. Pattern Recogn Lett 133:158–164
17. Boutros F, Damer N, Kirchbuchner F, Kuijper A (2022) Self-restrained triplet loss for accurate
masked face recognition. Pattern Recogn 124:108473
18. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
19. Wang Z, Wang G, Huang B, Xiong Z, Hong Q, Wu H et al (2020) Masked face recognition
dataset and application. arXiv preprint arXiv:2003.09093
20. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality,
stability, and variation. arXiv preprint arXiv:1710.10196
21. Anwar A, Raychowdhury A (2020) Masked face recognition for secure authentication. arXiv
preprint arXiv:2008.11104
22. All about Structural Similarity Index (SSIM): theory + code in PyTorch. Available online:
https://medium.com/srm-mic/allabout-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e
23. Peak Signal-to-Noise Ratio as an Image Quality Metric. Available online:
https://www.ni.com/en-in/innovations/white-papers/11/peak-signal-to-noise-ratio-as-an-image-quality-metric.html
Brain Cancer Detection Using Deep
Learning (Special Session “Digital
Transformation Era: Role of Artificial
Intelligence, IOT and Blockchain”)
S. Pandey
Student, Chandigarh University, Mohali, Punjab, India
S. Bansal (B)
Assistant Professor, Department of Mathematics, Chandigarh University, Mohali, Punjab, India
e-mail: shivani.bansal40@gmail.com
1 Introduction
[13]. Deep learning compensates with deeper, multi-faceted networks [14]. Such
networks employ the stochastic gradient descent method to lessen the difference between
the target and the actual output. However, with the additional layers, creating neural
network-based artificial intelligence models becomes increasingly challenging.
2 Related Work
The advanced deep learning-based brain cancer categorization methods are intro-
duced throughout this part. There are multiple techniques for detecting neurological
disorders depending on deep learning and transfer learning methods.
Pareek et al. [18] presented a technique that determines whether or not a cancer
exists and afterward classifies the tumor type. A total of 150
magnetic resonance imaging scans of the head were used to assess the suggested
technique's ability to detect central nervous system cancers. This categorization
procedure used a supervised learning method, and feature extraction used singular
value decomposition. To evaluate the stage of the tumor, the researchers also measured
its size. The approach put forth in [19] beats older approaches and yields successful
results. The recommended strategy uses dense speeded-up robust features and
histogram-of-gradients techniques to extract features and build a feature set, and
employs a support vector machine during the classification phase. A sizable sample
was utilized to find the optimum parameters. When contrasted with cutting-edge
methods, this strategy's accuracy is 90.27%; according to the analytical outcomes,
it fared better than the newest methods. Exploiting the potential of quantum
computing, a qutrit-inspired fully self-supervised shallow quantum learning network
with three phases of processing has been proposed for tumor localization [20]. This
novel unsupervised qutrit-based counter-propagation technique replaces the intricate
labeled ground truth required in supervised systems. Using this technique, quantum
superposition states can propagate across all levels of the network.
3 Methodology Used
The proposed brain cancer classification algorithm employs an efficient deep neural
network. Figure 1 depicts the architectural framework of the suggested paradigm.
Classification is employed to make the malignancy diagnosis. The model comprises
an Inception-ResNetV2 learning algorithm accompanied by an optimal refinement
of its predictions. The outcome is a numeric 0 or 1 (0: healthy, 1: cancer), and the
model makes use of popular pretrained frameworks (Inception-ResNetV2) to
speed up the identification of neurological disorders (Table 1).
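A hedged Keras sketch of such a pretrained Inception-ResNetV2 classifier with a binary (0: healthy, 1: cancer) head is shown below; the input size, frozen backbone, and head layers are illustrative assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False                    # reuse pretrained features as-is

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 0: healthy, 1: cancer
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```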
The chosen hyperparameter settings are covered in this part. The factors that make up a classifier's
configuration are generally independent of the training data and are therefore
not determined from it. There are two basic types of hyperparameters: (1) those that
determine the model's architecture, and (2) those that control how the model is trained.
2. Segmenting Data
Data Enhancement
Rotation, luminance manipulation, horizontal flips, vertical flips, and other modifications
were added to the preprocessing of the images used, which helps the system
be much more effective and avoid overfitting to the perceived task, as sketched
below.
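A minimal Keras sketch of such an augmentation pipeline follows (RandomBrightness requires TensorFlow >= 2.9); the factors are illustrative, not the authors' settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),  # horizontal/vertical flips
    layers.RandomRotation(0.1),                    # small orientation changes
    layers.RandomBrightness(0.2),                  # luminance manipulation
])
# Applied on the fly during training: augmented = augment(images, training=True)
```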
3. Histogram Equalization
4 Data Description
The dataset [27] employed consists of a substantial quantity of multi-institutional,
routinely acquired multi-parametric MRI scans of glioma, with pathologically
confirmed diagnoses and available MGMT promoter methylation status, used
for training, validation, and testing. Several more routinely acquired MRI images
were added to the samples used in Task 1. The research aims to develop and validate
consistent descriptions of the tumor sub-regions for every individual throughout the
training, validation, and assessment collections in order to statistically evaluate
the predicted tumor clusters.
Data Preparation
As described in Fig. 3, the dataset used was divided into the following three
groups: 2400 MRI pictures for the training set, 500 for the validation set, and 50
for the test set.
The subsequent stage is to duplicate the pictures required for data analysis
and visualization, as illustrated. The data is made up of two classes of magnetic
resonance images. The segmentation procedure uses four classes: edema (green
color), no cancer, non-enhancing brain cancer (red color), and enhancing cancer
(yellow color). Eventually, these classes were divided into three separate sections.
Using examples of images and masks with malignant disease in the 2021 sample,
a three-dimensional U-Net segmentation framework has been developed for more
accurate and fast medical image processing. 10% of the sample is used for testing,
20% for validation, and 70% for training. Features are built up progressively across
the network levels. Figure 7 illustrates how the approach improves
segmentation predictive performance up to 96.98%.
6 Conclusion
References
1. Al-Galal SAY, Alshaikhli IFT, Abdulrazzaq MM (2021) MRI brain tumor medical images
analysis using deep learning techniques: a systematic review. Health Technol 11:267–282
2. Rahman ML, Reza AW, Shabuj SI (2022) An internet of things-based automatic brain tumor
detection system. Indones J Electr Eng Comput Sci 25:214–222
3. Key Statistics for Brain and Spinal Cord Tumors. Available online: https://www.cancer.org/
cancer/brain-spinal-cord-tumors-adults/about/key-statistics.html. Accessed on 20 Sep 2022
4. Ayadi W, Elhamzi W, Charfi I, Atri M (2021) Deep CNN for brain tumor classification. Neural
Process Lett 53:671–700
5. Liu J, Li M, Wang J, Wu F, Liu T, Pan Y (2014) A survey of MRI-based brain tumor segmentation
methods. Tsinghua Sci Technol 19:578–595
6. Amin J, Sharif M, Haldorai A, Yasmin M, Nayak RS (2021) Brain tumor detection
and classification using machine learning: a comprehensive survey. Complex Intell Syst
8:3161–3183
7. Yang Y, Yan LF, Zhang X, Han Y, Nan HY, Hu YC, Hu B, Yan SL, Zhang J, Cheng DL
et al (2018) Glioma grading on conventional MR images: a deep learning study with transfer
learning. Front Neurosci 12:804
8. Nazir M, Shakil S, Khurshid K (2021) Role of deep learning in brain tumor detection and
classification (2015 to 2020): a review. Comput Med Imaging Graph 91:101940
9. El-Kenawy ESM, Mirjalili S, Abdelhamid AA, Ibrahim A, Khodadadi N, Eid MM (2022)
Meta-heuristic optimization and keystroke dynamics for authentication of smartphone users.
Mathematics 10:2912
10. El-kenawy ESM, Albalawi F, Ward SA, Ghoneim SSM, Eid MM, Abdelhamid AA, Bailek
N, Ibrahim A (2022) Feature selection and classification of transformer faults based on novel
meta-heuristic algorithm. Mathematics 10:3144
11. El-Kenawy ESM, Mirjalili S, Alassery F, Zhang YD, Eid MM, El-Mashad SY, Aloyaydi
BA, Ibrahim A, Abdelhamid AA (2022) Novel meta-heuristic algorithm for feature selection,
unconstrained functions and engineering problems. IEEE Access 10:40536–40555
12. Abdelhamid AA, El-Kenawy ESM, Alotaibi B, Amer GM, Abdelkader MY, Ibrahim A, Eid
MM (2022) Robust speech emotion recognition using CNN+LSTM based on stochastic fractal
search optimization algorithm. IEEE Access 10:49265–49284
13. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the 2015 IEEE conference on
computer vision and pattern recognition (CVPR), Boston, MA, USA, 7–12 June 2015
14. Alhussan AA, Khafaga DS, El-Kenawy ESM, Ibrahim A, Eid MM, Abdelhamid AA (2022)
Pothole and plain road classification using adaptive mutation dipper throated optimization and
transfer learning for self driving cars. IEEE Access 10:84188–84211
15. Srikanth B, Suryanarayana SV (2021) Multi-Class classification of brain tumor images using
data augmentation with deep neural network. Mater Today Proc
16. Deepak S, Ameer P (2019) Brain tumor classification using deep CNN features via transfer
learning. Comput Biol Med 111:103345
17. Kokkalla S, Kakarla J, Venkateswarlu IB, Singh M (2021) Three-class brain tumor classification
using deep dense inception residual network. Soft Comput 25:8721–8729
18. Pareek M, Jha CK, Mukherjee S (2020) Brain tumor classification from MRI images and calcu-
lation of tumor area. In: Advances in intelligent systems and computing, Springer, Singapore,
pp 73–83
19. Ayadi W, Charfi I, Elhamzi W, Atri M (2020) Brain tumor classification based on hybrid
approach. Vis Comput 38:107–117
20. Konar D, Bhattacharyya S, Panigrahi BK, Behrman EC (2022) Qutrit-inspired fully self-
supervised shallow quantum learning network for brain tumor segmentation. IEEE Trans Neural
Netw Learn Syst 33:6331–6345
21. Khairandish M, Sharma M, Jain V, Chatterjee J, Jhanjhi N (2022) A hybrid CNN-SVM threshold
segmentation approach for tumor detection and classification of MRI brain images. IRBM
43:290–299
22. Öksüz C, Urhan O, Güllü MK (2022) Brain tumor classification using the fused features
extracted from expanded tumor region. Biomed Signal Process Control 72:103356
23. Kadry S, Nam Y, Rauf HT, Rajinikanth V, Lawal IA (2021) Automated detection of brain
abnormality using deep-learning-scheme: a study. In: Proceedings of the 2021 seventh interna-
tional conference on bio signals, images, and instrumentation (ICBSII), Chennai, India, 25–27
March 2021
24. Irmak E (2021) Multi-classification of brain tumor MRI images using deep convolutional neural
network with fully optimized framework. Iran J Sci Technol Trans Electr Eng 45:1015–1036
25. Saber A, Sakr M, Abo-Seida O, Keshk A, Chen H (2021) A novel deep-learning model for
automatic detection and classification of breast cancer using the transfer-learning technique.
IEEE Access 9:71194–71209
26. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image
segmentation. In: Lecture notes in computer science. Springer International Publishing, Berlin,
Germany, pp 234–241
27. Gupta S, Saini A (2018) An artificial intelligence-based approach for managing risk of IT
systems in adopting cloud. Int J Inf Technol 13:1–9. https://doi.org/10.1007/s41870-018-
0204-2
28. Saber A, Keshk A, Abo-Seida O, Sakr M (2022) Tumor detection and classification in breast
mammography based on fine-tuned convolutional neural networks. IJCI Int J Comput Inf
9:74–84
29. BRaTS 2021 Task 1 Dataset, RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS)
Challenge 2021. Available online: https://www.kaggle.com/datasets/dschettler8845/brats-
2021-task1?select=BraTS2021_Training_Data.tar. Accessed on 20 Sep 2022
Traffic Accident Modeling and Prediction
Algorithm Using Convolutional
Recurrent Neural Networks
1 Introduction
The tremendous growth in vehicles has resulted in a batch of issues that governments
must address quickly and effectively. A few of them, like traffic congestion [1], have
been alleviated, since drivers can now view traffic information and choose a less
crowded path to avoid traffic jams using real-time traffic volume data and GPS-based
vehicle navigation systems. On the other hand, issues like traffic accidents
are difficult to control [2]. According to a WHO report, every year approximately 1.2
million humans die and 50 million are severely injured in accidents all
around the globe [3]. With so much suffering caused by traffic accidents, it is critical
to learn what causes them in order to create a better road environment.
A traffic accident can be caused by a variety of circumstances, including driver
behavior, weather, and road conditions. Even though some research has focused on
the relationship between traffic accidents and these parameters, revealing dynamic
changes in accident risk with these factors is extremely difficult. To be more precise,
driving behavior differs from person to person, making real-time and large-scale
observation difficult. Furthermore, weather conditions are rarely accurately recorded
at traffic accident scenes [5]. Moreover, road conditions are far too static to capture
dynamic shifts in risk. Figure 1 depicts the plot of estimated road traffic deaths in
developed countries for past years.
Alongside its advantages, transportation, like any other endeavor or phenomenon,
has disadvantages and limitations for road users [2, 4]. Traffic congestion
and related issues are on the rise all over the world, posing a danger to
people's lives and property. Poor traffic facilities and conditions lead to
urban pollution, rising fuel and energy consumption, long periods wasted each
day in gridlock, squandered community service facilities and public resources,
and, ultimately, accidents resulting in injury, death, and property damage [3, 6].
(Bar chart; y-axis: fatalities per 100,000 vehicles)
Fig. 1 Estimated road traffic deaths (in millions) in developed countries for past years [4]
The feed-forward neural network is the most prevalent form of ANN, in which
information flows linearly from the input layer to the output layer via the hidden
layer(s). Recurrent Neural Networks (RNNs) [7–9] and Convolutional Recurrent
Neural Networks (CRNNs) [10–12] are extensions of ANNs. The CRNN is superior
for image processing jobs, whereas the RNN is more applicable to text processing
applications. Because of the many hidden layers structured into their designs, they
are sometimes termed "deep learning approaches". There are numerous ambiguities
and gaps in data that traditional approaches cannot resolve; AI exploits these
underlying uncertainties to construct cause-and-effect relations in a variety of
real-life situations.
The significant contributions of this paper are a CRNN-based traffic accident
prediction algorithm that adopts convolution kernels to extract feature values,
compared against classic machine learning prediction algorithms. The CRNN
prediction algorithm has a lower loss as well as greater predictive accuracy.
This paper is organized as follows. Section 2 presents an overview of the several
researches that have been undertaken so far by researchers in related area. Section 3
illustrates the methodology of the proposed system. Section 4 presents our experimental
approach and evaluates and analyzes the results obtained from our experiments.
We conclude the paper in Sect. 5. References are included in the end.
2 Background Works
Clustering and categorization of traffic accident data is a way to reduce
unexpected outcomes. One technique categorizes traffic events based on the type
of traffic occurrence [13–15]. Furthermore, some studies have treated traffic accident
data based on criteria such as visibility (daylight, and even nighttime circumstances)
[16]. Several clustering techniques, like latent class clustering [17], k-means clus-
tering [18], and community recognition algorithms, were employed to cluster road
accident records for the first time before accident analysis [19].
A deep fusion model that can deal with categorical and continuous variables simultaneously
was proposed [20]. The model considers not only the features of
traffic accidents but also the spatial-temporal correlations in traffic flow. In this
model, the categorical variables are handled by a stacked restricted Boltzmann
machine (RBM), the continuous variables are handled by a stacked Gaussian-Bernoulli
RBM, and the extracted features are fused by a joint layer. The performance of the
proposed model was analyzed and contrasted with some benchmark models using
extracted I-80 data.
A method for doing electric power steering (EPS) reverse engineering for external
control was proposed [21]. The fundamental goal of the linked research was to solve
the problem of predicting the dynamic trajectory of an autonomous vehicle with
precision. This was achieved by developing a new equation for calculating lateral
tire forces and modifying some vehicle characteristics during road tests.
A special methodology [22] achieves precise intersection traffic forecasting by fusing
additional data sources, beyond road traffic volume data, into the prediction model.
Specifically, the authors exploit data gathered from reports of car crashes and roadwork
at intersections. They also investigate two different learning schemes: batch
learning and online learning. Gradient Boosting, Random Forest (RF), and Extreme
Gradient Boosting are three popular ensemble decision-tree models used in the batch
learning scheme, while the Fast Incremental Model Trees with Drift Detection
(FIMT-DD) model is used in the online learning scheme. The proposed technique
was tested using datasets made available by the Victorian Government of Australia.
The results reveal that incorporating adjacent incidents and roadwork data improves
the accuracy of intersection traffic forecasts.
3 Methodology
Here we have used the UK accident dataset for our study [6]. This research focuses
on a variable-based classification system for determining the degree of traffic acci-
dents. The collected data is preprocessed and noise is removed. Then, using the
suggested Convolutional Recurrent Neural Networks (CRNNs) model [23], the data
is trained. This trained data is further classified with the use of an edge computing plat-
form in which the intensity of risk is predicted. Detailed architecture and algorithms
are discussed in the following subsections. Figure 2 shows the basic methodology
diagram of our method.
Fig. 2 Basic methodology diagram for traffic accident modeling and prediction [11]
Under the traffic accident prediction framework, two steps are utilized to predict
traffic accidents: preprocessing of the dataset and training of the prediction model
using a classifier. This research employs the UK Car Accident dataset from the United
Kingdom and first filters out the features that have the strongest effect on traffic
circumstances. Before the CRNN training model can be created, the data must be
de-meaned as well as normalized. First, subtract the average of each data dimension
from every value in that dimension of the original data. Second, divide each dimension's
data by its standard deviation, scaling the results to the same scale. Weather
variations, road surface smoothness, vehicle speed, vehicle type, light levels, road
type, travel length, and other attributes can be extracted automatically when the
preprocessed data is supplied to the CRNN training model. To depict the current
traffic situation, a status matrix is created [24].
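A minimal NumPy sketch of this de-mean-and-normalize step follows; the guard for zero-variance columns is an added assumption for robustness.

```python
import numpy as np

def standardize(X):
    # Per-dimension de-meaning and scaling to unit standard deviation.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard: leave constant features unscaled
    return (X - mu) / sigma, mu, sigma

# The training-set statistics (mu, sigma) should also be applied to
# validation and test data so all splits share the same scale.
```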
The resulting output of the CRNN training model varies between 0 and 1: the lower
the chance of a traffic accident, the more the final result is biased toward 0, while
a value near 1 indicates that a road accident is very likely to arise. Through edge
computation, the Mobility Management Entity (MME) can package and deliver the
projected outcomes to the operating vehicle [25]. This can also remind the driver
to modify his or her speed promptly and to give greater attention to the surroundings.
Ultimately, we must fulfill the goal of reducing the volume of traffic accidents.
The deep learning method is a subset of machine learning technology that can manage
large amounts of data and extract attribute values on its own, making it well suited
for use on the Internet of Vehicles (IoV). Even the centralized cloud computing
technique is inadequate for deep learning on data received via automobile networking
interfaces due to network capacity limits. The hidden-layer calculations for most of
the layers in a deep learning pipeline can be delegated to the edge server, and finally
the reduced information is supplied to the cloud server, in order to manage resources
effectively and appropriately and to continuously improve network resource scheduling.
Wayside units, micro-clouds, and base stations capture and send real information on
vehicle driving and traffic to an edge server. The edge server decodes and evaluates
the information instantly when the volume of data acquired is significant enough.
The heavier deep learning activities are carried out by the core network's centralized
cloud computing center, with the outcomes communicated back to an edge server
and then forwarded to the vehicle unit. Certain deep learning activities are
offloaded from the cloud to the edge nearest the IoV devices, limiting data transmission
to the cloud and freeing network bandwidth resources [26].
Neural networks' basic calculations are inextricably linked to neurons. Neurons
comprise the basic building blocks of every Neural Network (NN) and are mainly
utilized to incorporate nonlinear features. Sigmoid, hyperbolic tangent (tanh), Rectified
Linear Units (ReLU), Softmax, and other activation functions are popular. A perceptron
layer is made up of two layers of functional neurons: external input signals are
received by the perceptron layer's input layer, and the M-P neuron is the output layer
(the functional layer of the perceptron) [10]. The perceptron model's formula
is defined as:
y = f1(ωa + b)   (1)
Before approaching the convolutional computing layer for actual training, the data
from the original traffic dataset [1] must be preprocessed, which comprises de-meaning
and normalization. The process of reducing every dimension of the input data
to zero mean is termed "de-meaning"; it seeks to align the sample centroid with the
origin of the coordinate system. The data is then normalized by dividing each dimension's
data by its standard deviation, scaling all the features to a very similar scale.
In the CRNN, the result from the convolutional layers (feature maps) is turned into
a sequence of feature vectors instead of being passed to fully-connected layers
at the end of a CNN. These vectors are then fed into a bidirectional RNN (Fig. 3).
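A hedged Keras sketch of this CNN-to-bidirectional-RNN pipeline is shown below; the input shape, filter counts, and LSTM width are illustrative assumptions, with each row of the final feature map treated as one time step.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 1))            # traffic status matrix
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)                          # -> (16, 16, 32)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)                          # -> (8, 8, 64)
x = layers.Reshape((8, 8 * 64))(x)                     # 8 steps of 512 features
x = layers.Bidirectional(layers.LSTM(64))(x)           # sequence -> summary
outputs = layers.Dense(1, activation="sigmoid")(x)     # accident risk in [0, 1]

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```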
The proposed algorithm has the potential to achieve ultra-fast, high-accuracy
recognition. Furthermore, real-time accuracy can be enhanced by improving the
response time.
4 Performance Analysis
This paper incorporates Keras, Python's high-level neural network API framework,
as well as the UK Car Accident 2005–2015 dataset, to implement the presented
accident prediction system. The experiments were run on Windows 10 with a 10th
Gen Intel Core i7 and an Nvidia RTX 3080 Max-Q GPU. We evaluated our proposed
model (CRNN) against other models such as LSTM, DenseNet, ResNet, VGG, and
RNN on performance measures including accuracy, specificity, sensitivity, recall,
F-score, and memory utilization.
The comparative analysis in terms of sensitivity, specificity, and accuracy of
various models such as VGG, LSTM, DenseNet, and RNN with our proposed model
CRNN under 5 datasets is given in Table 1. The graphical representation of models
versus average sensitivity, specificity, and accuracy is shown in Fig. 4. From the
graph it is evident that our proposed method CRNN is giving better performance
compared with the existing models VGG, LSTM, DenseNet, and RNN.
The comparative analysis (recall, F-score, and memory utilization) of various
models with our proposed model under 5 datasets is given in Table 2. The graphical
representation of models versus average recall and F-score is shown in Fig. 5. From
this evidence it is clear that CRNN outperforms the existing methods.
From the comparative analysis of Table 1, it is evident that the CRNN network
is giving better accuracy of 96%, specificity of 90%, and sensitivity of 92.3% when
compared with the existing methods such as VGG, LSTM, DenseNet, and RNN.
The encouraging results are due to the feature extraction by utilizing a Convolu-
tional Neural Network (CNN), and a Recurrent Neural Network (RNN), which was
developed for synthesizing multi-view characteristics out of each image for ultimate
prediction. The improved result in terms of recall, F-score, and memory utilization
over the set of five datasets that has been considered for our study is presented in
Table 2.
5 Conclusions
The world’s population is rising tremendously and the number of automobiles on the
road is expanding in tandem which increases the risk of traffic accidents and injuries.
As a result, having a traffic accident modeling as well as a prediction system is critical
for mitigating the situation. Here, we proposed a traffic modeling as well as prediction
framework centered on CRNN. We have used the UK traffic accident dataset which
is preprocessed and trained by CRNN. Edge computing platform was used for the
classification of trained data. When the system predicts the danger of an accident, it
sends an alert signal to the vehicle unit in movement, based on the intensity of the
risk. We have evaluated our proposed model (CRNN) against other models such as LSTM,
DenseNet, ResNet, VGG, and RNN on performance measures including accuracy,
specificity, sensitivity, recall, F-score, and memory utilization. Our proposed model
(CRNN) outperforms the other existing models with an average accuracy of 95.10% and
is a promising candidate for further research with a more diverse selection of datasets.
Cyberbullying Severe Classification
Using Deep Learning Approach
Abstract Given the accessibility of the Internet, social media has become a popular
means of communication. Social media facilitates communication, but it has also
created several issues. People may be bullied despite the advantages of using social
media. Due to the extensive variety of language used, identifying cyberbullying in
posts is complex. The detrimental effects of bullying on social media are getting
worse every day, which is frightening. In this study, we propose a cyberbullying
detection and severity classification model using roBERTa, a convolutional neural
network (CNN), and long short-term memory (LSTM). We used an annotated dataset
of 7547 negative tweets derived from top influential Nigerian Twitter handles.
The proposed model demonstrates nearly 97% accuracy in detecting and classifying
posts into four severity classes: very severe, severe, moderate, and mild.
1 Introduction
Considering how widespread the Internet is, people now communicate easily on
social media. Social media is helpful for communication, but it has also led to several
problems. Users of social media are more likely than non-users to experience abuse
or contempt [1]. Aggressive behavior, which can be verbal, physical, or social, is
referred to as bullying. Cyberbullying is, by extension, defined as bullying that takes
place over digital devices like cell phones, tablets, and computers [2]. Verbal and
emotional abuse, such as spreading untrue stories, disclosing accurate or incorrect
personal information via texts, and sharing things on social networking platforms,
are the most prevalent forms of cyberbullying.
Text-based comments or messages are a common method of communication on
social media platforms and have evolved into the main vehicle of cyberbullying. In
cyberbullying, motive, power differential, and elements of repetition are considered
essential factors [3]. We observe that individuals, particularly teenagers and young
adults, are coming up with novel strategies for cyberbullying. Cyberbullying victims
show higher levels of depression and anxiety, perform worse at work and in school,
have more suicidal thoughts and attempts, and are in worse physical and mental
health [4]. Because there is a larger online audience and messages propagate more
quickly online than in conventional bullying, the negative consequences of
cyberbullying are more severe.
Research conducted by Bayzick et al. [5] identified Exclusion, Denigration,
Outing, Flooding, Masquerade, Flaming, Cyberstalking, Harassment, and Trolling
as different types of cyberbullying. To manage cyberbullying, natural language
processing (NLP) and machine learning (ML) are combined to automatically identify
whether a text contains cyberbullying content. Deep learning algorithms can be
used in online social networks, social curation, wikis, tweeting, forums, and social
bookmarking to identify cyberbullying. These algorithms are designed to automatically
find cyberbullying texts among massive amounts of data.
Several techniques used in previous studies to classify the severity of cyberbullying,
such as the Hierarchical Squashing-Attention Network [1], fine-grained categories
[6], and language-based techniques [7], did not take users' dialect and behavior
into consideration when choosing training datasets. In our study, we consider
Nigerian English posts on social media and translated the posts into standard
English to enable our proposed model to act robustly. Also, users' behaviors were
taken into consideration when annotating the dataset to classify the severity level.
Therefore, on current social media platforms, cyberbullying is a research issue
that demands further attention. The paper's contributions are as follows:
(i) An annotation scheme to generate a cyberbullying detection and severity
classification dataset for Nigerian English.
(ii) A model for cyberbullying detection and severity classification in social media
posts.
The paper is organized as follows: Sect. 2 offers a literature review of related
research papers. Section 3 describes the methodology used to develop an approach
that matches the objectives of this study. Section 4 describes the proposed model
architecture and how it operates. Section 5 presents and discusses how well different
classifiers performed when given the task of classifying tweets according to their
severity ratings. Finally, Sect. 6 concludes the study.
2 Related Work
Even though the term “cyberbullying” did not exist two decades ago, the issue
has now become widespread [8]. Cyberbullying is harassment committed using
digital tools. It can take place on gaming platforms, messaging platforms, social
media, and mobile phones. It is repeated behavior meant to frighten, infuriate, or
degrade its targets. Twitter and other social networks aim to prohibit or remove
publications that promote cyberbullying victimization.
Very few researchers are working on the detection of cyberbullying severity.
Research describing cyberbullying severity was conducted by the authors in [9]. Their
research aimed at detecting and classifying cyberbullying severity using Naïve
Bayes, KNN, Decision Tree (J48), Random Forest, and SVM classifiers. They offer
a comprehensive approach to measuring the severity of cyberbullying in online social
networks. They also built a machine learning multi-classifier for classifying
cyberbullying severity into different levels, for both multi-class and binary
classification problems. They classify severe cyberbullying experiences into
long-term, physical threats, sexual, and trapping.
The authors in [8] categorize severity into ten (10) levels, ranging from 1 (mild)
to 10 (severe). To detect cyberbullying, they employed a language-based technique.
They also generated a feature, called SUM, to measure the overall “badness” of a
post, computed as a weighted average of the “bad” words (weighted by the severity
assigned).
Research conducted by the authors in [1] classifies the severity of cyberbullying
posts using a Hierarchical Squashing-Attention Network. The authors established a
Chinese-language cyberbullying severity dataset marked with three severity levels
(serious, medium, and slight) and developed a new squashing-attention mechanism
for their hierarchical squashing-attention network. The authors adopted a
cross-validation approach to evaluate the training algorithms, which resulted in
79.76% accuracy.
Van Hee et al. [6] presented a model using fine-grained categories to detect
cyberbullying in online posts. Their experiment was conducted on a Dutch dataset.
The annotations identified include the author's role, threat, insult, curse,
defamation, sexual talk, defense, and encouragement of the harasser. They used
lexical features to gain insight into the difficulty and learnability of the
detection and fine-grained classification of cyberbullying. They classify
cyberbullying events into Insult, Threat/Blackmail, Defamation, Curse/Exclusion,
Defense, Sexual talk, and Support to the harasser.
However, the reviewed automated cyberbullying detection techniques still require
development to determine the level of cyberbullying in social media posts with high
accuracy, and little work has gone into determining the severity of bullying.
In this study, we propose a model that combines the robustly optimized roBERTa with
long short-term memory, trained on a dataset constructed from recent posts, to
identify and classify the severity levels of cyberbullying.
3 Methodology
Several scientific studies have shown that social media may be a valuable source of
data for analysis as well as for understanding people's attitudes and behavior [3].
This section outlines how the dataset for the research originated. Figure 1 outlines
the procedure of data gathering.
Sentiment analysis, usually referred to as opinion mining, can detect, extract, and
quantify the emotional undertone of a body of text. With the development of deep
language models like roBERTa [9], it is now possible to evaluate more challenging
data domains, such as news texts where authors often convey their opinions or
sentiments less openly. Twitter is well known for being a platform where users may
tweet about their emotions. In this regard, our sentiment analysis stage uses the
roBERTa model to determine whether a post expresses positive (1), neutral (0), or
negative (−1) emotion. Any text found to be negative is appended to the collected
list. The result is then saved in a CSV file, downloaded, and forwarded for
annotation. A total of 7547 negative tweets were used for annotation. Table 1
contains the descriptions of each column of the annotated dataset.
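A minimal sketch of this sentiment-filtering step is shown below, assuming the Hugging Face transformers library and the publicly available cardiffnlp/twitter-roberta-base-sentiment checkpoint from the TweetEval family; the label mapping and filtering logic are our assumptions, not the paper's exact implementation.

```python
# Hypothetical sentiment-filtering sketch; the checkpoint and label mapping
# are assumptions based on the TweetEval roBERTa model family.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment")

# LABEL_0/1/2 correspond to negative/neutral/positive for this checkpoint.
label_map = {"LABEL_0": -1, "LABEL_1": 0, "LABEL_2": 1}

tweets = ["I can't believe how useless you are", "What a lovely day"]
negatives = [t for t in tweets
             if label_map[classifier(t)[0]["label"]] == -1]
print(negatives)  # negative tweets are kept and forwarded for annotation
```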
The negative streams of text collected from the sentiment analysis stage were
annotated manually, resulting in 48,354 cyberbullying entries and 151,646
non-cyberbullying entries. The cyberbullying entries, annotated by two English
speakers, were selected as the testing dataset. We used Cohen's kappa statistic,
which measures the degree of agreement between annotators [10], to evaluate the
result of the manual annotation. The annotators are labeled A, B, and C for easy
identification. The annotation result of A is compared with B, A with C, and vice
versa, as provided in Table 2. The formula to calculate Cohen's kappa for two
raters is shown in Eq. 1.
formula to calculate Cohen’s kappa for two raters is shown in Eq. 1.
po − pe 1 − po
k= =1− , (1)
1 − pe 1 − pe
where
Table 3 Distribution of dataset by cyberbullying class

Classification    Annotated tweets
Mild              1084
Moderate          1280
Severe            1256
Very severe       87
Cohen's kappa yields a result of 0.87. Based on this result, the dataset annotations
produced by this study are near the level of perfect agreement.
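For illustration, inter-annotator agreement of this kind can be computed with scikit-learn as sketched below; the toy labels are invented for the example and are not the study's data.

```python
# Hypothetical agreement check between two annotators (toy labels).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["severe", "mild", "moderate", "severe", "very severe"]
annotator_b = ["severe", "mild", "severe", "severe", "very severe"]

# Kappa corrects raw agreement for agreement expected by chance (Eq. 1).
print(cohen_kappa_score(annotator_a, annotator_b))
```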
For the second annotation, we adopted a high-quality annotated dataset of harassment
posts provided by Thirunarayan and Shalin [11]. To conduct our experiment on
assessing the intensity of harassment using this dataset, we translated the dataset
into English and then divided the annotated cyberbullying tweets into four levels:
mild, moderate, severe, and very severe. We then categorized personnel-related
tweets as mild, political/racist tweets as moderate, intellectual tweets as severe,
and murder-related tweets as very severe. The dataset is published at
https://data.mendeley.com/datasets/4w6fcyzdfp/1. As a result, a dataset with the
features in Table 3 was created.
The most important stage of the text classification pipeline is selecting the optimal
classifier. Without a complete conceptual understanding of each algorithmic
technique, we cannot effectively choose the best text classification model for
implementation. A model to identify cyberbullying behaviors and their severity from
tweets was developed. To select the best algorithm for classification, we tested
several machine learning algorithms, namely: convolutional neural network (CNN),
Naïve Bayes, support vector machine (SVM), long short-term memory (LSTM), and
K-nearest neighbors (KNN).
Support vector machines (SVM) were developed within the context of statistical
learning theory and have successfully been used in various applications, including
face recognition, time series forecasting, and processing data for medical diagnosis
[15].
Finding separators that can identify the various classes in the search space is the
main goal of SVM [7]. In training our SVM model, each of the four classes (very
severe, severe, moderate, and mild) was used as the target variable in a
one-against-all approach: for each classifier, the class is fitted against all the
other classes. The results of the comparison are presented in the results section.
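The sketch below illustrates this one-against-all setup with scikit-learn; the TF-IDF features and example texts are our assumptions for a self-contained demo, not the paper's pipeline.

```python
# Hypothetical one-vs-rest SVM sketch on toy severity-labeled texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["you are worthless", "nobody likes you", "have a nice day",
         "I will hurt you", "you idiot"]
labels = ["moderate", "mild", "mild", "very severe", "severe"]

# One binary SVM is fitted per class against all the other classes.
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LinearSVC()))
clf.fit(texts, labels)
print(clf.predict(["you are an idiot"]))
```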
The long short-term memory (LSTM) cell is a unique configuration of the fundamental
RNN unit [16]. LSTM is local in time and space: its computational complexity per
time step and weight is O(1) [17]. Based on its ability to capture context, we used
the LSTM algorithm to examine the sentiment in the derived characteristics.
In this section, we illustrate the design of the model, which has four components
for cyberbullying severity classification (CSC): data input, sentiment analyzer,
cyberbullying detector, and severity classifier. Figure 2 illustrates the overall
layout of our model. In the remainder of this section, we explain the details of
our model.
The CSC model begins by reading test data. The test data is then pre-processed
and submitted to the sentiment analysis stage, which utilizes the roBERTa model to
detect whether the stream of data contains negative sentiment. The roBERTa model
is used to tokenize words and build word embeddings [9]. The sentiment analysis
stage outputs Negative (−1), Neutral (0), or Positive (1). The cyberbullying
detector stage begins by reading the 7547 training samples and a stream of negative
test data. The cyberbullying detector utilizes CNN and LSTM algorithms. The CNN
algorithm is used to extract features from the streams of test data. The LSTM
algorithm analyzes the sentiment in the retrieved features based on its ability to
capture context [13]. The cyberbullying detector outputs the labels cyberbullying
(1) and non-cyberbullying (0). If the output of this stage is non-cyberbullying,
the next stream of test data is loaded for testing; if the output is cyberbullying,
the next stage is invoked.
The final stage of the model is the severity classifier. It begins by reading the
3707 training samples and the cyberbullying text. The model utilizes CNN and LSTM
algorithms to classify the severity of cyberbullying.
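As a rough sketch of such a CNN-plus-LSTM severity classifier in Keras, the block below stacks an embedding layer, a 1D convolution for feature extraction, and an LSTM before a four-way softmax; the vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
# Illustrative CNN + LSTM severity classifier (assumed hyperparameters).
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN = 20000, 100  # assumed tokenizer settings

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128, input_length=SEQ_LEN),
    # Conv1D extracts local n-gram features from the embedded tweet.
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(2),
    # LSTM captures longer-range context across the extracted features.
    layers.LSTM(64),
    # Four severity classes: very severe, severe, moderate, mild.
    layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```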
The CSC model uses the Twitter-roBERTa-base and LSTM algorithms for model training
and severity classification, using the dataset explained in Sect. 3. Millions of
tweets were used to train the roBERTa-base model, which was fine-tuned for sentiment
analysis with the TweetEval benchmark [19]. The CSC model's sentiment analysis
module was implemented in Python. The roBERTa-base model consists of 12 base
layers and 125 million parameters [9]. The layers' goal is to produce a useful
feature representation of words from which the more sophisticated layers may quickly
extract the necessary data. The results of the dropout layers are then fed into an
LSTM. The LSTM model retains data for the purpose of identifying long-range
dependencies in the input. When words are converted into numbers, the models learn
more effectively.
F1-score, accuracy, precision, and recall were utilized to assess the classification
and for model comparison. The aim of evaluation is to confirm model performance
[20]. Model accuracy is the number of classifications a model predicts correctly
divided by the total number of predictions [21]. Accuracy is calculated using
Eqs. (2) and (3):
$$\text{Accuracy} = \frac{CP}{TP} \qquad (2)$$
where CP is the number of correct predictions and TP is the total number of
predictions.
Equation 3 can be used to calculate the positive and negative accuracy of the
binary classification.
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \qquad (3)$$
where TP is true positive, TN is true negative, FN is false negative, and FP is
false positive.
Precision is defined as the proportion of accurately categorized positive samples
(true positives) to the total number of samples classified as positive. It is
computed using Eq. (4):
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
Recall is the percentage of actual positive samples that are correctly classified,
measured as in Eq. (5):
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
The F-score, also known as the F1-score, is used to assess a model's accuracy on
a dataset. It is used to evaluate binary categorization methods that label examples
as “positive” or “negative”. The accuracy measure can be misleading when the dataset
is not balanced. The F1-score is calculated using Eq. (6):
$$F1 = \frac{2TP}{2TP + FP + FN} \qquad (6)$$
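A short worked example of Eqs. (4)–(6), with invented confusion-matrix counts, is given below to make the formulas concrete.

```python
# Worked example of Eqs. (4)-(6) with invented counts (not study results).
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)          # Eq. (4): 80 / 90  = 0.889
recall = tp / (tp + fn)             # Eq. (5): 80 / 100 = 0.800
f1 = 2 * tp / (2 * tp + fp + fn)    # Eq. (6): 160 / 190 = 0.842

print(precision, recall, f1)
```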
In this section, we examine how well various classifiers performed when tasked with
categorizing tweets based on their severity ratings. Results for each classifier's
multi-class categorization in various settings are shown in Table 4. With CNN +
LSTM features, performance was significantly improved in terms of accuracy and
F-score. The results were obtained using data gathered from posts on top influential
Nigerian Twitter handles. All posts were tested after text preprocessing, without
any translation. Our suggested strategy outperforms numerous feature-engineered
strategies and procedures in identifying cyberbullying and categorizing its
severity. In this research, several recommended features were added on top of the
default classifier settings, but only some features enhanced classifier performance.
The CNN + LSTM model performs well because the deep
Table 4 Comparison of testing accuracy of CNN, SVM, KNN, Naïve Bayes, and LSTM
algorithms

Case study   Algorithm            Accuracy   F-score
1            CNN                  95.64      0.91
             SVM                  87.42      0.88
             KNN                  86.65      0.83
             Naïve Bayes          76.56      0.82
             LSTM                 97.54      0.88
2            CNN + SVM            95.33      0.94
3            CNN + KNN            91.37      0.91
4            CNN + Naïve Bayes    93.76      0.93
5            CNN + LSTM           97.34      0.95
6            SVM + KNN            96.01      0.87
7            SVM + Naïve Bayes    89.52      0.88
8            SVM + LSTM           96.87      0.94
9            KNN + Naïve Bayes    79.76      0.89
10           Naïve Bayes + LSTM   89.61      0.91
learning model can learn more about the text's underlying semantic structure [22].
The CNN + LSTM model not only ensures that each microblog's overall metrics are
met, but can also extract more detailed semantic data, adjust different model
parameters, and optimize the algorithm to better classify emotions.
6 Conclusion
In this study, we presented a new multi-class approach to detect and classify the
severity of cyberbullying posts written in Nigerian English on social media. The
main objectives of this research were to build a dataset of cyberbullying severity
and to develop a model for severity classification of posts made in Nigerian
English. The obtained results show an improvement in terms of F-score and accuracy
when compared with other techniques. Our study shows that detecting the severity of
cyberbullying is influenced by language. We were unable to conduct an in-depth
investigation of user activity on social media. Despite this limitation, we believe
that our proposed work will contribute to moving from binary classification of
cyberbullying or non-cyberbullying toward multi-class classification. Furthermore,
the present research focuses on text-based posts on social media; other forms of
communication, such as voice, image, and video, need to be investigated with the
same approach to identify severity.
References
Abstract Six Sigma is a method for updating processes to reduce variability and
defects. Although many world-class manufacturing businesses have adopted Six
Sigma, it is still relatively new in the software sector. This article discusses how the Six
Sigma technique may be used to reduce faults in software maintenance projects. To
solve the fundamental problem of minimizing customer reported defects during the
maintenance phase of the software, the Define–Measure–Analyze–Improve–Control
(DMAIC) technique was used. The purpose of this study was to show how a software
process may adopt a systematic approach to achieving world-class quality while
also achieving customer satisfaction and improving the overall profit of a company.
The project discussed is a live maintenance project executed by the software QA
team, which was responsible for ensuring that the software meets quality standards
and customer satisfaction. The work done in this project shows that the
implementation of Six Sigma during the maintenance or QA process improved the total
profit of the software services company and thus saved the company's brand from ruin.
1 Introduction
The service industry (both the manufacturing and software industries) throughout the
world has played a vital role in the economies of both developed and developing
countries, demanding an emphasis on the quality of service. The success of Japanese
A. Juvekar (B)
IT Consultant, Mumbai, India
e-mail: Abhay.Juvekar@yahoo.com
O. L. D’souza
HCL Technology, Mumbai, India
A. Chaware
Associate Professor, P G Department of Computer Science, SNDTWU, Mumbai 400049, India
industry, particularly in the 1970s and 1980s, encouraged the entire globe to focus
on quality concerns [1]. The Japanese experience has shown that the requirements
and expectations of consumers are the most important aspects in determining quality
[2]. The software industry is no exception. As a result, the value
of high-quality software can no longer be overlooked. Today's customers expect and
value greater quality and are ready to pay a premium for it [3]. The characteristics
of high-quality software are:
• proper development,
• compliance with requirement specifications,
• good performance that meets customer expectations, and
• suitability for usage.
Software projects that are of poor quality are delayed, failed, abandoned, or
rejected, and may trigger the execution of the penalty clause and cancelation
of the business contract, resulting in removal from the preferred vendor list. Even
completed software projects may require costly ongoing maintenance and corrective
releases or service packs to ensure excellent software quality [4]. In this
research, the IT solution provider firm DG Solutions had taken on the E-ticketing
and Reservations Project of Hindustan Asia Pacific Airways (HAPA), a reputed
airline, with the following specification: the project size was estimated to be
307,070 SMC, assuming a productivity of 8.8 SMC/PD, and the duration was around
32 months with a resource allocation of around 45 personnel throughout that period.
But due to poor productivity and delayed deliverables, the project was labeled RED
and came under the purview of High Risk Project Management (HRPM), with the
possibility of HAPA blacklisting DG Solutions.
Loopholes emerged in the system: the Requirement Gathering Phase took an overly
long time and, despite the extended period that had elapsed, was far from done;
although the activities were studiously monitored and time reporting was maintained,
the idle time of SMEs and BAs increased multifold; code reusability was not on plan;
requirement volatility, complexity, and ambiguity were rife; and irregular/
overlapping configuration management led to missing software components. This gave
HAPA buggy software with schedule overrun and a loss incurred to the tune of $1.2
million, which finally ruined its brand in the international market. For DG
Solutions this computed to a very high Schedule Variance (SV), bringing the project
to the stage of foreclosure with huge penalties.
2 Literature Review
According to IEEE [5], a process is “a set of actions carried out for a certain
goal.” There is a correlation between processes and outcomes; hence, process
optimization can increase the quality of a software product. A process gives project
participants a consistent approach to doing the same task in the same way every
time. Process improvement is concerned with identifying and enhancing processes:
defects discovered in past attempts are addressed in subsequent efforts [6]. In
traditional Software Development Life Cycle (SDLC) approaches, quality processes are
usually introduced toward the end of the project cycle, usually before
implementation. Unit testing, system testing, integration testing, and other such
terminology are widely used. Some of the most efficient approaches emphasize design
reviews and code reviews, but they, too, occur after a deliverable has already been
created [7]. Six Sigma corrects this by implementing toll gates at each level of the
project's life cycle. As a result, the SDLC's concept, requirements gathering,
systems definition, software development, software testing, deployment, and
maintenance stages correspond to toll gates [8]. Toll gates should be included from
the start of a software project to increase the probability of a successful outcome.
Six Sigma is one among the many available models and techniques that can be used for
process improvement.
Six Sigma, according to Tomkins [5], is “a program aiming at the near-elimination
of faults from every product, process, and transaction.” According to Harry [9], it
is “a deliberate endeavor to raise profitability, expand market share, and improve
customer satisfaction via statistical methods that can lead to break-through quantum
increases in quality.” It is also a “new strategic paradigm of management innovation
for firm survival in the twenty-first century, implying three things: statistical
measurement, management strategy, and quality culture” [10].
Six Sigma has two key methodologies: DMAIC and DMADV. Define–Measure–
Analyze–Improve–Control (DMAIC) is used to improve an existing business
process, and Define–Measure–Analyze–Design–Verify (DMADV) is used to create
new product or process designs for predictable, defect-free performance [10].
3 Proposed Solution
Earned Value Analysis showed the infeasibility of continuing with the Waterfall
Model and still making the deadline. As the Client's key members were not always
available due to their heavy workloads, the project progress suffered, and the onus
was put entirely on the Vendor Project Team. This was despite regular status updates
being presented to the Steering Committee comprising stakeholders from both the
Client and the Vendor.
Rather than crash the project, it was decided in concurrence with the customer to
move from the Waterfall to the Agile Model of delivery; thus, there was stakeholder
signoff on the shift of model. The intent was that Quick Wins after incremental
deliveries
would enthuse the Client Project Team members and provide the necessary impetus
and encouragement to move forward positively. This also allowed DG to give periodic
deliveries as well as ensure transparency and joint/accepted accountability for delays
that might be wholly attributable to the Customer and/or other Vendors engaged for
interfaces.
The following improvements were expected from the QA Project (Table 1).
4 Methodology
The goal of this study is to improve the quality level and process capabilities of
the software built for the E-ticketing and Reservations system of the highly reputed
airline, utilizing the Six Sigma DMAIC methodology. When the DMAIC technique is
used, it produces the targeted goods at the right time and at a low cost.
The different stages followed under this methodology are explained below.
The As-Is project's current productivity of 8.34 SMC/PD was targeted to reach the
goal of 12.54 SMC/PD. To fulfill this requirement, the process started with creating
a project charter (Fig. 1).
The initial step is to define and develop a project charter considering the project
output and goal, as seen in Fig. 2.
Each sprint duration was taken as 12 weeks. Pre-planning was done over a span of
8 weeks, followed by the sprints. Steady State Observation was done over 16 weeks.
The next step, the second activity, was to create the Supplier, Input, Process,
Output, Customer (SIPOC) map.
Fig. 2 Snapshot of the project charter for the Six Sigma project
SIPOC indicates the major activities or subprocesses in a business process and the
top-down flow chart for the processes (Table 2).
The third activity was to prioritize the voice of the customer and the voice of the
business on those aspects related to the project objective; Critical to Qualities
(CTQs) and Critical for Processes (CFPs) were thus determined. The fourth activity
was to sketch the top-down chart for the process, showing the different processes
with the HR and other details, with the customer as the starting point and
satisfying the customer as the end point. Finally, the fifth activity was to define
the process map to identify the important processes in the project. The following
were the outputs of this process:
Voice of customer → Delayed delivery/Redundant usage of customer resources/
Loss of revenue and threat of vendor change at great cost
Critical to Quality → High SV on the positive side/Extended working hours for
vendor as well as customer/PSDD much above the USL
Voice of business → Rare technology in the global market/Complex airline
domain/High response time required from all ends/Diminishing profit margins
Critical for Processes → Inexperienced resources/No trainer available in the market
for TPF & Assembler/Reduction in cost.
In this phase of DMAIC, the assignable causes responsible for poor quality or
variability in the existing process are identified. The cause-and-effect diagram is
a visual brainstorming tool for capturing potential causes of an issue, used in the
analyze stage or the improve stage to figure out what is causing the issue. The
responsible causes (X's) for each of the Y's are recorded in the cause-and-effect
diagram. Figure 4 shows the cause-and-effect diagram for one of the Y's in the
project; it is also called the fishbone diagram due to its appearance.
Fig. 4 Fishbone diagram showing factors impacting the process map Y3 in the measure
phase of sprint 1
Based on the analysis phase, all the possible solutions for the three problems (Y's)
were listed, and a solution matrix was used to decide which solution is best. Sigma
impact, time impact, cost effect, and other implications are among the assessment
factors of the solution matrix. These factors were scored by
Fig. 5 The results before and after implementing Six Sigma
Table 3 Values before and after the Six Sigma implementation

Output indicator: PSDD much above USL   DPMO           Sigma level
Before Six Sigma implementation         1,000,000.00   0
After Six Sigma implementation          3.36           6
finding the correlation of each solution to the criteria; the higher the score, the
better the solution meets the four requirements. From among the many causes found,
12 causes were selected for improvement. The modification of the current process
according to the results of the analyze phase was done in the improve phase.
Figure 5 shows the comparative visualization of the complete process before and
after implementing Six Sigma (Table 3).
To sustain the improvements achieved and to monitor them so as to ensure continued
success, a control chart was maintained; it is given in Table 4.
Table 4 Control chart showing all the important results for Six Sigma

Output indicator                                            Average   Standard deviation   Cp     Cpk
Y1—High SV on the positive side                             0.10      1.13                 1.46   1.43
Y2—Extended working hours for customer as well as vendor    9.65      0.58                 1.12   1.05
Y3—PSDD much above USL                                      0         0                    1.43   1.04
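For readers unfamiliar with the capability indices reported in Table 4, the sketch below computes Cp and Cpk from their standard definitions; the specification limits and sample data are invented for illustration and are not the project's figures.

```python
# Illustrative computation of process capability indices (invented data).
import statistics

samples = [11.8, 12.1, 12.6, 12.3, 11.9, 12.4]  # hypothetical SMC/PD values
usl, lsl = 13.0, 11.0                            # assumed spec limits

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

cp = (usl - lsl) / (6 * sigma)                   # potential capability
cpk = min(usl - mu, mu - lsl) / (3 * sigma)      # capability with centering
print(round(cp, 2), round(cpk, 2))
```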
Usually, as part of the closure of the project, one of the final deliverables is
replicating the project. As a result of the success of Six Sigma in one
organization, the plan was replicated in two other firms at all locations where the
project is executed. The action plan of E-ticketing & Reservations for HAPA was
implemented on a similar project. Reusable components like checklists, templates,
knowledge management portals, and a code optimizer were used wherever required.
Improvements due to the Six Sigma exercise were measured, and the results shown in
Table 5 reflect an increase in productivity, a reduction in schedule overrun,
considerable cost savings, and a reduction in the number of bugs.
Additionally, automation was introduced by using the High-Level Assembler
(HLASM) and Transaction Processing Facility (TPF) tools. Code was made reusable,
which helped in increasing profitability by reducing the time spent in subsequent
coding exercises. The tools, coupled with more systematic controls in QA, led to a
decrease in the cost of poor quality (COPQ).
5 Conclusion
The Six Sigma exercise was successful. This paper reports the results of a Six
Sigma implementation carried out in the airline's QA process. The project shows
that damage control was done in time and there was no business loss; the
productivity increase was gradual, stable, and sustainable; there was no downfall
in productivity; customer co-operation was excellent during the crisis; and the
team received awards, bonuses, increments, and ESOPs after successful completion of
delivery milestones. The approach used in this project was taken up for Organization
Innovation and Deployment.
The process would have been more effective if internal people had been given an
opportunity to better themselves instead of lateral hiring; lateral recruitment had
been done in a planned manner, keeping the demand-supply threshold in mind; domain
knowledge experts had been scaled up as per the project's technical requirements;
senior management had contributed proactively to the Six Sigma project; the High
Risk Project Management group's contribution had been proactive; and the requirement
stability index had been captured. Looking into what went wrong, we conclude that
the concept of nine Sigma would be more beneficial and may be an opportunity to
reduce cost. This can be taken as a further scope for research.
References
1. Hsieh YJ, Huang LY, Wang CT (2012) A framework for the selection of six sigma projects in
services: case studies of banking and health care services in Taiwan. Serv Bus 6(2):243–264
2. Free Six Sigma Lessons. Motorola University (2008). http://www.motorola.com/content.jsp?
globalObjectId=3069-5787
3. Coby P (2004) Community Spirit. Airline Bus 20(6). http://www.flightglobal.com/articles/
2004/06/01/182274/community-spirit
4. http://www.iata.org/stbsupportportal (2009). StB Support Portal. https://www.iata.org/en/pub
lications/
5. Srivastava A, Bhardwaj S, Saraswat S (2017) SCRUM model for agile methodology. In:
Proceedings—IEEE international conference on computing, communication and automation
ICCCA 2017, vol 2017-January, pp 864–869
6. Abrahamsson P, Salo O, Ronkainen J, Warsta J (2002) Agile software development methods:
review and analysis. VTT publication 478, Espoo, Finland, 107 p
7. Harvie DP (2016) Targeted scrum: software development inspired by mission command, vol
42, no 5, pp 476–489
8. Mundra A et al (2013) Practical scrum-scrum team: way to produce successful and quality
software. In: Proceedings of 13th international conference on computational science and its
applications, IEEE, pp 119–123
9. Hart MA (2011) Agile product management with scrum: creating products that customers love
by Roman Pichler. J Prod Innov Manag 28
10. Pan Z, Park H, Baik J, Choi H (2007) A Six Sigma framework for software process improve-
ments and its implementation. In: Proceedings of the 14th AsiaPacific software engineering
conference (APSEC’07), pp 446–453
11. Sutherland J, Schwaber K (2013) The scrum guide. The definitive guide to scrum: the rules of
the game
Toward a Generic Multi-modal Medical
Data Representation Model
1 Introduction
2 Background
In recent years, AI models, especially deep learning models, have shown
promising results on a wide range of medical data analysis tasks, ranging
from ultrasound-based lung disease detection [1] to detecting brain tumors from
multi-modal MRI scans [2], with some results being on par with or even surpassing
human expert performance. These promising outcomes create a positive expectation for
automated and computer-aided medical data interpretation and diagnosis, with the
potential for inclusion in the clinical workflow, reduced errors in clinical
practice, and better health outcomes for patients. However, these successes come at
the cost of two important pre-conditions: (1) well-curated and labeled good-quality
data in extremely large volumes (requiring high resources and high curation costs),
and (2) simplified operational settings, attempting to answer a well-defined
clinical question from a single data modality.
Real-world scenarios are often the opposite. For most research problems, curating
large datasets and good-quality data annotation is difficult, though the data does,
however, exist in abundance: decades of digitized health records and imaging data
are sitting quietly on servers in health networks. The trouble researchers often
encounter is that these data are often not analysis ready. Even for the same
modality, the data may have different presentations (e.g., imaging results vary by
technique, device maker and model, and operator), be incomplete (e.g., some
information is not recorded), and (most disappointingly) be unlabeled, rendering
their use difficult. Data annotation is crucial and needs to be done by clinicians
in order to facilitate data analytics research, which is resource intensive.
Although medical AI research primarily focuses on developing AI technologies, its
development is significantly limited by the clinical input required from clinicians
[3].
Representation models are pre-trained models which can be fine-tuned toward
different downstream clinical tasks. They do not remove the need for accumulating
data annotation but, with small-scale data annotation of the downstream task, they
can function with a strong generalization ability. From the technological aspect,
representation models take the form of deep learning neural network models and
often use self-supervised learning methods to learn on unlabeled data. In the
natural language processing (NLP) domain, massive-scale language representation
models are named foundation models, e.g., BERT [4], DALL·E [5], GPT-3 [6]. In the
context of computer vision, representation models cover the widely used
convolutional neural networks (CNN, or ConvNet), such as ResNets [4], and the more
recent Vision Transformers (ViT) [7], which are often pre-trained on large datasets,
such as ImageNet, with or without using labeled annotations.
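The fine-tuning workflow described here can be sketched roughly as below, using a torchvision ResNet as the pre-trained representation model; the frozen backbone, two-class head, and random stand-in data are illustrative assumptions rather than the paper's setup.

```python
# Illustrative transfer-learning sketch: fine-tune a pre-trained
# representation model on a small labeled downstream task (assumed 2 classes).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained representation; only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for real data).
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
```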
In real-world clinical practice, a clinical judgment takes account of the patient's
demographics, medical history, and imaging exam and/or non-imaging test
results (e.g., blood tests). For imaging analysis, general computer vision pre-trained
representation models are widely used as the starting point for medical image
analysis. However, pre-trained models built on medical imaging data are scarce. The few
existing models [1, 8, 9] are limited to a specific imaging modality and designed
to solve only their downstream clinical task, and they do not consider non-imaging
modalities. Therefore, there is a lack of a generic, universal representation model
that can handle multi-modal data at the same time. A general-purpose medical
representation model, with wide coverage of imaging modalities and tasks, would
create a transformative breakthrough in the field of medical image analysis. Such a
model would stimulate the flourishing development of medical AI in general, making a
positive impact on human health. The research reported in this paper lays the
foundation by creating initial representations toward a general-purpose medical data
representation model, with a reference implementation focusing on multiple imaging
modalities that are applicable to different downstream tasks. The model is
extensible for combining both imaging and non-imaging modalities such as electronic
health records. The performance of the proposed representation model was evaluated
against different downstream task-specific metrics, such as those for
classification, detection, or segmentation tasks. The proposed representation model
can be extended horizontally to model other imaging modalities, such as
histopathology imaging and OCT, and can inspire the technical development of a
generic vision foundation model: a universal model producing representations for
images and videos from different imaging technologies, including natural scenes and
medical imaging data.
The proposed medical data representation model can be highly significant for the
medical image analysis research community by addressing one of the most fundamental
problems in the field, which carries a huge potential benefit for the healthcare
area. It will also advance the broad development of medical AI in general,
subsequently providing benefits to the local and global healthcare systems. The
outcome of this research can provide an alternative source of generalization
ability for many medical data analysis projects. This representation model will
facilitate an agile development cycle for diseases with rare prevalence (i.e., an
incidence rate of less than 1 in 2000 in a general population [10]) or where there
is limited data. It will also aid researchers in making sensible development and
validation decisions on small-scale datasets before needing to perform a large
multi-center validation.
With data availability being a long-standing limiting factor in the field of medical
image analysis, researchers often adopt ready-to-use computer vision representation
models to improve the generalization of their learning models. However, there is a
significant gap between natural scenes (e.g., photos and videos from digital cameras
and smartphones) and medical imaging modalities (e.g., MRI, US, CT, and X-ray videos
and images). Closing the pre-training and downstream task domain gaps is the key to
successful transfer learning. Recently, several medical image representation models
were proposed [11–13]; however, each of these targets a specific imaging modality,
which narrows the scope of the downstream tasks to which it can provide benefit.
Further, existing multi-modal methods aim at fusing information from pre-selected
modalities under the assumption that the targeted modalities must all be present;
if any of the modalities is missing or inadequate, this impacts the representation
model and the downstream clinical application. This assumption is not always true
in the real world, and it limits the number of eligible studies that can be included
in the model's learning [14]. Also, the information in hospital databases is usually
scattered across database tables, medical reports, and imaging data files. It is
often incomplete and non-uniformly stored, and most of this information is not ready
to use for studies, due to the need for data preparation involving cleaning,
pre-processing, and removal of noise and acquisition artifacts. All these activities
need input from multidisciplinary experts, from clinical to database exploration
professionals [15] (Fig. 1).
The focus of the proposed generic, universal representation model is to: (1)
efficiently capture the common anatomic representation of the internal human body
and maximize the sharing and training of model parameters to improve the efficiency
of the model; and (2) build a robust representation model that can handle missing
data elements and modalities, particularly for EHR, pathology test results, and
clinical text notes, as well as redundant data from multiple imaging modalities.
By leveraging recent advances in deep learning, such as text and vision
transformer-based learning architectures, and by combining features at multiple
levels, including low-level features, cross-modal features, and high-level features,
the representation model developed is capable of extracting complex, latent, hidden
information from multiple data sources, and can solve multiple downstream clinical
application tasks, such as disease detection, classification, or pathology
segmentation for assessing severity, or staging and tracking disease progression
under different pharmacological interventions.
4 Experimental Work
In this section, we report the study design and methodology for the proposed generic
medical data representation model for a use case involving two different downstream
clinical tasks:
1. Semantic segmentation task involving multi-modal brain images for extracting
meaningful tumor regions, such as the active tumor (AT), necrotic core (NCR),
and peritumoral edematous/infiltrated tissue (ED), directly from multi-modal/
multi-parametric MRI scans (T1w, T1ce, T2w, and FLAIR).
2. Classification task involving segmented brain tumors from multi-parametric
MRI, extracting features and associating them with tumor severity, contributing
to better prognosis and treatment.
Task 1: For segmentation downstream task, we used the brain tumors
dataset from the Medical Segmentation Decathlon challenge (http://medicaldecat
hlon.com/) [16]. The data is collected from the Multi-modal Brain Tumor Image
Segmentation Benchmark Challenge (BraTS) dataset from 2016 and 2017 [17]. The
task is to segment tumors into three different subregions (active tumor (AT), necrotic
core (NCR), and peritumoral edematous/infiltrated tissue (ED)) from multi-modal
multisite MRI data (T1w, T1ce, T2w, and FLAIR). There are 388 subjects in the
dataset, with each subject consisting of four 3D volumes (T1, T1c, T2, FLAIR) and
corresponding manual annotated labels. Each of the T1, T1c, T2, and FLAIR volume
images are of size 240 × 240 × 155. Ten percent of the data was used for testing,
and 90% for training and validation. We also performed data augmentation,
with the aim of increasing the diversity of the data set by applying random,
realistic transformations, such as rotations, flips, zooming, pixel intensity
modifications, and more. This also ensures a degree of invariance to these
transformations for the resulting trained models, leading to better generalization.
There are many possible data augmentation techniques, ranging from basic to more
advanced transformations, including methods for combining multiple images into sets
of “new” images (e.g., what is called “CutMix” or “MixUp” and more). When doing data
augmentation, it is vital that the transformations do not change the correct label
(for example, by zooming in on a region of the image that does not contain the
information needed to assign the class of the original image). In our case, we
normalized the images, resized them all to the same size, and applied some random
motion as our data augmentation.
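A minimal sketch of such a preprocessing/augmentation pipeline is given below using the MONAI library, which is commonly used for medical imaging; the specific transforms and parameters are our assumptions, not the exact pipeline used in this study.

```python
# Hypothetical MONAI-style preprocessing/augmentation sketch (assumed params).
from monai.transforms import Compose, ScaleIntensity, Resize, RandAffine

transform = Compose([
    ScaleIntensity(),                        # normalize voxel intensities
    Resize((128, 128, 128)),                 # resize all volumes to one shape
    RandAffine(prob=0.5,                     # random "motion": small rotations
               rotate_range=(0.1, 0.1, 0.1),  # and translations per axis
               translate_range=(5, 5, 5)),
])
# Applied to a channel-first 4D array, e.g., shape (1, 240, 240, 155).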
The next step in building the model was choosing the deep learning architecture,
for which we used an enhanced UNet architecture, as shown in Fig. 2. We also used
a novel loss function combining the Dice loss and the cross-entropy loss as a
weighted sum of these two losses. Figure 3 shows the trained model's performance in
terms of predictions on the validation subset of the data. With just one epoch, the
model obtained on the test set Dice scores of 0.7252, 0.5850, and 0.7105 for the
three labels (active tumor (AT), necrotic core (NCR), and peritumoral edematous/
infiltrated tissue (ED)).
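A rough PyTorch sketch of such a weighted Dice-plus-cross-entropy loss is shown below; the weighting factor and smoothing term are illustrative assumptions, and the exact formulation used by the authors may differ.

```python
# Illustrative weighted Dice + cross-entropy loss (assumed weighting/smoothing).
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, ce_weight=0.5, smooth=1e-5):
    # logits: (N, C, ...) raw scores; target: (N, ...) integer class labels.
    ce = F.cross_entropy(logits, target)

    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.movedim(-1, 1).float()   # to channel-first layout

    # Soft Dice averaged over classes (summed over spatial dimensions).
    dims = tuple(range(2, logits.dim()))
    inter = (probs * one_hot).sum(dim=dims)
    union = probs.sum(dim=dims) + one_hot.sum(dim=dims)
    dice = 1 - ((2 * inter + smooth) / (union + smooth)).mean()

    return ce_weight * ce + (1 - ce_weight) * dice
```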
Task 2: The second downstream task considered was extracting imaging
biomarkers for brain cancer analyses from MRI of glioma. This can be done by
Fig. 3 Segmentation model performance visualization for validation and test dataset
using a deep learning model to segment brain tumors from multi-parametric MRI,
and then extracting features from the resulting tumor. Such features can potentially
be associated with tumor severity and prognosis and contribute to better treatment.
Extracting features from objects of interest in medical images for diagnostic purposes
is often referred to as radiomics [18].
The goal of radiomics is to extract information from medical images that can
be used as part of a medical imaging-based diagnostic workflow. The information
can be extracted from various imaging modalities, e.g., different MRI contrasts, PET
imaging, CT imaging, and so on. One can then combine it with other sources of
information (e.g., demographics, clinical data, and genetics). In this way,
radiomics and radiogenomics can open the door to sophisticated and powerful
analyses. By estimating the locations and extent of the brain tumors' T2-enhancing
and non-enhancing regions, we can extract the tumor location and the tumor burden.
Additionally, we can look at the features of the MRI images inside each of these two
tumor parts. Tumor burden and anatomical location are highly informative when
assessing prognosis and planning treatment in brain tumors. Once we can segment the
tumors, we automatically obtain tumor volumes. If we have repeated scans of the same
tumors, we obtain estimates of tumor progression. By further analysis, one can also
estimate the anatomical locations of the tumors. Figure 4 shows how segmented tumor
regions from task 1 can be analyzed further, with extraction of several features to
assess disease progression. For task 2 we used versions of the MRI images that have
already been co-registered and converted to NIfTI format
from the TCGA collection [19]. We’ve prepared a small sample dataset containing
data from 10 subjects. We used the same UNet architecture for building the segmen-
tation model and segmented masks from task 1 were used for extracting radiomic
features. The radiomic features together with the other information we have about
the subjects can provide relevant clinical information. In other words, to what extent
the various features are associated to clinical outcomes (e.g., survival), either indi-
vidually or together. This can be done using, e.g., plots, basic statistics, statistical
modeling, or machine learning. Figure 5 shows the outcomes of analysis in terms of
length of survival for our subjects. It shows how various radiomics features relate
to IDH mutation status, survival times and volume of the enhancing-non-enhancing
tumor regions for each subject. Note that multiple sources of information beyond
MRI could be valuable when assessing a glioblastoma case. A system tasked with
extracting relevant, actionable information should therefore have access to more than
the MRI images. This reflects a general principle in medicine: important information about a patient, disease, or condition is represented in a vast set of heterogeneous data. This leads to the need for integrated diagnostics, and the proposed generic multi-modal representation model is designed to accommodate such integrated diagnostics.
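As an illustration of the radiomics step, the following is a minimal sketch using the open-source pyradiomics package on the NIfTI volumes and the task 1 segmentation masks; the file names, the enabled feature classes, and the mask label are placeholders, not the authors' actual files or settings.

```python
from radiomics import featureextractor  # pip install pyradiomics

# Configure the extractor; which feature classes to enable is our assumption.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")  # intensity statistics
extractor.enableFeatureClassByName("shape")       # volume, sphericity, etc.

# Placeholder file names: a co-registered NIfTI volume and the task 1 mask.
features = extractor.execute(
    "subject01_flair.nii.gz",
    "subject01_tumor_mask.nii.gz",
    label=1,  # which mask label to analyze (assumption)
)
# Shape features include the tumor volume discussed in the text.
print(features["original_shape_VoxelVolume"])
```

The resulting feature table can then be joined with the clinical variables (e.g., survival times, IDH mutation status) for the kind of analysis shown in Fig. 5.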
Fig. 5 Length of survival of subjects from radiomic feature analysis of segmented masks
References
1. Durrani N, Vukovic D, van der Burgt J, Antico M, van Sloun RJG, Canty D, Steffens M, Wang
A, Royse A, Royse C, Haji K, Dowling J, Chetty G, Fontanarosa D (2022) Automatic deep
learning-based consolidation/collapse classification in lung ultrasound images for COVID-
19 induced pneumonia. Sci Rep 12:17581. https://doi.org/10.1038/s41598-022-22196-y
2. Ahmad P, Qamar S, Shen L, Rizvi SQA, Ali A, Chetty G (2022) Multi-scale 3D UNet: multi-
scale 3D UNet for brain tumor segmentation. In: Crimi A, Bakas S (eds) International MICCAI
brainlesion workshop: glioma, multiple sclerosis, stroke and traumatic brain injuries—7th
international workshop, BrainLes 2021, held in conjunction with MICCAI 2021 (Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), vol 12963 LNCS). Springer, pp 30–41. https://doi.org/10.1007/978-
3-031-09002-8_3
3. Park S, Kim G, Oh Y, Seo JB, Lee SM, Kim JH, Moon S, Lim J-K, Ye JC (2021) Vision
transformer for COVID-19 CXR diagnosis using chest X-ray feature corpus. arXiv preprint arXiv:2103.07055
4. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P,
Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process
Syst 33:1877–1901
5. Caron M, Touvron H, Misra I, Jégou H, Mairal J, Bojanowski P, Joulin A (2021)
Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF
international conference on computer vision, pp 9650–9660
6. Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M, Sutskever I (2021) Zero-
shot text-to-image generation. In: International conference on machine learning, pp 8821–8831.
PMLR
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
8. Chen RJ, Krishnan RG (2022) Self-supervised vision transformers learn visual concepts in
histopathology. In: Learning meaningful representations of life (LMRL) Workshop, NeurIPS,
pp 558–575
9. Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2021) Pyramid vision
transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings
of the IEEE/CVF international conference on computer vision, pp 568–578
Universal Object Detection Under Unconstrained Environments
Abstract This paper presents a universal object detection framework for uncon-
strained environment settings where machines can only learn from massive unlabeled
multimodal data and a small amount of labeled data. This research aims to tackle key challenges in
computer vision and expects to produce next-generation object detection techniques
that can effectively detect objects of diversified categories in complex application
settings. The proposed universal object detection framework is based on a novel formulation that casts object detection as an anomaly detection problem, leverages multimodal heterogeneous data sources and denoising diffusion models, and applies to a wide set of complex application settings.
1 Introduction
Universal object detection is one of the most important image analysis and computer
vision tasks, and fundamental for many cutting edge and ground-breaking tech-
nologies like autonomous driving, intelligent robotics, and so on. The goal of an
object detection task is to detect objects of certain classes (such as humans, vehi-
cles, or animals) in data sources like images and videos. The advances in machine
learning and AI technologies, particularly deep learning networks, have fueled significant progress in the state of the art in the object detection field [1, 2]. Inspired by
biological visual processing techniques, the recent deep learning-based algorithms
allow modeling of powerful and deeper visual knowledge from large-scale, well-labeled datasets and obtain superior object detection performance compared to earlier approaches.
2 Background
Recently, several deep learning models have been developed to address the challenges associated with labeling and annotation, particularly in the medical object detection field, where not everyone can annotate the ground truth and domain experts in the relevant medical specialty are required. Some of these recent algorithms attempt to address this challenge with learning schemes such as self-supervised learning [5], multimodal learning [6], and few-shot learning [7, 8]. While the self-supervised learning approach relies on generating pseudo labels from unlabeled data for building deep learning models, the multimodal learning approach attempts to learn from information in different data modalities like text descriptions and 3D point clouds. Few-shot learning approaches, on the other hand, focus on learning from data with only one or a few labels, or with only a few categories of objects labeled. However, these approaches make several assumptions and lead to less-than-optimal performance, particularly for medical object detection:
• The current self-supervised learning schemes for object detection use binary pseudo labels for learning on unlabeled data and focus on the image classification task, resulting in inaccurate prediction of object coordinates due to primitive modeling of object appearances [9].
• The state-of-the-art approaches proposed recently for multimodal learning in object detection rely on strategies for mining visual information either from 3D point cloud data [10] or from text-based image caption data [6]. However, they show only acceptable performance outcomes with models based on one or two modalities, owing to the difficulty of modeling disparate and heterogeneous data sources.
• The existing few-shot learning approaches for object detection still count on a large amount of well-labeled baseline training data before learning on a few labels and have been unable to embed prior knowledge in their learning schemes [8, 11].
The universal object detection framework proposed in this paper aims to address the shortcomings of these recent approaches, particularly in terms of their applicability to the medical object detection task, which is a significantly complex endeavor. The novel approach proposed here treats the complex medical object detection task as a weakly supervised anomaly detection problem and employs denoising diffusion models. The proposed approach has the capability to outperform current weakly supervised object detection methods that rely on generative adversarial networks or autoencoder models; these traditional models are harder to train or struggle to preserve fine details in the image. The universal object detection framework we propose here is based on novel denoising diffusion models for solving the weakly supervised anomaly detection problem and uses a combination of a deterministic iterative noising and denoising scheme with classifier guidance for generating images of healthy tissue.
For medical object detection, it is difficult to obtain ground-truth annotations and labels at the pixel level; often they are biased by human annotators or simply unavailable. Addressing these difficulties by formulating the problem of medical object detection as a weakly supervised anomaly detection task, particularly with novel denoising diffusion models, is a promising line of investigation. The main advantage of this approach is that weakly supervised anomaly detection relies only on the availability of a few image-level labels in the model building stage. Figure 1 shows the schematic for the proposed universal object detection framework based on denoising diffusion modeling-based weakly supervised anomaly detection.
By assuming two unpaired sets of images for the model building phase in the training stage, with the first set containing images of healthy subjects and the second set containing images of subjects affected by a disease, we need only the image and the corresponding image-level label (weakly supervised setting) as healthy or diseased during the training stage. The model building phase comprises two parts: the first part involves building a denoising diffusion model [13], followed by a two-class classifier for classifying healthy and diseased tissues. In the second part, the anomaly map for an unseen image without any labels is built using the model built in part 1.
Fig. 1 Universal object detection framework based on denoising diffusion modeling-based weakly supervised anomaly detection task
The overall scheme uses a reverse sampling process with a diffusion model, encoding the anatomical information of the image through an iterative noising process, followed by a denoising stage with classifier guidance to generate an image of the healthy tissue. The final stage involves a pixel-
wise anomaly map between the original image and the synthetic image built with the denoising diffusion model, allowing identification of diseased tissue. The iterative encoding and denoising process preserves most of the details of the input image that represent normal tissue and enhances the tissue affected by the disease in the synthesized image. The workflow for medical object detection as an anomaly
the synthesized image. The workflow for medical object detection as an anomaly
detection task based on denoising diffusion involves image-to-image translation,
comprising transformation of an image of a patient to an image without any patholo-
gies. It is important to note that only pathological regions in the image are changed in this process, and the rest of the image is preserved. This allows the anomaly map to be
constructed as the difference between the original and translated image. Using diffu-
sion models, detail preservation during image-to-image translation is more efficient
as compared to other modeling approaches such as the variational autoencoders and
generative adversarial networks (GANs). Further, with the unique formulation of the object detection problem as a weakly supervised anomaly detection task, leveraging multimodal data sources along with denoising diffusion models, it is possible to obtain improved generalizability and applicability for any downstream task, including classification, segmentation, or detection, in medical or non-medical application settings. Diffusion models, being based on the
Markov chain theory, learn to generate their synthetic outputs by gradually denoising
an initial image packed with random Gaussian noise. This iterative denoising process
makes the inference runs of diffusion models significantly slower than other gener-
ative models, but in exchange, it allows them to extract more representative features
from their input data, enabling them to outperform other models in the end. The eval-
uation of the proposed universal object detection framework was done for a medical
object detection task involving detection of brain tumors from MRI scans. For this
task, brain tumor detection was performed on multimodal multiparametric scans from the publicly available Medical Segmentation Decathlon dataset (MSD Task 01 [12]). The denoising diffusion model was developed using a two-dimensional (2D) axial slice from one of multiple modalities of a brain MRI: a T1-weighted (T1), a contrast-enhanced T1-weighted (T1Gd), a T2-weighted (T2), or a T2 fluid-attenuated inversion recovery (T2-FLAIR) sequence. A user-defined cropped area of that slice was synthesized to represent a realistic and controllable image of either a high-grade glioma with its corresponding components (e.g., the surrounding edema) or tumor-less (apparently normal) brain tissue. Further theoretical details of the approach used for the development of denoising diffusion probabilistic models are provided in [13, 14].
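As a rough illustration of this scheme, the sketch below encodes an input slice by deterministic iterative noising, denoises it back with classifier guidance toward the healthy class, and returns the pixel-wise anomaly map. The noise predictor `eps_model`, the time-dependent `classifier`, the cumulative noise schedule `alphas_bar`, and the guidance scale are all assumptions for illustration; the full formulation is given in [13, 14].

```python
import torch

def anomaly_map(x0, eps_model, classifier, alphas_bar, L=500, guidance=1.0):
    """Encode x0 by L deterministic noising steps, then denoise with
    classifier guidance toward the 'healthy' class (index 0 assumed)."""
    x = x0.clone()
    # Deterministic (DDIM-style) forward encoding of anatomical information.
    with torch.no_grad():
        for t in range(L):
            a_t, a_next = alphas_bar[t], alphas_bar[t + 1]
            eps = eps_model(x, torch.tensor([t]))
            x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
            x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    # Reverse denoising with classifier guidance: nudge the predicted noise
    # using the gradient of the 'healthy' log-probability w.r.t. the image.
    for t in reversed(range(L)):
        a_t, a_prev = alphas_bar[t + 1], alphas_bar[t]
        x = x.detach().requires_grad_(True)
        eps = eps_model(x, torch.tensor([t]))
        logits = classifier(x, torch.tensor([t]))
        log_healthy = torch.log_softmax(logits, dim=1)[:, 0].sum()
        grad = torch.autograd.grad(log_healthy, x)[0]
        eps = eps - guidance * (1 - a_t).sqrt() * grad
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    # Pixel-wise anomaly map: large differences mark pathological regions.
    return (x0 - x.detach()).abs()
```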
4 Experimental Work
Random noise was added for L steps to the input image, and sampling was performed using the denoising diffusion probabilistic model with UNet classifier guidance. For all experiments, the two additional hyperparameters were set as follows: the number of steps L to 500 and the number of samples s to 100. Figure 2 shows the example patient images
for all four MRI image sequences from the MSD-Task01 dataset, together with a comparison of the proposed method against the GAN and variational autoencoder approaches. Figure 3 shows the anomaly map for a healthy subject, with no anomaly visible, along with a subject with a small tumor, which the proposed model detects accurately. As can be seen in Fig. 2, the proposed denoising diffusion model performs as well as the GAN, without the expensive data augmentation and complex deep learning architecture design required for the guidance stage. Figure 4 shows additional results for a diseased image and the anomaly map produced, and their close similarity to the ground-truth labels/masks available in the dataset.
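A hypothetical invocation of the sketch above with the reported setting of L = 500 noising steps, followed by a simple thresholding step to obtain a binary tumor mask; the input slice name and the threshold rule are assumptions, not values reported in the chapter.

```python
# Compute a map for one T1Gd slice with the reported L = 500 steps, then
# threshold it into a binary mask (the 2-sigma rule is an assumption).
amap = anomaly_map(slice_t1gd, eps_model, classifier, alphas_bar, L=500)
tumor_mask = (amap > amap.mean() + 2.0 * amap.std()).float()
```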
Fig. 2 Results for an image of the MSD-Task01 dataset for L = 500 and s = 100
Fig. 3 Results for an image of the MSD-Task01 dataset for L = 500 and s = 100 (Top image for
a healthy subject)
Fig. 4 Additional results for diseased subjects from the MSD-Task01 dataset for L = 500 and s =
100
References
6. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, … Sutskever I (2021) Learning
transferable visual models from natural language supervision. In: International conference on
machine learning. PMLR, pp 8748–8763
7. Wang YX, Girshick R, Hebert M, Hariharan B (2018) Low-shot learning from imaginary
data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp
7278–7286
8. Joseph KJ, Khan S, Khan FS, Balasubramanian VN (2021) Towards open world object detec-
tion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
pp 5830–5840
9. Bar A, Wang X, Kantorov V, Reed CJ, Herzig R, Chechik G, … Globerson A (2022) DETReg:
unsupervised pretraining with region priors for object detection. In: Proceedings of the IEEE/
CVF conference on computer vision and pattern recognition, pp 14605–14615
10. Deng J, Shi S, Li P, Zhou W, Zhang Y, Li H (2021) Voxel R-CNN: towards high performance voxel-based 3D object detection. In: Proceedings of the AAAI conference on artificial intelligence,
vol 35, no 2, pp 1201–1209
11. Kang B, Liu Z, Wang X, Yu F, Feng J, Darrell T (2019) Few-shot object detection via feature
reweighting. In: Proceedings of the IEEE/CVF international conference on computer vision,
pp 8420–8429
12. Antonelli M, Reinke A, Bakas S et al (2022) The medical segmentation decathlon. Nat Commun
13:4128. https://doi.org/10.1038/s41467-022-30695-9
13. Nichol AQ, Dhariwal P (2021) Improved denoising diffusion probabilistic models. In: Interna-
tional conference on machine learning. PMLR, pp 8162–8171. https://doi.org/10.48550/arXiv.2102.09672
14. Siddiquee MMR, Zhou Z, Tajbakhsh N, Feng R, Gotway MB, Bengio Y, Liang J (2019)
Learning fixed points in generative adversarial networks: from image-to-image translation to
disease detection and localization. In: Proceedings of the IEEE/CVF international conference
on computer vision, pp 191–200
15. Menze BH et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS).
IEEE Trans Med Imaging 34:1993–2024
16. Bakas S, Reyes M, Int E, Menze B (2018) Identifying the best machine learning algorithms
for brain tumor segmentation, progression assessment, and overall survival prediction in the
BRATS challenge. arXiv preprint arXiv:1811.02629
17. Chen X, Konukoglu E (2018) Unsupervised detection of lesions in brain mri using constrained
adversarial auto-encoders. arXiv preprint arXiv:1806.04972
Internet of Things-Based 3-Lead ECG
Signal Acquisition System
Abstract IoT, also known as the Internet of Things, is a promising technology that enables numerous devices worldwide to stay connected to collect, share, and analyze data, thereby building innovations that can help humanity. The IoT-enabled
system consists of a smart device connected to processors, sensors, and communi-
cation hardware to receive data from the environment, process it into digital signals,
and transfer it to devices for analysis. The Internet of Things-based 3-lead ECG signal acquisition system is an efficient trio of components. It is a patient-centric monitoring system that allows patients to actively collect the real-time rhythmic contraction and relaxation of cardiac muscles as signals and store the data in a database. The data can then be analyzed using personal devices like laptops and smartphones, providing a window for actionable insights in emergencies. In this paper, we have
constructed an IoT-based ECG system by taking into account the underlying mechanism of the ECG. It obtains impulses via electrodes placed at specific locations on the body as input for the AD8232 sensor operating at a supply range of 2–3.5 V. The sensor filters the biopotential signals, amplifies them, and transfers them to the microcontroller (Arduino Uno board). The board obtains the signals from the sensor and, as per the instructions given via the code through the compatible software, processes the signals and sends them as output for visualization by the system connected to the board. The system produces an ECG pattern in the Arduino IDE serial plotter and provides heart rate and heart rate variability. The values can be obtained from the serial monitor and analyzed using different applications. The IoT-based ECG can be incorporated into ambulances so that doctors can remotely diagnose various cardiovascular conditions of patients.
1 Introduction
In recent years, people have experienced a phenomenal change in their way of living due to the pandemic. It has created distance between us, leading to the expansion of the digital space, from the simple task of ringing bells to medical diagnosis, made possible by rapidly advancing technology.
With health being a strong matter of concern in recent years, monitoring, balancing, and maintaining health is very important, and friendly, simple, affordable, and compatible in-hand technology is a savior. These devices can teach, help, and learn about the human body and its behavior. The Internet of Things, or IoT, has the capability of connecting billions of devices together and allowing them to collect, process, and share data, thereby utilizing the information to simplify major tasks. This technology also supports cloud accessibility, giving people all over the world access to integrate their ideas.
The technology of IoT can be applied to the field of medicine, where healthcare professionals can diagnose and prescribe treatment to a patient in an emergency at a faraway location. Using the technology at hand, an ECG device has been built to record, store, and process data so as to keep a real-time check on heart rhythm. The prototype created uses personal devices like desktops, laptops, and smartphones, which serve as an interface between the hardware and the database. A connection is established between the sensor, microcontroller, and personal device, through which ECG data are received and visualized.
2 Theoretical Framework
An increasing population creates an urgent need for rapid tests. The easiest test to check the functionality of the heart is the ECG, also known as the electrocardiogram, which records heartbeats, rate, and rhythm over time as the action potential propagates throughout the heart with every cycle of systole and diastole. There is a standard ECG pattern (Fig. 1) produced by healthy individuals, consisting of
• P wave—the first peak, produced by the excitation of the two atria
• QRS wave—shows ventricular contraction (also known as the QRS complex)
• T wave—shows ventricular relaxation once the electrical impulse stops spreading
The Internet of Things-based 3-lead acquisition system calculates the heart rate, heart rate variability, and ECG pattern of a person and sends them to the cloud. Using the system, users can monitor their health-related parameters. This system can also be integrated into an ambulance, wherein all the critical health-related parameters of patients can be acquired and sent to the cloud, through which clinicians can analyze the condition in advance. The prime objective of 3-lead ECG systems is to acquire physiological parameters using sensors for various purposes.
The heart muscle generates an action potential via self-stimulating tissue whose stimulus flows throughout the organ. At the time of depolarization, the action potential travels through the cardiac muscle, and the body conducts it to the surface. These signals can be captured, amplified, and recorded.
The Internet of Things-based 3-lead ECG signal acquisition system works on a voltage of 3.3 V supplied by the computer or electrical device connected to it, and the sensor picks up the suitable potential from the electrical signals generated by the heart via the electrodes placed at specific positions on the body to create an Einthoven triangle (Fig. 2a) [1]. The sum of the projections of the frontal-plane cardiac vector at any instant onto the three axes of the Einthoven triangle is zero.
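With the standard lead polarities, this relation is usually stated as Einthoven's law for the three bipolar limb leads,

$$V_{\mathrm{II}} = V_{\mathrm{I}} + V_{\mathrm{III}},$$

so any one lead voltage can be recovered from the other two.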
Bipolar limb leads, also known as classical limb leads, measure the potential between two active electrodes placed at defined positions on two limbs and represent the algebraic sum of the potentials of the two constituent active electrodes (Fig. 2b).
In a healthy human, the heart rate ranges from 60 to 100 bpm. A lower heart rate at rest indicates more efficient heart functioning and better cardiovascular condition. For example, a well-trained cricketer might have a normal resting heart rate close to 40 bpm.
Heart rate variability refers to the fluctuation in the time intervals between heartbeats. Even though these fluctuations are subtle, they can indicate current or future heart-related problems and can also indicate mental health issues like anxiety, stress, and depression, which affect cardiac activity [2].
The 3-lead IoT-based ECG system measures heart rate and heart rate variability from the detected R-R intervals: the heart rate in bpm is 60 divided by the R-R interval in seconds, while heart rate variability summarizes the beat-to-beat variation of successive R-R intervals.
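As a concrete illustration, a minimal Python sketch of these computations from detected R-R intervals is given below; the chapter does not state which HRV statistic its formula uses, so RMSSD, a common time-domain measure, is assumed here.

```python
rr_ms = [820, 810, 835, 815, 825]  # illustrative R-R intervals in milliseconds

# Heart rate: 60000 ms per minute divided by the mean R-R interval in ms.
heart_rate_bpm = 60000 / (sum(rr_ms) / len(rr_ms))

# HRV as RMSSD: root mean square of successive R-R differences (assumption).
diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
rmssd_ms = (sum(d * d for d in diffs) / len(diffs)) ** 0.5

print(f"HR = {heart_rate_bpm:.1f} bpm, HRV (RMSSD) = {rmssd_ms:.1f} ms")
```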
The electrodes used to sense the heart rate in the form of the Einthoven triangle are bipolar limb leads, a form of classical limb leads. These leads measure the potential using two active electrodes placed on any two limbs and represent the algebraic sum of the potentials of the two constituent active electrodes (Table 1).
• ECG sensor AD8232—the major component of the system, responsible for collecting physiological data from the body and transferring it to the microcontroller. The sensor is an integrated chip comprising a specialized instrumentation amplifier (IA), an operational amplifier (A1), a right-leg drive amplifier (A2), and a midsupply reference buffer (A3). The sensor includes leads-off detection circuitry and an automatic fast-restore circuit that brings back the signal shortly after the leads are reconnected [3].
• Arduino Uno board—a microcontroller board based on the ATmega328P. It has 14 digital input/output pins (of which 6 can be used as PWM outputs), 6 analog inputs, a 16 MHz ceramic resonator, a USB connection, a power jack, an ICSP header, and a reset button. The Uno contains everything needed to support the microcontroller; simply connect it to a computer with a USB cable or power it with an AC-to-DC adapter or battery to get started [4].
• Bipolar electrodes—connected to the AD8232 sensor. There are 3 ECG nodes (positive—red, negative—yellow, and neutral—green). The sensor was connected to the Arduino using jumper wires in the following manner (Libelium has made the sensor compatible with the Arduino Uno). A USB cable was used to plug the Arduino board into the computer for uploading the code and displaying the output [5] (Fig. 4).
3.2 Software
The Arduino integrated development environment (IDE) version 1.8.19 was used to write the code and give commands to the Arduino board. The program code was written in C++, along with which a library called PulseSensor Playground was installed, which helped us obtain the output. To capture data, the Arduino IDE [6] and CoolTerm software [7] were used. The graph obtained from the Arduino serial plotter was compared with the graph in Google Sheets formed using data obtained from the Arduino serial monitor. The flow chart (Fig. 5) clearly demonstrates the path taken by the signals, their conversions, and their representation. The signals acquired from the body were amplified, converted to digital form, and processed by the commands to build a system that can take biological signals and produce heart rate and heart rate variability.
We tested the prototype system by fitting 3 electrodes: two were fitted on the wrists and one on the right foot. The ECG sensor was connected to the sensor platform and the Arduino Uno. The program code was uploaded to the Arduino Uno chip, and the USB cable was connected to the Arduino Uno. Graphs were obtained through the Arduino serial plotter, and data obtained from the Arduino serial monitor were extracted using the CoolTerm application; the extracted data were then converted to a CSV file and uploaded to Google Sheets to obtain graphs.
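For readers who prefer scripting this capture step, the following minimal sketch (assuming the pyserial package) logs the serial values to a CSV file directly. The port name matches the COM 3 port mentioned in the results below, while the baud rate and sample count are placeholders that must match the Arduino sketch.

```python
import csv
import serial  # pip install pyserial

# "COM3" follows the COM 3 port named in the text; 9600 baud is an assumption
# and must match the Serial.begin() call in the Arduino program code.
with serial.Serial("COM3", 9600, timeout=1) as ser, \
        open("ecg_capture.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sensor_value"])
    for _ in range(5000):  # capture a fixed number of readings
        raw = ser.readline().decode(errors="ignore").strip()
        if raw:  # each line holds one sensor voltage value (about 450-700)
            writer.writerow([raw])
```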
All data are captured as numbers referring to the sensor voltage value, which ranged from 450 to 700. The system acquired signals from the body surface by a nonpolarized process and produced a signal showing a clear depiction of waves with the following characteristics (Fig. 6).
The 3-lead IoT-based ECG system recorded the electrical activity of the heart at rest and provided information about the heart rate and its contractions, indicating that the heartbeats were normal. The signals generated by the Arduino serial plotter (COM 3) were the same as the standard ECG signals (PQRST waves) that appear on an electrocardiogram machine. The P waves, QRS complex, and T wave were clearly distinguishable in the graph obtained from the data received from the system.
In the result (Fig. 7), the P wave obtained was a positive wave with a duration of less than or equal to 0.12 s. As the blood flows from the atria to the ventricles, an interval of about 0.1 to 0.2 s before ventricular contraction is seen, indicating a normal PR interval. This is followed by a negative wave (Q wave) and an R wave of highest amplitude, representing ventricular contraction. When the Purkinje fibers depolarize, a long negative wave known as the S wave is produced. At the junction between the end of depolarization and the appearance of repolarization lies a significant point known as the J point, which is useful in identifying metabolic disorders (its elevation indicates hypothermia). After the J point there is an increase in amplitude, and an upright T wave appears, representing the repolarization of the ventricles (the shape of this wave depends on the body's control and regulation mechanisms). Between the S and T waves lies the ST segment.
The regular time interval between two R waves is denoted as the R-R interval. Variation in this interval can indicate abnormal functioning of the nerves, and these signals are also affected by stress, body cycles, and hormonal levels. The results in the above graph depict a regular R-R interval, showcasing normal heart activity. It was also ensured that the person was at rest and calm while their heart activity was being monitored.
A short, positive, asymmetric P wave appears due to atrial contraction, followed by the PR segment and a short QRS complex caused by rapid depolarization of the ventricles, indicating proper functioning of the system.
Fig. 7 Marking of the PQRST peaks in the output obtained from the prototype and depiction of R
interval
This paper presents a prototype of an IoT-based 3-lead acquisition system and its implementation as a healthcare monitoring system, based on knowledge of the electrocardiogram and IoT technology. The system provides continuous monitoring of cardiac activity and related diseases and can be used personally at home or in emergency situations. The extracted data are updated to the database at regular intervals and can be analyzed by clinicians, patients, and their caretakers to monitor the health condition and take appropriate action. The system can measure heart rate and heart rate variability, which help in identifying different cardiac conditions. Data extracted from the prototype, along with the ECG pattern, can be easily interpreted by a person, and the system can be incorporated with existing portable technology, making it user-friendly and convenient for daily use. The data obtained from the acquisition system can be uploaded or sent directly to doctors via different software for continuous monitoring of the patient's health. Accuracy can be increased by adding more electrodes, which generate more signals and therefore improve performance. The obtained data can be stored in a database for future reference or for analyzing a person's heart health from previous records.
References
1. Ghia CL (2007) Textbook of physiology (7th ed). Jaypee Publication, New Delhi, pp 213–215
2. Circuits Digest: https://circuitdigest.com/microcontroller-projects/understanding-ecg-sensor-
and-program-ad8232-ecg-sensor-with-arduino-to-diagnose-various-medical-conditions. Last
accessed on 10 Sept 2022
3. Analog Devices: https://www.analog.com/media/en/technical-documentation/data-sheets/ad8232.pdf. Last accessed on 21 Aug 2022
4. Arduino: https://docs.arduino.cc/hardware/uno-rev3. Last accessed on 5 Sept 2022
5. Microcontrollers Lab: https://microcontrollerslab.com/ad8232-ecg-module-pinout-interfacing-with-arduino-applications-features/. Last accessed on 15 Aug 2022
6. Arduino: https://www.arduino.cc/en/software. Last accessed on 18 Sept 2022
7. CoolTerm: https://coolterm.en.lo4d.com/windows. Last accessed on 18 Sept 2022
Index
B
Barenya Bikash Hazarika, 237, 317
Bhavya Sri, A., 237
Bhumika Papnai, 27
Brojo Kishore Mishra, 171

C
Chandni Agarwal, 327
Charu Agarwal, 39
Charul Bhatnagar, 327
Ch. Bhavya Sri, 147
Chetty, Girija, 385, 395
Chhabra, Amitabh, 225

D
Daiss, Ivelina, 225
Daya Bhardwaj, 405

J
Joanne Gomes, 211
Jyoti Chauhan, 1

K
Kapil, 15, 269
Kaur, Nancy, 385, 395
Khoirom Motilal Singh, 247, 317
Kirti Jain, 59

L
Lathish, R., 237

M
Mahesh Gawande, 109
Mangesh S. Thakare, 121