Advances in IoT and Security with Computational Intelligence
Anurag Mishra
Deepak Gupta
Girija Chetty Editors
Advances in IoT
and Security
with Computational
Intelligence
Proceedings of ICAISA 2023, Volume 2
Lecture Notes in Networks and Systems
Volume 756
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Anurag Mishra · Deepak Gupta · Girija Chetty
Editors
Girija Chetty
Faculty of Science and Technology
University of Canberra
Bruce, ACT, Australia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
We are particularly grateful to Dr. Rajendra Pratap Gupta, Mr. Animesh Mishra,
Mr. M. S. Bala and Prof. Balram Pani who blessed us in the inaugural session. We
are also thankful to Mr. N. K. Goyal for his presence in the valedictory session. We
are extremely grateful to Springer Nature, especially Dr. Aninda Bose who agreed to
publish two volumes of conference proceedings in the prestigious series of Lecture
Notes in Networks and Systems.
Editors and Contributors
Prof. Anurag Mishra has a bachelor's and a master's degree in Physics from the University
of Delhi. He completed his M.E. in Computer Technology and Applications and
Ph.D. in Electronics, also from the University of Delhi. He has extensive experience
teaching B.Sc. (Hons.), M.Sc., B.Tech. and M.Tech. programs in Electronics
and Computer Science. He has about 28 years of experience as a teacher and as an
active researcher. He has been a consultant for offshoot agencies of the Ministry of
Education, Government of India. Presently, he is nominated as a visitor’s nominee
in a central university by the Government of India. He has 65 refereed papers in
highly cited journals, international conferences and book chapters, three authored,
one edited book and two patents to his credit. He has recently entered into devel-
oping medical applications using deep convolutional neural networks. He is an active
reviewer of papers for Springer, Elsevier and IEEE Transactions. He is a member
of IEEE and also holds membership of the Institute of Informatics and Systemics
(USA).
many scientific societies like IEEE SMC, IEEE CIS, CSI and many more. He has
served as a reviewer of many scientific journals and various national and interna-
tional conferences. He was the general chair of the 3rd International Conference on
Machine Intelligence and Signal Processing (MISP-2021) and associated with other
conferences like IEEE SSCI, IEEE SMC, IJCNN, BDA 2021, etc. He has supervised
three Ph.D. students and guided 15 M.Tech. projects. He is currently the principal
investigator (PI) or a co-PI of two major research projects funded by the Science and
Engineering Research Board (SERB), Government of India.
Dr. Girija Chetty has bachelor's and master's degrees in Electrical Engineering and
Computer Science from India and Ph.D. in Information Sciences and Engineering
from Australia. She has more than 38 years of experience in industry, research and
teaching at universities and research and development organisations in India
and Australia, and she has held several leadership positions including head of Soft-
ware Engineering and Computer Science, the program director of ITS courses, and
the course director for Master of Computing and Information Technology Courses.
Currently, she is a full professor in Computing and Information Technology at School
of Information Technology and Systems at the University of Canberra, Australia, and
leads a research group with several Ph.D. students, post-docs, research assistants and
regular international and national visiting researchers. She is a senior member of
IEEE, USA; a senior member of the Australian Computer Society; and an ACM member,
and her research interests are in multimodal systems, computer vision, pattern recog-
nition, data mining and medical image computing. She has published extensively
with more than 200 fully refereed publications in several invited book chapters,
edited books, and high-quality conferences and journals, and she is on the editorial
boards and technical review committees of, and a regular reviewer for, several Springer,
IEEE, Elsevier and IET journals in areas related to her research interests. She
is highly interested in seeking wide and interdisciplinary collaborations, research
scholars and visitors in her research group.
Contributors
Kirti Jain Department of Computer Science, University of Delhi, New Delhi, Delhi,
India
Pooja Jain Jayoti Vidyapeeth Women’s University, Jaipur, Rajasthan, India
Amit Joshi Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Abhay Juvekar IT Consultant, Mumbai, India
Vani Venkata Durga Kadavala Department of CSE, Koneru Lakshmaiah Educa-
tion Foundation, Vaddeswaram, Andhra Pradesh, India
Kapil National Institute of Technology, Kurukshetra, India
Yogita Kapse Electronics and Telecommunication, College of Engineering Pune,
Pune, India
Gauri M. Karve Electrical Engineering Department, PVG’s COET & GKPIM,
Pune, India
Nancy Kaur Faculty of Science and Technology, University of Canberra, Bruce,
ACT, Australia
Sharanjit Kaur Acharya Narendra Dev College, University of Delhi, New Delhi,
Delhi, India
Manju Khari School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Savara Murali Krishna Department of CSE, Koneru Lakshmaiah Education
Foundation, Vaddeswaram, Andhra Pradesh, India
Anshul Kulkarni Department of Computer Engineering and IT, COEP Technolog-
ical University (COEP Tech), Pune, Maharashtra, India
Priyesh Kulshrestha School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Anil Kumar Galgotias University, Greater Noida, India;
Deen Dayal Upadhyaya College, University of Delhi, Delhi, India
Ravi Kumar Shaheed Rajguru College of Applied Sciences for Women, University
of Delhi, Delhi, India
Sunil Kumar Shaheed Rajguru College of Applied Sciences for Women, University
of Delhi, Delhi, India
Suyash Kumar USICT, GGSIPU, New Delhi, India;
Department of Computer Science, Hansraj College, University of Delhi, New Delhi,
India
Shobha Lal Jayoti Vidyapeeth Women’s University, Jaipur, Rajasthan, India
Gunjan Rani Acharya Narendra Dev College, University of Delhi, New Delhi,
Delhi, India
Ravi Rayappa Electronics and Communication Engineering, Jain Institute of
Technology, Davanagere, Karnataka, India
Sudeshna Sani Koneru Lakshmaiah Education Foundation, Vijayawada, Andhra
Pradesh, India
Namdev Sawant St. Francis Institute of Technology, Mumbai, India
Suraj Sawant Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Pratibha Shingare College of Engineering, Pune, India
Geetika Singh KIET Group of Institutions, Dr. A.P.J. Abdul Kalam Technical
University, Ghaziabad, Uttar Pradesh, India
Khoirom Motilal Singh Department of CSE, Koneru Lakshmaiah Education Foun-
dation, Vaddeswaram, India
Sharat Singh Department of Electronics, Deen Dayal Upadhyaya College, Univer-
sity of Delhi, New Delhi, India
Pranamya Sinha Shaheed Rajguru College of Applied Sciences for Women,
University of Delhi, Delhi, India
P. N. V. L. S. Sneha Sree Department of CSE, Koneru Lakshmaiah Education
Foundation, Vaddeswaram, Andhra Pradesh, India
Vivek Prakash Srivastava National Institute of Technology, Kurukshetra, India
K. M. Swaroopa Faculty of Science and Technology, University of Canberra,
Bruce, ACT, Australia
Mangesh S. Thakare Electrical Engineering Department, PVG’s COET &
GKPIM, Pune, India
Geetanjali A. Vaidya Electrical Engineering Department, PVG’s COET &
GKPIM, Pune, India
T. Veerendra Subramanya Kumar Department of CSE, Koneru Lakshmaiah
Education Foundation, Vaddeswaram, India
Carlos E. Ventura The University of British Columbia, Vancouver, BC, Canada
Aditya Verma Department of Computer Engineering and IT, COEP Technological
University (COEP Tech), Pune, Maharashtra, India
Shiv Kumar Verma Galgotias University, Greater Noida, India
T. V. Vijay Kumar School of Computer and Systems Sciences, Jawaharlal Nehru
University, New Delhi, India
Abstract Cloud computing (CC) has gained huge popularity in the recent era by
providing the feature of sharing a pool of computing resources on demand among
various cloud users over the Internet. It provides the benefits of scalability, flexibility,
and a pay-per-use facility to its clients using virtualization technology, which attracts
large enterprises that work on distributed computing. One important research issue
in cloud computing is task scheduling, which means that cloud tasks need to be
appropriately mapped to the existing cloud resources to optimize single or multiple
objectives. The complexity and large search space of task scheduling classify it as
an NP-hard problem. A brief analysis of existing heuristic and metaheuristic strategies
and their application to scheduling in cloud environments is presented in this paper,
followed by a comparative study of a few metaheuristic algorithms. Heuristic
algorithms cannot produce an exact optimal solution in an acceptable time. To solve
this problem, metaheuristic algorithms based on swarm intelligence and bio-inspired
techniques, such as Particle Swarm Optimization (PSO), the Genetic Algorithm (GA),
and the Ant Colony Optimization (ACO) algorithm, are a good choice for finding
near-optimal solutions. These have been implemented to run in cloud scenarios, and
their performance has been compared on the parameters makespan, average resource
usage, and average response time. The PSO algorithm is found to outperform ACO
and GA on these optimization metrics under various test conditions in the cloud
environment.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_1
1 Introduction
CC offers a standard platform for cheap and convenient hosting and delivering
computing resources as a utility on demand through the Internet [1]. Cloud providers
rent out physical and logical computing resources on demand from their large data
centers to different cloud users having dynamic needs on pay-per-use basis [2]. Cloud
services are distinctly classified into three kinds: Software as a Service (SaaS), Plat-
form as a Service (PaaS), and Infrastructure as a Service (IaaS). IaaS is a substantial
and fast-growing field that provides maximum benefit to small and medium-
sized organizations [3]. However, with several benefits, there are some major issues
and challenges that need to be addressed in CC such as automated resource provi-
sioning, interoperability, virtualization, privacy and security, data management, load
balancing, network management, application programming interfaces (APIs), and
many more [3, 4].
Virtualization is a primary notion underlying the CC technology concept, which makes
possible the isolated execution of several cloud users’ tasks at the same time using a
software layer termed a hypervisor or VM monitor [5].
Various cloud users request virtualized resources by specifying a set of resource
instances at any instant to run their task. It is the cloud provider’s responsibility to
allocate resources efficiently and effectively to the given set of tasks at any instant
without any delay which is called resource management. Resource management
includes challenges regarding resource allocation, resource mapping, and modeling,
resource adaptation, resource finding, provisioning, and scheduling of resources.
Both under-provisioning and over-provisioning of resources must be avoided, as the cloud
services and resources are shared among various cloud clients who use them on a
subscription basis [3]. The main aim of cloud providers is to maximize their profit and
revenue, leading to high performance of the cloud. Hence, cloud providers have to
allocate resources efficiently to reduce energy usage, improve resource utilization, and
manage bandwidth efficiently. However, cloud users expect the simplest inter-
face to use Quality of Service (QoS) with minimum expenses, high throughput, and
quick response time [2]. The cloud providers can achieve the objective of maximum
resource usage by minimizing the makespan, task transferring time, task execution
time, energy usage and costs, etc. Cloud users can achieve the objective of reducing
expenses and satisfying QoS by minimizing the average response time.
There should be an efficient and well-managed scheduling mechanism to schedule
the cloudlets to attain maximum resource usage. An efficient scheduling scheme can
be achieved with the appropriate mapping of tasks to the required resources called
task scheduling. Hence, a scheduling problem consists of several cloud consumers’
tasks that need to be scheduled on the existing VMs subject to a few constraints
to optimize an objective function. The goal is to construct a schedule
specifying which task will be allocated to which resource [6].
The scheduling methods can be categorized into three classes:
resource-based scheduling, dependent task-based or workflow-based scheduling, and
independent task scheduling. The tasks are scheduled independently of each other
Comparative Study of Metaheuristic Algorithms for Scheduling … 3
in independent task scheduling, whereas the tasks are bounded with each other via
interdependencies in workflow-based scheduling. Task scheduling methods can be
centralized or distributed. There is only one scheduler for mapping tasks in the
centralized scheduling-based method, whereas the scheduling decisions are decen-
tralized among all available VMs in distributed scheduling. There is one more way
of categorizing job scheduling: static and dynamic scheduling. In static scheduling,
every task is assumed to arrive at the same time. Hence, all the tasks or VMs are
mapped and scheduled based on a priori information. While in dynamic scheduling,
no prior information is there about the task’s arrival, execution time, and VMs. Hence,
all scheduling decisions like resource allocation to incoming cloudlets, execution
time, etc. are made in real time only. The cloud tasks can be handled immediately
when they arrive, called immediate mode, or can be collected in a batch and then the
whole batch can be scheduled, called batch mode [7, 8]. The traditional exhaustive
and deterministic scheduling strategies are simple and easy to understand and imple-
ment but do not give any guarantee of getting the optimal solutions in an acceptable
amount of time [9–11]. The traditional heuristic algorithms suffer from drawbacks
such as local optimum traps, slow convergence, and additional computational time; they
have complex operators, are framed only for binary or real search domains, and are not
suitable for complex scientific optimization problems and large solution spaces.
To solve this problem, metaheuristic algorithms based on swarm intelligence
and bio-inspired techniques like Particle Swarm Optimization (PSO), Ant Colony
Optimization (ACO), and Genetic Algorithm (GA) are a good choice for finding the
near-optimal solution [6, 7, 9].
Recently, various task scheduling schemes have been proposed for cloud
computing environments, but despite that, no comprehensive performance study has
been done to compare existing task scheduling algorithms. A comparative study of
existing metaheuristics algorithms and their application in scheduling in cloud envi-
ronments has been presented in this paper. These three have been implemented to run
in cloud scenarios and their performance has been compared to optimize the parame-
ters makespan, average resource usage, and average response time. The experimental
results show that PSO outperforms the ACO and GA algorithms in
optimizing the objective function.
1.1 Contributions
The following are some of the major contributions made by this work.
• A system model is presented, including a task model and a virtual machine model.
• It demonstrates an adaptive task allocation to virtual machines that dynamically
adjusts task execution time.
• It proposes a model based on PSO for cloud computing and examines the inverse
relationship between makespan and average resource utilization.
• It looks into the impact of several scenarios on the heterogeneous cloud system
in terms of makespan, response time, and average resource utilization of the system.
• It performs a comparative study of the existing metaheuristic techniques
PSO, ACO, and GA and finds that PSO outperforms GA and ACO.
2 Literature Review
The authors of [1, 4] gave an extensive analysis of cloud computing, emphasizing its key
models, architectural principles, state-of-the-art implementations, and advantages, along
with research challenges. Serving handy cloud resources to the cloud
users, termed scheduling, is the main theme in the research of cloud resource
management, primarily its task scheduling section [2, 12]. The global research
community focusing on cloud computing has developed an increasing interest in
its resource scheduling issue. The categorization of resource allocation methods
has been discussed in [2, 12]. Researchers proposed various heuristic algorithms
for independent task scheduling such as Min–Min, Max–Min, round-robin, First
Come First Serve, and many more to overcome the drawbacks of traditional exhaus-
tive and deterministic strategies [13–18]. The authors of [13] compared the performance of
various heuristic approaches such as Min–Min, Max–Min, and Duplex based on the
metrics Minimum Execution Time (MET) and Minimum Completion Time (MCT).
Researchers have conducted extensive review of dependent job-centered strategies
modeled with Directed Acyclic Graph (DAG) for task scheduling problem described
in [5, 19]. Scheduling strategies for dependent jobs have been presented in [20–22].
Due to its complexity and large search space, researchers classify task scheduling as
an NP-hard problem. The heuristic strategies generally suffer from slow convergence,
and the solutions generated by heuristic approaches may get stuck in local optima,
making it difficult to find the exact solution. Thereby, to improve solution quality and
computing time, metaheuristic techniques have gained vast attention over
the past many years for NP-hard problems. Metaheuristic approaches provide
near-optimal solutions within an acceptable timespan and make task scheduling algo-
rithms more effective and efficient. A lot of review literature has been produced by various
researchers on metaheuristic techniques adopted for task scheduling in distributed
environments, i.e., cloud computing, cluster, and grid environments, including
ACO, PSO, GA, the League Championship Algorithm (LCA), and the BAT algorithm [6,
11, 23]. In this direction, Tsai and Rodrigues [11] gave an extensive review of the litera-
ture discussing metaheuristic techniques for cloud task scheduling and presented the
major issues and challenges faced by metaheuristic algorithms. Researchers have
studied and analyzed the performance of metaheuristic techniques in cloud system
[13, 24]. An ACO algorithm in case of independent task scheduling was proposed in
[25, 26] to optimize QoS parameters in cloud computing. Various GA-based algo-
rithms and their modifications have been proposed by researchers to optimize the QoS
parameters for task scheduling in cloud system in references [27, 28] to be outper-
formed compared to traditional PSO and GA algorithms [29]. Researchers found that
PSO provides fast task scheduling and better solution quality than existing heuris-
tics and other metaheuristics in grid, homogeneously distributed, and cloud
computing environments [23, 30–33]. Researchers have proposed various modified forms
of the PSO algorithm, which were found to outperform standard PSO and other
metaheuristics, as discussed in references [34–36]. Although all compared algorithms
show satisfactory results in simulation outcomes, the new modified PSO algorithm is
much better than the compared algorithms in cloud computing, and its performance is
improved by using the load balancing technique proposed in references [37, 38], which
minimizes QoS parameters such as makespan, execution time, resource utilization,
cost, transmission time, and round trip time to perform load balancing between cloudlets
and VMs.
3 Scheduling Approaches
• First Come First Serve Algorithm: Resources are assigned to the tasks according
to their order of arrival. The earlier a task arrives, the earlier it gets the resources,
which it releases after completing its execution [24, 39].
• Round-Robin Algorithm: The tasks are assigned the resources in an FCFS
manner, but they get the resource only for a small time quantum. The resource is
pre-empted when the allotted time slot expires and is given to the next waiting task
in the ready queue. The pre-empted task is directed to wait at the tail of
the ready queue if its execution is not complete [39].
• Min–Min: The notion of the Min–Min algorithm is to first select the shortest job, i.e.,
the one having the Minimum Completion Time (MCT) over the given task set, and then
allocate the selected shortest task to the resource having the minimum expected
completion time. The algorithm computes the expected completion time C_ij of any ith
task from the cloudlet set T = {t_1, t_2, t_3, …, t_n} on any jth resource from a resource
set R = {r_1, r_2, r_3, …, r_m} using Eq. (1) given below:
C_{ij} = E_{ij} + re_j. \qquad (1)
Here, re_j denotes the ready (preparation) time of resource r_j, and E_ij denotes the
time taken by the ith task to execute on the jth resource. The expected completion time of
all tasks is calculated using Eq. (1), and then the task having the shortest
expected completion time is selected, mapped to the respective resource, and
detached from the task set. This step is reiterated for all subsequent tasks in the set
until all tasks have been mapped to the respective resources [17] (see the sketch after this list).
• Max–Min: This algorithm prioritizes the longer tasks, having maximum MCT,
over the shorter tasks. It first selects the longer tasks from the given task set
for resource assignment. This algorithm proves superior to the Min–Min
algorithm when the count of shorter tasks is greater than that of longer tasks [17].
• RASA (Resource Awareness Scheduling Algorithm): The Max–Min and Min–
Min approaches can be applied alternately to enjoy their benefits and overcome their
drawbacks, resulting in an efficient hybrid scheduling scheme known as RASA.
• Best Fit: This scheduling policy assigns resources to the job that requires the
maximum number of resources from the given task set. When multiple resources
of different types are required by VMs, one kind of resource can
be taken as a “reference resource,” and the best fit is then chosen according to the
reference resource.
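The selection loop of the Min–Min heuristic above can be made concrete with a short sketch. This is a minimal Python rendering assuming Eq. (1) as the completion-time model; the matrix E and the ready times in the example are illustrative inputs, not values from the paper.

```python
# Minimal sketch of the Min-Min heuristic (illustrative inputs, not the
# paper's data). E[i][j] is the execution time E_ij of task i on resource j;
# ready[j] is the ready time re_j of resource j, as in Eq. (1).
def min_min(E, ready):
    n, m = len(E), len(E[0])
    unmapped = set(range(n))
    schedule = {}                       # task index -> resource index
    while unmapped:
        best = None                     # (completion time, task, resource)
        for i in unmapped:
            # Eq. (1): C_ij = E_ij + re_j; best resource for this task
            j = min(range(m), key=lambda j: E[i][j] + ready[j])
            c = E[i][j] + ready[j]
            if best is None or c < best[0]:
                best = (c, i, j)
        c, i, j = best                  # task with shortest completion time
        schedule[i] = j
        ready[j] = c                    # resource j is now busy until time c
        unmapped.remove(i)
    return schedule

# Example: three tasks on two resources
print(min_min([[4, 6], [3, 8], [9, 2]], [0.0, 0.0]))  # {2: 1, 1: 0, 0: 0}
```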
The traditional heuristic algorithms suffer from drawbacks such as local optimum
traps, slow convergence, additional computational time, and complex operators,
and they are framed only for binary or real search domains. Hence, heuristic algorithms
are not suitable for complex scientific optimization problems and large solution spaces.
This motivates the researchers to enhance the heuristic approaches to overcome their
drawbacks leading to metaheuristic algorithms.
• Genetic Algorithm (GA): The concept of the GA method was first given by
Holland in 1975 and has proved effective for complex and large search
problems. GA is a probabilistic, population-based, evolutionary optimization
technique motivated by the natural evolutionary process of chromo-
somes, in which the notion of survival of the fittest is used; i.e., recombination of the
chromosomes provides new, better solutions via genetic crossover,
mutation, and inversion [40, 41].
• Ant Colony Optimization (ACO): Ant Colony Optimization (ACO) is used in
computer science and Operations Research for solving complex combinatorial opti-
mization problems. Dorigo originally introduced this novel ant system
approach in his 1992 Ph.D. thesis. Since 1992, various ACO algorithms have been
proposed, almost all sharing the same idea. The prime idea of ACO is motivated
by the searching behavior of real ants locating the shortest path from their
colonies to their food source [42, 43].
• Particle Swarm Optimization (PSO): PSO is regarded as a powerful optimiza-
tion and computational technique for obtaining optimal solutions to multimodal
continuous optimization problems. PSO is a swarm-intelligent, evolutionary,
population-based metaheuristic technique developed in 1995 by Kennedy
and Eberhart to perform global search. Its idea was originally motivated by the
social behavior and movement of particles such as bird flocks and fish
schools [23, 24]; a sketch of one discrete encoding for task scheduling follows.
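Since the paper applies PSO to a discrete mapping problem, a sketch of one common discretization may help: each particle keeps one continuous dimension per task, and rounding a dimension yields the VM index for that task. This round-and-clamp encoding is an assumption for illustration, not necessarily the variant used in the experiments below; fitness() stands for any of the objectives discussed later (e.g., makespan).

```python
import random

# Hedged sketch: discrete PSO for task-to-VM mapping. Each particle holds one
# continuous dimension per task; round() of a dimension gives the VM index.
def pso_schedule(n_tasks, n_vms, fitness, swarm=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5):
    decode = lambda x: [min(n_vms - 1, max(0, round(d))) for d in x]
    X = [[random.uniform(0, n_vms - 1) for _ in range(n_tasks)]
         for _ in range(swarm)]
    V = [[0.0] * n_tasks for _ in range(swarm)]
    pbest = [list(x) for x in X]
    pval = [fitness(decode(x)) for x in X]
    g = min(range(swarm), key=lambda k: pval[k])
    gbest, gval = list(pbest[g]), pval[g]
    for _ in range(iters):
        for k in range(swarm):
            for d in range(n_tasks):       # standard velocity/position update
                V[k][d] = (w * V[k][d]
                           + c1 * random.random() * (pbest[k][d] - X[k][d])
                           + c2 * random.random() * (gbest[d] - X[k][d]))
                X[k][d] = min(n_vms - 1.0, max(0.0, X[k][d] + V[k][d]))
            f = fitness(decode(X[k]))
            if f < pval[k]:                # update personal and global bests
                pbest[k], pval[k] = list(X[k]), f
                if f < gval:
                    gbest, gval = list(X[k]), f
    return decode(gbest), gval

# Toy usage: minimize makespan of 5 tasks (lengths L) on 3 unit-speed VMs
L = [4, 3, 9, 2, 7]
makespan = lambda mp: max(sum(L[i] for i in range(len(L)) if mp[i] == j)
                          for j in range(3))
print(pso_schedule(len(L), 3, makespan))
```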
The Task Assignment Problem (TAP) can be described as follows. A set of tasks or
cloudlets is represented by the set T = {t_1, t_2, t_3, …, t_n}, where n is the total number
of independent tasks in a batch, which differ in length. All available VMs
are represented by the set VM = {VM_1, VM_2, VM_3, …, VM_m}, where m is the total
number of available VMs, which differ in MIPS rating. This implies that tasks
executed on different machines have different execution times and execution costs.
The number of cloudlets is always greater than the number of VMs. The processing
time of any cloudlet T_i on VM_j is denoted by PT_ij, and the completion time of VM_j
by CT_j. The finishing time and submission time of any cloudlet T_i are denoted by FT_i
and SubT_i, respectively. The response time of the ith task is denoted by RTT_i, and the
average response time by AvRT. Our objectives of minimizing the overall makespan
and average response time and maximizing the average resource utilization
(LBR) are described by Eqs. (2), (3), and (5) [38, 44].
Each task in T is bounded by T_max and T_min, i.e., T_min ≤ T_i ≤ T_max, and each VM
in the VM set is bounded by VM_max and VM_min, i.e., VM_min ≤ VM_j ≤ VM_max [45].
VMs are considered to be available all the time. The tasks cannot be interrupted or
pre-empted during processing on a VM. Each VM can process only one cloudlet at a
time, and cloudlets cannot run on more than one VM at a time. When cloudlet i
is allocated to machine j, X_ij becomes 1; otherwise it is 0. Two basic conditions are
imposed to satisfy the above-specified constraints. Condition (6) ensures that each
task is assigned to only one VM [24]:
\sum_{j=1}^{m} X_{ij} = 1 \quad \forall i \in T, \qquad (6)

X_{ij} \in \{0, 1\} \quad \forall j \in M, \; i \in T. \qquad (7)
Task scheduling aims to perform appropriate mapping of the cloudlets to the available
VMs so that computing resources can be utilized efficiently and cloud users’ expenses
can be minimized.
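Because Eqs. (2), (3), and (5) are not reproduced here, the sketch below evaluates the three metrics from their standard definitions, which is an assumption on our part: makespan as the largest VM completion time, AvRT as the mean of finishing minus submission times, and LBR as the mean VM busy time divided by the makespan.

```python
# Hedged sketch of the three objectives, using standard definitions (assumed):
# makespan = max_j CT_j, AvRT = mean_i(FT_i - SubT_i),
# LBR = (mean_j CT_j) / makespan.
def evaluate(mapping, PT, SubT):
    """mapping[i] = VM index of cloudlet i (so Eq. (6) holds by construction);
    PT[i][j] = processing time PT_ij; SubT[i] = submission time of cloudlet i."""
    m = max(mapping) + 1
    CT = [0.0] * m                       # completion time CT_j per VM
    FT = [0.0] * len(mapping)            # finishing time FT_i per cloudlet
    for i, j in enumerate(mapping):      # one cloudlet at a time per VM
        CT[j] += PT[i][j]
        FT[i] = CT[j]
    makespan = max(CT)
    avrt = sum(f - s for f, s in zip(FT, SubT)) / len(FT)
    lbr = sum(CT) / (m * makespan)       # average resource utilization
    return makespan, avrt, lbr

# Example: three cloudlets mapped onto two VMs, all submitted at time 0
print(evaluate([0, 1, 0], [[2, 5], [4, 3], [6, 1]], [0, 0, 0]))
```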
The aim is to find the best metaheuristic approach for task scheduling which
minimizes makespan and average response time for cloud users and maximizes the
average resource utilization for cloud providers in highly distributed and dynamic
multiprocessing environments, i.e., the cloud computing environment.
The authors have performed various experiments, increasing the number of
cloudlets for heterogeneous systems, to carry out a comparative analysis of the existing
metaheuristic algorithms PSO, ACO, and GA for the task scheduling problem under
the given parameter settings of VMs and cloudlets. Ten datacenters are created with
two hosts and 50 VMs each in the experiment, and the cloudlet count is varied from
100 to 1000 under the simulation environment. The task length is taken in the range
of 1000–20,000 Million Instructions (MIs). The cloudlets are assigned to heterogeneous
VMs whose MIPS ratings vary between 500 and 2000 and whose bandwidths vary
between 500 and 1000. The stopping criterion is set at 100 iterations. The results of
ten experiments are taken over 100 iterations for the task range 100–1000, and the
average of the optimization parameter values is taken.
The algorithms are compared based on the following parameters, i.e., makespan,
average response time and average resource utilization. The average of ten repetitions
is taken to obtain the average makespan for PSO, ACO, and GA as shown in Fig. 1.
The PSO algorithm shows a lower makespan than ACO and GA. PSO takes
less time to execute a given task set on the available VMs than ACO and GA, which
indicates its superiority in minimizing the makespan. Cloud users also wish
for a quick response time from the cloud system to satisfy their QoS requirements. The
average response times of the PSO, ACO, and GA algorithms are evaluated in Fig. 2,
which shows that PSO takes less time to respond than ACO and GA.
The average resource utilization is calculated using Eq. (5). It is found that PSO uses
resources more efficiently and effectively as per the cloud providers’ desire to gain
more profit and revenue from cloud computing. The comparison of average resource
utilization is shown in Fig. 3. Based on the experimental or simulation outcomes, it
is clearly visible that few of the scheduling algorithms are very much favorable to
be adopted in cloud computing. From experimental results, it is clearly visible that
PSO found to be outperforming ACO and GA for optimization metrics makespan,
average response time, and LBR.
In this paper, a brief analysis of the existing heuristic and metaheuristic approaches
for task scheduling has been presented. As the task scheduling problem is NP-hard in
nature, and heuristic approaches suffer from slow convergence and traps in local optima,
the metaheuristic approaches have gained popularity over the heuristic ones. This paper
finds that PSO outperforms GA and ACO for makespan, average response time, and
average resource utilization when scheduling a batch of independent tasks in a heterogeneous
cloud computing environment. There is no single metaheuristic algorithm that
performs best on all problems. Their performance varies with the complexity of
the problem. Researchers find PSO an interesting metaheuristic algorithm because
of its various advantages over other metaheuristic techniques: it
can be written in a few lines of code and implemented with only basic math-
ematical operators. PSO is capable of escaping from local optima and shows faster
convergence than other metaheuristic techniques by sustaining a balance between
exploitation and exploration. In most of the less complex and continuous search
space problems, PSO performs better than ACO and GA in terms of its success
rate and quality of the solution as observed in the considered task problem in this
paper. For complex and large search space problems, GA or ACO may perform better
than PSO. PSO is fast gradient, more robust, and stable algorithm. Its mathematical
implementation is easier than ACO and GA as it has few parameters to be adjusted.
This may be the reason for outperforming PSO over GA and ACO in optimizing
the specified QoS parameters. The authors are working on a modified PSO approach
that improves the other QoS parameters such as fault tolerance and reducing the cost
involved in the CC for scheduling workflow-centered scientific applications in cloud
computing.
References
10. Madni SHH, Latiff MSA, Coulibaly Y, Abdulhamid SM (2016) An appraisal of meta-heuristic
resource allocation techniques for IaaS cloud. Indian J Sci Technol 9. https://doi.org/10.17485/
ijst/2016/v9i4/80561
11. Tsai CW, Rodrigues JJPC (2014) Metaheuristic scheduling for cloud: a survey. IEEE Syst J
8:279–291. https://doi.org/10.1109/JSYST.2013.2256731
12. Madni SHH, Latiff MSA, Coulibaly Y, Abdulhamid SM (2016) Resource scheduling for infras-
tructure as a service (IaaS) in cloud computing: challenges and opportunities. J Netw Comput
Appl 68:173–200. https://doi.org/10.1016/j.jnca.2016.04.016
13. Braun TD, Siegel HJ, Beck N et al (2001) A comparison of eleven static heuristics for mapping a
class of independent tasks onto heterogeneous distributed computing systems. J Parallel Distrib
Comput 61:810–837. https://doi.org/10.1006/jpdc.2000.1714
14. Thomas A, Krishnalal G, Jagathy Raj VP (2015) Credit based scheduling algorithm in cloud
computing environment. Procedia Comput Sci 46:913–920. https://doi.org/10.1016/j.procs.
2015.02.162
15. Elzeki OM, Reshad MZ, Elsoud M (2012) Improved max-min algorithm in cloud computing.
Int J Comput Appl 50:22–27. https://doi.org/10.5120/7823-1009
16. Parsa (2009) RASA: a new grid task scheduling algorithm. Int J Digit Content Technol Appl.
https://doi.org/10.4156/jdcta.vol3.issue4.10
17. Devipriya S, Ramesh C (2013) Improved max-min heuristic model for task scheduling
in cloud. In: Proceedings 2013 international conference on green computing, communica-
tion and conservation of energy, ICGCE 2013, pp 883–888. https://doi.org/10.1109/ICGCE.
2013.6823559
18. Maguluri ST, Srikant R, Ying L (2012) Stochastic models of load balancing and scheduling in
cloud computing clusters. In: Proceedings—IEEE INFOCOM, pp 702–710. https://doi.org/10.
1109/INFCOM.2012.6195815
19. Kaur S, Bagga P, Hans R, Kaur H (2019) Quality of service (QoS) aware workflow scheduling
(WFS) in cloud computing: a systematic review. Arab J Sci Eng 44:2867–2897. https://doi.
org/10.1007/s13369-018-3614-3
20. Alam T, Raza Z (2018) Quantum genetic algorithm based scheduler for batch of precedence
constrained jobs on heterogeneous computing systems. J Syst Softw 135:126–142. https://doi.
org/10.1016/j.jss.2017.10.001
21. Shahid M, Raza Z, Sajid M (2015) Level based batch scheduling strategy with idle slot reduction
under DAG constraints for computational grid. J Syst Softw 108:110–133. https://doi.org/10.
1016/j.jss.2015.06.016
22. Zhang Y, Koelbe C, Cooper K (2009) Batch queue resource scheduling for workflow applica-
tions. Proceedings—IEEE international conference on cluster computing. https://doi.org/10.
1109/CLUSTR.2009.5289186
23. Attiya I, Zhang X (2017) A simplified particle swarm optimization for job scheduling in cloud
computing. Int J Comput Appl. https://doi.org/10.5120/ijca2017913744
24. Mathew T, Sekaran KC, Jose J (2014) Study and analysis of various task scheduling algo-
rithms in the cloud computing environment. In: Proceedings 2014 international conference on
advances in computing, communications and informatics, ICACCI 2014, pp 658–664. https://
doi.org/10.1109/ICACCI.2014.6968517
25. Tawfeek M, El-Sisi A, Keshk A, Torkey F (2015) Cloud task scheduling based on ant colony
optimization. Int Arab J Inf Technol 12:129–137
26. Srikanth GU, Maheswari VU, Shanthi P, Siromoney A (2012) Tasks scheduling using ant colony
optimization. J Comput Sci 8:1314–1320. https://doi.org/10.3844/jcssp.2012.1314.1320
27. Jabreel M. The study of genetic algorithm-based task scheduling for cloud computing
28. Safwat A, Fatma A (2016) Genetic-based task scheduling algorithm in cloud computing
environment. Int J Adv Comput Sci Appl 7. https://doi.org/10.14569/ijacsa.2016.070471
29. Almezeini N, Hafez A (2017) Task scheduling in cloud computing using lion optimization
algorithm. Int J Adv Comput Sci Appl 8. https://doi.org/10.14569/ijacsa.2017.081110
30. Agarwal M, Srivastava GMS (2019) A PSO algorithm based task scheduling in cloud
computing. Int J Appl Metaheuristic Comput 10:1–17. https://doi.org/10.4018/IJAMC.201910
0101
31. Masdari M, Salehi F, Jalali M, Bidaki M (2017) A survey of PSO-based scheduling algorithms
in cloud computing. J Netw Syst Manag 25:122–158. https://doi.org/10.1007/s10922-016-
9385-9
32. Salman A, Ahmad I, Al-Madani S (2002) Particle swarm optimization for task assignment
problem. Microprocess Microsyst 26:363–371. https://doi.org/10.1016/S0141-9331(02)000
53-4
33. Zhang L, Chen Y, Yang B (2006) Task scheduling based on PSO algorithm in computational
grid. Proc - ISDA 2006 Sixth Int Conf Intell Syst Des Appl 2:696–701. https://doi.org/10.1109/
ISDA.2006.253921
34. Al-Maamari A, Omara FA (2015) Task scheduling using PSO algorithm in cloud computing
environments. Int J Grid Distrib Comput 8:245–256. https://doi.org/10.14257/ijgdc.2015.8.
5.24
35. Beegom ASA, Rajasree MS (2019) Integer-PSO: a discrete PSO algorithm for task scheduling
in cloud computing systems. Evol Intell 12:227–239. https://doi.org/10.1007/s12065-019-002
16-7
36. Guo L, Zhao S, Shen S, Jiang C (2012) Task scheduling optimization in cloud computing based
on heuristic algorithm. J Networks 7:547–553. https://doi.org/10.4304/jnw.7.3.547-553
37. Awad AI, El-Hefnawy NA, Abdel-Kader HM (2015) Enhanced particle swarm optimization for
task scheduling in cloud computing environments. Procedia Comput Sci 65:920–929. https://
doi.org/10.1016/j.procs.2015.09.064
38. Ebadifard F, Babamir SM (2018) A PSO-based task scheduling algorithm improved using a
load-balancing technique for the cloud computing environment. Concurr Comput 30
39. Salot P (2013) A survey of various scheduling algorithm in cloud computing environment. Int
J Res Eng Technol 2(2):131–135
40. Kaur S, Verma A (2012) An efficient approach to genetic algorithm for task scheduling in cloud
computing environment. Int J Inf Technol Comput Sci 4:74–79. https://doi.org/10.5815/ijitcs.
2012.10.09
41. Konar D, Sharma K, Sarogi V, Bhattacharyya S (2018) A multi-objective quantum-inspired
genetic algorithm (Mo-QIGA) for real-time tasks scheduling in multiprocessor environment.
Procedia Comput Sci 131:591–599. https://doi.org/10.1016/j.procs.2018.04.301
42. Gupta A, Garg R (2017) Load balancing based task scheduling with ACO in cloud computing.
In: 2017 International conference on computing applications, ICCA 2017, pp 174–179. https://doi.
org/10.1109/COMAPP.2017.8079781
43. Introduction I (2011) Improved ant colony optimization for grid scheduling. 1:596–604
44. Alworafi MA, Dhari A, El-Booz SA et al (2019) An enhanced task scheduling in cloud
computing based on hybrid approach. Springer Singapore
45. Alsaidy SA, Abbood AD, Sahib MA (2020) Heuristic initialization of PSO task scheduling
algorithm in cloud computing. J King Saud Univ—Comput Inf Sci. https://doi.org/10.1016/j.
jksuci.2020.11.002
Impact of Spatial Distribution of Repeated Samples on the Geometry of Hyperplanes
Abstract Support vector machines (SVMs) and their uses in various scientific
domains have been the subject of extensive research in recent years. SVMs are among
the most potent and reliable classification and regression algorithms in various appli-
cation areas. In the proposed work, the impact of location and multiple occurrences
of support vectors on SVM has been studied by noticing the geometrical differences.
Multiple occurrences, or repetitions, of data points are generally introduced in the case of
imbalanced classes to balance the data; otherwise, results will be biased toward the
majority class. Multiple occurrences of the same data points will result in a change
of behavior and orientation of the hyperplane. The hyperplane will change if the
support vectors are deleted or added.
1 Introduction
One of the most popular techniques for classification problems, such as disease
detection [1, 2], text recognition [3], emotion detection [4] and face detection [5],
is the support vector machine (SVM). For the optimization problem, SVM provides
a globally optimal solution by employing a maximum margin strategy. The notion
of structural risk minimization is incorporated into SVM. Vapnik presented SVM as
a machine learning model for applications including classification and regression.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_2
Support vector machines were created by Vladimir Vapnik in 1979. As seen in Fig. 2,
an SVM is a hyperplane that, with the largest feasible margin, separates a set of
positive samples from a set of negative samples. The distance between the hyperplane
and the closest positive and negative examples in the linear instance determines the
margin. Various versions of SVM are available such as Twin SVM, Least squares twin
SVM, L1-norm-based TSVM, Fuzzy SVM, and SVM for multi-view and multi-class
learning. Jayadeva and Chandra [7] produce two non-parallel hyperplanes by tackling two
quadratic programming problems in the twin support vector machine. Kumar et al. [8],
in least squares twin SVM, replace the inequality constraints with equality constraints.
Wang et al. [9] present L1-norm-based TSVM to increase the robustness of the TSVM
model. To reduce the effects of outliers, fuzzy support vector machines came into the
picture [10]. Multi-view learning further enhances the generalization of the SVM-
based models [11]. Richhariya and Tanveer [12] proposed a reduced Universum
twin support vector learning to address the issue of class imbalance by employing
a tiny rectangular kernel matrix to shorten the computation time of their Universum-
based approach. Ganaie and Tanveer [13] take into account the neighborhood that is
included in the objective function’s weight matrix.
In its linear form, SVM is a hyperplane to distinguish between sets of positive and
negative data samples. Numerous hyperplanes might be used to divide the two classes,
but the one that generates the greatest margin is picked. The margin is determined by
calculating the distance between the hyperplane and the nearest positive or negative
data sample [14]. Let the training data be denoted by T d .
T_d = \{(A_1, D_1), (A_2, D_2), \ldots, (A_m, D_m)\}, \qquad (1)
where A_i ∈ R^n are the observations and D_i ∈ {1, −1} their labels, for i = 1, 2, …, m.
In linear SVM, the following primal QPP is to be solved:
\min_{w,b,\xi} \; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \; D(\phi(A)w + eb) \ge e - \xi, \qquad (2)
where ξ is the slack variable and C is a penalty parameter. The objective is to identify
the best separating hyperplane.
\max_{\alpha} \; -\frac{1}{2}\alpha^t D\phi(A)\phi(A)^t D\alpha + e^t\alpha
\quad \text{s.t.} \; Ce \ge \alpha \ge 0, \; D^t\alpha = 0, \qquad (4)
\min_{\alpha} \; \frac{1}{2}\alpha^t D K(A, A^t) D\alpha - e^t\alpha
\quad \text{s.t.} \; Ce \ge \alpha \ge 0, \; D^t\alpha = 0, \qquad (5)
Here, K(A, A^t) = \phi(A)\phi(A)^t represents the linear kernel function. To find the values
of b and ξ, we look for the support vectors: the data points where α ≠ 0 as per the KKT
conditions:
D(\phi(A)_i w + e_i b) + \xi_i - e_i = 0, \qquad (6)
w = A^t D\alpha. \qquad (8)
Here, n_+ and n_− are the numbers of support vectors belonging to the positive and negative
classes, and x_i denotes the data points of both classes. For each support vector belonging to
the positive class, the value of ξ_i is calculated by Eq. (10); for each support vector
belonging to the negative class, the value of ξ_i* is calculated by Eq. (11).
\xi_i = 1 - x_i \phi(A)^t D\alpha - b, \qquad (10)
Here, x_i belongs to the positive class data samples for ξ_i, and for ξ_i* it belongs to the
negative class data points.
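A minimal sketch of how Eqs. (5)–(11) can be carried out numerically is given below, using cvxopt's generic QP solver for the dual problem. The choice of cvxopt, the recovery of b from margin support vectors, and the hinge form of the slacks are standard modelling choices assumed here, not details stated in the paper.

```python
import numpy as np
from cvxopt import matrix, solvers

# Hedged sketch of Eqs. (5)-(11): solve the dual QPP for a linear kernel and
# recover w, b and the slacks. A is the m x n data matrix, y the +1/-1 labels;
# C = 1 and the support-vector threshold 1e-10 follow the algorithm below.
def linear_svm(A, y, C=1.0, tol=1e-10):
    m = len(y)
    K = A @ A.T                                   # linear kernel K(A, A^t)
    P = matrix(np.outer(y, y) * K)                # dual quadratic term
    q = matrix(-np.ones(m))
    G = matrix(np.vstack([-np.eye(m), np.eye(m)]))
    h = matrix(np.hstack([np.zeros(m), C * np.ones(m)]))  # 0 <= alpha <= C
    Aeq = matrix(y.reshape(1, -1).astype(float))  # equality D^t alpha = 0
    solvers.options["show_progress"] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, Aeq, matrix(0.0))["x"])
    sv = alpha > tol                              # support vectors (alpha != 0)
    w = A.T @ (y * alpha)                         # Eq. (8): w = A^t D alpha
    on_margin = sv & (alpha < C - tol)            # assumes some alpha < C exist
    b = np.mean(y[on_margin] - A[on_margin] @ w)  # b averaged over margin SVs
    xi = np.maximum(0.0, 1.0 - y * (A @ w + b))   # slacks, as in Eqs. (10)-(11)
    return w, b, alpha, xi
```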
For the proposed work, a two-cluster, normally distributed dataset having two features,
X1 and X2, is generated and divided into two classes. The dataset is imbalanced, with
500 and 1000 data samples for the positive and negative classes, respectively; the
imbalance ratio is 1:2, as shown in Fig. 1. A few data samples from each class lie in the
overlapped region. The dataset dimensions are 1500 × 2.
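A sketch of how such a dataset can be generated is shown below; the cluster means and spread are assumed for illustration, while the 500/1000 split and the 1500 × 2 shape follow the description above.

```python
import numpy as np

# Hedged sketch of the synthetic data: two normally distributed clusters with
# a 1:2 imbalance and some overlap. Cluster means/spread are assumptions.
rng = np.random.default_rng(0)
pos = rng.normal(loc=[0.0, 0.8], scale=0.4, size=(500, 2))    # class +1
neg = rng.normal(loc=[0.0, -0.2], scale=0.4, size=(1000, 2))  # class -1
A = np.vstack([pos, neg])                        # 1500 x 2 feature matrix
y = np.hstack([np.ones(500), -np.ones(1000)])    # labels D in {+1, -1}
print(A.shape)  # (1500, 2)
```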
The SVM classifier is implemented on the artificially generated dataset, and the
support vectors are calculated and plotted on the classifier as shown in Fig. 2. We then
note the geometrical differences after repeating support vectors at different
locations. First, the values of ξ and ξ* are divided into ten bins for the posi-
tive and negative classes, respectively. The distribution of the positive and negative
classes across the bins is shown in Figs. 3 and 4. Two bins of size (0 to 0.5) and (0.5
to 1.0) over the values of ξ and ξ* are created, and their data samples are repeated
for both the positive and negative classes. After repeating the data samples of the posi-
tive class, the entire hyperplane shifts in the upward direction, as shown in
Figs. 5 and 6. Similarly, when data samples are repeated for the negative class, the
entire hyperplane shifts in the downward direction. Most ξ and ξ* values obtained
are positive, but some are negative because some points lie very close to the line.
3.2 Algorithm
Step 2. Implement the SVM classifier from Eq. (5), considering C = 1
and threshold = 10^−10.
Step 3. Calculate the support vector data points where α ≥ threshold and plot them.
Step 4. Calculate the values of b and w from Eqs. (7) and (8), respectively.
Step 5. Calculate the values of ξ and ξ* from Eqs. (10) and (11), respectively.
Step 6. Divide the values of ξ and ξ* into bins, repeat the data points of a
particular bin n times at a time, and note the geometrical differences in the SVM.
Step 7. Now, repeat the support vectors at different locations and note the
geometrical differences in the SVM. The various locations at which support vectors are
repeated are arranged in four cases; a sketch of the bin-repeat-and-refit step is given below.
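As a concrete rendering of Steps 6–7, the sketch below oversamples the points whose slack falls in a chosen bin and refits a linear SVM. scikit-learn's SVC is used here as a stand-in solver, and the bin edges and repeat count n are the assumed experimental knobs, not values dictated by the paper.

```python
import numpy as np
from sklearn.svm import SVC

# Hedged sketch of Steps 6-7: repeat (oversample) the samples whose slack
# falls in a chosen bin, refit, and compare the hyperplane with the original.
def repeat_bin_and_refit(A, y, xi, lo, hi, n, C=1.0):
    mask = (xi >= lo) & (xi < hi)            # e.g. the (0, 0.5) or (0.5, 1.0) bin
    A_rep = np.vstack([A] + [A[mask]] * n)   # each selected point occurs n+1 times
    y_rep = np.hstack([y] + [y[mask]] * n)
    clf = SVC(kernel="linear", C=C).fit(A_rep, y_rep)
    return clf.coef_.ravel(), clf.intercept_[0]   # shifted w and b
```

Comparing the returned (w, b) with the original hyperplane reproduces the upward and downward shifts reported in the four cases below.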
Case 1: Left side SV of the Positive class.
Consider Fig. 2 for the original data. In this case, data points of the positive
class with input features in the range X1(−0.6 to −0.2) and X2(0 to 0.5) are repeated.
In total, eight data points fall in this range. As the data points of the
positive left class are repeated, the contour shifts in the upward direction from that
particular location, as shown in Fig. 7. Similarly, data points of the negative class with
input features in the range X1(−0.6 to 0) and X2(0.4 to 0.6) are repeated. In total, 21
data points fall in this range. As the data points of the negative left class
are repeated, the contour shifts in the left downward direction from that particular
location, as shown in Fig. 9.
Consider Fig. 2 for the original data. In this case, data points of the positive class
with input features in the range X1(0.2 to 0.4) and X2(0 to 0.5) are repeated. In total,
eight data points fall in this range, and the geometrical difference is observed. As
the right-side data points of the positive class are repeated, the contour
shifts in the upward direction from that particular location, as shown in Fig. 8. Similar
behavior is observed for the negative class: data points of the negative class
with input features in the range X1(0 to 0.4) and X2(0.4 to 0.6) are repeated. In total,
21 data points fall in this range, and the hyperplane shifts in the right downward
direction from that particular location, as shown in Fig. 10. All the
positive and negative class cases are summarized in Table 1.
In this paper, we presented a novel point of view on the SVM by discussing the impact
of the spatial distribution of repeated samples on the geometry of hyperplanes. As
seen in the proposed work, by repeating samples at a particular location a specified
number of times, the hyperplane can shift its position. This means that the average
error can be reduced, which can further reduce the misclassification of data
samples. In the future, the spatial distribution of repeated samples can be implemented
on variants of SVM.
Table 1 Impact of repeating SVs at different locations of the positive and negative classes on the SVM classifier

| Case   | Range to repeat (X1 feature) | Range to repeat (X2 feature) | No. of repeated points | Result                                                                   |
| Case 1 | −0.6 to −0.2                 | 0 to 0.5                     | 8                      | The hyperplane moved in the left upward direction as shown in Fig. 6     |
| Case 2 | 0.2 to 0.4                   | 0 to 0.5                     | 8                      | The hyperplane moved in the right upward direction as shown in Fig. 7    |
| Case 3 | −0.6 to 0                    | 0.2 to 0.4                   | 21                     | The hyperplane moved in the left downward direction as shown in Fig. 8   |
| Case 4 | 0 to 0.4                     | 0.4 to 0.6                   | 21                     | The hyperplane moved in the right downward direction as shown in Fig. 9  |
References
1. Richhariya B, Tanveer M (2018) EEG signal classification using universum support vector
machine. Expert Syst Appl 106:169–182. https://doi.org/10.1016/j.eswa.2018.03.053
2. Eke CS, Jammeh E, Li X, Carroll C, Pearson S, Ifeachor E (2021) Early detection of Alzheimer’s
disease with blood plasma proteins using support vector machines. IEEE J Biomed Health
Inform 25(1):218–226. https://doi.org/10.1109/jbhi.2020.2984355
3. Liu Z, Lv X, Liu K, Shi S (2010) Study on SVM compared with the other text classification
methods. In: 2010 Second international workshop on education technology and computer
science. https://doi.org/10.1109/etcs.2010.248
4. Sepúlveda A, Castillo F, Palma C, Rodriguez-Fernandez M (2021) Emotion recognition from
ECG signals using wavelet scattering and machine learning. Appl Sci 11(11):4945. https://doi.
org/10.3390/app11114945
5. Raji ID, Fried G (2021) About face: a survey of facial recognition evaluation. ArXiv: Computer
Vision and Pattern Recognition. https://arxiv.org/pdf/2102.00813
1 Introduction
The agriculture sector is an indispensable sector of every country, and it becomes even
more important for a developing country like India. Agriculture is the
primary source of livelihood for nearly 58% of India’s population and contributes
about 17% to Gross Value Added (GVA) [1]. India is among the world’s leading
producers of rice and wheat in terms of net production volume; agriculture has a
vital role in import and export as well. Many industries depend on agriculture as
it is the primary source of raw materials like cotton, jute, sugar, tobacco, oils, etc.
According to the Department for Promotion of Industry and Internal Trade (DPIIT), a
cumulative Foreign Direct Investment (FDI) equity inflow of about US$ 9.08 billion
was achieved from April 2000 to 2019 in the agriculture sector alone [2]. A significant
contribution toward any country’s growth is derived from agriculture.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
A. Mishra et al. (eds.), Advances in IoT and Security with Computational Intelligence,
Lecture Notes in Networks and Systems 756,
https://doi.org/10.1007/978-981-99-5088-1_3
With the increasing global population estimated to touch 9.6 billion by 2050,
advancement in the agriculture sector is a must to feed the growing population [3].
However, farmers in India still use manual methods for crop monitoring, irrigation,
and other activities. These manual methods take time and sometimes cannot detect
the exact situation, leading to poor crop yield. Therefore, food security is a crucial
issue in India. According to the Food and Agriculture Organization of the UN (FAO),
it is estimated that over 189.2 million people go hungry every day in the country [4].
Adopting sustainable farming practices can both increase productivity and reduce
ecological harm, as it helps produce greater agricultural output while using less
land, water, and energy, ensuring profitability for the farmers. Sustainable agriculture
is defined as a system that helps conserve resources and reduces agricultural practices
that pose a threat to the environment [5].
The use of innovations like the Internet of Things (IoT) in farming could have the
best results against the challenges (like adverse environmental conditions, climate
change, increasing expenses, wastage of resources, etc.) in the future [6]. IoT is a
system of interrelated networks of physical tools with sensors, software, and other
technological equipment that can collect and transfer data to other devices or
systems over the Internet without requiring human intervention [7]. Smart farming
with big data and advanced analytics technology includes automation, adding senses
and analytics to modern agriculture. The use of technology will not only help provide
better yield and less labor effort, but it will also revolutionize agriculture for farmers
in India. The potential of IoT in the agricultural sector motivated us to explore the
same in this research work.
The significant contributions of this paper are as follows:
• We first present a review of Internet of Things (IoT) as an intelligent farming
solution that has the potential to overcome the problems faced in Indian agriculture
and stimulate sustainable agriculture.
• Then, we analyze and validate mathematically how the agricultural factors on
which IoT works affect the productivity of various crops using available agricul-
tural datasets. To validate the role of IoT in agriculture mathematically, we have
used RStudio's [8] "agridat" package [9].
The rest of the paper is organized as follows. In Sect. 2, we describe the method-
ology of this work. Then, Sect. 3 presents the results, and Sect. 4 presents the
discussion. Lastly, we conclude in Sect. 5.
2 Methodology
We started our work by collecting information about the role and need for IoT in
Indian agriculture and its applications. Then, we experimented using RStudio's
“agridat” package and selected some of the available datasets to statistically prove
the benefits of using IoT devices for sustainable agriculture.
IoT-Based Smart Farming for Sustainable Agriculture 29
This work was divided into two stages. In the first stage, we analyze the role of
IoT in agriculture. In the second stage, we use the “agridat” package available in
RStudio to analyze the effect of various factors on crop yield using the available
datasets.
In this section, we explore how IoT can be beneficial for sustainable agriculture and
how it has the potential to overcome various problems in the agricultural sector.
Agricultural problems in India: The success of the agricultural sector depends
on various factors such as climate, irrigation, soil quality, humidity, seeds, and pesticides.
The problems associated with these factors thus affect agricultural production too.
Some of the significant factors are discussed below:
• Climatic conditions: Climate change harms agricultural produce. A rise in India's mean temperature has been observed, and the frequency of heavy rainfall has increased over the last three decades. These climatic changes are likely to affect agricultural yield negatively, and these changing circumstances make it necessary to monitor climatic conditions [10].
• Irrigation: Irrigation is an essential input for agriculture in every country. The yield of a crop depends on how it is watered. In a tropical country like India, where the rainfall pattern is uncertain and irregular, irrigation is often the only way to sustain agriculture. However, over-irrigation has its own ill effects: large areas of land in Punjab and Haryana have become unusable due to faulty irrigation that led to salinity, alkalinity, and water-logging [11].
• Soil Quality: Soil quality is one of the most essential components of good crop health. Soil mismanagement and land misuse adversely affect soil health. Farming practices such as in-field burning of crop residues, excessive digging or tillage, flood-based irrigation, and indiscriminate use of chemicals often degrade soil health [12]. This degradation shows the dire need to monitor soil health.
• Humidity: Humidity refers to the amount of water vapor present in the air and is often expressed in terms of Relative Humidity (RH), the percentage of water vapor in the air at a given temperature and pressure. Both very high and very low RH reduce grain yield and can lead to greater usage of pesticides, which has its own ill effects [13].
• Seeds, Fertilizers, and Pesticides: Seeds, fertilizers, and pesticides constitute the three pillars of modern agriculture; their main task is to enhance agricultural productivity. Seeds are the most essential input as far as agriculture is concerned. It has been observed that many farmers still use common grain saved from the previous crop as seed and cannot distinguish between common grain and seed; using common grain as seed affects productivity. Judicious and optimal use of fertilizers is necessary to meet the future demand for food with the increasing population. Based on the study reports of the National Institute of Agricultural Economics and Policy (NIAP), one-third of the major states apply excess nitrogen and two-thirds apply nitrogen below the optimum level [14]. There are similar regional imbalances in the use of Potassium (K) and Phosphorus (P). This further stresses the use of modern technology for the right mix of crops. In India, pests, including weeds, insect pests, diseases, nematodes, and rodents, have been found to reduce crop yield by 15 to 25%, causing a loss of 0.9 to 1.4 lakh crore rupees annually [15].
IoT as a solution: Crop yield is the measure of grain produced from a given plot of land. It is the most important factor in agriculture, as it measures the performance of the farmer and reflects in totality the efforts and resources invested in growing plants in the fields. Increasing crop yield is the main aim of every farmer, and one of the common ways to do so is to improve crop management, which includes preparation of soil, sowing of seeds, addition of manures and fertilizers, irrigation, protection from weeds, harvesting, and storage. These management decisions should be made efficiently to reduce losses and improve quality. Using IoT to control and monitor devices at the farm, collecting data from the sowing of seeds through harvesting, makes it easier to improve crop yield without wasting any resource.
IoT plays a very important role in smart agriculture, as IoT sensors are capable of providing information about agricultural fields. An IoT agricultural monitoring system uses sensor networks that collect data from sensors deployed at various nodes and send it over a wireless protocol. The primary data flow mechanism allows sensors to sense, store, present, evaluate, decide, and control by delivering real-time data feeds to a variety of gadgets, such as smartphones and tablets [16]. The main function of IoT gadgets is live monitoring of environmental data such as temperature and moisture, depending on the sensors integrated with them; farmers can then implement smart farming by receiving live data feeds on devices like smartphones and tablets. The data generated by the sensors can be easily shared with and viewed by agriculture consultants via cloud computing technology [17].
Various sensors used in agricultural IoT devices [15] to gather information are discussed below:
• Temperature Sensor: The DS18B20 temperature sensor [18] provides 9-bit to 12-bit Celsius temperature readings and has an alarm function with non-volatile, user-programmable upper and lower trigger points. Soil temperature typically varies between 0 and 40 °C, and the optimum average range required for plant growth is 20–30 °C [19]. Each DS18B20 has a 64-bit serial code, which allows multiple DS18B20s to operate on the same 1-wire bus (a minimal reading sketch follows this list).
• Soil Moisture Sensor: The soil moisture sensor has two large exposed conductors that function as probes, together acting as a variable resistor. When the water level in the soil is low, the conductivity between the electrodes is low, and thus the analog voltage is low; this analog voltage increases as the conductivity in the soil increases. In this way, soil moisture is detected by the sensor [17].
• Light Intensity Sensor: All crops react differently and have different physiologies for dealing with light intensity. Farmers therefore need to provide sufficient light, at least 8–10 h per day, for healthy growth. Smart farming techniques that include a sensor system to control light intensity can be a better option, as they are time-efficient. There are several types of light sensors, namely photoresistors, photodiodes, and phototransistors. They are used for automated light intensity monitoring, as they measure the light level in a growth chamber and increase or decrease the brightness of the light to maintain an accurate level [20].
The use of IoT in the Indian agricultural sector has been widely promoted by the
Government of India as well. The Government of India has introduced new schemes
to help Indian farmers in the advancement of Indian agriculture utilizing the concept
of smart farming. Some of the Government initiatives and schemes are described
below:
• Mobile apps
The Government of India has launched several mobile applications for farmers which
provide information on agriculture activities, free of cost, for the benefit of farmers
and other stakeholders [21].
Crop Cutting Experiments (CCE) Agri Mobile App: This app collects crop cutting experiment data and has the special quality of working in both online and offline modes. Internet access is required only for installing the app and for registration. After this, data can be added to the CCE app without internet, and when connectivity is available, the data can be pushed to the server [22].
Kisan Suvidha: This app provides information related to weather (humidity, temperature, etc.), market prices, plant protection techniques, cold stores, and godowns, and has an agro-advisory section that shows messages for farmers in different local languages. The app also connects farmers directly with the Kisan call center, where technical graduates answer farmers' queries [23].
• Agriculture events
Following are some of the events and projects organized by the Government of India
to promote smart farming.
Agri India Hackathon: The Agri India Hackathon is organized by Pusa Krishi, the ICAR-Indian Agricultural Research Institute (IARI), the Indian Council of Agricultural Research (ICAR), and the Department of Agriculture, Cooperation and Farmers' Welfare, Ministry of Agriculture and Farmers' Welfare. It is the largest virtual gathering for boosting advancements in agriculture. The Agri India Hackathon discussed precision farming, including the application of sensors, WSN, ICT, AI, IoT, and drones. Precision livestock and aquaculture are also goals of this initiative [24].
SENSAGRI Project for Drone-Based Agriculture Technology: SENSAGRI ("SENsor-based Smart AGRIculture") was formulated by the Indian Council of Agricultural Research (ICAR) through the Indian Agricultural Research Institute (IARI). The main objective is to develop an indigenous prototype of a drone-based crop and soil health monitoring system using hyperspectral remote sensing (HRS). The drone can smoothly scout over farm fields, gathering precise information and transmitting the data in real time. This will benefit the farming sector at the regional/local scale for assessing land and crop health, including the extent, type, and severity of damage, besides issuing forewarnings, supporting post-event management, and settling compensation under crop insurance schemes [25].
3 Results
In this section, we present and discuss the results from our analysis. We deduced the
following results after doing the statistical analysis on the datasets “gregory.cotton”
and “gumpertz.pepper” [9].
• The "gregory.cotton" dataset records a factorial experiment on cotton conducted in Sudan and includes 144 observations on the following six variables: yield (a numeric vector), year, nitrogen (nitrogen level), date (sowing date), water (irrigation amount), and spacing (spacing between plants). We analyzed the effect of water and nitrogen level on the yield of these crops, and the findings are explained below:
(a) Yield and water—Cotton yield is strongly dependent on the amount and frequency of irrigation water [26]. The effects on yield were studied at three irrigation levels: I1 = Light, I2 = Medium, and I3 = Heavy. The yield was found to be maximum under heavy irrigation, as depicted in Figs. 1 and 2.
(b) Nitrogen level and yield—Nitrogen is a crucial nutrient that plays an important role in the photosynthesis, growth, and yield of cotton crops [27]. The effects on yield were studied at two nitrogen levels: L = none/control and H = 600 rotls/feddan. The yield was found to be maximum under the H nitrogen level, as shown in Figs. 3 and 4.
Various other factors affect the presence of disease, but this can serve as an initial warning sign for the same.
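The kind of analysis reported above can be reproduced with a short Python sketch, assuming the gregory.cotton data has first been exported from R to CSV; the file name is an assumption.

import pandas as pd

# Assumes the dataset was exported from R beforehand, e.g.:
#   R> write.csv(agridat::gregory.cotton, "gregory_cotton.csv", row.names = FALSE)
df = pd.read_csv("gregory_cotton.csv")

# Mean yield per irrigation level (I1 = light, I2 = medium, I3 = heavy)
print(df.groupby("water")["yield"].mean())

# Mean yield per nitrogen level (L = none/control, H = 600 rotls/feddan)
print(df.groupby("nitrogen")["yield"].mean())

# Joint effect of irrigation and nitrogen on yield (cell means)
print(df.pivot_table(values="yield", index="water", columns="nitrogen"))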
4 Discussion
In this section, we discuss our findings on the merits of IoT for sustainable and advanced agriculture and conclude that IoT is the need of the hour for a good crop yield, though it brings some challenges.
Key findings from the agridat package datasets: Crop yield depends on irrigation, as depicted in Figs. 1 and 2, and if proper irrigation is not provided, crop production suffers. IoT devices such as soil sensors, combined with cloud-based data analytics, can monitor the water requirement of the soil and thereby allow farmers to determine when to irrigate their farms. This will not only help conserve water but also prevent over-irrigation, which can adversely affect yield.
To minimize losses and increase efficiency in cotton plants, Nitrogen (N) fertilizer should be applied as close as possible to the time it will be taken up by the plant, indicating that cotton requires varying amounts of N throughout its growth, as depicted in Figs. 3 and 4. By using smart devices, we can automate multiple processes across the production cycle, increasing yield efficiency; for example, smart devices can monitor the plant's requirements for nutrients in the soil.
If any plant was wilted, dead, or had lesions, phytophthora disease was considered present in the plot, as depicted in Fig. 5. IoT devices help in crop management, as they can monitor crop growth and detect anomalies early, effectively preventing diseases or infestations that could harm the yield.
Challenges for IoT in the Agriculture Sector: Farmers cannot take full advantage of this technology due to poor infrastructure. Internet accessibility is a problem for farms located in remote areas; in such cases, the monitoring systems these farmers use become unreliable and useless. The machinery used in implementing an IoT system is also costly: the sensors themselves are the least expensive component, but fitting out an agricultural field with the complete system is still too costly for many farmers.
5 Conclusion
Acknowledgements This research follows from the project work done as part of Summer Intern-
ship Programme (SIP) 2020–21 organized by Centre for Research, Maitreyi College, University of
Delhi.
References
1. Annual Report 2020. ICAR, Government of India, Ministry of Agriculture & Farmers Welfare.
https://icar.gov.in/sites/default/files/ICAR-AR-2020-English.pdf. Last accessed 12 Apr 2022
2. The emerging scope of agri-tech in India. https://www.investindia.gov.in/team-india-blogs/eme
rging-scope-agri-tech-india. Last accessed 12 Apr 2022
3. Balakrishna G, Nageshwara Rao M (2019) Study report on using IoT agriculture farm moni-
toring. Lect Notes Networks Syst 74:483–491. https://doi.org/10.1007/978-981-13-7082-3_
55
4. IFBN: hunger in India. https://www.indiafoodbanking.org/hunger. Last accessed 12 Apr 2022
5. D’souza G, Cyphers D, Phipps T (1993) Factors affecting the adoption of sustainable agri-
cultural practices. Agric Resour Econ Rev 22:159–165. https://doi.org/10.1017/s10682805000
04743
6. Salecha M (2022) Smart farming: IoT in agriculture. https://analyticsindiamag.com/smart-far
ming-iot-agriculture/. Last accessed 12 Apr 2022
7. Ministry of Electronic and Information Technology: IoT Policy Document. https://meity.gov.
in/sites/upload_files/dit/files/Draft-IoT-Policy%281%29.pdf. Last accessed 12 Apr 2022
8. RStudio: RStudio: Integrated development environment for R. www.rstudio.com. Last accessed
12 Apr 2022
9. Wright K (2022) “agridat”: agricultural datasets. R package version 1.20. https://cran.r-project.
org/package=agridat. Last accessed 12 Apr 2022
10. Effect of climate change on agriculture. Press Information Bureau Government of India
Ministry of Agriculture & Farmers Welfare. https://pib.gov.in/Pressreleaseshare.aspx?PRID=
1696468. Last accessed 12 Apr 2022
11. Krar P. Parts of Haryana have salty groundwater and rains add to the salt
content. https://economictimes.indiatimes.com/news/economy/agriculture/parts-of-haryana-
have-salty-groundwater-and-rains-add-to-the-salt-content/articleshow/71342070.cms
12. Soil health is degraded in most regions of India. https://www.livemint.com/news/india/-soil-health-is-degraded-in-most-regions-of-india-11595225689494.html. Last accessed 12 Apr 2022
13. Agrometeorology: relative humidity and plant growth. https://agritech.tnau.ac.in/agriculture/
agri_agrometeorology_relativehumidity.html. Last accessed 12 Apr 2022
14. Raising agricultural productivity and making farming remunerative for farmers. https://www.
niti.gov.in/sites/default/files/2019-08/RaisingAgriculturalProductivityandMakingFarmingR
emunerativeforFarmers.pdf. Last accessed 12 Apr 2022
15. Vennila S, Lokare R, Singh N, Ghadge SM, Chattopadhyay C (2022) Crop pest surveil-
lance and advisory project of Maharashtra. https://farmer.gov.in/imagedefault/handbooks/Boo
KLet/MAHARASHTRA/20160725144307_CROPSAP_Booklet_for_web.pdf. Last accessed
12 Apr 2022
16. Meera SN, Kathiresan C (2022) Internet of Things (IoT) in agriculture industries. https://aphrdi.ap.gov.in/documents/Trainings@APHRDI/2017/8_aug/IOT/ShaikNMeera1.pdf. Last accessed 12 Apr 2022
17. Nayyar A, Puri V (2017) Smart farming: IoT based smart sensors agriculture stick for live
temperature and moisture monitoring using Arduino, cloud computing & solar technology.
In: Communication and computing systems—proceedings of the international conference on
communication and computing systems, ICCCS 2016, pp 673–680. https://doi.org/10.1201/
9781315364094-121
18. DS18B20+T&R. https://in.element14.com/maxim-integrated-products/ds18b20-t-r/temper
ature-sensor-0-5deg-c-to/dp/2515605. Last accessed 12 Apr 2022
19. Aniley AA, Kumar N, Kumar A (2017) Soil temperature sensors in agriculture and the role of
nanomaterials in temperature sensors preparation. Int J Eng Manuf Sci 7:2249–3115
20. Lakhiar IA, Jianmin G, Syed TN, Chandio FA, Buttar NA, Qureshi WA (2018) Monitoring and
control systems in agriculture using intelligent sensor techniques: a review of the aeroponic
system. J Sensors 2018. https://doi.org/10.1155/2018/8672769
21. Mobile apps empowering farmers. https://www.manage.gov.in/publications/edigest/dec2017.
pdf. Last accessed 12 Apr 2022
22. Crop cutting experiment. http://krishi.maharashtra.gov.in/Site/Upload/Pdf/CCE_App_Tut
orial_Primary_Worker_Maharashtra.pdf. Last accessed 12 Apr 2022
23. Kisan Suvidha. http://mkisan.gov.in/aboutmkisan.aspx. Last accessed 12 Apr 2022
24. Agri India Hackathon. https://innovateindia.mygov.in/agriindiahackathon/. Last accessed 12
Apr 2022
25. Agricultural situation in India. https://eands.dacnet.nic.in/PDF/August2016.pdf. Last accessed
12 Apr 2022
26. Hunsaker DJ, Clemmens AJ, Fangmeier DD (1998) Cotton response to high frequency surface
irrigation. Agric Water Manag 37:55–74. https://doi.org/10.1016/S0378-3774(98)00036-5
27. Nitrogen fertility and abiotic stresses management in cotton crop: a review
28. Bowers JH (1990) Effect of soil-water matric potential and periodic flooding on mortality
of pepper caused by Phytophthora capsici. Phytopathology 80:1447. https://doi.org/10.1094/
phyto-80-1447
ELM-Based Liver Disease Prediction
Model
1 Introduction
The word 'Hepar' is Greek for the liver, the largest gland in the human body. It is a cone-like structure that sits on top of the stomach, protected by the rib cage. As a vital digestive organ, it is necessary to maintain a healthy liver; a healthy liver is key to a healthy body. Unfortunately, changes in lifestyle and dietary patterns, involving the intake of junk and canned food, tend to impact the liver and cause it to lose its ability to work efficiently.
C. Agarwal (B)
Ajay Kumar Garg Engineering College, Dr. A.P.J. Abdul Kalam Technical University, Ghaziabad,
Uttar Pradesh, India
e-mail: agarwalcharu@akgec.ac.in
G. Singh
KIET Group of Institutions, Dr. A.P.J. Abdul Kalam Technical University, Ghaziabad, Uttar
Pradesh, India
A. Mishra
Deendayal Upadhyay College, University of Delhi, Delhi, India
Liver disease can be classified into four stages: inflammation, fibrosis, cirrhosis, and liver failure [1]. Inflammation, also known as hepatitis, is the initial stage, in which the tissue tends to swell. Viral hepatitis is of five types, i.e., A, B, C, D, and E. The second stage is fibrosis, in which mild scarring of tissue starts to appear. The third stage is cirrhosis, in which late-stage, permanent scarring of liver tissue occurs. Cirrhosis can be further classified into compensated and decompensated stages: compensated cirrhosis is asymptomatic, whereas decompensated cirrhosis is symptomatic and the liver is unable to function well. Liver failure, the last stage, is life-threatening, and the only available treatment is an expensive liver transplant.
Rapid recognition of liver illness improves a person's ability to live a healthy life. A Liver Function Test (LFT) and imaging can be done to diagnose the disease. A blood sample is collected and analyzed, and a report is generated that includes parameters such as SGPT, SGOT, and total bilirubin. Based on these parameters, the hepatologist prescribes medication and precautionary measures to treat the individual.
Liver disease is the tenth leading cause of death in India and the second major cause of death in the USA. It has been found that approximately two million people die every year due to one liver illness or another [2].
From the above facts, it can be inferred that manual analysis by hepatologists is a tedious, difficult, and error-prone task. To help the medical community, fully automated analytical systems can be built using a variety of advanced technologies to deliver efficient and accurate results. Various machine learning algorithms can be used to develop such models, and many researchers have worked on their development, as described below.
Geetha and Arunachalam [3] used the ILPD dataset to evaluate the effectiveness of SVM and LR algorithms in diagnosing liver disease. The authors examined SVM and LR for accuracy, precision, sensitivity, and specificity and found that SVM achieved higher accuracy (75.04%) than LR (73.23%).
Various researchers have examined different machine learning classification algorithms for liver disease prediction [4–10], applying algorithms such as SVM, LR, KNN, RF, and DT and computing classification accuracy. After a thorough study of the literature, it is clear that there is a need for computer-based models that can predict liver disease more accurately with less computational effort.
In this paper, we use an extreme learning machine (ELM) as the classifier to propose a more computationally efficient and accurate machine learning-based model for liver disease prediction. ELM is a fast single-hidden-layer feedforward neural network with good generalization capabilities. ELM has already been used successfully for various other classification tasks such as ECG classification [11], fingerprint recognition [12], watermarking [13], and face mask detection [14]. The proposed model is trained and tested on the ILPD dataset. We also examined ELM performance using different activation functions. Activation functions help the network model complex data by taking the input from the previous layer and transforming it into a format used as input to the next layer; ELM applies a nonlinear activation function in the hidden layer and then solves a linear system for the output weights. The activation functions used in this model are Sigmoid, Relu, Leaky_Relu, Tanh, Sin, TanHRe, and Swish, with different numbers of hidden neurons (8, 16, 32, 64, 128, 512, and 1024). The main contribution of the current experimental work is to explore the possibility of detecting liver disease using the ELM algorithm. This paper is organized as follows. Section 2 presents the mathematical formulation of ELM and its activation functions. Section 3 presents details of the ILPD dataset. Section 4 presents the proposed methodology. Section 5 presents results and discussion. Finally, Sect. 6 concludes the work.
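As a concrete illustration of the formulation just described, the following is a minimal NumPy sketch of an ELM for a binary 0/1 target: random, untrained hidden-layer weights, a nonlinear activation, and output weights obtained analytically via the pseudoinverse. It is an illustrative sketch, not the authors' exact implementation.

import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM for binary classification."""

    def __init__(self, n_hidden=32, activation=None, seed=0):
        self.n_hidden = n_hidden
        # Default activation: sigmoid, one of the functions compared in this paper.
        self.activation = activation or (lambda z: 1.0 / (1.0 + np.exp(-z)))
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Nonlinear transform of the input by the (fixed) random hidden layer.
        return self.activation(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer weights and biases are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights solve the linear system H @ beta = y (least squares).
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        # Threshold the real-valued output at 0.5 for 0/1 labels.
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)

Because only the output weights are computed, and analytically at that, training is fast, which matches the sub-second training times reported later.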
3 Dataset
For our study, we have used the Indian Liver Patient Dataset (ILPD) [17] from the Kaggle repository, as it is the only such dataset freely available to the research fraternity.
This dataset has 11 attributes. This is a standard dataset which is used for liver disease
prediction by many researchers [18, 19]. This dataset contains 583 records of which
167 were records of healthy patients and 416 were suffering from liver disease.
4 Proposed Methodology
We compile the performance parameter values for different numbers of hidden neurons and activation functions. The present experiment is carried out in three categories, as mentioned below:
In the first experiment, we considered 80% of the data for training the ELM and 20% for testing. Figure 2 depicts the accuracies of the various activation functions with respect to different numbers of hidden neurons. The Relu activation function with 32 neurons and TanHRe with 128 neurons give a maximum accuracy of 77.77%. An accuracy of 77.19% is obtained using 512, 512, and 32 neurons with Sigmoid, Leaky_Relu, and Swish, respectively.
The precision score, recall, and F1-score for all the activation functions for
different hidden neurons for the 80:20 data split are shown in Tables 1, 2, and 3,
respectively.
We also analyzed the model based on training time taken by ELM using different
activation functions and concluded that training time is less than 1 s for all the cases
for an 80:20 data split.
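As a worked illustration of this evaluation protocol, the following sketch reproduces the metric computation with scikit-learn. It assumes the minimal ELM class sketched earlier, and the file and column names follow the common Kaggle ILPD release; all of these are assumptions rather than the authors' actual code.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# File and column names follow the common Kaggle ILPD release (assumptions).
df = pd.read_csv("indian_liver_patient.csv").dropna()
df["Gender"] = (df["Gender"] == "Male").astype(int)  # encode the categorical field
y = (df["Dataset"] == 1).astype(int).to_numpy()      # 1 = liver patient
X = df.drop(columns="Dataset").to_numpy(dtype=float)

# 80:20 split, as in the first experiment; use test_size=0.3 for the 70:30 split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

for n in (8, 16, 32, 64, 128, 512, 1024):
    pred = ELM(n_hidden=n).fit(X_tr, y_tr).predict(X_te)
    print(n, accuracy_score(y_te, pred),
          precision_score(y_te, pred, zero_division=0),
          recall_score(y_te, pred), f1_score(y_te, pred))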
In the second experiment, we considered 70% of the data for training the ELM and 30% for testing. Figure 3 depicts the accuracies of the various activation functions with respect to different numbers of hidden neurons.
The highest accuracy for the 70:30 data split is 78.36% shown by the sigmoid
function for 32 hidden neurons. The precision score, recall, and F1-score for all the
activation functions for different hidden neurons for the 70:30 data split are tabulated
in Tables 4, 5, and 6, respectively.
For the 70:30 data split, training time was also analyzed and concluded that
training time is less than 1 s for all the cases.
Table 7 showcases the comparison of our experimental results with the results of
work done by other authors based on accuracy.
From the table above, we can see that the proposed methodology based on the ELM classifier achieves an accuracy of 78.36%. This is the highest accuracy achieved compared to other published studies in the same field. Therefore, we can conclude that liver disease detection models designed with ELM as a classifier are well suited for prediction and can be used in the healthcare field. The current work can be further extended by applying the proposed model to other datasets as well.
6 Conclusion
As we know, early detection of liver disease can help a person live longer. Manual analysis of liver disease is a laborious task, so medical departments can use machine learning models to predict liver disease. In this paper, we used an Extreme Learning Machine (ELM) classifier to build a liver disease prediction model. It is a fast learning algorithm compared to other neural networks because it computes its output weights analytically instead of using backpropagation. ELM is generally preferred over other methods for AI-related challenges due to its high speed, good generalization, and ease of implementation. Our work comprises training the ELM classifier using various activation functions and analyzing the performance of the model with different numbers of neurons on the ILPD dataset, first with an 80:20 training:testing data ratio, then with a 70:30 ratio, and lastly comparing with other authors' work. We conclude that the highest accuracy, 78.36%, was obtained by the sigmoid activation function with 32 neurons for a data split of 70:30.
References
1. https://www.healthline.com/health/liver-failure-stages
2. Asrani SK, Devarbhavi H, Eaton J, Kamath PS (2019) Burden of liver diseases in the world. J
Hepatol 70(1):151–171
3. Geetha C, Arunachalam A (2021) Evaluation based approaches for liver disease prediction
using machine learning algorithms. In: International conference on computer communication
and informatics (ICCCI), pp 1–4
4. Thirunavukkarasu K, Singh AS, Irfan M, Chowdhury A (2018) Prediction of liver disease using
classification algorithms. In: 4th International conference on computing communication and
automation (ICCCA), pp 1–3
5. Nahar N, Ara F (2018) Liver disease prediction by using different decision tree techniques. Int
J Data Min Knowl Manage Process:1–9
6. Kumar S, Katyal S (2018) Effective analysis and diagnosis of liver disorder by data mining.
In: International conference on inventive research in computing applications (ICIRCA), pp
1047–1051
7. Hashem S et al (2018) Comparison of machine learning approaches for prediction of advanced
liver fibrosis in chronic Hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinf 15(3):861–
868
8. Sontakke S, Lohokare J, Dani R (2017) Diagnosis of liver diseases using machine learning. In:
International conference on emerging trends & ınnovation in ICT (ICEI), pp 129–133
9. Alfisahrin SNN, Mantoro T (2013) Data mining techniques for optimization of liver disease
classification. In: International conference on advanced computer science applications and
technologies, pp 379–384
10. Gogi VJ, Vijayalakshmi MN (2018) Prognosis of liver disease: using machine learning
algorithms. In: International conference on recent ınnovations in electrical, electronics &
communication engineering (ICRIEECE), pp 875–879
11. Kim J, Shin H, Shin K, Lee M (2009) Robust algorithm for arrhythmia classification in ECG using extreme learning machine. BioMedical Engineering OnLine
12. Yang J, Xie S, Yoon S, Park D, Fang Z, Yang S (2013) Fingerprint matching based on extreme learning machine. Neural Comput Appl:435–445
13. Mishra A, Agarwal C, Chetty G (2018) Lifting wavelet transform based fast watermarking of
video summaries using extreme learning machine. In: 2018 International joint conference on
neural networks (IJCNN), Rio de Janeiro, Brazil, pp 1–7. https://doi.org/10.1109/IJCNN.2018.
8489305
14. Agarwal C, Itondia P, Mishra A (2023) A novel DCNN-ELM hybrid framework for face mask detection. Intell Syst Appl 17:200175. https://doi.org/10.1016/j.iswa.2022.200175
15. Zhang R, Lan Y, Huang G-B, Xu Z-B (2012) Universal approximation of extreme learning
machine with adaptive growth of hidden nodes. IEEE Trans Neural Netw Learn Syst 23(2):365–
371
16. Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
17. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng in
Med and Biol 20(3):45–50
18. Singh G, Agarwal C (2023) Prediction and analysis of liver disease using extreme learning machine. In: Shakya S, Du KL, Ntalianis K (eds) Sentiment analysis and deep learning. Advances in intelligent systems and computing, vol 1432. Springer, Singapore. https://doi.org/10.1007/978-981-19-5443-6_52
19. Singh G, Agarwal C, Gupta S (2022) Detection of liver disease using machine learning
techniques: a systematic survey. https://doi.org/10.1007/978-3-031-07012-9_4
20. Grandini M, Bagli E, Visani G (2020) Metrics for multi-class classification: an overview. ArXiv
abs/2008.05756
Intercompatibility of IoT Devices Using
Matter: Next-Generation IoT
Connectivity Protocol
Sharat Singh
Abstract The market for IoT devices is massive, with hundreds of companies devel-
oping a variety of IoT devices; however, due to different methods and software tech-
nologies used to develop these products, all of these devices do not necessarily work
together in a seamless manner. The Connectivity Standard Alliance (CSA) came up
with the concept and created “Matter,” an open standard for all Internet of Things
(IoT) devices. This serves as a universal connectivity standard, making it easier to
use and manage IoT devices. This paper examines the implementation, necessity,
and impact of this new protocol as the next generation of IoT connectivity protocol.
1 Introduction
S. Singh (B)
Department of Electronics, Deen Dayal Upadhyaya College, University of Delhi, New
Delhi 110078, India
e-mail: therealsharat@ieee.org
At present, the following approaches can make different, natively incompatible devices work together.
• Home automation hubs: One popular approach is to use a central hub that connects
to all of your smart devices and allows you to control them from a single app or
interface [3]. Examples of popular home automation hubs include Amazon Echo,
Google Home, and Apple HomeKit.
• APIs: Many smart devices come with APIs that allow developers to interact with
them programmatically. This means that one can use code to connect different
devices together and create custom automation. For example, you could use an
API to connect a smart thermostat to a smart lighting system, so that when the
temperature drops below a certain level, the lights automatically turn on.
• If–This–Then–That (IFTTT): IFTTT is a web-based service that allows us to
create custom “applets” that connect different devices and services together [4].
For example, you could create an applet that automatically turns off your smart
lights when you leave your home, as determined by your phone’s location.
• Zigbee and Z-Wave: Zigbee and Z-Wave are wireless communication protocols
specifically designed for home automation [5, 6]. These protocols allow devices to
communicate with one another, making it possible to create complex automation
and control all devices from a central hub.
The standard for managing and controlling smart home IoT devices is the smart home ecosystem, such as Apple HomeKit, Google Home, Amazon Alexa, and Samsung SmartThings. These ecosystems connect, consolidate, group, and manage smart devices with great ease thanks to agreements with manufacturers on a common protocol and the development guides provided by each ecosystem.
Each ecosystem needs its own application, and every smart device needs a device application of its own, built against a particular ecosystem for connectivity and management. The same physical device may therefore require different applications depending on the ecosystem, and a device from one ecosystem cannot be detected by or connected to another ecosystem.
Fig. 1 Setup of different smart home ecosystems, each with its own set of compatible smart devices
2.3 Security
Security is a critical concern in IoT because these devices often handle sensitive data and may be used to control critical systems.
There are a number of important factors to take into account when securing IoT systems. Because there is no local, secure network, using the cloud to store data from IoT devices and to perform smart analysis and value-added services is unavoidable, which raises data security concerns: IoT devices frequently gather and transmit private data, such as identifying information or system control information, and the transmission and storage of this data must be protected.
3 What is Matter
The Matter open standard (formerly known as Project CHIP) is an open, royalty-free networking protocol designed for low-power, low-bandwidth devices in the Internet of Things (IoT) [7]. It is based on the Internet Protocol (IPv6) and is designed to be simple, secure, and scalable, enabling devices to connect and communicate with each other and with the cloud.
By using a mesh networking architecture, the Matter open standard eliminates the need for a central hub or server and enables direct device-to-device communication. This provides greater reliability, because devices can still communicate even if some of them are offline or out of range.
The protocol is intended to be low power and low bandwidth, making it suitable
for use in battery-powered devices and devices with constrained resources.
In addition to its technical capabilities, the Matter Open Standard is intended to
be open and interoperable, enabling seamless communication between devices made
by various manufacturers. Additionally, it is supported by a sizable and expanding
ecosystem of businesses and developers, ensuring that it will proceed to develop and
advance over time.
3.1 Architecture
Matter is not a lower-layer protocol like WiFi, ZigBee, or Thread; it is an application layer that acts as a standard. The application layer was developed in accordance with Matter guidelines by the Connectivity Standard Alliance (formerly known as the Zigbee Alliance), building on previously successful ecosystem technologies such as Google Home and Apple HomeKit (Fig. 2).
It is constructed using the IPv6 architecture and currently only supports WiFi,
Thread, and Ethernet, with Thread aimed at low-power, resource-constrained IoT
devices like sensors, locks, etc. [9], whereas WiFi is best for high-bandwidth, active
powered smart appliances like cameras, smart hubs, etc.
All devices must onboard this application layer to be Matter-certified.
3.2 Security
Matter is an open standard for the Internet of Things (IoT) that aims to make it easier to connect and control smart devices. It is designed to be secure, scalable, and, most importantly, to communicate locally. This means that Matter-certified smart devices do not need to upload or share any device data to the cloud, except for add-ons or features provided by the manufacturer that require the cloud. This functionality is similar to that of Apple HomeKit.
One of the key security features of Matter is the use of secure communication
protocols. The standard defines a set of mandatory and optional security protocols
that devices must implement in order to be Matter-compliant. These protocols include
transport layer security (TLS) for encryption, as well as secure key exchange and
device authentication mechanisms.
Another important aspect of Matter’s security is the concept of a “security
domain,” which is a group of devices that share the same security credentials and
are trusted to communicate with one another. This allows for secure communica-
tion between devices within a security domain while preventing unauthorized access
from outside the domain.
Matter also includes a mechanism for device provisioning, which is the process of
securely onboarding new devices to a network. This includes securely provisioning
the device’s initial credentials using Bluetooth Low Energy (BLE) [10], as well as
any subsequent updates to those credentials.
Another feature of the standard is its support for automatic software and firmware updates, which can help prevent device vulnerabilities from being exploited.
Overall, Matter aims to provide a robust and secure foundation for IoT devices
to communicate and interact with one another. However, it is important to note that
security is a continuous process and implementing standard alone is not enough. A
good security practice includes regular software updates, monitoring of the device,
and an incident response plan.
• Commissioning Devices: Commissioning Devices are devices that help add new
devices to the network and manage network security. They are typically used to
securely provision new End Devices with network keys and also to manage the
security of the network.
• Sleepy End Devices: Sleepy End Devices are low-power End Devices that spend
most of their time in a low-power sleep state and wake up periodically to commu-
nicate with the network. They are designed to be highly power-efficient and are
often used in battery-powered applications where long battery life is critical.
• Intermediate Devices: Intermediate Devices are devices that support the Thread
protocol but do not fully implement all its features. They are typically used to
provide additional functionality or to act as bridges between different types of
networks.
• Network Co-processor (NCP): An NCP is a device that provides an IP interface
to a Thread network, enabling other devices to communicate with the network
without implementing the Thread protocol themselves. NCPs are typically used
in more complex systems where the main processor does not have the resources to
handle the Thread protocol directly. They provide a convenient way to add Thread
capability to existing devices and can also help to reduce the power consumption
of the main processor by offloading some of the network processing.
Figure 3 [14] shows the working of a Thread-based mesh network using the most prominent types of devices found in a common smart home environment.
Thread also supports over-the-air updates, which enable devices to update their firmware or software automatically and thereby improve the security and stability of the network.
When a device joins a Thread network, it first goes through a process called
“commissioning” to establish secure communication with the Thread Routers. Once
a device has been commissioned, it can participate in the network and communicate
with other devices on the network.
To sum up, Matter and Thread work together to provide a secure and reliable
networking solution for IoT devices. The Matter standard provides a set of rules for
how devices can interact with one another, while Thread provides the underlying
communication infrastructure to make those interactions possible.
5 Outcome
Matter provides a solution for IoT devices to increase the compatibility of smart
appliances belonging to multiple ecosystems and also simplifies development for
manufacturers. This also means that a company has to develop and focus on one
common “standard.” Additionally, the market for Matter-certified IoT devices will
grow, which will reduce user confusion because all devices will be interoperable and
able to be controlled by a single common application regardless of the supported
ecosystem.
As can be seen from Fig. 4, all smart devices are compatible with any smart home
ecosystem, thus making all devices work seamlessly together over a suite of software
applications.
The Matter IoT Protocol, formerly known as Project CHIP (Connected Home over
IP), has the potential to bring a new level of interoperability and security to the
Internet of Things (IoT) industry.
The launch of Matter is considered to be a huge step forward for IoT, especially in
the consumer market. Cross-platform smart device integration and interoperability
will be possible, making the selection of smart devices easier and more convenient
for IoT customers all over the world, and will increase the range of features that can
be integrated on a smart device.
Matter will also enable the creation of smarter and more efficient ecosystems over a much larger geographic region, such as a smart security manager for residential societies, with smart smoke, gas, and motion sensors spread across the area using Thread, and actively powered devices like cameras and security gates on WiFi/Ethernet, all integrated for residents.
Fig. 4 Setup of different smart home ecosystems with Matter-certified smart devices
Here are a few possible future implementations of the Matter IoT protocol:
• Smart Homes: The Matter IoT protocol could become the backbone for smart
homes, enabling seamless integration of different smart devices from different
manufacturers. This would allow homeowners to easily control and automate
their homes, regardless of the brand of their devices.
• Industrial IoT: The use of Matter in industrial settings could enable the integration of industrial equipment and machinery from various manufacturers. As a result, industrial systems would operate more effectively and safely overall.
• Health care: Matter could be used in health care, providing a secure and reliable
way for different medical devices and sensors to communicate with each other.
This would improve patient care and make it easier for healthcare professionals
to access and analyze patient data.
• Automotive: The automotive sector could use the Matter IoT protocol to integrate various in-car devices from different manufacturers. The overall driving experience would be enhanced, and drivers' access to and control over vehicle data would become simpler and remotely accessible as well.
• Agricultural IoT: The use of Matter in agriculture could enable the integration of various sensors and devices for crop management and livestock monitoring. This would increase agriculture's productivity and sustainability while also assisting farmers in making better decisions with their devices and smart system environments.
These are just a few examples of the potential future implementations of the
Matter IoT protocol. With its focus on security, interoperability, and open standards,
the Matter IoT protocol has the potential to revolutionize the IoT industry and bring
new levels of efficiency and convenience to people’s lives.
Acknowledgements The author would like to thank Anshuman Singh (Roll Number: 20HEL2111)
from the Department of Electronics, Deen Dayal Upadhyaya College, University of Delhi, for his
contribution to designing the figures presented in this paper.
References
1. Internet of things (IoT) market growth, trends, COVID-19 impact, and forecasts (2022–2027). Available: https://www.mordorintelligence.com/industry-reports/internet-of-things-iot-market. Accessed 15 Dec 2022
2. Ashton K. That 'internet of things' thing. Available: http://www.rfidjournal.com/articles/view?4986. Accessed 15 Dec 2022
3. Setz B, Graef S, Ivanova D, Tiessen A, Aiello M. A comparison of open-source home automation systems. https://doi.org/10.1109/ACCESS.2021.3136025
4. Ovadia S. Automate the internet with “if this then that” (IFTTT). https://doi.org/10.1080/016
39269.2014.964593
5. Ergen SC (2004) ZigBee/IEEE 802.15. Available: https://pages.cs.wisc.edu/~suman/courses/
707/papers/zigbee.pdf. Accessed 10 Jan 2023
6. Unwala I, Taqvi Z, Lu J (2018, April) IoT security: ZWave and thread. In: 2018 IEEE green
technologies conference (GreenTech). IEEE, pp 176–182. https://doi.org/10.1109/GreenTech.
2018.00040
7. Thread applications. Available https://www.threadgroup.org/BUILT-FOR-IOT/Smart-Home#
Application. Accessed 10 Jan 2023
8. Matter security and privacy fundamentals connectivity standards alliance documen-
tation. Available: https://csa-iot.org/wp-content/uploads/2022/03/Matter_Security_and_Pri
vacy_WP_March-2022.pdf. Accessed 10 Jan 2023
9. What is thread? Available https://www.threadgroup.org/What-is-Thread/Thread-Benefits.
Accessed 10 Jan 2023
10. How thread can work seamlessly with Bluetooth for commissioning and operation. Available:
https://www.threadgroup.org/news-events/blog/ID/196. Accessed 10 Jan 2023
11. Unwala I, Taqvi Z (2018) Thread: an IoT protocol. In: IEEE green technologies conference.
https://doi.org/10.1109/GreenTech.2018.00037
12. Kim HS, Kumar S, Culler DE (2019) Thread/OpenThread: a compromise in low-power wireless multihop network architecture for the Internet of Things. IEEE Commun Mag. https://doi.org/10.1109/MCOM.2019.1800788
13. Gregersen C. An expert guide to the thread and matter protocols in IoT. Available: https://
nabto.com/matter-thread-protocols-iot/. Accessed 10 Jan 2023
14. Thread in homes. Available https://www.threadgroup.org/BUILT-FOR-IOT/Smart-Home.
Accessed on 4 Feb 2023
Role of Node Centrality for Information
Dissemination in Delhi Metro Network
Kirti Jain, Harsh Bamotra, Sakshi Garg, Sharanjit Kaur, and Gunjan Rani
Abstract The topological structure of a network and the role of node centrality need to be investigated and evaluated for information dissemination related to transport services and their functionalities, for promoting government policies and products, etc., with vast reachability. Additionally, it is important to study network reliability in extreme situations like halting, power failure, or overcrowding at a node. This paper investigates the role of three standard measures of node centrality for information dissemination in the Delhi Metro Network. We simulate the process of information diffusion using the Susceptible-Infected-Removed (SIR) spread model through seed nodes identified as vital metro stations. Not only are the identified central stations ideal for advertising products and disseminating vital information, but they can also lead to chaotic situations that disrupt the metro's functionality.
1 Introduction
With the rapid development of the urban metro network, especially in Delhi, the
spatial structure of the metro network has evolved gradually from a simple line
crossing and grid to complex patterns [1]. To accommodate the increasing population
and congestion, metro infrastructure requires maintenance and extension [2]. The
metro network is not only a solution to traffic congestion but also a potent medium
for information dissemination to popularize vital government policies, products,
K. Jain
Department of Computer Science, University of Delhi, New Delhi, Delhi, India
e-mail: kjain1@cs.du.ac.in
H. Bamotra · S. Garg · S. Kaur · G. Rani (B)
Acharya Narendra Dev College, University of Delhi, New Delhi, Delhi, India
e-mail: gunjanrani@andc.du.ac.in
schemes, etc., among a vast fraction of the population through commuters [3]. Metro
systems, even though useful for spreading information, are also prone to power
failures, natural disasters, accidents, and malicious attacks, which entail appropriate
measures to guarantee their safety and reliability.
Complex network theory has recently gained popularity in ecology, finance, social
networks, social sciences, transport systems, etc., for its capabilities to model com-
plex relationships [4]. We make use of network theory to model the complex network
of the Delhi metro, in which a node represents a metro station, and an edge between
two nodes marks a direct route between two stations. Commuters make use of the
metro to reach different destinations and meet people along the way. They propagate
the perceived information along the way and fuel the information depending upon the
topology of the network and the positioning of the nodes (stations) within a network
[1].
Information prevalence depends on the source nodes, which need to be care-
fully selected. Several studies have been conducted to identify the central nodes that
maximize the spread of information [5–7]. Quantifying node importance through
centrality provides a means to rank nodes based on their significance and influence
on others [8]. There exist well-known measures such as degree centrality, between-
ness centrality, closeness centrality, and eigenvector centrality that identify central
nodes considering the topological structure of the network [9, 10].
Existing studies on the metro network focus on its topological characteristics and
its evolution [4, 11]. Recently, Kanwar et al. carried out a complex network-based
comparative analysis between the operational Delhi Metro Network (DMop) and the
extended Delhi Metro Network (DMext). They found similar degree distributions
for both networks [12]. Although an increase in local connectivity in DMext seems
efficient for tackling congestion and managing higher transport loads, it comes at
the cost of increased vulnerability. Another case study on the metro transit system
of Shenzhen City explored the effectiveness of a node using an entropy-based multi-
measure metric [13].
Motivated by this, we investigate information diffusion in the Delhi Metro Network
using different centrality measures to understand the pace of information spread and
its disruption under the failure of central nodes.
We construct a simple, undirected, and unweighted network for the Delhi metro
system (referred to as DMN) to investigate the following:
(i) Determine the network model for the constructed DMN based on its topological
properties (Sect. 3.1).
(ii) Identify and investigate important metro stations pivotal for information dis-
semination using three centrality measures (Sects. 4.1 and 4.2).
(iii) Report effective centrality measure for the transport network (Sect. 4.3).
3 Methodology
The Delhi metro rail consists of 10 color-coded lines serving 230 stations in its current operational state. It is by far the largest and busiest metro rail system in
India and the second oldest after the Kolkata Metro. The station and track lines are
the basic elements of the metro network, and the stations are connected via tracks,
resulting in a complex network.
The Delhi Metro Network (DMN) is modeled as an undirected and unweighted
network and is represented as G = (V, E), where V is a set of nodes (metro stations),
and E is a set of edges (metro tracks) connecting two successive metro stations.
DMN is a sparse network with 244 links between 230 stations; its visualization, plotted using the PyViz tool, is shown in Fig. 1. Note that the exact placement and positioning of the stations are not taken into account. Basic topological features of DMN are given in Table 1, along with its degree distribution plot in Fig. 2.
Fig. 1 Graph representing the Delhi Metro Network (DMN) using PyViz
The average clustering coefficient of DMN is 0.0, which affirms that neighboring stations are not interconnected. A high average shortest path length between nodes, a small clustering coefficient, and a straight line on the log-log plot of the truncated degree distribution make the constructed network a candidate for a scale-free network rather than a random or small-world network.
4 Results
In this section, we describe the experiment settings and observations covering top-
ranking stations (nodes), the analysis of information diffusion through prominent
nodes, and finally, changes in the process of information diffusion after the removal
of significant nodes.
We create the network using the NetworkX library and implement the SIR model using the NdLib package in Python (64-bit, version 3.7.2). Programs are executed on an Intel(R) Core(TM) i7 CPU @ 1.80 GHz with 16 GB RAM. In the rest of the paper, unless otherwise specified, transmission probability β = 0.5 and recovery rate γ = 0.1 are used for all diffusion simulations (Sect. 3.2). The reported results are averaged over 20 simulations.
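A single SIR run under these settings can be sketched with NdLib as follows; this is an illustrative sketch consistent with the stated parameters, not the authors' actual script.

import ndlib.models.epidemics as ep
import ndlib.models.ModelConfig as mc

def sir_reach(G, seed_station, beta=0.5, gamma=0.1, steps=50):
    """Run one SIR simulation seeded at a single station and return
    the number of stations ever informed (infected + removed)."""
    model = ep.SIRModel(G)
    cfg = mc.Configuration()
    cfg.add_model_parameter("beta", beta)    # transmission probability
    cfg.add_model_parameter("gamma", gamma)  # recovery (forgetting) rate
    cfg.add_model_initial_configuration("Infected", [seed_station])
    model.set_initial_status(cfg)
    final = model.iteration_bunch(steps)[-1]["node_count"]  # {0: S, 1: I, 2: R}
    return final[1] + final[2]

# Average over 20 runs, as done for the reported results:
# reach = sum(sir_reach(G, "Rajiv Chowk") for _ in range(20)) / 20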
Since metro stations are interdependent, information originating from one station reaches the whole system due to a cascading effect. Identifying a source node using centrality is based on the premise that a central station transmits information faster to other stations because of its position in the network. In this experiment, we systematically assess seed (source) node selection with respect to its impact on the speed and reach of information diffusion.
Fig. 3 Number of stations informed when information disseminates from the most central station with varying diffusion parameters
Figure 3 shows the number of stations informed when the diffusion process begins at the most central station (Rank 1) as the seed node based on BC, CC, and EVC (Table 2). Note that Rajiv Chowk is the most central station identified by CC and EVC, whereas Kashmere Gate is the top-ranked node by BC. We also vary the diffusion parameters, β ∈ {0.5, 0.8} and γ ∈ {0.1, 0.3}, to understand the variation in information spread originating from the same seed node.
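The three rankings reported in Table 2 can be computed directly with NetworkX; a sketch, assuming G is the DMN graph built earlier:

import networkx as nx

bc = nx.betweenness_centrality(G)
cc = nx.closeness_centrality(G)
evc = nx.eigenvector_centrality(G, max_iter=1000)  # iterate until convergence

for name, scores in [("BC", bc), ("CC", cc), ("EVC", evc)]:
    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    print(name, "top-5 stations:", top5)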
It is evident from Fig. 3 that information dissemination starting at Rajiv Chowk leads to the highest final reach, with the maximum number of informed stations. In contrast, the spread from Kashmere Gate reaches relatively fewer stations. This observation also holds across the varying diffusion parameters (β and γ), with distinct numbers of informed stations. A high value of β results in higher information dissipation among stations, whereas a high value of γ eliminates the received information from a station, resulting in fewer informed stations. The inset table in Fig. 3 shows the total number of stations informed for the two central stations, Kashmere Gate and Rajiv Chowk, for varying diffusion parameters.
We also diffused information from the fifth-ranked node and compared the maximum number of informed stations (M) with that of the first-ranked station to affirm the importance of the latter (Fig. 4). The higher spread from the top-ranked nodes compared to the lower-ranked stations, viz. Rajouri Garden (BC), Mandi House (CC), and Patel Chowk (EVC), endorses this claim.
It is revealed that diffusion from stations with a high CC and EVC score signifi-
cantly contributes to the spread of information. Moreover, the speed of diffusion and
recovery from information are two opposing forces that are decisive for information spread. High information dispersion among stations with low eradication accelerates the spread. Hence, identifying the top central node is crucial for maintaining the overall prevalence of the information.
Fig. 4 Most informed stations (M) at a time for ranked first (R1) versus fifth (R5) stations
Since the centrality of the seed node is a key factor in how information spreads through a network, we examine the relationship between the centrality score and the number of informed stations.
Information is diffused from each station with a centrality score (CS), and the
maximum (M) and total number (T) of stations informed are noted. The top row
in Fig. 5 shows a relationship between M and CS. The bottom row displays the
dependency between T and CS, with each subfigure corresponding to a centrality
metric. It is clear from the scatter plots that both M and T escalate with an increase
in centrality score. We also plotted a regression line to analyze the relationship
and observed that the CC score has a strong linear relationship with both M and T
(R² = 0.83 and R² = 0.77, respectively).
The stated observations are in tandem with earlier results showing that central nodes promote the spread of information and that closeness centrality is an effective measure for information propagation.
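The R² values can be obtained with an ordinary least-squares fit; a minimal sketch, where cs and informed are illustrative arrays holding each seed station's centrality score and its M (or T) count:

import numpy as np
from scipy.stats import linregress

cs = np.array([0.10, 0.15, 0.22, 0.31])   # illustrative centrality scores
informed = np.array([40, 55, 70, 95])      # illustrative informed-station counts

fit = linregress(cs, informed)             # ordinary least-squares regression
print("slope:", fit.slope, "R^2:", fit.rvalue ** 2)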
Fig. 5 Relationship of centrality scores (BC, CC, and EVC) with the maximum stations informed (M) and the total number of stations informed (T)
If a station becomes dysfunctional, its load cascades through the network; hence, it is important to study the effect of station failure in this scenario. We removed the most prominent station for each of BC, CC, and EVC sequentially from the network and recomputed the topological characteristics of the updated network and the top-ranked nodes by the three centrality measures (Table 3). Diffusion is then simulated from the newly identified central node in the DMN, and comparative results for M are shown for the three centrality metrics in Fig. 6.
Results show that the number of informed stations (M) is reduced in the absence of the top-ranked station for all centrality measures, asserting its importance in maintaining maximum functionality within the network.
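A sketch of this removal experiment, continuing the NetworkX snippets above:

H = G.copy()
H.remove_node("Rajiv Chowk")    # drop the top-ranked station (per CC/EVC)

# Recompute the rankings on the updated network
bc_new = nx.betweenness_centrality(H)
new_seed = max(bc_new, key=bc_new.get)
# ...the SIR simulation is then re-run from new_seed and M compared before/after.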
Table 3 Properties of the updated DMN after removing the top-ranked station in the original DMN
Fig. 6 Maximum number of stations informed (M) before and after the removal of first ranked
stations
5 Conclusion
This paper studies the role of node centrality in information dissemination through
the Delhi Metro Network. The network is sparse, with an average degree of two, and has a near-zero clustering coefficient, reflecting no connectivity among the neighbors of a node. Three well-known centrality measures, namely between-
ness centrality, closeness centrality, and eigenvector centrality, are used to deduce
prominent stations. Rajiv Chowk and Kashmere Gate are identified as prominent
interchange stations and are crucial for the information dissemination required for
the proper functioning of the network.
Acknowledgements This work was supported by the DBT Star College Scheme at Acharya Naren-
dra Dev College, DU.
References
1. Kandhway K, Kuri J (2016) Using node centrality and optimal control to maximize information
diffusion in social networks. IEEE Trans Syst, Man, Cybern: Syst 47(7):1099–1110
2. Frutos Bernal E, Martín del Rey A, Galindo Villardón P (2020) Analysis of Madrid metro
network: from structural to hj-biplot perspective. Appl Sci 10(16):5689
3. Yadav S, Rawal G (2016) The novel concept of creating awareness about tuberculosis at the
metro stations. The Pan Afr Med J 23
4. Chen S, Zhuang D, Zhang H (2018) Urban metro network topology evolution and evaluation
modelling based on complex network theory: a case study of Guangzhou, China. MATEC web
Conf 232:01034
R. M. Patil (B)
Electrical Engineering Department, SKNS College of Engineering Korti, Affiliated to Solapur
University, Pandharpur, Maharashtra, India
e-mail: rajesh.m.pati1972@gmail.com
B. G. Nagaraja
Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru,
Karnataka, India
M. R. Prasad
Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka,
India
T. C. Manjunath
Electronics and Communication Engineering Department, Dayananda Sagar College of
Engineering, Bengaluru, India
R. Rayappa
Electronics and Communication Engineering, Jain Institute of Technology, Davanagere,
Karnataka, India
1 Introduction
In recent years, iris recognition systems have achieved impressive recognition rates in controlled environments. The study and implementation of iris recognition technologies have been the focus of various research communities for the past 50 years. However, most earlier research on iris recognition was limited to clear and well-captured iris images, and the system's effectiveness was thought to be highly dependent on image quality. Images with lower quality and resolution, captured from a distance, or containing dynamic motion can significantly reduce the performance of iris recognition systems that are already limited in scope. A non-ideal iris image is one that suffers from issues such as poor acquisition angles, occlusions, pupil dilation, image blurriness, and low contrast. Most research has been confined to restricted environments, but we have focused on addressing this gap by studying the iris recognition problem in unrestricted settings [1–5].
2 Methodology
The generalized DFD for iris detection in an unconstrained environment for the three proposed methodologies is shown in Fig. 1.
Fig. 1 Generalized DFD for the iris detection using unconstrained environment for the three
proposed methodologies
Numerous attempts have been made to create iris biometric recognition systems
for secure authentication. Many researchers have focused on developing recognition
systems in constrained environments, where the camera must be aimed directly at
the subject’s eye, the subject must look directly into the camera, there must be no
parallax, the subject’s eyes must be open for iris capturing, and adequate lighting must
be present [6–10]. However, only a few have developed iris identification systems in
unrestricted settings, which is the focus of the proposed work presented in this article.
While these algorithms are effective in constrained situations, they may not perform
well in unconstrained environments. It is important to note that for unconstrained
situations, certain limitations must be considered for the system to function properly
and accurately. We have previously covered this topic in earlier articles [11–15].
The standard procedure for the enrollment segment starts with the acquisition of an image of the iris from a high-resolution iris camera, followed by the identification of the region of interest (ROI) from the entire image of the subject's face. Care must be taken that the ROI includes only the iris portion, and that this portion alone is used for detection [6–10]. The preprocessed original image is used for analysis and to improve system performance. The preprocessed image is next subjected to a normalization approach to reduce noise and improve effectiveness with respect to the recognized rule sets, carried out along with enhancement of the iris portion. The subject can then be identified once the iris features have been retrieved; a feature vector is saved for comparison against the vectors in the collected iris database. The six contributed works that employ the ideas of image preprocessing, edge detection, segmentation, normalization, feature extraction, and classification for the identification of a human iris are given in Fig. 1 [11–15].
Currently, a complete iris scan of a human eye is performed using an iris detection system (IDS) consisting of multiple block sets. Each block has a specific function and is used in our research [5–10]. The procedural aspects of the digital image processing (DIP) are shown in Fig. 2, while Fig. 3 depicts the functional data flow diagram (DFD) of the approach presented in this paper.
In this section, we will discuss the contributions made to the iris recognition process
using the LabVIEW tool, which is a product of National Instruments NI®. LabVIEW
is a programming language and development environment that offers an interactive
environment for designing and solving problems in various application-dependent
tasks. It includes a workspace, a command window, a primary program editor’s
window, and a location for program files, as shown in Fig. 2. Additionally, LabVIEW
Fig. 4 DFD for the iris detection system for election purpose
Fig. 5 Developed IRS block diagram with the help of artificial neural net and LabVIEW software
offers built-in mathematical functional modules that are essential for tackling scien-
tific and engineering challenges. Figure 4 provides a diagrammatic representation
of the data flow diagram (DFD) of the iris recognition system that can be used for
electioneering, while Fig. 5 shows the developed block diagram of the iris detection
system using artificial neural networks and the LabVIEW software.
7 Proposed Contributions
This section presents the contributions of the research as three distinct entities denoted
by C1–C3, which were developed in the LabVIEW environment. The proposed block
diagrams were converted into LabVIEW scripts (.vi files) and executed, resulting in
a binary response of either yes or no indicating the recognition of the iris. The main contribution is an interactive GUI-based system that is user-friendly and automates the biometric recognition process. The system is designed to operate in unconstrained environments,
including poor lighting conditions, images taken from angles, and long distances.
The iris recognition system has been developed using the NI LabVIEW and
NI Vision software platforms. NI LabVIEW is a popular graphical programming
language for scientific and engineering tasks. The Vision Development Module
provides a library of LabVIEW VIs called NI Vision for LabVIEW, which can
be used to develop applications for scientific imaging and machine vision. This
contribution includes a GUI that demonstrates how various processes involved in iris
recognition systems, such as feature extraction and preprocessing, are implemented
using LabVIEW. Additionally, an example of an iris detection module created with
LabVIEW for electronic voting is presented.
This section describes the development of an automated GUI and the hardware
implementation of iris recognition using an ATMEL microcontroller connected to
LabVIEW. The main focus of this section is the step-by-step implementation of
the algorithm and its integration with a well-developed GUI. One methodology is
suggested in this chapter, along with the creation of a LabVIEW GUI. Additionally,
this section includes a real-time implementation of an iris identification module
used for voting procedures and one of its applications. The chapter presents the
essential observations and justifications in the form of discussions, along with the
varied outcomes achieved for all of the test images. The algorithm for the hardware
implementation is shown in Figs. 12 and 13, respectively.
8 Conclusions
References
1. Kaur N, Juneja M (2014) A novel approach for iris recognition in unconstrained environment. Journal of Emerging Technologies in Web Intelligence 6(2):243–246
2. Tsai Y-H (2014) A weighted approach to unconstrained iris recognition. World Academy of Science, Engineering and Technology International Journal of Computer and Information Engineering, vol 8, no 1, pp 30–33. ISSN:1307-6892
3. Roy K, Bhattacharya P, Suen CY (2010) Unideal iris segmentation using region-based active contour model. In: Campilho A, Kamel M (eds) ICIAR 2010, Part II, LNCS 6112. Springer, Berlin, pp 256–265
4. Raffei AFM, Asmuni H, Hassan R, Othman RM (2013) Feature extraction for different
distances of visible reflection iris using multiscale sparse representation of local Radon
transforms. Pattern Recogn 46:2622–2633
5. Jan F (2017) Segmentation and localization schemes for non-ideal iris biometric systems.
Signal Process 133:192–212
6. Shin KY, Nama GP, Jeong DS, Cho DH, Kang BJ, Park KR, Kim J (2012) New iris recognition
method for noisy iris images. Pattern Recogn Lett 33:991–999
7. Nagaraja BG, Jayanna HS (2013) Multilingual speaker identification by combining evidence
from LPR and multitaper MFCC. J Intell Syst 22(3):241–251
8. Haindl M, Krupička M (2015) Unsupervised detection of non-iris occlusions. Pattern Recogn
Lett 57:60–65
9. Karakaya M (2016) A study of how gaze angle affects the performance of iris recognition.
Pattern Recogn Lett 82:132–143
10. Barpanda SS, Majhi B, Sa PK (2015) Region based feature extraction from non-cooperative
iris images using triplet half-band filter bank. Opt Laser Technol 72:6–14
11. Proença H, Neves JC (2016) Visible-wavelength iris/periocular imaging and recognition surveillance environments. Image Vis Comput 55:22–25
12. Hu Y, Sirlantzis K, Howells G (2017) A novel iris weight map method for less constrained iris
recognition based on bit stability and discriminability. Image Vis Comput 58:168–180
13. Liu J, Sun Z, Tan T (2014) Distance metric learning for recognizing low-resolution iris images.
Neurocomputing 144:484–492
14. Alvarez-Betancourt Y, Garcia-Silvente M (2016) A key points—based feature extraction
method for iris recognition under variable image quality conditions. Knowl-Based Syst
92:169–182
15. Hajaria K, Gawandeb U, Golharc Y (2015) Neural network approach to iris recognition in noisy environment. In: International conference on information security & privacy (ICISP2015), Procedia Computer Science, vol 78 (2016), 11–12 Dec 2015, Nagpur, India, pp 675–682
A Unique Method of Detection of Edges and Circles of Multiple Objects in Imaging Scenarios Using Line Descriptor Concepts
Abstract This paper proposes a novel method for detecting the edges and circles
of multiple objects in imaging scenarios using line descriptor concepts. The method
involves analyzing the intensity gradient and the orientation of the pixels in the image,
and using this information to identify lines and circles that are likely to correspond
to object boundaries. The proposed approach is compared with existing methods
and is shown to provide superior performance in terms of accuracy and computa-
tional efficiency. The method is particularly useful for applications such as object
recognition and tracking, where accurate detection of object boundaries is essential.
The experimental results demonstrate the effectiveness of the proposed method in
detecting edges and circles of multiple objects in different imaging scenarios.
R. M. Patil (B)
Electrical Engineering Department, SKNS College of Engineering Korti, Affiliated to Solapur
University, Pandharpur, Maharashtra, India
e-mail: rajesh.m.pati1972@gmail.com
B. G. Nagaraja
Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru,
Karnataka, India
M. R. Prasad
Computer Science and Engineering, Vidyavardhaka College of Engineering, Mysuru, Karnataka,
India
T. C. Manjunath
Electronics and Communication Engineering Department, Dayananda Sagar College of
Engineering, Bengaluru, India
R. Rayappa
Electronics and Communication Engineering, Jain Institute of Technology, Davanagere,
Karnataka, India
1 Introduction
In computer vision, the detection of object boundaries is an essential task for various
applications, such as object recognition, tracking, and segmentation. Detecting edges
and circles of multiple objects in imaging scenarios is a challenging problem due to
variations in object shapes, sizes, orientations, and lighting conditions. Many existing
edge and circle detection methods suffer from limitations such as sensitivity to noise,
computation time, and detection of false positives.
The detection of edges and circles in images is a fundamental task in computer
vision, with numerous applications in areas such as object recognition, tracking, and
segmentation. In this literature survey, we review some of the existing methods for
detecting edges and circles in images. The Canny edge detector is one of the most
widely used methods for edge detection. It involves convolving the image with a
Gaussian filter to reduce noise, computing the gradient magnitude and orientation,
and applying non-maximum suppression and hysteresis thresholding to obtain the
final edge map. While the Canny detector can produce high-quality edge maps, it is
computationally expensive and sensitive to the choice of parameters.
The Hough transform is a popular method for detecting circles in images. It
involves converting the image to a parameter space, where circles are represented as
curves, and then detecting peaks in the parameter space to identify the circles. While
the Hough transform can produce accurate results, it is computationally expensive and
sensitive to the choice of parameters. Edge detection is used for data segmentation and feature extraction in fields including image processing, computer vision, and machine vision. The term "edge detection" refers to a group of mathematical methods for identifying regions in digital images where there are discontinuities or, more precisely, where the brightness of the image changes suddenly [1].
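To make the two classical baselines concrete, a minimal OpenCV sketch in Python; the file name and parameter values are illustrative assumptions:

import cv2

img = cv2.imread("objects.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
blurred = cv2.GaussianBlur(img, (5, 5), 1.5)            # suppress noise first

# Canny: gradient + non-maximum suppression + hysteresis thresholding
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Hough transform: vote in (center, radius) parameter space for circles
circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=150, param2=30, minRadius=5, maxRadius=100)
if circles is not None:
    print("detected", circles.shape[1], "circle(s)")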
The study in [2] presents a comprehensive framework for designing and imple-
menting augmented reality (AR) guidance systems in industrial settings. It also
provides a valuable contribution to the field of AR guidance systems by offering
a comprehensive framework that considers various aspects of designing and imple-
menting AR guidance systems in industrial settings. The case studies and evaluation
demonstrate the effectiveness of the proposed framework and provide insights into
the potential benefits of AR guidance systems in improving industrial processes. The
authors in [3] introduce the Retina U-Net, which is a modification of the popular U-
Net architecture. The Retina U-Net combines the segmentation and detection tasks,
making use of the segmentation supervision to detect objects. The proposed method
achieves state-of-the-art performance on several medical object detection datasets,
including lung nodule detection and polyp detection. The Retina U-Net is compu-
tationally efficient and requires less training data compared to other state-of-the-art
methods. The paper also demonstrates the potential of combining segmentation and
detection tasks for medical object detection, which can lead to more accurate and
efficient detection systems.
Three different categories of edge exist:
• Straight outlines (horizontal in nature)
2 Block-Diagrams/Flow-Charts
The typical edges of an object in a gray-scale image, as stored in a PC's memory, are shown in Figs. 2, 3 and 4, respectively, whereas the types of operators that can be utilized for detecting the edges of objects in an image are shown in Fig. 1 [6].
Viewpoint-dependent or viewpoint-independent edges can be retrieved from a two-dimensional picture of a three-dimensional scene. The fundamental qualities of three-dimensional objects, such as surface marks and shape, are often reflected by a viewpoint-independent edge. A viewpoint-dependent edge, which changes as the point of view changes, frequently reflects the scene's geometric features, for example, the occlusion of objects one above the other (hidden objects) [7].
The line separating a red block from a yellow block, for example, is a typical edge. A line, on the other hand, could be a small number of image pixels of a different color on an otherwise constant background (as can be retrieved by a ridge detector). As a result, there may be an edge on either side of a line in the majority of cases. The edges obtained from natural photos are rarely perfect step edges. In practice, they are typically affected by a few important parameters such as [8]:
• Focal-type blurs produced by the finite depth of field and the finite point spread function.
• Penumbral blurs produced by the shadowing effects created by light sources.
One important parameter, the scale value sigma, determines the blur scale of the edge. To avoid destroying the image's actual edges, this scale parameter should ideally be adapted to the quality of the image [10].
In this section, we present the concepts used in circle detection based on digital image processing fundamentals: a three-stage procedure for determining the location, circumference, and radius of circles [11].
Some notions from the Canny edge detection operator are used in the algorithm that implements this procedure. We consider the input image to be a noise-free gray-scale image, i.e., one with no random variations in intensity. The block diagram of the suggested algorithm, a three-stage process for detecting the location of circles, is discussed in the next section [12].
4 Approaches
In the second stage, the thin edges determined in the first stage are processed further: the arcs that satisfy the conditions of being a component of a circle are contour-traced using the direction of each pixel. In this stage, any spurious points or edge pixels that do not meet the criteria for being a component of a circle are discarded. As a result, the arcs that have a higher chance of being part of a circle are kept [14].
5 Introductory Remarks
When objects are not polyhedral, shape analysis is a method of determining the shape of irregular objects using two types of descriptors, viz., line descriptors and area descriptors. Objects amenable to shape analysis include circles, spheres, ellipses, boundaries, curves, arcs, and objects of irregular shape. The first method, the line descriptors method, is explained as follows [15].
6 Line Descriptors
This is the first method of performing shape analysis (SA); it is used to find the length, in pixels, of the boundary or curve of a regular or irregular object and uses an encoding scheme called chain coding. The process of chain coding is illustrated in Table 2 [16].
Chain coding is a technique for finding the length of a closed or open curve in pixels by representing the curve as a sequence of chain codes a = (a1, …, an), each code in {0, …, 7}, with n being the length of the curve in pixels; it is a relative representation. The steps are as follows (a short code sketch follows the list):
• p is a pixel on the boundary of an object.
• Start from the right most pixel marked with a dot, ‘.’.
• Write down the relative position of the next adjacent neighboring pixel forming
the boundary/curve.
• Repeat the process till you reach the starting position (for all the pixels on the
curve).
• The vector thus formed is called the chain code of the curve.
• |C(a)| gives the length of the curve in pixels.
• a = [2, 2, 3, 4, 5, 4, 5, 6, 7, 7, 0, 1, 1]T
• |C(a)| = 13 pixels.
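A small Python sketch of the chain-code bookkeeping described above; the direction-to-step mapping assumes the standard Freeman 8-neighbor convention, and the example vector is the one from the text:

# Freeman chain code: 8 directions around a pixel, 0 = east, counted counter-clockwise.
MOVES = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
         4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}

a = [2, 2, 3, 4, 5, 4, 5, 6, 7, 7, 0, 1, 1]   # chain code from the text
print("|C(a)| =", len(a), "pixels")           # length of the curve in pixels

# A curve is closed if following the codes returns to the start pixel.
x, y = 0, 0
for code in a:
    dx, dy = MOVES[code]
    x, y = x + dx, y + dy
print("closed curve" if (x, y) == (0, 0) else "open curve")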
The developed chain coding process (CCP) is invariant to translation but variant to rotation. Since chain coding is a relative representation, the chain code does not change for a curve of identical shape at a different location, translated by some amount. Under rotation, the length of the curve remains the same, but the chain code of the curve will differ since the starting point changes. If the number of transitions equals the number of pixels, the curve is a closed curve; if the number of transitions is one less than the number of pixels, i.e., nT = np − 1, it is an open curve. Chain coding is used in signature verification in banks and in character recognition. A numerical example is shown in Table 2, whereas the central pixel surrounded by its 8 neighbors is shown in Table 1. The three stages of the developed circle detection approach are given below in the form of a program [17].
7 Program/Algorithm Developed
Program:
%% Read image
Inputimage = imread('a.jpg');
%% Show the image
figure(1)
imshow(Inputimage);
title('I/P THE IMAGE CONSIDERING THE NOISEs')
%% Convert to gray scale
if size(Inputimage,3) == 3   % RGB image
    Inputimage = rgb2gray(Inputimage);
end
%% Convert to binary image (Otsu threshold, then invert)
threshold = graythresh(Inputimage);
Inputimage = ~im2bw(Inputimage, threshold);
%% Remove all objects containing fewer than 30 pixels
Inputimage = bwareaopen(Inputimage, 30);
pause(1);
%% Label connected components
[L, Ne] = bwlabel(Inputimage);
%% Draw a bounding box around each labeled object
propied = regionprops(L, 'BoundingBox');
imshow(~Inputimage);
hold on
for n = 1:size(propied,1)
    rectangle('Position', propied(n).BoundingBox, 'EdgeColor', 'g', 'LineWidth', 2)
end
hold off
pause(1);
%% Objects extraction: crop and display each labeled object
figure
for n = 1:Ne
    [r, c] = find(L == n);
    n1 = Inputimage(min(r):max(r), min(c):max(c));
    imshow(~n1);
    pause(0.5)
end
Output:
The outputs of the program for one, two, three, four, and five fingers of a human hand are shown in Figs. 6, 7, 8, 9 and 10, respectively.
8 Conclusion
In conclusion, this paper proposes a unique method for detecting edges and circles of
multiple objects in imaging scenarios using line descriptor concepts. The proposed
method uses a combination of line detection and descriptor techniques to detect edges
and circles, which is then refined using a clustering algorithm to identify individual
objects. The experimental results demonstrate the effectiveness of the proposed
method in detecting edges and circles of multiple objects in various scenarios,
including natural scenes and industrial environments. The proposed method is shown
to outperform existing state-of-the-art methods in terms of accuracy and efficiency.
Furthermore, the proposed method is computationally efficient and can process large
volumes of images quickly. This makes it suitable for real-time applications, such as
industrial inspection and surveillance.
References
1. Lindeberg T (1998) Edge detection and ridge detection with automatic scale selection. Int J Comput Vis 30:117–154
2. Zubizarreta J, Aguinaga I, Amundarain A (2019) A framework for augmented reality guidance in industry. Int J Adv Manuf Technol 102:4095–4108
3. Jaeger PF, Kohl SA, Bickelhaupt S, Isensee F, Kuder TA, Schlemmer HP, Maier-Hein KH (2020)
Retina U-Net: embarrassingly simple exploitation of segmentation supervision for medical
object detection. In: Machine learning for health workshop. PMLR, pp 171–183
4. Park JM, Lu Y (2008) Edge detection in grayscale, color, and range images. In: Wah BW (ed)
Encyclopedia of computer science and engineering
5. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8:679–714
6. Haralick R (1984) Digital step edges from zero crossing of second directional derivatives. IEEE Trans Pattern Anal Mach Intell 6(1):58–68
Abstract To create a robot that can navigate its environment, a map of that environment is needed. SLAM is the problem of building and updating a map of an unknown environment while simultaneously tracking the robot's location within it. LiDAR SLAM is a type of SLAM in which light detection and ranging, a common remote sensing method, is used to determine the precise distance between an object and a sensor, which helps to draw the map more accurately. A pulsed laser is used in LiDAR to determine an object's fluctuating distance. The scanner estimates the range measurement by recording the time delay between the transmission and receipt of the laser pulse. The position of the system connected to the LiDAR sensor is also indicated through GPS. We propose a front-end agnostic LiDAR system and provide a variety of qualitative results. In addition to SLAM, we also introduce YOLO v4, i.e., "You Only Look Once," a new approach to detecting multiple objects in real time within a single frame. The whole image frame is processed by a single neural network in YOLO: the image is divided into regions, and the probability of each region is used to forecast bounding boxes. This article introduces YOLO and modifies the YOLO v4 network for real-time object detection.
1 Introduction
Robotic vision systems are a crucial piece of technology that allow robots to interact
with the environment and comprehend their surroundings. This involves analyzing visual input and creating a three-dimensional model of the environment surrounding
the robot using cameras, sensors, and software algorithms [1]. Several sectors,
including industrial automation, autonomous cars, medical robotics, and many more,
can benefit from using robotic vision systems. Robotic vision systems are getting
more sophisticated and are able to carry out difficult tasks with increased accuracy
and efficiency as machine learning and computer vision techniques progress.
The creation of 3D point cloud maps is an important challenge for a variety of robotic applications. Two different techniques for doing so are described below.
1. LiDAR SLAM.
Robotics and autonomous systems employ the LiDAR Simultaneous Localization
and Mapping (SLAM) technology to simultaneously map the environment and
localize the robot inside it [2]. It makes use of a LiDAR sensor, which sends out
laser beams to detect its surroundings and generate a 3D point cloud of the area.
This point cloud data is used by the LiDAR SLAM algorithm to identify and
track environmental elements including walls, objects, and landmarks while also
determining the location and orientation of the robot on the map. As a result, the
robot can maneuver and avoid obstacles in real time with accuracy.
In applications like autonomous cars, drones, and mobile robots where precise
and effective mapping and localization are crucial, LiDAR SLAM is frequently
employed. Robots are useful instruments for a number of jobs, including surveillance,
search and rescue, and transportation, since they can work independently in complex
and dynamic situations by employing LiDAR sensors to generate and update a map
of the surroundings.
2. Visual sensor-based SLAM.
Simultaneous Localization and Mapping, or SLAM, is a robotics and computer vision
technology that uses visual sensors to map an uncharted area while also detecting
the location of the robot inside it.
In visual sensor-based SLAM, the robot takes pictures of its surroundings with
one or more cameras. To construct a 3D map of the environment, these photos are
then processed to extract visual elements like corners, edges, and blobs.
The robot’s location and orientation are also calculated by observing how these
visual elements change over time. A comprehensive map of the environment is
created by fusing data from the camera(s) and the robot’s movement, and the position
of the robot inside that environment is continually updated.
Applications of visual sensor-based SLAM may be found in a number of industries, including robotics, autonomous driving, virtual reality, and augmented reality. It offers the benefit of being inexpensive and lightweight, and of being free from the need for pricey sensors like LiDAR [3].
The creation of a rich and exact 3D point cloud map has been made feasible thanks to recent breakthroughs in LiDAR technology. Since standalone odometry is subject to oscillations in motion estimates, the integration of the modules is very important for the accuracy of maps. Despite several advances in LiDAR odometry techniques, this motion estimation error is unavoidable. A front-end agnostic LiDAR system is therefore developed by building the system in modules and integrating it with Scan Context++ for loop closure.
2 Literature Survey
Fast SLAM 2.0 incorporating scan matching and loop closure detection has been demonstrated and reviewed [4], together with one of the top deep learning methods, which makes use of convolutional neural networks to assist the robot in detecting its surroundings and identifying items; the experiments were validated with the help of the YOLOv3 algorithm.
A cutting-edge method for visual sensor data in open spaces works with sparse point clouds and allows simultaneous SLAM and object recognition (OR). Unlike deep neural networks, which can only recognize and classify items in the current frame, ORB-SLAM determines the observer's location and generates a cloud of points that symbolizes the environment's objects by combining previous and present monocular visual sensor video frames [5]. The collected point cloud is contrasted with the region that the OR network recognized. Points that match the region indicated by the OR algorithm are filtered, because every point has a counterpart in the current frame on the 3D map. A clustering method then discovers regions in which points are densely distributed to pinpoint the locations of objects detected by OR. A following step estimates the bounding boxes of the detected objects using a heuristic based on principal component analysis [5].
Due to the poor resolution and background-like appearance of objects in aerial photos, small-target detection is still a challenging task. Effective and high-performance detection approaches have been created with the recent advancements in object detection technology; the YOLO series is an example of an efficient, lightweight object identification technique among them. In [6], a technique for tweaking YOLOv4 to enhance the performance of small-target recognition in aerial photos is suggested: an efficient channel attention module is used to change the structure of the network, and a channel attention pyramid approach, an efficient channel attention pyramid YOLO, is provided.
The creation of accurate 3D point clouds is essential for data-driven urban studies and many robot tasks. To achieve this, SLAM based on light detection and ranging (LiDAR) sensors has been developed. Numerous odometry and place recognition techniques have been independently presented in academia to make up a complete SLAM system. However, they have not been sufficiently integrated or merged, making it difficult to upgrade a single place recognition or odometry module. Each module's performance has significantly increased recently, so it is essential to create a SLAM system that can seamlessly combine them and quickly swap out older modules for the newest. SLAM has been successfully combined with Scan Context++ and several different free LiDAR odometry alternatives for building accurate maps.
Mathematical framework is used for merging SLAM with object tracking. Two
approaches are outlined: SLAM for generic objects and SLAM for tracking and
recognizing moving things. A joint posterior is computed for the robot and all gener-
alized objects in SLAM with generalized objects. SLAM systems are now in use,
but with a more organized methodology that makes motion modeling of generic
objects easier. Sadly, it is computationally expensive and usually impractical. The
estimation issue is divided into two distinct estimators using SLAM with DATMO.
Because discrete posteriors are preserved for stationary and moving objects, the
ensuing estimation problems are substantially less dimensional than in SLAM with
generalized objects. It is challenging to do SLAM and object tracking from a moving vehicle in congested cities. Workable techniques that address problems with perception modeling are offered. The recognition of moving objects and data association are carried out using the SLAM with DATMO framework. The CMU Navlab11 car's data was used to demonstrate the use of SLAM with DATMO while it sped through crowded metropolitan areas at high speeds. A wide range of experimental findings demonstrate the viability of the suggested theory and methods [2].
3 LiDAR SLAM
process. Furthermore, the laser’s difference in return time and wavelength is utilized
to create exact digital 3D representations and surface details of the target, as well as
to visually map its distinct properties [8]. As seen, LiDAR technology can generate
accurate and precise information about road structure and identify obstacles to avoid
collision.
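The range computation implied here is the standard time-of-flight relation; a one-line Python illustration (the delay value is illustrative):

# Time-of-flight ranging: the pulse travels to the target and back.
c = 299_792_458.0          # speed of light, m/s
dt = 66.7e-9               # illustrative round-trip delay in seconds
distance = c * dt / 2      # ~10 m, on the order of the X4 sensor's rated range
print(distance)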
LiDAR is analogous to radio-wave and sound-based technologies such as radar and sonar, but uses light. LiDAR is more exact than these, since they can only map the location of a distant object, whereas LiDAR can build precise digital 3D representations. This qualifies it for up-close dynamics in a variety of applications, including driverless cars.
4 YOLO_v4
A cutting-edge real-time object recognition technique that makes use of deep learning
is called You Only Look Once (YOLO) v4. YOLO v4 is an upgrade over earlier
iterations of YOLO, with faster speed and more accuracy.
The technique splits an image into a grid and predicts the bounding boxes, objectness score, and class probabilities for each grid cell. The cross-stage partial (CSP) architecture-based backbone network used by YOLO v4 aids in increasing the precision of object recognition. A spatial pyramid pooling (SPP) module is also included, which enables the model to learn features at various scales.
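As a generic illustration of YOLO v4 inference (not the paper's own modified network), a minimal OpenCV DNN sketch in Python; the config/weights file names follow the public Darknet release and are assumptions:

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # assumed files
layer_names = net.getUnconnectedOutLayersNames()

img = cv2.imread("frame.jpg")   # hypothetical input frame
# YOLO expects a square, normalized blob; 416x416 is a common input size
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)

for out in outputs:
    for det in out:              # det = [cx, cy, w, h, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(det[4]) * float(scores[class_id])
        if confidence > 0.5:
            print("class", class_id, "confidence", confidence)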
5 Hardware Used
In this study, a personal computer was chosen. It features an Intel Core(R) i7-10510U CPU clocked at 1.80/2.30 GHz, 8 GB of DDR4 RAM, and an NVIDIA GeForce MX350 graphics card with 4 GB of DDR5 RAM to expedite CNN training. For YOLO operations, the Mi Webcam HD 720p camera was used as a 3D camera sensor. The SmartFly info LIDAR-053 EAI YDLIDAR X4 LiDAR laser radar sensor module is utilized for LiDAR, with a range of 10 m and a frequency of 5 kHz (Fig. 5).
6 Results
LiDAR
SC-PGO is the fundamental core of SC-LiDAR-SLAM. Open-source LiDAR odometry methods have been combined with SC-PGO. For ease of use, the entire LiDAR SLAM system is available through the repository.
The SLAM output of an example image is shown in Fig. 6, providing a real-time 3D map of an interfaced environment in a specific graphical interface.
YOLO_v4
YOLOv4 achieves cutting-edge results in real-time object detection and is capable of running at 60 FPS on the GPU. The model is trained to detect 81 object classes. The real-time object detection situation is depicted in Fig. 7, where it detects 3 different objects simultaneously.
7 Conclusion
A front-end agnostic LiDAR system has been developed, and it gave qualitative results. Easy interaction between several LiDAR (or even radar) odometry technologies is possible, and exact point cloud maps were successfully built thanks to our modular architecture and Scan Context++'s excellent loop-closing features [10]. In subsequent work, we will provide several quantitative evaluations of the performance of the recommended LiDAR system.
The YOLO series was studied, and the YOLOv4 network was enhanced in this study to better recognize indoor micro targets [11].
References
1. https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
2. Wang C-C, Thorpe C, Thrun S, Hebert M, Durrant-Whyte H (2007) Simultaneous localiza-
tion, mapping and moving object tracking. The International Journal of Robotics Research
26(9):889–916
3. Maolanon P, Sukvichai K, Chayopitak N, Takahashi A (2019) Indoor room identify and
mapping with virtual based SLAM using furnitures and household objects relationship based on
CNNs. In: 2019 10th international conference of information and communication technology
for embedded systems (IC-ICTES). IEEE, pp 1–6
4. Chehri A, Zarai A, Zimmermann A, Saadane R (2021) 2D autonomous robot localization using
fast SLAM 2.0 and YOLO in long corridors. In: International conference on human-centered
intelligent systems. Springer, Singapore, pp 199–208
5. Mazurek P, Hachaj T (2021) SLAM-OR: simultaneous localization, mapping and object recog-
nition using video sensors data in open environments from the sparse points cloud. Sensors
21(14):4734
6. Kim M, Jeong J, Kim S (2021) ECAP-YOLO: efficient channel attention pyramid YOLO for
small object detection in aerial image. Remote Sensing 13(23):4851
7. Chan S-H, Wu P-T, Fu L-C (2018) Robust 2D indoor localization through laser SLAM and
visual SLAM fusion. In: 2018 IEEE international conference on systems, man, and cybernetics
(SMC). IEEE, pp 1263–1268
8. Bavle H, De La Puente P, How JP, Campoy P (2020) VPS-SLAM: visual planar semantic
SLAM for aerial robotic systems. IEEE Access 8:60704–60718
9. Garcia-Rodriguez J (ed) (2020) Robotic vision: technologies for machine learning and vision
applications: technologies for machine learning and vision applications. IGI Global
10. Kim G, Yun S, Kim J, Kim A (2022) SC-LiDAR-SLAM: a front-end agnostic versatile
LiDAR SLAM system. In: 2022 international conference on electronics, information, and
communication (ICEIC). IEEE, pp 1–6
11. Cai Y, Alber F, Hackett S (2020) Path markup language for indoor navigation. International
conference on computational science. Springer, Cham, pp 340–352
Optimum Value of Cyclic Prefix (CP) to Reduce Bit Error Rate (BER) in OFDM
Keywords Cyclic prefix (CP) · Bit error rate (BER) · Inter-symbol interference
(ISI) · Inter-carrier interference (ICI)
1 Introduction
can be used as a modulation system, the most prevalent being BPSK, QPSK, and
64-QAM [1].
The signal to be broadcast is the sum of the outputs of all the modulators. Compared with alternative modulation techniques, an OFDM system with channel coding and BPSK modulation produces the lowest BER value; as a result, a BPSK-based OFDM system using the FFT and a cyclic code gives the lowest BER. The OFDM system divides the spectrum into many orthogonal carriers, each modulated by a low-rate data stream. It is possible to utilize the spectrum more efficiently than with frequency division multiple access because carrier-spacing overhead is removed. Each carrier's bandwidth is limited; it has a low symbol rate and, as a result, a high tolerance for multipath delay spread [2]. The delay spread must be enormous to cause significant inter-symbol interference; as a result, inter-symbol interference is a significant issue when evaluating the signal's performance during the various phases of transmission. BPSK, 64-QAM, and other modulation constellations are used to modulate and map a wide-band stream of binary digits to a symbol stream. Inverse multiplexing is used to de-multiplex these symbols into several parallel streams, possibly with a different constellation per stream; as a result, the bit rate of some streams may be higher than that of others. An inverse FFT is performed on each set of symbols, resulting in a collection of complex time-domain samples that are quadrature-mixed with the passband in the typical manner [3].
2 Related Work
To attain the highest level of data transfer dependability, OFDM was initially developed in the communication industry as a technique for encoding digital data on multiple carrier frequencies. From previous research, one point is common across the literature: as the cyclic prefix value increases, inter-carrier interference and inter-symbol interference reduce. One of the papers also showed that increasing the cyclic prefix value reduces the bit error rate [4]. However, previous work did not address the optimum value, because as the cyclic prefix percentage increases, transmitted data is also lost; earlier results were obtained using MATLAB Simulink simulations. In this work, we go beyond simulation and implement the same system in MATLAB code, determining the optimum value that reduces the bit error rate without excessive data loss [5].
In this paper, we calculate the optimum value of the cyclic prefix: previous work simply increased the cyclic prefix to reduce ICI, ISI, and the BER, but since increasing the cyclic prefix also sacrifices data, we determine the optimum cyclic prefix value [6].
3 Proposed Methodology
A. OFDM Transmitter
Figure 1 depicts the OFDM system’s basic model, i.e., the transmitter part uses the
BPSK modulation technique to modulate digital data to be transmitted, after which
the data is transformed into many parallel streams. The modulated signals are then
given to the IFFT block, which converts the spectrum representation of the data into
the time domain, which is significantly more computationally efficient and employed
in all practical systems [7]. The signals are then prefixed with a cyclic prefix: the end of the OFDM symbol is copied into the guard interval that precedes the symbol. Because the guard interval comprises a copy of the end of the OFDM symbol, when the receiver conducts OFDM demodulation, each multipath component integrates across an integer number of sinusoid cycles. OFDM thus exhibits exceptional resilience in multipath scenarios. The cyclic prefix keeps the subcarriers orthogonal, and the receiver can capture more multipath energy with a cyclic prefix. The signals are then translated to serial form and sent through the transmitter. After that, the digital data is sent over the channel [8].
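A minimal NumPy sketch of the IFFT plus cyclic-prefix step; the ~10% CP fraction anticipates the optimum argued for later, and the array names are illustrative:

import numpy as np

n_fft = 256
cp_len = n_fft // 10             # ~10% cyclic prefix

bits = np.random.randint(0, 2, n_fft)
symbols = 2 * bits - 1           # BPSK mapping: 0 -> -1, 1 -> +1

time_signal = np.fft.ifft(symbols)        # spectrum -> time domain
cp = time_signal[-cp_len:]                # copy the symbol's tail
tx = np.concatenate([cp, time_signal])    # prepend it as the guard interval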
B. AWGN Channel
The AWGN channel model is commonly utilized in OFDM research. In this model, the amplitude distribution is Gaussian, and there is just a linear addition of white noise with a constant spectral density. The model does not account for fading, frequency selectivity, interference, and so on [9]. Even though it is therefore unsuitable for most terrestrial links, it is still utilized as a controlled mathematical model to investigate the fundamental behavior of the system elements in the absence of the effects above, which is the reason we choose the AWGN channel over the Rayleigh channel.
C. OFDM Receiver
Serial data received from the AWGN channel is converted into parallel form, i.e., into the number of subcarriers, so that the cyclic prefix can be removed from each of the subcarriers [10].
Cyclic Prefix Removal Block: The cyclic prefix, which is added to eliminate inter-symbol interference (ISI), is removed first to retrieve the original data [11].
Fast Fourier Transform (FFT): The FFT block on the receiver side reverses the function of the IFFT block on the transmitter side. The FFT of each subcarrier is determined individually, in parallel with the serial converter. For our results, we use FFT lengths of 256 and 1024 [12].
All the subcarriers are then merged in serial form, i.e., converted from parallel to serial form, to retrieve the original data. The serial data is then demodulated by the corresponding demodulation technique to get the original data back [13].
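The matching receiver-side steps, continuing the transmitter sketch above; the noise level is illustrative:

rx = tx + 0.05 * (np.random.randn(tx.size) + 1j * np.random.randn(tx.size))  # AWGN

rx_no_cp = rx[cp_len:]                    # cyclic prefix removal block
rx_symbols = np.fft.fft(rx_no_cp)         # time domain -> spectrum (per subcarrier)
bits_hat = (rx_symbols.real > 0).astype(int)   # BPSK demodulation

ber = np.mean(bits_hat != bits)           # compare against the transmitted bits
print("BER =", ber)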
4 Simulation Flowchart
Figure 2 shows the flowchart of the MATLAB code. The main focus of the flowchart is the cyclic prefix adder on the transmitter side and the cyclic prefix remover on the receiver side. After completing all the steps, we obtain the bit error rate versus signal-to-noise ratio graph for different values of the cyclic prefix.
A. Simulation Models
For different modulation techniques, see Figs. 3, 4, 5 and 6.
Using a Bernoulli distribution, the Bernoulli binary generator block produces random binary numbers (in the above simulation, only eight binary bits are generated). This block is used to mimic digital communication networks and produce random data bits to acquire performance metrics such as bit error rate. The Bernoulli distribution with parameter p produces zero with probability p and one with probability 1 − p. The mean value of this Bernoulli distribution is 1 − p and the variance is p(1 − p). Any real value in [0, 1] can be used as the probability-of-zero parameter, which determines p.
A column or row vector, a two-dimensional matrix, or a scalar might be the
output signal. The samples per frame parameter determines the number of rows in
the output signal, which corresponds to the number of samples in a frame. The number
of elements in the probability of zero parameter determines how many columns there
are in the output signal, which is equal to the number of channels.
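A NumPy equivalent of this block, under the same zero-probability convention; the frame size and p are illustrative:

import numpy as np

p_zero = 0.5
frames, channels = 8, 1    # samples per frame, number of channels
# Bernoulli: zero with probability p_zero, one with probability 1 - p_zero
bits = (np.random.rand(frames, channels) >= p_zero).astype(int)
print(bits.ravel())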
AWGN Channel (or) Rayleigh Channel: This is the path by which the data is
transmitted. The presence of noise in this medium has an impact on the signal and
produces data content distortion. Additive white Gaussian noise (AWGN) is a funda-
mental noise model used in information theory to simulate the effect of many random
processes seen in nature.
Rayleigh fading is a statistical model that explains how the propagation environ-
ment affects a radio signal, such as the one used by wireless devices. The underlying
premise of Rayleigh fading models is that the strength of a signal traveling through
such a transmission medium (also referred to as a communication channel) will
randomly change, or fade, in accordance with a Rayleigh distribution, which is the
radial component of the sum of two uncorrelated Gaussian random variables.
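A quick NumPy illustration of that definition: the magnitude of a complex Gaussian sample is Rayleigh-distributed.

import numpy as np

n = 100_000
# Two uncorrelated zero-mean Gaussians as the real and imaginary parts
h = (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
envelope = np.abs(h)    # radial component -> Rayleigh distributed
print("mean envelope:", envelope.mean())   # ~0.886 for unit-power taps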
Under QAM, 16-QAM, and 64-QAM modulation schemes, the AWGN channel has the best performance of all channels because it yields the lowest bit error rate (BER); the impact of noise on BER in this channel is much lower than in fading channels. Rayleigh fading has the worst performance of all channels, since its BER is heavily influenced by noise under QAM, 16-QAM, and 64-QAM modulation schemes.
The evaluation of various cyclic prefix lengths allowed us to establish the optimal value, which lowered the bit error rate while not causing excessive data loss. Figures 7, 8, 9, and 10 depict the various cyclic prefix values for different modulation techniques; for clarity, we present only two modulation techniques, BPSK and 64-QAM, with FFT lengths of 256 and 1024. From Figs. 7, 8, 9, and 10, we can see that as the cyclic prefix value increases, the bit error rate decreases. However, since a larger cyclic prefix also sacrifices data, we take a value that is low yet still reduces the bit error rate: approximately 10%, because the data loss at that level is small compared with a 40–50% cyclic prefix (CP).
Fig. 7 Different cyclic prefix values for BPSK modulation of 256 FFT length
Fig. 8 Different cyclic prefix values for BPSK modulation of 1024 FFT length
Fig. 9 Different cyclic prefix values for 64-QAM modulation of 256 FFT length
Fig. 10 Different cyclic prefix values for 64-QAM modulation of 1024 FFT length
References
1. Lim C, Chang Y, Cho J, Joo P, Lee H (2005) Novel OFDM transmission scheme to overcome ISI caused by multipath delay longer than cyclic prefix. In: 2005 IEEE 61st vehicular technology
conference, vol 3. IEEE, pp 1763–1767
2. Subotic V, Primak S (2007) BER analysis of equalized OFDM systems in Nakagami, m < 1
fading. Wireless Pers Commun 40(3):281–290
3. Chang Y-P, Lemmens P, Tu P-M, Huang C-C, Chen P-Y (2011) Cyclic prefix optimization
for OFDM transmission over fading propagation with bit-rate and BER constraints. In: 2011
Second international conference on innovations in bio-inspired computing and applications.
IEEE, pp 29–32
4. Mišković B, Lutovac MD (2012) Influence of guard interval duration to interchannel interfer-
ence in DVB-T2 signal. In: 2012 Mediterranean conference on embedded computing (MECO).
IEEE, pp 220–223
5. Lorca J (2015) Cyclic prefix overhead reduction for low-latency wireless communications in
OFDM. In: 2015 IEEE 81st vehicular technology conference (VTC Spring). IEEE, pp 1–5
6. Waichal G, Khedkar A (2015) Performance analysis of FFT based OFDM system and DWT
based OFDM system to reduce inter-carrier interference. In: 2015 international conference on
computing communication control and automation. IEEE, pp 338–342
7. Jadav NK (2018) A survey on OFDM interference challenge to improve its BER. In: 2018
second international conference on electronics, communication and aerospace technology
(ICECA). IEEE, pp 1052–1058
8. Gowda NM, Sabharwal A (2018) CPLink: interference-free reuse of cyclic-prefix intervals in
OFDM-based networks. IEEE Trans Wireless Commun 18(1):665–679
9. Farzamnia A, Hlaing NW, Mariappan M, Haldar MK (2018) BER comparison of OFDM with
M-QAM modulation scheme of AWGN and Rayleigh fading channels. In: 2018 9th IEEE
control and system graduate research colloquium (ICSGRC). IEEE, pp 54–58
10. Athaudage CRN, Dhammika A, Jayalath S (2004) Delay-spread estimation using cyclic-prefix
in wireless OFDM systems. IEE Proceedings-Communications 151(6):559–566
11. Bandele JO (2019) A Matlab/Simulink design of an orthogonal frequency division multiplexing
system model. International Journal of Engineering Inventions 8(4)
12. Alkamil ADE, Hassan OTA, Hassan AHM, Abdalla WFM (2020) Performance evaluations
study of OFDM under AWGN and Rayleigh channels. In: 2020 international conference on
computer, control, electrical, and electronics engineering (ICCCEEE). IEEE, pp 1–6
13. Mohseni S (2013) Study the carrier frequency offset (cfo) for wireless OFDM
Optimum Sizing of Solar/Wind/Battery
Storage in Hybrid Energy System Using
Improved Particle Swarm Optimization
and Firefly Algorithm
Abstract The integration of hybrid energy systems (HES) combining solar photovoltaic (PV), wind turbine (WT), and battery energy storage (BES) is increasing rapidly to enhance the performance of microgrids or power systems while mitigating local energy crises and environmental pollution concerns. In this work, the performance of an HES with PV-WT-BES is improved by minimizing the total annual cost of these components using improved particle swarm and firefly optimization algorithms. These two algorithms are compared and analyzed for three system configurations (PV-BES, WT-BES, and PV-WT-BES) to determine the optimum capacity sizing of PV, WT, and BES to fulfill the load of a particular place. The study also highlights the impact of the state of charge (SOC) of the BES on the optimum sizing of the system components and on the overall cost. The effect of SOC variation is analyzed for two battery chemistries, lithium-ion and lead acid.
1 Introduction
Hybrid energy systems (HES) with PV and WT have proven to be a boon for addressing the crises caused by fossil-fuel depletion and power shortages over the past few decades. However, these PV and WT energy sources have the limitations of uncertain generated power output and high installation cost, hence battery energy storage (BES) should be integrated with them. This addition of BES can play a multi-functional role in the electrical power system, such as reducing operating costs or capital expenditures
when used as a generator in the utility sector, facilitating the integration of RES
into the electric power system, load leveling, peak shaving, stabilizing voltage, and
frequency and maintaining uninterrupted power supply [1]. Despite certain advan-
tages, the market maturity of BES is slow due to its high cost incurred in the expensive
cell material and lack of a corresponding legal framework [2]. Though the combina-
tion of solar-wind-BES is beneficial for mitigating environmental pollution concerns
and local energy crises, this hybrid renewable energy system (HRES) faces challenges
regarding its optimum operation, optimum sizing, system security, system reliability,
and cost of the system [3]. From the literature review, it is observed that researchers have addressed the performance of HRES in several ways: by optimizing the system cost in terms of total annual or project cost, by taking into account the loss of power supply probability (LPSP), by minimizing the cost of energy (COE), or by applying various optimization techniques to real-time case studies. The literature on optimum sizing of HRES components can be categorized as minimizing COE only [3–5] or LPSP only [6, 7]. Some of the literature shows improvement in the
system performance by combining both—COE and LPSP [8–12] and some research
articles proposed HRES optimization studies using techniques like PSO [4, 8–10,
13–15], FA [11, 12, 16], GA [6, 14] or hybrid optimization techniques like PSO-GSA
[10], PSO-GWO [8], SA-PSO [13], etc. The case study of a university campus on
a Mediterranean island by considering economic metrics as net present cost (NPC)
and levelized COE of a PV/WT/BES hybrid system is proposed in [3]. The analysis
of the PV/WT/diesel/BES hybrid system for minimum COE for homes in Morocco,
Spain, and Algeria is carried out using PSO [4]. In [5], an HRES is designed for the remote island of Jiuduansha, Shanghai, by considering the effects of saturation of RES on various parameters like system reliability, NPC, BES size, and the repayment
period. In [6], the Nigerian case study based on optimum sizing of a hybrid system
consisting of PV/WT and storage system is presented to fulfill the load demand based
on the LPSP using enhanced genetic algorithm. In [7], the comprehensive review
of the optimum sizing of HRES in Oman is described using various optimization
methods. In [8], the enhanced whale optimization algorithm (EWOA) is applied for
the optimization and operation of HRES for electrifying a rural city in Algeria. It also
compared EWOA with various optimization algorithms like PSO, gray wolf opti-
mizer (GWO), and modified GWO to resolve the COE problem considering LPSP.
In [9], optimal planning of the microgrid consisting of solar/wind/bio-generator/
BES is presented with a real-time case study of Bihar, India. It analyzed the tradeoff
between COE and LPSP using hybrid PSO-GWO. In [10, 11], novel optimization
techniques are employed for the optimal sizing of hybrid microgrids of Egypt and
Manipur, India, respectively. In [12], power reliability with COE and load dissatisfaction criteria for PV/WT systems is proposed using FA. Reference [13] proposed a hybrid optimization method combining SA and PSO (SA-PSO) for determining the optimal size of a microgrid located in Egypt. The iterative complexity of optimum HRES sizing that arises in GA is reduced by PSO [14]. The optimum sizing of HRES is discussed by comparing the fuel cell with the battery using harmony search, tabu search, SA, and PSO [15]. That paper concluded that the PV/WT/BES combination is economically better than the fuel cell system and that PSO performed better than the other algorithms. Reference [16] analyzed the optimal design of HRES using FA to achieve profitable operation of RES to supply the load.
The objective of this paper is to analyze the performance of the PV-WT-BES
system by minimizing total annual cost (TAC) using IPSO and FA. The objective
also includes the analysis of the impact of variation in SOC on BES sizing with two
BES chemistries lithium-ion batteries and lead acid batteries.
The rest of the paper is arranged as follows: Sect. 2 describes the system components under consideration mathematically, along with the objective function and constraints. Section 3 discusses the size optimization algorithms. The results are discussed in Sect. 4, and Sect. 5 concludes the paper with key contributions.
The HRES under consideration along with the specifications is given in Fig. 1. The
HRES comprises PV, WT, inverters/converters, BES, and the load of a particular
location at Rafsanjan, Iran [15].
The mathematical modeling of components of HRES like PV, WT, and BES is
referred from [15, 17].
Fig. 2 Solar irradiance (W/m2) for a specific location versus time (hour)
Figure 2 shows the average daily solar irradiance of Rafsanjan, Iran [15]. The output
power of each solar PV panel is derived as per Eq. (1) and total PV power output can
be found by using Eq. (2).
$$
P_{pv}(t) =
\begin{cases}
P_{rs}\,\dfrac{r^2}{R_{srs}\,R_{cr}}, & 0 \le r \le R_{cr} \\[4pt]
P_{rs}\,\dfrac{r}{R_{srs}}, & R_{cr} \le r \le R_{srs} \\[4pt]
P_{rs}, & R_{srs} \le r
\end{cases}
\tag{1}
$$
and the total PV power output for $N_{pv}$ identical panels is
$$
P_{pv}(t) = N_{pv} \times p_{pv\text{-}Each}(t) \tag{2}
$$
where,
P_rs: rated PV power in watt,
r: solar radiation in W/m2,
R_cr: a certain solar radiation in W/m2,
N_pv: number of PV panels,
P_pv: power output of all PV panels,
p_pv-Each(t): power rating of a single PV panel, and
R_srs: solar radiation under the standard environment, 1000 W/m2.
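A direct Python transcription of the piecewise model in Eqs. (1) and (2) could look as follows; the parameter values are placeholders for illustration, not those of the case study.

def pv_power_each(r, P_rs=250.0, R_cr=150.0, R_srs=1000.0):
    """Output power (W) of one PV panel for solar radiation r (W/m2), per Eq. (1)."""
    if r <= R_cr:
        return P_rs * r**2 / (R_srs * R_cr)   # quadratic region at low irradiance
    if r <= R_srs:
        return P_rs * r / R_srs               # linear region up to standard irradiance
    return P_rs                               # rated power beyond R_srs

def pv_power_total(r, N_pv):
    """Total PV output for N_pv identical panels, per Eq. (2)."""
    return N_pv * pv_power_each(r)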
The average daily wind velocity of Rafsanjan, Iran [15] is depicted in Fig. 3. The
output power of each WT is derived as given in Eq. (3) and the total power output of
WT can be found by using Eq. (4).
Fig. 3 Velocity of wind (m/s) for a specific location versus time (hour of day)
$$
P_w(t) =
\begin{cases}
P_{wn}, & V_{wr} \le V_w(t) \le V_{co} \\[2pt]
P_{wn}\,\dfrac{V_w(t) - V_{ci}}{V_{wr} - V_{ci}}, & V_{ci} \le V_w(t) \le V_{wr} \\[2pt]
0, & V_w(t) \le V_{ci} \ \text{or} \ V_w(t) \ge V_{co}
\end{cases}
\tag{3}
$$
and the total WT power output for $N_{wt}$ identical turbines is
$$
P_{wt}(t) = N_{wt} \times P_w(t) \tag{4}
$$
where;
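Equation (3) can be sketched the same way in Python; the cut-in (V_ci), rated (V_wr), and cut-out (V_co) speeds below are illustrative assumptions, not the turbine data of [15].

def wt_power_each(v, P_wn=1000.0, V_ci=3.0, V_wr=12.0, V_co=25.0):
    """Output power (W) of one wind turbine for wind speed v (m/s), per Eq. (3)."""
    if v <= V_ci or v >= V_co:
        return 0.0                                 # below cut-in or above cut-out
    if v < V_wr:
        return P_wn * (v - V_ci) / (V_wr - V_ci)   # ramp between cut-in and rated speed
    return P_wn                                    # rated power up to cut-out

def wt_power_total(v, N_wt):
    """Total WT output for N_wt identical turbines, per Eq. (4)."""
    return N_wt * wt_power_each(v)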
Depending upon the charge available in the battery (SOC), the BES either can fulfill
the load demand (discharge) or store the excess power if the generated power by
RES (PV/WT) is greater than the load (charge). The energy of the BES at the time
of charging and discharging can be derived from Eqs. (5) and (6) respectively. All
these equations are referred from [15, 17] explicitly (Fig. 4).
Figs. 4 and 5 Battery power (watt) and load (watt) for the location [15, 17] versus time (hour)
B. Discharging Mode
$$
E_{batt}(t) = E_{batt}(t-1)\,(1-\sigma) - \left[\frac{E_{load}(t)}{\eta_{inv}} - \bigl(E_{pv}(t) + E_{wt}(t)\bigr)\right] \eta_{batt}
\tag{6}
$$
where;
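As an illustration, the discharging update of Eq. (6) maps to a short function; the self-discharge rate σ and the efficiencies below are assumed values, not the study's parameters.

def battery_discharge(E_prev, E_load, E_pv, E_wt,
                      sigma=0.0002, eta_inv=0.95, eta_batt=0.90):
    """Battery energy after one time step of discharging, per Eq. (6)."""
    deficit = E_load / eta_inv - (E_pv + E_wt)   # energy the RES cannot cover this step
    return E_prev * (1 - sigma) - deficit * eta_batt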
2.1.4 Load
The real-time load data of a particular area (Rafsanjan, Iran) is obtained by averaging one year of data into a single day of 24 h [15, 17] (Fig. 5).
This work aims to find out the optimum capacity size of the hybrid energy system
(HES) components, which is carried out by minimizing the TAC of these components.
The TAC is the sum of the capital cost (C_Cpt) and the maintenance cost (C_Mtn) of every component of the HES, i.e., solar PV, wind turbine, and BES, on an annual basis. The capital cost is incurred at the time of project installation, while the maintenance cost is incurred during the operation of the project. The minimum TAC is attained by reducing C_Cpt and C_Mtn of the solar PV, wind turbine, and BES annually.
Equations (7)–(9) depict these costs, together with the inequality constraint ΔP = (P_gen − P_dem) ≥ 0.
$$
C_{Cpt} = C_{Cpt_{pv}} + C_{Cpt_{wt}} + C_{Cpt_{batt}} \tag{7}
$$
$$
C_{Cpt} = (N_{pv} \times C_{pv}) + (N_{wt} \times C_{wt}) + (N_{batt} \times C_{batt}) \tag{8}
$$
$$
C_{Mtn} = C_{Mtn_{pv}} + C_{Mtn_{wt}} + C_{Mtn_{batt}} \tag{9}
$$
where C_Cpt is the capital cost of solar PV (C_Cpt_pv), wind turbine (C_Cpt_wt), and battery (C_Cpt_batt), and C_Mtn is the maintenance cost of solar PV (C_Mtn_pv), wind turbine (C_Mtn_wt), and battery (C_Mtn_batt).
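Equations (7)–(9) translate directly to code; all unit costs below are placeholder assumptions, not the cost data used in this study.

def total_annual_cost(N_pv, N_wt, N_batt,
                      C_pv=150.0, C_wt=600.0, C_batt=170.0,   # assumed annualized capital costs ($)
                      M_pv=10.0, M_wt=40.0, M_batt=8.0):      # assumed annual maintenance costs ($)
    """TAC = capital cost (Eqs. (7)-(8)) + maintenance cost (Eq. (9))."""
    c_cpt = N_pv * C_pv + N_wt * C_wt + N_batt * C_batt       # Eq. (8)
    c_mtn = N_pv * M_pv + N_wt * M_wt + N_batt * M_batt       # per-unit form of Eq. (9), assumed
    return c_cpt + c_mtn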
Optimization is the process of finding the best solution, making something as feasible and efficient as possible by minimizing or maximizing the problem variables. In this study, an attempt is made to determine the optimum capacity size of PV/wind/BES to fulfill the load demand of a particular area by minimizing the system TAC using two optimization techniques, IPSO and FA, for three system configurations.
PSO and its variants were introduced by Kennedy and Eberhart in 1995. It is a heuristic algorithm for determining an optimized solution to a problem. In the IPSO algorithm, every viable solution of the optimization problem is represented by a 'particle', which is characterized by a position and a velocity vector. The mathematical model of this optimization method is given by Eqs. (10) and (11), which are referred from [15, 17, 18].
$$
V_{i+1} = V_i + C_1 r_1 (P_{best} - X_i) + C_2 r_2 (G_{best} - X_i) \tag{10}
$$
$$
X_{i+1} = X_i + V_{i+1} \tag{11}
$$
where,
V Velocity of particle,
X Position of particle,
C1, C2 Acceleration constants,
r1, r2 Randomly generated numbers between 0 and 1, and
Pbest , Gbest Local and global best positions of particles.
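One IPSO iteration over a swarm, following Eqs. (10) and (11), might be sketched as below; the acceleration constants are typical textbook values, assumed here.

import numpy as np

def pso_step(X, V, pbest, gbest, C1=2.0, C2=2.0, rng=np.random.default_rng()):
    """Update particle velocities and positions, per Eqs. (10) and (11)."""
    r1 = rng.random(X.shape)                                     # random numbers in [0, 1]
    r2 = rng.random(X.shape)
    V_new = V + C1 * r1 * (pbest - X) + C2 * r2 * (gbest - X)    # Eq. (10)
    X_new = X + V_new                                            # Eq. (11)
    return X_new, V_new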
The firefly algorithm (FA) was introduced by Xin-She Yang and is inspired by the flashing behavior of fireflies. The idealized rules followed in this algorithm are given below.
1. All fireflies are unisex, hence any firefly can be attracted to any other firefly.
2. The brightness of a firefly determines its attractiveness, so the less luminous firefly is attracted to the more luminous one. Both attractiveness and brightness decrease as the distance increases. If there is no firefly brighter than a given firefly, it moves randomly.
3. The brightness of the firefly changes with the objective function.
The mathematical modeling of this optimization method is based on the above
three rules and is given by Eqs. (12)–(16). The attractiveness (β) of a firefly is the
function of the distance ‘r’. Hence the relation between attractiveness and distance
of two fireflies is mentioned as
$$
\beta = \beta_0\, e^{-\gamma r^2} \tag{12}
$$
The distance of separation of firefly ‘i’ and firefly ‘j’ which have their positions
as ‘X i ’ and ‘X j ’ can be expressed by Eq. (13).
$$
r_{ij} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2} \tag{13}
$$
$$
X_{i+1} = X_i + \beta_0\, e^{-\gamma r_{ij}^2}\,(X_j - X_i) + \alpha\,(\mathrm{rand} - 0.5) \tag{15}
$$
where,
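The firefly move of Eqs. (12), (13), and (15) can be sketched compactly; β0, γ, and α below are common default values, assumed here rather than taken from the paper.

import numpy as np

def firefly_move(Xi, Xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=np.random.default_rng()):
    """Move firefly i toward a brighter firefly j, per Eqs. (12), (13), and (15)."""
    r_ij = np.linalg.norm(Xi - Xj)                # separation distance, Eq. (13)
    beta = beta0 * np.exp(-gamma * r_ij**2)       # attractiveness, Eq. (12)
    return Xi + beta * (Xj - Xi) + alpha * (rng.random(Xi.shape) - 0.5)   # Eq. (15)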
1. Read the input parameters for solar PV, i.e., solar irradiance, panel efficiency, and the power rating of a single PV panel, for the 1st and 3rd system configurations.
2. Read the input parameters for the wind system, i.e., wind speeds (cut-in, cut-out, nominal) and the power rating of a single WT, for the 2nd system configuration.
3. For all three system configurations, read the load demand data over 24 h [15].
4. Compute the average annual load demand (kWh) over 24 h [15, 17].
5. Set the number of PV panels (for system configurations 1 and 3) or wind systems (for system configurations 2 and 3) to one. Calculate the power generated by PV using Eqs. (1) and (2).
6. Find the differential power ΔP = (P_gen − P_dem).
7. If ΔP < 0, follow step 8; otherwise, follow step 9.
8. Increase the number of PV panels by one and repeat the process from step 3 (see the sizing sketch after this list). Get the optimum quantity of PV panels (N_PV). Estimate the total PV power.
9. Calculate ΔP over a period of 24 h using N_PV.
10. Find the ΔP curve over the period and convert it into the energy curve (ΔW).
11. From the energy curve, ΔW = ∫ΔP dt = ∫(P_gen − P_dem) dt, estimate the battery capacity. Considering the SOC, the self-discharge rate of the battery, and inverter losses, compute the optimum capacity size of the battery.
12. Find the optimum quantity of batteries (N_Batt) by taking into account the 1.35 kWh capacity of a single BES and the SOC for lithium-ion and lead acid batteries.
13. Repeat steps 2 to 12 to calculate the power generated by the wind turbine (WT) using Eqs. (3) and (4). Find the optimum number of wind turbines (N_WT) and batteries (N_Batt) for the 2nd system configuration.
14. Repeat steps 1 to 13 to find the optimum N_PV, N_WT, and N_Batt for the 3rd system configuration.
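As a sketch of the inner loop of steps 5–9, the panel count is incremented until the generated power covers the demand. The function below reuses the pv_power_each sketch given earlier and assumes the hourly demand and irradiance profiles are available as arrays; it is an illustration, not the MATLAB code of the study.

import numpy as np

def size_pv(P_dem_hourly, r_hourly):
    """Find the optimum number of PV panels (steps 5-9) and the resulting ΔP curve."""
    N_pv = 1
    while True:
        P_gen = np.array([N_pv * pv_power_each(r) for r in r_hourly])  # Eqs. (1)-(2)
        if P_gen.sum() - np.sum(P_dem_hourly) >= 0:    # constraint ΔP ≥ 0 over 24 h
            return N_pv, P_gen - P_dem_hourly          # optimum N_PV and hourly ΔP
        N_pv += 1                                      # step 8: add one panel and retry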
By following the above steps 1 to 14, IPSO and FA are implemented and coded in MATLAB (2020 version) for the objective function of minimum TAC for the three system configurations. The results obtained are organized in Tables 2, 3, and 4 for all system components and both optimization methods.
Though IPSO and FA are both nature-inspired, swarm intelligence-based algorithms, they have some significant differences. FA can exhibit better characteristics than IPSO due to its nonlinear and dynamic nature; these differences are briefly summarized in Table 1.
Tables 2 and 3 give the optimum numbers of PV panels, WT, and batteries required to fulfill the load demand for the three system configurations (PV-BES, WT-BES, and PV-WT-BES) with minimum TAC using the two optimization algorithms, IPSO and FA. Table 2 considers the lithium-ion battery and compares the results obtained with those given in [15]. Table 3 gives the optimum system components with the lead acid battery for the same system configurations using IPSO and FA.
Table 4 gives the impact of a change in the % SOC of both battery chemistries (lithium-ion and lead acid) on their optimum number and on the overall annual cost of the system to fulfill the same load demand. The results are shown only for the third system configuration, PV-WT-BES, with the two optimization algorithms, IPSO and FA.
From Table 2, it is seen that the results obtained by implementing IPSO and FA match reasonably well with the results in [15]. It is observed that the time taken by the FA method to reach the optimum solution is considerably shorter than that of the IPSO method. In this particular case (Rafsanjan, Iran), it is also seen that out of the three system configurations, the WT-BES configuration gives the optimum solution. However, this may not be generalized, as the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES.
The mismatch in the number of system components may be due to inexact mapping of the wind velocity and solar irradiance data relative to [15], while the mismatch in the system's overall cost relative to [15] is due to changes in the costs of PV, WT, and BES.
Table 3 gives the results obtained for lead acid batteries using IPSO and FA. Again, the time taken by the FA method to reach the optimum solution is considerably shorter than that of the IPSO method, and for this case (Rafsanjan, Iran) the WT-BES configuration gives the optimum solution out of the three system configurations. As before, this may not be generalized, since the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES. Comparing Tables 2 and 3, the cost of the system with lead acid batteries is higher than that with lithium-ion batteries. Again, this may not be a general statement, as battery costs change with advances in battery materials and chemistry. So, this work
Table 3 Optimum number of PV, WT, and batteries (lead acid) by IPSO and FA methods

Sr. no. | System configuration | IPSO | FA
1 | PV + BES + load | N_PV = 61, N_Batt = 110 | N_PV = 61, N_Batt = 110
  | TAC of the system ($) | 8742.54 | 8742.54
  | Time elapsed (s) | 15.12 | 4.21
2 | WT + BES + load | N_WT = 8, N_Batt = 12 | N_WT = 8, N_Batt = 12
  | TAC of the system ($) | 5341.76 | 5341.76
  | Time elapsed (s) | 16.12 | 5.13
3 | PV + WT + BES (1.35 kWh) + load | N_PV = 12, N_WT = 7, N_Batt = 14 | N_PV = 12, N_WT = 7, N_Batt = 14
  | TAC of the system ($) | 7856.28 | 7856.28
  | Time elapsed (s) | 22.92 | 8.95
has given the user a choice to select a particular type of battery chemistry (either lithium-ion or lead acid) as per requirements, feasibility, or economy.
Table 4 presents the effect of the % SOC of both battery chemistries (lithium-ion and lead acid) on the optimum number required for the PV-WT-BES system configuration with both optimization algorithms, IPSO and FA.
Table 4 Results of the IPSO and FA methods for the number of lithium-ion (LI) and lead acid (LA) batteries for the 3rd system configuration (PV-WT-BES)

Sr. no. | % SOC of BES (Wh) | N_PV | N_WT | LI BES (N_Batt) | LA BES (N_Batt) | IPSO convergence time (s) | FA convergence time (s)
1 | 1350 (100%) | 12 | 7 | 8 | 15 | 22.9 | 8.95
2 | 1080 (80%) | 13 | 7 | 11 | 20 | 20.42 | 7.61
3 | 810 (60%) | 13 | 11 | 42 | 80 | 14.61 | 8.32
4 | 675 (50%) | 16 | 13 | 60 | 98 | 15.32 | 15.32
5 | 270 (20%) | 19 | 16 | 75 | 120 | 17.96 | 6.24
Tables 2, 3, and 4 show odd numbers of PV panels, wind turbines, and batteries; in practice, even numbers may be preferred after checking technical and economic feasibility. From Table 4, it is seen that if the SOC of both battery chemistries (lithium-ion and lead acid) is reduced from 100% (1350 Wh) to 20% (270 Wh), then the numbers of batteries, PV panels, and WT required to fulfill the same load demand increase for the same system configuration. This observation is valid for both optimization algorithms, IPSO and FA, but FA converges faster than IPSO. Though Table 4 covers the ideal operating % SOC range of both batteries, from 20% (270 Wh) to 100% (1350 Wh), practically the % SOC range of the lead acid battery is from 50% (675 Wh) to 100% (1350 Wh) and that of the lithium-ion battery is from 20% (270 Wh) to 100% (1350 Wh). This is shown in Fig. 6a and b, respectively.
Figure 6 shows the variation in % SOC of the lead acid and lithium-ion batteries over 24 h of a day. It indicates that the range of % SOC for the lithium-ion battery is between 20 and 100%, while that for the lead acid battery is between 50 and 100%.
Fig. 6 a % SOC for lead acid battery (50–100%). b % SOC for lithium-ion battery (20–100%)
5 Conclusions
The paper discussed the optimum sizing of PV, WT, and BES by minimizing the TAC of the hybrid energy system using IPSO and FA. After comparing the results of IPSO and FA with those of the reference paper, it is seen that both methods are reasonably accurate, but FA converges faster than IPSO. For the particular case of Rafsanjan, Iran, it is also seen that out of the three system configurations (PV-BES, WT-BES, PV-WT-BES), the WT-BES configuration gives the optimum numbers of system components with minimum system cost. However, this may not be generalized, as the availability of solar irradiance and wind velocity at the specific location must be checked prior to the installation of the HRES. The impact of battery SOC variations on the optimum sizing of the system components is also analyzed with these two methods. It is observed that as the % SOC of the battery decreases, the optimum battery size increases, which increases the overall system cost. This effect is analyzed for two battery chemistries, lithium-ion and lead acid. The results show that more lead acid batteries than lithium-ion batteries are required to fulfill the same load demand.
References
1 Introduction
Power electronics plays an important role in managing the power between two energy sources [1, 2].
Single inductor-based topologies have been proposed in [3–6]: single inductor-based multiport DC–DC converters of the two main types, step-up and step-down. By reducing the number of conducting devices in each stage, the size of the converter is reduced and the converter becomes more efficient. Single inductor-based multiport topologies, in cascade and parallel multiport converter forms, use multiple power sources to supply a wide range of applications. They have minimal losses and simple control compared with multiple separate converters [5, 7–9].
The purpose of this research is to provide a novel battery management system and
an MPPT controller based on fuzzy logic for an isolated PV system. A new method
is used to develop the energy management system with fuzzy-based MPPT control.
Under varying irradiance, the MPPT always tracks the maximum power point [10, 11].
In this work, a multiple-input/output DC–DC converter is designed for power management between solar PV and an energy storage system. The control technique used is a time-sharing closed-loop control, which maintains the power flow between the solar PV, the hybrid energy storage, and the load. A maximum power point tracking algorithm is used to extract maximum power from the solar PV. The power converter manages the power flow between the solar PV and the energy storage system to give a continuous supply to the load. It also monitors the power generation of the solar PV and the charge–discharge of the energy storage system accordingly. The proposed system is simulated in the MATLAB Simulink environment.
2 Proposed System
Figure 1 displays the block diagram of the proposed work, in which two different sources give a constant supply to the load. The solar PV with energy storage feeds the DC load via a multiport boost converter. The main source for the standalone DC load is solar PV, with batteries operating as energy storage. The control block implements the MPPT algorithm to track the maximum power point of the solar PV panel and the voltage controller to keep a continuous output voltage. The power modulation block is used to generate the required duty signals for the three switches. The generated PWM signals go to the gate driver circuit, which provides isolation between the two voltage levels and supplies the gate pulses required to turn the power converter switches on and off. The power management between solar, battery, and load is done using a state flow-based modified time-sharing control scheme. Solar power depends on external parameters, so energy storage is required for an uninterrupted power supply to the load. The multiport non-isolated boost converter is used for power management between the solar PV, the ESS, and the DC load [12, 13].
3 Solar PV
A solar photovoltaic system uses solar energy to generate the required amount of electrical energy, and a power electronics stage is needed to track the maximum power. Photovoltaic cells with MPPT algorithms are employed to continuously capture the maximum solar energy. At a specific temperature and level of irradiation, the solar PV module's output is determined by the PV voltage and the current drawn by the load. Solar panels can be designed by arranging the solar arrays in series and parallel combinations. From Fig. 2 we can determine the power-to-voltage relation. For hardware purposes, a solar simulator can emulate rated solar panels.
Maximum power point algorithms help to track the MPP in a solar system. Different methods exist, e.g., modified perturb and observe (P&O) and incremental conductance (I&C). The fuzzy logic-based method is simple and efficient. Multirule-based resolution and multivariable consideration for both linear and nonlinear fluctuations of parameters are two characteristics of fuzzy logic control. Additionally, it can function with imprecise inputs. Fuzzification, rule base, inference engine, and defuzzification are the four parts of a fuzzy logic system. Figure 3 shows the fuzzy-based MPPT algorithm [14].
When the solar power is more than the load requirements, the solar PV panel serves as the primary source of supply for both the energy storage and the DC load. Figure 5 shows the multiport converter in double mode operation; when the solar power increases or the load decreases, the system goes into DO mode. In this mode, charging switch S2 is on and switch S3 is off. Through charging and discharging of the inductor, the proposed converter operates in three stages. Switch S1 is turned on during the initial step, and the solar PV charges the inductor L; power flows through this switch via D1, which passes the current, while D2 and D3 block the current. Here, ip and Vp represent the primary source current and voltage, and the inductor current and output load voltage are represented as iL and Vo, respectively. By switching the operation of the switches, we perform this operation.
In this mode, the solar irradiance increases and the solar power becomes more than the load demand, so the proposed converter acts in double output mode, with solar PV as the primary source for both the load and the battery. In this mode only switch 1 and switch 2 operate; the third switch remains off during this stage. In the first stage, the primary source supplies the inductor and the inductor fully charges. In the next stage, the inductor supplies the load while only the first switch is on. Then, by turning the third switch on, the inductor supplies power to the battery. Figure 6 indicates the different stages of the converter in this mode.
5 Proposed Controller
The MPPT algorithm is set to extract the maximum power from the primary source, i.e., solar PV. Solar power is not constant throughout the day. By adjusting the impedance, MPPT helps operate the solar PV close to its maximum power point under varying conditions such as solar irradiance and temperature. Here the simple I&C method is used. The mode selection is important in this control strategy. For the mode selection in Fig. 7, the PV power and load power are taken as inputs, and Msel is the mode selection
signal. The state flow control logic is used to select the appropriate mode according to the excess or deficit of solar power. In the state chart, conditions are attached to the transitions to select the mode. When there is excess solar PV power, Msel gives signal 0, which is DOBM; when there is a deficit of solar power, it gives signal 1, which is DIBM mode. By using state flow, the control becomes easy and we can observe the transitions live. Battery charge control is added to manage the charging and discharging of the battery in the different modes. Lifecycle is the main issue in lithium-ion batteries; by limiting over-charge and over-discharge we can protect the battery's lifecycle. Here the battery can charge only below 80% SOC and can discharge only above 20% SOC.
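The mode-selection and charge-limit logic just described reduces to a few conditions. The sketch below mirrors the state-flow rules (signal 0 for DOBM, 1 for DIBM) and the 20–80% SOC window, with SOC expressed as a fraction; it is an illustration, not the Simulink state chart itself.

def select_mode(P_pv, P_load):
    """Msel: 0 = DOBM (excess solar power), 1 = DIBM (deficit solar power)."""
    return 0 if P_pv > P_load else 1

def battery_action_allowed(soc, mode):
    """Charge only below 80% SOC; discharge only above 20% SOC."""
    if mode == 0:            # excess solar: the battery may charge
        return soc < 0.80
    return soc > 0.20        # deficit solar: the battery may discharge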
6 Results
In this instance, when the solar irradiance rises from 400 to 800 W/m2 and vice versa, the change in solar irradiation results in an increase in solar power from 40 to 70 W. With the system entering DO mode, the solar power is more than the load power, and the mode selection block's output is 1. When the solar power is not sufficient to fulfill the load requirement, both sources supply the load. For system power management, the MPPT duty signal and the voltage control signal serve as inputs to the controller. The battery holds the remaining energy. By charging and discharging the inductor, the solar energy in this case provides power to the load as well as the battery. The mode changes, but the output voltage remains constant (Fig. 8).
7 Conclusion
The modified fuzzy-based MPPT with time-sharing control is used for power flow control between the solar panel and the battery. In this work, a fuzzy-based algorithm is used for MPPT, which is more efficient than the conventional one. The suggested converter provides a constant supply to the load through operation with multiple inputs and outputs. The mode selection logic increases system effectiveness and slows down battery degradation. Simulation under different conditions with the modified control method provides better results.
References
1. Oosthuizen C, Van Wyk B, Hamam Y, Desai D, Alayli Y, Lot R (2019) Solar electric vehicle energy optimization for the Sasol Solar Challenge 2018. IEEE Access 7:175143–175158. https://doi.org/10.1109/ACCESS.2019.2957056
2. Elshafei M, Al-Qutub A, Saif AA (2016) Solar car optimization for the world solar challenge.
In: 2016 13th international multi-conference on systems, signals & devices (SSD), pp 751–756.
https://doi.org/10.1109/SSD.2016.7473675
3. Jiang W, Fahimi B (2011) Multiport power electronic interface—concept, modeling, and
design. IEEE Trans Power Electron 26(7):1890–1900
4. Ki W-H, Ma D (2001) Single-inductor multiple-output switching converters, vol. 1, pp 226–231.
https://doi.org/10.1109/PESC.2001.954024
Ch. Bhavya Sri, Sudeshna Sani , K. Naga Bavana, and Syed. Hasma
Abstract For many decades, several problems, including image identification, image detection, picture categorization, etc., remained unresolved until the growth of machine learning. The most fundamental, traditional, and important subject of research in the area of machine learning has always been image recognition. Image recognition software progresses in society at a faster rate than technology. The protection of personal information, for instance when using mobile phones, depends on picture recognition. For picture recognition, we used the GAN algorithm and the CNN algorithm. To categorize, segment, and recognize images, machine learning-based image preprocessing technology is used. Nevertheless, because of the intricacy of video images and the varied nature of objects in different application scenarios, accurate categorization becomes vital and difficult. Image recognition technologies will be very useful for future generations.
1 Introduction
With the advent of technology, real-time facial gesture detection is becoming increas-
ingly important in the field of human–computer interaction. We employ contact-free and affordable face detection-based approaches [1, 2] to identify face gestures, while the vision-based approaches require only one or more cameras taking images or videos to recognize face movements. Numerous vision-based static approaches for recognizing postures or specific poses, as well as dynamic methods for recognizing a series of postures and facial gestures, have been proposed. Machine learning is a significant and challenging subject for image processing [3, 4], particularly in the field of large-scale image processing, where
machine learning approaches can be used to analyze complex data [5]. Machine learning techniques can extract knowledge from complex data, for example, disease identification from plant leaf image processing [6]. To enable the fair application of image recognition in many domains and industries, the primary features of an image are split [7]. Machine learning-based image processing techniques have been extensively employed in picture classification, segmentation, and recognition [8]. Biometrics is a technique used to measure and examine a person's physical and behavioral traits.
An intriguing innovation in machine learning recently is a method called gener-
ative adversarial networks (GANs). GANs, or generative models, create new data
instances that resemble your training data. For example, GANs have the ability to
create images that resemble photographs of faces with human traits even when the
corresponding faces don’t actually belong to any living thing. The primary input for
a GAN algorithm is random noise. The generator then transforms this noise into
a useful output. By introducing noise and sampling from various points across the
target distribution, we may make the GAN provide a broad range of data.
CNN is a well-liked and efficient pattern detection and image processing approach.
It has many benefits, such as adaptability, a simple structure, and reduced training
requirements. Spatial correlations found in the input data are used by CNN. Each
concurrent layer of the neural network is coupled to certain input neurons. The area
is referred to as the “local receptive field”. The focal point of the local receptive field
is a hidden neuron. CNNs, often referred to as convolutional neural networks, are
a class of artificial neural networks used in deep learning and are frequently used
for object and image recognition and categorization [9]. As a result, deep learning
employs a CNN to recognize objects in a picture.
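As a concrete illustration of the structure described above (local receptive fields feeding pooled feature maps and a classifier), a minimal PyTorch CNN might look as follows; the layer sizes and the assumed 64 × 64 RGB input are arbitrary choices for illustration, not the architecture used in this paper.

import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x3 local receptive fields over RGB
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halve the spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 inputs

    def forward(self, x):
        x = self.features(x)                   # convolutional feature extraction
        return self.classifier(x.flatten(1))   # flatten and classify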
One of the best image analysis tools that have recently achieved prominence in our
surveillance and security-related applications is face recognition [10]. It requires
verifying someone’s identity by looking at their face. Based on the subject’s face
features, such as their eyes and nose, it captures, assesses, and contrasts patterns.
System access is granted, and authentication is put into place. It uses a human face’s
biometric patterns as part of its biometric identification mechanism.
Efficient real-time data is safer than data derived from a static image.
is a commonly used method that uses biometrics to map facial attributes from our
database. Face verification is a method for comparing two faces to find the correct
person.
2 Literature Review
Zhu et al. in [11] used data complexity for the generation of contextual synthetic data. In this work, a length-of-class-boundary calculation technique is used to compute the data complexity (DC); the length of the class boundary decides the complexity of the data. The dimensionality may also influence classifier accuracy. The use of synthetic datasets is useful for analyzing an algorithm in a controlled scenario. Several geometrical descriptors have been defined in several studies by identifying characteristics of the datasets. These descriptors were found useful for understanding classifier performance. This approach can be useful in identifying the performance of the algorithm under different degrees of class imbalance. The authors arrived at the expression for data complexity calculation given in Eq. (1) and took the example of generating minimum spanning trees with different data complexities.
p = b × (n − 1) (1)
where
n: number of instances,
b ∈ [0, 1]: length of class boundary (desired complexity), and
p: number of edges connecting different classes.
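For instance, with n = 101 instances and a desired boundary complexity b = 0.2, Eq. (1) gives p = 0.2 × (101 − 1) = 20 edges connecting instances of different classes in the minimum spanning tree.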
Yuan in [12] applied visual attention-based networks for the synthesis of the
image. Here the main objective is to convert magnetic resonance (MRI) images
to computed tomography (CT) images using the fully convolutional networks, to
reduce the side effects for the patient because of the radiation due to CT scan. An
MR input picture is first divided into overlapping areas, and the generator is then
used to forecast the associated CT patch for each patch. The first method proposed in
this paper is supervised GANs, which consist of a network that contains a generator
for predicting the CT and a discriminator for separating the genuine CT from the
generated CT. Usually, GANs have two network-generators (G) and discriminators
(D). In order to reduce the binary cross entropy (BCE) between D’s decisions and the
appropriate label (real or synthetic), we minimize the BCE between D’s decisions and
the correct label (real or synthetic). Generators are FCNs that generate images, and
discriminators are CNNs that calculate the likelihood that the input image was created
from real images (real or synthetic). The second approach presented is auto-context
model (ACM) for refinement.
Wang et al. in [13, 14] used synthetic data for image segmentation. Synthetic data
is vital because it can be generated to meet specific requirements or conditions that
are not available in existing (real) data. The technique for synthetic data generation
is GAN. GAN is an unsupervised task in machine learning. Generative adversarial
networks consist of two models that automatically discover and learn the patterns
from the input data. The generator and discriminator models run in competition with each other to generate new records, examine records, and classify the variances within a dataset.
The self-attention generative adversarial network (SAGAN), which provides attention-driven, long-range dependency modeling for image tasks, was employed by Zhang, Han, and colleagues [15]. Generative adversarial networks (GANs) with traditional convolutional architectures generate high-resolution details only from spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations, and the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Recent studies have also revealed that the performance of GANs is impacted by the conditioning of the generators; building on this insight, spectral normalization is applied to the GAN generator to check whether the training dynamics are improved. The proposed SAGAN [16] performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and decreasing the Fréchet inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers demonstrates that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
3 Experimental Investigation
After a literature review, we identified the most effective technique for producing the necessary high-quality photographs. According to our research, the optimal method for producing as many images as needed is GANs, or generative adversarial networks [17]. To create new, artificial instances of data that can be mistaken for genuine data, the algorithmic structures known as GANs use two neural networks in competition with one another. There are typically two networks in a GAN: a generator G and a discriminator D, which can distinguish between genuine and synthetic images, both trained simultaneously. Whereas the generator G is an FCN that creates images, the discriminator D is a CNN that estimates the likelihood that an input image is taken from a real image. D is trained to distinguish between actual and artificial data, whereas G is trained to create realistic visuals that will deceive D [18].
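A minimal PyTorch rendering of this adversarial training setup is sketched below; the network shapes, learning rates, and flattened 28 × 28 images are placeholder assumptions, not the configuration used for the celebrity dataset.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())        # generator: noise -> flat image
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())       # discriminator: image -> P(real)
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):
    """One adversarial step; real is a batch of flattened images scaled to [-1, 1]."""
    b = real.size(0)
    fake = G(torch.randn(b, 100))                        # generator turns random noise into images
    # D learns to assign label 1 to real data and 0 to synthetic data
    loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # G learns to make D output "real" (label 1) on its samples
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()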
We need to identify the criteria used to obtain the necessary quality photographs
from our experimental research. With the celebrity dataset, we initially trained the
model. It includes pictures of famous people. We can produce the photos as needed.
The dataset of photos of celebrities for this article is taken from Kaggle. The sample
photos that are included in the collection are described in Fig. 3. The number of
photos provided in the dataset affects how well this training model performs.
4 Result Analysis
The main criterion used to evaluate the effectiveness of any face identification algo-
rithm is the obtained accuracy of the match. The accuracy is calculated using the
algorithm’s ability to recognize face input. The percentage of the match is displayed.
It is essential for the algorithm to show the closest percentage of matches.
Pie Chart and Bar Chart. Pie charts are one of the most well-known and often-
used ways of data visualization and are utilized in a wide range of applications. This
pie chart shows how well-known each celebrity is because of their movies. This pie
chart is also easy to understand and apply when studying the case study. It can be an
effective tool for communicating with even the most ignorant audiences because it
graphically represents data as a small portion of a larger total. It makes it possible for
the audience to easily understand information or compare data in order to undertake
analysis. The bar chart’s bar lengths display how each group stacks up against the
value. When there are too many categories present, the bar’s labeling and clarity can
become a problem. This bar graph displays the celebrity’s level of online popularity
(Fig. 5).
Correlation Matrix. A correlation matrix is a table that exhibits correlation coefficients among several variables. Each cell's color represents whether and to what degree two variables are related to one another. Correlation matrices can be used to summarize huge datasets and
find patterns. A correlation matrix may be used in business to investigate the connec-
tions between different product-related data items, such as the launch date, etc. This
matrix displays the facial expressions of famous people. Additionally, the celebrity’s
name is displayed (Fig. 6).
Scatter Plot Matrix. The dataset’s scatter plot matrix displays all pairwise scatter
between different variables as a matrix for k sets of variables or columns, such as
(x1, x2…xk), along with their names in this scatter plot matrix. Many relationships
Image Classification Model Based on Machine Learning Using GAN … 155
between variables can be examined in one chart by scatter plots. We may generally
assess if there is a linear link between several variables by using scatterplot matrices.
This is especially useful for identifying particular variables that might correlate with
your genomic or proteomic data. To display bivariate correlations between different
combinations of variables, scatter plots are arranged in a grid (or matrix). Numerous
associations can be investigated in a single chart thanks to the scatter plots in the
matrix, which each show the link between a pair of variables. As a result, the scatter
plot matrix has k rows and k columns for each of the k variables in the dataset. Each
row and column represents a scatter plot. Additionally, this scatter matrix graph
demonstrates how the celebrity’s facial emotions are displayed together with their
names (Fig. 7).
On the vertical axis, variable xj.
On the horizontal axis, variable xi.
6 Conclusion
out physical data growth. Although data is all around us, tagged data is uncommon.
Similar to other fields, collecting data for picture recognition is simpler, but doing
so manually requires a lot of time and effort.
References
1. Sukmandhani AA, Sutedja I (2019) Face recognition method for online exams. In: International
conference on information management and technology (ICIMTech), Jakarta/Bali, Indonesia,
pp 175–179
2. Venkateswar Lal GR, Nitta AP (2019) Ensemble of texture and shape descriptors using support
vector machine classification for face recognition. Ambient Intell Humaniz Comput
3. Fayyoumi A, Zarrad A (2014) Novel solution based on face recognition to address identity
theft and cheating in online examination systems. Adv Internet Things 4(3):5–12
4. Bah SM, Ming F (2020) An improved face recognition algorithm and its application in
attendance management system. Array 5
5. Kranthikiran B, Pulicherla P (2020) Face detection and recognition for use in campus
surveillance. Int J Innovative Technol Exploring Eng 9(3)
6. Mitra D, Gupta S (2022) Plant disease identification and its solution using machine learning.
In: 2022 3rd international conference on intelligent engineering and management (ICIEM),
London, United Kingdom, pp 152–157. https://doi.org/10.1109/ICIEM54221.2022.9853136
7. Kamencay P, Benco M, Mizdos T, Radil R (2017) A new method for face recognition using
convolutional neural network. Digital Image Process Comput Graphics 15(4):663–672
8. Traoré YN, Saad S, Sayed B, Ardigo JD, de Faria Quinan PM (2017) Ensuring online exam
integrity through continuous biometric authentication
9. Sani S, Bera A, Mitra D, Das KM (2022) COVID-19 detection using chest X-Ray images based
on deep learning. Int J Softw Sci Comput Intell (IJSSCI) 14(1):1–12. https://doi.org/10.4018/
IJSSCI.312556
10. Traoré I, Awad A, Woungang I (eds) (2017) Information security practices. Springer, Cham
11. Zhu C, Zheng Y, Luu K, Savvides M (2017) CMS-RCNN: contextual multi-scale region-
based CNN for unconstrained face detection. In: Bhanu B, Kumar A (eds) Deep learning for
biometrics. Advances in computer vision and pattern recognition. Springer, Cham
12. Yuan Z (2020) Face detection and recognition based on visual attention mechanism guidance
model in unrestricted posture. In: Scientific programming towards a smart world
13. Wang B, Chen LL (2019) Novel image segmentation method based on PCNN. Optik 187:193–
197
14. Wang K, Zhang D, Li Y et al (2017) Cost-effective active learning for deep image classification.
IEEE Trans Circ Syst Video Technol (99):1–1
15. Zhang H et al (2019) Self-attention generative adversarial networks. International conference
on machine learning. PMLR
16. Merrigan A, Smeaton AF (2021) Using a GAN to generate adversarial examples to facial image
recognition. ArXiv. https://doi.org/10.2352/EI.2022.34.4.MWSF-210
17. Cheng F, Hong Z, Fan W et al (2018) Image recognition technology based on deep learning.
Wireless Pers Commun C:1–17
18. Lin BS, Liu CF, Cheng CJ et al (2018) Development of novel hearing aids by using image
recognition technology. IEEE J Biomed Health Inf 99:1-1
19. Zhang XB, Ge XG, Jin Y et al (2017) Application of image recognition technology in census of national traditional Chinese medicine resources. China J Chinese Materia Medica 42(22):4266
20. Sun D, Gao A, Liu M et al (2015) Study of real-time detection of bedload transport rate using
image recognition technology. J Hydroelectric Eng 34(9):85–91
21. Aggarwal A, Mittal M, Battineni G (2021) Generative adversarial network: An overview of
theory and applications. Int J Inf Manag. 100004. https://doi.org/10.1016/j.jjimei.2020.100004
Role of Natural Language Processing
for Text Mining of Education Policy
in Rajasthan
Abstract The knowledge of education policy will bring an array of new growth, but it necessitates an improved type of human–machine intercommunication, in which the machine exhibits thoughtful and interactive intelligence. Natural language processing (NLP), a part of artificial intelligence (AI), is the competence of a computer program to comprehend spoken and written human language (https://www.linguamatics.com/what-text-mining-text-analytics-and-natural-language-processing; Zhang and Segall in IJITDM 7(4):683–720, (2008)) [1, 2]. In mining, one should have the sagacity to anticipate the intent of a policy (Bhardwaj in Int J Eng Res Technol (IJERT) 1(3), 2012; Maes in Commun ACM 7:30–40, 1994) [3, 4]. Using NLP, this work provides a quick way of extracting information about education policy. This paper focuses on manipulating NLP commands after data collection using unstructured interviews about attitudes toward NLP, and then filling out a website questionnaire form to collect the satisfaction result. Coding is executed to get the required data using Python and NLP. During the analysis of feedback at colleges in Jaipur, Rajasthan, satisfaction with using NLP commands was revealed, so it is observed that NLP creates a convenient way of mining. The goal behind this text mining is to identify the importance of NLP in getting data into an integrated form. Lastly, in the execution phase, the paper narrates the process of obtaining the cognition needed to extract data about policies.
1 Introduction
Text mining is one of the AI techniques. It enlists NLP to convert unstructured text into a format suitable for data analysis. Data on the web is mainly in unstructured format [5, 6]; unstructured data is inputted into models to get predictions. NLP is a sub-part of data science that consists of processes for intelligently processing, interpreting, and extracting knowledge from text data. NLP and its components can be used to organize large amounts of data, perform various automated tasks, and solve a variety of problems. Important tasks of NLP are text classification, text matching, and co-reference resolution. Text mining is a technique for reviewing the records of a large group to find knowledge in the data, and it is broadly useful for knowledge discovery [7–10]. It uncovers interrelationships within large document collections. To process the text, text mining can be used together with NLP. Text mining produces structured data that can be incorporated into databases [11–15].
big data visualization, etc., are handled using dimensionality reduction. This is an unsupervised technique in which unlabeled groups of similar entities are processed; problems such as image compression, recognizing fake newscasts, filtering unsolicited messages, advertising mechanisms, systematizing web marketing, detecting fraudulent or delinquent activities, and recording surveys are solved by it [18].
National Education Policy 2020 includes nearly 2 lakh suggestions from 2.5 lakh gram panchayats, 6600 blocks, 6000 urban local bodies, and 676 districts. By 2030, this new policy aims to universalize education from pre-school to the secondary level. There is a strong emphasis on foundational literacy. Vocational education will begin in Grade 6 with internships, and until Grade 5, teaching will be in the student's native language. NEP 2020 divides the 10 + 2 system into the 5 + 3 + 3 + 4 format. Flexibility will be added to the higher education curriculum [19–21]. Medical education will be integrated with Ayurveda, Naturopathy, Unani, Homoeopathy, and Siddha, and vice versa, at the undergraduate level, according to the education policy [22].
2 Methodology
NLP is applied for cleaning and summarizing text, tokenizing sentences and words,
getting the frequency of words, etc. There are some steps in text mining for deriving
meaningful information when manipulating NLP with Python code [23] (Figs. 2, 3,
4, 5, 6, 7, 8, 9).
#Installing NLTK (Natural Language Toolkit)
C:\Users\HP\AppData\Local\Programs\Python\Python39>python
>>> import nltk
>>> nltk.download()
Showing info https://raw.githubusercontent.com/nltk/nltk_data/
gh-pages/index.xml
#Working with tokenization in NLP
Fig. 1 Flowchart of the approach: if the text is purposeful, apply NLP with Python; otherwise, stop execution
>>> from nltk.tokenize import word_tokenize
>>> from nltk.probability import FreqDist
>>> tokens = word_tokenize(text)   # text: the policy passage loaded earlier (assumed)
>>> fdist = FreqDist(tokens)
>>> fdist1 = fdist.most_common(9)
>>> fdist1
[('the', 2), ('.', 2), ('According', 1), ('to', 1), ('NEP', 1), ('2020', 1), (',', 1), ('it', 1), ('has', 1)]
# Opening a jupyter notebook
Fig. 5 Classifying words using POS-tagging, tagged token and Brown Corpus
Fig. 9 Importing sent_tokenize() and word_tokenize() from nltk.tokenize package using Beautiful
Soup
C:\Users\HP\AppData\Local\Programs\Python\Python39>jupyter
notebook
[W 14:47:40.293 NotebookApp] Terminals not available (error was No
module named ’winpty.cywinpty’)
[I 14:47:40.543 NotebookApp] Serving notebooks from local direc-
tory: C:\Users\HP\AppData\Local\Programs\Python\Python39
[I 14:47:40.543 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 14:47:40.543 NotebookApp] http://localhost:8888/
?token=85319cedbe702cff61e821a7e71b767c23e5c6db032d48ef
[I 14:47:40.559 NotebookApp] or http://127.0.0.1:8888/
?token=85319cedbe702cff61e821a7e71b767c23e5c6db032d48ef
[I 14:47:40.559 NotebookApp] Use Control-C to stop this server and
shut down all kernels (twice to skip confirmation).
[C 14:47:40.637 NotebookApp]
To access the notebook, open this file in a browser: file://
/C:/Users/HP/AppData/Roaming/jupyter/runtime/nbserver-1700-
open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=85319cedbe702cff61e821a7e71b767c
23e5c6db032d48ef
or http://127.0.0.1:8888/?token=85319cedbe702cff61e821a7e71b767c
23e5c6db032d48ef
[W 14:49:57.733 NotebookApp] 404 GET /undefined/undefined (::1)
22.060000ms referer=None
[I 14:53:45.992 NotebookApp] Creating new file in
[I 14:53:46.054 NotebookApp] Creating new notebook in
[I 14:53:46.443 NotebookApp] Creating new notebook in
[I 14:53:46.683 NotebookApp] Creating new notebook in
[W 14:53:46.939 NotebookApp] 404 GET /undefined/undefined (::1)
29.570000ms referer=None
3 Results
4 Conclusion
References
Multilingual and Cross Lingual Audio
Emotion Analysis Using RNN
Abstract Speech and language are necessary for a person's development of their
emotional and social skills. Reading, writing, and verbal comprehension are vital
components of the overall learning process, and hence instruction or some linguistic
representation is necessary for academic achievement. This study examines several
audio files based on various linguistic methodologies and compares multilingual
and cross-lingual approaches. For our experiments with the English, German, Italian,
and Canadian-English emotion datasets, we chose CNN and LSTM as our models.
We obtained the most outstanding results on Canadian English in the single-language
trial: 99.12% with CNN and 98.96% with LSTM. In the multilingual experiment, the
English test data achieved 98.19% with CNN and 93.41% with LSTM. Lastly, in the
cross-lingual setting for Canadian English, we obtained 93.25% with the CNN model
and 98.12% with the LSTM model.
1 Introduction
Research into multilingual and cross-lingual audio emotion analysis seeks to provide
methods and algorithms for detecting and interpreting emotional information in voice
data across various languages. There are many potential uses for emotion analysis
in fields including medicine, education, customer service, and even the entertain-
ment industry. Recent years have seen substantial advancements in the accuracy and
robustness of emotion detection in spoken language, thanks to deep learning models,
particularly recurrent neural networks (RNNs) [1]. Models that can reliably detect
and interpret emotions in voice data across many languages are the goal of multilin-
gual and cross-lingual audio emotion analysis. It is difficult because some languages
may not have enough training data, while others may have emotional expressions
and cultural conventions that differ significantly from English. Emotions can be
represented in many ways, including intonation, vocabulary choice, and grammat-
ical constructions, but these vary from language to language. One form of deep
learning valuable model in voice emotion recognition is recurrent neural networks
(RNNs) [2]. RNNs capture the temporal relationships and long-term dependencies
in sequential data because of their architecture for handling such data. It includes
speech signals and text. While emotions are generally expressed over time and may
be impacted by earlier events or context, this is especially crucial for speech emotion
identification. Several methods using RNNs to analyse emotional content in audio
have been developed for use with languages other than English. Using a single model
that can process different languages is one option. A multilingual RNN is one such
model. A model of this type may be taught to recognise emotions in any language by
being exposed to data in those languages. Another strategy involves creating indi-
vidual models for each language and developing ways to share information, like
transfer learning or multitask learning. It allows the models to learn from each other
and share characteristics, increasing their performance across languages [3].
The scarcity of annotated data in some languages is a problem for multilingual
and cross-lingual audio emotion analysis. Training and assessing machine learning
models need annotated data, which can be challenging and expensive. Using unsuper-
vised or semi-supervised learning methods, which can make use of either completely
unlabelled data or only a small quantity of labelled data, is one way to tackle this
problem. Other strategies include unsupervised pre-training, in which the model
is initially trained on a large body of unlabelled data before fine-tuning on a smaller
amount of labelled data. Consideration of cultural differences in emotional expres-
sion is another obstacle in multilingual and cross-lingual audio emotion analysis [4].
Cultural norms about the display of emotion vary from one society to the next. It is
possible to tackle this issue by incorporating cultural knowledge or creating models
that can adapt to varied cultural contexts.
Finally, cross-cultural and multilingual audio emotion analysis is a promising
field with numerous practical applications. There are several methods for creating
models that can process many languages, and RNNs are a potent tool for speech
emotion recognition. However, several obstacles must be overcome, such as the
need for more annotated data across languages and the requirement to consider
cultural variations in affect. More thorough and reliable models for assessing emotions in spoken
language across languages and cultures will require further study in this field.
Detecting emotion is one of today’s most critical marketing strategies. We could
see a person’s feelings from their speech. Speech emotion recognition was a tech-
nology that extracted emotional features from speech signals and compared and
2 Related Work
In 2021, Saad et al. [7] used the TESS database to analyse language-independent
vocal features in Bengali and English. They made verbatim translations between the
two languages; for example, the sample "Say the word Read" becomes "Poro ti bolo"
in Bengali. In this work, a support vector machine is incorporated, with 50 audio
samples across six different emotions: happy, angry, neutral, sad, disgusted, and fear.
The overall recognition rate for Bangla, English, and Canadian English TESS was 88.3%, 85%,
and 93.3%, respectively. Dupuis et al. [8] conducted this experiment with 56 under-
graduate students from the University of Toronto in 2011. They listened to speeches
delivered by both young and old speakers and identified the emotion of the speakers.
Overall, the accuracy was 82%. The Ravdess database includes eight emotions:
happy, sad, angry, calm, fearful, surprised, neutral, and disgusted. The following
research has been conducted using the same. In 2016, Shegokar and Sircar [9] used
the Ravdess database to build a quadratic SVM with a five-fold cross-validation
technique. Their accuracy rate was 60.1%, limited by the use of male-only sample
speech and of selected features in the SVM. In 2016, Zhang et al. [10] showed
that accuracy could reach 57.14% when the dataset uses a song-to-speech relationship
with group multitask features on only four emotions: angry, happy, neutral, and sad.
In 2017, Zhang [11] fed spectrograms from songs and speech into multitask gated
residual networks (GResNets), claiming the model was task-specific, with an accuracy
of 65.97%.
In 2017, Popova [12] used convolutional neural network VGG-16 as a classifier;
they obtained 71% accuracy when Mel-spectrogram was taken from the speech. The
German Emo-DB database presents the emotions of boredom, anger, sadness, fear,
disgust, neutrality, and happiness, and contains 535 audio files. In 2010, Luengo
et al. [13] used the German Emo-DB database with spectral, intonation, and intensity
regression features, sentence-end features, voice quality features, statistics, and
speech rate. The accuracy rate of their research was 78.30%, despite
the database having 535 utterances at 8 kHz with 16 bits per sample, rather than
the original 16 kHz. According to Wu et al. [14], modulation spectral features and
prosodic features with multiclass linear discriminant analysis (LDA) classifiers
achieve an accuracy rate of 85.8%. On the other hand, in 2012 Lampropoulos and
Tsihrintzis [15] found that combining MPEG-7 descriptors, MFCCs, and timbral
features using an SVM with an RBF kernel under leave-one-out evaluation results
in 83.93% accuracy. In 2014, Pohjalainen et al. [16] used the first and second
derivatives of MFCCs; their GMM classifier produced an accuracy rate of 68.49%,
and they claim improvement is possible if the training data is selected when training
the GMM-based model. In 2014, Huang et al. [17] used a CNN with an SVM classifier
in the last layer, utilising spectrograms, and reported 88.3% accuracy when including
the speaker and 85.2% accuracy when the speaker is excluded.
In 2019, Latif et al. [18] worked with Emo-Vo using eGeMAPS features and an SVM
classifier; accuracy was found to be 61.8% without data augmentation.
Haider et al. [19] worked with emobase and eGeMAPS features and an SVM
algorithm for classification and found 80% accuracy.
3.1 Database
The TESS dataset consists of 2800 audio files based on 200 target utterances,
recorded by two female actors aged 26 and 64 years across seven emotions: neutral,
pleasant surprise, anger, disgust, happiness, sadness, and fear. The TESS database
language is Canadian English, and all audio files are in wav format.
The Emo-Db database is a German emotional database comprising five male and five
female actors with 535 unique utterances. This database covers seven emotions:
neutral, disgust, sadness, boredom, anger, joy, and fear. The data was recorded at
48 kHz and downsampled to 16 kHz, and all files are in wav format.
The RAVDESS database consists of audio and video files, 2880 in total, from 24
actors (12 male and 12 female). We chose only the audio files and worked on 1440
of them in our experiment, with 60 trials per actor. It contains eight emotions
(neutral, calm, happy, sad, angry, fearful, disgusted, and surprised) with two levels
of intensity (normal and strong).
Several data augmentation techniques have been applied in our work, including
adding white noise, shifting the sound, stretching the sound, and changing the sound
pitch. Using all these methods, the database size increases fourfold. We trained on
augmented data and tested on original data in every mono-lingual, multilingual,
and cross-lingual experiment.
The effectiveness of machine learning models can be enhanced by data
augmentation, an effective method for expanding the amount and variety of
a dataset. Methods for improving audio recordings are as follows.
Adding white noise to sound data can create the impression of traffic noise or other
forms of background noise. Adding a randomly generated noise signal to the source
audio would accomplish this. When we talk about “time shifting,” we’re referring
to the practice of modifying the beginning or end of sound transmission, e.g., a new
start or ending time can be chosen randomly, and the audio signal can be trimmed or
padded as needed. Sound stretching is altering the length of an audio transmission
without altering its pitch. Time stretching algorithms, which change the playback
speed of the audio signal, can be used for this purpose.
A signal’s frequency content must be modified to modify the pitch of a sound.
Pitch-shifting algorithms can be applied to the audio stream to alter its pitch without
affecting its pace or timing.
These methods can be used singly or in tandem to provide a significant and varied
dataset for use in machine learning. Sound data can also be improved using other
ways, including filtering, equalisation, and modulation.
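A minimal sketch of these four augmentation methods using librosa and NumPy; the parameter values and the input file name are illustrative assumptions, not the exact settings used in our experiments:
# Audio augmentation sketch (assumed parameters)
import numpy as np
import librosa

def add_white_noise(y, noise_factor=0.005):
    # Add a randomly generated noise signal to the source audio
    return y + noise_factor * np.random.randn(len(y))

def time_shift(y, sr, max_shift_s=0.2):
    # Move the start of the signal by a random offset (a cyclic roll
    # stands in for choosing a new start time and trimming/padding)
    shift = np.random.randint(-int(sr * max_shift_s), int(sr * max_shift_s))
    return np.roll(y, shift)

def stretch(y, rate=1.1):
    # Alter the duration of the signal without altering its pitch
    return librosa.effects.time_stretch(y, rate=rate)

def pitch_shift(y, sr, n_steps=2):
    # Alter the pitch without affecting pace or timing
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

y, sr = librosa.load("sample.wav", sr=None)  # hypothetical input file
augmented = [add_white_noise(y), time_shift(y, sr), stretch(y), pitch_shift(y, sr)]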
3.3.1 MFCC
Mel-frequency cepstral coefficients (MFCCs) represent the short-term spectrum of
speech on the Mel scale, which approximates human pitch perception, by mapping
the resulting spectral information onto this scale. The coefficients are calculated by
a discrete cosine transform (DCT) of the Mel-spectrum, translating it into the cepstral
domain. The initial step in computing MFCCs is to pre-emphasise the signal by
amplifying its high-frequency components to increase the signal-to-noise ratio of the
spoken signal. After that, a window function is applied to each overlapping signal
frame to lessen spectral leakage in the representation of the spectral properties of
the spoken stream.
Speech recognition, speaker identification, and music genre categorisation are just
some of the audio processing tasks that have significantly benefitted from using MFCCs.
In addition, they are helpful for classification tasks that rely on machine learning
because they offer a compact and robust representation of the spectral features of the
speech signal.
Overall, in speech and audio signal processing, Mel-frequency cepstral coeffi-
cients (MFCCs) are a standard feature extraction method. They are helpful for anal-
ysis and classification because they offer a condensed representation of a signal’s
spectrum characteristics.
MFCC features perform very well in identifying monosyllabic words and spoken
sentences. Moreover, the MFCC spectrogram reveals patterns by which words,
emotions, and the like can be quickly identified. Speech signals commonly contain
tones of varying frequencies, each with an actual frequency f (Hz), while the
subjective pitch is measured on the Mel scale; the signal is therefore converted into
a Mel-spectrum. The processing chain is as follows: first, noise reduction is applied;
then framing and windowing; the speech is then passed through a fast Fourier
transformation, followed by log-energy computation, conversion to the Mel scale,
and finally a discrete cosine transformation. Fast Fourier transform (FFT)
calculations determine each frame's power spectrum, which is filtered through a
series of triangular filters to produce a Mel scale representation. The filter bank
mimics the frequency resolution of the human ear, which is finer at lower
frequencies. Once the spectrum has been converted to the Mel scale, the logarithm
of each filter bank output is calculated, and the discrete cosine transform (DCT)
is then applied to the result, giving the estimated cepstral coefficients [22, 23].
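A minimal sketch of this extraction with librosa, which internally performs the framing, windowing, FFT, Mel filtering, log compression, and DCT described above; the file name, sample rate, and time-averaging are illustrative assumptions (40 coefficients match the 40×1 features used later in this chapter):
# MFCC extraction sketch (assumed file and settings)
import numpy as np
import librosa

y, sr = librosa.load("speech_sample.wav", sr=16000)  # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)   # shape (40, n_frames)
features = np.mean(mfcc, axis=1)                     # fixed-length 40x1 vector
print(features.shape)  # (40,)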
Convolutional neural networks (CNNs) excel at the visual recognition of still and
moving images. Convolution detects graphic patterns: a filter or kernel extracts
edges, textures, and shapes, and pooling layers reduce the size of the extracted
feature maps. The output is then passed through a nonlinear activation function
such as ReLU. CNNs use convolutional, pooling, and fully connected layers:
convolutional layers extract picture features, while fully connected layers map the
extracted features to the output classes. During training, the CNN optimises filter
weights and bias settings to produce the most accurate output for any input image.
Backpropagation propagates the difference between the predicted and actual output
from layer to layer, updating weights and biases.
CNNs excel at object detection, facial recognition, and handwritten digit recognition.
They are also employed in natural language processing, speech recognition, and
drug discovery. CNNs' ability to learn features automatically, without manual
feature engineering, is advantageous, and they can also handle hazy, distorted, and
differently sized photographs.
CNNs are effective, but they require a lot of data and computing. In addition,
adversarial attacks, in which even minor changes to a picture might fool the network,
may also affect them. These reasons make convolutional neural networks a popular
and successful deep learning model for image and video recognition. They excel in
finding objects and patterns in dense visual data by automatically learning valuable
attributes from input pictures.
Convolutional neural networks have three fundamental elements besides the input
layer. The convolution layer produces a feature map of the extracted features; the
pooling layer's principal task is downsampling, with max pooling keeping the
strongest pixel; and the flattening layer flattens the previous layer's output into a
vector, as sketched below.
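A minimal Keras sketch of these three elements (convolution, pooling, flattening) applied to a 40×1 MFCC input; the layer sizes and the seven-class output are illustrative assumptions, not the exact architecture used in our experiments:
# 1D-CNN sketch with the three fundamental elements (assumed sizes)
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(40, 1)),                          # 40 MFCC features
    layers.Conv1D(64, kernel_size=5, activation="relu"),  # feature map
    layers.MaxPooling1D(pool_size=2),                     # down-sample, keep strongest values
    layers.Flatten(),                                     # flatten to a vector
    layers.Dense(64, activation="relu"),
    layers.Dense(7, activation="softmax"),                # one unit per emotion class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])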
LSTM is a particular version of the RNN model which solves the short-term memory
problem. An LSTM combines long-term and short-term memory through three gates:
a forget gate, an input gate, and an output gate. The forget gate applies a sigmoid
function to the previous hidden state h(t−1) and the current input x(t) to decide what
to discard from the cell state. The input gate decides what to add to the cell state:
a tanh layer produces candidate values in the range −1 to +1, which are multiplied
by a sigmoid regulatory filter before being added. The output gate passes the filtered
cell state to the hidden state for further use, and it operates in three steps:
• Creating a filter
• Multiplying with the filter
• Transferring the result from the cell state to the next hidden state
Figure 1 represents a diagram of the LSTM model.
The design of our proposed LSTM model is shown in Fig. 2. The initial layer
consists of 512 filters with kernel size 5 and stride 1, followed by batch normalisation
and ReLU; a dropout rate of 0.1 is applied. The subsequent layers narrow to 256,
128, and 64 units. We consider 1D data with three dense layers, and the optimiser
used is stochastic gradient descent (SGD) with a momentum of 0.9 and a decay of
1e−6. The dense layers are set according to our classification goal. Each layer
consists of filters that transform the input data; the hyperparameters are the filter
size (F) and stride (S), and the output is called a feature map. In this work, we have
used 40×1 hand-picked MFCC features.
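A hedged Keras sketch of the configuration described above (Conv1D front end with 512 filters, kernel size 5, stride 1, batch normalisation, ReLU, dropout 0.1, dense layers of 256, 128, and 64 units, and SGD with momentum 0.9); the LSTM width, learning rate, and class count are assumptions not specified in the text:
# Proposed-model sketch (assumed LSTM width and learning rate)
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(40, 1)),                      # 40x1 MFCC input
    layers.Conv1D(512, kernel_size=5, strides=1, padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.1),
    layers.LSTM(128),                                 # assumed width
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(7, activation="softmax"),            # emotion classes
])
# The text specifies a decay of 1e-6; newer TF releases express this as a
# learning-rate schedule (or via tf.keras.optimizers.legacy.SGD).
sgd = optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])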
Figure 3 represents the proposed model framework, in which we show only the
RNN (LSTM) model. In the first step, we process the audio and extract features
from it; we used only MFCC features. After that, we applied k-fold cross-validation
to form the train and test sets. Finally, we fed the train and test data into the selected
LSTM model and classified the emotions.
In the TESS LSTM model, accuracy was 93% on the original and 98.96% on the augmented database.
In the Emo-Db emotional database, the total number of utterances is 535; after
augmentation the total number of phrases is 2140, the training data is 1712, and
the test set with original data is 107. In Emo-Db we experimented with CNN,
obtaining 70.37% on the initial database and 92.96% on the augmented database.
An LSTM experiment produced 71.27% on the original database and 85.38% on
the expanded database. In the RAVDESS database, the total number of speech
utterances is 1440; after augmentation it was 5760, the training data for augmentation
was 4608, and the testing data was 288 statements taken from 20% of the original
data. We experimented with CNN on the RAVDESS dataset, obtaining 70.46% on
the original dataset and 91.80% on the augmented dataset; with the LSTM model,
we secured 75.35% on the original dataset and 90.38% on the expanded dataset.
There are 585 utterances in the Emo-Vo emotional database; after data augmentation,
the total number of files is 2340, and the test data is 117 files from 20% of the
original dataset. We experimented with the Emo-Vo dataset and found 73.47% with CNN on the original dataset and
90.87% on the augmented dataset. In the multilingual experiment, training used only
English while testing used all languages. The number of utterances used for training
was 4608 in the augmented dataset, and testing used 20% of the original data from
all languages, i.e., 1172 utterances. We ran experiments with the CNN and LSTM
models on both the original and augmented data: the CNN achieved 72.59% on the
original data and 98.19% with data augmentation, while the LSTM achieved 87.88%
and 93.41% on the original and augmented datasets, respectively. In the cross-lingual
experiments, training used all the linguistic databases and testing used individual
linguistic datasets. The augmented training data comprised 16,866 utterances, and
testing used individual language databases: 107 German utterances, and similarly
100 French, 117 Italian, and 560 Canadian English utterances. Many experiments
were performed on the original cross-lingual dataset using CNN and LSTM, training
with all languages of the emotional databases and testing with individual languages
(German, English, Italian, and Canadian English); the accuracy found for the CNN
was 68.25%, 70.38%, 67.35%, and 90.25%, respectively, and for the LSTM 52.87%,
65.25%, 70.21%, and 87.88%, respectively. We also ran augmented experiments on
the cross-lingual database with both CNN and LSTM models, with accuracies of
78.12%, 78.04%, 72.25%, and 93.25% for the CNN model and 72.18%, 87.88%,
92.24%, and 98.12% for the LSTM, respectively. Table 2 presents the results of the
mono-lingual, multilingual, and cross-lingual experiments.
We have chosen mono-lingual databases for our general studies and compared our
results with prior work on the different (mono-lingual) emotional databases.
Table 2 All observations based on the mono-lingual, multilingual, and cross-lingual experiments

| Linguistic type | Database | Data | Train | Test | Model | Epochs | Accuracy (%) |
|-----------------|----------|------|-------|------|-------|--------|--------------|
| Mono-lingual | RAVDESS | Original | English | English | CNN | 100 | 70.46 |
| Mono-lingual | RAVDESS | Augmented | English | English | CNN | 100 | 91.80 |
| Mono-lingual | RAVDESS | Original | English | English | LSTM | 150 | 75.35 |
| Mono-lingual | RAVDESS | Augmented | English | English | LSTM | 150 | 90.38 |
| Mono-lingual | Emo-Db | Original | German | German | CNN | 100 | 70.37 |
| Mono-lingual | Emo-Db | Augmented | German | German | CNN | 100 | 92.76 |
| Mono-lingual | Emo-Db | Original | German | German | LSTM | 150 | 71.27 |
| Mono-lingual | Emo-Db | Augmented | German | German | LSTM | 150 | 85.38 |
| Mono-lingual | EMO-Vo | Original | Italian | Italian | CNN | 100 | 73.47 |
| Mono-lingual | EMO-Vo | Augmented | Italian | Italian | CNN | 100 | 90.87 |
| Mono-lingual | EMO-Vo | Original | Italian | Italian | LSTM | 150 | 70.56 |
| Mono-lingual | EMO-Vo | Augmented | Italian | Italian | LSTM | 150 | 92.62 |
| Mono-lingual | TESS | Original | Canadian English | Canadian English | CNN | 100 | 92 |
| Mono-lingual | TESS | Augmented | Canadian English | Canadian English | CNN | 100 | 99.12 |
| Mono-lingual | TESS | Original | Canadian English | Canadian English | LSTM | 150 | 93 |
| Mono-lingual | TESS | Augmented | Canadian English | Canadian English | LSTM | 150 | 98.96 |
| Multilingual | Multilingual database | Original | English | ALL | CNN | 100 | 72.59 |
| Multilingual | Multilingual database | Augmented | English | ALL | CNN | 100 | 98.19 |
| Multilingual | Multilingual database | Original | English | ALL | LSTM | 150 | 87.88 |
| Multilingual | Multilingual database | Augmented | English | ALL | LSTM | 150 | 93.41 |
| Cross-lingual | Cross-lingual database | Original | ALL | Italian | CNN | 100 | 67.33 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Italian | CNN | 100 | 72.25 |
| Cross-lingual | Cross-lingual database | Original | ALL | Italian | LSTM | 150 | 70.21 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Italian | LSTM | 150 | 92.24 |
| Cross-lingual | Cross-lingual database | Original | ALL | Canadian English | CNN | 100 | 90.25 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Canadian English | CNN | 100 | 93.25 |
| Cross-lingual | Cross-lingual database | Original | ALL | Canadian English | LSTM | 150 | 87.88 |
| Cross-lingual | Cross-lingual database | Augmented | ALL | Canadian English | LSTM | 150 | 98.12 |
In the TESS database, Saad et al. performed slightly better than our proposed LSTM
model; Fig. 4 represents the comparison study for the mono-lingual TESS experiment.
On RAVDESS data, our proposed model outperforms the other models; Fig. 5
represents the comparison study for the RAVDESS database. On the Emo-DB
database, our model outperforms all models except Huang et al.; Fig. 6 illustrates
the comparison study across prior work on the Emo-DB database. Finally, Fig. 7
represents the comparison study of models on the Emo-Vo database, where our model
outperforms the other models.
Fig. 7 Comparison on the Emo-Vo database: our LSTM model (92.62%) versus Latif et al. [18] (61.80%)
In the cross-lingual database analysis, we trained with all databases and tested with
different individual databases. Our proposed LSTM model was trained on all
emotional databases with an original training set of 4288 files and tested on 117
original Emo-Vo (Italian database) files. Similarly, in the augmented cross-lingual
experiment, we chose 17,152 files as training data and 117 for testing purposes.
Tables 5 and 6 represent the original and augmented experiments for the cross-lingual
setting (training with all databases and testing with the Emo-Vo database). Our
LSTM model outperforms CNN, as shown in Table 2.
Another cross-lingual experiment was done with the TESS database, training on
all databases with 3952 original training files and testing on 560 original files; for
the augmented experiment, training used 17,152 files and testing 560 utterances.
Tables 7 and 8 represent the classification reports of the original and augmented
cross-lingual experiments based on the TESS database.
Table 7 Classification report of the cross-lingual experiment (testing with TESS) on original data

| Emotion | Precision | Recall | F1-score | Support |
|---------|-----------|--------|----------|---------|
| Anger | 0.88 | 0.89 | 0.85 | 89 |
| Disgust | 0.82 | 0.81 | 0.80 | 108 |
| Fear | 0.85 | 0.81 | 0.87 | 117 |
| Sad | 0.83 | 0.85 | 0.84 | 126 |
| Happy | 0.85 | 0.84 | 0.85 | 120 |
| Avg/total | | | 0.87 | 560 |
6 Conclusions
References
1. Le XH, Ho HV, Lee G, Jung S (2019) Application of long short-term memory (LSTM) Neural
network for flood forecasting. Water 11(7):1387. https://doi.org/10.3390/w11071387
2. Sherratt F, Plummer A, Iravani P (2021) Understanding LSTM network behaviour of IMU-
based locomotion mode recognition for applications in prostheses and wearables. Sensors
21(4):1264. https://doi.org/10.3390/s21041264
3. Janse PV, Magre SB, Kurzekar PK, Deshmukh RR (2014) A comparative study between MFCC
and DWT feature extraction technique. Int J Eng Res Technol 3
4. Sen S, Dutta A, Dey N (2019) Speech processing and recognition system. In: Audio processing
and speech recognition. Springer, Singapore, pp 13–43
5. Bhattacharya S, Borah S, Mishra BK, Das N (2022) Deep analysis for speech emotion recog-
nization. In: 2022 second international conference on computer science, engineering and
applications (ICCSEA), Gunupur, India, 2022, pp 1–6. https://doi.org/10.1109/ICCSEA54677.
2022.9936080
6. Bhattacharya S, Das N, Sahu S, Mondal A, Borah S (2021) Deep classification of sound:
a concise review. In: Patil VH, Dey N, Mahalle P, Shafi Pathan M, Kimbahune VV (eds)
Proceeding of first doctoral symposium on natural computing research. Lecture notes in
networks and systems, vol 169. Springer, Singapore. https://doi.org/10.1007/978-981-33-407
3-2_4
7. Saad F, Mahmud H, Ridwan Kabir M, Alamin Shaheen M, Farastu P, Kamrul Hasan M (2021) A
case study on the independence of speech emotion recognition in Bangla and English languages
using language-independent prosodic features. ArXiv E-Prints, arXiv:2111.10776. https://doi.
org/10.48550/arXiv.2111.10776
8. Dupuis K, Pichora-Fuller MK (2014) Intelligibility of emotional speech in younger and older
adults. Ear Hear 35(6):695–707
9. Shegokar P, Sircar P (2016) Continuous wavelet transform based speech emotion recogni-
tion. In: 2016 10th international conference on signal processing and communication systems
(ICSPCS). IEEE, pp 1–8
10. Zhang B, Provost EM, Essl G (2016) Cross-corpus acoustic emotion recognition from singing
and speaking: a multi-task learning approach. In: 2016 IEEE international conference on
acoustics, speech and signal processing (ICASSP). IEEE, pp 5805–5809
11. Zhang S, Zhang S, Huang T, Gao W, Tian Q (2017) Learning affective features with a hybrid
deep model for audio–visual emotion recognition. IEEE Trans Circuits Syst Video Technol
28(10):3030–3043
12. Popova OV (2017) To the issue of culturological approach to professional speech training
targeted for the future translators of Chinese 2017
13. Luengo I, Navas E, Hernáez I (2010) Feature analysis and evaluation for automatic emotion
identification in speech. IEEE Trans Multimedia 12(6):490–501
14. Wu S, Falk TH, Chan W-Y (2011) Automatic speech emotion recognition using modulation
spectral features. Speech Commun 53:768–785
15. Lampropoulos AS, Tsihrintzis GA (2012) Evaluation of MPEG-7 descriptors for speech
emotion recognition. In: 2012 eighth international conference on intelligent information hiding
and multimedia signal processing (IIH-MSP). IEEE, pp 98–101
16. Pohjalainen J, Alku P (2014) Multi-scale modulation filtering in automatic detection of
emotions in telephone speech. In: 2014 IEEE international conference on acoustics, speech,
and signal processing (ICASSP). IEEE
17. Huang Z, Dong M, Mao Q, Zhan Y (2014) Speech emotion recognition using CNN. In:
Proceedings of the 22nd ACM international conference on multimedia (2014), pp 801–804
18. Latif S, Qadir J, Bilal M (2019) Unsupervised adversarial domain adaptation for cross-lingual
speech emotion recognition. In: 2019 8th international conference on affective computing and
intelligent interaction (ACII), pp 732–737
19. Haider F, Pollak S, Albert P, Luz S (2020) Emotion recognition in low-resource settings: an
evaluation of automatic feature selection methods. Comput Speech Lang 65:101119
20. Bhattacharya S, Borah S, Mishra BK, Mondal A (2022) Emotion detection from multilingual
audio using deep analysis. Multimedia Tools Appl
21. Jiang D-N, Lu L, Zhang H-J, Tao J-H, Cai LH (2002) Music type classification by spec-
tral contrast feature. In: 2002 IEEE international conference on multimedia and Expo, 2002
(ICME’02), vol 1. IEEE, pp 113–116
22. Dey N, Ashour AS (2018) Sources localization and DOAE techniques of moving multiple
sources. In: Direction of arrival estimation and localization of multi-speech sources. Springer,
Cham, pp 23–34
23. Sen S, Dutta A, Dey N (2019) Audio indexing. Audio processing and speech recognition, pp
1–11
Multi-modality Brain Tumor
Segmentation of MRI Images Using
ResUnet with Attention Mechanism
Abstract Brain tumors occur when abnormal cells grow within the brain. They
can put pressure on healthy parts of the brain or spread into those areas. Early and
prompt disease detection and diagnosis boost these individuals’ life expectancy. The
most popular technique for visualizing important brain areas is magnetic resonance
imaging (MRI). There are different modalities of magnetic resonance images, and they
differ in contrast and function. The four modalities are: T1, T1-CE (contrast enhanced),
T2 (spin–spin relaxation), and FLAIR. In MRI, the separation of different tumor
tissues from normal tissues is termed the segmentation process. Manually segmenting
a tumor in an MRI takes substantial time and effort and can produce inaccurate
results. Consequently, segmenting brain tumors requires the use
of automated procedures. In this study, we propose a pipeline of preprocessing tech-
niques and a deep learning model for segmenting brain tumors, thereby enhancing
the capability of automated algorithms to support doctors in clinical diagnosis.
1 Introduction
The human brain is a hugely complicated organ. Brain tumors arise when normal
healthy cells develop mutations in their DNA structures. These affected cells then
keep multiplying and survive even when all healthy cells die. They are far more
noticeable in children and old people. Having said that, they are one of the most diffi-
cult diseases to treat and have more chances of being cured if detected and segmented
early on. By employing a deep learning method, the model keeps learning continuously
and hence can potentially support decision-making that is more accurate than that of
an experienced neurosurgeon, making it of great clinical value. Brain tumors are
very heterogeneous in size, shape, location, etc. MRI also involves a lot of noise and
boundaries are also very irregular. Furthermore, there is a class imbalance problem
which makes it even more challenging to segment the brain tumor.
There have been numerous attempts to address this issue over the last few years.
Babu et al. [1] proposed using LSM and CV model with active contour segmentation
method. Unet is a very common and useful approach which could either be used
after skull stripping [2] or after data augmentation [3] or with a multiscale module
[4]. Unet architecture could also be modified using CNN [5] or the number of layers
could be decreased to reduce complexity of the model [6]. Abou Elenein et al.
[7] employed an encoder–decoder algorithm using pyramid pooling network. Wang
et al. [8] used a transformer to provide feature embeddings for the CNN decoder
(TransBTS). Brain tumor segmentation can also be done using the genetic algorithm
[9] or by transforming the 3D brain into a solid unit ball and then a cube followed
by CNN [10].
We wish to achieve two main objectives. The first objective is to handle class
imbalance in the dataset. The second objective is to segment a 3D volume into a
mask where labels represent different regions such as whole tumor, tumor core, and
enhancing tumor. Both objectives are explained in the proposed method section.
This paper includes five sections in total: literature review in Sect. 2, proposed
methodology in Sect. 3, results and discussion in Sect. 4, and conclusion and future
scope in Sect. 5.
2 Literature Review
The last several decades have seen a rise in research on computerized brain tumor
segmentation, indicating a growing interest in this area of still-developing study.
This section discusses a few of the current techniques for segmenting brain tumors.
Manually segmenting brain tumors is an arduous, tedious, and error-prone process.
Researchers have suggested numerous automated approaches to address these issues.
Magadza et al. [11] conducted a study of advanced deep learning techniques
for segmenting brain tumors, emphasizing their key components, and different
approaches along with critical review of open challenges in medical image analysis.
Atiyah et al. [12] suggested using Unet encoder with EfficientNet-B7 architecture.
In their proposed method, they made use of four Nvidia P40 GPUs and attained high
accuracy. In addition to a fusion loss function, Zhou et al. [13] used a powerful 3D
residual neural network (ERV-Net). Shan et al. [14] proposed a depth-wise CNN to
save computational resources and to combine features from various receptive fields.
Mlynarski et al. recommended utilizing both fully and weakly labeled training data,
whereas Díaz-Pernas et al. used a deep convolutional method with a multiscale
approach that could analyze three types of tumors [15, 16]. Tiwari et al. [17] performed a
review on different segmentation techniques for brain tumors and gave a detailed
comparison between them. A preprocessing strategy was put forward by Ranjbarzadeh
et al. to reduce time complexity and solve the overfitting issue. Along with it, they
included a distance-wise attention (DWA) mechanism, although it was limited for
tumors that covered more than one-third of the entire brain [18]. Naser et al. [19]
utilized Unet, transfer learning, and a fully connected classifier. In 2021, Khan et al. [20] made use of K-means clustering for brain tumor
classifier. In 2021, Khan et al. [20] made use of K-means clustering for brain tumor
segmentation using the BraTS 2015 benchmark datasets with better accuracy than
previously reported methods. Long short-term memory (LSTM) and ConvNet are
combined which enhances the results by using edge enhancement, noise reduction,
histogram equalization, and Laplacian of Gaussian filtering [21–23].
A lot of research has been performed by using the CNN and RCNN architectures
along with decoder blocks and UNet encoders [24–26]. Yogananda et al. created a
three-group framework using the 2019 BraTs dataset and each group consisted of
three 3D-dense-Unets [27]. Sajid et al. [28] suggested a preprocessing stage in which
3D MR images are converted to 2D slices to keep dimensions constant; they made
use of two-path, three-path, and hybrid CNNs and controlled the overfitting problem.
Yang et al. [29] developed an autonomous segmentation technique (RF and SK-TP
CNN) to improve the capability of nonlinear mapping. Using the BraTs 2015 and
2021 datasets, Elmezain et al. [30] developed a method in 2022 by combining the
deep capsule network with the latent-dynamic conditional random field.
Wang et al. [31] introduced a 2.5D network that bridges the gaps of memory
consumption, model complexity, and receptive field. The analysis was done on BraTs
2017 and 2018 datasets. Chen et al. [32] introduced a number of layers with a
perceptron-based method to enhance the performance. Wu et al. [33] put out a deep
CNN neural network fusion support vector machine approach in which the model
was run on the BraTs and a custom dataset in three phases. With deep learning-based
selective attention, Akil et al. [34] developed a method employing contiguous regions
and multiclass weighted cross-entropy. Zhao et al. reviewed different methods with
DNN on the BraTs 2019 dataset whereas Zhang et al. in their research, came up
with a powerful hybrid clustering technique coupled with morphological procedures
to reduce noise sensitivity and enhance segmentation stability using fuzzy C-means
algorithm to segment images [35, 36]. Biratu et al. [37] suggested an enhanced region-
growing technique using skull stripping for efficient seed point initialization. Jiang
et al. [38] proposed a novel edge extraction algorithm and self-adaptive balancing
class weight coefficient to solve the class imbalance problem which further achieved
better performance.
Some of them have used the FLAIR MRI data whereas most of the work is done
on the BraTs dataset [39, 40].
The above review shows that the Unet model was predominant compared with
other models but has certain limitations which can be overcome.
3 Proposed Methodology
Figure 1 depicts the stages of the experiment which contains five steps. It includes
preprocessing of MRI volumes as the first stage followed by data augmentation. The
next stage explains the patching strategy to which ResUnet-A model was applied.
The trained model is assessed using a variety of performance measures in the last
phase.
The range of pixel intensity values is altered through normalization; all values were
normalized to the range 0–1. Resampling resizes an image according to the desired
voxel spacing and was used to change the number of voxels per mm: each input
volume was resampled with a voxel spacing of (1.62, 1.62, 1.62). Patching refers
to dividing a large volume into a smaller set of volumes; the size of each 3D patch
is (64, 64, 64). The concept of overlapping patches is borrowed from Akil et al.
[34]. Overlapping patches produce five predictions per patch, where the first
pixel's prediction is influenced by the predictions of the remaining four pixels and
vice versa for each voxel. Therefore, even when using small patches, architectures
are still able to categorize and determine the overall context. The size of the overlap
between patches in our experiment was (32, 32, 32), as sketched below.
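A minimal NumPy sketch of this overlapping patching strategy, with (64, 64, 64) patches and a (32, 32, 32) step; the random volume is an illustrative stand-in for a preprocessed MRI modality:
# Overlapping 3D patch extraction sketch
import numpy as np

def extract_patches(volume, patch=64, stride=32):
    # Slide a cubic window over the volume; a stride smaller than the
    # patch size makes neighbouring patches overlap.
    patches = []
    z, y, x = volume.shape
    for i in range(0, z - patch + 1, stride):
        for j in range(0, y - patch + 1, stride):
            for k in range(0, x - patch + 1, stride):
                patches.append(volume[i:i + patch, j:j + patch, k:k + patch])
    return np.stack(patches)

vol = np.random.rand(128, 128, 128)   # stand-in for a cropped modality
print(extract_patches(vol).shape)     # (27, 64, 64, 64): 3 positions per axis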
There were six augmentation methods used: scaling, rotations, elastic deforma-
tion, mirroring, brightness, and Gaussian noise. The probability of data augmentation
per sample was 0.8.
Figure 2 depicts the overall architecture of the model with dimensions of the feature
map in each layer. Since patch size of (64, 64, 64) was used with two modalities, the
input layer has dimension of (64, 64, 64, 2). The final layer has dimension of (64,
64, 64, 4) since there are four labels to be predicted.
Figure 3a depicts the residual convolution block which consists of three sets of
(Conv. + Batch Norm.) layers. The first two convolution operations are computed
with kernel size of (3, 3, 3) while the third convolution operation is computed with
a kernel size of (1, 1, 1). A skip connection is used to merge the feature map of
convolution with different kernel sizes, so that both larger and smaller features are
accounted for in the feature map. The three sets are followed by a ReLU activation function.
Using a max pooling layer with a pool size of 2, the dimension of the feature map is
then decreased. The middle layer, which connects the encoder and decoder layers, is
shown in Fig. 3b. It consists of two sets of (Conv. + Batch Norm.) layers with a kernel
size of (3, 3, 3). Figure 3c depicts the gating signal which contains a convolution
layer with kernel size of (1, 1, 1) followed by batch normalization layer. The aim of
the gating signal is to return the gating feature map with the same dimension as of
the upper layer feature map.
Figure 4a depicts the attention block which is used to focus more on important
features rather than non-useful background information. It follows two pathways. The
first path involves the use of the gating output which then undergoes a convolution
operation with a kernel size of (1, 1, 1) and stride 1. Only a convolution operation
with a kernel size of (3, 3, 3) with a stride of 2 is used in the second path. The
two-path feature map is then concatenated. As part of the decoder, the feature map
is upsampled by a factor of 2. After feature extraction, the dimensions of the volume
must be reduced to a size the attention block can accept. Figure 4b depicts the
upsampling block which concatenates the previous downsampled layer feature map
with the attention feature map followed by two sets of (Conv. + Batch Norm.) layers
with a kernel size of (3, 3, 3).
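A hedged Keras sketch of this attention block: the gating signal passes through a (1, 1, 1) convolution with stride 1, the skip feature through a (3, 3, 3) convolution with stride 2, the two paths are combined, and the result is upsampled by 2. Where the text is unspecific, the additive combination with ReLU and sigmoid weighting follows the usual Attention U-Net recipe and is an assumption:
# Attention-gate sketch (additive variant; channel counts assumed)
from tensorflow.keras import layers

def attention_block(skip, gating, inter_channels):
    theta = layers.Conv3D(inter_channels, (3, 3, 3), strides=2,
                          padding="same")(skip)      # second path, stride 2
    phi = layers.Conv3D(inter_channels, (1, 1, 1), strides=1,
                        padding="same")(gating)      # gating path, stride 1
    combined = layers.Activation("relu")(layers.Add()([theta, phi]))
    psi = layers.Conv3D(1, (1, 1, 1), padding="same")(combined)
    alpha = layers.Activation("sigmoid")(psi)        # attention coefficients
    alpha = layers.UpSampling3D(size=2)(alpha)       # back to skip resolution
    return layers.Multiply()([skip, alpha])          # weighted skip features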
The dataset consisted of 369 folders, where each sample includes four modalities, i.e.,
t1, t2, t1ce, and flair, in the NIfTI format. Classes in the mask comprise the enhancing
tumor: 4; necrotic and non-enhancing tumor core: 1; peritumoral edema: 2; and
non-tumor: 0. After preprocessing, 189 samples were retained.
In this experiment, two modalities, i.e., t1ce and flair, were used for the segmentation
process. Both modalities are cropped to (128, 128, 128), since the majority of the
useful volume lies in the centrally cropped region of the slices. The two modalities
were concatenated, resulting in a shape of (128, 128, 128, 2). The segmented mask
was cropped to (128, 128, 128). Volumes consisting of less than 1% useful mask
were discarded to save computational resources. The total number of volumes after
preprocessing was 189.
The MRI data was split into training and testing samples, where 84% of the volumes
were used for training and 16% for testing. The Adam optimizer with an initial
learning rate of 10⁻⁴ was used, and the batch size was 2. Training used two-fold
cross-validation with a total of 200 epochs. The implementation was done in Python
using the MIScnn package and Keras, and an Nvidia Tesla V100 GPU was used for
training. The regions for evaluation were the whole tumor (labels 1, 2, 4), tumor
core (labels 2, 4), and enhancing tumor (label 4).
The loss function is the sum of the dice coefficient term and the cross-entropy,
called the dice cross-entropy loss [41], and was used to achieve the best results.
The dice term is

$$\ell_{dc} = -\frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_i^k v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k} \tag{1}$$

where u represents the softmax output and v represents the GT's one-hot encoding;
I is the number of voxels in the training batch, and K is the total number of labels.
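A minimal TensorFlow sketch of this combined loss; the smoothing epsilon and channel-last tensor layout are implementation assumptions not specified in the text:
# Dice cross-entropy loss sketch, following Eq. (1) plus cross-entropy
import tensorflow as tf

def dice_cross_entropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot ground truth, y_pred: softmax output, shape (..., K)
    axes = tuple(range(1, len(y_pred.shape) - 1))    # sum over the voxels I
    intersect = tf.reduce_sum(y_true * y_pred, axis=axes)
    denom = tf.reduce_sum(y_true + y_pred, axis=axes)
    dice = -tf.reduce_mean(2.0 * intersect / (denom + eps))  # Eq. (1)
    ce = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))
    return dice + ce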
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Dice Score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{4}$$

Sensitivity: It is a measure of how many true positives are predicted out of all actual
positives.

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{5}$$
In this experiment, the ResUnet-A model along with the dice cross-entropy loss
function was used. The residual block allows the architecture to extract both smaller
and larger features using deep layers without degrading performance. A good score
was achieved in the enhancing tumor region, i.e., the minority class, because only
those samples with more than 1% useful mask were chosen. Since ResUnets generally
take more time and effort to train, two modalities were experimentally chosen to
reduce training and inference time and make the solution practically efficient.
Table 1 shows that a high accuracy of 98.93% was achieved, with a precision of
86.73%, while the sensitivity of predicting all labels was 88.56%. Table 2 depicts
the region-wise scores for semantic segmentation.
Table 3 compares existing models trained and evaluated on the BraTs dataset
with the proposed method. In comparison, the proposed method yields a greater
dice score for the enhancing tumor region, while yielding a decent score for the
tumor core and whole tumor regions. A higher score could have been achieved if
the number of folds for cross-validation were increased. Figure 5 below illustrates
a sample of the predicted tumor area.
In this paper, a method to segment brain tumors into four classes and evaluate the
results based on three regions was proposed. We selected only those samples which
were meaningful for this problem and which resulted in less training time for the
model. The proposed pipeline for volumetric segmentation trains on smaller but
important samples and gives satisfactory results. The ResUnet-A model with dice
cross-entropy loss function was used for training. We also used the concept of overlap-
ping patches to get better results. The results could have been further enhanced if the
number of folds in cross validation were increased. It should be noted that the experi-
ment results were not validated on the online platform. We have shown that reducing
the volume size with only two modalities and selecting only important samples
does not affect the results significantly. This approach performs memory-efficient
segmentation that can help radiologists with faster diagnosis and treatments.
For future work, we intend to perform 3D tumor classification which will predict
whether the tumor is non-cancerous or cancerous.
Acknowledgements We are thankful to the Department of Computer Engineering and IT, COEP
Tech. for providing GPU server facility to implement this work. This facility was established under
TEQIP-III (A World Bank Project).
References
1. Babu KR, Indira ND, Prasad KV, Shameem S (2021) An effective brain tumor detection from
t1w MR images using active contour segmentation techniques. J Phys Conf Ser
1804(1):012174. https://doi.org/10.1088/1742-6596/1804/1/012174
2. Rao N, Reddy DLS, Gujja H (2022) Brain MRI segmentation binary u-net based architecture
using deep learning algorithm. https://doi.org/10.21203/rs.3.rs-1916275/v1
3. Ottom MA, Rahman HA, Dinov ID (2022) Znet: deep learning approach for 2d MRI brain
tumor segmentation. IEEE J Transl Eng Health Med 10:1–8. https://doi.org/10.1109/jtehm.
2022.3176737
4. Zhang F, Wu L, Wang Y, Yang Y, Li M, Li J, Xu Y (2022) A multi-scale brain tumor segmen-
tation method based on u-net network. J Phys Conf Ser 2289(1):012028. https://doi.org/10.
1088/1742-6596/2289/1/012028
5. Kajal M, Mittal A (2022) A modified u-net based architecture for brain tumour segmentation
on BRATS 2020. https://doi.org/10.21203/rs.3.rs-2109641/v1
6. Jena B, Jain S, Nayak GK, Saxena S (2022) Analysis of depth variation of u-NET architecture
for brain tumor segmentation. Multimedia Tools Appl. https://doi.org/10.1007/s11042-022-
13730-1
7. AboElenein NM, Piao S, Zhang Z (2022) Encoder–decoder network with depthwise atrous
spatial pyramid pooling for automatic brain tumor segmentation. Neural Process Lett. https://
doi.org/10.1007/s11063-022-10959-7
8. Wang W, Chen C, Ding M, Yu H, Zha S, Li J (2021) TransBTS: multimodal brain tumor
segmentation using transformer. In: Medical image computing and computer assisted inter-
vention—MICCAI 2021. Springer International Publishing, pp 109–119. https://doi.org/10.
1007/978-3-030-87193-2
9. Arif M, Jims A, Ajesh F, Geman O, Craciun MD, Leuciuc F (2022) Application of genetic
algorithm and u-net in brain tumor segmentation and classification: a deep learning approach.
Comput Intell Neurosci 2022:1–11. https://doi.org/10.1155/2022/5625757
10. Lin WW, Juang C, Yueh MH, Huang TM, Li T, Wang S, Yau ST (2021) 3d brain tumor
segmentation using a two-stage optimal mass transport algorithm. Sci Rep 11(1). https://doi.
org/10.1038/s41598-021-94071-1
11. Magadza T, Viriri S (2021) Deep learning for brain tumor segmentation: a survey of state-of-
the-art. J Imag 7(2):19. https://doi.org/10.3390/jimaging7020019
12. Atiyah AZ, Ali KH (2022) Segmentation of human brain gliomas tumour images using u-net
architecture with transfer learning. Diyala J Eng Sci 17–29. https://doi.org/10.24237/djes.2022.
15102
13. Zhou X, Li X, Hu K, Zhang Y, Chen Z, Gao X (2021) ERV-net: an efficient 3d residual neural
network for brain tumor segmentation. Expert Syst Appl 170:114566. https://doi.org/10.1016/
j.eswa.2021.114566
14. Shan C, Li Q, Wang CH (2022) Brain tumor segmentation using automatic 3d multichannelfea-
ture selection convolutional neural network. J Imaging Sci Technol
66(6):060502-1–060502-9. https://doi.org/10.2352/j.imagingsci.technol.2022.66.6.060502
15. Mlynarski P, Delingette H, Criminisi A, Ayache N (2019) Deep learning with mixed supervision
for brain tumor segmentation. J Med Imaging 6(03):1. https://doi.org/10.1117/1.jmi.6.3.034002
16. Díaz-Pernas FJ, Martínez-Zarzuela M, Antón-Rodríguez M, Gonzàlez-Ortega D (2021) A
deep learning approach for brain tumor classification and segmentation using a multiscale
convolutional neural network. Healthcare 9(2):153. https://doi.org/10.3390/healthcare9020153
17. Tiwari A, Srivastava S, Pant M (2020) Brain tumor segmentation and classification from
magnetic resonance images: review of selected methods from 2014 to 2019. Pattern Recogn
Lett 131:244–260. https://doi.org/10.1016/j.patrec.2019.11.020
18. Ranjbarzadeh R, Kasgari AB, Ghoushchi SJ, Anari S, Naseri M, Ben-dechache M (2021)
Brain tumor segmentation based on deep learning and an atten- tion mechanism using MRI
multi-modalities brain images. Sci Rep 11(1). https://doi.org/10.1038/s41598-021-90428-8
19. Naser MA, Deen MJ (2020) Brain tumor segmentation and grading of lower-grade glioma
using deep learning in MRI images. Comput Biol Med 121:103758. https://doi.org/10.1016/
j.compbiomed.2020.103758
20. Khan AR, Khan S, Harouni M, Abbasi R, Iqbal S, Mehmood Z (2021) Brain tumor segmentation
using k-means clustering and deep learning with synthetic data augmentation for classification.
Microsc Res Tech 84(7):1389–1399. https://doi.org/10.1002/jemt.23694
21. Iqbal S, Khan MUG, Saba T, Mehmood Z, Javaid N, Rehman A, Abbasi R (2019) Deep learning
model integrating features and novel classifiers fusion for brain tumor segmentation. Microsc
Res Tech 82(8):1302–1315. https://doi.org/10.1002/jemt.23281
22. Thillaikkarasi R, Saravanan S (2019) An enhancement of deep learning algorithm for brain
tumor segmentation using kernel based CNN with m-SVM. J Med Syst 43(4). https://doi.org/
10.1007/s10916-019-1223-7
23. Kumar MJ, Sai NR, Chowdary CS (2020) RETRACTED: an efficient deep learning approach
for brain tumor segmentation using CNN. IOP Conf Ser Mater Sci Eng 981(2):022012. https://
doi.org/10.1088/1757-899x/981/2/022012
24. Hossain T, Shishir FS, Ashraf M, Nasim MAA, Shah FM (2019) Brain tumor detection using
convolutional neural network. In: 2019 1st international conference on advances in science,
engineering and robotics technology (ICASERT). IEEE. https://doi.org/10.1109/icasert.2019.
8934561
25. Pitchai R, Praveena K, Murugeswari P, Kumar A, Bee MKM, Alyami NM, Sundaram RS,
Srinivas B, Vadda L, Prince T (2022) Region convolutional neural network for brain tumor
segmentation. Comput Intell Neurosci 2022:1–9. https://doi.org/10.1155/2022/8335255
26. Chang J, Zhang L, Gu N, Zhang X, Ye M, Yin R, Meng Q (2019) A mixpooling CNN architec-
ture with FCRF for brain tumor segmentation. J Vis Commun Image Represent 58:316–322.
https://doi.org/10.1016/j.jvcir.2018.11.047
27. Yogananda CGB, Wagner B, Nalawade SS, Murugesan GK, Pinho MC, Fei B, Madhuran-
thakam AJ, Maldjian JA (2020) Fully automated brain tumor segmentation and survival predic-
tion of gliomas using deep learning and MRI. In: Brainlesion: Glioma, multiple sclerosis, stroke
and traumatic brain injuries. Springer International Publishing, pp 99–112. https://doi.org/10.
1007/978-3-030-46643-510
28. Sajid S, Hussain S, Sarwar A (2019) Brain tumor detection and segmentation in MR images
using deep learning. Arab J Sci Eng 44(11):9249–9261. https://doi.org/10.1007/s13369-019-
03967-8
29. Yang T, Song J, Li L (2019) A deep learning model integrating SK-TPCNN and random forests
for brain tumor segmentation in MRI. Biocybernetics Biomed Eng 39(3):613–623. https://doi.
org/10.1016/j.bbe.2019.06.003
30. Elmezain M, Mahmoud A, Mosa DT, Said W (2022) Brain tumor segmentation using deep
capsule network and latent-dynamic conditional random fields. J Imaging 8(7):190. https://doi.
org/10.3390/jimaging8070190
31. Wang G, Li W, Ourselin S, Vercauteren T (2019) Automatic brain tumor segmentation based on
cascaded convolutional neural networks with uncertainty estimation. Front Comput Neurosci
13. https://doi.org/10.3389/fncom.2019.00056
32. Chen S, Ding C, Liu M (2019) Dual-force convolutional neural networks for accurate brain
tumor segmentation. Pattern Recogn 88:90–100. https://doi.org/10.1016/j.patcog.2018.11.009
33. Wu W, Li D, Du J, Gao X, Gu W, Zhao F, Feng X, Yan H (2020) An intelligent diagnosis
method of brain MRI tumor segmentation using deep convolutional neural network and SVM
algorithm. Comput Math Methods Med 2020:1–10. https://doi.org/10.1155/2020/6789306
34. Naceur MB, Akil M, Saouli R, Kachouri R (2020) Fully automatic brain tumor segmentation
with deep learning-based selective attention using overlapping patches and multi-class weighted
cross-entropy. Med Image Anal 63:101692. https://doi.org/10.1016/j.media.2020.101692
35. Zhao YX, Zhang YM, Liu CL (2020) Bag of tricks for 3d MRI brain tumor segmenta-
tion. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries. Springer
International Publishing, pp 210–220. https://doi.org/10.1007/978-3-030-46640-420
36. Zhang C, Shen X, Cheng H, Qian Q (2019) Brain tumor segmentation based on hybrid clustering
and morphological operations. Int J Biomed Imaging 2019:1–11. https://doi.org/10.1155/2019/
7305832
37. Biratu ES, Schwenker F, Debelee TG, Kebede SR, Negera WG, Molla HT (2021) Enhanced
region growing for brain tumor MR image segmentation. J Imaging 7(2):22. https://doi.org/
10.3390/jimaging7020022
38. Jiang M, Zhai F, Kong J (2021) A novel deep learning model DDU-net using edge features to
enhance brain tumor segmentation on MR images. Artif Intell Med 121:102180
39. Zeineldin RA, Karar ME, Coburger J, Wirtz CR, Burgert O (2020) DeepSeg: deep neural
network framework for automatic brain tumor segmentation using magnetic resonance FLAIR
images. Int J Comput Assist Radiol Surg 15(6):909–920. https://doi.org/10.1007/s11548-020-
02186-z
40. Jun W, Haoxiang X, Wang Z (2021) Brain tumor segmentation using dual-path attention u-net in
3d MRI images. In: Brainlesion: Glioma, multiple sclerosis, stroke and traumatic brain injuries.
Springer International Publishing, pp 183–193. https://doi.org/10.1007/978-3-030-72084-1
41. Isensee F, Petersen J, Klein A, Zimmerer D, Jaeger PF, Kohl S, Wasserthal J, Koehler G,
Norajitra T, Wirkert S, Maier-Hein KH (2019) Abstract: nnU-net: self-adapting framework
for u-net-based medical image segmentation. In: Informatik aktuell. Springer Fachmedien
Wiesbaden, pp 22–22. https://doi.org/10.1007/978-3-658-25326-4
CPF Analysis for Identification of Voltage Collapse Point and Voltage Stability of an IEEE-5 Bus System Using STATCOM
1 Introduction
In recent years, the use of FACTS devices has increased extensively owing to their ability to minimize losses in power system operation. The continuation power flow (CPF) is an effective tool for tracing power flow solutions from the base load onwards until the steady-state voltage stability limit is reached. The power flow process remains well conditioned around the critical point, so divergence due to ill-conditioning of the system is not observed even when single-precision computation is used. The loading parameter also helps in identifying the weakest
bus of a system of buses. Voltage stability has to be maintained for all buses of the system under normal operating conditions as well as when the system is subjected to a disturbance. Voltage stability is classified, according to the nature of the disturbance that causes instability, into large-disturbance voltage stability, small-disturbance voltage stability, short-term voltage stability, and long-term voltage stability. Voltage collapse is typically associated with the reactive power demand of the load not being met due to a shortage in reactive power production and transmission. The term voltage collapse is also often used for voltage instability conditions: it is the process by which a sequence of events accompanying voltage instability leads to abnormally low voltages, or even a blackout, in a large part of the system. PSAT supports a wide range of device models, both conventional and non-conventional, and several static and dynamic analyses can be completed with it [1]. Using different load flow techniques in PSAT, steady-state analysis of the IEEE-6 bus system has been presented, with the line losses tracked under changing loading in order to establish a better result [2]. By studying eigenvalues and PV curves, it has been shown that solar photovoltaic generation at peak demand conditions helps boost the loading margin, enhancing system stability without a detrimental impact on voltage stability [3]. The voltage stability margin has also been improved by maximizing the effective generator reactive power reserve with PV and PQ generators, using optimized one-stage and two-stage approaches of preventive control action [4]. Power flow results have also been compared and studied to identify the most sensitive node of the IEEE-14 bus system using PSAT, in order to prevent blackouts or voltage collapse of the transmission system [5]. Locating FACTS devices such as the SVC and STATCOM at the midpoint of a transmission line increases the power transfer capability [6]. HVAC systems with different frequencies are interconnected to deliver more power over longer distances with fewer losses [7]. Load flow analysis is carried out using the Gauss–Seidel and Newton–Raphson methods [8]. A modified two-area, four-generator system with a parallel HVDC link has also been simulated in the PSAT toolbox of MATLAB for load flow analysis [9]. A MATLAB-based power system analysis tool (PSAT) that is freely distributed online is described in [10].
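To make the CPF idea concrete, the following minimal Python sketch traces the PV curve of a hypothetical two-bus system (a slack bus feeding a single PQ load through a line reactance) by stepping a loading parameter until the power flow solution ceases to exist at the nose of the curve. The line reactance, base load, and step size are illustrative assumptions rather than values from the IEEE-5 bus study, and a production CPF uses a predictor-corrector scheme instead of the simple stepping shown here.

```python
import math

def load_bus_voltage(p, q, v1=1.0, x=0.25):
    """Load-bus voltage of a two-bus system (per unit), upper branch.

    Solves V2^4 - (V1^2 - 2*Q*X)*V2^2 + X^2*(P^2 + Q^2) = 0 for V2.
    Returns None past the nose (voltage collapse) point, where no real
    solution exists.
    """
    a = v1 ** 2 / 2.0 - q * x
    disc = a ** 2 - x ** 2 * (p ** 2 + q ** 2)
    if disc < 0:  # past the voltage collapse point
        return None
    return math.sqrt(a + math.sqrt(disc))  # stable (upper) branch

# Trace the PV curve: scale an assumed base load by the loading parameter
p0, q0 = 0.8, 0.3  # illustrative base load in p.u.
lam = 0.0
while True:
    v = load_bus_voltage(p0 * (1 + lam), q0 * (1 + lam))
    if v is None:
        print(f"voltage collapse near loading parameter {lam:.2f}")
        break
    print(f"lambda = {lam:.2f}  V = {v:.4f} p.u.")
    lam += 0.05
```

The loading parameter at which the solution disappears plays the same role as the maximum of the loading parameter plot discussed later, and the bus whose voltage falls fastest along the curve is the weakest bus.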
2 System Structure
Figure 1 shows the 5-bus test system, where bus 1 is the slack bus, bus 2 is the generator (PV) bus, and buses 3, 4, and 5 are exclusively load (PQ) buses. Figure 2 shows the circuit diagram of the 5-bus system compensated with STATCOM at bus-3. Similarly, Figs. 3 and 4 depict the circuit model for the 5-bus system compensated with STATCOM at bus-4 and bus-5, respectively.
The normal power flow is first carried out for the 5-bus test system shown in Fig. 1. Table 1 shows that the voltage is stable at buses 1 and 2 because they are the slack bus and the generator bus, respectively, whereas buses 3, 4, and 5 incur voltage drops as they are exclusively load buses.
The shunt compensating device STATCOM is then inserted in the 5-bus system, first at bus-3 as shown in Fig. 2. The power flow is run again, and the voltage at bus-3 rises to 1.00 p.u., as shown in Table 2.
Next, the line losses are compared without and with STATCOM. Table 8 shows that the line losses are reduced by the STATCOM. The load is then increased by a given percentage individually at bus-3 and bus-4, and the change in line losses is observed, once without and once with STATCOM. The simulation results of varying load against line losses are shown in Tables 9 and 10 for bus-3 and bus-4, respectively. Losses decrease with STATCOM at both buses from a 60% load increase onwards.
Table 8 Losses comparison with and without STATCOM at bus-3, bus-4, and bus-5
Bus number Without STATCOM With STATCOM
3 6.778 6.058
4 6.778 6.091
5 6.778 6.189
Table 9 Comparison of losses with loading with and without STATCOM at bus-3
% Loading Loss without STATCOM Loss with STATCOM
20 7.469 8.345
40 10.832 11.255
60 14.947 14.812
80 19.230 19.043
100 24.245 23.977
Table 10 Comparison of losses with loading with and without STATCOM at bus-4
% Loading Loss without STATCOM Loss with STATCOM
20 7.469 8.4
40 10.832 11.337
60 14.947 14.925
80 19.230 19.197
100 24.245 24.179
Table 11 Continuation power flow results for the 5-bus test system
Bus V (p.u.) Phase Pgen (p.u.) Qgen (p.u.) Pload (p.u.) Qload (p.u.)
1 1.06 0 9.5773 2.1043 0 0
2 1 −0.44023 1.9884 1.05931 0.9942 1.2428
3 0.6398 −0.74959 0 0 2.237 0.4971
4 0.62994 −0.81906 0 0 1.9884 0.24855
5 0.52175 −1.0531 0 0 2.9826 0.4971
The continuation power flow is also carried out for the test system, and the resulting power flow values are displayed in Table 11. The global power flow report is shown in Table 12, which gives the per-unit real and reactive power values of the total generation, total load, and total loss. The CPF analysis also yields a plot of the loading parameter (Fig. 5).
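From the CPF voltages of Table 11, the weakest bus can be picked out directly, since it is the bus with the lowest voltage magnitude at the CPF solution; a trivial check over the tabulated values:

```python
# Bus voltage magnitudes (p.u.) from Table 11
voltages = {1: 1.06, 2: 1.0, 3: 0.6398, 4: 0.62994, 5: 0.52175}
weakest = min(voltages, key=voltages.get)
print(f"weakest bus: {weakest} (V = {voltages[weakest]} p.u.)")  # bus 5
```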
4 Conclusion
The above results show that voltage stability is obtained by using the shunt compensating device STATCOM. The line losses with STATCOM were also checked: they decrease with STATCOM once the load increase reaches 60%. CPF analysis was also performed to determine the weakest bus, and the loading parameter plot shows that bus-5 is the weakest bus.
Acknowledgements Without all of the team members’ invaluable advice, input, and inspiration,
this effort would not have been possible. I consider it an honour to collaborate with them to effectively
finish this paper.
References
1. Milano F (2005) An open source power system analysis toolbox. IEEE Trans Power Syst 20(3)
2. Nitve B, Naik R (2014) Steady state analysis of IEEE-6 bus system using PSAT power toolbox.
Int J Eng Sci Innovative Technol [IJESIT] 3(3). ISSN: 2319-5967
3. Tamimi B, Canizares C, Bhattacharya K (2011) Modelling and performance analysis of large
solar photo-voltaic generation on voltage stability and inter-area oscillations. IEEE PES General
Meeting, July 2011
4. Mousavi AO, Bozorg M, Cherkaoui R (2013) Preventive reactive power management for
improving voltage stability margin. Electr Power Syst Res 96:36–46
5. Mishra P, Udupa HN, Ghune P (2013) Calculation of sensitive node for IEEE-14 bus system
when subjected to various changes in load. In: Proceedings of IRAJ international conference,
21 July 2013, Pune, India. ISBN: 978-93-82702-22-1
6. Wadhwa CL (2009) Electrical power systems, 4th edn. New Age International Publishers
7. Kundur P (1997) Power system stability and control. McGraw-Hill, Inc.
8. Gupta JB (2009) A course in power systems. S. K. Kataria & Sons
9. Chow J (2003) Power system toolbox version 2.0 load flow tutorial and functions manual
10. Bagchi S, Goswami S, Bhaduri R, Ganguly M, Roy A (2016) Small signal stability anal-
ysis and comparison with DFIG incorporated system using FACTS devices. In: 2016 IEEE
1st international conference on power electronics, intelligent control and energy systems
(ICPEICES)
Analysis of Various Blockchain-Based Solutions for Electronic Health Record System
Abstract Digitization has helped most countries in the world adopt electronic health record (EHR) systems to store and access sensitive patient medical records easily. Developing a secure IT environment for EHR is a challenging task due to repeated cyber attacks on healthcare systems. Recently, the use of blockchain in healthcare applications has increased due to its inherent features, and blockchain addresses various security issues faced by healthcare systems. The main objective of this paper is to review and analyze the use of blockchain-based solutions for EHR in the healthcare domain. A comparative analysis is carried out based on the type of blockchain used and the pros and cons of each solution, along with various security and performance parameters. Based on the analysis, future research directions are suggested.
1 Introduction
Healthcare is one of the largest sectors in the world, and its global market has reached $11,908 billion. It is segmented into pharmaceutical medicine, medical equipment, healthcare services, and biologics. Healthcare service is one of the largest and most important segments of the healthcare industry. To improve the quality and decision-making processes of healthcare services, the data generated through the interaction between patients and doctors, called healthcare data, are an important parameter. Data scientists predict that more than 2,000 exabytes of such data are generated per year [1]. Healthcare data are categorized into three groups based on who maintains and handles them: the personal health record (PHR), the electronic medical record (EMR), and the electronic health
record (EHR) [2]. A PHR is electronic health data, along with other information, maintained and managed privately and securely by the patient. An EMR is an electronic record created, gathered, and managed by a single hospital, whereas an EHR is an electronic health record exchanged between different healthcare providers. By storing and exchanging data between providers, the EHR offers various benefits over the paper-based health record (PBHR) [3]:
• Cost: A PBHR is costlier to manage and maintain over time than an EHR. The initial cost to set up infrastructure is higher for an EHR, but it reduces gradually over time.
• Storage: Storing data on a decentralized network or cloud makes an EHR more accessible. A PBHR requires big warehouses to store data and becomes costlier when handled by many providers.
• Security: Records can be lost, damaged, or misused because of human error in a PBHR, whereas stronger security mechanisms against unauthorized access can be provided in an EHR.
• Diagnosis: An EHR helps to improve the diagnosis of diseases, as well as their prevention, compared with a PBHR.
• Access: Exchanging or accessing accurate records is a tedious and time-consuming process with a PBHR, whereas data stored in electronic form are readily shared.
• Readability and Accuracy: In an EHR, a standard procedure is followed to make documents more readable and accurate and to avoid confusion, whereas a PBHR often suffers from medical errors because there is insufficient space to record information in detail [3].
Advancement in the IT sector helps to manage and maintain EHRs. Due to the sensitive nature of healthcare data, confidentiality, authentication, access control, non-repudiation, interoperability, transparency, accountability, and privacy become requirements of the IT infrastructure [1]. To improve the security of information technology (IT) systems, various healthcare standards have been proposed. Patient safety, evidence-based care, process improvement, easy exchange of information, and cost reduction are some benefits of these standards, which are listed in Table 1 [4].
Work done in the healthcare domain with the help of blockchain is presented next. With the advancement of the healthcare domain, blockchain has been incorporated to provide user privacy, data security, authentication, and data sharing. The authors of [11] propose a blockchain-based architecture to control and share healthcare data but fail to show an implementation of the system. The scattered data of a patient is an issue solved by Roehrs et al. [12] by providing a unified view; interoperability and scalability issues are handled, but without testing the security and privacy of the data. Reference [13] presents how to share medical data among custodians in a trustless environment through the Ethereum platform, and also discusses the security and auditing of data. Another blockchain-based solution is given for secure and scalable medical data sharing [14], where IPFS is used as an off-chain database to store large amounts of data. Healthcare information exchange through the approach of off-chain storage and on-chain verification for privacy and authenticity is provided in [15], although its logic implementation for access control is rather complex. A blockchain-based service framework is proposed to manage personal medical data [16], in which complete control of the data is given to the patient; however, only the framework is proposed, and no implementation is carried out. An attribute-based signature scheme with multiple authorities is proposed to preserve patient privacy [17]. The performance cost of that system increases linearly as the number of authorities and patient attributes increases, but the scheme helps to resist collusion attacks. Decentralized blockchain technology can help to find missing EMRs from distributed replica nodes; such a system implements smart contracts to automate actions on the EMR, but it fails to work when the patient is in an emergency situation [18]. Another decentralized attribute-based signature scheme is proposed for healthcare systems using blockchain, enabling secure data sharing, easy access to the EHR, and non-repudiation; in this system, the owner of the data has no control over write operations [19]. To access and retrieve EMR data efficiently, a Hyperledger-based blockchain solution is provided whose access control protocol helps to hide signature information; this system slows down as the amount of data to be ordered increases [20]. The work presented in [21] proposes a new blockchain-based architecture for an access model along with an authorization scheme in which users are given control of the system at a granular level; however, all users have the same encryption key, which weakens non-repudiation. Smartphones can also be used to handle a blockchain-based system remotely for personal health data sharing and collaboration; the application is user-centric for sharing data among various doctors [22, 23]. The work in [24] proposes an approach that uses blockchain to store transactional information about e-Health records and access control policies (ACPs), where ACPs are defined at the user as well as the resource level; access policies and individual authorizations are stored on the blockchain, which may lead to data leakage. Sharing of EHRs among healthcare providers helps the proper diagnosis of patients. A decentralized IPFS-based EHR sharing framework is discussed in [25], in which data are stored on the cloud and users can also share medical data through mobile devices. The system prevents unauthorized access and allows sharing of medical data in a reliable way, but it allows all authorized doctors to access patient data without the patient's permission. Wang et al. [26] have implemented a blockchain-based EHR sharing protocol that focuses on the security and privacy of records; results show that the proposed protocol is computationally efficient, although the authors designed it without using a standard blockchain platform. Recently, one more blockchain-enabled patient-centric framework has been proposed for healthcare applications [27]; for the experimental analysis, the Hyperledger Caliper tool is used to measure performance parameters such as latency, throughput, and resource utilization.
This research work reviews and analyzes various blockchain-based solutions for EHR on the basis of security parameters such as authentication, access control, privacy, interoperability, and confidentiality, and of performance parameters such as throughput, latency, and speedup. Prospective research areas in this domain have also been identified.
The rest of the paper is organized as follows: Sect. 2 introduces blockchain fundamentals to present an overview of blockchain technology. Section 3 presents blockchain development tools, followed by the blockchain-for-EHR methodology in Sect. 4. Section 5 presents the analysis of blockchain work done in the EHR domain, and finally, the conclusion is presented.
2 Blockchain Technology/Fundamentals
The main components of a block are as follows.
Hash of the Previous Block: Because the hash of the previous block is contained in the hash of the new block, the blocks of the blockchain all build on each other. Without this component, there would be no connection and chronology between blocks. Bitcoin uses the SHA-256 hash algorithm.
Root Hash of the Merkle Tree: All transactions contained in a block can be aggregated into a single hash, the root hash of the Merkle tree.
Timestamp: Each block carries a timestamp, given in seconds since 1.1.1970.
The Nonce: The nonce, i.e., the number used only once, in a Bitcoin block is a 32-bit (4-byte) field whose value is adjusted by miners so that the hash of the block will be less than or equal to the current target of the network.
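A minimal Python sketch of how these four components combine into a block hash is given below. For readability, it applies a single SHA-256 to a JSON-serialized header, whereas Bitcoin applies double SHA-256 to a binary header; the transaction contents are hypothetical.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Pairwise-hash transaction hashes until a single root remains."""
    level = [sha256(json.dumps(tx, sort_keys=True).encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def block_hash(prev_hash, transactions, nonce):
    header = {
        "prev_hash": prev_hash,                # links the block to its predecessor
        "merkle_root": merkle_root(transactions),
        "timestamp": int(time.time()),         # seconds since 1.1.1970
        "nonce": nonce,                        # adjusted by miners
    }
    return sha256(json.dumps(header, sort_keys=True).encode())

txs = [{"patient": "P-001", "action": "EHR read"},
       {"patient": "P-002", "action": "EHR update"}]
print(block_hash("00" * 32, txs, nonce=0))
```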
The emerging healthcare industry is adopting the latest innovative models for faster and more efficient communication between different stakeholders. In today's Internet age, it is easy to obtain and share information through smart devices, but this brings the risk of malicious attacks and of sensitive information being compromised. There are therefore some basic requirements associated with healthcare: system security (authentication, access control, confidentiality), interoperability, privacy, and data sharing [32].
Agbo et al. [33] found that blockchain for EHR is a major research topic in the literature due to its properties such as decentralization, immutability, smart contracts, and an open and transparent nature. Some of the benefits of using blockchain in EHR are
• Decentralization: The same copy of the healthcare records is available to all stakeholders, and all of them have the same access and control privileges. No single entity has control over the data.
This section briefly analyzes and compares the various security and performance parameters handled by blockchain-based solutions for EHR.
• Authentication: All healthcare providers need to authenticate before using the EHR system. Most blockchain-enabled frameworks provide authentication through a public key infrastructure (PKI) in which separate private and public keys are issued to each user (a minimal signing sketch is given after this list).
• Data Storage on Blockchain: Medical information stored on a blockchain becomes immutable, i.e., it cannot be modified. This enhances the security and trustworthiness of the system.
• Access Control: A security service that prevents unauthorized use of data by controlling its accessibility under certain conditions.
• Privacy: The right of individuals to keep their EHR private. Privacy ensures that only authorized users of the system access the EHR.
• Interoperability: The ability of the system to exchange the EHR between two or more stakeholders of healthcare systems. It ensures the accuracy of shared data and helps to increase the efficiency of diagnostic testing.
• Confidentiality: Any identifiable information taken from a patient by a doctor is considered confidential. To achieve confidentiality, such private EHR data are not made available to unauthorized users.
• Patient Centric: An EHR system is considered patient-centric if consent is taken from the patient for operations such as storage, access, modification, or exchange of the medical record.
• Latency: Evaluated as the average response time required for a client to access or store data on the blockchain and obtain a response.
• In [12], the average time taken for blocks to travel from a source node to a destination node is reported: within a 100-node network, 1,490 blocks require 0.216 s to travel. This time can change with respect to block size.
• Average time is also calculated with respect to the number of requests received by the system. Results show that handling 10 simultaneous requests requires 145 s in [13], whereas the method proposed in [20] takes 122 s.
• The framework proposed in [22] is evaluated based on the time taken for the data validation process and integrity proof generation with respect to the number of blocks.
• Paper [27] uses the Hyperledger Caliper benchmark tool to analyze the developed blockchain-based application. Performance parameters are analyzed with respect to configuration parameters such as block size, endorsement policy, channels, resource allocation, and the ledger database. The number of organizations and peers and the block size are the factors most affecting the results.
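As a concrete illustration of the PKI-based authentication mentioned in the list above, the sketch below signs the hash of an EHR record with a user's private key and verifies it with the matching public key. The use of the Python cryptography package and of Ed25519 keys is an assumption, since the reviewed systems do not prescribe a particular key type, and the record content is hypothetical.

```python
# pip install cryptography
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Each user (doctor, patient, provider) would hold such a key pair,
# issued through the PKI of the EHR system.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

record = b'{"patient": "P-001", "diagnosis": "..."}'
digest = hashlib.sha256(record).digest()  # sign the hash of the record
signature = private_key.sign(digest)

try:
    public_key.verify(signature, digest)  # raises if record or signature changed
    print("record authentic")
except InvalidSignature:
    print("record or signature tampered")
```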
Findings:
• Most of the research works only propose a framework without an actual implementation.
• Research is also needed in the area of designing dynamic blockchain-enabled frameworks that ensure secure data sharing with present healthcare systems.
6 Conclusion
References
11. Yue X, Wang H, Jin D, Li M, Jiang W (2016) Healthcare data gateways: found healthcare
intelligence on blockchain with novel privacy risk control. J Med Syst (Springer Link) 40(218)
12. Roehrs A, Da Costa CA, da Rosa Righi R (2017) OmniPHR: a distributed architecture model
to integrate personal health records. J Biomed Inform (Science Direct) 71:70–81
13. Xia QI, Sifah EB, Asamoah KO, Gao J, Du X, Guizani M (2017) MeDShare: trust-less medical
data sharing among cloud service providers via blockchain. IEEE Access 5:14757–14767
14. Rifi N, Rachkidi E, Agoulmine N, Taher NC (2017) Towards using blockchain technology for
health data access management. In: 4th international conference on advances in biomedical
engineering (ICABME). IEEE, pp 1–4
15. Jiang S, Cao J, Wu H, Yang Y, Ma M, He J (2018) BlocHIE: a blockchain-based platform for
healthcare information exchange. In: IEEE international conference on smart computing, pp
49–56
16. Chen Y, Ding S, Xu Z, Zheng H, Yang S (2018) Blockchain based medical records secure
storage and medical service framework. J Med Syst (Springer) 43(5):1–5
17. Guo R, Shi H, Zhao Q, Zheng D (2018) Secure attribute-based signature scheme with multiple
authorities for blockchain in electronic health records systems. IEEE Access 6:11676–11686
18. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data
access and permission management. In: Proceeding of 2nd international conference on open
and big data. IEEE, pp 26–30
19. Sun Y, Zhang R, Wang X, Gao K, Liu L (2018) A decentralizing attribute-based signature
for healthcare blockchain. In: 27th international conference on computer communication and
networks (ICCCN), pp 1–9
20. Fan K, Wang S, Ren Y, Li H, Yang Y (2018) Medblock: efficient and secure medical data
sharing via blockchain. J Med Syst (Springer Link) 42(8)
21. Zhang X, Poslad S (2018) Blockchain support for flexible queries with granular access control
to electronic medical records (EMR). In: IEEE international conference on communications
(ICC), pp 1–6
22. Liang X, Zhao J, Shetty S, Liu J, Li D (2017) Integrating blockchain for data sharing and
collaboration in mobile healthcare applications. In: IEEE 28th annual international symposium
on personal, indoor, and mobile radio communications (PIMRC). IEEE, pp 1–5
23. Cao S, Wang J, Du X, Zhang X, Qin X (2020) CEPS: a cross-blockchain based electronic health
records privacy-preserving scheme. In: IEEE international conference on communications
(ICC), pp 1–6
24. Dias JP, Reis L, Ferreira HS, Martins A (2018) Blockchain for access control in e-health
scenarios. arXiv preprint arXiv:1805.12267v1
25. Nguyen DC, Pathirana PN, Ding M, Seneviratine A (2019) Blockchain for securing EHR
sharing on mobile cloud based e-health system. IEEE Access 7:66792–66806
26. Wang Y, Zhang A, Zhang P, Wang H (2019) Cloud-assisted EHR sharing with security and
privacy preservation via consortium blockchain. IEEE Access 7:136704–136719
27. Singh AP, Pradhan NR, Luhach AK, Agnihotri S, Jhanjhi NZ et al (2021) A novel patient-
centric architectural framework for blockchain-enabled healthcare applications. IEEE Trans
Ind Inf 17(8):5779–5789
28. Solidity—Solidity 0.8.15 documentation (soliditylang.org)
29. https://trufflesuite.com
30. https://www.trufflesuite.com/ganache
31. https://docs.metamask.io/guide/
32. McGhina T, Choo K-KR, Liu CZ, He D (2019) Blockchain in healthcare applications: research
challenges and opportunities. J Netw Comput Appl (Elsevier) 135:62–75
33. Agbo CC, Mahmoud QH, Eklund JM (2019) Blockchain technology in healthcare: a systematic
review. Healthcare (MDPI) 7(56):1–30
Coordinated Network of Sensors Over 5G for High-Resolution Protection of City Assets During Earthquakes
Abstract This paper introduces a novel concept in earthquake early warning for
the smart city (EEW-SC) environment based on a finely spaced network of seismic
sensors with a wireless 5G backbone network. The sensors are spaced 50–100 m apart so that, a few seconds before the earthquake arrives, the damage to each structure can be predicted depending on the structure's resonances and the shape of the seismic waves striking it. The capability of performing predictive real-time performance-based damage assessment (PBDA) is unique and not possible with
existing sensor technologies. Depending on the expected severity of the damage,
automated actions, such as disconnecting electricity and gas, can be triggered to
protect the asset and prevent fires and explosions.
1 Introduction
and local reflections in a city distort the seismic waveforms at different geographical
coordinates. The assets’ frequency response and the particular frequency content
of the arriving waves determine the damage suffered by the asset, and automatic
response actions, such as disconnecting electrical circuit breakers, closing gas lines,
and others, can be performed to protect the equipment, buildings, and lives. Currently,
earthquake early warning systems have limited capability for automated actions since
they only predict the intensity of the arriving earthquake waves and not the frequency
content that will be applied to the structure [4].
Conventional networks of sensors for earthquake early warning are regional systems based on expensive (about $30 K) high-sensitivity seismic sensor stations separated by about 30 km that can measure the small-amplitude P-waves preceding the large destructive S-waves by 10–20 s for an earthquake 50–100 km away. From the intensity of the P-waves, the intensity of the S-waves is predicted
based on historical earthquakes, and an alarm is issued to the entire city. Since the
number of earthquakes in a particular region is relatively small, the historical data
is normally assembled from all over the world. However, earthquake propagation
is highly dependent on the particular characteristics of a local region in terms of
propagation parameters and local wave reflections, and even though the earthquakes
may originate from a similar fault, by the time the waves arrive in a city, they may
have suffered multiple distortions.
There are many limitations to the conventional approach for predicting damage
to the city’s assets [4]. (1) Since the sensors are separated by 30 km or so, the spatial
resolution is limited to about 30 km. (2) Since the small magnitude of the P-waves
is used to predict the magnitude of the large S-waves, and the relationship between
these two waves is not a constant physical parameter, this prediction has a large
margin of uncertainty. (3) The prediction is based on historical earthquakes, yet no two faults are identical and no two regions have the same propagation characteristics, so earthquakes that arrive at a city located tens of kilometres away from
the epicentre will present waveforms that can be vastly different from one location
to another. In addition, the effects of climate change in recent years are making
historical records even less reliable.
Even though traditional networks are working towards placing their sensors closer
to each other (e.g. every 10 km), the other aspects of the prediction (mostly the estimation of the S-waves from the P-waves) are subject to the same limitations described above. In general, the accuracy of the traditional methods is in the order of 50% in magnitude and location. Due to these limitations, when assets need to be protected individually, they are provided with their own sensors, increasing the spatial resolution, but the accuracy limitation of predicting the S-waves from the P-waves
remains. To increase the frequency resolution in determining the frequency response
of the structures, travelling wave methods are being introduced [5]; however, these
methods are not fast enough to protect thousands of assets in a city.
Figure 1 illustrates a dense network of sensors across the city, forming a fine grid.
The sensors (nodes) are separated by 50–100 m. Each sensor is connected wirelessly
to the underlying 5G communications network, which presents a low latency at any
given node. This uniformly low latency is equivalent to having all the sensors next
to each other (except for the small latency) “in the same workbench”. Since waves
are propagating along the grid, comparing the measurements at all nodes point by
point allows us to see how the waves propagate. Superimposing these measurements
on an underlying wave propagation model built with transmission line segments, we
can consider the differences in the subsoil paths and predict where the waves will
be in the next few seconds. The method’s accuracy is high, and the sources of error
are predictable. Accounting for errors in the model, in the sensors, and in the subsoil
values, the expected accuracies are in the order of 95%.
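A highly simplified sketch of the grid idea follows: given the time-aligned detection instants of the wavefront at two neighbouring nodes, the apparent propagation speed along that path can be estimated and the arrival at downstream nodes predicted. The positions, times, and straight-line propagation at constant speed are illustrative assumptions; the actual method superimposes the measurements on a transmission-line propagation model that accounts for differences in the subsoil paths.

```python
# Hypothetical node positions (metres along one grid line) and the
# time-aligned instants (seconds) at which the wavefront was detected.
positions = [0.0, 100.0]   # two sensors 100 m apart
arrivals = [0.000, 0.025]  # detection times at those sensors

# Apparent propagation speed along this path (~4 km/s for S-waves)
speed = (positions[1] - positions[0]) / (arrivals[1] - arrivals[0])

# Predict when the front reaches downstream nodes on the same grid line
for d in (200.0, 500.0, 1000.0):
    eta = arrivals[1] + (d - positions[1]) / speed
    print(f"node at {d:6.0f} m: predicted arrival {eta * 1000:6.1f} ms")
```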
With the proposed method, for a city of about 2.5 M people, placing a sensor every
100 m, we need about 300,000 sensors for a total cost of about $30 M. With traditional
technology, this number of sensors would cost about $9B, which is prohibitively
expensive. The main reason for this high cost is that conventional seismic sensors
have to be very sensitive to detect the very small P-waves to predict the S-waves,
while inexpensive MEMS sensors are used in the proposed solution to measure and
predict the full waveforms, which include the strong S-waves.
Figure 2 shows the integration of the mini-seismic (MSe) stations with the 5G
network through a Raspberry Pi board and a wireless communication module. As
indicated, the cost for 300,000 stations is $30 M, which is quite reasonable compared
to the large savings expected in assets and lives.
The main requirement of the proposed solution is an extensive telecommunica-
tions infrastructure with high-performance requirements in terms of latency, band-
width, and processing speeds due to the very short time frames for making decisions
before the earthquake arrives.
The expansion of IoT technologies in recent years is transitioning from using
conventional wireless technologies such as Bluetooth, Zigbee, and LoRa, to commu-
nicating over extensive networks over 4G and 5G. Particularly, 5G networks offer
high throughput, high reliability, wide coverage, and full IP support for IoT devices.
In the context of critical IoT applications, such as ours, they fulfil the need for low
latency, guaranteed bandwidth, and added security. They support data transmission
over large areas and are much cheaper than satellite communication. Another benefit
of using 5G technologies is that an application that relies on the massive deployment
of sensors can use the existing telecommunications infrastructure instead of building
and deploying new infrastructure (gateways, access points, etc.).
The 5G wireless technology is an enabler technology that can accommodate
massive-scale IoT deployments with the consistency of quality of service required
for mission-critical applications with a high impact on lives and property. For our
2 Mini-Seismic Stations
We have developed MSe stations (Fig. 3) centred around an easily available Raspberry
Pi 4 board equipped with a general-purpose input/output (GPIO) interface. This inter-
face allows control of external components, such as control circuits and communica-
tions modules connected to its outputs, and input information from sensor modules
and control circuits. The GPIO standard allows interaction using both synchronous
(I2C and SPI interfaces) and asynchronous (UART interface) serial communica-
tions. The GPIO also contains programmable pins that can be used to meet specific
application needs.
The operating system of the Raspberry Pi is highly optimized for the ARM CPU.
The OS is based on the Debian Linux distribution and allowed us to use several
available tools/libraries during the development and testing of the MSe software.
The sensor selected for this application is a 3-axis MEMS accelerometer, the ADXL355 [6]. This sensor provides an ultra-low noise density (22.5 µg/√Hz) on all
axes. The ADXL355 comes with an integrated 20-bit ADC, and it supports digital
SPI and I2C interfaces that can be used for communication with the processing unit
of the seismic station.
In our tests, we used different IoT modules to test the network’s access and
performance impact on the overall application. The modules tested included:
• Quectel BG96 for LTE-M network access,
3 Sensors Coordination
A large deployment (in the tens or hundreds of thousands) of MSe stations requires
a carefully integrated data collection and management system. Each sensor records
the seismic wave at its location node, which is then coordinated in time with the
data collected in the other nodes. The objective is to accurately reassemble in time
the waves propagating across all nodes at about 4 km/s and perform all calculations
needed to predict how the waves will be propagating across the grid in the next few
seconds.
To compare the waves at different nodes, we need to have an almost exact common
time reference, despite the latency and jitter of the communications network. In the
network, the latency changes from location to location in space and time (jitter).
Conventional timekeeping methods, such as clocks in the boards that provide a
time stamp when a data packet is sent, are not accurate enough for synchronizing
the phase angles of the travelling waves over very short distances of 50–100 m.
Also, maintaining the clocks of thousands of sensors synchronized just before the
earthquake begins is logistically very difficult. GPS synchronization is subject to
non-coverage locations, and onboard hardware clocks are susceptible to temperature
changes and drifts. In addition, these clock synchronization technologies add to the
cost of the devices.
Operationally, constant synchronization to keep the application awake would unnecessarily load the network during normal times, before it is actually needed when an earthquake occurs. Also, having to wake up possibly hundreds of thousands of devices would subtract valuable early warning time, which is very critical for earthquake early warning.
To satisfy the strict synchronization requirements of our application and start the
process whenever the earthquake begins, we developed an asynchronous sensor coordination (ASC) scheme. ASC gives very accurate results, and we also examined the behaviour of the system and
the data collection mechanism in different wireless technologies.
Because multiple data points are sent together every time a request for a data
packet is made to the board, there is a source of error if the agent’s request arrives
in the interval between two neighbouring samples. This sampling error is very small
as long as the sampling frequency is at least ten times the maximum frequency
we want to measure in the earthquake waveform. In our application, we sample at 200 Hz, which captures earthquake waveform content up to a frequency of 20 Hz. This frequency range is sufficient for accurately predicting damage to most structures [7].
ASC was very consistent in its estimates and produced very accurate timestamps.
Figure 5 shows the results obtained with ASC’s estimation and measured with a
highly controlled testing environment. The average error was about 0.002 ms in the
estimated timestamps of two devices detecting the wave at the same location and at
the same time. The requests are sent in parallel to two devices sampling in parallel. In
this case, the estimated timestamps should be the same, except for the network errors
plotted in Fig. 5. The very low error achieved allows us to trace earthquake waveforms
with a range of frequencies up to 20 Hz very accurately. This wide frequency range
is adequate to predict damage to most structural assets (Fig. 6).
The tests to verify the ASC synchronization algorithm were conducted in a highly controlled laboratory environment. In these tests, we injected artificial uplink and downlink delays and externally added jitter. The accuracy of ASC was then estimated by comparing the estimated times against the sample times recorded by NTP-synchronized MSe stations.
We measured the round-trip time (RTT) for each packet and from it estimated the downlink time from server to sensor. We first assumed that the travelling time was equal in the uplink and downlink directions, that is, ½ of the RTT. This assumption was then checked against other assumed downlink/uplink ratios, but the ½ ratio was overall very accurate.
In a cellular network, downlink and uplink times are not equal, but their ratio is
not fixed because the network delay changes continuously due to the scheduling of
the traffic in the network. Since the grid solution is based on following the wave
propagation from sensor to sensor, we need to minimize the error between devices.
Assuming ½ RTT for all devices resulted in a very accurate synchronization time
when reconstructing the waveforms regardless of the ratio at a given moment.
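The essence of this timestamping step can be sketched as follows: the server measures the RTT of each polling request, assumes the downlink took ½ RTT, and anchors the newest buffered sample at the estimated arrival time of the request. The sensor.poll() call and the buffer layout are hypothetical placeholders, not the actual MSe protocol.

```python
import time

def timestamp_samples(sensor, fs=200.0):
    """Assign common-time timestamps to one station's sample buffer.

    sensor.poll() is a hypothetical call returning the samples buffered
    since the previous request; all names here are illustrative.
    """
    t_send = time.monotonic()
    samples = sensor.poll()  # one network round trip
    t_recv = time.monotonic()

    rtt = t_recv - t_send
    t_request = t_send + rtt / 2.0  # assume downlink time = RTT / 2

    # The request lands between two samples; anchor the newest sample at
    # t_request and space the earlier ones at the 1/fs sampling period.
    n = len(samples)
    return [(t_request - (n - 1 - k) / fs, s) for k, s in enumerate(samples)]
```

Comparing the timestamped streams of neighbouring stations then lets the server reassemble the travelling wavefront as described above.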
Even though ASC compensates for the network's latency and jitter, the achievable error limits are affected by higher latencies. Table 1 shows our field test results for the different network technologies tested. The minimum technology for useful results was 4G-LTE, giving an earthquake frequency response bandwidth of 5 Hz. With 5G FR1, the bandwidth was up to 10 Hz, while with 5G FR2 (mmWave) we can capture up to 20 Hz of the seismic waves; a bandwidth of 20 Hz is sufficient to characterize the fragility of most structures. It should be noted that, in addition to the bandwidth that can be captured using the ASC algorithm with each technology, the absolute value of the latency limits the compute time available to process the data streams in real time.
The primary focus of most modern building codes is to ensure human life safety.
However, the degradation of life quality and the economic losses in the days after the
earthquake can be considerable and should be part of the design process. Integrated
design methodologies for the building owner, architect, and engineer, to choose
the desired level of seismic performance for buildings and nonstructural components
when subjected to a specified level of ground motion should be added to the standards
[8].
Real-time performance-based damage assessment (PBDA) is not possible with
current earthquake early warning systems because they cannot predict wide-band
waveforms, and therefore, their prediction accuracy is very limited. The PBDA index
determines, in a probabilistic manner, the level of risk to which the infrastructure
is exposed, driving decisions for seismic reinforcement, backup of critical data, and
safety of people, as well as the cost of insurance to cover this risk. This risk depends
on the sensitivity of the infrastructure to the soil site conditions and the dynamic
characteristics of the earthquake in the specific region. Figure 6 [9] illustrates the
wide range of frequencies in the frequency response of the 121-storey Shanghai
tower that may be excited by the earthquake waveforms.
With the state of damage of each critical asset given by the PBDA, we can make
disaster recovery decisions to preserve the city’s integrity. Local responses consist
of automatic actions that can be applied in a couple of seconds before the earthquake
strikes. An important local application is to open the circuit breaker feeding wires
or equipment that will collapse and produce short circuits that result in fires. Global
responses prioritize the responders’ actions to restore the most critical services as fast
as possible. UBC’s infrastructures interdependencies simulator (i2SIM) [10] is a tool
that allows us to coordinate these actions within our earthquake early warning (EEW-
SC) disaster response environment. Figure 7 shows a diagram of i2SIM where the
main infrastructures of a city’s downtown area are represented for disaster response.
The information from the PBDA subsystem is fed into the i2SIM management
console to determine where the responders should prioritize their responses. i2SIM
coordinates the supply of services that each critical infrastructure needs from each
other and, given the damage level of each infrastructure, determines how to distribute
the available resources to maximize the speed of recovery and restore the city’s
wellbeing. For example, immediately after an earthquake, there will be a degradation
in the capacity of electrical feeders and water pipes to supply electricity and water
to the emergency units (EU) of the hospitals. In addition, major roads and highways
may have severe structural damage and are no longer functional. i2SIM will inform
the responders how many people can be treated at each available EU, depending on
the structural damage and availability of resources in each EU.
In many earthquakes, achieving a "recovered" minimum functional state of a building or infrastructure can take from months to a year. With the information and knowledge provided by our proposed EEW-SC system, a minimum operating state with the essential services for survival could be achieved in weeks rather than months.
6 Conclusion
In this paper, we have introduced a new approach to protection and response for large earthquake disasters across a smart city. The main premise is that considerable life and property can be saved during earthquakes by having better knowledge of the damage expected from the earthquake shaking.
In many cities that suffer strong damage during earthquakes, an extensive 5G or 4G
communication network already exists that allows the implementation of the solution
proposed in this paper. In this project, we have designed a network of economical
mini-seismic stations that can crisscross the area of the city. This resolution in space
allows the development of more comprehensive damage assessment and disaster
response strategies to save human lives and quickly restore the city’s wellbeing.
References
1. Martí J (2021) The EQZ transmission line model for earthquake wave propagation. The
University of British Columbia
2. Moehle J, Deierlein G (2004) Framework methodology for performance-based earthquake
engineering. In: WCEE, Vancouver, B.C., Canada
3. Ericsson—Deutsche Telekom (2021) Enabling time-critical applications over 5G with rate
adaptation
4. Wald D (2020) Practical limitations of earthquake early warning. Earthq Spectra 36:1412–1447
5. Hoshiba M (2021) Real-time prediction of impending ground shaking: review of wavefield-
based (ground-motion-based) method for earthquake early warning. Front Earth Sci 9
6. Analog Devices Inc (2020) ADLX354/ADXL355 data sheet
7. Arnold C (2006) FEMA 454: designing for earthquakes. In: A manual for architects. Providing
protection to people and buildings. Oakland, California
8. Haselton CB, DeBock et al (2019) Resilient design for functional recovery—expectations for
current California buildings and approaches to resilient design. Haselton Baker Risk Group,
LLC Seismic Performance Prediction Program (SP3)
9. Ventura C (2020) Joint time-frequency analysis in OMA. In: IMAC XXXVIII conference on
structural dynamics
10. I2SIMRT (2022) I2SIM-RT software and user’s manual. I2SIM-RT Technologies Inc.,
Vancouver, Canada
Detection of COVID-19 Using Medical Image Processing
Abstract Since the outbreak of COVID-19, human life has been affected in many aspects, and huge numbers of deaths have been witnessed. Ultimately, the World Health Organization declared COVID-19 a pandemic, which has created massive losses all over the world, especially in countries with poor health hygiene and slower financial capacity to respond. Medical image processing has been implemented in various healthcare applications, such as cancerous cell detection, lung nodule classification, thyroid diagnosis, diabetic retinopathy detection, and fetal localization. The sources for such studies are medical images, e.g., X-ray, CT, and MRI, and these numerous sources of medical images have enabled medical image processing techniques to tackle the COVID-19 outbreak. A huge body of research has been proposed and implemented in response to the outbreak to combat the deadly disease using the healthcare technology available to us. Therefore, in this paper, we are motivated to analyze and summarize several state-of-the-art research works related to COVID-19 medical image processing. Further, we also give an overview of deep learning and its applications to healthcare found in the last decade.
1 Introduction
In human history, one of the outbreaks that has created a worldwide health crisis is the COVID-19 pandemic, affecting humans of all age groups. Initially, few people were affected, the disease was confined to one geographical region, and it posed little potential threat to the human race; however, in later stages, the outbreak became an immensely high-risk pandemic, as declared by the World Health Organization (WHO). It has the potential of infecting millions of lives in all geographical regions,
especially ones with weaker health systems. The newly discovered virus is deadly mainly because no vaccine was initially available and because it is transmitted through direct or indirect contact with an affected individual.
Several medical image resources, such as CT and MRI scans, have made deep learning a powerful technique for detecting the virus in the human body; we also review deep learning and its healthcare applications of the last decade. To detect whether a patient has COVID, the patient first undergoes scans (CT or MRI), from which the level of infection can also be determined. Since this virus emerged in China with the first reported case, it has cost over 4 million lives. Several methods are available, such as medical imaging and X-ray sessions. Doctors base the diagnosis mainly on the RT-PCR test, but it is not the final report; it only indicates whether the person has SARS-CoV-2 or not. To know how severely the patient is infected with the virus, an X-ray session is conducted, and for a more detailed report, a CT (computed tomography) scan is taken (Fig. 1).
In these scans, doctors mainly observe the lungs through the X-ray or CT images of the patient to identify the symptoms. Medical image processing is the process of exploring image datasets of the body, commonly obtained from a computed tomography (CT) scan or magnetic resonance imaging (MRI) scan. This technique is mainly used by radiologists to understand more about the symptoms of the infected patient.
Several organizations and governments have started to invest abundantly and ardently in the COVID-19 vaccine and related research. Several related symptoms were identified and listed so that the general public is aware of them and can seek investigation and treatment at the earliest to reduce the mortality rate. Further, tremendous research work is being carried out in connection with the COVID-19 outbreak. The medical image processing approach has gained huge momentum in several health sectors, especially in cancer detection [1]. Further, deep learning and machine learning techniques are popular choices for the detection of several diseases, as reported in [2].
The objective of this paper is to emphasize the contribution of machine learning, deep learning, and medical image processing techniques in countering the COVID-19 outbreak all over the world, along with a review of the state-of-the-art techniques designed using these methods.
Table 1 (continued)
[15] Dataset: X-ray images obtained from Dr. Adrian Rosebrock and Dr. Joseph Cohen. Algorithm: COVIDX-Net, consisting of deep CNN architectures. Remark: a larger dataset with varied medical images can be considered for more detailed analysis.
[16] Dataset: chest X-ray images from Dr. Joseph Cohen's GitHub repository. Algorithm: deep CNN architectures ResNet50, InceptionV3, and InceptionResNetV2. Remark: implementation of the CNN models on larger datasets to enhance classification performance is not considered.
normal people's X-ray images, contributing a total of 178 images for analysis. An overall accuracy of 99.5% is reported for the proposed CNN model.
The paper [22] describes a deep learning technique for the detection of COVID-19 patients using CT scan images. For the experimental study and analysis, chest X-ray images were used as the sample dataset, keeping in mind the economic features of X-ray equipment, its time efficiency, and its availability in the majority of hospitals and clinics. The authors claim that the developed technique can detect the presence of the virus in the shortest time possible, which lessens the pressure on RT-PCR testing when hospitals or diagnostic centers need to run huge volumes of tests. For the study, several pre-trained networks were employed, such as MobileNet-V2, VGG-16, ResNet-50, and InceptionV3, modified as required by placing a head model on a base model. The model is reported to achieve 92% and 98% validation accuracy at epochs 8 and 10, respectively. The presented method can be extended to improve accuracy by training on a larger dataset.
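The head-model-on-base-model pattern described above can be sketched with Keras as shown below. This is a generic transfer-learning template rather than the exact architecture of [22]; the input size, dropout rate, learning rate, and two-class softmax head are assumptions.

```python
import tensorflow as tf

# Base model: ImageNet-pretrained MobileNetV2 without its classifier
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained feature extractor

# Head model: a small classifier for COVID / normal chest images
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(2, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Fine-tuning with data augmentation, as suggested in [24], would then unfreeze some top layers of the base model and continue training at a lower learning rate.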
Another transfer learning model is presented for the identification and detection of COVID-19. In this paper [23], the authors utilize the well-organized CT scan image datasets known as COVIDx CT-2A and COVIDx CT-2B, containing as many as 194,992 CT scan images taken from 3,745 patients in the age group of 0–93 years. The authors model a revised version of ResNet-V2 and change the parameters of all the CNN layers as per their requirements. It provides 99.2% accuracy in detecting COVID-19 cases. However, the work remains limited to theoretical grounds and needs to be validated by experts and physicians for clinical use.
In [24], the authors propose the detection of COVID and the prediction of its severity using optimization methods and a deep learning architecture. The extraction of features from CT scans and chest X-ray images based on a CNN is presented in a feature-learning stage. The authors suggest fine-tuning the CNN model with data augmentation to improve the performance of the developed model. The presented work reports an accuracy of 97.36%. The authors also suggest generalizing the proposed model by adding newly labeled images.
Another deep learning approach for the accurate identification of COVID-19 from chest X-ray images is presented in [25]. The proposed model is reported to detect any abnormal structure in CT scan or X-ray images. The developed methodology is broken down into three phases: dataset preparation, preprocessing, and finally training and classification. The authors suggest that with a larger amount of data, the results can be improved.
Several deep learning models have been reported in the literature to identify the
accurate outcome and consistent results by standardizing and digitizing medical
images dataset with the techniques of medical image processing. Further, it has been
observed that radiographic patterns on CT chest scans in the early phase of COVID
infection have a higher positive detection rate as compared to the RT-PCR in detecting
COVID-19. Popular universities such as Stanford have provided data, models, tools,
research studies, and funding opportunities for COVID-19 research. The research
effort combined with COVID-19 datasets has helped to build comprehensive medical
image processing and DL models for identification, virus diagnosis, treatment, and
even potential vaccine development.
In the fields of drug and vaccine research, medical image processing combined with deep learning has the capacity to contribute to combating the COVID-19 outbreak. Enormous datasets of chemical compounds are available on which machine learning and deep learning models can be trained to identify compounds that can boost human immunity and protect from infection. The patterns of such compounds can thus be studied and learned using machine learning techniques in less time. Further, researchers, with the input of medical experts, can test whether newly composed compounds in a medicine can be used as vaccines or not.
A virus generally presents a characteristic part of the antigen that induces the disease. When a vaccine is introduced into the body, the immune system of the host is activated, which helps in the generation of particular antibodies for the identification and neutralization of the virus. The virus multiplies rapidly, and its antigen is likely to undergo mutation, which can prevent identification or detection by the developed antibodies. Hence, the vaccine generation effort devoted to classifying the T-cell epitopes of the COVID-19 virus needs to be discussed. That is why several researchers have used the CNN model as a deep learning method for the prediction of cross-immunoreactivity (CR) in heterogeneous epitope vaccines, with the help of experts and physicians.
Several applications following the concepts of machine learning, medical image processing, and computer vision have been developed to control and monitor the transmission of COVID-19. For the identification, detection, inspection, and guidance related to COVID-19, several smart devices based on these techniques were deployed among the public. For instance, ventilators, automatic sanitizers, respirators, and protective gear are employed for attending to patients while protecting healthcare specialists, as an assurance of virus containment. The temperature of individuals is measured using thermal screeners to check for a temperature rise, since high temperature is a primary symptom of COVID-19. Vision-guided robots have been used to ensure that social distancing is practiced strictly among COVID-infected patients and those near them. Several governments have launched drones to detect COVID-19 infections among people in remote areas. Other ML and image processing techniques are being employed for the extensive manufacturing of gear and healthcare products to be used by all who work in hospitals and offices. These devices assist in avoiding the spread of the virus by minimizing human contact. These technologies have proved their roles in diagnosing and reducing airborne virus particles, which have the possibility of infecting a large number of people.
Manually determining whether a patient has COVID-19 is a demanding and time-consuming process. Therefore, applying medical image processing with deep learning or machine learning approaches is strongly advocated for predicting the disease using publicly available datasets. The approach is effectively supported by health organizations such as the WHO and ICMR.
However, the availability of datasets and their accuracy are questionable for determining the efficiency of diagnostic systems developed using machine learning techniques. Further, it is difficult to gather medical data such as X-ray and CT scan images from wider demographics and healthcare organizations. Moreover, access to other patient information such as family history, work, education, and other behavioral characteristics is required for determining the technical implementation using machine learning or deep learning approaches. Also, how some people got infected without showing any symptoms needs to be studied.
4 Conclusions
Medical image processing with a deep learning approach has been treated as an effective method to furnish rational and reliable solutions for the diagnosis and identification of the COVID-19 outbreak. Several applications of the deep learning approach have been reported in the medical diagnosis literature with the objective of providing potential solutions. In this paper, we have summarized some of the recent diverse publications on COVID-19 that have similar objectives. The paper has presented the relevance of medical image processing with deep learning methods in the identification and detection of COVID-19. Further, the study also highlighted challenges associated with such applications, which will be helpful for future research directions that remain unexplored.
However, coordinated discussion and processes among government, industry, and academic organizations are still required, demanding extensive effort and time. Further, we have listed the applications of medical image processing with a deep learning approach to COVID-19 diagnosis and vaccine preparation. Moreover, the challenges associated with computer-aided diagnosis and its impact on the medical field are also discussed.
References
23. Zhao W, Jiang W, Qiu X (2021) Deep learning for COVID-19 detection based on CT images.
Sci Rep 11(1):1–12
24. Syarif A, Azman N, Repi VVR, Sinaga E, Asvial M (2022) UNAS-Net: a deep convolutional
neural network for predicting COVID-19 severity. Inf Med Unlocked 28:100842
25. Xue Y, Onzo BM, Mansour RF, Su SB (2022) Deep convolutional neural network approach
for COVID-19 detection. Comput Syst Sci Eng 201–211
Text Encryption Using ECC and Chaotic
Map
Abstract With the advancement of technology in modern society, there is also a need for advanced security systems. The public-key encryption scheme known as elliptic curve cryptography (ECC) is considered more secure than traditional methods such as Rivest-Shamir-Adleman (RSA) for the same key size. In ECC, a key pair is generated by selecting a point on an elliptic curve and a secret integer, called the private key. In this research, we propose an ECC-based chaotic map text encryption scheme. A chaotic sequence is generated, and a bitwise XOR is performed with the input ASCII characters. The XORed data is grouped, and ECC encryption is applied to each group to generate the cipher text. The chaotic system provides better randomness, while ECC provides efficient security. Experimental analysis shows that the proposed method can efficiently encrypt and decrypt the input text data.
1 Introduction
other applications where data security is important [1]. It is important to keep secret keys secure and protect them from unauthorized access, as a compromise could undermine the security of the encrypted data [2]. A literature review of various existing methods provides an overview of the key findings, trends, and developments in the field. Different variants and variations of ECC have been proposed, as well as analyses of known vulnerabilities and attacks on the algorithm. These studies show that ECC surpasses RSA in terms of operational effectiveness and security, and they raise the possibility that ECC may be best suited for memory-constrained devices like smartphones. An elliptic curve analog of the Diffie-Hellman key exchange (DHKE) protocol also exists. With a smaller key size than RSA, ECC offers more security.
protocol. With a smaller key size than RSA, ECC offers more security. In [1], the
authors used ECC in electrical devices since that have less memory and consume less
power. The mapping technique’s security will give twofold security for text encryp-
tion [2], and the method suggested in this work has the significant benefit of avoiding
the need to pad the grouped hexadecimal with an extra bit when the integer is odd
since it is expected that the length zero group is NULL. The encryption and decryp-
tion of a matrix-based message for an ElGamal-based message were reported by the
authors in [3]. AES uses a variable-length key, with sizes ranging from 128 bits to
256 bits, and is considered much more secure than DES. In [4], the modified AES
is more effective than the original AES since it encrypts text and picture data more
quickly. Homomorphic Encryption is a trusted and private method for cloud data cen-
ter processing and storage. Currently, HE has been used by the authors, and it appears
conceivable that its first extensive use will be in ML applications that enable private
AI [5]. Through this research, authors achieved access control limitation with the use
of attribute-based encryption, which is both secure and trustworthy. This platform is
known as CloudSim [6]. By tackling the clustering problem, the developers of this
mechanism take the first step into the field of unsupervised learning, a significant area
of machine learning that has numerous practical applications. In order to conduct
a privacy-preserving assessment, fully homomorphic encryption (FHE) techniques
are used [7]. A new encryption technique [8] that provides high-level security with a small key size is proposed, in which the traditional method of converting letters into affine points on the elliptic curve has been removed. Here, the input text is converted into ASCII values, which are grouped according to a chosen size; each group is converted into a long number, and then key generation is done, followed by generation of the cipher text. This approach eliminates the operational cost associated with mapping characters to elliptic curve co-ordinates while also eliminating the requirement for a standard lookup table. A new and efficient steganography method [9] for hiding a biomedical image in an ordinary file or message has been proposed; to maintain continuous communication over an insecure channel, the ElGamal cryptosystem is adopted. A new symmetric-key encryption technique [10] based on scan and cycle-shift operations with a chaotic map is introduced, in which the input is refined as required using the respective processes; the Henon map plays a key role, and finally a doubly scrambled, encrypted image is generated. A new algorithm [11] is proposed that protects medical images against attacks using chaotic systems. This algorithm has two
main parts: high-speed permutation and adaptive diffusion. The algorithm is so effective that the image cannot be decrypted if there is any small change in the produced key, and a key space of 2^100 is reported to be required.
The authors of [8] found that end-to-end key generation and image encryption based on deep learning provide benefits in enormous key spaces and automatic generation, with a lessened need for complex cryptographic architectures. In [12], the authors proposed a unique mechanism for image security with the aid of a deep neural network, emphasizing that it should not require heavy cryptographic operations; the use of a stacked encoder overcomes the iterative problem of the feed-forward approach, while an enhanced chaotic map leads to better key generation. The authors of [13] used a new bit reversion, a chaotic log map, and a deep CNN to generate keys for encryption operations; permutation, DNA encoding, diffusion, and bit reversion, which scramble and modify the pixels, are used to securely encrypt images. The framework in [14] mainly uses dynamic DNA encoding, a hyper-
chaotic system, and elliptic curve cryptography. The color image is encrypted into a DNA sequence using randomly selected row-level encoding rules, and the hyper-chaotic system is utilized to produce pseudo-random sequences that arrange the image information at two levels, bit-level and block-level. The method in [15] concentrates on the encryption of color and gray images using chaotic systems of different dimensions. The proposal is very sensitive to initial conditions, because the important chaotic sequences are generated from them; finally, the algorithm performs encryption using a permutation table. A new parallel-mode image encryption and transmission algorithm [16] has been proposed; to increase the security level, a sequence signal generator and chaotic cryptography are combined, so that the cryptographic properties of chaotic signals can be fully exploited by a flexible digital logic circuit. The overall contribution of the proposed
encryption scheme can be summarized as follows:
1. A new text encryption scheme is proposed based on ECC and the logistic map.
2. The proposed method can successfully encrypt and decrypt any input text data.
The rest of the paper is structured as follows. Section 2 describes the preliminaries. Section 3 describes the data grouping. Section 4 describes the proposed methodology. Section 5 presents the experimental simulation and Sect. 6 the experimental analysis, followed by the conclusion in Sect. 7.
2 Preliminaries
The following calculations depict the mathematical operations on the co-ordinates of an elliptic curve over a finite field.
1. Point addition: Any two distinct points S(x1, y1) and T(x2, y2), on point addition, return a new point U(x3, y3) that satisfies the equation of the elliptic curve. The mathematical representation is as follows:

x3 = (λ² − x1 − x2) mod p    (1)

y3 = (λ(x1 − x3) − y1) mod p    (2)

If S ≠ T,

λ = ((y2 − y1) / (x2 − x1)) mod p    (3)

otherwise (point doubling),

λ = ((3x1² + a) / (2y1)) mod p    (4)

Point addition and point doubling operations are combined to perform point multiplication efficiently [13].
Logistic Map The logistic map is a widely used point of entry into the study of chaos. It is a polynomial map of degree 2 and is regularly cited as an archetypal example of how complex, chaotic behavior can emerge from simple non-linear dynamic equations. Figure 1 shows the bifurcation diagram of the logistic map. The logistic map (LM) is defined as

a_{x+1} = t · a_x (1 − a_x)    (6)

where the common values of the parameter t are in the range [0, 4], so that a_x remains in [0, 1].
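As an illustration of how such a sequence can serve as a keystream, here is a minimal sketch that iterates the logistic map from a secret seed and XORs the result with ASCII values; the quantization of the chaotic values to bytes and the burn-in length are our assumptions, as the paper does not specify them.

```python
# Sketch: logistic-map keystream XORed with ASCII values. Mapping the
# chaotic values in [0, 1] to bytes via int(a * 256) and the burn-in of
# 100 iterations are illustrative assumptions.

def chaotic_keystream(seed, t, n, burn_in=100):
    a = seed
    for _ in range(burn_in):       # discard transient iterations
        a = t * a * (1 - a)
    stream = []
    for _ in range(n):
        a = t * a * (1 - a)        # a_{x+1} = t * a_x * (1 - a_x)
        stream.append(int(a * 256) % 256)
    return stream

def xor_with_keystream(text, seed=0.3, t=3.99):
    ascii_vals = [ord(c) for c in text]
    keystream = chaotic_keystream(seed, t, len(ascii_vals))
    return [v ^ k for v, k in zip(ascii_vals, keystream)]

xored = xor_with_keystream("Koneru Lakshmaiah Education foundation")
```

Since XOR is its own inverse, the receiver regenerates the same keystream from the shared seed and applies the identical operation to recover the data.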
3 Data Grouping
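The grouping algorithm itself follows [19] and is not reproduced in full here. Purely as an illustration of the idea — packing each group of up to 31 XORed values into one large integer and unpacking it reversibly — here is a minimal sketch assuming base-256 packing; the paper's actual encoding evidently differs in detail, as the worked example in Sect. 5 shows.

```python
# Sketch: reversible grouping of byte values into large integers.
# Base-256 packing is an illustrative assumption; group size 31 matches
# the ECC512 worked example in Sect. 5.

GROUP_SIZE = 31

def group_to_integers(values, size=GROUP_SIZE):
    ints = []
    for i in range(0, len(values), size):
        group = values[i:i + size]
        n = 0
        for v in group:
            n = n * 256 + v            # pack bytes as base-256 digits
        ints.append((n, len(group)))   # keep the length for exact unpacking
    return ints

def integers_to_values(ints):
    values = []
    for n, length in ints:
        group = []
        for _ in range(length):
            n, v = divmod(n, 256)
            group.append(v)
        values.extend(reversed(group))
    return values

assert integers_to_values(group_to_integers([78, 25, 67, 236])) == [78, 25, 67, 236]
```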
4 Proposed Methodology
The communicating parties adopt the DHKE method over the elliptic curve to share the produced secret key. The proposed encryption and decryption algorithms are
illustrated in Fig. 2.
1. The input text data is taken and converted to its corresponding ASCII values.
2. Using a secret key as the seed value, a chaotic sequence is generated whose length is equal to the length of the input.
Fig. 2 Block diagram of the proposed scheme. Encryption: input text → ASCII values → XOR with the logistic-map chaotic sequence → data grouping → integer form → ECC encryption → cipher text. Decryption reverses these steps using the shared key
5 Experimental Simulation
The simulation is carried out using Mathematica on a system with a Ryzen 7 5000-series processor and 16 GB RAM. The ECC Brainpool parameters for 512-bit curves (the ECC512 curve) [18] are used. Any text data can be used for encryption.
Encryption process As an example, let the input be: Koneru Lakshmaiah Education foundation Guntur, Andhra Pradesh 522302. Its corresponding ASCII values are {75, 111, 110, 101, 114, 117, 32, 76, 97, 107, 115, 104, 109, 97, 105, 97,
104, 32, 69, 100, 117, 99, 97, 116, 105, 111, 110, 32, 102, 111, 117, 110, 100,
97, 116, 105, 111, 110, 32, 71, 117, 110, 116, 117, 114, 44, 32, 65, 110, 100, 104,
114, 97, 32, 80, 114, 97, 100, 101, 115, 104, 32, 53, 50, 50, 51, 48, 50}. The XORed data is obtained as {78, 25, 67, 236, 248, 230, 115, 200, 29, 25, 104,
78, 40, 34, 106, 80, 64, 91, 103, 87, 114, 75, 13, 225, 77, 43, 13, 57, 124, 56,
8, 39, 69, 105, 230, 248, 254, 34, 81, 117, 94, 89, 75, 87, 26, 8, 165, 92, 53, 9,
113, 20, 237, 83, 223, 85, 245, 38, 245, 72, 31, 53, 61, 111, 50, 18, 109, 125}. With the ECC512 curve, 31 ASCII characters can be grouped [19] using the data grouping algorithm. After data grouping, the grouped ASCII values are generated as
(234136428739045330699128408015630808708125583027459807202164636369605
1964706311736867755528131649211506053168719863324161623481619564754351
92615029, 3433971040155273319344750108422875121163765540091608785014053
9015657813159510245160576271572969789227080230947761550347425550890901
7135687624294432, 64073990790852999619018802). The input (Pm) for the ECC encryption is taken as Pm[1] = (2341364287390453306991284080156308087081255
830274598072021646363696051964706311736867755528131649211506053168719
86332416162348161956475435192615029, 343397104015527331934475010842287
512116376554009160878501405390156578131595102451605762715729697892270
802309477615503474255508909017135687624294432) and Pm[2] = (64073990790852999619018802, 32). As the group length is odd, 32 is padded. After performing the encryption operation and mapping to the corresponding ASCII characters, the cipher text is generated as shown in Fig. 3.
Decryption process On the receiver's side, the secret seed value is obtained through the Diffie-Hellman key exchange methodology. The cipher text is also obtained, and
its corresponding ASCII value is generated as (5713, 9933, 7552, 2032, 61898,
51902, 2129, 64847, 18986, 46567, 11406, 22448, 3623, 33762, 13639, 23347,
61355, 8546, 35491, 53060, 62795, 12998, 19758, 18365, 27639, 59146, 27837,
7240, 55565, 18301, 266, 23758, 6059, 5448, 60237, 22395, 35904, 34552, 34643,
62697, 50282, 7939, 33091, 21466, 9806, 54530, 16759, 11541, 6495, 27776, 2473,
36629, 2101, 61940, 56653, 34610, 7985, 34171, 7914, 14258, 43804, 4131, 11407,
48397, 28328, 7756, 13897, 3002, 49230, 31412, 32085, 18834, 219, 13498, 47050,
35891, 64995, 29259, 2112, 16930, 33157, 24114, 16805, 27949, 19678, 44486,
12601, 26367, 38128, 48349, 10148, 42166, 33352, 28755, 42228, 24219, 33945,
36586, 53019, 10272, 23688, 20614, 52782, 38289, 50240, 37789, 61036, 33175,
54140, 44924, 19202, 37383, 20944, 35767, 14662, 57554, 46006, 1582, 17501,
44470, 59259, 56649, 4214, 32943, 25415, 21629, 7027, 22764). After data grouping, we obtain the values (692617089956388824536608628841288590053
1964574752970775831636177001632709041223471323091848676365986807520925
230661579701384920024739891072163904960051489, 275497454792548792130339
0100395977366319657773641180457277490430902626674895543682695733337911
698764303589142167768931883714591950914869678122269099179592, 452481156
0089991239981546387384157036172096688426413904760445949592200527912764
3481249620738781923038159740598752199893078388652562410730865346101119
25208, 3107720616474291426241574354539899713886556055364482199927592282
7420416028906147387437477029825154145297225727292079201506705861792497
32595930757060914281). These values are taken as input, and ECC decryption is
performed. Then, reverse data grouping is applied, yielding {78, 25, 67,
236, 248, 230, 115, 200, 29, 25, 104, 78, 40, 34, 106, 80, 64, 91, 103, 87, 114, 75, 13,
225, 77, 43, 13, 57, 124, 56, 8, 39, 69, 105, 230, 248, 254, 34, 81, 117, 94, 89, 75, 87,
26, 8, 165, 92, 53, 9, 113, 20, 237, 83, 223, 85, 245, 38, 245, 72, 31, 53, 61, 111, 50,
18, 109, 125}. Finally, using the secret key as the seed value, XOR is performed
to obtain {75, 111, 110, 101, 114, 117, 32, 76, 97, 107, 115, 104, 109, 97, 105, 97,
104, 32, 69, 100, 117, 99, 97, 116, 105, 111, 110, 32, 102, 111, 117, 110, 100, 97,
116, 105, 111, 110, 32, 71, 117, 110, 116, 117, 114, 44, 32, 65, 110, 100, 104, 114,
97, 32, 80, 114, 97, 100, 101, 115, 104, 32, 53, 50, 50, 51, 48, 50}. These values are
converted back to their ASCII characters to obtain the decrypted text, which is the same as the input text data.
6 Experimental Analysis
The security of a cryptographic procedure is often affected by the size of the key used. Because there are more potential keys to attempt, a bigger key size typically means that it will be harder for an attacker to guess or break the key. However, it is crucial to take the computational load into account when raising the key size, because it might affect the system's performance. The proposed method uses a key space of 2^512.
Data is encrypted and decrypted using keys in cryptographic systems, and the security of the system depends on keeping the keys private. Key sensitivity is therefore a crucial factor to take into account when developing and implementing a cryptographic system, and precautions should be taken to guarantee that keys are handled and maintained securely. When the key is changed by a single bit, the entire output exhibits an avalanche effect. Figure 4 shows the key sensitivity effect during decryption with a 1-bit difference in the key.
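A simple way to quantify the avalanche effect is to compare the outputs produced with two keys that differ in a single bit and count the differing bits; here is a minimal sketch, assuming the outputs are equal-length byte sequences.

```python
# Sketch: fraction of differing bits between two equal-length byte
# sequences produced with keys differing in a single bit.

def bit_difference_ratio(out1, out2):
    assert len(out1) == len(out2)
    diff = sum(bin(a ^ b).count("1") for a, b in zip(out1, out2))
    return diff / (8 * len(out1))   # a value near 0.5 indicates avalanche
```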
Data are shown using a histogram according to their frequencies. The histogram of the cipher text, shown in Fig. 5, exhibits a uniform frequency distribution.
A cipher text only attack (COA) is a type of cryptographic attack in which the attacker has access only to cipher text and is unaware of the corresponding plain text or encryption key. By examining the cipher text, the attacker seeks to ascertain the plain text or the key. The COA is a typical attack used to evaluate the security of cryptographic systems: both the strength of the encryption technique and the security of the key may be assessed with it. In general, the more difficult it is to successfully conduct a COA, the stronger the encryption technique and the longer the key. Because our method adopts ECC512 parameters, this attack will fail.
The proposed method is compared with the method in [1]. The inputs are taken as word counts of 10,000 for different text inputs. When using data grouping, the proposed method can group more ASCII characters, as the ECC prime modulus P is 512 bits. Table 1 shows the performance comparison with the existing methods, where the proposed method shows better performance.
7 Conclusion
A new and efficient text encryption algorithm has been proposed based on ECC and a chaotic system. The input text data is converted to its corresponding ASCII values, and a chaotic sequence is generated using the secret key as the seed value. An XOR operation is performed between the input data and the chaotic sequence, which is later grouped to a fixed group size. Each group is then converted into a single large integer. ECC encryption is then performed on the large integers generated. Further, the encrypted integers are converted to ASCII values, and finally, the cipher text is generated. The proposed method can successfully encrypt and decrypt the input text data.
References
1. Keerthi K, Surendiran B (2017) Elliptic curve cryptography for secured text encryption. In:
International conference on circuit, power and computing technologies (ICCPCT). IEEE, pp
1–5
2. Agrawal E, Ram Pal P (2017) A secure and fast approach for encryption and decryption of
message communication. Int J Eng Sci 11481
3. Laiphrakpam DS, Rohit T, Singh KM, Awida MA (2022) Encrypting multiple images with an
enhanced Chaotic map. IEEE Access 10:87844–87859
4. Gamido HV, Sison AM, Medina RP (2018) Modified AES for text and image encryption. Indonesian J Electr Eng Comput Sci 11(3):942–948
5. Lauter K. Private AI: machine learning on encrypted data. In: Recent advances in industrial and applied mathematics, pp 97–113
6. Almuzaini KK, Kumar SA, Raju R, Vikas G, Shrivastava R, Halifa A (2022) Key aggrega-
tion cryptosystem and double encryption method for cloud-based intelligent machine learning
techniques-based health monitoring systems. In: Computational intelligence and neuroscience,
2022
7. Jaschke A, Armknecht F (2018) Unsupervised machine learning on encrypted data. In: 25th
international conference, Calgary, AB, Canada, Aug 15–17
8. Bao Z, Xue R (2021) Survey on deep learning applications in digital image security. Opt Eng
60(12):120901
9. Laiphrakpam DS, Khumanthem MS (2017) Medical image encryption based on improved
ElGamal encryption technique. Optik 147:88–102
10. Shahna KU, Mohamed A (2020) A novel image encryption scheme using both pixel level and
bit level permutation with chaotic map. Appl Soft Comput 90:106162
11. Moafimadani SS, Chen Y, Tang C (2019) A new algorithm for medical color images encryption
using chaotic systems. Entropy 21(6):577
12. Maniyath SR, Thanikaiselvan V (2020) An efficient image encryption using deep neural net-
work and chaotic map. Microprocess Microsyst 77:103134
13. Erkan U, Toktas A, Enginoğlu S, Akbacak E, Thanh DNH (2022) An image encryption scheme
based on chaotic logarithmic map and key generation using deep CNN. Multim Tools Appl
81(5):7365–7391
14. Jasra B, Moon AH (2022) Color image encryption and authentication using dynamic DNA
encoding and hyper chaotic system. Expert Syst Appl 206:117861
15. Lagmiri SN, Elalami N, Elalami J (2018) Color and gray images encryption algorithm using
chaotic systems of different dimensions. Int J Comput Sci Netw Secur 18(1)
16. Yu J et al (2020) Image parallel encryption technology based on sequence generator and chaotic
measurement matrix. Entropy 22(1):76
17. Khoirom MS, Laiphrakpam DS, Tuithung T (2021) Audio encryption using ameliorated ElGa-
mal public key encryption over finite field. Wireless Pers Commun 117:809–823
18. Elliptic curve parameter. http://www.ecc-brainpool.org/download/Domainparameters.pdf.
Accessed 19 Aug 2022
19. Singh KM, Singh LD, Tuithung T (2023) Improvement of image transmission using chaotic
system and elliptic curve cryptography. Multimed Tools Appl 82:1149–1170
20. Singh KM, Singh LD, Tuithung T (2022) Improvement of image transmission using chaotic
system and elliptic curve cryptography. Multim Tools Appl, 1–22
Plant Leaf Disease Detection
and Classification: A Survey
Abstract Yields are impacted by climate and temperature, making them susceptible
to pathogen infection during growth. Progressive disease detection and prevention
in crops are compulsory to avoid disease-induced damage during growth, harvest-
ing and post-harvesting, enhance productivity, and ensure yield sustainability. In
the past decade, researchers have contributed several research articles on detecting disease locations and identifying complex disease patterns using leaf images. The leaf is the
most prominent organ that shows the most distinct features that plant pathologists
can identify through visual inspection. This article analyzes the principal aspects that
affect the design and effectiveness of disease detection and classification frameworks
using current technologies. An in-depth analysis of the various findings, highlighting
advantages and shortcomings, has been discussed, leading to more realistic conclu-
sions about the subject. The assessment centers on providing a thorough study of the factors involved in evolving AI-based techniques to support plant disease detection and provide disease oversight support to agriculturalists.
R. Bansal · R. K. Aggarwal
National Institute of Technology, Kurukshetra, India
e-mail: r_k_a@nitkkr.ac.in
R. Bansal (B)
JMIT Radaur, Radaur, India
e-mail: rajiv_62000071@nitkkr.ac.in; rajivbansal@jmit.ac.in
N. Goyal
M.M. Institute of Computer Technology & Business Management, Maharishi Markandeshwar
Deemed to be University, Mullana, Ambala, Haryana, India
1 Introduction
1.1 Motivation
Researchers can identify foliar diseases using computer vision, machine learning, and deep learning techniques. An effective disease diagnosis system must incorporate disease identification [3, 4], identification of multiple diseases in multiple crops and multiple diseases in a single crop [5], estimation of the various disorders [6], assessment of the right volume of pesticide to be spread [7, 8], and other appropriate measures for restricting the spread of disease [4].
2 Related Work
Researchers have proposed several techniques in recent decades using computationally intelligent practices, soft computing, and image processing to improve detection and classification systems that facilitate crop field monitoring. The disease detection and classification task is divided into various modules, i.e., image acquisition, image pre-processing, image segmentation, and classification. The first and most important phase is selecting the input organs and taking pictures of the plant, including the leaf, stem, root, and branches. The performance of any recognition system depends entirely on the training data. Therefore, the image acquisition phase is foundational and challenging. The requirements of plant disease analysis can be divided into three levels: what, where, and how.
1. "what" corresponds to the classification task: identifying the label of the category to which a disease belongs.
2. "where" corresponds to the localization task: identifying the types of diseases that exist in the image together with their specific locations.
3. "how" corresponds to the segmentation task.
After acquiring leaf images with disease patterns, the image pre-processing phase deals with noise removal and content enhancement. In this phase, various image pre-processing techniques like colour space conversion, threshold segmentation, rotation, transformation, contrast stretching, smoothening, and many more are explored. The next phase deals with the segmentation process. This phase aims to partition the given images to obtain the region of interest, i.e., the spotted or lesion region, the infected area in the input images. The resulting image is easier to analyze and more meaningful for separating infected and non-infected regions.
The segmented images are then forwarded to the next phase, feature extraction. This phase converts an image to a feature vector. The features represent the relevant and discriminating attributes associated with the objects that differentiate them from other objects. Various discriminating characteristics like shape, colour, and texture are discussed to make classification efficient. In the last phase, the machine learning model comes into the picture, classifying plants as healthy or non-healthy. This phase depends entirely on the earlier stages, i.e., pre-processing, segmentation, and feature extraction. A model is trained with a set of training images, and the trained model then categorizes new samples into healthy or diseased plants.
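To make the pipeline concrete, here is a minimal sketch of the classical route — GLCM texture features extracted from gray-scale leaf images and fed to an SVM. It assumes the images are already loaded as 8-bit arrays with labels available; it illustrates the generic approach, not any particular surveyed system.

```python
# Sketch of the classical pipeline: GLCM texture features + SVM.
# Assumes gray-scale leaf images are preloaded as uint8 arrays.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC

def glcm_features(img):
    # Co-occurrence matrix at distance 1 for two directions.
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

def train_leaf_classifier(images, labels):
    X = np.array([glcm_features(im) for im in images])
    clf = SVC(kernel="rbf")   # radial basis kernel, as used in [13]
    clf.fit(X, labels)
    return clf
```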
Machine learning models for disease detection and classification Joshi et al. [9] used 115 self-selected leaf images to classify four disease patterns. The authors employed colour moments, shape descriptors, eccentricity, and orientation and reported 88.15% accuracy using a K-NN classifier. In the study [10], the authors explored a vegetation index classification model to identify sheath blight patterns from leaf images captured with a multi-spectral camera. Coulibaly et al. [11] deployed transfer learning using the VGG-16 model to classify two disease patterns and healthy leaf images. The authors used only 99 images of a self-captured dataset for training the classifier; the model produced 89% accuracy on 25 test samples. Rothe et al. [12] utilized an EBPNN with 85.52% accuracy on images captured with a digital camera. The model was designed to classify three disease types: bacterial blight, Alternaria, and Myrothecium. In another study [13], a self-collected dataset captured using mobile phones under controlled conditions was analyzed with texture features and gradient values; an SVM with a radial basis kernel produced 82% classification accuracy. A comparative analysis of various research studies focused on image processing techniques and machine learning models is given in Table 1.
Deep learning-based disease detection and classification To overcome the arbitrary selection of plant disease spots and features, deep learning applications make feature extraction more objective, resulting in efficient research and faster technology transfer [11, 14–16]. In [17], a transfer learning approach is used to implement the EfficientNet architecture with pre-trained noisy-student weights; experiments are performed using 14 plant cultures with 39 background and augmented image sub-categories. An optimal mobile network-based CNN (OMCNN) was implemented across the phases of disease detection [18]: image segmentation is done using bilateral filtering, Kapur's thresholding is used to identify the affected portion of the leaf image, MobileNet is deployed at the feature extraction phase, an emperor penguin optimizer is used to tune the hyperparameters, and an extreme learning machine is implemented for the classification phase. In [19], transfer learning is used to identify diseases in multiple crops, and a CNN-based visual geometry group (VGG) network is used to improve performance.
However, deep learning approaches suffer from several issues, including dependence on large amounts of training data (which increases computational cost), label noise, a tendency to overfit, and degraded performance.
See Table 1.
Table 1 Comparative analysis for plant disease detection and classification using leaf images with machine learning models
References | Diseases | Image data | Count of images | Features | Classifier | Accuracy (%)
[20] | Brown spot, Leaf blight, False smut | APS image data & RRI | 400 images, RTT 70:30 | Local features and vocabulary using SIFT descriptors and the BoW technique | SVM | 94.16
[10] | Sheath blight | Captured using UAV (multi-spectral camera) | — | Color features (red, green, and hue) | Vegetation index method | 63
[21] | Gray leaf spot, Common rust, Northern blight, Healthy | Plant Village | 3823 images | Local texture features | SVM, DT, RF, Naïve Bayes | 87
[5] | Common rust, Northern blight, both diseases, Healthy leaf | Self-captured using a Samsung digital camera PL200, from Agriculture University Dharwad | 50 for each category | Six first-order histograms and GLCM features | KNN (distance metric), SVM (kernel) | KNN: 85, SVM: 88
[22] | Downy mildew, Frog eye, Septoria leaf blight | Plant Village | 4775 leaf images | Fusion (color, texture, shape) | SVM | 84
[23] | 18 cultures, four disease patterns | — | 46 images | LBP (texture) | One-class SVM | 83.3–95
[24] | Canker, Melanose, Scab, Greening, Black spot, Anthracnose | Image Gallery dataset | 1000 citrus fruit images | Color, geometric, and texture features | Multiclass SVM, DT, LDA, KNN, and Ensemble Boosted Tree | 95.8
[25] | Powdery mildew | Plant Village dataset | 50 healthy and 50 infected | Quantification of diseased area using calculated lesion area ratio | — | 99
[26] | Brown spot, Powdery mildew | Collected from University of Florida & Plant Village dataset | 100 images (training: 60, testing: 40) | GLCM features | SVM | 100
[27] | Healthy, Fusarium, Mycorrhizal fungus, Both | Captured from ARI, Turkey | 80 leaf images | Wavelet transform (min, max, mean, SD) | ANN, NB, KNN | 84
[11] | Yellowing, Malformation of ear, Plantule, Partial green ear, Healthy | Self-captured and some downloaded from the Internet | 124 images (training & validation: 99; testing: 25) | Transfer learning | VGG-16 model with early stopping | 89
[14] | 25 plants & 58 diseases | Combined (PlantVillage and self-captured) | 87,848 | Transfer learning | AlexNet, Overfeat, GoogLeNet, VGG, AlexNetOWTBn | 33.27–99.53
[15] | 14 plants and 59 diseases | Self-prepared (augmented) | 1575 (original), 46,409 (augmented) | Transfer learning | GoogLeNet | 25–100
[16] | Grapevine yellows | Self-captured and Plant Village datasets | 272 (self-captured), 3400 (Plant Village) | — | AlexNet, GoogLeNet, Inception-v3, ResNet-50 & 101, SqueezeNet | 98, 96, 98, 99, 99, 94
[28] | Esca, Black rot, Healthy | Plant Village dataset | 2986 | — | Siamese network | 92
[29] | 9 tomato disease patterns | Captured (natural background) | 5000 | DWT, Haar wavelets | ResNet-50, VGG-16, ResNet-101, ResNet-152, ResNeXt-50 | 83.06
[30] | 6 disease patterns | Plant Village dataset | 9000 | Transfer learning | CNN | 99.84
[31] | 17 classes of corn, grapevine, and tomato | Plant Village dataset | 15,873 | Fuzzy color histogram and fuzzy GLCM for color and texture | PNN | 95.68
[32] | 38 disease patterns and 44 healthy patterns | Plant Village dataset & own dataset | 54,305 & 10,851 | Attention dense learning | DADCNN-5 | 99.93, 97.93
[33] | 5 apple leaf diseases | Apple dataset | 2141 | Transfer learning | RegNet (Adam) | 99.23
3 Findings
In this article, we surveyed more than 25 articles on plant disease detection and classification. The articles in the survey cover multiple disease patterns and images acquired under multiple conditions with different devices, including smartphone cameras, digital cameras, and UAV-mounted multi-spectral cameras, as well as augmented datasets and images downloaded from the Google search engine. Some researchers captured images against natural backgrounds; others collected leaf images in proper sunlight and controlled settings. From the survey, it is inferred that the quality of the leaf images and the mode of collection greatly influence the efficiency of the pattern recognition system; good acquisition also reduces the overhead of high pre-processing costs in the disease detection phase. The entire recognition system aims to answer three levels, i.e., "where", "how", and "what". Several researchers focused on cost-effective machine learning and image processing-based approaches, while others focused on efficient deep learning-based models. The comparative analysis presented in Table 1 confirms that the number of training images plays a significant role. Most research studies focused on a single plant with two or more disease types. Few works address large-scale disease detection and identification systems, owing to the high computational complexity and the non-scalability of machine learning algorithms. The similarity in the patterns of diseased spots across various cultures demands discriminating attributes, which increases computation cost and classification time. Deep learning-based algorithms are more accurate than image processing tools but are computationally expensive and require vast training data. Studies using transfer learning approaches remain inefficient on self-captured and augmented datasets.
4 Conclusion
The survey concentrates on deep learning and machine learning approaches for foliar disease detection and identification using leaf images, while addressing several challenges. The performance of plant disease detection and recognition depends on the quality of the acquired images. Experimental results are greatly influenced by the availability of datasets, i.e., real-time versus controlled conditions and limited size. The framework must be both effective and efficient: improved accuracy at a lower computational cost is highly desirable.
References
1. Bera T, Das A, Sil J, Das AK (2019) A survey on rice plant disease identification using
image processing and data mining techniques. In: Emerging technologies in data mining and
information security. Springer, Singapore, pp 365–376
2. Kaur S, Pandey S, Goel S (2018) Semi-automatic leaf disease detection and classification
system for soybean culture. IET Image Process 12(6):1038–1048
3. Johannes A, Picon A, Alvarez-Gila A, Echazarra J, Rodriguez-Vaamonde S, Navajas AD, Ortiz-
Barredo A (2017) Automatic plant disease diagnosis using mobile capture devices, applied on
a wheat use case. Comput Electron Agric 138:200–209
4. Syed-Ab-Rahman SF, Hesamian MH, Prasad M (2022) Citrus disease detection and classifi-
cation using end-to-end anchor-based deep learning model. Appl Intell 52(1):927–938
5. Deshapande AS, Giraddi SG, Karibasappa KG, Desai SD (2019) Fungal disease detection in
maize leaves using Haar wavelet features. In: Information and communication technology for
intelligent systems. Springer, Singapore, pp 275–286
6. Wang G, Sun Y, Wang J (2017) Automatic image-based plant disease severity estimation using
deep learning. Comput Intell Neurosci
7. Upadhyay SK, Kumar A (2021) Early-stage Brown spot disease recognition in paddy using
image processing and deep learning techniques. Trait du Signal 38(6)
8. Wang C, Du P, Wu H, Li J, Zhao C, Zhu H (2021) A cucumber leaf disease severity classification
method based on the fusion of DeepLabV3+ and U-Net. Comput Electron Agric 189:106373
9. Joshi AA, Jadhav BD (2016) Monitoring and controlling rice diseases using image processing
techniques. In: 2016 International conference on computing, analytics and security trends
(CAST). IEEE, pp 471–476
10. Zhang D, Zhou X, Zhang J, Lan Y, Xu C, Liang D (2018) Detection of rice sheath blight using
an unmanned aerial system with high-resolution color and multispectral imaging. PloS ONE
13(5):e0187470
11. Coulibaly S, Kamsu-Foguem B, Kamissoko D, Traore D (2019) Deep neural networks with
transfer learning in millet crop images. Comput Ind 108:115–120
12. Rothe PR, Kshirsagar RV (2015) Cotton leaf disease identification using pattern recognition
techniques. In: 2015 International conference on pervasive computing (ICPC). IEEE, pp 1–6
13. Hallau L, Neumann M, Klatt B, Kleinhenz B, Klein T, Kuhn C, Oerke EC (2018) Automated
identification of sugar beet diseases using smartphones. Plant Pathol 67(2):399–410
14. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput
Electron Agric 145:311–318
15. Barbedo JGA (2019) Plant disease identification from individual lesions and spots using deep
learning. Biosyst Eng 180:96–107
16. Cruz A, Ampatzidis Y, Pierro R, Materazzi A, Panattoni A, De Bellis L, Luvisi A (2019) Detec-
tion of grapevine yellows symptoms in Vitis vinifera L. with artificial intelligence. Comput
Electron Agric 157:63–76
17. Hanh BT, Manh Van H, Nguyen NV (2022) Enhancing the performance of transferred EfficientNet models in leaf image-based plant disease classification. J Plant Dis Protect 129(3):623–634
18. Ashwinkumar S, Rajagopal S, Manimaran V, Jegajothi B (2022) Automated plant leaf disease
detection and classification using optimal MobileNet based convolutional neural networks.
Mater Today: Proc 51:480–487
19. Paymode AS, Malode VB (2022) Transfer learning for multi-crop leaf disease image classifi-
cation using convolutional neural network VGG. Artif Intell Agric 6:23–33
20. Bashir K, Rehman M, Bari M (2019) Detection and classification of rice diseases: an automated
approach using textural features. Mehran Univ Res J Eng Technol 38(1):239–250
21. Kusumo BS, Heryana A, Mahendra O, Pardede HF (2018) Machine learning-based for auto-
matic detection of corn-plant diseases using image processing. In: 2018 International confer-
ence on computer, control, informatics and its applications (IC3INA). IEEE, pp 93–97
22. Kaur S, Pandey S, Goel S (2018) Semi-automatic leaf disease detection and classification
system for soybean culture. IET Image Process 12(6):1038–1048
23. Pantazi XE, Moshou D, Tamouridou AA (2019) Automated leaf disease detection in different
crop species through image features analysis and one class classifiers. Comput Electron Agric
156:96–104
24. Sharif M, Khan MA, Iqbal Z, Azam MF, Lali MIU, Javed MY (2018) Detection and classifi-
cation of citrus diseases in agriculture based on optimized weighted segmentation and feature
selection. Comput Electron Agric 150:220–234
25. Sengar N, Dutta MK, Travieso CM (2018) Computer vision based technique for identification
and quantification of powdery mildew disease in cherry leaves. Computing 100(11):1189–1201
26. Esmaeel AA (2018) A novel approach to classify and detect bean diseases based on image pro-
cessing. In: 2018 IEEE symposium on computer applications & industrial electronics (ISCAIE).
IEEE, pp 297–302
27. Karadağ K, Tenekeci ME, Taşaltın R, Bilgili A (2020) Detection of pepper fusarium disease
using machine learning algorithms based on spectral reflectance. Sustain Comput: Inform Syst
28:100299
28. Goncharov P, Ososkov G, Nechaevskiy A, Uzhinskiy A, Nestsiarenia I (2018) Disease detection
on the plant leaves by deep learning. In: International conference on neuroinformatics. Springer,
Cham, pp 151–159
29. Fuentes A, Yoon S, Kim SC, Park DS (2017) A robust deep-learning-based detector for real-
time tomato plant diseases and pests recognition. Sensors 17(9):2022
30. Ashqar BA, Abu-Naser SS (2018) Image-based tomato leaves diseases detection using deep
learning
31. Nagi R, Tripathy SS (2023) Plant disease identification using fuzzy feature extraction and PNN.
Signal, Image Video Process, pp 1–7
32. Pandey A, Jain K (2022) A robust deep attention dense convolutional neural network for plant
leaf disease identification and classification from smart phone captured real world images. Ecol
Inform 70:101725
33. Li L, Zhang S, Wang B (2022) Apple leaf disease identification with a small and imbalanced
dataset based on lightweight convolutional networks. Sensors 22(1):173
Performance Evaluation of K-SVCR
in Multi-class Scenario
Abstract The support vector classification regression machine for K-class classification (K-SVCR), based on the "1-versus-1-versus-rest" structure, is a unique multi-class classification method that generates ternary output {−1, 0, 1}. Since it generates ternary output, the training data requires corresponding labels. In this article, we have evaluated the performance of K-SVCR to explore the impact of (1) the labeling of the classes and (2) the relative location of class clouds. Several artificially generated datasets and a real-world dataset are considered to understand the impact of the choice of labels and the relative location of the class distributions. Accuracy is used to evaluate K-SVCR with respect to the choice of class labels and the relative location of data clouds. We found that the change in the class labels — positive +, negative −, and neutral 0 — affects the accuracy of classification significantly.
1 Introduction
The support vector machine (SVM) for binary class classification problems was put forward by Vapnik [1]. The SVM solves a convex optimization problem, a quadratic programming problem (QPP). It has some important advantages, such as a globally optimal [2] and unique [3] solution. SVMs have shown promising results for different kinds of applications like identification of fingerprints [4], recognition
of different facial expressions [5], detection and discovery of drugs [6], biomedicine [7], Alzheimer's disease detection [8], plant identification [9], and so on. SVMs are specific to binary class classification problems. During the learning process, the SVM classifier builds a decision function of the standard form f(x) = sign(wᵀx + b), which assigns each input to one of the two classes.
2 Literature Review
Many researchers have used K-SVCR in different contexts. Tian et al. [14] proposed the K-LSVCR algorithm, which solves a linear program instead of a quadratic program. Ma et al. [15] proposed K-RLSSVCR, a robust least squares version of K-SVCR; according to that paper, K-SVCR is sensitive to outliers and also time-consuming, so K-RLSSVCR uses a truncated square loss and a squared ξ-insensitive ramp loss to partly minimize the effect of outliers. Other researchers [16, 17] have utilized K-SVCR in many different aspects. The labels of the classes are usually chosen as +1, −1, and 0. In the K-LSVCR technique, a decision function is constructed through which a given input is partitioned into three classes. When the decision function value is +1, a positive vote is added to the +1 class and no vote is added to the other classes. Similarly, when the value of the decision function is −1, a positive vote is added to the −1 class and no vote to the other classes; when the value of the decision function is 0, a negative vote is added to both the +1 and −1 classes and no votes to the others, and it provides information about the other classes, labeled 0, as "1-versus-rest" does.
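As an illustration of this voting rule, here is a minimal sketch that tallies the votes over all class pairs; `classifiers[(i, j)]` is an assumed callable returning the ternary output of the machine trained with class i as +1, class j as −1, and the rest as 0.

```python
# Sketch of "1-versus-1-versus-rest" voting over ternary outputs.
from itertools import combinations

def ksvcr_predict(x, classifiers, n_classes):
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        out = classifiers[(i, j)](x)   # assumed to return +1, -1, or 0
        if out == 1:
            votes[i] += 1              # positive vote for class i
        elif out == -1:
            votes[j] += 1              # positive vote for class j
        else:
            votes[i] -= 1              # negative vote for both
            votes[j] -= 1              # focused classes
    return max(range(n_classes), key=lambda c: votes[c])
```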
Motivated by Vapnik's SVM theory [18], the support vector classification regression machine for K-class classification (K-SVCR) [12] is a relatively newer method for multi-class classification. It gives output in the format {−1, 0, 1}, i.e., ternary output. The K-SVCR thus combines regression and classification in the same machine and, throughout the decomposition stage, maintains the "1-versus-1-versus-rest" organization for all the training data instances (Fig. 1).
The formulation of K-SVCR as a convex QPP is given as

min_{w, b, ξ1, ξ2, φ, φ*}  (1/2)‖w‖² + c1(e1ᵀξ1 + e2ᵀξ2) + c2 e3ᵀ(φ + φ*)    (3)

subject to

Aw + e1 b ≥ e1 − ξ1,
−(Bw + e2 b) ≥ e2 − ξ2,
−δe3 − φ* ≤ Cw + e3 b ≤ δe3 + φ,
ξ1, ξ2, φ, φ* ≥ 0.

Fig. 1 Geometrical representation of K-SVCR: multi-class classification with ternary {−1, 0, +1} output

Here, ξ1, ξ2, φ, and φ* are positive slack variables, c1 and c2 are the penalty parameters, and e1, e2, and e3 are vectors of ones of suitable dimensions. To avoid overlapping, the positive parameter δ must be less than 1.
The dual of the above primal problem can be stated as

max_α  qᵀα − (1/2)αᵀHα    (4)

subject to

0 ≤ α ≤ F,

where Q = [Aᵀ, −Bᵀ, Cᵀ, −Cᵀ], H = QᵀQ, F = [c1e1; c1e2; c2e3; c2e3], and w = Qᵀα.
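Since the dual is a box-constrained quadratic program, any off-the-shelf QP solver can be used; as an illustration only (not the solver used by the authors), here is a minimal projected-gradient sketch.

```python
# Sketch: solve  max_a  q^T a - 0.5 a^T H a  s.t. 0 <= a <= F
# by projected gradient ascent; illustrative, not the authors' solver.
import numpy as np

def solve_ksvcr_dual(H, q, F, lr=1e-3, iters=5000):
    alpha = np.zeros_like(q)
    for _ in range(iters):
        grad = q - H @ alpha                        # dual gradient
        alpha = np.clip(alpha + lr * grad, 0.0, F)  # project onto the box
    return alpha   # the primal weights are then recovered from alpha
```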
4 Experimental Design
Fig. 5 Classification performance of K-SVCR for Iris dataset (sepal length and sepal width)
K-SVCR's performance is evaluated on the basis of the neutral class chosen. Therefore, each of the three classes, Iris Setosa, Iris Versicolor, and Iris Virginica, is in turn considered as the neutral class. The remaining two classes are labeled positive and negative arbitrarily, so in total, 18 datasets are considered to understand the impact of the choice of labeling in K-SVCR. The accuracy diagrams are generated by conducting experiments on these datasets for K-SVCR performance evaluation. These datasets consist of 150 samples with corresponding labels in each category, 450 in all. The model's performance is evaluated on 10% of the data samples selected from all classes, and the remaining 90%, i.e., 405 samples, are used to train the model.
Experiments are performed with random values of c1 and c2, while the value of δ is taken from 0 to 1. We tuned the values of c1 and c2 from 1 to 10 and the value of δ from 0 to 1 for better accuracy, with the preferred values being c1 = 0.1, c2 = 1, and δ = 0.3. By conducting the experiments on the 18 datasets mentioned above, the accuracy diagrams are generated as given in Figs. 5, 6, 7, 8, 9, and 10. For example, consider Fig. 9, the sepal length petal width dataset: the accuracies are 93%, 80%, and 73% when the neutral class is Iris Versicolor, Iris Setosa, and Iris Virginica, respectively. Also, from Table 1 it is explicit that when we keep Versicolor as the neutral class, the accuracy is remarkably higher in 5 cases, while for the petal length petal width dataset the accuracy is lower than in the Setosa-neutral case. For the sepal width petal length label (SW PW Label) dataset, the minimum training data required for training the model is 70%, and the accuracy generated in this case is remarkably good at 91%; for the other cases, 90% training data holds good and generates better accuracy (Table 1).
In this article, the support vector classification regression machine for K-class classification (K-SVCR) is implemented on two artificially generated datasets and on 18 different sub-datasets of the real-world Iris dataset. The model is based on the "1-versus-1-versus-rest" structure. It solves a three-class classification problem
Fig. 6 Classification performance of K-SVCR for Iris dataset (sepal length, petal length)
Fig. 7 Classification performance of K-SVCR for Iris dataset (sepal length, petal width)
Fig. 8 Classification performance of K-SVCR for Iris dataset (sepal width, petal length)
by solving one QPP, representing a unique multi-class classification method that generates ternary output {−1, 0, 1}. Since it generates ternary output, the performance of K-SVCR is analyzed with respect to the class labels. From the experiments conducted, it is noted that labeling the classes is a challenging task that significantly affects performance. For a real-world classification problem, it is essential to know which class should be the neutral class, because the neutral class plays a significant role in achieving better
Fig. 9 Classification performance of K-SVCR for Iris dataset (sepal width, petal width)
Fig. 10 Classification performance of K-SVCR for Iris dataset (petal length, petal width)
performance. It is explicit that when we keep Versicolor as the neutral class, the accuracy is remarkably higher in 5 cases, while for the sepal length sepal width dataset the accuracy is equal to the Virginica-neutral case. Methods for assigning labels to classes while solving multi-class classification problems may be explored in future work.
References
M. Imam · S. Adam
Department of Computer Science, ARSD College, University of Delhi, New Delhi, India
N. Agrawal (Garg)
Department of Physics, University of Allahabad, Prayagraj, U.P., India
S. Kumar (B) · A. Gosain
USICT, GGSIPU, New Delhi, India
e-mail: suyashgarg@gmail.com
S. Kumar
Department of Computer Science, Hansraj College, University of Delhi, New Delhi, India
1 Introduction
2 Literature Survey
Numerous studies have been conducted utilizing diverse machine learning and deep learning methodologies for classifying heart disease and its stages. In this part, we review some of the significant research on the prediction of CVD, including classic and ensemble-based machine learning and deep learning techniques, as summarized in Table 1.
The limitations and advantages of the proposed methods for CVD diagnosis have been outlined in Table 1 to better convey the significance of the proposed approach. To address the limitations, new methods are needed to accurately detect CVD.
3 Methodology
Figure 1 shows the workflow of our methodology, which includes data preprocessing for the treatment of outliers and skewness of the attributes. Each of these techniques is covered in detail as follows:
3.1 Dataset
In this study, we obtain the heart disease dataset from the UCI machine learning repository. There are a total of 303 instances, 164 of which belong to healthy subjects and 139 to those with cardiac disease, with 14 clinical features collected for each data record [13].
In addition to the methods employed, the quality of the dataset and the preprocessing techniques also influence the performance and precision of the prediction model. Preprocessing prepares the dataset and transforms it into a format that the algorithm can interpret. Datasets may contain errors, missing data, redundancy, noise, and other issues that render the data unsuitable for direct use by a machine learning algorithm. The size of the dataset is an additional issue: some datasets have a large number of attributes, which makes it more difficult for the algorithm to examine the data, detect patterns, and generate correct predictions.
Moreover, many ML models work better when the data is normally distributed and worse when it is skewed. It is crucial to recognize the skewness present in the features and to carry out appropriate transformations and mappings in order to convert the skewed distribution into a normal distribution. In our dataset, we apply a logarithmic transformation to the skewed attributes, excluding those whose skewness value is minimal. As a result, the majority of the data for each log-transformed feature shifts closer to its respective mean, which has a substantial effect on the skewness value, as can be seen in Fig. 2.
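A minimal sketch of this step follows, assuming the records sit in a pandas DataFrame; the skewness cutoff of 0.5 is an illustrative assumption, since the paper does not state its exact threshold.

```python
# Sketch: log-transform skewed numeric attributes. The 0.5 cutoff is an
# illustrative assumption; log1p assumes non-negative attribute values.
import numpy as np
import pandas as pd

def reduce_skew(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    out = df.copy()
    for col in out.select_dtypes(include=[np.number]).columns:
        if abs(out[col].skew()) > threshold:
            out[col] = np.log1p(out[col])   # log(1 + x) avoids log(0)
    return out
```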
Apart from this, a dataset may contain outlier values that deviate from the rest of the data and exceed the expected range. Figure 3 shows the outliers in the dataset. We perform outlier removal on the basis of the attributes trestbps and chol, dropping records whose values do not lie in the expected range.
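A minimal sketch of the range-based removal on trestbps and chol follows; the bounds used here are illustrative assumptions, not the paper's exact cutoffs.

```python
# Sketch: drop records whose trestbps/chol fall outside an accepted
# range. The bounds are illustrative, not the paper's exact ones.
import pandas as pd

def remove_outliers(df: pd.DataFrame, bounds=None) -> pd.DataFrame:
    bounds = bounds or {"trestbps": (90, 180), "chol": (100, 400)}
    mask = pd.Series(True, index=df.index)
    for col, (lo, hi) in bounds.items():
        mask &= df[col].between(lo, hi)
    return df[mask]
```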
This is the most critical step, during which a model for predicting the disease class, i.e., whether a person has heart disease or not, is constructed. For this, we have implemented a number of machine learning methods. Our problem statement is a binary class classification problem, and each algorithm is a supervised learning method [15] for classifying incoming observations according to previously established criteria.
The models used to determine whether or not a person has cardiovascular disease are discussed in the section that follows.
Ensemble-based classification algorithms are among the most extensively used classification techniques for class-imbalanced problems [16]. Their popularity is a result of their superior performance compared to single-learner systems and their ease of deployment in real-world healthcare applications [17]. Our work focuses on classification using various ensemble models and compares them with traditional machine learning methods. We specifically used 10 machine learning models to classify the presence of CVD, including ensemble models such as AdaBoost, GradientBoost, XgBoost, LightGBM, and random forest, and classical machine learning models such as the support vector classifier, decision tree, and K-nearest neighbors.
We have used fivefold cross-validation to evaluate the efficacy (or accuracy) of
machine learning models. It protects against overfitting in prediction models, partic-
ularly when the amount of data is limited. Further, we have also used GridSearchCV
to select the hyperparameters that fit the estimator model with the best score
on our training dataset. Table 2 displays the hyperparameters that led to the highest
prediction score.
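A minimal scikit-learn sketch of this evaluation setup is shown below; the random forest estimator, the parameter grid, and the variable names X_train and y_train are illustrative assumptions, while the fivefold cross-validation mirrors the text.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical grid; the tuned values reported by the paper are in Table 2.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                 # fivefold cross-validation, as in the text
    scoring="accuracy",
)
search.fit(X_train, y_train)  # X_train, y_train: preprocessed features/labels
print(search.best_params_, search.best_score_)
```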
For class 0, i.e., the person is healthy, the models generally achieved high precision,
indicating that they have low false positive rates for class 0 samples. The SVC model
had a precision of 0.92 for class 0, which suggests that it may have a higher false
positive rate compared to the other models.
In terms of class 1, i.e., the person has heart disease, all the models except
the SVC model had a precision and F1-score of 0.94 or higher, indicating that they
perform well in identifying class 1 samples. The SVC model had a precision
and F1-score of 0.67 and 0.92, respectively, for class 1, which suggests that it may
not be as effective in identifying class 1 samples as the other models.
ROC Curve
When the threshold for classifying a sample is changed, the performance of a binary
classifier system is graphically represented by a ROC curve. The true positive rate
(TPR) and false positive rate (FPR) at various classification thresholds are plotted on
the ROC curve. The percentage of positive samples that are correctly categorized
as positive is known as the true positive rate, whereas the percentage of negative
samples that are wrongly classified as positive is known as the false positive rate.
A perfect classifier will have a ROC curve that is a step function from (0, 0)
through (0, 1) to (1, 1), while a classifier that is randomly guessing will have a ROC
curve that is a diagonal line from the bottom left to the top right of the figure.
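The sketch below plots such a curve with scikit-learn; y_test and probs (the predicted probabilities for class 1) are assumed to come from one of the fitted models above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

fpr, tpr, _ = roc_curve(y_test, probs)   # rates at every threshold
auc = roc_auc_score(y_test, probs)

plt.plot(fpr, tpr, label=f"model (AUC = {auc:.3f})")
plt.plot([0, 1], [0, 1], "--", label="random guess")  # diagonal baseline
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```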
ROC curves are helpful since they are insensitive to class imbalance and give a
visual representation of the trade-off between the true positive rate and the false
positive rate. Figure 5 displays the combined ROC curve for each selected model,
and it shows that XgBoost and AdaBoost outperform the other models with 95.5%
accuracy.
The models evaluated in this study show promising results for the task of predicting
heart disease. The AdaBoost and XgBoost models showed particularly strong perfor-
mance and could be recommended for use in similar binary classification tasks. The
optimization of the hyperparameters for each model played a crucial role in achieving
the high-performance results, highlighting the importance of tuning these parameters
for improved model performance.
Our study summarizes the latest developments in the area of cardiovascular disease
classification through the use of ensemble methods. In this paper, we have offered a
comparative analysis of ensemble machine learning-based computational models for
predicting cardiovascular diseases. Our findings demonstrate that boosting-based
ensemble learning algorithms significantly outperform a single classic machine
learning algorithm. To build a robust model for classifying heart disease based on
the given attributes in the dataset, we employed careful feature selection and
evaluation techniques such as cross-validation and GridSearchCV for optimal
hyperparameter tuning.
References
1. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M (2021) Ethical machine learning
in healthcare. Ann Rev Biomed Data Sci 4:123–144
2. Qayyum A, Qadir J, Bilal M, Al-Fuqaha A (2020) Secure and robust machine learning for
healthcare: a survey. IEEE Rev Biomed Eng 14:156–180
3. Austin PC, Tu JV, Ho JE, Levy D, Lee DS (2013) Using methods from the data-mining and
machine-learning literature for disease classification and prediction: a case study examining
classification of heart failure subtypes. J Clin Epidemiol 66(4):398–407
4. Kumar S, Kaur P, Gosain A (2022) A comprehensive survey on ensemble methods. In: IEEE
7th international conference for convergence in technology (I2CT), Mumbai, India, 2022, pp
1–7. https://doi.org/10.1109/I2CT54291.2022.9825269
5. Dehkordi SK, Sajedi H (2018) Prediction of disease based on prescription using data mining
methods. Health Technol 9(1):37–44
6. Jan M, Awan AA, Khalid MS, Nisar S (2018) Ensemble approach for developing a smart heart
disease prediction system using classification algorithms. Res Rep Clin Cardiol 9:33–45
7. Venkatalakshmi B, Shivsankar M (2014) Heart disease diagnosis using predictive data mining.
Int J Innov Res Sci Eng Technol 3(3):1873–1877
8. Miao F, Cai Y-P, Zhang Y-X, Fan X-M, Li Y (2018) Predictive modeling of hospital mortality
for patients with heart failure by using an improved random survival forest. IEEE Access
6:7244–7253
9. Lakshmi MS, Haritha D, SRKIT V (2016) Heart disease diagnosis using predictive data mining.
Int J Comput Sci Inf Secur
10. Javeed A, Zhou S, Yongjian L, Qasim I, Noor A, Nour R (2019) An intelligent learning system
based on random search algorithm and optimized random forest model for improved heart
disease detection. IEEE Access 7:180235–180243
11. Soni J, Ansari U, Sharma D, Soni S (2011) Predictive data mining for medical diagnosis: an
overview of heart disease prediction. Int J Comput Appl 17(8):43–48
12. Islam HM, Elgendy Y, Segal R, Bavry AA, Bian J (2017) Risk prediction model for inhospital
mortality in women with ST-elevation myocardial infarction: a machine learning approach. J
Heart Lung 1–7
13. Brahmi B, Shirvani MH (2015) Prediction and diagnosis of heart disease by data mining
techniques. J Multi Eng Sci Technol 2:164–168
14. Benesty J, Chen J, Huang Y (2008) On the importance of the Pearson correlation coefficient
in noise reduction. IEEE Trans Audio Speech Lang Process 16(4):757–765
15. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning
algorithms. In: Proceedings of the 23rd international conference on Machine learning. pp
161–168
16. Rokach L (2019) Ensemble learning: pattern classification using ensemble methods
17. Che D, Liu Q, Rasheed K, Tao X (2011) Decision tree and ensemble learning algorithms with
their applications in bioinformatics. In: Software tools and algorithms for biological systems.
pp 191–199
18. Hossin M, Sulaiman MN (2015) A review on evaluation metrics for data classification
evaluations. Int J Data Min Knowl Manage Process 5(2)
Intrusion Detection System for Internet
of Medical Things
1 Introduction
Humanity’s curiosity to know more has unearthed immense amounts of data, which
attract multiple adversaries for benefits. In recent years, the Internet of things (IoT)
exposed such data to the Internet and made it more vulnerable. Although intrusion
detection systems exist to cope with vulnerabilities, the data-sensitive healthcare
industry is increasingly facing cybersecurity issues [1]. Internet of medical things
(IoMT) is a subset of IoT that relates to the medical, healthcare, and personal well-
being of an individual and their loved ones. As exposure to the Internet makes people
more conscious of their health, the number of monitoring
devices over the Internet has increased immensely [2, 3], resulting in an increase in
IoMT cyberthreats. The primary threat is unknown cyber-attacks, whose signatures
are unavailable and which therefore remain unrecognizable by detection mechanisms
[4].
Industries like manufacturing, finance, professional services, energy, retail,
healthcare, transportation, government, education, and media are facing cyber-
security issues. According to IBM's Security X-Force report, the healthcare industry
has jumped from 10th in 2019 to 7th in 2020 and up to 6th in 2021 [5], which
draws major attention to this industry. People follow the latest market trends but
are not fully aware of how their gadgets work. They usually take protocols for
granted, which could lead to hazardous situations. Although no system can be fully
secure, manufacturers and service providers remain responsible for preserving the
greatest possible degree of security and privacy.
Cyber-attacks can have both financial and non-financial impacts. People often
underestimate theft and take it casually, especially when such thefts have no
monetary loss associated with them. Devices in the IoMT environment are close to
one another, which leads to people unknowingly providing windows of opportunity
to adversaries. The problem lies with the devices at the edge that are unable to
perform rigorous computations and therefore require data to be transferred to the
cloud. Sniffing personal data and injecting undesirable data into and out of the
system occur while data is communicated over the network. Addressing these flaws
in existing mechanisms is the focus of this paper.
The paper is organized as follows. Section 2 gives an overview of the IoMT with
some use-cases. It defines the components of IoMT and its 3-layered architecture
and functioning. A brief taxonomy of intrusion detection systems (IDS) is also
given, with details of previous works related to healthcare cybersecurity. In Sect. 3,
the possible integrative technologies with IoMT are briefly discussed. Subsequently,
in Sect. 4, the challenges and limitations, such as the inability of devices to compute,
missing signatures of unknown malware, improper/insufficient datasets, high false
alarm rates of IDS, low intrusion detection rates, non-generic IDS or platform-
independent malware detectors, etc., are briefly discussed to give an insight into the
present-day picture of malware detection in IoMT. Section 5 concludes this paper.
2 IoMT
IoMT is an application of IoT that inter-relates medical equipment used for
monitoring and assessing an individual's health. Everything focuses on an individual's
health awareness, monitoring, and treatment. Streamlining services by targeting
each such service can generate many use-cases where IoMT can help, such as
remote patient monitoring, smart hospital management, self-health management,
real-time data analytics, fast emergency services, the use of ingestible sensors, etc.
The subsequent subsections provide an insight into its components, architecture,
and intrusion detection systems.
Wearable sensors play a vital role in capturing sensitive data. The term wireless body
area network (WBAN) was coined by Van Dam in 2001 and received IEEE
standardization in February 2012 [6]. A WBAN contains sensor nodes that are attached
to the living body to measure bio-signals like heart rate, blood pressure, SpO2, brain
signals, etc. These
sensor nodes communicate data in two ways, namely the in-body communication
and the on-body communication. These sensors, which are built for short-range
communication, are always in a slave mode to feed raw data to the master. Through the
Internet, the data is synced back to the servers of service providers like Amazon web
services (AWS), Apple HealthKit, Android HealthKit, etc., for analytical purposes.
In hospitals, the medical devices directly send the Electronic Health Record (EHR)
to their respective authorized (ideally) masters for storage and analysis.
The next subsection provides the detailed architecture of IoMT along with the
workings of each layer.
Intrusion is unauthorized access to a digital system with the intention to damage it
or gain sensitive information from it. An attack that can compromise any of
confidentiality, integrity, or availability is considered an intrusion. Systems that
detect intrusions are called IDSs. Many types of software-based IDSs exist, which
can be broadly classified according to their methodology, input data source, or
behavior [8], as shown in Table 1.
A HIDS runs on an independent device/host and monitors the traffic from that device
only, whereas a NIDS is set up in a network to monitor traffic from every device on the
network. A passive IDS primarily just logs and notifies of a possible threat, while an active
IDS suitably changes the environment to block the threat. A signature-based IDS (SIDS)
detects on the basis of previously stored patterns, while an anomaly-based IDS (AIDS)
tries to detect unknown malware attacks.
Many types of malware intrusions corrupt systems, namely denial of service
(DoS), distributed DoS (DDoS), SQL injection, malware attacks with both known
and unknown signatures, botnet attacks, etc. Various intrusion detection models exist
in the healthcare domain to counter such attacks. These are presented in Table 2.
The ML- and DL-based IDSs are efficient and capable of detecting such attacks
with accuracy. IDS capabilities can be enhanced by merging new and innovative
technologies. The next section briefly mentions the upcoming technologies
pertaining to IoMT cybersecurity that can be integrated to provide the appropriate
features they are proficient in.
Table 2 (continued)

Objective: Integrating a smart detection engine into a firewall or Web filter or intrusion detectors [12]
Used approach: Multilayer perceptron (MLP) and wavelet neural network
Outcomes: Accuracy = 93% with two hidden layers; accuracy = 90% with one hidden layer
Dataset(s): Synthetic dataset
Limitations: Not all metrics are used

Objective: Highly scalable hybrid (deep learning) DL-driven software defined network (SDN)-enabled framework [13]
Used approach: 3 hybrid deep learning algorithms; convolutional neural network long short-term memory (CNN-LSTM) shows the best performance
Outcomes: Accuracy = 99.83%; precision = 99.43%; recall = 99.73%; F1-score = 99.77%
Dataset(s): Publicly available IoT dataset (name not mentioned)
Limitations: The framework is prone to a single point of failure

Objective: Deep neural network-based IDS [14]
Used approach: Compared KNN, deep neural network (DNN), Naïve Bayes (NB), RF, and SVM, modified with principal component analysis (PCA) and grey wolf optimization
Outcomes: Accuracy = 99.9%; DNN-PCA with grey wolf gave the best result
Dataset(s): Benchmark dataset from Kaggle (name not mentioned)
Limitations: The dataset is not designed for IoMT; overhead is not calculated

Objective: Fog-based attack detection (FBAD) framework [15]
Used approach: Design of an FBAD framework using an ensemble of online sequential extreme learning machines
Outcomes: Accuracy = 98.19%; detection rate = 97.09%; false positive rate = 2.04%
Dataset(s): NSL-KDD (Network Security Laboratory - Knowledge Discovery in Databases)
Limitations: Overhead is not calculated; not all metrics are used
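As a rough illustration of the CNN-LSTM family referenced in Table 2 [13], the Keras sketch below stacks a 1-D convolution over per-flow traffic features and feeds it into an LSTM; the layer sizes, input shape, and feature count are illustrative assumptions, not the cited architecture.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dropout, Dense

n_features = 100  # assumed number of traffic features per sample

model = Sequential([
    Conv1D(64, kernel_size=3, activation="relu", input_shape=(n_features, 1)),
    MaxPooling1D(pool_size=2),
    LSTM(64),                        # sequence summary of the convolved features
    Dropout(0.2),
    Dense(1, activation="sigmoid"),  # binary output: attack vs. benign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```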
3 Integrative Technologies

Integrating IoMT with technologies like AI has multiplied its strength in terms of
security and smartness. Some of the technologies that can be integrated into IoMT
are briefly discussed in the following subsections.
3.1 Blockchain

Blockchain is believed to have been invented by Satoshi Nakamoto in 2008 for
the digital currency bitcoin; it provides an immutable ledger of transactions.
It is a peer-to-peer technology for sharing data and its computation in a decentralized
manner [16]. The features of blockchain can be used for handling vulnerabilities in
EHRs. For instance, if three stakeholders (hospital, government, insurance provider)
are involved in a task, then every peer node hosts an instance of a distributed ledger
consisting of EHRs. This ensures tamper-proof digital EHRs. Additionally,
decentralized control deprives adversaries of a single point of attack. A minimal
sketch of such hash-linked records follows.
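The sketch below only illustrates the tamper-evidence idea behind hash-linked EHR entries, not a full distributed ledger with consensus among the three stakeholders; record contents are made up.

```python
import hashlib
import json
import time

def make_block(ehr_record, prev_hash):
    """Link an EHR entry to the previous block via a SHA-256 hash."""
    block = {"timestamp": time.time(), "ehr": ehr_record, "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

genesis = make_block({"patient": "P-001", "note": "baseline"}, "0" * 64)
follow_up = make_block({"patient": "P-001", "note": "follow-up"}, genesis["hash"])
# Any change to genesis["ehr"] changes its recomputed hash and breaks the chain,
# so every peer holding the ledger can detect the tampering.
```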
3.2 Physical Unclonable Functions (PUFs)

A physical unclonable function (PUF) is a physical entity that, for a given condition
and input, produces a physically defined digital fingerprint that serves as a unique
identifier. It works by implementing challenge-response authentication. For instance,
to check the legitimacy of a doctor, an individual, or a sensor node before establishing
a communication session between them, one can use PUFs for physically secure
authentication [17]. A software-level imitation of this exchange is sketched below.
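In this sketch, a keyed HMAC stands in for the device-unique physical fingerprint, which is purely an illustrative assumption: a real PUF derives its response from physical randomness rather than a stored key.

```python
import hashlib
import hmac
import os

DEVICE_SECRET = os.urandom(32)  # stand-in for the physical fingerprint

def puf_response(challenge: bytes) -> bytes:
    return hmac.new(DEVICE_SECRET, challenge, hashlib.sha256).digest()

# Enrollment: the verifier records challenge-response pairs (CRPs).
challenge = os.urandom(16)
enrolled_response = puf_response(challenge)

# Authentication: the device proves legitimacy by reproducing the response.
assert hmac.compare_digest(puf_response(challenge), enrolled_response)
```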
3.3 Software Defined Networking (SDN)

SDN is the physical separation, or abstraction, of the network control plane from the
data forwarding plane. The data plane forwards network traffic to its destination,
while the control plane manages the tasks involved in making forwarding decisions,
with a protocol defined between the two planes. SDN can be used to establish
communication between IoMT-connected devices and sensors [18].
3.4 5G Technology
Since no system is fully foolproof, the emergent flaws in these systems provide
opportunities to evolve further as technologies advance. The next section discusses
the challenges and limitations associated with the established IDSs in the
development of the emerging integrative technologies.
4 Challenges and Limitations

The challenge is to make users fully aware of the threats and teach them the remedies.
Furthermore, this section discusses the challenges and limitations from the
perspective of resource providers and developers, which are open for future research
directions.
The intrusion detection mechanisms discussed in the literature are not adequately
capable of detecting the various types of malware. Mostly, an IDS is attack-specific
or specific to the environment of the IoMT test bed. So, there is a need for robust
systems which can handle different types of attacks simultaneously and effectively.
Since WBAN sensors and devices are meant to be portable and wearable, their size
restricts their in-house power capacity. This, in turn, limits their efficiency and
computing strength [6, 20].
IoMT comprises a wide range of sensors and devices, from fully capable server
machines to small actuators, and every such device works with different protocols.
Hence, IDSs should be designed so that they are able to support every component
effectively [21].
4.4 Scalability
IoMT is a heterogeneous network with devices of different sizes and communication
protocols that have varying resource consumption. Since, for every block
A review of the literature makes it evident that most IDSs are validated on datasets
obtained from IoMT test beds. For better intrusion detection, generic and published
datasets in the domain of IoMT need to be researched [24–26].
Attacks with known signatures are easy to detect and rectify. But today, unknown
attacks, having no previous signature, are taking place. Zero-day attacks exploit
software vulnerabilities that are completely unknown to the stakeholders [27, 28].
In healthcare organizations worldwide, IoMT is experiencing such zero-day attacks,
glimpses of which are reported by the U.S. Department of Health and Human
Services (HHS) [29].
Integration with different technologies strengthens a product and adds quality to it,
even as the product inherits the negative consequences of those technologies. With
blockchain-enabled systems, scalability is the issue [16, 30]: the number of EHRs
will saturate for a particular framework of hospitals, and the cost of mining the
day-to-day interaction logs of wearables will not be efficient. Similarly, not every
device can be made SDN-enabled, which makes data gathering difficult [18].
According to Masud et al. [17], in PUF systems there are authentication bottlenecks
at the satellite broadband pivot point. In the case of 5G in IoMT, the major problems
are the high cost of telesurgery devices and the attached legal requirements [31].
5 Conclusion
AI is rapidly changing the healthcare industry, and smart healthcare systems are
achievable through IoMT technology. The interconnectivity of electronic and digital
devices that serve efficiently and smartly has generally come at a trade-off with
security. Security challenges include hacking into EHRs or using AI-powered
malware to disrupt the operation of medical devices. This paper provides an insight
into IoMT and its security. It enunciates the 3-layered architecture of IoMT,
comprising the device, fog, and cloud layers. Further, some intrusion detection
mechanisms based on ML and DL were discussed. Based on the literature review,
the DL-based models have shown high accuracy in detecting intrusions. The
problems with existing IDSs are a high number of false alarms, insufficiently robust
systems, and the heterogeneity of IoMT devices and communication protocols.
Some of the challenges encountered relate to the limited availability of published
datasets for healthcare network traffic analysis and the scalability of IDS solutions.
The analyzed problems can possibly be resolved using integrative technologies such
as blockchain, PUFs, and SDN-enabled devices for more purposeful intrusion
detection, and upgraded 5G technology may prove to be a game changer for more
operational and enhanced security capabilities.
References
11. Newaz AI, Sikder AK, Rahman MA, Uluagac AS (2019) HealthGuard: a machine learning-
based security framework for smart healthcare systems. In: 6th international conference on
social networks analysis management and security (SNAMS). pp 389–396
12. Al-Shaher MA, Hameed RT, Ţăpuş N (2017) Protect healthcare system based on intelligent
techniques. In: 4th international conference on control, decision and information technologies
(CoDIT). pp 421–426
13. Khan S, Akhunzada A (2021) A hybrid DL-driven intelligent SDN-enabled malware detection
framework for Internet of Medical Things (IoMT). Comput Commun 170:209–216
14. Swarna Priya RM, Maddikunta PKR, Parimala M, Koppu S, Gadekallu TR, Chowdhary CL,
Alazab M (2020) An effective feature engineering for DNN using hybrid PCA-GWO for
intrusion detection in IoMT architecture. Comput Commun 160:139–149
15. Alrashdi I, Alqazzaz A, Alharthi R, Aloufi E, Zohdy MA, Ming H (2019) FBAD: fog-based
attack detection for IoT healthcare in smart cities. In: IEEE 10th annual ubiquitous computing,
electronics and mobile communication conference UEMCON. pp 0515–0522
16. Dilawar N, Rizwan M, Ahmad F, Akram S (2019) Blockchain: securing internet of medical
things (IoMT). Int J Adv Comput Sci Appl 10:82–89
17. Masud M, Gaba GS, Alqahtani S, Muhammad G, Gupta BB, Kumar P, Ghoneim A (2021)
A lightweight and robust secure key establishment protocol for internet of medical things in
COVID-19 patients care. IEEE Internet Things J 8:15694–15703
18. Liaqat S, Akhunzada A, Shaikh FS, Giannetsos A, Jan MA (2020) SDN orchestration to combat
evolving cyber threats in internet of medical things (IoMT). Comput Commun 160:697–705
19. Mishra L, Vikash, Varma S (2021) Seamless health monitoring using 5G NR for internet of
medical things. Wireless Pers Commun 120
20. Khan FA, Haldar NAH, Ali A, Iftikhar M, Zia TA, Zomaya AY (2017) A continuous change
detection mechanism to identify anomalies in ECG signals for WBAN-based healthcare
environments. IEEE Access 5:13531–13544
21. Aldhaheri S, Alghazzawi D, Cheng L, Alzahrani B, Al-Barakati A (2020) DeepDCA: novel
network-based detection of IoT attacks using artificial immune system. Appl Sci 10
22. Begli M, Derakhshan F, Karimipour H (2019) A layered intrusion detection system for critical
infrastructure using machine learning. In: Proceedings of 7th international conference on smart
energy grid engineering (SEGE). pp 120–124
23. Salem O, Alsubhi K, Mehaoua A, Boutaba R (2021) Markov models for anomaly detection
in wireless body area networks for secure health monitoring. IEEE J Sel Areas Commun
39:526–540
24. Rbah Y, Mahfoudi M, Balboul Y, Fattah M, Mazer S, Elbekkali M, Bernoussi B (2022) Machine
learning and deep learning methods for intrusion detection systems in IoMT: a survey. In: 2nd
international conference on innovative research in applied science, engineering and technology
(IRASET)
25. Sun Y, Lo FPW, Lo B (2019) Security and privacy for the internet of medical things enabled
healthcare systems: a survey. IEEE Access 7:183339–183355
26. Si-Ahmed A, Al-Garadi MA, Boustia N (2022) Survey of machine learning based intrusion
detection methods for internet of medical things
27. Roumani Y (2021) Patching zero-day vulnerabilities: an empirical analysis. pp 1–13
28. Tang R, Yang Z, Li Z, Meng W, Wang H, Li Q, Sun Y, Pei D, Wei T, Xu Y, Liu Y (2020)
ZeroWall: detecting zero-day web attacks through encoder-decoder recurrent neural networks.
In: Proceedings—IEEE INFOCOM. pp 2479–2488
29. Razdan S, Sharma S (2021) Internet of medical things (IoMT): overview, emerging technolo-
gies, and case studies. IETE Tech Rev (Institution Electron Telecommun Eng India)
30. Esposito C, De Santis A, Tortora G, Chang H, Choo KKR (2018) Blockchain: a panacea for
healthcare cloud-based data security and privacy? IEEE Cloud Comput. 5:31–37
31. Li J, Yang X, Chu G, Feng W, Ding X, Yin X, Zhang L, Lv W, Ma L, Sun L, Feng R, Qin J,
Zhang X, Gou C, Yu Z, Wei B, Jiao W, Wang Y, Luo L, Yuan H, Chang Y, Cai Q, Wang S,
Giulianotti PC, Dong Q, Niu H (2022) Application of improved robot-assisted laparoscopic
telesurgery with 5G technology in urology. Eur Urol 83:41–44
Veracity Assessment of Big Data
Abstract The inconsistent nature of Big Data drives the interest of the research commu-
nity in devising new techniques for assessing, predicting and computing the veracity
of Big Data. The inconsistencies in Big Data are primarily due to the limited number of
authorized sources. Moreover, social media platforms and web-based applications
are the main sources of such inconsistencies in Big Data. If our data is not reliable,
then the very purpose of analyzing Big Data would be compromised, and outcomes
derived from such analysis would be of virtually no relevance. So, before analyzing
the data and finding new insights, one needs to compute its veracity in terms of
its correctness, consistency, reliability, trustworthiness, credibility and authenticity.
This paper focuses on the research that has been carried out to compute the veracity
of Big Data and outlines the research gaps and challenges associated with computing
such data veracity.
1 Introduction
In this age of computing, Big Data is at the heart of all widely established large enter-
prises, social networking websites and IoT/web-based applications. From economic
evolutions to smartly growing business models, Big Data has become the game
changer capable of tackling the challenges that beset business uncertainties. Big Data
caters to the needs of all big companies seeking to survive the rapidly evolving trends
of today's global market. No company can survive without analyzing its customer
data, which continuously grows with time. According to research, Facebook handles
500 TB of data each day [1]. This data is in structured, unstructured and semi-
structured form. It would not be possible to handle such a huge amount of data using
traditional methods. For example, analyzing the buying patterns of customers can
help a business model predict changes in customer demand in order to optimize
inventories accordingly. Big Data has numerous applications such as stock market
prediction, weather forecasting, education, customer behaviour prediction, etc. In
the past few years, out of the four V's of Big Data, researchers have delved into the
importance of the veracity of Big Data. Veracity needs to be addressed urgently
because social networks and IoT/web-based networks are generating data at
humongous rates, and this data is full of inconsistencies and uncertainties. So, the
veracity of data is a huge challenge in comparison with the other V's of Big Data.
The veracity of Big Data is discussed next.
The original 3 V’s of Big Data were originally defined by [2]. Then an IBM employee
was the first source found to coin the term veracity as the fourth V [3]. After veracity
was adopted by IBM, it began appearing in Big Data research in 2013. The amount
of research work done in this area is very limited. Data veracity does not have
a unified definition. Some researchers have defined data veracity in terms of data
uncertainty due to data inconsistency and incompleteness [4]. McArdle and Rob
defined veracity in terms of authenticity, reliability and precision of collected data
[5]. Others have defined it as data correctness [6]. Some other researchers have
defined it as trustworthiness, completeness, consistency, integrity between the data
and its resources.
Thus, veracity of Big Data can be defined as trustworthiness, correctness, relia-
bility, consistency, authenticity and completeness of data. Veracity is somewhat of
a broader domain to work on which may include credibility of information, authen-
ticity of information, consistency, precision of data, trust computation of information
and reliability of information. Among these, trust is a multidisciplinary topic for
research. Several fields, such as sociology, philosophy, automation, and computing
and networking, have defined trust as follows:
• Sociology: Subjective probability of the trustor that the trustee will not indulge
in an action that hurts the trustors’ interest under uncertainty [7].
• Philosophy: Trust is a moral phenomenon and the violation of moral behaviour
leads to distrust [8].
• Automation: Trust is basically a situation where one agent will try to achieve
another agent’s goal under vulnerability and uncertainty [9].
• Computing and Networking: An agent trust is a subjective probability that
another agent or human will exhibit behaviour in a reliable manner under certain
risk [10].
According to research, around 80% of Big Data is uncertain [11]. The U.S. spends
$3.1 trillion every year because of poor data quality [12]. In 2016, 62% of
U.S. adults got news from social media, an increase from 49% in 2012
[13]. Every smartphone user generates and consumes data through social media and
web-based applications. Social networking platforms like Facebook or Twitter can
be used to spread misinformation and/or disinformation just to mislead the mindset
or opinion of an individual or a group of individuals. Several scenarios have been
observed in which misinformation has caused a depreciation in the share values
of an enterprise and has even impacted the presidential election in the U.S. Various
other incidents have also been recorded in the past which have led to inconclu-
sive investigations and unwanted hassle, just to mislead people or spread
propaganda. These incidents can harm an individual, a group of individuals, or even
humanity as a whole. So, the quality of data needs to be assured. Since
the quality of data decides the quality of the analysis, veracity can be said to be
the most important V among all the V's of Big Data.
This paper focuses on the research that has been carried out to compute the veracity
of Big Data and outlines the research gaps and challenges associated with computing
data veracity.
The organization of this paper is as follows. The first section is the introduction.
The second section elaborates the literature review. The third section discussed the
research gaps and challenges in the field of data veracity followed by tentative solu-
tions provided in the fourth section. The fifth section emphasizes the future research
directions. The last section is the conclusion.
2 Literature Review
Several attempts have been made by researchers to address the challenge of predicting
and assessing data veracity. The real-time scenario where data veracity is assessed is
in the case of social networking websites. Attempts have been made to quantify the
veracity of Twitter microblogs and the datasets available for the veracity domain such
as Liar, Facebook Hoax, Buzz Face, Fake News Net, etc. Researchers have worked on
the sentiment analysis of the news content dataset available using machine learning
techniques. Computational statistics, along with artificial intelligence-based machine
learning models, can be used to assess the veracity of Big Data, which is considered
to be an NP-hard problem. The veracity of data is a somewhat broader notion to work
on, which includes assessing the reliability, credibility, trustworthiness, authenticity,
precision and completeness of the collected data. Most researchers have worked
on the computation of trust, which is one of the important aspects of determining
veracity.
Crowdsourcing techniques have been used to assess Big Data veracity [6]. Crowd-
sourcing is basically a technique in which a group of people share their efforts towards
a common goal to solve a problem. In [6], an app called "TAG ME" is
developed through which people tag tweets from their smartphones based
on three categories, i.e. positive, negative and neutral. This information is saved and
passed to the Bayesian predictor to train the classifier. Other than this, a different
Bayesian predictor has been used on a verified dataset using a trinomial function.
It was shown that crowdsourcing techniques perform reasonably well in assessing
veracity of Big Data. In [14], a method that computes the emotional weights of
the news content from “The Star and The Onion” news dataset is proposed. First,
emotions from the news were identified using Emolex followed by finding the weight
of the emotions of each news content using the following expression [14]:
W = (1.0 × v) / max(v)    (1)

where v is the value of an emotional state and max(v) is the value of the highest
emotional state conveyed in the news. The computed weights become the input to the
input layer of the multilayer perceptron, which classifies the news as true or false. It
was observed that fake news is dominated by emotions such as anger, sadness, joy, etc.
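A small sketch of Eq. (1) in code, with made-up Emolex-style emotion counts for one news item:

```python
def emotion_weight(v, values):
    """Eq. (1): W = 1.0 * v / max(values) for one emotional state."""
    return 1.0 * v / max(values)

# Illustrative counts of emotional states in a single news item (made up).
counts = {"anger": 7, "sadness": 4, "joy": 2, "trust": 1}
weights = {emo: emotion_weight(v, counts.values()) for emo, v in counts.items()}
# weights["anger"] == 1.0; the weights feed the MLP's input layer
```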
In [15], a web-based application called Backdrop is presented with which a user
can interact with different knowledgebases with the aim of finding out which particular
claims can be considered more trustworthy. Backdrop was designed to annotate infor-
mation and semantics found online to check how the veracity of a statement differs
with claims. In [16], three approaches for assessing the veracity of data have been
outlined, namely the implicit, the explicit and the authoritative approach. The implicit
approach takes into consideration the fact that implicitly the truthful statement differs
from the untruthful statement not only in terms of its content and expression, but also
in the context they have been used for assessing data veracity. The explicit approach
needs an outside data to assess the veracity of data. Finally, the authoritative approach
needs legitimate authority to verify the claim with the objective to find out its credi-
bility. In [17], veracity has been defined using three dimensions, namely objectivity/
subjectivity; truthfulness/deception and credibility/implausibility. Using these three
dimensions, the veracity index is computed. The veracity index gives a systematic
view in assessing the quality of Big Data especially in the case of textual data. In
[18], assessing, predicting and improving data veracity has been presented in three
different contexts, namely for social media networks, for web applications and for
Internet of Things applications. Semantic analysis and Twitter influence have been
used to assess data veracity. In [10], all the factors and constructs affecting the trust
between the trustor and the trustee are discussed. Further, it emphasizes the impor-
tance of considering scaling of trust at different levels, in contrast to binary levels,
thereby laying a sound foundation for quantifying the veracity of Big Data using
the trust computation. In [19], machine learning models for computing trust either
as a two-class classification problem or a continuous target variable class problem
for social and ad-hoc networks is presented. Also, the various properties of trust
like privacy protection and context awareness, and various attacks that can occur in
reputation-based models like Amazon, which include white washing, bad mouthing,
on-off attack, collusion and conflict behaviour attacks, are presented. In [20], how
the information can be classified into three categories, namely disinformation, misin-
formation and unverified information is presented. These three types of information
are distinguished based on their intention. Disinformation typifies bad intention in
comparison with misinformation and unverified information. Rumours are a type of
misinformation and fake news is a type of a hoax, which is a type of disinformation.
Also, the relationship between the two problems, i.e. disinformation detection and
truth discovery, is presented. Further, various approaches to tackle disinformation
detection and truth discovery, both traditional (feature-based, kernel-based,
graph-based, iteration-based, etc.) and neural network-based (DNN-based, RNN-
based, CNN-based), are discussed. In [21],
the four steps for analyzing rumours have been suggested. These are data annota-
tion, rumour classification and prediction, rumour diffusion and rumour visualization.
Data annotation is mainly concerned with how the rumours are spreading through
the network and how one can verify and/or validate/invalidate the rumours. Rumour
classification and prediction basically addresses the task of how one can analyze and
predict the source tweet as rumours by looking at the retweets, i.e. veracity prediction
using the stance classification. Rumour diffusion basically refers to how the infor-
mation is being spread out through the network and how it can be tackled. Finally,
rumour visualization is concerned with visualizing the rumour using the GRIP system
and dynamic graph drawing-based methods. In [22], a probability-based solution
for tackling the problem of trust computation is proposed. Fifty samples were taken
at random from a database, and trust was computed for different weights of the
predefined trust factors like "personal information sharing, known in person, mutual
friends, common interest, past conversation history". The authors simply computed
the weighted arithmetic mean of these trust factors to obtain the trust value for a user
in the network, and the total average trust value for the whole network using the
binomial probability. In [23], user-based and item-based collaborative filtering have been used
to compute similarity to enhance the trust of users towards the network. User-based
collaborative filtering prefers items that have been preferred by similar users, whereas
item-based collaborative filtering prefers items which have been preferred earlier.
In the new heuristic similarity model (NHSM), three factors, namely similarity,
proximity and significance, have been used to compute similarity between users for
the MovieLens dataset. In [24], a more specific domain in truth discovery that computes
the hardness of claims using the maximum likelihood estimation is presented. For
computing the source reliability and correctness of a claim, three real world-based
scenarios Oregon shooting, Paris attacks and Baltimore riots on the Twitter social
networking platform were used. In [25], rumour detection and stance classification
were considered together. RNN with multi-task architecture (MTUS, MTES) has
been used to compute veracity. The next section introduces detailed study on the
research gaps and challenges regarding the literature review.
3 Research Gaps and Challenges

Research gaps are presented in Table 1, based on the study of papers in groups
of three or four. Only a few studies have highlighted research gaps that lay down
a root-level foundation for future research work and the limitations of the studies
found in the literature review.
The following challenges exist pertaining to assessing data veracity:
• Lack of models for veracity assessment of data from disparate data sources.
• Multiple definitions and interpretations of data veracity.
• Limited number of datasets available for assessing veracity.
• Lack of scalable solutions for veracity assessment.
4 Tentative Solutions
This section introduces tentative solutions for the above-mentioned research chal-
lenges. These solutions are proposed by taking into account the multiple definitions
of the veracity of data, and the factors which can help to compute the veracity of data,
in the most efficient way. Only a few studies have used advanced AI tools to address
the problem of assessing the veracity of Big Data.
• To address the heterogeneity of data sources as a problem for assessing the veracity
of data, there is a need to come up with a single conceptual framework which suits
all domains for computing veracity. Such a conceptual framework can lead to
an environment where the computational model of veracity can deal with
different sources of data to a certain extent.
• To cope with the multiple interpretations of veracity as a research challenge, the
research community should define veracity in such a way that it highlights
and touches every aspect of veracity at a fundamental level. These multiple inter-
pretations need to be put concisely in a manner that avoids any domain-specific
interpretation of veracity.
• Most of the work has been done on the textual data of the Twitter microblog using
supervised machine learning techniques [16]. To address the unavailability of
datasets as a research challenge, there should be enough datasets available for
computing and comparing the veracity of data among the different approaches.
The availability of datasets can be improved by using the APIs of social networks
and IoT/web-based networks, and different graph-based approaches can be used,
such as extracting the retweet graph and interaction-based networks from Twitter
and other social networks.
• The scalability of solutions as a research challenge can be addressed by finding
more robust solutions using machine learning tools and computational statistics.
Machine learning, computational statistics and probability can lay the foundation
for achieving scalable solutions to the problem of veracity assessment.
Veracity Assessment of Big Data 311
Table 1 (continued)

References: [20], [27]
Description: [20] addressed the problem of disinformation detection and truth discovery from a single reference using traditional and neural network-based techniques. [27] suggests that around 51% of tweets come with images, which comprise visual and statistical features.
Gap analysis:
• Most of the work has been done using implicit features, i.e. features containing user profile information, emotions, linguistics and information propagated on a network
• Social networking websites contain not only text data but also images, which make the information appear real or trustworthy
• In certain scenarios fake images are spread, resulting in fake information, which is a severe issue

References: [24], [25]
Description: [24] explored a somewhat more specific domain in truth discovery, computing the hardness of claims using maximum likelihood estimation to determine the source reliability and correctness of claims, using three real-world scenarios: the Oregon shooting, the Paris attacks and the Baltimore riots. [25] identifies that approaching rumour detection and stance classification separately yields quite different veracity outcomes compared with considering them jointly.
Gap analysis:
• More work on datasets that contain text and images needs to be carried out so that veracity assessment, especially in the case of social networks, is more precise and accurate
• The unavailability of social network datasets makes the quantification of trust more complex, as it requires techniques that involve human intervention, like crowdsourcing or fact checking

References: [18], [15], [23]
Description: [18] divides the literature review into three parts, i.e. the veracity of data in social networks, in web-based applications and in IoT applications. [15] presents a web-based application called Backdrop, used to annotate information and semantics found online to check how the veracity of statements differs with claims. [23] uses collaborative filtering, which is basically of two types, i.e. user-based and item-based; user-based collaborative filtering prefers items that have been preferred by similar users, while item-based collaborative filtering prefers items which have been preferred earlier.
Gap analysis:
• Not much work has been carried out on quantification of veracity that preserves the properties of trust like context dependency, dynamicity, asymmetry and subjectivity
• Only a few studies have proposed solutions that are reliable for handling trust attacks in recommendation or reputation-based systems, such as bad mouthing, white washing, collusion and conflict behaviour
• Most of the work has been carried out on graph-based approaches for quantification of trust in social networks that consider neighbour node recommendations, which are vulnerable to trust attacks
• There is no standard way of scaling trust, which is one of the measures for computing the veracity of data
5 Future Research Directions

Future work can be done in plenty of ways, of which a few important areas are
mentioned below. Most of the work so far has been done on the computation of trust,
which is one of the important constituents of veracity, using different approaches for
quantifying data veracity.
• Veracity is a multidisciplinary research topic and has multiple domain-specific
interpretations. Research needs to work out a domain-independent conceptual
framework for veracity.
• Feature extraction and feature selection can be the next domains in social networks
for analyzing which features are platform independent, i.e. features that can be
used to compute the veracity of data for any social network.
• Only a few works have dealt with datasets containing images. Most posts
on social network forums include images and, therefore, veracity computation
for them becomes relevant. So, there is a need to define approaches for computing
the veracity of datasets that include images.
• Future work may focus on finding more scalable solutions by using
advanced machine learning techniques and tools to address the problem of data
veracity.
• Privacy protection needs to be addressed while dealing with, or collecting,
datasets which contain sensitive user information.
• Some standard datasets need to be published for further analysis and comparison of
results regarding the computation of data veracity, in order to find the most
efficient approach.
6 Conclusion
Veracity is going to be the most important area in the field of data science and Big Data
in comparison with the other three V's of Big Data. This paper focused on the motivation
for computing the veracity of data and its various interpretations in the literature. Veracity
deals with the trustworthiness and consistency of information, without which the sole
purpose of analyzing Big Data is defeated. If the data under analysis is not certain,
then the insights obtained after performing the analysis would be of no relevance. Data
veracity can help in sorting, aggregating and filtering information. Veracity can
help people by enriching the credible content for users who are connected through
online social networks, enabling them to take efficient decisions based on the
credible/trustworthy opinions of their friends. Various approaches for computing
the veracity of data have been discussed in this paper, and their research gaps have
been emphasized in detail. This paper discussed challenges in the field of data
veracity and the tentative solutions mentioned in the literature. Future research
directions have also been mentioned for further research in the field of data veracity.
Veracity can give us more promising results
and bring about positive changes in people's lives by enriching the social capital of a
social network. There is a need to make the computation of veracity scalable, for which
advanced AI and machine learning tools and techniques can be used. After surveying
the literature, it can be gleaned that data-veracity-assessing techniques and
tools will be required to meet the challenges posed by the astonishingly increasing
rate of Big Data creation, which may help shape decision-making in an
effective way. Feature extraction and feature selection can be the next domains in
social networks for identifying platform-independent features for computing the
veracity of data.
References
20. Fan XU (2021) A unified perspective for disinformation detection and truth discovery in social
sensing: a survey. ACM Comput Surv 55(6):33
21. Devi PS, Karthika S (2018) Veracity analysis of rumors in social media. In: 2nd international
conference on computer, communication, and signal processing: special focus on technology
and innovation for smart environment, ICCCSP
22. Yadav P, Gupta S, Venkatesan S (2014) Trust model for privacy in social networking using prob-
abilistic determination. In: International conference on recent trends in information technology,
ICRTIT
23. Garakani MR, Jalali M (2014) A trust prediction approach by using collaborative filtering
and computing similarity in social networks. In: International congress on technology,
communication and knowledge, ICTCK
24. Marshall J, Syed M, Wang D (2016) Hardness-aware truth discovery in social sensing applica-
tions. In: Proceedings 12th annual international conference on distributed computing in sensor
systems, DCOSS. pp 143–152
25. Liu X, Gao J, He X, Deng L, Duh K, Wang YY (2015) Representation learning using multi-
task deep neural networks for semantic classification and information retrieval. In: NAACL
HLT 2015 conference of the North American chapter of the association for computational
linguistics: human language technologies, proceedings of the conference. pp 912–921
26. Zhao K, Pan L (2015) A machine learning based trust evaluation framework for online social
networks. In: Proceedings 2014 IEEE 13th international conference on trust, security and
privacy in computing and communications, TrustCom. pp 69–74
27. Jin Z, Cao J, Zhang Y, Zhou J, Tian Q (2017) Novel visual and statistical image features for
microblogs news verification. IEEE Trans Multimedia 19(3):598–608
The Role of Image Encryption
and Decryption in Secure
Communication: A Survey
Abstract Information security is a vital tool for protecting the confidentiality and
integrity of digital information. In this paper, we consider different image encryption
techniques based on the advanced encryption standard (AES), chaotic systems, RSA,
elliptic curve cryptography (ECC), the data encryption standard (DES), and hybrid
encryption schemes. Further, the discriminative capability of each encryption scheme
is examined. Various security analyses were also considered to show the
effectiveness of the respective models proposed by different researchers.
1 Introduction
In the modern world, the exchange of digital information has become an integral part
of daily life. Digital data is essential for many aspects of society, from online banking
and e-commerce to social media and communication between government agencies.
However, this reliance on digital information also exposes it to potential threats, such
as unauthorized access, tampering, or interception. To protect the confidentiality
and integrity of digital information, encryption and decryption techniques have been
developed to secure the transmission and storage of data. In this paper, we present a
theoretical analysis of the role of encryption and decryption in secure communication.
We begin by reviewing the basic concepts of encryption and decryption, including
symmetric and asymmetric key algorithms, cryptographic protocols, and protocols
for secure key exchange. We then examine the strengths and limitations of various
encryption and decryption techniques, including their ability to resist attacks like
brute force, and their performance under analyses such as PSNR, SSIM, key space,
histogram analysis, etc. The rest of the paper is organized as follows. Section 2
presents the literature survey. Section 3 presents the comparative analysis, followed
by the conclusion in Sect. 4 (Fig. 1).
2 Literature Survey
AES is a popular symmetric key block cipher standardized by NIST and designed
by Joan Daemen and Vincent Rijmen [1]. It is known for its security, low cost, and
versatility in hardware and software implementations. Alsaffar et al. [2] proposed
two methods for securely transferring medical images, one using AES-GCM
combined with the Whirlpool hash function and ECDSA, and the other using only
AES-GCM and ECDSA. Faragallah [3] developed a secure cryptosystem that
combines hashed-image LSB watermarking with AES or RC6 encryption to protect
audio data. In this system, plain audio is transformed into 4 × 4 blocks, XORed with
a private image, and then embedded using LSB watermarking before being
encrypted with AES or RC6.
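A minimal sketch of the AES-GCM step using the Python cryptography package is shown below; the file name is hypothetical, and the hash/signature parts of the cited schemes (Whirlpool, ECDSA) are omitted.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)  # 96-bit nonce, standard for GCM; never reuse per key

plaintext = open("medical_image.png", "rb").read()   # hypothetical input file
ciphertext = aesgcm.encrypt(nonce, plaintext, None)  # output includes the auth tag

assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```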
To increase the security of the DES algorithm, Yun-Peng et al. [4] proposed incorpo-
rating chaos into the encryption process. Experimental evaluation revealed that the
suggested encryption algorithm offers good security and can successfully preserve
the secrecy of digital photos. Dang et al. [5] use a unique encryption method based
on chaotic systems and conventional encryption techniques. Before encrypting the
image data, the approach uses chaos to randomize it, thus offering very high security
for online image transmission.
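One common way chaos enters such schemes is as a keystream generator; the sketch below XORs image pixels with bytes drawn from the logistic map x_{k+1} = r * x_k * (1 - x_k), where the seed x0 and parameter r act as the secret key. The exact constructions in [4, 5] differ; this is only a generic illustration.

```python
import numpy as np

def logistic_keystream(n, x0=0.3141, r=3.99):
    """n chaotic bytes from the logistic map; (x0, r) act as the secret key."""
    x, out = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) % 256
    return out

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
ks = logistic_keystream(image.size).reshape(image.shape)
cipher = image ^ ks                        # diffusion by XOR with the keystream
assert np.array_equal(cipher ^ ks, image)  # XOR again to decrypt
```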
Encryption schemes based on chaotic systems are popular due to their properties,
including ergodicity, simplicity, high key sensitivity, and large key space. In
their research [6], Guesmi et al. proposed an image encryption scheme that combines
DNA masking, SHA-256, and the Lorenz system. The plain image is split into
RGB planes, encoded using DNA encoding, and scrambled using chaotic sequences
generated from the Lorenz system. The SHA-256 hash value is changed to a binary
sequence and encoded using DNA encoding. The resulting cipher image is obtained
through DNA decoding. Brindha and Ammasai [7] proposed an image encryption
scheme that uses the Henon map and Lorenz equation with multiple levels of diffu-
sion. This scheme involves dividing the input image into square blocks, confusing
them using the Henon chaotic map, and performing two diffusion operations using the
Lorenz equation and a matrix generated from a complex function applied to the input
image. The final encrypted image is obtained by scrambling the confused-diffused
image using Arnold’s transformation.
Hybrid encryption, proposed by Brindha et al. [8], is a technique used to secure digital
communication by combining symmetric and asymmetric key algorithms. In this
approach, a symmetric key is used to encrypt the data, while an asymmetric key is used
to protect the symmetric key. The use of hybrid encryption offers several advantages
over the use of either symmetric or asymmetric key algorithms alone, including
increased security and efficiency. For example, if an attacker were to compromise the
symmetric key, they would still need to obtain the asymmetric key in order to decrypt
the data. This added layer of protection can be further enhanced by using multiple
keys and algorithms, as proposed by Robshaw and Seurin [9].
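The sketch below shows the hybrid pattern with the Python cryptography package: AES-GCM encrypts the bulk data while an RSA-OAEP key wraps the session key. Key sizes and the sample payload are illustrative.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

receiver_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
session_key = AESGCM.generate_key(bit_length=256)

# Bulk data is encrypted symmetrically ...
nonce = os.urandom(12)
ciphertext = AESGCM(session_key).encrypt(nonce, b"image bytes ...", None)

# ... while only the small session key is encrypted asymmetrically.
wrapped_key = receiver_key.public_key().encrypt(session_key, oaep)

# Receiver: unwrap the session key, then decrypt the data.
recovered_key = receiver_key.decrypt(wrapped_key, oaep)
assert AESGCM(recovered_key).decrypt(nonce, ciphertext, None) == b"image bytes ..."
```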
Elliptic curve cryptography (ECC) is a popular method for public key encryption
(PKE), with security based on the elliptic curve discrete logarithm problem (ECDLP).
ECC has been demonstrated to provide high security with smaller key sizes, espe-
cially in constrained environments, as shown in the research of Koblitz [10]. Maria
and Muneeswaran [11] proposed a method for encrypting both text and images using
ECC. To generate a private key and a random integer k, they used a connected linear
congruential generator, and they applied elliptic curve point multiplication to each
ASCII value of the text or pixel value of the image to map it to an elliptic curve point.
Breaking the scheme requires solving the elliptic curve discrete logarithm
problem (ECDLP), but the small order of the elliptic curve used in this scheme may
not provide sufficient security in practical implementations.
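For illustration only, the following toy sketch shows double-and-add scalar multiplication, the point-multiplication operation such schemes apply to each ASCII or pixel value; the tiny curve y^2 = x^3 + 2x + 3 over F_97 is an assumption chosen for readability and offers no security (requires Python 3.8+ for pow(x, -1, p)).

```python
p, a = 97, 2      # toy curve y^2 = x^3 + 2x + 3 over F_97 (insecure)
O = None          # point at infinity

def ec_add(P, Q):
    # Standard affine point addition, covering doubling and inverse points.
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                                   # P + (-P) = O
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    # Double-and-add: maps a scalar (e.g. a pixel value) to the point k*P.
    R = O
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

G = (3, 6)            # 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97), so G lies on the curve
print(ec_mul(65, G))  # e.g. encode ASCII 'A' (65) as the point 65*G
```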
Micciancio and Peikert [12] have written about the security of lattice-based cryptography,
which is believed to be more secure than RSA. In today's world, safeguarding
data during civilian satellite missions has become crucial, and the use of
encryption techniques is a necessity. However, traditional encryption methods like
DES, RSA, and AES are not suitable for multimedia data transmission. To counter
this, [12] presents a technique that combines chaotic maps and AES, specifically
designed to secure satellite imagery against illegal use and unauthorized access.
Saranya et al. [13] proposed using a large public exponent in the RSA algorithm
to improve security. RSA has become the algorithm of choice for functions such
as authenticating phone calls, encrypting credit-card transactions, securing e-mail,
and providing other Internet security functions.
3 Comparative Analysis
The different encryption techniques previously discussed show how encryption algo-
rithms could protect the data. However, every algorithm has its pros and cons. The
most important criteria for an encryption algorithm are its security and effectiveness.
Considering security, the relevant factors include the peak signal-to-noise
ratio (PSNR) and the structural similarity index (SSIM), which are applied to
assess the fidelity of the image. Key space: a brute-force attack can be avoided
if the minimum key-space requirement is fulfilled.
Histogram Analysis: for an ideal algorithm, the output cipher produces a uniform
pixel-value distribution. Histogram analysis for different encryption schemes using the pepper
image is shown in Fig. 2. Some of the existing state-of-the-art methods are simulated
in Mathematica on a system with an Intel(R) Core(TM)
i5-1035G1 CPU @ 1.00 GHz. Color images of size 512 × 512 from the SIPI image
database [25] are taken as the input for the comparative analysis, which is shown in
Table 1. For a fair comparison, the key size for all the methods in Table 1 is taken as 512
bits. Figures 3, 4, 5, and 6 show the NPCR, UACI, entropy, and PSNR comparisons
for the images—house, pepper, and baboon using different encryption schemes.
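A minimal NumPy sketch of the metrics compared in Figs. 3–6 follows; it assumes 8-bit images, with c1/c2 denoting two cipher images whose plain images differ in a single pixel (the usual NPCR/UACI setup).

```python
import numpy as np

def psnr(img, ref):
    # Peak signal-to-noise ratio for 8-bit images (undefined if images match).
    mse = np.mean((img.astype(np.float64) - ref.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def entropy(img):
    # Shannon entropy of the pixel histogram; an ideal cipher approaches 8 bits.
    hist = np.bincount(img.ravel(), minlength=256) / img.size
    hist = hist[hist > 0]
    return float(-np.sum(hist * np.log2(hist)))

def npcr_uaci(c1, c2):
    # NPCR: % of pixels that differ; UACI: mean absolute intensity change (%).
    npcr = 100.0 * np.mean(c1 != c2)
    uaci = 100.0 * np.mean(np.abs(c1.astype(np.float64)
                                  - c2.astype(np.float64))) / 255.0
    return npcr, uaci
```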
References [14–24] cover encryption schemes based on chaotic systems, AES, RSA,
ECC, and DES. In [14], the authors proposed a multiple image encryption
scheme in which k images are combined to form one big image, and encryption is
performed using a chaotic system. The schemes in [14, 15] are recommended for
image encryption with low pixel correlation, since they use the chaotic logistic map
and particle swarm optimization. Due to some shortcomings, traditional encryption standards such
Fig. 2 a Pepper image, b, d, f, h, j, l encrypted image of (a) using AES, DES, chaotic system,
hybrid system, ECC, and RSA scheme, respectively, c, e, g, i, k, m corresponding histogram of b,
d, f, h, j, l, respectively
as DES, AES, international data encryption algorithm (IDEA), and blowfish are not
the most suitable options for image encryption. In [16], complex cryptographic
algorithms cannot be used directly because of their demand for resources; since
IoT devices are small and have limited processing power, less complex (lightweight)
algorithms are required to secure data communication. In [19, 20], the proposed method
incorporates security and quality-focused data transfer through the application of
cryptographic and steganographic elements. They used hybrid encryption (RSA and
AES), and the entire security level was increased through the strategic application
of data encryption and data decryption. In parallel, an adaptive genetic algorithm
based on OAPA was developed after discovering the quality-preserving component
of steganography, which is most frequently utilized in safeguarding medical
data. This algorithm optimizes least-significant-bit embedding over image blocks.
The proposed method is appropriate for a wide range of communication needs,
including cloud communication, the transmission of healthcare data, and communi-
cation between IoT devices. In [21], the author proposed a medical image security
model; innovative and secure algorithms are proposed for the security of medical
imaging. In [22], the authors improved Rayot's method, in which converting
large values using a base converter can produce a value larger than the prime modulus P, which then
cannot be decrypted. The authors avoided this flaw by using the inverse modulo
operator to keep the generated integer always less than the prime modulus. In [23],
the primary concept is an effective chaotic encryption of images based on the Arnold
transform (AT) and singular value decomposition (SVD): a plain image is subjected to
AT confusion and SVD diffusion, and the results show the efficiency of the suggested
AT-SVD image cipher. In [24], the author proposed an asymmetric image encryption
algorithm based on ECC and a chaotic system, which makes key transmission
and management simple and clear. Analyzing the algorithms in the above table,
ECC is considered the best encryption method because its security rests on the
mathematically hard ECDLP. In [26], on the basis of chaotic systems and permutation-
substitution (SP) networks, the author proposed a novel image encryption method.
Diffusion, substitution, diffusion, and permutation are the four cryptographic phases
that make up this process. By analyzing the results, we can say that the suggested
encryption solution has more security, sensitivity, and speed than earlier methods.
In [27], the paper studies an algorithm for full recovery of the plain image from
a ciphered image using permutation and diffusion. A new spatiotemporal chaotic
system is introduced in which a permutation-diffusion mechanism is used to design
an image encryption scheme. A hash function SHA-2 is employed to compute the
hash values used as the initial conditions of a chaotic system. A new pre-modular,
permutation, and diffusion (PPD) cipher is proposed to solve the problem of two bits
being changed at the same time, while the pixel summation is kept unchanged. In
[28], encryption of digital information is necessary to protect it from security threats.
Some of the most widely used encryption algorithms are designed specifically for text
encryption, such as the international data encryption algorithm (IDEA), triple-DES
(3DES), advanced encryption standard (AES), and data encryption standard (DES).
In [29], one of the benefits of chaotic systems is that they are relatively secure when it
comes to communication. This paper proposes two modifications that will enhance
the security of an image cryptosystem, which suffers from some drawbacks. The
first modification introduces a P-box for permutation, and the second introduces an
S-box for substitution. Both modifications are tested in simulations and found to be
successful.
4 Conclusion
This paper gives a brief overview of different image encryption schemes. Symmetric
key encryption schemes such as AES provide fast encryption and strong security,
but exchanging the private key remains an issue. At the same time, asymmetric
encryption schemes like RSA need larger key sizes and take more computation time. Further,
ECC outperforms other public key encryption schemes regarding cipher size, key
size, and computation cost. Finally, researchers are working on hybrid or modified
algorithms that can provide better security, smaller key size, and smaller cipher data.
References
1. Daemen J, Rijmen V (2002) Advanced encryption standard. US National Institute of Standards
and Technology
2. Alsaffar DM, Almutiri AS, Alqahtani B, Alamri RM, Alqahtani HF, Alqahtani NN, Ali AA
et al (2020) Image encryption based on AES and RSA algorithms. In: 2020 3rd international
conference on computer applications & information security (ICCAIS). IEEE, pp 1–5
3. Faragallah OS (2018) Secure audio cryptosystem using hashed image LSB watermarking and
encryption. Wireless Pers Commun 98(2):2009–2023
4. Zhang Y-P, Liu W, Cao S-P, Zhai Z-J, Nie X, Dai W-D (2009) Digital image encryption
algorithm based on chaos and improved DES. In: 2009 IEEE international conference on
systems, man and cybernetics. IEEE, pp 474–479
5. Dang PP, Chau PM (2000) Image encryption for secure internet multimedia applications. IEEE
Trans Consum Electron 46(3):395–403
6. Guesmi R, Farah MAB, Kachouri A, Samet M (2016) A novel chaos-based image encryption using
DNA sequence operation and secure Hash Algorithm SHA-2. Nonlinear Dyn 83(3):1123–1136
7. Brindha M, Ammasai G (2016) Image encryption scheme based on block-based confusion and
multiple levels of diffusion. IET Comput Vision 10(6):593–602
8. Brindha M, Ammasai G (2016) A chaos based image encryption and lossless compression
algorithm using hash table and Chinese Remainder Theorem. Appl Soft Comput 40:379–390
9. Robshaw MJB, Seurin Y (2007) The design and analysis of hybrid encryption schemes. J
Cryptol 20(4):361–396
10. Koblitz N (1987) Elliptic curve cryptosystems. Math Comput 48(177):203–209
11. Maria S, Muneeswaran K (2012) Nonce based elliptic curve cryptosystem for text and image
applications. Int J Netw Secur 14(4):236–242
12. Micciancio D, Peikert C (2020) Satellite image encryption based on aes and discretised chaotic
maps. Autom Control Comput Sci 54(5):446–455
13. Saranya et al (2014) Int J Comput Sci Inf Technol (IJCSIT) 5(4):5708–5709
14. Encoding and chaotic system (2019) Multimedia Tools Appl 78(6):7841–7869
15. Ahmad M, Alam MZ, Umayya Z, Khan S, Ahmad F (2018) An image encryption approach
using particle swarm optimization and chaotic map. Int J Inf Technol 10(3):247–255
16. Nayak MK, Swain PK (2020) MSIT: a modified lightweight algorithm for secure Internet of
Things. In: 2020 IEEE international symposium on sustainable energy, signal processing and
cyber security (iSSSC). IEEE, pp 1–6
17. Ahmad M, Alam MZ, Umayya Z, Khan S, Ahmad F (2018) An image encryption approach
using particle swarm optimization and chaotic map. Int J Inf Technol 10:1–9
18. Mir UH, Singh D, Lone PN (2022) Color image encryption using RSA cryptosystem with a
chaotic map in Hartley domain. Inf Secur J Glob Perspect 31(1):49–63
19. Denis R, Madhubala P (2021) Hybrid data encryption model integrating multi-objective adap-
tive genetic algorithm for secure medical data communication over cloud-based healthcare
systems. Multim Tools Appl 80(14):21165–21202
20. Denis R, Madhubala P (2020) Evolutionary computing assisted visually-imperceptible hybrid
cryptography and steganography model for secure data communication over cloud environment.
Int J Comput Netw Appl 7:208–230
21. Shankar K, Elhoseny M, Dhiravida Chelvi, Lakshmanaprabu SK, Wu W (2018) An efficient
optimal key based chaos function for medical image security. IEEE Access 6:77145–77154
22. Singh KM, Dolendro Singh L, Tuithung T (2022) Improvement of image transmission using
chaotic system and elliptic curve cryptography. Multim Tools Appl 1–22
23. Malladar R, Kunte S (2016) Selective video encryption using Sattolo’s encryption technique.
In: 2016 International conference on electrical, electronics, communication, computer and
optimization techniques (ICEECCOT). IEEE, pp 268–273
24. Afifi A (2019) Efficient Arnold and singular value decomposition based chaotic image
encryption. Int J Adv Comput Sci Appl 10(3)
Abstract The COVID-19 pandemic has made face recognition and identification a
complex task, as people often cover a significant portion of their face with masks as
a precautionary measure. This creates difficulties for biometric devices and secure
authentication systems, as masks obstruct facial key points that are necessary for
face detection. The presence of masks also presents challenges for face identifica-
tion. There is a shortage of paired and aligned face images that show faces both with
and without masks. This study proposes a framework for reconstructing the occluded
part of the face that is covered by a mask. The GAN-based unpaired image trans-
lation method is used to translate masked face images into unmasked face images
as the reconstructed faces. A synthetic paired face dataset is created to evaluate the
performance of the model in reconstructing the unmasked face from a masked face
and is used to train the proposed GAN-based face reconstruction model. The model
is based on transfer learning and the pix2pix cGAN architecture, and the results of
the comparative analysis show that our model outperforms other state-of-the-art face
reconstruction models both qualitatively and quantitatively.
1 Introduction
Face recognition is being used more and more for security and human interaction
with machines [1, 2]. The COVID-19 outbreak affected the whole population, and
since the prevention protocol includes wearing a mask, identifying masked faces has
become a challenge for face recognition systems. Most facial recognition methods used in
human-computer interaction applications fail to recognize masked faces: traditional
face recognition systems identify faces based on facial landmark detection, and the
facial features needed to identify a masked face are missing. The issue of occluded
face photos, including masks, has not been fully resolved despite the exponential
development in research studies on face recognition. There is also a lack of a dataset
for masked faces with complex mask sizes and face variation. As a result, recog-
nizing and verifying the identity of individuals wearing masks has become a widely
researched topic, and the need for more advanced facial recognition methods has
arisen.
In the proposed work, we demonstrate the reconstruction of an unmasked face
from a masked face using Generative Adversarial Networks (GANs). The experi-
ments are carried out using the image inpainting or image completion mechanism
for reconstructing the face using Generative Adversarial Network (GAN) [3–7]. To
remove the mask and synthesize the affected regions in detail while maintaining the
overall consistency of the facial structure, we use image interpolation with GAN [8]
on the masked face. The model we have selected for our work is the pix2pix cGAN
[8]-based model. We have experimented with three models in our paper: generative
image inpainting with contextual attention (GC) [6], pix2pix cGAN [8], and a
customized GAN.
The key contributions of the work are:
• Development of a new approach that uses GANs to automatically remove masks
from faces and precisely reconstruct the concealed areas.
• Creation of a synthetic masked face dataset called “MaskedFace-CelebA-HQ,”
which consists of 29,571 images and is based on the benchmark face dataset
“CelebA-HQ.”
• Presentation of a qualitative and quantitative comparison study of three GAN
models (GAN with contextual attention, Pix2Pix GAN, and the proposed model)
for face reconstruction, with an in-depth analysis of the reconstructed faces in
terms of face recognition accuracy.
The remaining sections of the paper are structured as follows. A review of related
studies is presented in Sect. 2, followed by a detailed description of the proposed
model in Sect. 3. The experimental setup and results are discussed in Sects. 4 and
5, respectively. The paper concludes with a discussion of the findings and potential
future work in Sect. 6.
2 Related Work
Recently, deep learning GAN-based methods have become popular for a range of
applications, including data augmentation [9], pose estimation [10], object removal,
and image inpainting [11]. The success of this methodology can be credited to its
use of unsupervised learning, its production of highly detailed and realistic images,
and the robustness of adversarial training.
Object removal techniques that don’t rely on learning [5, 12] have attempted
to tackle occlusions such as sunglasses and other objects by synthesizing the absent
content through finding similar patches from other regions of the image. An approach
was proposed [13] to eliminate occlusion objects from facial photos by modifying
the path priority function with a regularized factor. However, these techniques have
limitations and only work for small holes with limited color and texture variations.
Iizuka et al. [14] presented a learning model based on GANs that can remove
objects and repair the impacted regions. The model has two discriminators (local and
global) to ensure that the reconstructed image is both locally and globally realistic. A
post-processing technique called Poisson blending [15] was applied to avoid visible
seams. This method can handle random damage, but struggles with producing high-
quality photos and creates artifacts when the damage is near the edge of the image.
Zeng et al. [16] developed a controlled image inpainting system by integrating a
deep generative model with closest neighbor-based global matching.
In their work, Boutros et al. [17] created a novel embedding unmasking model
which removes masks by creating a new feature embedding that resembles an
unmasked face using the feature embedding of a masked face as input. Another
study by Din et al. [4] applied image-to-image translation with GAN-based image
inpainting to automatically remove masks from photos. Farahanipad et al. [7]
presented a GAN-based approach to reconstruct masked faces using cycle GAN,
which generated the occluded parts of the face in a realistic manner with promising
results.
In this study, a novel framework for automatically removing masks and recon-
structing masked faces is proposed using image-to-image translation. This approach
addresses the limitations of traditional methods in restoring missing parts of facial
images, taking into consideration different facial angles and expressions.
3 Proposed Method
Equation 1 in the paper represents two parts, one for the discriminator and the
other for the generator. The discriminator outputs 1 when the input (x, y) is real and
outputs 0 when the input is a fake sample generated by the generator (x, G(z)). The
objective of the generator is to learn to produce samples that resemble real samples,
and it is trained against the discriminator: the generator tries to drive D(x, G(z))
from 0 toward 1, i.e., to make the discriminator judge its samples as real, which in
turn contributes to learning the original distribution. Along with fooling
the discriminator, the generator also creates images that are close to the ground truth
by combining the adversarial loss with an L1 loss. Equation 2 shows the
generator's additional L1 loss that is added to the loss function:
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z)\rVert_1\big] \quad (2)
It’s crucial to note that the L1 loss only captures low-frequency image features
and lacks the ability to preserve high-frequency details. To address this issue, the use
of PatchGAN in combination with the Adam optimizer is implemented to enhance
the hazy output [18]. The parameter lambda determines the relative significance of
the two objectives.
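For reference, the two-part conditional objective that Eq. 1 refers to, and the combined objective weighted by lambda, take the standard pix2pix form [8]:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big],
\qquad
G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
```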
Fig. 4 MaskTheFace [21]: a tool designed to generate a paired face dataset made up of masked and
unmasked images by warping the mask template according to the crucial face landmarks to attain
believable masked faces
4 Experimentation
The proposed work for unmasking the masked face uses a modified Pix2PixGAN
model [8]. The architecture for the generative networks is adapted from [8], which
consists of one generator and one discriminator. The generator in the proposed model
is a modified U-Net and includes an encoder with eight downsampling layers and a
decoder with eight upsampling layers.
The encoder is comprised of a series of blocks, where each block consists of a
convolution operation followed by batch normalization (except for the first block)
and leaky ReLU activation.
The architecture of each block in the decoder consists of a Transposed Convolution
layer followed by Batch Normalization and ReLU activation, with Dropout applied
to the first three blocks.
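A minimal Keras sketch of these encoder and decoder blocks follows (in the style of the public pix2pix reference implementation); the filter counts and kernel size are illustrative assumptions, not the exact values used by the authors.

```python
import tensorflow as tf

def downsample(filters, size=4, apply_batchnorm=True):
    # Encoder block: Conv (stride 2) -> BatchNorm (skipped in the first
    # block) -> LeakyReLU, halving the spatial resolution.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, size, strides=2,
                                     padding='same', use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, size=4, apply_dropout=False):
    # Decoder block: TransposedConv (stride 2) -> BatchNorm -> Dropout
    # (applied to the first three decoder blocks) -> ReLU.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                              padding='same', use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block
```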
The last layer of the network produces an output of shape (batch_size, 30, 30, 1),
where each 30 × 30 image patch is used to classify a 70 × 70 portion of the input
image. The discriminator is given two inputs, both having dimensions [256, 256,
3], which are joined together and used to make a judgment on whether the image
is authentic or generated. The discriminator classifies the input and target images as
real while classifying the input and generated images as fake. To calculate losses, the
model is trained over 150,000 epochs with 5862 unpaired, 256 × 256 facial photos
with and without masks at a learning rate of 0.003.
5 Results
We now go over the qualitative and quantitative results of our method on real-world
images with masks, as well as how it compares to other prior state-of-the-art image
editing techniques.
We conducted a qualitative comparison between our proposed method and two other
approaches, Yu et al. [6] and pix2pix cGAN [8], using real-world test images, as
shown in Fig. 5. Our comparison showed that while Yu et al. [6] was able to reduce
artifacts at the margins, it was not capable of fully recovering complex face structures.
The pix2pix cGAN is a conditional GAN that uses real data, noise, and labels
to generate images. It trains using a paired image-to-image translation method on
a provided dataset. The cGAN consists of one generator and one
discriminator. We modified the pix2pix cGAN by changing the downsampling layers
of the discriminator and adjusting the hyperparameters.
Additionally, despite the presence of face masks that cover significant facial
features in each test image, our proposed model is capable of effectively removing
the mask and generating output images with a natural appearance and structural
integrity, surpassing the results of other leading image manipulation techniques.
The performance of the proposed method, along with the models by Yu et al. [6] and
Pix2Pix cGAN [8], was assessed using a synthetic masked face dataset of 29,571
images. This dataset was derived from the publicly accessible CelebA-HQ [20]
celebrity face image collection. We evaluated the generated output images using
Fig. 5 Visual comparison between the proposed method and representative image interpolation
methods. From left to right: input image, a ground truth, b masked face, c GAN with contextual
attention, d pix2pix cGAN, e our proposed method
Structural Similarity (SSIM) [22], PSNR [23], and MAE. SSIM is a full reference
metric that measures the perceptual difference between two similar images. PSNR
[23] is a metric that quantifies the ratio between the maximum possible power of
a signal and the power of any disturbance or noise. As there is no corresponding
ground truth for real images containing masks, we used the synthetic test dataset
created from CelebA-HQ for evaluating image quality metrics. The results of the
comparison between our proposed method, Yu et al. [6], and pix2pix cGAN [8] are
shown in Table 1, which demonstrates the better performance of our model.
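As a hedged sketch, the three image-quality metrics can be computed with scikit-image (>= 0.19 for the channel_axis argument) and NumPy as follows; pred and truth are assumed to be aligned uint8 RGB images of identical shape.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def image_quality(pred, truth):
    # SSIM over color images, PSNR (data range inferred from uint8), and MAE.
    ssim = structural_similarity(truth, pred, channel_axis=-1)
    psnr = peak_signal_noise_ratio(truth, pred)
    mae = float(np.mean(np.abs(truth.astype(np.float64)
                               - pred.astype(np.float64))))
    return ssim, psnr, mae
```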
Table 2 Face recognition results in terms of accuracy, precision, recall, and F1-score for
reconstructed faces using the two mentioned models
Model Accuracy Precision Recall F1-score
Pix2pix cGAN 0.787 0.727 0.787 0.738
Proposed model 0.824 0.767 0.824 0.784
We used VGGFace as the face descriptor and MTCNN as the face detector for the
face recognition task on the reconstructed faces. The quantitative assessment is conducted using four
metrics: accuracy, precision, recall, and F1-score. The accuracy metric calculates
the number of instances that were correctly classified out of the total instances. Preci-
sion, on the other hand, represents the model’s accuracy in terms of the proportion
of true positive predictions out of all positive predictions made by the model. Recall
assesses a classifier’s capacity to identify all positive examples. The F1-score, also
known as the F-score, evaluates a model’s accuracy by combining the precision and
recall into a single metric and is employed to categorize samples as “positive” or
“negative.”
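A minimal scikit-learn sketch of these four metrics follows; the identity labels are hypothetical placeholders, and weighted averaging is one plausible choice for a multi-class identity task.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical predicted vs. true identities for five reconstructed faces.
y_true = ["id1", "id2", "id3", "id1", "id2"]
y_pred = ["id1", "id2", "id1", "id1", "id2"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print(acc, prec, rec, f1)
```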
Table 2 demonstrates the better performance of our proposed model over the
existing framework of pix2pix cGAN.
6 Conclusion
References
1. Manogaran G, Thota C, Lopez D (2022) Human-computer interaction with big data analytics.
In: Research anthology on big data analytics, architectures, and applications. IGI Global, pp
1578–1596
2. Sardar A, Umer S, Rout RK, Wang SH, Tanveer M (2022) A secure face recognition for
IoT-enabled healthcare system. ACM Trans Sensor Netw (TOSN)
336 C. Agarwal et al.
3. Khan MKJ, Ud Din N, Bae S, Yi J (2019) Interactive removal of microphone object in facial
images. Electronics 8(10):1115
4. Din NU, Javed K, Bae S, Yi J (2020) Effective removal of user-selected foreground object from
facial images using a novel GAN-based network. IEEE Access 8:109648–109661
5. Criminisi A, Pérez P, Toyama K (2004) Region filling and object removal by exemplar-based
image inpainting. IEEE Trans Image Process 13(9):1200–1212
6. Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with
contextual attention. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 5505–5514
7. Farahanipad F, Rezaei M, Nasr M, Kamangar F, Athitsos V (2022) GAN-based face recon-
struction for masked-face. In: Proceedings of the 15th ınternational conference on PErvasive
technologies related to assistive environments, pp 583–587
8. Isola P, Zhu JY, Zhou T, Efros AA (2017) Image-to-image translation with conditional adver-
sarial networks. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 1125–1134
9. Luo M, Cao J, Ma X, Zhang X, He R (2021) FA-GAN: face augmentation GAN for deformation-
invariant face recognition. IEEE Trans Inf Forensics Secur 16:2341–2355
10. Chou CJ, Chien JT, Chen HT (2018) Self adversarial training for human pose estimation. In:
2018 Asia-Pacific signal and ınformation processing association annual summit and conference
(APSIPA ASC). IEEE, pp 17–30
11. Liu G, Reda FA, Shih KJ, Wang TC, Tao A, Catanzaro B (2018) Image inpainting for irregular
holes using partial convolutions. In: Proceedings of the European conference on computer
vision (ECCV), pp 85–100
12. Darabi S, Shechtman E, Barnes C, Goldman DB, Sen P (2012) Image melding: combining
inconsistent images using patch-based synthesis. ACM Trans Graphics (TOG) 31(4):1–10
13. Park JS, Oh YH, Ahn SC, Lee SW (2005) Glasses removal from facial image using recursive
error compensation. IEEE Trans Pattern Anal Mach Intell 27(5):805–811
14. Iizuka S, Simo-Serra E, Ishikawa H (2017) Globally and locally consistent image completion.
ACM Trans Graphics (ToG) 36(4):1–14
15. Zhang L, Wen T, Shi J (2020) Deep image blending. In: Proceedings of the IEEE/CVF winter
conference on applications of computer vision, pp 231–240
16. Zeng Y, Gong Y, Zeng X (2020) Controllable digital restoration of ancient paintings using
convolutional neural network and nearest neighbor. Pattern Recogn Lett 133:158–164
17. Boutros F, Damer N, Kirchbuchner F, Kuijper A (2022) Self-restrained triplet loss for accurate
masked face recognition. Pattern Recogn 124:108473
18. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
19. Wang Z, Wang G, Huang B, Xiong Z, Hong Q, Wu H et al (2020) Masked face recognition
dataset and application. arXiv preprint arXiv:2003.09093
20. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality,
stability, and variation. arXiv preprint arXiv:1710.10196
21. Anwar A, Raychowdhury A (2020) Masked face recognition for secure authentication. arXiv
preprint arXiv:2008.11104
22. All about Structural Similarity Index (SSIM): theory + code in PyTorch. Available online:
https://medium.com/srm-mic/allabout-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e
23. Peak Signal-to-Noise Ratio as an Image Quality Metric. Available online:
https://www.ni.com/en-in/innovations/white-papers/11/peak-signal-to-noise-ratio-as-an-image-quality-metric.html
Brain Cancer Detection Using Deep
Learning (Special Session “Digital
Transformation Era: Role of Artificial
Intelligence, IOT and Blockchain”)
S. Pandey
Student, Chandigarh University, Mohali, Punjab, India
S. Bansal (B)
Assistant Professor, Department of Mathematics, Chandigarh University, Mohali, Punjab, India
e-mail: shivani.bansal40@gmail.com
1 Introduction
[13]. Deep learning compensates with deeper, multi-faceted networks [14]. Such
networks employ the stochastic gradient descent method to lessen the difference between
the target and the actual output. However, with the additional layers, creating neural
network-based artificial intelligence models becomes increasingly challenging.
2 Related Work
The advanced deep learning-based brain cancer categorization methods are intro-
duced throughout this part. There are multiple techniques for detecting neurological
disorders depending on deep learning and transfer learning methods.
Pareek et al. [18] presented a technique that determines whether or not a cancer
exists and afterward classifies the tumor type. A total of 150
magnetic resonance imaging scans of the head were used to assess the suggested
technique's ability to detect central nervous system cancers. This categorization
procedure used a supervised learning method, and feature extraction used singular
value decomposition. To evaluate the stage of the tumor, the researchers also measured
its size. The approach put forth in [19] beats older approaches and yields successful
results. The recommended strategy uses dense speeded-up robust features and
histogram-of-gradients techniques to extract features and build a feature set, and
employs a support vector machine during the classification phase. A sizable sample
was utilized to find the optimum parameters. When contrasted with cutting-edge
methods, this strategy's accuracy is 90.27%; according to the analytical outcomes,
it fared better than the newest methods. Exploiting the potential of quantum
computing, a qutrit-inspired fully self-supervised shallow quantum learning network
with three phases of processing has been proposed for tumor localization [20]. This
novel unsupervised qutrit-based counter-propagation technique replaces the intricate
labeled ground truth required in supervised systems. Using this technique, quantum
superposition states can propagate across all levels of the network.
3 Methodology Used
The proposed brain cancer classification algorithm employs an efficient deep neural
network. Figure 1 depicts the architectural framework of the suggested paradigm.
Classification is employed to make the malignancy diagnosis. The model comprises
an Inception-ResNetV2 learning algorithm accompanied by an optimal refinement
of its predictions. The outcome is a numeric 0 or 1 (0: healthy, 1: cancer), and the
model makes use of popular pretrained frameworks (Inception-ResNetV2) to
speed up the identification of neurological disorders (Table 1).
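A hedged Keras sketch of such a pretrained Inception-ResNetV2 classifier with a binary (0: healthy, 1: cancer) head is shown below; the input size, frozen backbone, and head layers are illustrative assumptions rather than the authors' exact configuration.

```python
import tensorflow as tf

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False                    # reuse pretrained features as-is

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 0: healthy, 1: cancer
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```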
The chosen hyperparameter settings are covered in this part. The factors that make up a classifier's
configuration are generally independent of the training data and are therefore
not determined from it. There are two basic types of hyperparameters: (1) those that
determine the model's architecture, and (2) those that control how the model is trained.
2. Segmenting Data
Data Enhancement
Rotation, luminance manipulation, horizontal flips, vertical flips, and other modifications
were added to the preprocessing of the images used, which helps the system
be much more effective and avoid overfitting to the perceived task, as sketched
below.
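A minimal Keras sketch of such an augmentation pipeline follows (RandomBrightness requires TensorFlow >= 2.9); the factors are illustrative, not the authors' settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),  # horizontal/vertical flips
    layers.RandomRotation(0.1),                    # small orientation changes
    layers.RandomBrightness(0.2),                  # luminance manipulation
])
# Applied on the fly during training: augmented = augment(images, training=True)
```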
3. Histogram Equalization
4 Data Description
The dataset [27] employed consists of a substantial quantity of multi-institutional,
routinely acquired multi-parametric MRI scans of glioma, with pathologically
confirmed diagnoses and available MGMT promoter methylation status, used
for training, validation, and testing. Several more routinely acquired MRI images
were added to the samples used in Task 1. The research aims to develop and validate
consistent descriptions of the tumor sub-regions for every individual throughout the
training, validation, and assessment collections in order to statistically evaluate
the predicted tumor clusters.
Data Preparation
As described in Fig. 3, the dataset used was divided into the following three
groups: 2400 MRI pictures for the training set, 500 for the validation set, and 50
for the test set.
The subsequent stage is to duplicate the pictures required for data analysis
and visualization, as illustrated. The data is made up of two classes of magnetic
resonance images. The segmentation procedure uses four classes: edema (green
color), no cancer, non-enhancing brain cancer (red color), and enhancing cancer
(yellow color). Eventually, these classes were divided into three separate sections.
Using examples of images and masks with malignant disease in the 2021 sample,
a three-dimensional U-Net segmentation framework has been developed for more
accurate and fast medical image processing. 10% of the sample is used for testing,
20% for validation, and 70% for training. Features are built up progressively across
the network levels. Figure 7 illustrates how the approach improves
segmentation predictive performance up to 96.98%.
6 Conclusion
References
1. Al-Galal SAY, Alshaikhli IFT, Abdulrazzaq MM (2021) MRI brain tumor medical images
analysis using deep learning techniques: a systematic review. Health Technol 11:267–282
2. Rahman ML, Reza AW, Shabuj SI (2022) An internet of things-based automatic brain tumor
detection system. Indones J Electr Eng Comput Sci 25:214–222
3. Key Statistics for Brain and Spinal Cord Tumors. Available online: https://www.cancer.org/
cancer/brain-spinal-cord-tumors-adults/about/key-statistics.html. Accessed on 20 Sep 2022
4. Ayadi W, Elhamzi W, Charfi I, Atri M (2021) Deep CNN for brain tumor classification. Neural
Process Lett 53:671–700
5. Liu J, Li M, Wang J, Wu F, Liu T, Pan Y (2014) A survey of MRI-based brain tumor segmentation
methods. Tsinghua Sci Technol 19:578–595
6. Amin J, Sharif M, Haldorai A, Yasmin M, Nayak RS (2021) Brain tumor detection
and classification using machine learning: a comprehensive survey. Complex Intell Syst
8:3161–3183
7. Yang Y, Yan LF, Zhang X, Han Y, Nan HY, Hu YC, Hu B, Yan SL, Zhang J, Cheng DL
et al (2018) Glioma grading on conventional MR images: a deep learning study with transfer
learning. Front Neurosci 12:804
8. Nazir M, Shakil S, Khurshid K (2021) Role of deep learning in brain tumor detection and
classification (2015 to 2020): a review. Comput Med Imaging Graph 91:101940
9. El-Kenawy ESM, Mirjalili S, Abdelhamid AA, Ibrahim A, Khodadadi N, Eid MM (2022)
Meta-heuristic optimization and keystroke dynamics for authentication of smartphone users.
Mathematics 10:2912
10. El-kenawy ESM, Albalawi F, Ward SA, Ghoneim SSM, Eid MM, Abdelhamid AA, Bailek
N, Ibrahim A (2022) Feature selection and classification of transformer faults based on novel
meta-heuristic algorithm. Mathematics 10:3144
11. El-Kenawy ESM, Mirjalili S, Alassery F, Zhang YD, Eid MM, El-Mashad SY, Aloyaydi
BA, Ibrahim A, Abdelhamid AA (2022) Novel meta-heuristic algorithm for feature selection,
unconstrained functions and engineering problems. IEEE Access 10:40536–40555
12. Abdelhamid AA, El-Kenawy ESM, Alotaibi B, Amer GM, Abdelkader MY, Ibrahim A, Eid
MM (2022) Robust speech emotion recognition using CNN+LSTM based on stochastic fractal
search optimization algorithm. IEEE Access 10:49265–49284
13. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the 2015 IEEE conference on
computer vision and pattern recognition (CVPR), Boston, MA, USA, 7–12 June 2015
14. Alhussan AA, Khafaga DS, El-Kenawy ESM, Ibrahim A, Eid MM, Abdelhamid AA (2022)
Pothole and plain road classification using adaptive mutation dipper throated optimization and
transfer learning for self driving cars. IEEE Access 10:84188–84211
15. Srikanth B, Suryanarayana SV (2021) Multi-Class classification of brain tumor images using
data augmentation with deep neural network. Mater Today Proc
16. Deepak S, Ameer P (2019) Brain tumor classification using deep CNN features via transfer
learning. Comput Biol Med 111:103345
17. Kokkalla S, Kakarla J, Venkateswarlu IB, Singh M (2021) Three-class brain tumor classification
using deep dense inception residual network. Soft Comput 25:8721–8729
18. Pareek M, Jha CK, Mukherjee S (2020) Brain tumor classification from MRI images and calcu-
lation of tumor area. In: Advances in intelligent systems and computing, Springer, Singapore,
pp 73–83
19. Ayadi W, Charfi I, Elhamzi W, Atri M (2020) Brain tumor classification based on hybrid
approach. Vis Comput 38:107–117
20. Konar D, Bhattacharyya S, Panigrahi BK, Behrman EC (2022) Qutrit-inspired fully self-
supervised shallow quantum learning network for brain tumor segmentation. IEEE Trans Neural
Netw Learn Syst 33:6331–6345
21. Khairandish M, Sharma M, Jain V, Chatterjee J, Jhanjhi N (2022) A hybrid CNN-SVM threshold
segmentation approach for tumor detection and classification of MRI brain images. IRBM
43:290–299
22. Öksüz C, Urhan O, Güllü MK (2022) Brain tumor classification using the fused features
extracted from expanded tumor region. Biomed Signal Process Control 72:103356
23. Kadry S, Nam Y, Rauf HT, Rajinikanth V, Lawal IA (2021) Automated detection of brain
abnormality using deep-learning-scheme: a study. In: Proceedings of the 2021 seventh interna-
tional conference on bio signals, images, and instrumentation (ICBSII), Chennai, India, 25–27
March 2021
24. Irmak E (2021) Multi-classification of brain tumor MRI images using deep convolutional neural
network with fully optimized framework. Iran J Sci Technol Trans Electr Eng 45:1015–1036
25. Saber A, Sakr M, Abo-Seida O, Keshk A, Chen H (2021) A novel deep-learning model for
automatic detection and classification of breast cancer using the transfer-learning technique.
IEEE Access 9:71194–71209
26. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image
segmentation. In: Lecture notes in computer science. Springer International Publishing, Berlin,
Germany, pp 234–241
27. Gupta S, Saini A (2018) An artificial intelligence-based approach for managing risk of IT
systems in adopting cloud. Int J Inf Technol 13:1–9. https://doi.org/10.1007/s41870-018-
0204-2
28. Saber A, Keshk A, Abo-Seida O, Sakr M (2022) Tumor detection and classification in breast
mammography based on fine-tuned convolutional neural networks. IJCI Int J Comput Inf
9:74–84
29. BRaTS 2021 Task 1 Dataset, RSNA-ASNR-MICCAI Brain Tumor Segmentation (BraTS)
Challenge 2021. Available online: https://www.kaggle.com/datasets/dschettler8845/brats-
2021-task1?select=BraTS2021_Training_Data.tar. Accessed on 20 Sep 2022
Traffic Accident Modeling and Prediction
Algorithm Using Convolutional
Recurrent Neural Networks
1 Introduction
The tremendous growth in vehicles has resulted in a batch of issues that governments
must address quickly and effectively. A few of them, like traffic congestion [1], have
been alleviated, since drivers can now view traffic information and choose a less
crowded path to avoid traffic jams using real-time traffic volume data and GPS-based
vehicle navigation systems. On the other hand, issues like traffic accidents
are difficult to control [2]. According to a WHO report, every year approximately 1.2
million humans die and 50 million are severely injured in accidents all
around the globe [3]. With so much suffering caused by traffic accidents, it is critical
to learn what causes them in order to create a better road environment.
A traffic accident can be caused by a variety of circumstances, including driver
behavior, weather, and road conditions. Even though some research has focused on
the relationship between traffic accidents and these parameters, revealing dynamic
changes in accident risk with these factors is extremely difficult. To be more precise,
driving behavior differs from person to person, making real-time and large-scale
observation difficult. Furthermore, weather conditions are rarely accurately recorded
at traffic accident scenes [5]. Moreover, road conditions are far too static to capture
dynamic shifts in risk. Figure 1 depicts the plot of estimated road traffic deaths in
developed countries for past years.
Alongside its advantages, transportation, like any other endeavor or phenomenon,
has disadvantages and limitations for road users [2, 4]. Traffic congestion
and related issues are on the rise all over the world, posing a danger to
people's lives and property. Poor traffic facilities and conditions lead to
urban pollution, rising fuel and energy consumption, long periods wasted each
day in gridlock, squandered community service facilities and public resources,
and, ultimately, accidents resulting in injury, death, and property damage [3, 6].
(Bar chart; y-axis: fatalities per 100,000 vehicles)
Fig. 1 Estimated road traffic deaths (in millions) in developed countries for past years [4]
The feed-forward neural network is the most prevalent form of ANN, in which
information flows linearly from the input layer to the output layer via the hidden
layer(s). Recurrent Neural Networks (RNNs) [7–9] and Convolutional Recurrent
Neural Networks (CRNNs) [10–12] are extensions of ANNs. The CRNN is superior
for image processing jobs, whereas the RNN is more applicable to text processing
applications. Because of the many hidden layers structured into their designs, they
are sometimes termed "deep learning approaches". There are numerous ambiguities
and gaps in data that traditional approaches cannot resolve; AI exploits these
underlying uncertainties to construct cause-and-effect relations in a variety of
real-life situations.
The significant contributions of this paper are a CRNN-based traffic accident
prediction algorithm that adopts convolution kernels to extract feature values,
compared against classic machine learning prediction algorithms. The CRNN
prediction algorithm has a lower loss as well as greater predictive accuracy.
This paper is organized as follows. Section 2 presents an overview of the several
researches that have been undertaken so far by researchers in related area. Section 3
illustrates the methodology of the proposed system. Section 4 presents our experimental
approach and evaluates and analyzes the results obtained from our experiments.
We conclude the paper in Sect. 5. References are included in the end.
2 Background Works
Clustering and categorization of traffic accident data is a way to reduce
unexpected outcomes. One technique categorizes traffic events based on the type
of traffic occurrence [13–15]. Furthermore, some studies have treated traffic accident
data based on criteria such as visibility (daylight, and even nighttime circumstances)
[16]. Several clustering techniques, like latent class clustering [17], k-means clus-
tering [18], and community recognition algorithms, were employed to cluster road
accident records for the first time before accident analysis [19].
A deep fusion model that can deal with categorical and continuous variables simultaneously
was proposed [20]. The model considers not only the features of
traffic accidents but also the spatial-temporal correlations in traffic flow. In this
model, the categorical variables are handled by a stacked restricted Boltzmann
machine (RBM), the continuous variables are handled by a stacked Gaussian-Bernoulli
RBM, and the extracted features are fused by a joint layer. The performance of the
proposed model was analyzed and contrasted with some benchmark models using
extracted I-80 data.
A method for doing electric power steering (EPS) reverse engineering for external
control was proposed [21]. The fundamental goal of the linked research was to solve
the problem of predicting the dynamic trajectory of an autonomous vehicle with
precision. This was achieved by developing a new equation for calculating lateral
tire forces and modifying some vehicle characteristics during road tests.
A special methodology [22] achieves precise intersection traffic forecasting by fusing
additional data sources, beyond road traffic volume data, into the prediction model.
Specifically, the authors exploit data gathered from reports of car crashes and roadwork
at intersections. They also investigate two different learning schemes: batch
learning and online learning. Gradient Boosting, Random Forest (RF), and Extreme
Gradient Boosting are three popular ensemble decision-tree models used in the batch
learning scheme, while the Fast Incremental Model Trees with Drift Detection
(FIMT-DD) model is used in the online learning scheme. The proposed technique
was tested using datasets made available by the Victorian Government of Australia.
The results reveal that incorporating adjacent incidents and roadwork data improves
the accuracy of intersection traffic forecasts.
3 Methodology
Here we have used the UK accident dataset for our study [6]. This research focuses
on a variable-based classification system for determining the degree of traffic acci-
dents. The collected data is preprocessed and noise is removed. Then, using the
suggested Convolutional Recurrent Neural Networks (CRNNs) model [23], the data
is trained. This trained data is further classified with the use of an edge computing plat-
form in which the intensity of risk is predicted. Detailed architecture and algorithms
are discussed in the following subsections. Figure 2 shows the basic methodology
diagram of our method.
Fig. 2 Basic methodology diagram for traffic accident modeling and prediction [11]
Under the traffic accident prediction framework, two steps are utilized to predict
traffic accidents: preprocessing of the dataset and training of the prediction model
using a classifier. This research employs the UK Car Accident dataset from the United
Kingdom and first filters out the features that have the strongest effect on traffic
circumstances. Before the CRNN training model can be created, the data must be
de-meaned as well as normalized. First, subtract the average of each data dimension
from every value in that dimension of the original data. Second, divide each dimension's
data by its standard deviation, scaling the results to the same scale. Weather
variations, road surface smoothness, vehicle speed, vehicle type, light levels, road
type, travel length, and other attributes can be extracted automatically when the
preprocessed data is supplied to the CRNN training model. To depict the current
traffic situation, a status matrix is created [24].
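A minimal NumPy sketch of this de-mean-and-normalize step follows; the guard for zero-variance columns is an added assumption for robustness.

```python
import numpy as np

def standardize(X):
    # Per-dimension de-meaning and scaling to unit standard deviation.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0        # guard: leave constant features unscaled
    return (X - mu) / sigma, mu, sigma

# The training-set statistics (mu, sigma) should also be applied to
# validation and test data so all splits share the same scale.
```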
The resulting output of the CRNN training model varies between 0 and 1: the lower
the chance of a traffic accident, the more the final result is biased toward 0, while
a value near 1 indicates that a road accident is very likely to arise. Through edge
computation, the Mobility Management Entity (MME) can package and deliver the
projected outcomes to the operating vehicle [25]. This can also remind the driver
to modify his or her speed promptly and to give greater attention to the surroundings.
Ultimately, we must fulfill the goal of reducing the volume of traffic accidents.
The deep learning method is a subset of machine learning technology that can manage
large amounts of data and extract attribute values on its own, making it well suited
for use on the Internet of Vehicles (IoV). Even the centralized cloud computing
technique is inadequate for deep learning on data received via automobile networking
interfaces due to network capacity limits. The hidden-layer calculations for most of
the layers in a deep learning pipeline can be delegated to the edge server, and finally
the reduced information is supplied to the cloud server, in order to manage resources
effectively and appropriately and to continuously improve network resource scheduling.
Wayside units, micro-clouds, and base stations capture and send real information on
vehicle driving and traffic to an edge server. The edge server decodes and evaluates
the information instantly when the volume of data acquired is significant enough.
The heavier deep learning activities are carried out by the core network's centralized
cloud computing center, with the outcomes communicated back to an edge server
and then forwarded to the vehicle unit. Certain deep learning activities are
offloaded from the cloud to the edge nearest the IoV devices, limiting data transmission
to the cloud and freeing network bandwidth resources [26].
Neural networks' basic calculations are inextricably linked to neurons. Neurons
comprise the basic building blocks of every Neural Network (NN) and are mainly
utilized to incorporate nonlinear features. Sigmoid, hyperbolic tangent (tanh), Rectified
Linear Units (ReLU), Softmax, and other activation functions are popular. A perceptron
layer is made up of two layers of functional neurons: external input signals are
received by the perceptron layer's input layer, and the M-P neuron is the output layer
(the functional layer of the perceptron) [10]. The perceptron model's formula
is defined as:
y = f1(ωa + b)   (1)
Before approaching the convolutional computing layer for actual training, the data
from the original traffic dataset [1] must be preprocessed, which comprises de-meaning
and normalization. The process of reducing every dimension of the input data
to zero mean is termed "de-meaning"; it seeks to align the sample centroid with the
origin of the coordinate system. The data is then normalized by dividing each dimension's
data by its standard deviation, scaling all the features to a very similar scale.
In the CRNN, the result from the convolutional layers (feature maps) is turned into
a sequence of feature vectors instead of being passed to fully-connected layers
at the end of a CNN. These vectors are then fed into a bidirectional RNN (Fig. 3).
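A hedged Keras sketch of this CNN-to-bidirectional-RNN pipeline is shown below; the input shape, filter counts, and LSTM width are illustrative assumptions, with each row of the final feature map treated as one time step.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 1))            # traffic status matrix
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)                          # -> (16, 16, 32)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)                          # -> (8, 8, 64)
x = layers.Reshape((8, 8 * 64))(x)                     # 8 steps of 512 features
x = layers.Bidirectional(layers.LSTM(64))(x)           # sequence -> summary
outputs = layers.Dense(1, activation="sigmoid")(x)     # accident risk in [0, 1]

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```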
The proposed algorithm has the potential to achieve ultra-fast, high-accuracy
recognition. Furthermore, real-time accuracy can be enhanced by improving the
response time.
4 Performance Analysis
This paper incorporates Keras, Python's high-level neural network API framework,
as well as the UK Car Accident 2005–2015 dataset, to implement the presented
accident prediction system. The experiments were run on Windows 10 with a 10th
Gen Intel Core i7 and an Nvidia RTX 3080 Max-Q GPU. We evaluated our proposed
model (CRNN) against other models such as LSTM, DenseNet, ResNet, VGG, and
RNN on performance measures including accuracy, specificity, sensitivity, recall,
F-score, and memory utilization.
The comparative analysis in terms of sensitivity, specificity, and accuracy of
various models such as VGG, LSTM, DenseNet, and RNN with our proposed model
CRNN under 5 datasets is given in Table 1. The graphical representation of models
versus average sensitivity, specificity, and accuracy is shown in Fig. 4. From the
graph it is evident that our proposed method CRNN is giving better performance
compared with the existing models VGG, LSTM, DenseNet, and RNN.
The comparative analysis (recall, F-score, and memory utilization) of various
models with our proposed model under 5 datasets is given in Table 2. The graphical
representation of models versus average recall and F-score is shown in Fig. 5. From
this evidence it is clear that CRNN outperforms the existing methods.
From the comparative analysis of Table 1, it is evident that the CRNN network
is giving better accuracy of 96%, specificity of 90%, and sensitivity of 92.3% when
compared with the existing methods such as VGG, LSTM, DenseNet, and RNN.
The encouraging results are due to the feature extraction by utilizing a Convolu-
tional Neural Network (CNN), and a Recurrent Neural Network (RNN), which was
developed for synthesizing multi-view characteristics out of each image for ultimate
prediction. The improved result in terms of recall, F-score, and memory utilization
over the set of five datasets that has been considered for our study is presented in
Table 2.
5 Conclusions
The world’s population is rising tremendously and the number of automobiles on the
road is expanding in tandem which increases the risk of traffic accidents and injuries.
As a result, having a traffic accident modeling as well as a prediction system is critical
for mitigating the situation. Here, we proposed a traffic modeling as well as prediction
framework centered on CRNN. We have used the UK traffic accident dataset which
is preprocessed and trained by CRNN. Edge computing platform was used for the
classification of trained data. When the system predicts the danger of an accident, it
sends an alert signal to the vehicle unit in movement, based on the intensity of the
risk. We have evaluated our proposed model (CRNN) against other models such as LSTM,
DenseNet, ResNet, VGG, and RNN on performance measures including accuracy,
specificity, sensitivity, recall, F-score, and memory utilization. Our proposed model
(CRNN) outperforms the other existing models with an average accuracy of 95.10% and
is a promising candidate for further research with a more diverse selection of datasets.
Cyberbullying Severe Classification
Using Deep Learning Approach
Abstract Given the accessibility of the Internet, social media has become a popular
means of communication. Social media facilitates communication, but it has also
created several issues. People may be bullied despite the advantages of using social
media. Due to the extensive variety of language used, identifying cyberbullying in
posts is complex. The detrimental effects of bullying on social media are getting
worse every day, which is frightening. In this study, we propose a cyberbullying
detection and severity classification model using roBERTa, a convolutional neural
network (CNN), and long short-term memory (LSTM). We used an annotated dataset
of 7547 negative tweets derived from top influential Nigerian Twitter handles.
The proposed model demonstrates nearly 97% accuracy in detecting and classifying
posts into four severity classes: very severe, severe, moderate, and mild.
1 Introduction
Considering how widespread the Internet is, people now communicate easily on
social media. Social media is helpful for communication, but it has also led to several
problems. Users of social media are more likely than non-users to experience abuse
or contempt [1]. Aggressive behavior, which can be verbal, physical, or social, is
referred to as bullying. Cyberbullying is, by extension, defined as bullying that takes
place over digital devices like cell phones, tablets, and computers [2]. Verbal and
emotional abuse, such as spreading untrue stories, disclosing accurate or incorrect
personal information via texts, and sharing things on social networking platforms,
are the most prevalent forms of cyberbullying.
Text-based comments or messages are a common method of communication on
social media platforms and have evolved into the main vehicle of cyberbullying. In
cyberbullying, motive, power differential, and elements of repetition are considered
essential factors [3]. We observe that individuals, particularly teenagers and young
adults, are coming up with novel strategies for cyberbullying. Cyberbullying victims
show higher levels of depression and anxiety, perform worse at work and in school,
have more suicidal thoughts and attempts, and are in worse physical and mental
health [4]. Because there is a larger online audience and messages propagate more
quickly online than in conventional bullying, the negative consequences of
cyberbullying are more severe.
Research conducted by Bayzick et al. [5] identified Exclusion, Denigration,
Outing, Flooding, Masquerade, Flaming, Cyberstalking, Harassment, and Trolling
as different types of cyberbullying. To manage cyberbullying, natural language
processing (NLP) and machine learning (ML) are combined to automatically identify
whether a text contains cyberbullying content. Deep learning algorithms can be
used in online social networks, social curation, wikis, tweeting, forums, and social
bookmarking to identify cyberbullying. These algorithms are designed to automatically
find cyberbullying texts among massive amounts of data.
Several techniques used in previous studies to classify the severity of cyberbullying,
such as the Hierarchical Squashing-Attention Network [1], fine-grained categories
[6], and language-based techniques [7], did not take users' dialect and behavior
into consideration when choosing training datasets. In our study, we consider
Nigerian English posts on social media and translated the posts into standard
English to enable our proposed model to act robustly. Also, users' behaviors were
taken into consideration when annotating the dataset to classify the severity level.
Therefore, on current social media platforms, cyberbullying is a research issue
that demands further attention. The paper's contributions are as follows:
(i) An annotation scheme to generate a cyberbullying detection and severity
classification dataset for Nigerian English.
(ii) A model for cyberbullying detection and severity classification in social media
posts.
The paper is organized as follows: Sect. 2 offers a literature review of related
research papers. Section 3 describes the methodology used to develop an approach
that matches the objectives of this study. Section 4 describes the proposed model
architecture and how it operates. Section 5 presents and discusses how well different
classifiers performed when given the task of classifying tweets according to their
severity ratings. Finally, Sect. 6 concludes the study.
2 Related Work
Even though the term “cyberbullying” did not exist two decades ago, the issue
has now become widespread [8]. Cyberbullying is harassment committed using
digital tools. It can take place on gaming platforms, messaging platforms, social
media, and mobile phones. It is repeated behavior meant to frighten, infuriate, or
degrade its targets. Twitter and other social networks aim to prohibit or remove
publications that promote cyberbullying victimization.
Very few researchers are working on the detection of cyberbullying severity.
Research describing cyberbullying severity was conducted by the authors in [9]. Their
research aimed at detecting and classifying cyberbullying severity using Naïve
Bayes, KNN, Decision Tree (J48), Random Forest, and SVM classifiers. They offer
a comprehensive approach to measuring the severity of cyberbullying in online social
networks. They also built a machine learning multi-classifier for classifying
cyberbullying severity into different levels, for both multi-class and binary
classification problems. They classify severe cyberbullying experiences into
long-term, physical threats, sexual, and trapping.
The authors in [8] categorize severity into ten (10) levels, ranging from 1 (mild)
to 10 (severe). To detect cyberbullying, they employed a language-based technique.
They also generated a feature, called SUM, to measure the overall “badness” of a
post, computed as a weighted average of the “bad” words (weighted by the severity
assigned).
Research conducted by the authors in [1] classifies the severity of cyberbullying
posts using a Hierarchical Squashing-Attention Network. The authors established a
Chinese-language cyberbullying severity dataset marked with three severity levels
(serious, medium, and slight) and developed a new squashing-attention mechanism
for their hierarchical squashing-attention network. The authors adopted a
cross-validation approach to evaluate the training algorithms, which resulted in
79.76% accuracy.
Van Hee et al. [6] presented a model using fine-grained categories to detect
cyberbullying in online posts. Their experiment was conducted on a Dutch dataset.
The annotations identified include the author's role, threat, insult, curse,
defamation, sexual talk, defense, and encouragement of the harasser. They used
lexical features to gain insight into the difficulty and learnability of the
detection and fine-grained classification of cyberbullying. They classify
cyberbullying events into Insult, Threat/Blackmail, Defamation, Curse/Exclusion,
Defense, Sexual talk, and Support to the harasser.
However, the reviewed automated cyberbullying detection techniques still require
development to determine the level of cyberbullying in social media posts with high
accuracy, and little work has gone into determining the severity of bullying.
In this study, we propose a model that combines the robustly optimized roBERTa with
long short-term memory, trained on a dataset constructed from recent posts, to
identify and classify the severity levels of cyberbullying.
3 Methodology
Several scientific studies have shown that social media may be a valuable source of
data for analysis as well as for understanding people's attitudes and behavior [3].
This section outlines how the dataset for the research originated. Figure 1 outlines
the procedure of data gathering.
Sentiment analysis, usually referred to as opinion mining, can detect, extract, and
quantify the emotional undertone of a body of text. With the development of deep
language models like roBERTa [9], it is now possible to evaluate more challenging
data domains, such as news texts where authors often convey their opinions or
sentiments less openly. Twitter is well known for being a platform where users may
tweet about their emotions. In this regard, our sentiment analysis stage uses the
roBERTa model to determine whether a post expresses positive (1), neutral (0), or
negative (−1) emotion. Any text found to be negative is appended to the collected
list. The result is then saved in a CSV file, downloaded, and forwarded for
annotation. A total of 7547 negative tweets were used for annotation. Table 1
contains the descriptions of each column of the annotated dataset.
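A minimal sketch of this sentiment-filtering step is shown below, assuming the Hugging Face transformers library and the publicly available cardiffnlp/twitter-roberta-base-sentiment checkpoint from the TweetEval family; the label mapping and filtering logic are our assumptions, not the paper's exact implementation.

```python
# Hypothetical sentiment-filtering sketch; the checkpoint and label mapping
# are assumptions based on the TweetEval roBERTa model family.
from transformers import pipeline

classifier = pipeline("sentiment-analysis",
                      model="cardiffnlp/twitter-roberta-base-sentiment")

# LABEL_0/1/2 correspond to negative/neutral/positive for this checkpoint.
label_map = {"LABEL_0": -1, "LABEL_1": 0, "LABEL_2": 1}

tweets = ["I can't believe how useless you are", "What a lovely day"]
negatives = [t for t in tweets
             if label_map[classifier(t)[0]["label"]] == -1]
print(negatives)  # negative tweets are kept and forwarded for annotation
```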
The negative streams of text collected from the sentiment analysis stage were
annotated manually, resulting in 48,354 cyberbullying entries and 151,646
non-cyberbullying entries. The cyberbullying entries, annotated by two English
speakers, were selected as the testing dataset. We used Cohen's kappa statistic,
which measures the degree of agreement between annotators [10], to evaluate the
result of the manual annotation. The annotators are labeled A, B, and C for easy
identification. The annotation result of A is compared with B, A with C, and vice
versa, as provided in Table 2. The formula to calculate Cohen's kappa for two
raters is shown in Eq. 1.
formula to calculate Cohen’s kappa for two raters is shown in Eq. 1.
po − pe 1 − po
k= =1− , (1)
1 − pe 1 − pe
where
Table 3 Distribution of dataset by cyberbullying class

Classification    Annotated tweets
Mild              1084
Moderate          1280
Severe            1256
Very severe       87
Cohen's kappa yields a result of 0.87. Based on this result, the dataset annotations
produced by this study are near the level of perfect agreement.
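For illustration, inter-annotator agreement of this kind can be computed with scikit-learn as sketched below; the toy labels are invented for the example and are not the study's data.

```python
# Hypothetical agreement check between two annotators (toy labels).
from sklearn.metrics import cohen_kappa_score

annotator_a = ["severe", "mild", "moderate", "severe", "very severe"]
annotator_b = ["severe", "mild", "severe", "severe", "very severe"]

# Kappa corrects raw agreement for agreement expected by chance (Eq. 1).
print(cohen_kappa_score(annotator_a, annotator_b))
```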
For the second annotation, we adopted a high-quality annotated dataset of harassment
posts provided by Thirunarayan and Shalin [11]. To conduct our experiment on
assessing the intensity of harassment using this dataset, we translated the dataset
into English and then divided the annotated cyberbullying tweets into four levels:
mild, moderate, severe, and very severe. We then categorized personnel-related
tweets as mild, political/racist tweets as moderate, intellectual tweets as severe,
and murder-related tweets as very severe. The dataset is published at
https://data.mendeley.com/datasets/4w6fcyzdfp/1. As a result, a dataset with the
features in Table 3 was created.
The most important stage of the text classification pipeline is selecting the optimal
classifier. Without a complete conceptual understanding of each algorithmic
technique, we cannot effectively choose the best text classification model for
implementation. A model to identify cyberbullying behaviors and their severity from
tweets was developed. To select the best algorithm for classification, we tested
several machine learning algorithms, namely: convolutional neural network (CNN),
Naïve Bayes, support vector machine (SVM), long short-term memory (LSTM), and
K-nearest neighbors (KNN).
Support vector machines (SVM) were developed within the context of statistical
learning theory and have successfully been used in various applications, including
face recognition, time series forecasting, and processing data for medical diagnosis
[15].
Finding separators that can identify the various classes in the search space is the
main goal of SVM [7]. In training our SVM model, each of the four classes (very
severe, severe, moderate, and mild) was used as the target variable in a
one-against-all approach: for each classifier, the class is fitted against all the
other classes. The results of the comparison are presented in the results section.
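The sketch below illustrates this one-against-all setup with scikit-learn; the TF-IDF features and example texts are our assumptions for a self-contained demo, not the paper's pipeline.

```python
# Hypothetical one-vs-rest SVM sketch on toy severity-labeled texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["you are worthless", "nobody likes you", "have a nice day",
         "I will hurt you", "you idiot"]
labels = ["moderate", "mild", "mild", "very severe", "severe"]

# One binary SVM is fitted per class against all the other classes.
clf = make_pipeline(TfidfVectorizer(),
                    OneVsRestClassifier(LinearSVC()))
clf.fit(texts, labels)
print(clf.predict(["you are an idiot"]))
```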
The long short-term memory (LSTM) cell is a unique configuration of the fundamental
RNN unit [16]. LSTM is local in time and space: its computational complexity per
time step and weight is O(1) [17]. Based on its ability to capture context, we used
the LSTM algorithm to examine the sentiment in the derived characteristics.
In this section, we illustrate the design of the model, which has four components
for cyberbullying severity classification (CSC): data input, sentiment analyzer,
cyberbullying detector, and severity classifier. Figure 2 illustrates the overall
layout of our model. In the remainder of this section, we explain the details of
our model.
The CSC model begins by reading test data. The test data is then pre-processed
and submitted to the sentiment analysis stage, which utilizes the roBERTa model to
detect whether the stream of data contains negative sentiment. The roBERTa model
is used to tokenize words and build word embeddings [9]. The sentiment analysis
stage outputs Negative (−1), Neutral (0), or Positive (1). The cyberbullying
detector stage begins by reading the 7547 training samples and a stream of negative
test data. The cyberbullying detector utilizes CNN and LSTM algorithms. The CNN
algorithm is used to extract features from the streams of test data. The LSTM
algorithm analyzes the sentiment in the retrieved features based on its ability to
capture context [13]. The cyberbullying detector outputs the labels cyberbullying
(1) and non-cyberbullying (0). If the output of this stage is non-cyberbullying,
the next stream of test data is loaded for testing; if the output is cyberbullying,
the next stage is invoked.
The final stage of the model is the severity classifier. It begins by reading the
3707 training samples and the cyberbullying text. The model utilizes CNN and LSTM
algorithms to classify the severity of cyberbullying.
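As a rough sketch of such a CNN-plus-LSTM severity classifier in Keras, the block below stacks an embedding layer, a 1D convolution for feature extraction, and an LSTM before a four-way softmax; the vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
# Illustrative CNN + LSTM severity classifier (assumed hyperparameters).
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN = 20000, 100  # assumed tokenizer settings

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128, input_length=SEQ_LEN),
    # Conv1D extracts local n-gram features from the embedded tweet.
    layers.Conv1D(64, 5, activation="relu"),
    layers.MaxPooling1D(2),
    # LSTM captures longer-range context across the extracted features.
    layers.LSTM(64),
    # Four severity classes: very severe, severe, moderate, mild.
    layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```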
The CSC model uses the Twitter-roBERTa-base and LSTM algorithms for model training
and severity classification, using the dataset explained in Sect. 3. Millions of
tweets were used to train the roBERTa-base model, which was fine-tuned for sentiment
analysis with the TweetEval benchmark [19]. The CSC model's sentiment analysis
module was implemented in Python. The roBERTa-base model consists of 12 base
layers and 125 million parameters [9]. The layers' goal is to produce a useful
feature representation of words from which the more sophisticated layers may quickly
extract the necessary data. The results of the dropout layers are then fed into an
LSTM. The LSTM model retains data for the purpose of identifying long-range
dependencies in the input. When words are converted into numbers, the models learn
more effectively.
F1-score, accuracy, precision, and recall were utilized to assess the classification
and for model comparison. The aim of evaluation is to confirm model performance
[20]. Model accuracy is the number of classifications a model predicts correctly
divided by the total number of predictions [21]. Accuracy is calculated using
Eqs. (2) and (3):
$$\text{Accuracy} = \frac{CP}{TP} \qquad (2)$$
where CP is the number of correct predictions and TP is the total number of
predictions.
Equation 3 can be used to calculate the positive and negative accuracy of the
binary classification.
$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \qquad (3)$$
where TP is true positive, TN is true negative, FN is false negative, and FP is
false positive.
Precision is defined as the proportion of accurately categorized positive samples
(true positives) to the total number of samples classified as positive. It is
computed using Eq. (4):
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
Recall is the percentage of actual positive samples that are correctly classified,
measured as in Eq. (5):
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
The F-score, also known as the F1-score, is used to assess a model's accuracy on
a dataset. It is used to evaluate binary categorization methods that label examples
as “positive” or “negative”. The accuracy measure can be misleading when the dataset
is not balanced. The F1-score is calculated using Eq. (6):
$$F1 = \frac{2TP}{2TP + FP + FN} \qquad (6)$$
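A short worked example of Eqs. (4)–(6), with invented confusion-matrix counts, is given below to make the formulas concrete.

```python
# Worked example of Eqs. (4)-(6) with invented counts (not study results).
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)          # Eq. (4): 80 / 90  = 0.889
recall = tp / (tp + fn)             # Eq. (5): 80 / 100 = 0.800
f1 = 2 * tp / (2 * tp + fp + fn)    # Eq. (6): 160 / 190 = 0.842

print(precision, recall, f1)
```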
In this section, we examine how well various classifiers performed when tasked with
categorizing tweets based on their severity ratings. Results for each classifier's
multi-class categorization in various settings are shown in Table 4. With CNN +
LSTM features, performance was significantly improved in terms of accuracy and
F-score. The results were obtained using data gathered from posts on top influential
Nigerian Twitter handles. All posts were tested after text preprocessing, without
any translation. Our suggested strategy outperforms numerous feature-engineered
strategies and procedures in identifying cyberbullying and categorizing its
severity. In this research, several recommended features were added on top of the
default classifier settings, but only some features enhanced classifier performance.
The CNN + LSTM model performs well because the deep
Table 4 Comparison of testing accuracy of CNN, SVM, KNN, Naïve Bayes, and LSTM
algorithms

Case study   Algorithm            Accuracy   F-score
1            CNN                  95.64      0.91
             SVM                  87.42      0.88
             KNN                  86.65      0.83
             Naïve Bayes          76.56      0.82
             LSTM                 97.54      0.88
2            CNN + SVM            95.33      0.94
3            CNN + KNN            91.37      0.91
4            CNN + Naïve Bayes    93.76      0.93
5            CNN + LSTM           97.34      0.95
6            SVM + KNN            96.01      0.87
7            SVM + Naïve Bayes    89.52      0.88
8            SVM + LSTM           96.87      0.94
9            KNN + Naïve Bayes    79.76      0.89
10           Naïve Bayes + LSTM   89.61      0.91
learning model can learn more about the text's underlying semantic structure [22].
The CNN + LSTM model not only ensures that each microblog's overall metrics are
met, but can also extract more detailed semantic data, adjust different model
parameters, and optimize the algorithm to better classify emotions.
6 Conclusion
In this study, we presented a new multi-class approach to detect and classify the
severity of cyberbullying posts written in Nigerian English on social media. The
main objectives of this research were to build a dataset of cyberbullying severity
and to develop a model for severity classification of posts made in Nigerian
English. The obtained results show an improvement in terms of F-score and accuracy
when compared with other techniques. Our study shows that detecting the severity of
cyberbullying is influenced by language. We were unable to conduct an in-depth
investigation of user activity on social media. Despite this limitation, we believe
that our proposed work will contribute to moving from binary classification of
cyberbullying or non-cyberbullying toward multi-class classification. Furthermore,
the present research focuses on text-based posts on social media; other forms of
communication, such as voice, image, and video, need to be investigated with the
same approach to identify severity.
References
Abstract Six Sigma is a method for updating processes to reduce variability and
defects. Although many world-class manufacturing businesses have adopted Six
Sigma, it is still relatively new in the software sector. This article discusses how the Six
Sigma technique may be used to reduce faults in software maintenance projects. To
solve the fundamental problem of minimizing customer reported defects during the
maintenance phase of the software, the Define–Measure–Analyze–Improve–Control
(DMAIC) technique was used. The purpose of this study was to show how a software
process may adopt a systematic approach to achieving world-class quality while
also achieving customer satisfaction and improving the overall profit of a company.
The project discussed is a live maintenance project executed by the software QA
team, which was responsible for ensuring that the software meets quality standards
and customer satisfaction. The work done in this project shows that the
implementation of Six Sigma during the maintenance or QA process improved the total
profit of the software services company and thus saved the company's brand from ruin.
1 Introduction
The service industry (both the manufacturing and software industries) throughout the
world has played a vital role in the economies of both developed and developing
countries, demanding an emphasis on the quality of service. The success of Japanese
A. Juvekar (B)
IT Consultant, Mumbai, India
e-mail: Abhay.Juvekar@yahoo.com
O. L. D’souza
HCL Technology, Mumbai, India
A. Chaware
Associate Professor, P G Department of Computer Science, SNDTWU, Mumbai 400049, India
industry, particularly in the 1970s and 1980s, encouraged the entire globe to focus
on quality concerns [1]. The Japanese experience has shown that the requirements
and expectations of consumers are the most important aspects in determining quality
[2]. The software industry is no exception. As a result, the value
of high-quality software can no longer be overlooked. Today's customers expect and
value greater quality and are ready to pay a premium for it [3]. The characteristics
of high-quality software are:
• proper development,
• compliance with requirement specifications,
• good performance that meets customer expectations, and
• suitability for usage.
Software projects that are of poor quality are delayed, failed, abandoned, or
rejected, and may trigger the execution of the penalty clause and cancelation
of the business contract, resulting in removal from the preferred vendor list. Even
completed software projects may require costly ongoing maintenance and corrective
releases or service packs to ensure excellent software quality [4]. In this
research, the IT solution provider firm DG Solutions had taken on the E-ticketing
and Reservations Project of Hindustan Asia Pacific Airways (HAPA), a reputed
airline, with the following specification: the project size was estimated to be
307,070 SMC, assuming a productivity of 8.8 SMC/PD, and the duration was around
32 months with a resource allocation of around 45 personnel throughout that period.
But due to poor productivity and delayed deliverables, the project was labeled RED
and came under the purview of High Risk Project Management (HRPM), with the
possibility of HAPA blacklisting DG Solutions.
Loopholes emerged in the system: the Requirement Gathering Phase took an overly
long time and, despite the extended period that had elapsed, was far from done;
although the activities were studiously monitored and time reporting was maintained,
the idle time of SMEs and BAs increased multifold; code reusability was not on plan;
requirement volatility, complexity, and ambiguity were rife; and irregular/
overlapping configuration management led to missing software components. This gave
HAPA buggy software with schedule overrun and a loss incurred to the tune of $1.2
million, which finally ruined its brand in the international market. For DG
Solutions this computed to a very high Schedule Variance (SV), bringing the project
to the stage of foreclosure with huge penalties.
2 Literature Review
According to IEEE [5], a process is “a set of actions carried out for a certain
goal.” There is a correlation between processes and outcomes; hence, process
optimization can increase the quality of a software product. A process gives project
participants a consistent approach to doing the same task in the same way every
time. Process improvement is concerned with identifying and enhancing processes:
defects discovered in past attempts are addressed in subsequent efforts [6]. In
traditional Software Development Life Cycle (SDLC) approaches, quality processes are
usually introduced toward the end of the project cycle, usually before
implementation. Unit testing, system testing, integration testing, and other such
terminology are widely used. Some of the most efficient approaches emphasize design
reviews and code reviews, but they, too, occur after a deliverable has already been
created [7]. Six Sigma corrects this by implementing toll gates at each level of the
project's life cycle. As a result, the SDLC's concept, requirements gathering,
systems definition, software development, software testing, deployment, and
maintenance stages correspond to toll gates [8]. Toll gates should be included from
the start of a software project to increase the probability of a successful outcome.
Six Sigma is one among the many available models and techniques that can be used for
process improvement.
Six Sigma, according to Tomkins [5], is “a program aiming at the near-elimination
of faults from every product, process, and transaction.” According to Harry [9], it
is “a deliberate endeavor to raise profitability, expand market share, and improve
customer satisfaction via statistical methods that can lead to break-through quantum
increases in quality.” It is also a “new strategic paradigm of management innovation
for firm survival in the twenty-first century, implying three things: statistical
measurement, management strategy, and quality culture” [10].
Six Sigma has two key methodologies: DMAIC and DMADV. Define–Measure–
Analyze–Improve–Control (DMAIC) is used to improve an existing business
process, and Define–Measure–Analyze–Design–Verify (DMADV) is used to create
new product or process designs for predictable, defect-free performance [10].
3 Proposed Solution
Earned Value Analysis showed the infeasibility of continuing with the Waterfall
Model and still making the deadline. As the Client's key members were not always
available due to their heavy workloads, the project progress suffered, and the onus
was put entirely on the Vendor Project Team. This was despite regular status updates
being presented to the Steering Committee comprising stakeholders from both the
Client and the Vendor.
Rather than crash the project, it was decided in concurrence with the customer to
move from the Waterfall to the Agile Model of delivery; thus, there was stakeholder
signoff on the shift of model. The intent was that Quick Wins after incremental
deliveries
would enthuse the Client Project Team members and provide the necessary impetus
and encouragement to move forward positively. This also allowed DG to give periodic
deliveries as well as ensure transparency and joint/accepted accountability for delays
that might be wholly attributable to the Customer and/or other Vendors engaged for
interfaces.
The following improvements were expected from the QA Project (Table 1).
4 Methodology
The goal of this study is to improve the quality level and process capabilities of
the software built for the E-ticketing and Reservations system of the highly reputed
airline, utilizing the Six Sigma DMAIC methodology. When the DMAIC technique is
used, it produces the targeted goods at the right time and at a low cost.
The different stages followed under this methodology are explained below.
The As-Is project's current productivity of 8.34 SMC/PD was targeted to reach the
goal of 12.54 SMC/PD. To fulfill this requirement, the process started with creating
a project charter (Fig. 1).
The initial step is to define and develop a project charter considering the project
output and goal, as seen in Fig. 2.
Each sprint duration was taken as 12 weeks. Pre-planning was done over a span of
8 weeks, followed by the sprints. Steady State Observation was done over 16 weeks.
The next step, the second activity, was to create the Supplier, Input, Process,
Output, Customer (SIPOC) map.
Fig. 2 Snapshot of the project charter for the Six Sigma project
SIPOC indicates the major activities or subprocesses in a business process and the
top-down flow chart for the processes (Table 2).
The third activity was to prioritize the voice of the customer and the voice of the
business on those aspects related to the project objective; Critical to Qualities
(CTQs) and Critical for Processes (CFPs) were thus determined. The fourth activity
was to sketch the top-down chart for the process, showing the different processes
with the HR and other details, with the customer as the starting point and
satisfying the customer as the end point. Finally, the fifth activity was to define
the process map to identify the important processes in the project. The following
were the outputs of this process:
Voice of customer → Delayed delivery/Redundant usage of customer resources/
Loss of revenue and threat of vendor change at great cost
Critical to Quality → High SV on the positive side/Extended working hours for
vendor as well as customer/PSDD much above the USL
Voice of business → Rare technology in the global market/Complex airline
domain/High response time required from all ends/Diminishing profit margins
Critical for Processes → Inexperienced resources/No trainer available in the market
for TPF & Assembler/Reduction in cost.
In this phase of DMAIC, the assignable causes responsible for poor quality or
variability in the existing process are identified. The cause-and-effect diagram is
a visual brainstorming tool for capturing potential causes of an issue, used in the
analyze stage or the improve stage to figure out what is causing the issue. The
responsible causes (X's) for each of the Y's are recorded in the cause-and-effect
diagram. Figure 4 shows the cause-and-effect diagram for one of the Y's in the
project; it is also called the fishbone diagram due to its appearance.
Fig. 4 Fishbone diagram showing factors impacting the process map Y3 in the measure
phase of sprint 1
Based on the analysis phase, all the possible solutions for the three problems (Y's)
were listed, and a solution matrix was used to decide which solution is best. Sigma
impact, time impact, cost effect, and other implications are among the assessment
factors of the solution matrix. These factors were scored by
Fig. 5 The results before and after implementing Six Sigma
Table 3 Values before and after the Six Sigma implementation

Output indicator: PSDD much above USL   DPMO           Sigma level
Before Six Sigma implementation         1,000,000.00   0
After Six Sigma implementation          3.36           6
finding the correlation of each solution to the criteria; the higher the score, the
better the solution meets the four requirements. From among the many causes found,
12 causes were selected for improvement. The modification of the current process
according to the results of the analyze phase was done in the improve phase.
Figure 5 shows the comparative visualization of the complete process before and
after implementing Six Sigma (Table 3).
To sustain the improvements achieved and to monitor them so as to ensure continued
success, a control chart was maintained; it is given in Table 4.
Table 4 Control chart showing all the important results for Six Sigma

Output indicator                                            Average   Standard deviation   Cp     Cpk
Y1—High SV on the positive side                             0.10      1.13                 1.46   1.43
Y2—Extended working hours for customer as well as vendor    9.65      0.58                 1.12   1.05
Y3—PSDD much above USL                                      0         0                    1.43   1.04
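For readers unfamiliar with the capability indices reported in Table 4, the sketch below computes Cp and Cpk from their standard definitions; the specification limits and sample data are invented for illustration and are not the project's figures.

```python
# Illustrative computation of process capability indices (invented data).
import statistics

samples = [11.8, 12.1, 12.6, 12.3, 11.9, 12.4]  # hypothetical SMC/PD values
usl, lsl = 13.0, 11.0                            # assumed spec limits

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

cp = (usl - lsl) / (6 * sigma)                   # potential capability
cpk = min(usl - mu, mu - lsl) / (3 * sigma)      # capability with centering
print(round(cp, 2), round(cpk, 2))
```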
Usually, as part of the closure of the project, one of the final deliverables is
replicating the project. As a result of the success of Six Sigma in one
organization, the plan was replicated in two other firms at all locations where the
project is executed. The action plan of E-ticketing & Reservations for HAPA was
implemented on a similar project. Reusable components like checklists, templates,
knowledge management portals, and a code optimizer were used wherever required.
Improvements due to the Six Sigma exercise were measured, and the results shown in
Table 5 reflect an increase in productivity, a reduction in schedule overrun,
considerable cost savings, and a reduction in the number of bugs.
Additionally, automation was introduced by using the High-Level Assembler
(HLASM) and Transaction Processing Facility (TPF) tools. Code was made reusable,
which helped in increasing profitability by reducing the time spent in subsequent
coding exercises. The tools, coupled with more systematic controls in QA, led to a
decrease in the cost of poor quality (COPQ).
5 Conclusion
The Six Sigma exercise was successful. This paper reports the results of a Six
Sigma implementation carried out in the airline's QA process. The project shows
that damage control was done in time and there was no business loss; the
productivity increase was gradual, stable, and sustainable; there was no downfall
in productivity; customer co-operation was excellent during the crisis; and the
team received awards, bonuses, increments, and ESOPs after successful completion of
delivery milestones. The approach used in this project was taken up for Organization
Innovation and Deployment.
The process would have been more effective if internal people had been given an
opportunity to better themselves instead of lateral hiring; lateral recruitment had
been done in a planned manner, keeping the demand-supply threshold in mind; domain
knowledge experts had been scaled up as per the project's technical requirements;
senior management had contributed proactively to the Six Sigma project; the High
Risk Project Management group's contribution had been proactive; and the requirement
stability index had been captured. Looking into what went wrong, we conclude that
the concept of nine Sigma would be more beneficial and may be an opportunity to
reduce cost. This can be taken as a further scope for research.
References
1. Hsieh YJ, Huang LY, Wang CT (2012) A framework for the selection of six sigma projects in
services: case studies of banking and health care services in Taiwan. Serv Bus 6(2):243–264
2. Free Six Sigma Lessons. Motorola University (2008). http://www.motorola.com/content.jsp?
globalObjectId=3069-5787
3. Coby P (2004) Community Spirit. Airline Bus 20(6). http://www.flightglobal.com/articles/
2004/06/01/182274/community-spirit
4. http://www.iata.org/stbsupportportal (2009). StB Support Portal. https://www.iata.org/en/pub
lications/
5. Srivastava A, Bhardwaj S, Saraswat S (2017) SCRUM model for agile methodology. In:
Proceedings—IEEE international conference on computing, communication and automation
ICCCA 2017, vol 2017-January, pp 864–869
6. Abrahamsson P, Salo O, Ronkainen J, Warsta J (2002) Agile software development methods:
review and analysis. VTT publication 478, Espoo, Finland, 107 p
7. Harvie DP (2016) Targeted scrum: software development inspired by mission command, vol
42, no 5, pp 476–489
8. Mundra A et al (2013) Practical scrum-scrum team: way to produce successful and quality
software. In: Proceedings of 13th international conference on computational science and its
applications, IEEE, pp 119–123
9. Hart MA (2011) Agile product management with scrum: creating products that customers love
by Roman Pichler. J Prod Innov Manag 28
10. Pan Z, Park H, Baik J, Choi H (2007) A Six Sigma framework for software process improve-
ments and its implementation. In: Proceedings of the 14th AsiaPacific software engineering
conference (APSEC’07), pp 446–453
11. Sutherland J, Schwaber K (2013) The scrum guide. The definitive guide to scrum: the rules of
the game
Toward a Generic Multi-modal Medical
Data Representation Model
1 Introduction
2 Background
In recent years, AI models, especially deep learning models, have shown
promising results on a wide range of medical data analysis tasks, ranging
from ultrasound-based lung disease detection [1] to detecting brain tumors from
multi-modal MRI scans [2], with some results being on par with or even surpassing
human expert performance. These promising outcomes create a positive expectation for
automated and computer-aided medical data interpretation and diagnosis, with the
potential for inclusion in the clinical workflow, reduced errors in clinical
practice, and better health outcomes for patients. However, these successes come at
the cost of two important pre-conditions: (1) well-curated and labeled good-quality
data in extremely large volumes (requiring high resources and high curation costs),
and (2) simplified operational settings, attempting to answer a well-defined
clinical question from a single data modality.
Real-world scenarios are often the opposite. For most research problems, curating
large datasets and good-quality data annotation is difficult, though the data does,
however, exist in abundance: decades of digitized health records and imaging data
are sitting quietly on servers in health networks. The trouble researchers often
encounter is that these data are often not analysis ready. Even for the same
modality, the data may have different presentations (e.g., imaging results vary by
technique, device maker and model, and operator), be incomplete (e.g., some
information is not recorded), and (most disappointingly) be unlabeled, rendering
their use difficult. Data annotation is crucial and needs to be done by clinicians
in order to facilitate data analytics research, which is resource intensive.
Although medical AI research primarily focuses on developing AI technologies, its
development is significantly limited by the clinical input required from clinicians
[3].
Representation models are pre-trained models which can be fine-tuned toward
different downstream clinical tasks. They do not remove the need for accumulating
data annotation but, with small-scale data annotation of the downstream task, they
can function with a strong generalization ability. From the technological aspect,
representation models take the form of deep learning neural network models and
often use self-supervised learning methods to learn on unlabeled data. In the
natural language processing (NLP) domain, massive-scale language representation
models are named foundation models, e.g., BERT [4], DALL·E [5], GPT-3 [6]. In the
context of computer vision, representation models cover the widely used
convolutional neural networks (CNN, or ConvNet), such as ResNets [4], and the more
recent Vision Transformers (ViT) [7], which are often pre-trained on large datasets,
such as ImageNet, with or without using labeled annotations.
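The fine-tuning workflow described here can be sketched roughly as below, using a torchvision ResNet as the pre-trained representation model; the frozen backbone, two-class head, and random stand-in data are illustrative assumptions rather than the paper's setup.

```python
# Illustrative transfer-learning sketch: fine-tune a pre-trained
# representation model on a small labeled downstream task (assumed 2 classes).
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pre-trained representation; only the new head is trained.
for param in backbone.parameters():
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for real data).
x, y = torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))
loss = criterion(backbone(x), y)
loss.backward()
optimizer.step()
```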
In real-world clinical practice, a clinical judgment takes account of the patient's
demographics, medical history, and imaging exam and/or non-imaging test
results (e.g., blood tests). For imaging analysis, general computer vision pre-trained
representation models are widely used as the starting point for medical image
analysis. However, pre-trained models built on medical imaging data are scarce. The few
existing models [1, 8, 9] are limited to a specific imaging modality and designed
to solve only their downstream clinical task, and they do not consider non-imaging
modalities. Therefore, there is a lack of a generic, universal representation model
that can handle multi-modal data at the same time. A general-purpose medical
representation model, with wide coverage of imaging modalities and tasks, would
create a transformative breakthrough in the field of medical image analysis. Such a
model would stimulate the flourishing development of medical AI in general, making a
positive impact on human health. The research reported in this paper lays the
foundation by creating initial representations toward a general-purpose medical data
representation model, with a reference implementation focusing on multiple imaging
modalities that are applicable to different downstream tasks. The model is
extensible for combining both imaging and non-imaging modalities such as electronic
health records. The performance of the proposed representation model was evaluated
against different downstream task-specific metrics, such as those for
classification, detection, or segmentation tasks. The proposed representation model
can be extended horizontally to model other imaging modalities, such as
histopathology imaging and OCT, and can inspire the technical development of a
generic vision foundation model: a universal model producing representations for
images and videos from different imaging technologies, including natural scenes and
medical imaging data.
The proposed medical data representation model can be highly significant for the
medical image analysis research community by addressing one of the most fundamental
problems in the field, which carries a huge potential benefit for the healthcare
area. It will also advance the broad development of medical AI in general,
subsequently providing benefits to the local and global healthcare systems. The
outcome of this research can provide an alternative source of generalization
ability for many medical data analysis projects. This representation model will
facilitate an agile development cycle for diseases with rare prevalence (i.e., an
incidence rate of less than 1 in 2000 in a general population [10]) or where there
is limited data. It will also aid researchers in making sensible development and
validation decisions on small-scale datasets before needing to perform a large
multi-center validation.
With data availability being a long-standing limiting factor in the field of medical
image analysis, researchers often adopt ready-to-use computer vision representation
models to improve the generalization of their learning models. However, there is a
significant gap between natural scenes (e.g., photos and videos from digital cameras
and smartphones) and medical imaging modalities (e.g., MRI, US, CT, and X-ray videos
and images). Closing the pre-training and downstream task domain gaps is the key to
successful transfer learning. Recently, several medical image representation models
were proposed [11–13]; however, each of these targets a specific imaging modality,
which narrows the scope of the downstream tasks to which it can provide benefit.
Further, existing multi-modal methods aim at fusing information from pre-selected
modalities under the assumption that the targeted modalities must all be present;
if any of the modalities is missing or inadequate, this impacts the representation
model and the downstream clinical application. This assumption is not always true
in the real world, and it limits the number of eligible studies that can be included
in the model's learning [14]. Also, the information in hospital databases is usually
scattered across database tables, medical reports, and imaging data files. It is
often incomplete and non-uniformly stored, and most of this information is not ready
to use for studies, due to the need for data preparation involving cleaning,
pre-processing, and removal of noise and acquisition artifacts. All these activities
need input from multidisciplinary experts, from clinical to database exploration
professionals [15] (Fig. 1).
The focus of the proposed generic, universal representation model is to: (1)
efficiently capture the common anatomic representation of the internal human body
and maximize the sharing and training of model parameters to improve the efficiency
of the model; and (2) build a robust representation model that can handle missing
data elements and modalities, particularly for EHR, pathology test results, and
clinical text notes, as well as redundant data from multiple imaging modalities.
By leveraging recent advances in deep learning, such as text and vision
transformer-based learning architectures, and by combining features at multiple
levels, including low-level features, cross-modal features, and high-level features,
the representation model developed is capable of extracting complex, latent, hidden
information from multiple data sources, and can solve multiple downstream clinical
application tasks, such as disease detection, classification, or pathology
segmentation for assessing severity, or staging and tracking disease progression
under different pharmacological interventions.
4 Experimental Work
In this section, we report the study design and methodology for the proposed generic
medical data representation model for a use case involving two different downstream
clinical tasks:
1. Semantic segmentation task involving multi-modal brain images for extracting
meaningful tumor regions, such as the active tumor (AT), necrotic core (NCR),
and peritumoral edematous/infiltrated tissue (ED), directly from multi-modal/
multi-parametric MRI scans (T1w, T1ce, T2w, and FLAIR).
2. Classification task involving segmented brain tumors from multi-parametric
MRI, extracting features and associating them with tumor severity, contributing
to better prognosis and treatment.
Task 1: For segmentation downstream task, we used the brain tumors
dataset from the Medical Segmentation Decathlon challenge (http://medicaldecat
hlon.com/) [16]. The data is collected from the Multi-modal Brain Tumor Image
Segmentation Benchmark Challenge (BraTS) dataset from 2016 and 2017 [17]. The
task is to segment tumors into three different subregions (active tumor (AT), necrotic
core (NCR), and peritumoral edematous/infiltrated tissue (ED)) from multi-modal
multisite MRI data (T1w, T1ce, T2w, and FLAIR). There are 388 subjects in the
dataset, with each subject consisting of four 3D volumes (T1, T1c, T2, FLAIR) and
corresponding manual annotated labels. Each of the T1, T1c, T2, and FLAIR volume
images are of size 240 × 240 × 155. Ten percent of the data was used for testing,
and 90% for training and validation. We also performed data augmentation,
with the aim of increasing the diversity of the data set by applying random,
realistic transformations, such as rotations, flips, zooming, pixel intensity
modifications, and more. This also ensures a degree of invariance to these
transformations for the resulting trained models, leading to better generalization.
There are many possible data augmentation techniques, ranging from basic to more
advanced transformations, including methods for combining multiple images into sets
of “new” images (e.g., what is called “CutMix” or “MixUp” and more). When doing data
augmentation, it is vital that the transformations do not change the correct label
(for example, by zooming in on a region of the image that does not contain the
information needed to assign the class of the original image). In our case, we
normalized the images, resized them all to the same size, and applied some random
motion as our data augmentation.
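A minimal sketch of such a preprocessing/augmentation pipeline is given below using the MONAI library, which is commonly used for medical imaging; the specific transforms and parameters are our assumptions, not the exact pipeline used in this study.

```python
# Hypothetical MONAI-style preprocessing/augmentation sketch (assumed params).
from monai.transforms import Compose, ScaleIntensity, Resize, RandAffine

transform = Compose([
    ScaleIntensity(),                        # normalize voxel intensities
    Resize((128, 128, 128)),                 # resize all volumes to one shape
    RandAffine(prob=0.5,                     # random "motion": small rotations
               rotate_range=(0.1, 0.1, 0.1),  # and translations per axis
               translate_range=(5, 5, 5)),
])
# Applied to a channel-first 4D array, e.g., shape (1, 240, 240, 155).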
The next step in building the model was choosing the deep learning architecture,
for which we used an enhanced UNet architecture, as shown in Fig. 2. We also used
a novel loss function combining the Dice loss and the cross-entropy loss as a
weighted sum of these two losses. Figure 3 shows the trained model's performance in
terms of predictions on the validation subset of the data. With just one epoch, the
model obtained on the test set Dice scores of 0.7252, 0.5850, and 0.7105 for the
three labels (active tumor (AT), necrotic core (NCR), and peritumoral edematous/
infiltrated tissue (ED)).
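A rough PyTorch sketch of such a weighted Dice-plus-cross-entropy loss is shown below; the weighting factor and smoothing term are illustrative assumptions, and the exact formulation used by the authors may differ.

```python
# Illustrative weighted Dice + cross-entropy loss (assumed weighting/smoothing).
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, ce_weight=0.5, smooth=1e-5):
    # logits: (N, C, ...) raw scores; target: (N, ...) integer class labels.
    ce = F.cross_entropy(logits, target)

    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])
    one_hot = one_hot.movedim(-1, 1).float()   # to channel-first layout

    # Soft Dice averaged over classes (summed over spatial dimensions).
    dims = tuple(range(2, logits.dim()))
    inter = (probs * one_hot).sum(dim=dims)
    union = probs.sum(dim=dims) + one_hot.sum(dim=dims)
    dice = 1 - ((2 * inter + smooth) / (union + smooth)).mean()

    return ce_weight * ce + (1 - ce_weight) * dice
```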
Task 2: The second downstream task considered was extracting imaging
biomarkers for brain cancer analyses from MRI of glioma. This can be done by
Fig. 3 Segmentation model performance visualization for validation and test dataset
using a deep learning model to segment brain tumors from multi-parametric MRI,
and then extracting features from the resulting tumor. Such features can potentially
be associated with tumor severity and prognosis and contribute to better treatment.
Extracting features from objects of interest in medical images for diagnostic purposes
is often referred to as radiomics [18].
The goal of radiomics is to extract information from medical images that can
be used as part of a medical imaging-based diagnostic workflow. The information
can be extracted from various imaging modalities, e.g., different MRI contrasts, PET
imaging, CT imaging, and so on. One can then combine it with other sources of
information (e.g., demographics, clinical data, and genetics). In this way,
radiomics and radiogenomics can open the door to sophisticated and powerful
analyses. By estimating the locations and extent of the brain tumors' T2-enhancing
and non-enhancing regions, we can extract the tumor location and the tumor burden.
Additionally, we can look at the features of the MRI images inside each of these two
tumor parts. Tumor burden and anatomical location are highly informative when
assessing prognosis and planning treatment in brain tumors. Once we can segment the
tumors, we automatically obtain tumor volumes. If we have repeated scans of the same
tumors, we obtain estimates of tumor progression. By further analysis, one can also
estimate the anatomical locations of the tumors. Figure 4 shows how segmented tumor
regions from task 1 can be analyzed further, with extraction of several features to
assess disease progression. For task 2 we used versions of the MRI images that have
already been co-registered and converted to NIfTI format
from the TCGA collection [19]. We’ve prepared a small sample dataset containing
data from 10 subjects. We used the same UNet architecture for building the segmen-
tation model and segmented masks from task 1 were used for extracting radiomic
features. The radiomic features together with the other information we have about
the subjects can provide relevant clinical information. In other words, to what extent
the various features are associated to clinical outcomes (e.g., survival), either indi-
vidually or together. This can be done using, e.g., plots, basic statistics, statistical
modeling, or machine learning. Figure 5 shows the outcomes of analysis in terms of
length of survival for our subjects. It shows how various radiomics features relate
to IDH mutation status, survival times and volume of the enhancing-non-enhancing
tumor regions for each subject. Note that multiple sources of information beyond
MRI could be valuable when assessing a glioblastoma case. A system tasked with
extracting relevant, actionable information should therefore have access to more than
the MRI images. This reflects a general principle in medicine: important information about a patient, disease, or condition is represented in a vast set of heterogeneous data. This leads to the need for integrated diagnostics, and the proposed generic multi-modal representation model is designed to accommodate such integrated diagnostics.
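As an illustration of the radiomics step, the following is a minimal sketch using the open-source pyradiomics package on the NIfTI volumes and the task 1 segmentation masks; the file names, the enabled feature classes, and the mask label are placeholders, not the authors' actual files or settings.

```python
from radiomics import featureextractor  # pip install pyradiomics

# Configure the extractor; which feature classes to enable is our assumption.
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")  # intensity statistics
extractor.enableFeatureClassByName("shape")       # volume, sphericity, etc.

# Placeholder file names: a co-registered NIfTI volume and the task 1 mask.
features = extractor.execute(
    "subject01_flair.nii.gz",
    "subject01_tumor_mask.nii.gz",
    label=1,  # which mask label to analyze (assumption)
)
# Shape features include the tumor volume discussed in the text.
print(features["original_shape_VoxelVolume"])
```

The resulting feature table can then be joined with the clinical variables (e.g., survival times, IDH mutation status) for the kind of analysis shown in Fig. 5.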
Fig. 5 Length of survival of subjects from radiomic feature analysis of segmented masks
References
1. Durrani N, Vukovic D, van der Burgt J, Antico M, van Sloun RJG, Canty D, Steffens M, Wang
A, Royse A, Royse C, Haji K, Dowling J, Chetty G, Fontanarosa D (2022) Automatic deep
learning-based consolidation/collapse classification in lung ultrasound images for COVID-
19 induced pneumonia. Sci Rep 12:17581. https://doi.org/10.1038/s41598-022-22196-y
2. Ahmad P, Qamar S, Shen L, Rizvi SQA, Ali A, Chetty G (2022) Multi-scale 3D UNet: multi-
scale 3D UNet for brain tumor segmentation. In: Crimi A, Bakas S (eds) International MICCAI
brainlesion workshop: glioma, multiple sclerosis, stroke and traumatic brain injuries—7th
international workshop, BrainLes 2021, held in conjunction with MICCAI 2021 (Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
Notes in Bioinformatics), vol 12963 LNCS). Springer, pp 30–41. https://doi.org/10.1007/978-
3-031-09002-8_3
3. Park S, Kim G, Oh Y, Seo JB, Lee SM, Kim JH, Moon S, Lim J-K, Ye JC (2021) Vision
transformer for COVID-19 CXR diagnosis using chest X-ray feature corpus. arXiv preprint arXiv:2103.07055
4. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P,
Sastry G, Askell A et al (2020) Language models are few-shot learners. Adv Neural Inf Process
Syst 33:1877–1901
5. Caron M, Touvron H, Misra I, Jégou H, Mairal J, Bojanowski P, Joulin A (2021)
Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF
international conference on computer vision, pp 9650–9660
6. Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M, Sutskever I (2021) Zero-
shot text-to-image generation. In: International conference on machine learning, pp 8821–8831.
PMLR
7. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
8. Chen RJ, Krishnan RG (2022) Self-supervised vision transformers learn visual concepts in
histopathology. In: Learning meaningful representations of life (LMRL) Workshop, NeurIPS,
pp 558–575
9. Wang W, Xie E, Li X, Fan D-P, Song K, Liang D, Lu T, Luo P, Shao L (2021) Pyramid vision
transformer: a versatile backbone for dense prediction without convolutions. In: Proceedings
of the IEEE/CVF international conference on computer vision, pp 568–578
Universal Object Detection Under Unconstrained Environments
Abstract This paper presents a universal object detection framework for uncon-
strained environment settings where machines can only learn from massive unlabeled
multimodal data and a small amount of labeled data. This research aims to tackle key challenges in
computer vision and expects to produce next-generation object detection techniques
that can effectively detect objects of diversified categories in complex application
settings. The proposed universal object detection framework is based on a novel formulation that casts object detection as an anomaly detection problem, leverages multimodal heterogeneous data sources and denoising diffusion models, and applies to a wide set of complex application settings.
1 Introduction
Universal object detection is one of the most important image analysis and computer
vision tasks, and fundamental for many cutting edge and ground-breaking tech-
nologies like autonomous driving, intelligent robotics, and so on. The goal of an
object detection task is to detect objects of certain classes (such as humans, vehi-
cles, or animals) in data sources like images and videos. The advances in machine
learning and AI technologies, particularly deep learning networks, have fueled significant progress in the state of the art in the object detection field [1, 2]. Inspired by
biological visual processing techniques, the recent deep learning-based algorithms
allow modeling of powerful and deeper visual knowledge from large-scale, well-labeled datasets and obtain superior object detection performance compared to earlier approaches.
2 Background
Recently, several deep learning models have been developed to address the challenges associated with labeling and annotation, particularly in the medical object detection field, where not everyone can annotate the ground truth and domain experts in the relevant medical specialty are required. Some of these recent algorithms attempt to address this challenge with learning schemes such as self-supervised learning [5], multimodal learning [6], and few-shot learning [7, 8]. While the self-supervised learning approach relies on generating pseudo labels from unlabeled data for building deep learning models, the multimodal learning approach attempts to learn from information in different data modalities like text descriptions and 3D point clouds. Few-shot learning approaches, on the other hand, focus on learning from data with only one or a few labels, or with only a few categories of objects labeled. However, these approaches make several assumptions and lead to less-than-optimal performance, particularly for medical object detection:
• The current self-supervised learning schemes for object detection use binary pseudo labels for learning on unlabeled data and focus on the image classification task, resulting in inaccurate prediction of object coordinates due to primitive modeling of object appearances [9].
• The state-of-the-art approaches proposed recently for multimodal learning in object detection rely on strategies for mining visual information either from 3D point cloud data [10] or from text-based image caption data [6]. However, they show only acceptable performance outcomes with models based on one or two modalities, owing to the difficulty of modeling disparate and heterogeneous data sources.
• The existing few-shot learning approaches for object detection still count on a large amount of well-labeled baseline training data before learning on a few labels and have been unable to embed prior knowledge in their learning schemes [8, 11].
The universal object detection framework proposed in this paper aims to address the shortcomings of these recent approaches, particularly in terms of their applicability to the medical object detection task, which is a significantly complex endeavor. The novel approach proposed here treats the complex medical object detection task as a weakly supervised anomaly detection problem and employs denoising diffusion models. The proposed approach has the capability to outperform current weakly supervised object detection methods that rely on generative adversarial networks or autoencoder models; these traditional models are harder to train or struggle to preserve fine details in the image. The universal object detection framework we propose here is based on novel denoising diffusion models for solving the weakly supervised anomaly detection problem and uses a combination of a deterministic iterative noising and denoising scheme with classifier guidance for generating images of healthy tissue.
For medical object detection, it is difficult to obtain ground-truth annotations and labels at the pixel level; often they are biased by human annotators or simply unavailable. Addressing these difficulties by formulating the problem of medical object detection as a weakly supervised anomaly detection task, particularly with novel denoising diffusion models, is a promising line of investigation. The main advantage of this approach is that weakly supervised anomaly detection relies only on the availability of a few image-level labels in the model building stage. Figure 1 shows the schematic for the proposed universal object detection framework based on denoising diffusion modeling-based weakly supervised anomaly detection.
By assuming two unpaired sets of images for the model building phase in the training stage, with the first set containing images of healthy subjects and the second set containing images of subjects affected by a disease, we need only the image and the corresponding image-level label (weakly supervised setting) as healthy or diseased during the training stage. The model building phase comprises two parts: the first part involves building a denoising diffusion model [13], followed by a two-class classifier for classifying healthy and diseased tissues. In the second part, the anomaly map for an unseen image without any labels is built using the model built in part 1.
Fig. 1 Universal object detection framework based on denoising diffusion modeling-based weakly supervised anomaly detection task
The overall scheme uses a reverse sampling process with a diffusion model, encoding the anatomical information of the image through an iterative noising process, followed by a denoising stage with classifier guidance to generate an image of the healthy tissue. The final stage involves a pixel-
wise anomaly map between the original image and the synthetic image built with the denoising diffusion model, allowing identification of diseased tissue. The iterative encoding and denoising process preserves most of the details of the input image that represent normal tissue and enhances the tissue affected by the disease in the synthesized image. The workflow for medical object detection as an anomaly
the synthesized image. The workflow for medical object detection as an anomaly
detection task based on denoising diffusion involves image-to-image translation,
comprising transformation of an image of a patient to an image without any patholo-
gies. It is important to note that only pathological regions in the image are changed in this process, and the rest of the image is preserved. This allows the anomaly map to be
constructed as the difference between the original and translated image. Using diffu-
sion models, detail preservation during image-to-image translation is more efficient
as compared to other modeling approaches such as the variational autoencoders and
generative adversarial networks (GANs). Further, with the unique formulation of the object detection problem as a weakly supervised anomaly detection task, leveraging multimodal data sources along with denoising diffusion models, it is possible to obtain improved generalizability and applicability for any downstream task, including classification, segmentation, or detection, in medical or non-medical application settings. Diffusion models, being based on the
Markov chain theory, learn to generate their synthetic outputs by gradually denoising
an initial image packed with random Gaussian noise. This iterative denoising process
makes the inference runs of diffusion models significantly slower than other gener-
ative models, but in exchange, it allows them to extract more representative features
from their input data, enabling them to outperform other models in the end. The eval-
uation of the proposed universal object detection framework was done for a medical
object detection task involving detection of brain tumors from MRI scans. For this
task, brain tumor detection was performed on multimodal multiparametric scans from the publicly available Medical Segmentation Decathlon dataset (MSD Task 01 [12]). The denoising diffusion model was developed using a two-dimensional (2D) axial slice from one of multiple modalities of a brain MRI: a T1-weighted (T1), a contrast-enhanced T1-weighted (T1Gd), a T2-weighted (T2), or a T2 fluid-attenuated inversion recovery (T2-FLAIR) sequence. A user-defined cropped area of that slice was synthesized to represent a realistic and controllable image of either a high-grade glioma with its corresponding components (e.g., the surrounding edema) or tumor-less (apparently normal) brain tissue. Further theoretical details of the approach used for the development of denoising diffusion probabilistic models are provided in [13, 14].
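As a rough illustration of this scheme, the sketch below encodes an input slice by deterministic iterative noising, denoises it back with classifier guidance toward the healthy class, and returns the pixel-wise anomaly map. The noise predictor `eps_model`, the time-dependent `classifier`, the cumulative noise schedule `alphas_bar`, and the guidance scale are all assumptions for illustration; the full formulation is given in [13, 14].

```python
import torch

def anomaly_map(x0, eps_model, classifier, alphas_bar, L=500, guidance=1.0):
    """Encode x0 by L deterministic noising steps, then denoise with
    classifier guidance toward the 'healthy' class (index 0 assumed)."""
    x = x0.clone()
    # Deterministic (DDIM-style) forward encoding of anatomical information.
    with torch.no_grad():
        for t in range(L):
            a_t, a_next = alphas_bar[t], alphas_bar[t + 1]
            eps = eps_model(x, torch.tensor([t]))
            x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
            x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    # Reverse denoising with classifier guidance: nudge the predicted noise
    # using the gradient of the 'healthy' log-probability w.r.t. the image.
    for t in reversed(range(L)):
        a_t, a_prev = alphas_bar[t + 1], alphas_bar[t]
        x = x.detach().requires_grad_(True)
        eps = eps_model(x, torch.tensor([t]))
        logits = classifier(x, torch.tensor([t]))
        log_healthy = torch.log_softmax(logits, dim=1)[:, 0].sum()
        grad = torch.autograd.grad(log_healthy, x)[0]
        eps = eps - guidance * (1 - a_t).sqrt() * grad
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    # Pixel-wise anomaly map: large differences mark pathological regions.
    return (x0 - x.detach()).abs()
```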
4 Experimental Work
Random noise was added for L steps to the input image, and sampling was performed using the denoising diffusion probabilistic model with UNet classifier guidance. For all experiments, the two additional hyperparameters were set as follows: the number of steps L to 500 and the number of samples s to 100. Figure 2 shows the example patient images
for all four MRI image sequences from the MSD-Task01 dataset, together with a comparison of the proposed method against the GAN and variational autoencoder approaches. Figure 3 shows the anomaly map for a healthy subject, with no anomaly visible, along with a subject with a small tumor, which the proposed model detects accurately. As can be seen in Fig. 2, the proposed denoising diffusion model performs as well as the GAN, without the expensive data augmentation and complex deep learning architecture design required for the guidance stage. Figure 4 shows additional results for a diseased image and the anomaly map produced, and their close similarity to the ground-truth labels/masks available in the dataset.
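A hypothetical invocation of the sketch above with the reported setting of L = 500 noising steps, followed by a simple thresholding step to obtain a binary tumor mask; the input slice name and the threshold rule are assumptions, not values reported in the chapter.

```python
# Compute a map for one T1Gd slice with the reported L = 500 steps, then
# threshold it into a binary mask (the 2-sigma rule is an assumption).
amap = anomaly_map(slice_t1gd, eps_model, classifier, alphas_bar, L=500)
tumor_mask = (amap > amap.mean() + 2.0 * amap.std()).float()
```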
Fig. 2 Results for an image of the MSD-Task01 dataset for L = 500 and s = 100
Fig. 3 Results for an image of the MSD-Task01 dataset for L = 500 and s = 100 (Top image for
a healthy subject)
Fig. 4 Additional results for diseased subjects from the MSD-Task01 dataset for L = 500 and s =
100
References
6. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, … Sutskever I (2021) Learning
transferable visual models from natural language supervision. In: International conference on
machine learning. PMLR, pp 8748–8763
7. Wang YX, Girshick R, Hebert M, Hariharan B (2018) Low-shot learning from imaginary
data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp
7278–7286
8. Joseph KJ, Khan S, Khan FS, Balasubramanian VN (2021) Towards open world object detec-
tion. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition,
pp 5830–5840
9. Bar A, Wang X, Kantorov V, Reed CJ, Herzig R, Chechik G, … Globerson A (2022) DETReg:
unsupervised pretraining with region priors for object detection. In: Proceedings of the IEEE/
CVF conference on computer vision and pattern recognition, pp 14605–14615
10. Deng J, Shi S, Li P, Zhou W, Zhang Y, Li H (2021) Voxel R-CNN: towards high performance voxel-based 3D object detection. In: Proceedings of the AAAI conference on artificial intelligence,
vol 35, no 2, pp 1201–1209
11. Kang B, Liu Z, Wang X, Yu F, Feng J, Darrell T (2019) Few-shot object detection via feature
reweighting. In: Proceedings of the IEEE/CVF international conference on computer vision,
pp 8420–8429
12. Antonelli M, Reinke A, Bakas S et al (2022) The medical segmentation decathlon. Nat Commun
13:4128. https://doi.org/10.1038/s41467-022-30695-9
13. Nichol AQ, Dhariwal P (2021) Improved denoising diffusion probabilistic models. In: Interna-
tional conference on machine learning. PMLR, pp 8162–8171. https://doi.org/10.48550/arXiv.2102.09672
14. Siddiquee MMR, Zhou Z, Tajbakhsh N, Feng R, Gotway MB, Bengio Y, Liang J (2019)
Learning fixed points in generative adversarial networks: from image-to-image translation to
disease detection and localization. In: Proceedings of the IEEE/CVF international conference
on computer vision, pp 191–200
15. Menze BH et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS).
IEEE Trans Med Imaging 34:1993–2024
16. Bakas S, Reyes M, Int E, Menze B (2018) Identifying the best machine learning algorithms
for brain tumor segmentation, progression assessment, and overall survival prediction in the
BRATS challenge. arXiv preprint arXiv:1811.02629
17. Chen X, Konukoglu E (2018) Unsupervised detection of lesions in brain mri using constrained
adversarial auto-encoders. arXiv preprint arXiv:1806.04972
Internet of Things-Based 3-Lead ECG
Signal Acquisition System
Abstract IoT, also known as the Internet of Things, is a promising technology that enables numerous devices worldwide to stay connected to collect, share, and analyze data, thereby building innovations that can help humanity. The IoT-enabled
system consists of a smart device connected to processors, sensors, and communi-
cation hardware to receive data from the environment, process it into digital signals,
and transfer it to devices for analysis. The Internet of Things-based 3-lead ECG signal acquisition system is an efficient trio of components. It is a patient-centric monitoring system that allows patients to actively collect the real-time rhythmic contraction and relaxation of cardiac muscles as signals and store the data in a database. The data can then be analyzed using personal devices like laptops and smartphones, providing a window for actionable insights in emergencies. In this paper, we have
constructed an IoT-based ECG system by taking into account the underlying mechanism of the ECG. It obtains impulses via electrodes placed at specific locations on the body as input for the AD8232 sensor operating at a supply range of 2–3.5 V. The sensor filters the biopotential signals, amplifies them, and transfers them to the microcontroller (Arduino Uno board). The board obtains the signals from the sensor and, as per the instructions given via the code through the compatible software, processes the signals and sends them as output for visualization by the system connected to the board. The system produces an ECG pattern in the Arduino IDE serial plotter and provides heart rate and heart rate variability. The values can be obtained from the serial monitor and analyzed using different applications. The IoT-based ECG can be incorporated into ambulances so that doctors can remotely diagnose various cardiovascular conditions of patients.
1 Introduction
In recent years, people have experienced a phenomenal change in their way of living due to the pandemic. It has created distance between us, leading to the expansion of the digital space, from the simple task of ringing bells to medical diagnosis, made possible by rapidly advancing technology.
With health being a strong matter of concern in recent years, monitoring, balancing, and maintaining health is very important, and friendly, simple, affordable, and compatible in-hand technology is a savior. These devices can teach, help, and learn about the human body and its behavior. The Internet of Things, or IoT, has the capability of connecting billions of devices together and allowing them to collect, process, and share data, thereby utilizing the information to simplify major tasks. This technology also supports cloud accessibility, giving people all over the world access to integrate their ideas.
The technology of IoT can be applied to the field of medicine, where healthcare professionals can diagnose and prescribe treatment to a patient in an emergency at a faraway location. Using the technology at hand, an ECG device has been built to record, store, and process data so as to keep a real-time check on heart rhythm. The prototype created uses personal devices like desktops, laptops, and smartphones, which serve as an interface between the hardware and the database. A connection is established between the sensor, microcontroller, and personal device, through which ECG data are received and visualized.
2 Theoretical Framework
An increasing population creates an urgent need for rapid tests. The easiest test to check the functionality of the heart is the ECG, also known as the electrocardiogram, which records heartbeats, rate, and rhythm over time as the action potential propagates throughout the heart with every cycle of systole and diastole. There is a standard ECG pattern (Fig. 1) produced by healthy individuals, consisting of
• P wave—the first peak, produced by the excitation of the two atria
• QRS wave—shows ventricular contraction (also known as the QRS complex)
• T wave—shows ventricular relaxation once the electrical impulse stops spreading
The Internet of Things-based 3-lead acquisition system calculates the heart rate, heart rate variability, and ECG pattern of a person and sends them to the cloud. Using the system, users can monitor their health-related parameters. This system can also be integrated into an ambulance, wherein all the critical health-related parameters of patients can be acquired and sent to the cloud, through which clinicians can analyze the condition in advance. The prime objective of 3-lead ECG systems is to acquire physiological parameters using sensors for various purposes.
The heart muscle generates an action potential via self-stimulating tissue whose stimulus flows throughout the organ. At the time of depolarization, the action potential travels through the cardiac muscle, and the body conducts it to the surface. These signals can be captured, amplified, and recorded.
The Internet of Things-based 3-lead ECG signal acquisition system works on a voltage of 3.3 V supplied by the computer or electrical device connected to it, and the sensor picks up the suitable potential from the electrical signals generated by the heart via the electrodes placed at specific positions on the body to create an Einthoven triangle (Fig. 2a) [1]. The sum of the projections of the frontal-plane cardiac vector at any instant onto the three axes of the Einthoven triangle is zero.
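With the standard lead polarities, this relation is usually stated as Einthoven's law for the three bipolar limb leads,

$$V_{\mathrm{II}} = V_{\mathrm{I}} + V_{\mathrm{III}},$$

so any one lead voltage can be recovered from the other two.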
Bipolar limb leads, also known as classical limb leads, measure the potential between two active electrodes placed at defined positions on two limbs and represent the algebraic sum of the potentials of the two constituent active electrodes (Fig. 2b).
In a healthy human, the heart rate ranges from 60 to 100 bpm. A lower heart rate at rest indicates more efficient heart functioning and better cardiovascular condition. For example, a well-trained cricketer might have a normal resting heart rate close to 40 bpm.
Heart rate variability refers to the fluctuation in the time intervals between heartbeats. Even though these fluctuations are subtle, they can indicate current or future heart-related problems and can also indicate mental health issues like anxiety, stress, and depression, which affect cardiac activity [2].
The 3-lead IoT-based ECG system measures heart rate and heart rate variability from the detected R-R intervals: the heart rate in bpm is 60 divided by the R-R interval in seconds, while heart rate variability summarizes the beat-to-beat variation of successive R-R intervals.
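As a concrete illustration, a minimal Python sketch of these computations from detected R-R intervals is given below; the chapter does not state which HRV statistic its formula uses, so RMSSD, a common time-domain measure, is assumed here.

```python
rr_ms = [820, 810, 835, 815, 825]  # illustrative R-R intervals in milliseconds

# Heart rate: 60000 ms per minute divided by the mean R-R interval in ms.
heart_rate_bpm = 60000 / (sum(rr_ms) / len(rr_ms))

# HRV as RMSSD: root mean square of successive R-R differences (assumption).
diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
rmssd_ms = (sum(d * d for d in diffs) / len(diffs)) ** 0.5

print(f"HR = {heart_rate_bpm:.1f} bpm, HRV (RMSSD) = {rmssd_ms:.1f} ms")
```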
The electrodes used to sense the heart rate in the form of the Einthoven triangle are bipolar limb leads, a form of classical limb leads. These leads measure the potential using two active electrodes placed on any two limbs and represent the algebraic sum of the potentials of the two constituent active electrodes (Table 1).
• ECG sensor AD8232—the major component of the system, responsible for collecting physiological data from the body and transferring it to the microcontroller. The sensor is an integrated chip comprising a specialized instrumentation amplifier (IA), an operational amplifier (A1), a right-leg drive amplifier (A2), and a midsupply reference buffer (A3). The sensor includes leads-off detection circuitry and an automatic fast-restore circuit that brings back the signal shortly after the leads are reconnected [3].
• Arduino Uno board—a microcontroller board based on the ATmega328P. It has 14 digital input/output pins (of which 6 can be used as PWM outputs), 6 analog inputs, a 16 MHz ceramic resonator, a USB connection, a power jack, an ICSP header, and a reset button. The Uno contains everything needed to support the microcontroller; simply connect it to a computer with a USB cable or power it with an AC-to-DC adapter or battery to get started [4].
• Bipolar electrodes—connected to the AD8232 sensor. There are 3 ECG nodes (positive—red, negative—yellow, and neutral—green). The sensor was connected to the Arduino using jumper wires in the following manner (Libelium has made the sensor compatible with the Arduino Uno). A USB cable was used to plug the Arduino board into the computer for uploading the code and displaying the output [5] (Fig. 4).
3.2 Software
The Arduino integrated development environment (IDE) version 1.8.19 was used to write the code and give commands to the Arduino board. The program code was written in C++, along with which a library called PulseSensor Playground was installed, which helped us obtain the output. To capture data, the Arduino IDE [6] and CoolTerm software [7] were used. The graph obtained from the Arduino serial plotter was compared with the graph in Google Sheets formed using data obtained from the Arduino serial monitor. The flow chart (Fig. 5) clearly demonstrates the path taken by the signals, their conversions, and their representation. The signals acquired from the body were amplified, converted to digital form, and processed by the commands to build a system that can take biological signals and produce heart rate and heart rate variability.
We tested the prototype system by fitting 3 electrodes: two were fitted on the wrists and one on the right foot. The ECG sensor was connected to the sensor platform and the Arduino Uno. The program code was uploaded to the Arduino Uno chip, and the USB cable was connected to the Arduino Uno. Graphs were obtained through the Arduino serial plotter, and data obtained from the Arduino serial monitor were extracted using the CoolTerm application; the extracted data were then converted to a CSV file and uploaded to Google Sheets to obtain graphs.
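For readers who prefer scripting this capture step, the following minimal sketch (assuming the pyserial package) logs the serial values to a CSV file directly. The port name matches the COM 3 port mentioned in the results below, while the baud rate and sample count are placeholders that must match the Arduino sketch.

```python
import csv
import serial  # pip install pyserial

# "COM3" follows the COM 3 port named in the text; 9600 baud is an assumption
# and must match the Serial.begin() call in the Arduino program code.
with serial.Serial("COM3", 9600, timeout=1) as ser, \
        open("ecg_capture.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sensor_value"])
    for _ in range(5000):  # capture a fixed number of readings
        raw = ser.readline().decode(errors="ignore").strip()
        if raw:  # each line holds one sensor voltage value (about 450-700)
            writer.writerow([raw])
```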
All data are captured as numbers referring to the sensor voltage value, which ranged from 450 to 700. The system acquired signals from the body surface by a nonpolarized process and produced a signal showing a clear depiction of waves with the following characteristics (Fig. 6).
The 3-lead IoT-based ECG system recorded the electrical activity of the heart at rest and provided information about the heart rate and its contractions, indicating that the heartbeats were normal. The signals generated by the Arduino serial plotter (COM 3) were the same as the standard ECG signals (PQRST waves) that appear on an electrocardiogram machine. The P waves, QRS complex, and T wave were clearly distinguishable in the graph obtained from the data received from the system.
In the result (Fig. 7), the P wave obtained was a positive wave with a duration of less than or equal to 0.12 s. As the blood flows from the atria to the ventricles, an interval of about 0.1 to 0.2 s before ventricular contraction is seen, indicating a normal PR interval. This is followed by a negative wave (Q wave) and an R wave of highest amplitude, representing ventricular contraction. When the Purkinje fibers depolarize, a long negative wave known as the S wave is produced. At the junction between the end of depolarization and the appearance of repolarization lies a significant point known as the J point, which is useful in identifying metabolic disorders (its elevation indicates hypothermia). After the J point there is an increase in amplitude, and an upright T wave appears, representing the repolarization of the ventricles (the shape of this wave depends on the body's control and regulation mechanisms). Between the S and T waves lies the ST segment.
The regular time interval between two R waves is denoted as the R-R interval. Variation in this interval can indicate abnormal functioning of the nerves, and these signals are also affected by stress, body cycles, and hormonal levels. The results in the above graph depict a regular R-R interval, showcasing normal heart activity. It was also ensured that the person was at rest and calm while their heart activity was being monitored.
A short, positive, asymmetric P wave appears due to atrial contraction, followed by the PR segment and a short QRS complex caused by rapid depolarization of the ventricles, indicating proper functioning of the system.
Fig. 7 Marking of the PQRST peaks in the output obtained from the prototype and depiction of R
interval
This paper presents a prototype of an IoT-based 3-lead acquisition system and its implementation as a healthcare monitoring system, based on knowledge of the electrocardiogram and IoT technology. The system provides continuous monitoring of cardiac activity and related diseases and can be used personally at home or in emergency situations. The extracted data are updated to the database at regular intervals and can be analyzed by clinicians, patients, and their caretakers to monitor the health condition and take appropriate action. The system can measure heart rate and heart rate variability, which help in identifying different cardiac conditions. Data extracted from the prototype, along with the ECG pattern, can be easily interpreted by a person, and the system can be incorporated with existing portable technology, making it user-friendly and convenient for daily use. The data obtained from the acquisition system can be uploaded or sent directly to doctors via different software for continuous monitoring of the patient's health. Accuracy can be increased by adding more electrodes, which generate more signals and therefore improve performance. The obtained data can be stored in a database for future reference or for analyzing a person's heart health from previous records.
References
1. Ghia CL (2007) Textbook of physiology (7th ed). Jaypee Publication, New Delhi, pp 213–215
2. Circuits Digest: https://circuitdigest.com/microcontroller-projects/understanding-ecg-sensor-
and-program-ad8232-ecg-sensor-with-arduino-to-diagnose-various-medical-conditions. Last
accessed on 10 Sept 2022
3. Analog Devices: https://www.analog.com/media/en/technical-documentation/data-sheets/ad8232.pdf. Last accessed on 21 Aug 2022
4. Arduino: https://docs.arduino.cc/hardware/uno-rev3. Last accessed on 5 Sept 2022
5. Microcontrollers Lab: https://microcontrollerslab.com/ad8232-ecg-module-pinout-interfacing-with-arduino-applications-features/. Last accessed on 15 Aug 2022
6. Arduino: https://www.arduino.cc/en/software. Last accessed on 18 Sept 2022
7. CoolTerm: https://coolterm.en.lo4d.com/windows. Last accessed on 18 Sept 2022
Index
B
Barenya Bikash Hazarika, 237, 317
Bhavya Sri, A., 237
Bhumika Papnai, 27
Brojo Kishore Mishra, 171

C
Chandni Agarwal, 327
Charu Agarwal, 39
Charul Bhatnagar, 327
Ch. Bhavya Sri, 147
Chetty, Girija, 385, 395
Chhabra, Amitabh, 225

D
Daiss, Ivelina, 225
Daya Bhardwaj, 405

J
Joanne Gomes, 211
Jyoti Chauhan, 1

K
Kapil, 15, 269
Kaur, Nancy, 385, 395
Khoirom Motilal Singh, 247, 317
Kirti Jain, 59

L
Lathish, R., 237

M
Mahesh Gawande, 109
Mangesh S. Thakare, 121