
Lecture Notes in Networks and Systems 173

S. Smys
Valentina Emilia Balas
Khaled A. Kamel
Pavel Lafata   Editors

Inventive
Computation
and Information
Technologies
Proceedings of ICICIT 2020
Lecture Notes in Networks and Systems

Volume 173

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering,
University of Illinois at Chicago, Chicago, USA;
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering,
University of Alberta, Alberta, Canada;
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems and other. Of particular value to both
the contributors and the readership are the short publication timeframe and the
world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/15179


S. Smys · Valentina Emilia Balas · Khaled A. Kamel · Pavel Lafata

Editors

Inventive Computation
and Information
Technologies
Proceedings of ICICIT 2020

Editors

S. Smys
RVS Technical Campus
Coimbatore, India

Valentina Emilia Balas
“Aurel Vlaicu” University of Arad
Arad, Romania

Khaled A. Kamel
Computer Science Department
Texas Southern University
Houston, TX, USA

Pavel Lafata
Czech Technical University
Praha, Czech Republic

ISSN 2367-3370  ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-33-4304-7 ISBN 978-981-33-4305-4 (eBook)
https://doi.org/10.1007/978-981-33-4305-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
We are honored to dedicate this book to all
the participants and editors of ICICIT 2020.
Preface

This conference proceedings volume contains the written versions of most of the
contributions presented during ICICIT 2020. The conference provided a setting for
discussing recent developments in a wide variety of topics including cloud computing,
artificial intelligence, and fuzzy neural systems. The conference has been a good
opportunity for participants coming from various destinations to present and discuss
topics in their respective research areas.
This conference aimed to collect the latest research results and applications on
computation technology, information, and control engineering. It includes a
selection of 71 papers out of the 266 papers submitted to the conference from univer-
sities and industries all over the world. All of the accepted papers were subjected to
strict peer review by two to four expert referees. The papers have been selected for
this volume for their quality and relevance to the conference.
We would like to express our sincere appreciation to all authors for their
contributions to this book. We would like to extend our thanks to all the referees for
their constructive comments on all papers; in particular, we would like to thank the
organizing committee for their hard work. Finally, we would like to thank
Springer for producing this volume.

Coimbatore, India      S. Smys
Arad, Romania      Valentina Emilia Balas
Houston, USA      Khaled A. Kamel
Praha, Czech Republic      Pavel Lafata

Contents

A Heuristic Algorithm for Deadline-Based Resource Allocation
in Cloud Using Modified Fish Swarm Algorithm . . . . . . . . . . . . . . . . . . 1
J. Uma, P. Vivekanandan, and R. Mahaveerakannan
Dynamic Congestion Control Routing Algorithm for Energy
Harvesting in MANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
M. M. Karthikeyan and G. Dalin
Predictable Mobility-Based Routing Protocol in Wireless Sensor
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
G. Sophia Reena and M. Punithavalli
Novel Exponential Particle Swarm Optimization Technique
for Economic Load Dispatch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Nayan Bansal, Surendrabikram Thapa, Surabhi Adhikari,
Avinash Kumar Jha, Anubhav Gaba, and Aayush Jha
Risk Index-Based Ventilator Prediction System
for COVID-19 Infection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Amit Bhati
IoT-Based Smart Door Lock with Sanitizing System . . . . . . . . . . . . . . . 63
M. Shanthini and G. Vidya
Aspect-Based Sentiment Analysis in Hindi: Comparison
of Machine/Deep Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 81
T. Sai Aparna, K. Simran, B. Premjith, and K. P. Soman
Application of Whale Optimization Algorithm in DDOS Attack
Detection and Feature Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
P. Ravi Kiran Varma, K. V. Subba Raju, and Suresh Ruthala
Social Media Data Analysis: Twitter Sentimental Analysis on Kerala
Floods Using R Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Madhavi Katamaneni, Geeta Guttikonda, and Madhavi Latha Pandala

Intrusion Detection Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . 113
Sanjay Patidar and Inderpreet Singh Bains
Secure Trust-Based Group Key Generation Algorithm
for Heterogeneous Mobile Wireless Sensor Networks . . . . . . . . . . . . . . . 127
S. Sabena, C. Sureshkumar, L. Sai Ramesh, and A. Ayyasamy
A Study on Machine Learning Methods Used for Team Formation
and Winner Prediction in Cricket . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Manoj S. Ishi and J. B. Patil
Machine Learning-Based Intrusion Detection System with Recursive
Feature Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Akshay Ramesh Bhai Gupta and Jitendra Agrawal
An Optical Character Recognition Technique for Devanagari Script
Using Convolutional Neural Network and Unicode Encoding . . . . . . . . 173
Vamsi Krishna Kikkuri, Pavan Vemuri, Srikar Talagani, Yashwanth Thota,
and Jayashree Nair
A Machine Learning-Based Multi-feature Extraction Method
for Leather Defect Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Malathy Jawahar, L. Jani Anbarasi, S. Graceline Jasmine,
Modigari Narendra, R. Venba, and V. Karthik
Multiple Sclerosis Disorder Detection Through Faster Region-Based
Convolutional Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Shrawan Ram and Anil Gupta
Retaining Named Entities for Headline Generation . . . . . . . . . . . . . . . . 221
Bhavesh Singh, Amit Marathe, Ali Abbas Rizvi, and Abhijit R. Joshi
Information Hiding Using Quantum Image Processing State
of Art Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
S. Thenmozhi, K. BalaSubramanya, S. Shrinivas,
Shashank Karthik D. Joshi, and B. Vikas
Smart On-board Vehicle-to-Vehicle Interaction Using Visible Light
Communication for Enhancing Safety Driving . . . . . . . . . . . . . . . . . . . . 247
S. Satheesh Kumar, S. Karthik, J. S. Sujin, N. Lingaraj, and M. D. Saranya
A Novel Machine Learning Based Analytical Technique for Detection
and Diagnosis of Cancer from Medical Data . . . . . . . . . . . . . . . . . . . . . 259
Vasundhara and Suraiya Parveen
Instrument Cluster Design for an Electric Vehicle Based on CAN
Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
L. Manickavasagam, N. Krishanth, B. Atul Shrinath, G. Subash,
S. R. Mohanrajan, and R. Ranjith
Ant Colony Optimization: A Review of Literature and Application
in Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Nandini Nayar, Shivani Gautam, Poonam Singh, and Gaurav Mehta
Hand Gesture Recognition Under Multi-view Cameras Using Local
Image Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
Kiet Tran-Trung and Vinh Truong Hoang
Custom IP Design for Fault-Tolerant Digital Filters for High-Speed
Imaging Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
Somashekhar Malipatil, Avinash Gour, and Vikas Maheshwari
A Novel Focused Crawler with Anti-spamming Approach & Fast
Query Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Ritu Sachdeva and Sachin Gupta
A Systematic Review of Log-Based Cloud Forensics . . . . . . . . . . . . . . . 333
Atonu Ghosh, Debashis De, and Koushik Majumder
Performance Analysis of K-ELM Classifiers with the State-of-Art
Classifiers for Human Action Recognition . . . . . . . . . . . . . . . . . . . . . . . 349
Ratnala Venkata Siva Harish and P. Rajesh Kumar
Singular Value Decomposition-Based High-Resolution Channel
Estimation Scheme for mmWave Massive MIMO with Hybrid
Precoding for 5G Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
V. Baranidharan, N. Praveen Kumar, K. M. Naveen, R. Prathap,
and K. P. Nithish Sriman
Responsible Data Sharing in the Digital Economy: Big Data
Governance Adoption in Bancassurance . . . . . . . . . . . . . . . . . . . . . . . . 379
Sunet Eybers and Naomi Setsabi
A Contextual Model for Information Extraction in Resume Analytics
Using NLP’s Spacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Channabasamma, Yeresime Suresh, and A. Manusha Reddy
Pediatric Bone Age Detection Using Capsule Network . . . . . . . . . . . . . . 405
Anant Koppar, Siddharth Kailasam, M. Varun, and Iresh Hiremath
Design High-Frequency and Low-Power 2-D DWT Based
on 9/7 and 5/3 Coefficient Using Complex Multiplier . . . . . . . . . . . . . . . 421
Satyendra Tripathi, Bharat Mishra, and Ashutosh Kumar Singh
Fuzzy Expert System-Based Node Trust Estimation in Wireless
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
K. Selvakumar and L. Sai Ramesh
Artificial Neural Network-Based ECG Signal Classification
and the Cardiac Arrhythmia Identification . . . . . . . . . . . . . . . . . . . . . . 445
M. Ramkumar, C. Ganesh Babu, G. S. Priyanka, B. Maruthi Shankar,
S. Gokul Kumar, and R. Sarath Kumar
CDS-Based Routing in MANET Using Q Learning with Extended
Episodic Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
D. S. John Deva Prasanna, D. John Aravindhar, and P. Sivasankar
A Graphical User Interface Based Heart Rate Monitoring Process
and Detection of PQRST Peaks from ECG Signal . . . . . . . . . . . . . . . . . 481
M. Ramkumar, C. Ganesh Babu, A. Manjunathan, S. Udhayanan,
M. Mathankumar, and R. Sarath Kumar
Performance Analysis of Self Adaptive Equalizers Using Nature
Inspired Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
N. Shwetha and Manoj Priyatham
Obstacle-Aware Radio Propagation and Environmental Model
for Hybrid Vehicular Ad hoc Network . . . . . . . . . . . . . . . . . . . . . . . . . . 513
S. Shalini and Annapurna P. Patil
Decision Making Among Online Product in E-Commerce
Websites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
E. Rajesh Kumar, A. Aravind, E. Jotheeswar Raghava, and K. Abhinay
A Descriptive Analysis of Data Preservation Concern and Objections
in IoT-Enabled E-Health Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Anuj Kumar
Applying Deep Learning Approach for Wheat Rust Disease Detection
Using MosNet Classification Technique . . . . . . . . . . . . . . . . . . . . . . . . . 551
Mosisa Dessalegn Olana, R. Rajesh Sharma, Akey Sungheetha,
and Yun Koo Chung
A Decision Support Tool to Select Candidate Business Processes
in Robotic Process Automation (RPA): An Empirical Study . . . . . . . . . 567
K. V. Jeeva Padmini, G. I. U. S. Perera, H. M. N. Dilum Bandara,
and R. K. Omega H. Silva
Feature-Wise Opinion Summarization of Consumer Reviews
Using Domain Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
Dushyanthi Vidanagama, Thushari Silva, and Asoka S. Karunananda
Machine Learning-Based Approach for Opinion Mining
and Sentiment Polarity Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
H. K. S. K. Hettikankanama, Shanmuganathan Vasanthapriyan,
and Kapila T. Rathnayake
Early Detection of Diabetes by Iris Image Analysis . . . . . . . . . . . . . . . . 615
P. H. A. H. K. Yashodhara and D. D. M. Ranasinghe
A Novel Palmprint Cancelable Scheme Based on Orthogonal IOM . . . . 633
Xiyu Wang, Hengjian Li, and Baohua Zhao
Shape-Adaptive RBF Neural Network for Model-Based Nonlinear
Controlling Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
Kudabadu J. C. Kumara and M. H. M. R. S. Dilhani
Electricity Load Forecasting Using Optimized Artificial Neural
Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665
M. H. M. R. Shyamali Dilhani, N. M. Wagarachchi,
and Kudabadu J. C. Kumara
Object Detection in Surveillance Using Deep Learning Methods:
A Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677
Dharmender Saini, Narina Thakur, Rachna Jain, Preeti Nagrath,
Hemanth Jude, and Nitika Sharma
MaSMT4: The AGR Organizational Model-Based Multi-Agent
System Development Framework for Machine Translation . . . . . . . . . . 691
Budditha Hettige, Asoka S. Karunananda, and George Rzevski
Comparative Study of Optimized and Robust Fuzzy Controllers
for Real Time Process Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
Ajay B. Patil and R. H. Chile
Ant Colony Optimization-Based Solution for Finding Trustworthy
Nodes in a Mobile Ad Hoc Network . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
G. M. Jinarajadasa and S. R. Liyanage
Software Development for the Prototype of the Electrical Impedance
Tomography Module in C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729
A. A. Katsupeev, G. K. Aleksanyan, N. I. Gorbatenko, R. K. Litvyak,
and E. O. Kombarova
Information Communication Enabled Technology for the Welfare
of Agriculture and Farmer’s Livelihoods Ecosystem in Keonjhar
District of Odisha as a Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
Bibhu Santosh Behera, Rahul Dev Behera, Anama Charan Behera,
Rudra Ashish Behera, K. S. S. Rakesh, and Prarthana Mohanty
CHAIN: A Naive Approach of Data Analysis to Enhance
Market Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
Priya Matta, Sparsh Ahuja, Vishisth Basnet, and Bhasker Pant
Behavioural Scoring Based on Social Activity and Financial
Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
Anmol Gupta, Sanidhya Pandey, Harsh Krishna, Subham Pramanik,
and P. Gouthaman
An Optimized Method for Segmentation and Classification of Apple
Leaf Diseases Based on Machine Learning . . . . . . . . . . . . . . . . . . . . . . . 781
Shaurya Singh Slathia, Akshat Chhajer, and P. Gouthaman
A Thorough Analysis of Machine Learning and Deep Learning
Methods for Crime Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
J. Jeyaboopathiraja and G. Maria Priscilla
Improved Density-Based Learning to Cluster for User Web Log
in Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813
N. V. Kousik, M. Sivaram, N. Yuvaraj, and R. Mahaveerakannan
Spatiotemporal Particle Swarm Optimization with Incremental Deep
Learning-Based Salient Multiple Object Detection . . . . . . . . . . . . . . . . . 831
M. Indirani and S. Shankar
Election Tweets Prediction Using Enhanced Cart and Random
Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
Ambati Jahnavi, B. Dushyanth Reddy, Madhuri Kommineni,
Anandakumar Haldorai, and Bhavani Vasantha
Flexible Language-Agnostic Framework To Emit Informative
Compile-Time Error Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
Malathy Nagalakshmi, Tanya Sharma, and N. S. Kumar
Enhancing Multi-factor User Authentication for Electronic
Payments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
Md Arif Hassan, Zarina Shukur, and Mohammad Kamrul Hasan
Comparative Analysis of Machine Learning Algorithms for Phishing
Website Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 883
Dhiman Sarma, Tanni Mittra, Rose Mary Bawm, Tawsif Sarwar,
Farzana Firoz Lima, and Sohrab Hossain
Toxic Comment Classification Implementing CNN Combining Word
Embedding Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 897
Monirul Islam Pavel, Razia Razzak, Katha Sengupta,
Md. Dilshad Kabir Niloy, Munim Bin Muqith, and Siok Yee Tan
A Comprehensive Investigation About Video Synopsis Methodology
and Research Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 911
Swati Jagtap and Nilkanth B. Chopade
Effective Multimodal Opinion Mining Framework Using Ensemble
Learning Technique for Disease Risk Prediction . . . . . . . . . . . . . . . . . . 925
V. J. Aiswaryadevi, S. Kiruthika, G. S. Priyanka, N. Nataraj,
and M. S. Sruthi
Vertical Fragmentation of High-Dimensional Data Using Feature
Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935
Raji Ramachandran, Gopika Ravichandran, and Aswathi Raveendran
Extrapolation of Futuristic Application of Robotics: A Review . . . . . . . 945
D. V. S. Pavan Karthik and S. Pranavanand
AI-Based Digital Marketing Strategies—A Review . . . . . . . . . . . . . . . . 957
B. R. Arun Kumar
NoRegINT—A Tool for Performing OSINT and Analysis
from Social Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
S. Karthika, N. Bhalaji, S. Chithra, N. Sri Harikarthick,
and Debadyuti Bhattacharya

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 981


Editors and Contributors

About the Editors

Dr. S. Smys received his M.E. and Ph.D. degrees, both in Wireless Communication
and Networking, from Anna University and Karunya University, India. His main
area of research is localization and routing architecture in wireless networks.
He serves as Associate Editor of Computers and Electrical Engineering
(C&EE) Journal, Elsevier, and Guest Editor of MONET Journal, Springer. He has
served as Reviewer for IET, Springer, Inderscience and Elsevier journals. He has
published many research articles in refereed journals and IEEE conferences. He
has been General Chair, Session Chair, TPC Chair and Panelist in several con-
ferences. He is a member of IEEE and a senior member of IACSIT wireless
research group. He has been serving as Organizing Chair and Program Chair of
several international conferences, and in the Program Committees of several
international conferences. Currently, he is working as Professor in the Department
of Information Technology at RVS Technical Campus, Coimbatore, India.

Dr. Valentina Emilia Balas is currently Full Professor at “Aurel Vlaicu”
University of Arad, Romania. She is the author of more than 300 research papers.
Her research interests are in intelligent systems, fuzzy control and soft computing.
She is Editor-in-Chief to International Journal of Advanced Intelligence Paradigms
(IJAIP) and to IJCSE. Dr. Balas is a member of EUSFLAT, ACM and a SM IEEE,
a member in TC – EC and TC-FS (IEEE CIS), TC – SC (IEEE SMCS), and Joint
Secretary FIM.

Dr. Khaled A. Kamel is currently Chairman and Professor at Texas Southern
University, College of Science and Technology, Department of Computer Science,
Houston, TX. He has published many research articles in refereed journals and IEEE
conferences. He has more than 30 years of teaching and research experience. He has

been General Chair, Session Chair, TPC Chair and Panelist in several conferences
and acted as Reviewer and Guest Editor in refereed journals. His research interest
includes networks, computing and communication systems.

Dr. Pavel Lafata received his M.Sc. degree in 2007 and the Ph.D. degree in 2011
from the Department of Telecommunication Engineering, Faculty of Electrical
Engineering, Czech Technical University in Prague (CTU in Prague). He is now
Assistant Professor at the Department of Telecommunication Engineering of the
CTU in Prague. Since 2007, he has been actively cooperating with several leading
European manufacturers of telecommunication cables and optical network com-
ponents performing field and laboratory testing of their products as well as con-
sulting further research in this area. He also cooperates with many impact journals
as Fellow Reviewer, such as International Journal of Electrical Power & Energy
Systems, Elektronika ir Elektrotechnika, IEEE Communications Letters, Recent
Patents on Electrical & Electronic Engineering, International Journal of Emerging
Technologies in Computational and Applied Sciences and China Communications.

Contributors

K. Abhinay Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India
Surabhi Adhikari Department of Computer Science and Engineering, Delhi
Technological University, New Delhi, India
Jitendra Agrawal School of Information Technology, RGPV, Bhopal, India
Sparsh Ahuja Computer Science and Engineering, Graphic Era University,
Dehradun, India
V. J. Aiswaryadevi Dr NGP Institute of Technology, Coimbatore, India
G. K. Aleksanyan Department of Informational and Measurement Systems and
Technologies, Platov South-Russian State Polytechnic University (NPI),
Novocherkassk, Russia
A. Aravind Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India
Md Arif Hassan Faculty of Information Technology, Center for Cyber Security,
National University Malaysia (UKM), Bangi, Selangor, Malaysia
B. R. Arun Kumar Department of Master of Computer Applications, BMS
Institute of Technology and Management (Affiliated to Vivesvaraya Technological
University, Belagavi), Bengaluru, Karnataka, India
B. Atul Shrinath Department of Electrical and Electronics Engineering, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
A. Ayyasamy Department of Computer Engineering, Government Polytechnic
College, Nagercoil, Tamil Nadu, India
Inderpreet Singh Bains Delhi Technological University, New Delhi, India
K. BalaSubramanya ECE, Dayananda Sagar College of Engineering, Bangalore,
India
Nayan Bansal Department of Electrical Engineering, Delhi Technological
University, New Delhi, India
V. Baranidharan Department of Electronics and Communication Engineering,
Bannari Amman Institute of Technology, Sathy, India
Vishisth Basnet Computer Science and Engineering, Graphic Era University,
Dehradun, India
Rose Mary Bawm Department of Computer Science and Engineering, East Delta
University, Chittagong, Bangladesh
Anama Charan Behera Faculty Green College, Odisha, India
Bibhu Santosh Behera OUAT, Bhubaneswar, Odisha, India;
International Researcher, LIUTEBM University, Lusaka, Zambia
Rahul Dev Behera OUAT, Bhubaneswar, Odisha, India
Rudra Ashish Behera Faculty Green College, Odisha, India
N. Bhalaji Department of Information Technology, SSN College of Engineering,
Kalavakkam, Chennai, Tamil Nadu, India
Amit Bhati Institute of Engineering and Technology, Dr. RML Awadh
University, Ayodhya, UP, India
Debadyuti Bhattacharya Department of Information Technology, SSN College
of Engineering, Kalavakkam, Chennai, Tamil Nadu, India
Channabasamma VNRVJIET, Hyderabad, India
Akshat Chhajer Department of Information Technology, School of Computing,
SRM Institute of Science and Technology, Kattankulathur, Chennai, India
R. H. Chile Department of Electrical Engineering, SGGS Institute of Engineering
and Technology, Vishnupuri, Nanded, M.S, India
S. Chithra Department of Information Technology, SSN College of Engineering,
Kalavakkam, Chennai, Tamil Nadu, India
Nilkanth B. Chopade Department of Electronics and Telecommunication, Pimpri
Chinchwad College of Engineering, Pune, India
Yun Koo Chung Department of Computer Science and Engineering, School of
Electrical Engineering and Computing, Adama Science and Technology University,
Adama, Ethiopia
G. Dalin Associate Professor, PG and Research Department of Computer Science,
Hindusthan College of Arts & Science, Coimbatore, Tamil Nadu, India
Debashis De Department of Computer Science and Engineering, Maulana Abul
Kalam Azad University of Technology, Kolkata, West Bengal, India
M. H. M. R. S. Dilhani Department of Interdisciplinary Studies, Faculty of
Engineering, University of Ruhuna, Galle, Sri Lanka
H. M. N. Dilum Bandara Department of Computer Science and Engineering,
University of Moratuwa, Moratuwa, Sri Lanka
B. Dushyanth Reddy Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Sunet Eybers University of Pretoria, Hatfield, Pretoria, South Africa
Anubhav Gaba Department of Electrical Engineering, Delhi Technological
University, New Delhi, India
C. Ganesh Babu Department of Electronics and Communication Engineering,
Bannari Amman Institute of Technology, Sathyamangalam, India
Shivani Gautam Department of Computer Science and Applications, Chitkara
University School of Computer Applications, Chitkara University, Himachal
Pradesh, India
Atonu Ghosh Department of Computer Science and Engineering, Maulana Abul
Kalam Azad University of Technology, Kolkata, West Bengal, India
S. Gokul Kumar Department of Technical Supply Chain, Ros Tech (A & D),
Bengaluru, Karnataka, India
N. I. Gorbatenko Department of Informational and Measurement Systems and
Technologies, Platov South-Russian State Polytechnic University (NPI),
Novocherkassk, Russia
Avinash Gour Department of Electronics & Communication Engineering, Sri
Satya Sai University of Technology & Medical Sciences (SSSUTMS), Sehore,
Madhya Pradesh, India
P. Gouthaman Department of Information Technology, School of Computing,
SRM Institute of Science and Technology, Kattankulathur, Chennai, India
S. Graceline Jasmine School of Computer Science and Engineering, VIT
University, Chennai, India
Akshay Ramesh Bhai Gupta School of Information Technology, RGPV, Bhopal,
India
Anil Gupta Department of Computer Science and Engineering, MBM
Engineering College, Jai Narain Vyas University, Jodhpur, Rajasthan, India
Anmol Gupta Department of Information Technology, School of Computing,
SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Sachin Gupta Department of Computer Science, MVNU, Palwal, India
Geeta Guttikonda Department of IT, VRSEC, Vijayawada, India
Anandakumar Haldorai Department of Computer Science and Engineering, Sri
Eshwar College of Engineering, Coimbatore, Tamil Nadu, India
Ratnala Venkata Siva Harish Department of Electronics and Communications
Engineering, Au College of Engineering (Autonomous), Visakhapatnam,
Andhrapradesh, India
Budditha Hettige Department of Computational Mathematics, Faculty of
Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
H. K. S. K. Hettikankanama Department of Computing and Information
Systems, Faculty of Applied Sciences, Sabaragamuwa University of Sri Lanka,
Balangoda, Sri Lanka
Iresh Hiremath Computer Science Engineering Department, Engineering
Department, PES University, Bengaluru, Karnataka, India
Vinh Truong Hoang Ho Chi Minh City Open University, Ho Chi Minh City,
Vietnam
Sohrab Hossain Department of Computer Science and Engineering, East Delta
University, Chittagong, Bangladesh
M. Indirani Assistant Professor, Department of IT, Hindusthan College of
Engineering and Technology, Coimbatore, India
Manoj S. Ishi Department of Computer Engineering, R. C. Patel Institute of
Technology, Shirpur, MS, India
Swati Jagtap Department of Electronics and Telecommunication, Pimpri
Chinchwad College of Engineering, Pune, India
Ambati Jahnavi Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Rachna Jain Department of Computer Science Engineering, Bharati Vidyapeeth’s
College of Engineering, New Delhi, India
L. Jani Anbarasi School of Computer Science and Engineering, VIT University,
Chennai, India
Malathy Jawahar Leather Process Technology Division, CSIR-Central Leather
Research Institute, Adyar, Chennai, India
K. V. Jeeva Padmini Department of Computer Science and Engineering,
University of Moratuwa, Moratuwa, Sri Lanka
J. Jeyaboopathiraja Research Scholar, Department of Computer Science, Sri
Ramakrishna College of Arts and Science, Coimbatore, India
Aayush Jha Department of Civil Engineering, Delhi Technological University,
New Delhi, India
Avinash Kumar Jha Department of Civil Engineering, Delhi Technological
University, New Delhi, India
G. M. Jinarajadasa University of Kelaniya, Kelaniya, Sri Lanka
D. John Aravindhar CSE, HITS, Chennai, India
D. S. John Deva Prasanna CSE, HITS, Chennai, India
Abhijit R. Joshi Department of Information Technology, D.J. Sanghvi College of
Engineering, Mumbai, India
Shashank Karthik D. Joshi ECE, Dayananda Sagar College of Engineering,
Bangalore, India
E. Jotheeswar Raghava Department of Computer Science and Engineering,
Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra
Pradesh, India
Hemanth Jude Department of ECE, Karunya University, Coimbatore, Tamil
Nadu, India
Siddharth Kailasam Computer Science Engineering Department, Engineering
Department, PES University, Bengaluru, Karnataka, India
Mohammad Kamrul Hasan Faculty of Information Technology, Center for
Cyber Security, National University Malaysia (UKM), Bangi, Selangor, Malaysia
S. Karthik Department of ECE, Sri Krishna College of Engineering and
Technology, Coimbatore, India
V. Karthik Leather Process Technology Division, CSIR-Central Leather Research
Institute, Adyar, Chennai, India
S. Karthika Department of Information Technology, SSN College of Engineering,
Kalavakkam, Chennai, Tamil Nadu, India
M. M. Karthikeyan Ph.D Research Scholar, PG and Research Department of
Computer Science, Hindusthan College of Arts & Science, Coimbatore, Tamil
Nadu, India
Asoka S. Karunananda Department of Computational Mathematics, Faculty of
Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
Madhavi Katamaneni Department of IT, VRSEC, Vijayawada, India
A. A. Katsupeev Department of Informational and Measurement Systems and
Technologies, Platov South-Russian State Polytechnic University (NPI),
Novocherkassk, Russia
Vamsi Krishna Kikkuri Department of Computer Science and Engineering,
Amrita Vishwa Vidyapeetham, Amritapuri, India
S. Kiruthika Sri Krishna College of Technology, Coimbatore, India
E. O. Kombarova Department of Informational and Measurement Systems and
Technologies, Platov South-Russian State Polytechnic University (NPI),
Novocherkassk, Russia
Madhuri Kommineni Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Anant Koppar Computer Science Engineering Department, Engineering
Department, PES University, Bengaluru, Karnataka, India
N. V. Kousik Galgotias University, Greater Noida, Uttarpradesh, India
N. Krishanth Department of Electrical and Electronics Engineering, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Harsh Krishna Department of Information Technology, School of Computing,
SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Anuj Kumar Department of Computer Engineering and Applications, GLA
University, Mathura, India
Kudabadu J. C. Kumara Department of Mechanical and Manufacturing
Engineering, Faculty of Engineering, University of Ruhuna, Hapugala, Galle, Sri
Lanka
N. S. Kumar Department of Computer Science, PES University, Bengaluru, India
Hengjian Li School of Information Science and Engineering, University of Jinan,
Jinan, China
Farzana Firoz Lima Department of Computer Science and Engineering, East
Delta University, Chittagong, Bangladesh
N. Lingaraj Department of Mechanical Engineering, Rajalakshmi Institute of
Technology, Chennai, India
R. K. Litvyak Department of Informational and Measurement Systems and
Technologies, Platov South-Russian State Polytechnic University (NPI),
Novocherkassk, Russia
S. R. Liyanage University of Kelaniya, Kelaniya, Sri Lanka
R. Mahaveerakannan Department of Information Technology, Hindusthan
College of Engineering and Technology, Otthakkalmandapam, Coimbatore, India
Vikas Maheshwari Department of Electronics & Communication Engineering,
Bharat Institute of Engineering & Technology, Hyderabad, India
Koushik Majumder Department of Computer Science and Engineering, Maulana
Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Somashekhar Malipatil Department of Electronics & Communication
Engineering, Sri Satya Sai University of Technology & Medical Sciences
(SSSUTMS), Sehore, Madhya Pradesh, India
L. Manickavasagam Department of Electrical and Electronics Engineering,
Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
A. Manjunathan Department of Electronics and Communication Engineering, K.
Ramakrishnan College of Technology, Trichy, India
A. Manusha Reddy VNRVJIET, Hyderabad, India
Amit Marathe Department of Electronics and Telecommunication, Xavier
Institute of Engineering, Mumbai, India
G. Maria Priscilla Professor and Head, Department of Computer Science, Sri
Ramakrishna College of Arts and Science, Coimbatore, India
B. Maruthi Shankar Department of Electronics and Communication Engineering,
Sri Krishna College of Engineering and Technology, Coimbatore, India
M. Mathankumar Department of Electrical and Electronics Engineering,
Kumaraguru College of Technology, Coimbatore, India
Priya Matta Computer Science and Engineering, Graphic Era University,
Dehradun, India
Gaurav Mehta Department of Computer Science and Engineering, Chitkara
University Institute of Engineering and Technology, Chitkara University, Himachal
Pradesh, India
Bharat Mishra MGCGV, Chitrakoot, India
Tanni Mittra Department of Computer Science and Engineering, East West
University, Dhaka, Bangladesh
S. R. Mohanrajan Department of Electrical and Electronics Engineering, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Prarthana Mohanty OUAT, Bhubaneswar, Odisha, India
Munim Bin Muqith Department of Computer Science and Engineering, BRAC
University, Dhaka, Bangladesh
Malathy Nagalakshmi Department of Computer Science, PES University,
Bengaluru, India
Preeti Nagrath Department of Computer Science Engineering, Bharati
Vidyapeeth’s College of Engineering, New Delhi, India
Jayashree Nair Department of Computer Science and Applications, Amrita
Vishwa Vidyapeetham, Amritapuri, India
Modigari Narendra Department of Computer Science and Engineering, VFSTR
Deemed to be University, Guntur, India
N. Nataraj Bannari Amman Institute of Technology, Sathyamangalam, India
K. M. Naveen Department of Electronics and Communication Engineering,
Bannari Amman Institute of Technology, Sathy, India
Nandini Nayar Department of Computer Science and Engineering, Chitkara
University Institute of Engineering and Technology, Chitkara University, Himachal
Pradesh, India
Md. Dilshad Kabir Niloy Department of Computer Science and Engineering,
BRAC University, Dhaka, Bangladesh
K. P. Nithish Sriman Department of Electronics and Communication
Engineering, Bannari Amman Institute of Technology, Sathy, India
Mosisa Dessalegn Olana Department of Computer Science and Engineering,
School of Electrical Engineering and Computing, Adama Science and Technology
University, Adama, Ethiopia
Madhavi Latha Pandala Department of IT, VRSEC, Vijayawada, India
Sanidhya Pandey Department of Information Technology, School of Computing,
SRM Institute of Science and Technology, Kattankulathur, Chennai, India
Bhasker Pant Computer Science and Engineering, Graphic Era University,
Dehradun, India
Suraiya Parveen Department of Computer Science, School of Engineering
Science and Technology, Jamia Hamdard, New Delhi, India
Sanjay Patidar Delhi Technological University, New Delhi, India
Ajay B. Patil Department of Electrical Engineering, SGGS Institute of
Engineering and Technology, Vishnupuri, Nanded, M.S, India
Annapurna P. Patil RIT Bangalore, Bangalore, India
J. B. Patil Department of Computer Engineering, R. C. Patel Institute of
Technology, Shirpur, MS, India
D. V. S. Pavan Karthik Vallurupalli Nageswara Rao Vignana Jyothi Institute of
Engineering and Technology, Secunderabad, Telangana, India
Monirul Islam Pavel Center for Artificial Intelligence Technology, Faculty of
Information Science and Technology, The National University of Malaysia, 43600
Bangi, Selangor, Malaysia
G. I. U. S. Perera Department of Computer Science and Engineering, University
of Moratuwa, Moratuwa, Sri Lanka
Subham Pramanik Department of Information Technology, School of
Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai,
India
S. Pranavanand Vallurupalli Nageswara Rao Vignana Jyothi Institute of
Engineering and Technology, Secunderabad, Telangana, India
R. Prathap Department of Electronics and Communication Engineering, Bannari
Amman Institute of Technology, Sathy, India
N. Praveen Kumar Department of Electronics and Communication Engineering,
Bannari Amman Institute of Technology, Sathy, India
B. Premjith Center for Computational Engineering and Networking, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
G. S. Priyanka Department of Electronics and Communication Engineering,
Sri Krishna College of Engineering and Technology, Coimbatore, India;
Sri Krishna College of Technology, Coimbatore, India
Manoj Priyatham Department of ECE, APS College of Engineering, Bangalore,
Karnataka, India
M. Punithavalli Department of Computer Applications, School of Computer
Science and Engineering, Bharathiar University, Coimbatore, India
E. Rajesh Kumar Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, India
P. Rajesh Kumar Department of Electronics and Communications Engineering,
Au College of Engineering (Autonomous), Visakhapatnam, Andhrapradesh, India
R. Rajesh Sharma Department of Computer Science and Engineering, School of
Electrical Engineering and Computing, Adama Science and Technology University,
Adama, Ethiopia
K. S. S. Rakesh LIUTEBM University, Lusaka, Zambia
Shrawan Ram Department of Computer Science and Engineering, MBM
Engineering College, Jai Narain Vyas University, Jodhpur, Rajasthan, India
Raji Ramachandran Department of Computer Science and Applications, Amrita
Vishwa Vidyapeetham, Amritapuri, India
M. Ramkumar Department of Electronics and Communication Engineering, Sri
Krishna College of Engineering and Technology, Coimbatore, India
D. D. M. Ranasinghe Department of Electrical & Computer Engineering, The
Open University of Sri Lanka, Nawala, Nugegoda, Sri Lanka
R. Ranjith Department of Electrical and Electronics Engineering, Amrita School
of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Kapila T. Rathnayake Department of Physical Sciences and Technology, Faculty
of Applied Sciences, Sabaragamuwa University of Sri Lanka, Balangoda, Sri Lanka
Aswathi Raveendran Department of Computer Science and Applications, Amrita
Vishwa Vidyapeetham, Amritapuri, India
P. Ravi Kiran Varma MVGR College of Engineering, Vizianagaram, AP, India
Gopika Ravichandran Department of Computer Science and Applications,
Amrita Vishwa Vidyapeetham, Amritapuri, India
Razia Razzak Department of Computer Science and Engineering, BRAC
University, Dhaka, Bangladesh
Ali Abbas Rizvi Department of Information Technology, D.J. Sanghvi College of
Engineering, Mumbai, India
Suresh Ruthala MVGR College of Engineering, Vizianagaram, AP, India
George Rzevski Department of Computational Mathematics, Faculty of
Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
S. Sabena Department of Computer Science Engineering, Anna University
Regional Campus, Tirunelveli, Tamil Nadu, India
Ritu Sachdeva Department of Computer Science, MVNU, Palwal, India
T. Sai Aparna Center for Computational Engineering and Networking, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
L. Sai Ramesh Department of Information Science and Technology, Anna
University, Chennai, Tamil Nadu, India
Dharmender Saini Department of Computer Science Engineering, Bharati
Vidyapeeth’s College of Engineering, New Delhi, India
M. D. Saranya Department of Electronics and Communication Engineering, KPR
Institute of Engineering and Technology, Coimbatore, India
R. Sarath Kumar Department of Electronics and Communication Engineering,
Sri Krishna College of Engineering and Technology, Coimbatore, India
Dhiman Sarma Department of Computer Science and Engineering, Rangamati
Science and Technology University, Rangamati, Bangladesh
Tawsif Sarwar Department of Computer Science and Engineering, East Delta
University, Chittagong, Bangladesh
S. Satheesh Kumar Department of Electronics and Communication Engineering,
KPR Institute of Engineering and Technology, Coimbatore, India
K. Selvakumar Department of Computer Applications, NIT, Trichy, India
Katha Sengupta Department of Computer Science and Engineering, BRAC
University, Dhaka, Bangladesh
Naomi Setsabi University of Pretoria, Hatfield, Pretoria, South Africa
S. Shalini RIT Bangalore, Bangalore, India
S. Shankar Professor, Department of CSE, Hindusthan College of Engineering
and Technology, Coimbatore, India
M. Shanthini PSG Institute of Technology and Applied Research, Coimbatore,
India
Nitika Sharma Department of Computer Science Engineering, Bharati
Vidyapeeth’s College of Engineering, New Delhi, India
Tanya Sharma Department of Computer Science, PES University, Bengaluru,
India
S. Shrinivas ECE, Dayananda Sagar College of Engineering, Bangalore, India
Zarina Shukur Faculty of Information Technology, Center for Cyber Security,
National University Malaysia (UKM), Bangi, Selangor, Malaysia
N. Shwetha Department of ECE, Dr. Ambedkar Institute of Technology,
Bangalore, Karnataka, India
M. H. M. R. Shyamali Dilhani Department of Interdisciplinary Studies,
University of Ruhuna, Hapugala, Galle, Sri Lanka
R. K. Omega H. Silva Department of Computer Science and Engineering,
University of Moratuwa, Moratuwa, Sri Lanka
Thushari Silva Department of Computational Mathematics, Faculty of
Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
K. Simran Center for Computational Engineering and Networking, Amrita School
of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
Ashutosh Kumar Singh IIT-A, Prayagraj, India
Bhavesh Singh Department of Information Technology, D.J. Sanghvi College of
Engineering, Mumbai, India
Poonam Singh Department of Computer Science and Applications, Chitkara
University School of Computer Applications, Chitkara University, Himachal
Pradesh, India
M. Sivaram Research Center, Lebanese French University, Erbil, Iraq
P. Sivasankar NITTTR, Chennai, India
Shaurya Singh Slathia Department of Information Technology, School of
Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai,
India
K. P. Soman Center for Computational Engineering and Networking, Amrita
School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
G. Sophia Reena Department of Information Technology, PSGR Krishnammal
College for Women, Coimbatore, India
N. Sri Harikarthick Department of Information Technology, SSN College of
Engineering, Kalavakkam, Chennai, Tamil Nadu, India
M. S. Sruthi Sri Krishna College of Technology, Coimbatore, India
G. Subash Department of Electrical and Electronics Engineering, Amrita School
of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
K. V. Subba Raju MVGR College of Engineering, Vizianagaram, AP, India
J. S. Sujin Department of ECE, Sri Krishna College of Engineering and
Technology, Coimbatore, India
Akey Sungheetha Department of Computer Science and Engineering, School of
Electrical Engineering and Computing, Adama Science and Technology University,
Adama, Ethiopia
Yeresime Suresh BIT&M, Ballari, India
C. Sureshkumar Faculty of Information and Communication Engineering, Anna
University, Chennai, Tamil Nadu, India
Srikar Talagani Department of Computer Science and Engineering, Amrita
Vishwa Vidyapeetham, Amritapuri, India
Siok Yee Tan Center for Artificial Intelligence Technology, Faculty of
Information Science and Technology, The National University of Malaysia, 43600
Bangi, Selangor, Malaysia
Narina Thakur Department of Computer Science Engineering, Bharati
Vidyapeeth’s College of Engineering, New Delhi, India
Surendrabikram Thapa Department of Computer Science and Engineering,
Delhi Technological University, New Delhi, India
S. Thenmozhi ECE, Dayananda Sagar College of Engineering, Bangalore, India
Yashwanth Thota Department of Computer Science and Engineering, Amrita
Vishwa Vidyapeetham, Amritapuri, India
Kiet Tran-Trung Ho Chi Minh City Open University, Ho Chi Minh City,
Vietnam
Satyendra Tripathi MGCGV, Chitrakoot, India
S. Udhayanan Department of Electronics and Communication Engineering, Sri
Bharathi Engineering College for Women, Pudukkottai, India
J. Uma Department of Information Technology, Hindusthan College of
Engineering and Technology, Otthakkalmandapam, Coimbatore, India
M. Varun Computer Science Engineering Department, Engineering Department,
PES University, Bengaluru, Karnataka, India
Bhavani Vasantha Department of Computer Science and Engineering, Koneru
Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Shanmuganathan Vasanthapriyan Department of Computing and Information
Systems, Faculty of Applied Sciences, Sabaragamuwa University of Sri Lanka,
Balangoda, Sri Lanka
Vasundhara Department of Computer Science, School of Engineering Science
and Technology, Jamia Hamdard, New Delhi, India
Pavan Vemuri Department of Computer Science and Engineering, Amrita
Vishwa Vidyapeetham, Amritapuri, India
R. Venba Leather Process Technology Division, CSIR-Central Leather Research
Institute, Adyar, Chennai, India
Dushyanthi Vidanagama Department of Computational Mathematics, Faculty of
Information Technology, University of Moratuwa, Moratuwa, Sri Lanka
G. Vidya PSG Institute of Technology and Applied Research, Coimbatore, India
B. Vikas ECE, Dayananda Sagar College of Engineering, Bangalore, India
P. Vivekanandan Department of Computer Science and Engineering, Park
College of Engineering and Technology, Kaniyur, Coimbatore, India
N. M. Wagarachchi Department of Interdisciplinary Studies, University of
Ruhuna, Hapugala, Galle, Sri Lanka
Xiyu Wang School of Information Science and Engineering, University of Jinan,
Jinan, China
P. H. A. H. K. Yashodhara Department of Electrical & Computer Engineering,
The Open University of Sri Lanka, Nawala, Nugegoda, Sri Lanka
N. Yuvaraj ICT Academy, Chennai, Tamilnadu, India
Baohua Zhao School of Information Science and Engineering, University of
Jinan, Jinan, China
A Heuristic Algorithm
for Deadline-Based Resource Allocation
in Cloud Using Modified Fish Swarm
Algorithm

J. Uma, P. Vivekanandan, and R. Mahaveerakannan

Abstract Virtualization plays an indispensable role in improving the efficacy and
agility of cloud computing. This process involves assigning resources to cloud
application users based on their requirements, where most of the resources are
virtual in nature. These resources are utilized by the users for executing tasks for
a certain time period. Virtualization assists in effective usage of hardware resources.
Based on the application, the users may require a definite amount of resources to be
utilized within a definite time period. Thus, a deadline, that is, a start time and an end
time, needs to be considered for every resource. The deadline specifically relates to the
time limit for executing the tasks in the workflow. In this paper, the resource allocation
is optimized using a modified fish swarm algorithm (FSA) with the deadline as the
optimization parameter.

Keywords Cloud computing · Virtualization · Deadline and fish swarm algorithm (FSA)

1 Introduction

Cloud computing [1] is a resource-constrained environment in which the allocation of
resources plays a major role. The requirement of these virtual resources is defined by
means of certain parameters that specify resources like applications, services, CPU,

processors, I/O, networks, storage and servers. It is imperative that these resources
are effectively utilized in the cloud environment. With varying resource availability
and workloads, keeping up with the quality of service (QoS) and simultaneously
maintaining an effective usage of resources and system performance are critical tasks
at hand. This gives rise to tension between the cloud resource provider and the user over
maximizing resource usage effectively. Thus, resource allotment is a basic tenet of cloud
computing [2].
Some of the key issues in cloud computing have been resolved using meta-heuristic
algorithms, which have a stronghold in the research field due to their efficacy and
efficiency. Resource allocation in cloud computing has garnered a lot of attention from
the global research community, and several recent studies have drawn attention to the
progress made in this area. The objective of resource allocation [3] is finding an optimal
and feasible allocation scheme for a certain service. Classification of effective resource
assignment schemes which efficiently utilize the constrained resources in the cloud
environment has been performed. Resource assignment in distributed clouds is a chief
concern among the challenges faced in the cloud paradigm. The issues extend to
resource discovery, availability, selecting the appropriate resource, treating and offering
the resource, monitoring the resource, etc. Despite the various issues present in recent
research, the distributed cloud is promising for usage across different contexts.
Provisioning of resources is done by means of virtual machine (VM) technology
[4]. The virtual environment has the potential to decrease the mean job response time
and execute the tasks as per the resource availability. The VMs are assigned to
the users based on the nature of the job to be executed. A production environment
involves several tasks being submitted to the cloud. Thus, the job scheduler software
should comprise interfaces for defining the workflows and/or job dependencies for
automatically executing the submitted tasks. All of the required VM images that are
needed for running the user-related tasks are preconfigured by the cloud broker and
stored in the cloud. All jobs that enter are sent into a queue. These jobs and a pool
of machines are all managed by a system-level scheduler which runs on a particular
system which also decides if new VM has to be provisioned from clouds and/or jobs
are assigned to the VMs. This scheduler runs periodically and performs five tasks on each
run: (1) forecasting the possible workloads, (2) provisioning the required VMs beforehand
from the cloud, (3) assigning tasks to VMs, (4) releasing a VM when its billing time unit
(BTU) is about to elapse and (5) starting the required number of VMs when many
unassigned jobs are present.
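As a rough illustration of the scheduling loop described above, the following Python sketch shows one periodic scheduler round under simplifying assumptions; the Job/VM structures, the BTU threshold and the helper logic are hypothetical placeholders rather than part of any specific cloud broker API, and the forecasting and pre-provisioning steps (1)–(2) are abstracted away.

```python
# Minimal sketch of a periodic system-level scheduler round (illustrative only).
# Job, VM and all thresholds are hypothetical placeholders, not a broker API.
from dataclasses import dataclass
from typing import List

@dataclass
class VM:
    vm_id: int
    busy: bool = False
    btu_remaining: float = 60.0      # minutes left in the current billing time unit

@dataclass
class Job:
    job_id: int
    runtime: float                   # estimated run time in minutes

def scheduler_tick(queue: List[Job], pool: List[VM]) -> None:
    """One round: assign jobs (task 3), release near-expiry idle VMs (task 4),
    and start extra VMs when many jobs remain unassigned (task 5)."""
    for vm in pool:                              # (3) assign queued jobs to idle VMs
        if not vm.busy and queue:
            job = queue.pop(0)
            vm.busy = True
            print(f"job {job.job_id} -> VM {vm.vm_id}")
    for vm in list(pool):                        # (4) release idle VMs near BTU expiry
        if not vm.busy and vm.btu_remaining < 5.0:
            pool.remove(vm)
            print(f"released VM {vm.vm_id}")
    while len(queue) > len(pool):                # (5) provision VMs for the backlog
        pool.append(VM(vm_id=1000 + len(pool)))
        print("provisioned new VM")

scheduler_tick([Job(1, 10.0), Job(2, 25.0)], [VM(1), VM(2, busy=True)])
```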
The progress in virtualization and distributed computing for supporting the cost-
effective utilization of computing resources is the basis for cloud computing. There
is an emphasis on the scalability of resources as well as services-on-demand. Based
on the business requirements, the resources can be scaled up or scaled down in cloud
paradigm. On-demand resource allocation raises issues of its own, since the needs of the
customers must be factored in. Provisioning of resources is realized through VMs: based on
the features of the task that requires the resources, the VMs are assigned accordingly. The
execution of higher-priority jobs must not be delayed by low-priority jobs; such a situation
can cause resource access
contention between jobs of low priority and high priority. The chief input to our
allocation is the information contained in the resource request tuple. Nonetheless,
the benefit of DCloud's resource allocation algorithm for effectively using the cloud
resources is severely undercut if a selfish user keeps declaring short deadlines, which
adversely affects the balance of VM and bandwidth usage [5]. A job-based, strategy-proof
charging mechanism has therefore been formulated for DCloud, which encourages users to
declare their deadlines honestly so that their costs are minimized.
There are several meta-heuristic algorithms [6] in use. Many new variations are
often proposed for resource assignment in several fields. Several meta-heuristic algo-
rithms that are popular in the cloud computing arena include firefly algorithm (FA),
league championship algorithm (LCA), immune algorithm (IA), harmony search
(HS), cuckoo search (CS), differential evolution (DE), memetic algorithm (MA) and
ant colony optimization (ACO), among others [7]. There are several benefits of the
artificial FSA (AFSA), including its global search ability, robustness, rapid convergence
and good precision. To enable flexible and effective usage of resources in the data centres,
DCloud leverages the deadlines of cloud computing jobs. In this work, an FSA-based
optimization algorithm is proposed for minimizing the overall workflow execution cost
while meeting deadline constraints. Section 2 briefly reviews the literature related to this
work, Section 3 presents the techniques used in the methodology, Section 4 explains the
results and discussion, and Section 5 concludes the work.

2 Literature Survey

On the basis of evaluating the job traits, Saraswathi et al. [2] focussed on the assign-
ment of VM resources to the user. The objective here is that jobs of low importance
(whose deadlines are high) should not affect the execution of highly important jobs
(whose deadlines are low). The VM resources have to be allocated dynamically for
a user job within the available deadline. Resource and deadline-aware Hadoop job
scheduler (RDS) has been suggested by Cheng et al. [8]. This takes into consideration
the future availability of resources, while simultaneously decreasing the misses in the
job’s deadline. The issue of job scheduling is formulated as an online optimization
problem. This has been solved using an effective receding horizontal control algo-
rithm. A self-learning prototype has been designed for estimating the job completion
times for aiding the control. For predicting the availability of resources, a simple, yet
an effective model has been used. Open-source Hadoop implementation has been
used for implementing the RDS. Analysis has been done considering the varying
benchmark workloads. It has been shown via experimental outcomes that usage of
RDS decreases the penalty of missing the deadline by at least 36% and 10% when
compared, respectively, with fair scheduler and EDF scheduler.
Cloud infrastructure permits active users to demand cloud services simultaneously. Thus,
effective provisioning of resources for fulfilling user requirements is becoming imperative;
when resources are used effectively, they cost less. In the virtualization scheme, the VMs
are the resources onto which incoming user requests/tasks are mapped prior to the execution
of the task on the physical machines. An
analysis of the greedy approach algorithms for effectively mapping the tasks to the
virtual machines and decreasing the VM usage costs has been explored by Kumar
and Mandal [9]. The allocation of resources is crucial in several computational areas
like operating systems and data centre management. As per Mohammad et al. [10],
resource allocation in cloud-based systems involves assuring the users that their
computing requirements are totally and appropriately satisfied by the cloud server
set-up. The efficient utilization of resources is paramount to the context of servers
that provide cloud services, so that maximum profit is generated. This leads to the
resource allocation and the task scheduling to be the primary challenges in cloud
computing.
The review of the AFSA algorithm has been presented by Neshat et al. [11]. Also
described are the evolution of the algorithm, its improvements, its combinations with
several other methods, and its applications. Several optimization schemes can be used in
combination with AFSA, which may improve the performance of the technique. There are,
however, some drawbacks, including high time complexity, lack of balance between local
and global search, and failure to exploit the experience of the group members for
forecasting movements. The deadline-aware two-stage scheduling proposed by Raju et al.
[12] schedules VMs for the jobs submitted by users; every job is specified to need two
types of VMs in sequence for completing its respective tasks. This prototype takes into
consideration the deadlines with regard to the response time and the waiting time, and it
allocates VMs as resources to the jobs that require them, based on the processing time and
the job scheduling. The prototype has been evaluated in a simulation environment by
analysing several metrics such as deadline violations, mean turnaround time and mean
waiting time, and it has been contrasted with first come first serve (FCFS) and shortest job
first (SJF) scheduling strategies. In comparison with these schemes, the suggested prototype
has been shown to decrease these evaluation metrics by a constant factor.
From the CSP’s perspective, the issue of global optimization of the cloud system
has been addressed by Gao et al. [13]. It takes into consideration lowering of oper-
ating expenses by maximizing the energy efficiency and simultaneously fulfilling
the user-defined deadlines in the service-level agreements. For the workload to be
modelled, viable approaches should be considered for optimizing cloud operation.
There are two models that are currently available: batch requests and task graphs with
dependencies. The latter method has been adopted. This micro-managed approach to
workloads allows the optimization of energy as well as performance. Thus, the CSP
can meet the user deadlines at lesser operational expenses. Yet, some added efforts
are required by these optimizations with regard to resource provisioning, placement


of VMs and scheduling of tasks. The suggested framework addresses these issues
holistically. It has been conveyed by Rodriguez and Buyya [14] that the schemes
that exist cannot fulfil the requirements of QoS or even fail to include elastic and
heterogeneous requirements of the computing services in the cloud environments.
This paper suggests a strategy for resource provisioning and scheduling for scientific
workflows on infrastructure as a service clouds. For minimizing the overall workflow
execution expense and simultaneously fulfilling the deadline constraint, an algorithm
has been suggested which is based on particle swarm optimization (PSO). CloudSim
and different popular scientific workflows of variable sizes have been used for evaluating
the heuristics, and the outcomes suggest that the suggested scheme performs better than
some of the state-of-the-art schemes.
An auto-adaptive resource control framework which was deadline-aware has been
suggested by Xiang et al. [15]. This framework can be executed in a totally distributed
fashion. This makes it suited to the environments that are not reliable, where single
point of failure is unacceptable. This concept is based on the Nash bargaining in
non-cooperative game theory. Based on this concept, this framework assigns cloud
resources optimally for maximizing the Nash bargaining solutions (NBS) with regard
to the priority of the job as well as its deadline for completion. It additionally allows
resource allocation to be auto-adaptive and deadline-aware, rebalancing when exposed to
cyber or physical threats that may compromise the capacity of cloud systems. Experiments
on the Hadoop framework have validated the suggested scheme.

3 Methodology

Most work in the literature focusses on job completion time or job deadline along with
bandwidth constraints and VM capabilities. The challenge, however, is to map the deadline
against the job completion time so that deadlines are met with minimum cost and
completion time. A novel allocation algorithm that benefits from the added information in
the resource request has been formulated. It is based on two schemes: time sliding and
bandwidth scaling. In time sliding, a delay between job/task submission and execution is
permitted; this allows smoothing the peak demand of the cloud and decreasing the number
of excluded users at busy intervals. In bandwidth scaling, dynamic adaptation of the
bandwidth assigned to the VMs is allowed. Deadline, greedy-deadline and FSA-based
deadline schemes are detailed in this section.
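To make the two knobs concrete, the following toy sketch (not the paper's algorithm) shows how a request might be admitted by sliding its start time within the available slack and scaling the bandwidth it is given; the capacity units and job figures are invented for the example.

```python
# Toy illustration of "time sliding" and "bandwidth scaling" for admitting a
# request; all numbers and the admission rule are illustrative assumptions.
def admit(job, link_free_at, link_bw):
    """job = (submit_time, deadline, data_volume); returns (start, bw) or None."""
    submit, deadline, volume = job
    start = max(submit, link_free_at)                 # time sliding: delay the start
    if start >= deadline:
        return None                                   # no slack left, reject
    bw = volume / (deadline - start)                  # bandwidth needed to finish in time
    if bw > link_bw:
        return None                                   # even full bandwidth cannot meet it
    return start, bw                                  # bandwidth scaling: assign only bw

print(admit((0, 10, 40), link_free_at=2, link_bw=8))  # (2, 5.0)
```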

3.1 Deadline

In cloud computing, for fulfilling a user’s task, a job needs cloud resources. One of
the available models for resource scheduling is deadline-aware two-stage scheduling
model. The cloud resources are present as virtual machines. After scheduling the
given n job requests, the scheduler allocates the needed cloud resources/VMs for
every job that requests for it [12]. The scheduler, on receiving the n jobs from different
users, allocates the VMs as resources by means of job scheduling, in deadline-aware
two-stage scheduling. In this prototype, a job needs several VMs of various types
sequentially for task completion.
The overall workflow deadline is distributed across the tasks. A part of the deadline
is allocated to every task based on the VM which is the most cost-effective for that
particular task.
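One common way to realize this split, sketched below under the assumption that each task's share is proportional to its run time on its cheapest suitable VM, is a simple proportional division; the VM run times and the overall deadline are made-up example values.

```python
# Illustrative sketch of splitting a workflow deadline across tasks in proportion
# to each task's run time on its most cost-effective VM type (example values).
def distribute_deadline(task_runtimes, workflow_deadline):
    """task_runtimes: run time of each task on its cheapest suitable VM."""
    total = sum(task_runtimes)
    return [workflow_deadline * t / total for t in task_runtimes]

# Three tasks whose cheapest-VM run times are 20, 50 and 30 time units,
# sharing an overall deadline of 200 time units.
print(distribute_deadline([20, 50, 30], 200))   # [40.0, 100.0, 60.0]
```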

3.2 Greedy-Deadline

Because of their greedy approach, resource allocation algorithms (RAAs) are well suited
for dynamic and heterogeneous cloud resource environments. They are linked to a process
scheduler by means of cloud communication [9]. For handling the task scheduling problem,
the greedy approach optimized for profit is effective. The greedy-deadline resource
allocation algorithm [16] can be explained as follows:
1. The input is the virtual machine request.
2. Every resource in the resource cache is checked to see whether it is in the suspended or
waking state; if so, the remaining capacity of the resource is found and checked.
3. The remaining capacity of the resource is also found if it is in the sleeping state.
4. The function is processed to obtain the resource from the cache.
The priorities of the incoming tasks are evaluated, and the newly allocated priority
is compared with the previously allocated ones. This is followed by assigning the
tasks into the previously formulated priority queues. After allocation of the tasks,
tasks in the high-priority queues are selected and are executed. This is followed by
the transfer of tasks from medium-priority queues to high-priority queues. Thus, the
remaining tasks in the queues are executed until the queue has been exhausted of all
tasks.
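The following short Python sketch captures the greedy-deadline idea described above in a simplified form: tasks are served in deadline (priority) order and each is mapped to the first cached resource with enough remaining capacity. The data structures, field names and numbers are illustrative assumptions, not the algorithm of [16] verbatim.

```python
# Hedged sketch of greedy, deadline-ordered resource allocation (toy data).
import heapq

def greedy_deadline_allocate(tasks, resources):
    """tasks: list of (deadline, demand, name); resources: dict name -> free capacity.
    Earlier deadlines are treated as higher priority."""
    heapq.heapify(tasks)                      # min-heap on deadline
    schedule = []
    while tasks:
        deadline, demand, name = heapq.heappop(tasks)
        # greedy choice: first resource whose remaining capacity fits the task
        chosen = next((r for r, cap in resources.items() if cap >= demand), None)
        if chosen is not None:
            resources[chosen] -= demand
        schedule.append((name, chosen))       # None means no feasible resource now
    return schedule

print(greedy_deadline_allocate(
    [(30, 2, "t1"), (10, 4, "t2"), (20, 1, "t3")],
    {"vm_a": 4, "vm_b": 3}))
```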

3.3 Fish Swarm Algorithm (FSA)

Another population-based optimizer is the AFSA. The process begins with a randomly
generated set of candidate solutions, and an iterative search is performed to obtain an
optimal solution. Artificial fish (AF) [17] refers to the fictitious entity used for analysing
and explaining the problem; it may be understood through concepts from animal ecology.
An object-oriented analytical scheme has been employed, considering the artificial fish as
an object enclosing its own data
as well as a series of behaviours. These fish grasp a large amount of data regarding their
surroundings through their senses, and they control their tail and fins as the stimulant
reaction. The solution space constitutes the environment in which the AF resides, along
with the states of the other fish. Its present state and the state of its environment
(including the current food concentration as well as the states of its companions) determine
its next behaviour. Its environment is influenced not only by its own activities but also by
the activities of its companions [18]. The external perception of the AF is realized by means
of its vision. Let X denote the present state of the AF, let the visual distance be denoted by
'Visual', and let the visual position at a certain moment be denoted by X_v. If the visual
position state is better than the current one, the fish advances one step in that direction
and arrives at the subsequent state denoted by X_next; otherwise, it goes on a tour of
inspection. The more inspection tours the artificial fish goes on, the greater the amount of
knowledge it gains about its environment. It does not need to traverse complex or infinite
states; this aids in finding the global optimum by allowing certain local optima to be left
behind with some uncertainty.
Let X = (x_1, x_2, ..., x_n) and X_v = (x_1^v, x_2^v, ..., x_n^v); then this process can be
expressed as in Eqs. (1) and (2):

x_i^v = x_i + Visual · rand(), i ∈ (0, n)    (1)

X_next = X + (X_v − X) / ‖X_v − X‖ · Step · rand()    (2)

where rand() is a random number between 0 and 1, Step is the step length, x_i is the
optimization variable and n is the number of variables. There are two components included
in the AF model: variables and functions. The variables are as follows: the existing position
of the AF is denoted by X, the moving step length by 'Step', the visual distance by 'Visual',
the try number by try_number, and the crowd factor, whose value lies between 0 and 1, by δ.
The functions comprise the behaviours of the AF: preying, swarming, following, moving,
leaping and evaluating. The flow chart of artificial fish swarm optimization is shown
in Fig. 1.
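A small numeric illustration of this visual-based move, following Eqs. (1) and (2), is given below; the objective function and the Visual/Step values are arbitrary examples, not the tuned parameters of the proposed scheme.

```python
# Toy illustration of Eqs. (1)-(2): sample a candidate X_v inside the visual
# range and, if it is better, advance one Step towards it (values are arbitrary).
import random
import math

def visual_candidate(x, visual):
    """Eq. (1): sample a state X_v within the visual range of X."""
    return [xi + visual * random.random() for xi in x]

def move_towards(x, x_v, step):
    """Eq. (2): advance one Step from X in the direction of X_v."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_v, x))) or 1e-12
    return [xi + (a - xi) / dist * step * random.random() for xi, a in zip(x, x_v)]

def objective(x):
    """Toy cost standing in for workflow execution cost (lower is better)."""
    return sum(xi ** 2 for xi in x)

x = [2.0, -1.0]
x_v = visual_candidate(x, visual=2.5)
if objective(x_v) < objective(x):      # move only if the visual state is better
    x = move_towards(x, x_v, step=0.1)
print(x)
```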

3.4 Proposed Fish Swarm Algorithm (FSA)-Deadline

The FSA is easily trapped in locally optimal solutions, so an improved FSA is proposed to
avoid local optima by using an appropriate cost function for evaluating the solutions. AFSA
also finds increasing usage in complex optimization fields; it offers an alternative to
well-known evolutionary computing methods and can be applied across domains.
Fig. 1 Artificial fish swarm algorithm (AFSA): initialize all fishes, apply the better of the
swarming and following behaviours to every fish, and repeat until the stopping criterion is
met, yielding the final solution

The relaxation is considered a supererogatory service that the cloud service provider
offers, since potential tenants may deliberately submit a job run time smaller than the
actual profiled value, anticipating that the cloud service provider will prolong it by
relaxing. In practice, however, there may be job requests whose profiling error is greater
than the profiling relaxation index provided by the cloud service provider. The cloud
provider can employ two schemes to deal with such tasks. The first approach is to kill the
jobs at the presumed end time. In the second approach, the cloud provider uses a small part
of the cloud resources specifically for servicing those jobs: the virtual machines associated
with those jobs are migrated at once to the dedicated servers at their expected end times
and are thereafter run on a best-effort basis.
The algorithmic procedure of implementation is described below:
(1) Fishes are positioned at random on task nodes; that is, each fish represents a
solution towards meeting the objectives of deadline and job completion time.
(2) Fishes choose a path to a resource node with a certain probability, determining
whether the limits of the optimization model are met. If they are met, the node is
included in the list of solutions by the fish; otherwise, the fish goes on to search for
another node.
If X_i is the current state of a fish, a state X_j is chosen randomly within its visual
distance, and Y = f(X) denotes the food consistence of the fish:

X_j = X_i + af_visual · rand()    (3)

If Y_i < Y_j, the fish moves forward a step in the direction of the vector sum of X_j and
X_best_af, where X_best_af is the best fish available:

X_i^{t+1} = X_i^t + [ (X_j − X_i^t) / ‖X_j − X_i^t‖ + (X_best_af − X_i^t) / ‖X_best_af − X_i^t‖ ] · af_step · rand()    (4)

Otherwise, a state X_j is chosen randomly again and checked for compliance with the
forward requirement. If the forward requirement is still not satisfied, the fish moves a step
randomly, which helps to avoid local minima:

X_i^{t+1} = X_i^t + af_visual · rand()    (5)

(3) Fish move arbitrarily towards the next task node for the assignment of their next
task.
(4) Assigning all the tasks is regarded as an iterative procedure; the algorithm terminates
when the number of iterations reaches its maximum (a compact sketch of the per-fish
move step follows this list).
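The per-fish move step of the above procedure, following Eqs. (3)–(5), can be sketched as below; the fitness function and the af_visual/af_step/try_number values are placeholders for illustration only, not the proposed scheme's tuned configuration.

```python
# Hedged sketch of one FSA-deadline move step following Eqs. (3)-(5):
# sample X_j in the visual range; if it is better, move towards the vector sum
# of X_j and the best fish, otherwise take a random step after try_number tries.
import random, math

def norm(v):
    return math.sqrt(sum(c * c for c in v)) or 1e-12

def fsa_step(x_i, x_best, fitness, af_visual=2.5, af_step=0.1, try_number=5):
    for _ in range(try_number):
        x_j = [c + af_visual * random.random() for c in x_i]          # Eq. (3)
        if fitness(x_j) > fitness(x_i):                               # Y_i < Y_j
            d_j = [a - b for a, b in zip(x_j, x_i)]
            d_b = [a - b for a, b in zip(x_best, x_i)]
            return [c + (dj / norm(d_j) + db / norm(d_b))             # Eq. (4)
                    * af_step * random.random()
                    for c, dj, db in zip(x_i, d_j, d_b)]
    # no better neighbour found: random step to escape local minima     Eq. (5)
    return [c + af_visual * random.random() for c in x_i]

fit = lambda x: -sum(c * c for c in x)        # toy fitness (higher is better)
print(fsa_step([1.0, 1.0], [0.2, 0.1], fit))
```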

4 Results and Discussion

Table 1 displays the parameters of FSA. Tables 2, 3 and 4 and Figs. 2, 3 and 4 show the
makespan, VM utilization and percentage of successful job completion, respectively,
for deadline, greedy-deadline and FSA-deadline.
It is seen from Table 2 and Fig. 2 that the makespan for FSA-deadline performs
better by 8.7% and by 10.9% than deadline and greedy-deadline, respectively, for
number of jobs 200. The makespan for FSA-deadline performs better by 7.6% and

Table 1 Parameters of FSA


Parameter Value
Population 30
Max generation 50
Visual 2.5
Try number 5
Step 0.1
Crowd 0.618
Table 2 Makespan (in seconds) for FSA-deadline

Number of jobs   Deadline   Greedy-deadline   FSA-deadline
200              44         43                48
400              94         92                102
600              152        149               164
800              201        197               218
1000             250        246               271

Table 3 VM utilization (in percentage) for FSA-deadline

Number of jobs   Deadline   Greedy-deadline   FSA-deadline
200              76         78                79
400              77         79                78
600              79         81                83
800              80         82                83
1000             76         79                79

Table 4 Percentage of successful job completion for FSA-deadline

Number of jobs   Deadline   Greedy-deadline   FSA-deadline
200              79.9       82.3              83.5
400              81.4       83.4              81.9
600              83.7       85.3              87.6
800              84.7       86.4              87.4
1000             80.1       83.1              83.6

Fig. 2 Makespan (in sec) for FSA-deadline (x-axis: number of jobs; y-axis: makespan in
seconds; series: deadline, greedy-deadline, FSA-deadline)



Fig. 3 VM utilization (in percentage) for FSA-deadline (x-axis: number of jobs; y-axis: VM
utilization %; series: deadline, greedy-deadline, FSA-deadline)

Fig. 4 Percentage of successful job completion for FSA-deadline (x-axis: number of jobs;
y-axis: successful job completion %; series: deadline, greedy-deadline, FSA-deadline)

by 9.6% than deadline and greedy-deadline, respectively, for number of jobs 600.
The makespan for FSA-deadline performs better by 8.1% and by 9.7% than deadline
and greedy-deadline, respectively, for number of jobs 1000.
It is seen from Table 3 and Fig. 3 that the VM utilization for FSA-deadline performs
better by 3.87% and by 1.27% than deadline and greedy-deadline, respectively, for
number of jobs 200. The VM utilization for FSA-deadline performs better by 4.94%
and by 2.44% than deadline and greedy-deadline, respectively, for number of jobs
600. The VM utilization for FSA-deadline performs better by 3.9% than deadline and shows
no change compared with greedy-deadline for 1000 jobs.
It is seen from Table 4 and Fig. 4 that the percentage of successful job completion
for FSA-deadline performs better by 4.41% and by 1.45% than deadline and greedy-
deadline, respectively, for number of jobs 200. The percentage of successful job
completion for FSA-deadline performs better by 4.6% and by 2.7% than deadline and
greedy-deadline, respectively, for number of jobs 600. The percentage of successful
job completion for FSA-deadline performs better by 4.28% and by 0.599% than
deadline and greedy-deadline, respectively, for number of jobs 1000.

5 Conclusion

In the cloud computing paradigm, the computation as well as the storage of resources
is migrated to the “cloud.” These resources can be accessed anywhere by any user,
based on the demand. Judicious tuning of optimization parameters in meta-heuristic
algorithms is needed in order to find better solutions without excessive computational
time. The artificial fish swarm algorithm (AFSA) is
regarded as one among the top optimization methods within the set of swarm intelli-
gence algorithms. AFSA is chosen because it has global search ability, good robust-
ness as well as tolerance of parameter setting. This work proposes heuristic algorithm
for deadline-based resource allocation in cloud using modified fish swarm algorithm.
Outcomes have shown that the makespan for FSA-deadline performs better for 200 jobs by
8.7% and by 10.9% than deadline and greedy-deadline, respectively. For 600 jobs, the
FSA-deadline makespan is better than deadline by 7.6% and better than greedy-deadline by
9.6%; for 1000 jobs, the corresponding improvements are 8.1% and 9.7%. In future, this
task could be implemented by a trusted third party that is reliable to both the tenant and
the provider.

References

1. Wei W, Fan X, Song H, Fan X, Yang J (2016) Imperfect information dynamic Stackelberg game based resource allocation using hidden Markov for cloud computing. IEEE Trans Serv Comput 11(1):78–89
2. Saraswathi AT, Kalaashri YR, Padmavathi S (2015) Dynamic resource allocation scheme in cloud computing. Proc Comput Sci 47:30–36
3. Chen X, Li W, Lu S, Zhou Z, Fu X (2018) Efficient resource allocation for on-demand mobile-edge cloud computing. IEEE Trans Veh Technol 67(9):8769–8780
4. Jin S, Qie X, Hao S (2019) Virtual machine allocation strategy in energy-efficient cloud data centres. Int J Commun Netw Distrib Syst 22(2):181–195
5. Li D, Chen C, Guan J, Zhang Y, Zhu J, Yu R (2015) DCloud: deadline-aware resource allocation for cloud computing jobs. IEEE Trans Parallel Distrib Syst 27(8):2248–2260
6. Madni SHH, Latiff MSA, Coulibaly Y (2016) An appraisal of meta-heuristic resource allocation techniques for IaaS cloud. Indian J Sci Technol 9(4)
7. Asghari S, Navimipour NJ (2016) Review and comparison of meta-heuristic algorithms for service composition in cloud computing. Majlesi J Multimedia Process 4(4)
8. Cheng D, Rao J, Jiang C, Zhou X (2015) Resource and deadline-aware job scheduling in dynamic Hadoop clusters. In: 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp 956–965. IEEE
9. Kumar D, Mandal T (2016) Greedy approaches for deadline based task consolidation in cloud computing. In: 2016 International Conference on Computing, Communication and Automation (ICCCA), pp 1271–1276. IEEE
10. Mohammad A, Kumar A, Singh LSV (2016) A greedy approach for optimizing the problems of task scheduling and allocation of cloud resources in cloud environment
11. Neshat M, Sepidnam G, Sargolzaei M, Toosi AN (2014) Artificial fish swarm algorithm: a survey of the state-of-the-art, hybridization, combinatorial and indicative applications. Artif Intell Rev 42(4):965–997
12. Raju IRK, Varma PS, Sundari MR, Moses GJ (2016) Deadline aware two stage scheduling algorithm in cloud computing. Indian J Sci Technol 9(4)
13. Gao Y, Wang Y, Gupta SK, Pedram M (2013) An energy and deadline aware resource provisioning, scheduling and optimization framework for cloud systems. In: Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Co-design and System Synthesis, p 31. IEEE Press
14. Rodriguez MA, Buyya R (2014) Deadline based resource provisioning and scheduling algorithm for scientific workflows on clouds. IEEE Trans Cloud Comput 2(2):222–235
15. Xiang Y, Balasubramanian B, Wang M, Lan T, Sen S, Chiang M (2013) Self-adaptive, deadline-aware resource control in cloud computing. In: 2013 IEEE 7th International Conference on Self-adaptation and Self-organizing Systems Workshops (SASOW), pp 41–46. IEEE
16. Wu X, Gu Y, Tao J, Li G, Jayaraman PP, Sun D, et al (2016) An online greedy allocation of VMs with non-increasing reservations in clouds. J Supercomput 72(2):371–390
17. Shen H, Zhao H, Yang Z (2016) Adaptive resource schedule method in cloud computing system based on improved artificial fish swarm. J Comput Theor Nanosci 13(4):2556–2561
18. Li D, Chen C, Guan J, Zhang Y, Zhu J, Yu R (2016) DCloud: deadline-aware resource allocation for cloud computing jobs. IEEE Trans Parallel Distrib Syst 27(8):2248–2260
Dynamic Congestion Control Routing
Algorithm for Energy Harvesting
in MANET

M. M. Karthikeyan and G. Dalin

Abstract Energy harvesting (EH) is seen as a key enabling technology for the mass
deployment of mobile ad hoc networks (MANETs) in IoT applications. Effective EH
strategies can remove the need for frequent energy source replacement, thereby offering a
near-perpetual network operating condition. Advances in EH systems have shifted the design
of routing protocols for EH-MANETs from "energy-aware" to "energy-harvesting-aware". In
this work, a Dynamic Congestion Control Routing Algorithm using energy harvesting in
MANET is presented. The performance of the proposed scheme is evaluated using various
metrics, for instance, energy consumption ratio, routing overhead ratio and throughput
ratio.

Keywords Dynamic congestion · Energy harvesting · Routing overhead ·
Throughput · Energy consumption · MANET

1 Introduction

Congestion occurs in ad hoc networks with constrained resources. In such a network, packet
transmission frequently encounters collision, interference and fading as a result of the
shared radio medium and the dynamic topology, and transmission errors further increase the
network load. Recently, there has been growing interest in supporting multimedia
communications in ad hoc networks [1]. Such real-time applications are bursty, bandwidth
intensive, and

M. M. Karthikeyan (B)
Ph.D Research Scholar, PG and Research Department of Computer Science, Hindusthan College
of Arts & Science, Coimbatore, Tamil Nadu, India
e-mail: mmk.keyan90@gmail.com
G. Dalin
Associate Professor, PG and Research Department of Computer Science, Hindusthan College of
Arts & Science, Coimbatore, Tamil Nadu, India
e-mail: profgdalin@gmail.com


congestion prone. Congestion in ad hoc networks leads to packet loss, bandwidth
degradation, and wasted time and energy for congestion recovery. Congestion-aware routing
protocols preemptively resolve congestion by bypassing the congested links, and routing
algorithms are used to limit congestion in the network; transport-layer solutions have also
been proposed to offer differentiated support in congested sensor networks. In proactive
protocols, routes between every pair of nodes are established in advance even though no
transmission is pursued. This approach is not sensible for large networks, because many
unused routes still need to be maintained and the periodic refreshing may incur heavy
processing and communication overhead. Moreover, when a link breaks because of failure or
node mobility, which frequently occurs in MANETs, the delay and overhead caused by new
route discovery may be large [2]. To address this problem, multipath routing protocols
maintain several routes towards the destination, so that an alternative path can be found
quickly if the present path is broken. However, the use of multiple paths does not improve
route performance over single-path routing unless a very large number of paths is used,
which is more expensive and therefore infeasible [3]. Routing protocols can accordingly be
classified into congestion-adaptive versus congestion-unadaptive routing; the vast majority
of existing routing protocols belong to the latter group. Some of the existing routing
protocols are congestion-aware, and very few are congestion-adaptive (Fig. 1).
In congestion-aware routing schemes, congestion is considered only while establishing a
new route, which then remains unchanged until mobility or failure causes it to break. In
congestion-adaptive routing, the route is adaptively varied depending on the congestion
status of the network; routing may let congestion happen, and it is later detected and
handled by congestion control [4]. In multi-rate ad hoc networks, different data rates are
likely to result in certain routes having links with different data rates. If lower data
rate links follow higher data rate links, packets will accumulate at the node heading the
lower data rate link, leading to long queuing delays. A further reason for congestion is
link instability [5]: if links break, congestion increases because of packet salvage. A
congestion-aware routing metric that uses the retransmission-count-weighted channel delay
and buffer queuing delay, with a preference for less congested, high-throughput links, can
improve channel utilization. The congestion-aware routing protocol for mobile ad hoc
networks (CARM) then applies a link data rate categorization approach to avoid routes with
mismatched link data rates [6]. In CRP, every node appearing on a route alerts its previous
node when it is likely to become congested. The previous node then uses a "bypass" route
around the potential congestion to the first non-congested node on the route, and traffic
is split probabilistically over these two routes, primary and bypass, thereby reducing the
chance of congestion. CRP is thus a congestion-adaptive unicast routing protocol for
MANETs: every node appearing on a route alerts its previous node when it is likely to
become congested [7]. The previous

Fig. 1 Architecture of congestion-aware routing algorithm

node uses a "bypass" route around the potential congestion region to the first
non-congested node on the primary route, and traffic is split probabilistically over the
primary and bypass routes, effectively decreasing the chance of congestion. Congestion
monitoring uses various metrics to track the congestion status of the nodes: when the
number of packets arriving at a node exceeds its forwarding capacity, the node becomes
congested and starts losing packets. Chief among these metrics are the fraction of packets
discarded for lack of buffer space, the average queue length, the number of packets timed
out and retransmitted, the average packet delay, and the standard deviation of packet
delay. In all cases, increasing numbers indicate growing congestion. Any of these
monitoring strategies can work with CRP in practice [8].
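The congestion indicators listed above can be computed from simple per-node counters, as in the following illustrative sketch; the field names and sample values are invented and do not come from CRP itself.

```python
# Illustrative computation of per-node congestion indicators from counters.
import statistics

def congestion_indicators(dropped, received, queue_samples, retransmitted, delays):
    return {
        "drop_ratio": dropped / max(1, dropped + received),   # buffer-overflow drops
        "avg_queue_len": sum(queue_samples) / len(queue_samples),
        "retransmissions": retransmitted,
        "avg_delay": statistics.mean(delays),
        "delay_std": statistics.pstdev(delays),
    }

print(congestion_indicators(dropped=12, received=188,
                            queue_samples=[3, 5, 9, 7],
                            retransmitted=6,
                            delays=[0.02, 0.05, 0.04, 0.08]))
```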

2 Literature Survey

Adaptive congestion control is a mechanism with a learning capability. This learning
capability enables the system to adapt to dynamically changing network conditions so as to
maintain stability and good performance. Feedback is sent to the sender to adjust the
sending rate according to the present network conditions [9]. The mechanism is adaptive
with respect to varying delays, bandwidth and the number of users sharing the network. ACP
is characterized by its learning capability, which enables the protocol to adapt to the
highly dynamic network conditions while maintaining stability and excellent performance.
This learning capability is realized by a novel estimation algorithm, which "learns" the
number of flows using each link in the network [10].
Merits:

• An adaptive routing procedure can improve execution, as observed by the system


client.
• An adaptive routing technique can help in congestion control. Since an adaptive
routing technique will, as a rule, change loads, it can concede the beginning of
unprecedented congestion [9].

Demerits:
• The routing choice is dynamically strange; in this way, the preparing trouble on
arrange center points increments.
• In most cases, adaptive frameworks depend upon status data that is amassed at one
spot in any case utilized at another. There is a tradeoff here between the possibility
of the data and the extent of overhead [10].
• An adaptive technique may respond too rapidly, causing congestion-passing on
affecting, or too bit by bit, being immaterial.

Uddin et al. [11] proposed an energy-efficient multipath routing protocol for mobile ad hoc
networks that uses a fitness function to address the issue of energy consumption in MANET,
applying the fitness function to optimize the energy consumption of the Ad hoc On-Demand
Multipath Distance Vector (AOMDV) routing protocol. The proposed protocol is called Ad hoc
On-Demand Multipath Distance Vector with the Fitness Function (FF-AOMDV). The fitness
function is utilized to locate the optimal path from the source to the destination so as to
reduce the energy consumption in multipath routing. The performance of the proposed
FF-AOMDV protocol was evaluated using Network Simulator Version 2 (NS-2), where it was
compared with the AOMDV and Ad hoc On-Demand Multipath Routing with Life Maximization
(AOMR-LM) protocols, the two most prominent protocols in this area.
Merits:
• FF-AOMDV figuring has performed unmistakably better than both AOMR-LM
and AOMDV in throughput, pack transport degree and starts to finish delay.
• Performed well against AOMDV for proportioning more energy and better
structure lifetime.

Demerits:
• More Energy consumption and Less Network lifetime.

Zhang et al. [12] proposed energy-productive communicate in portable systems


haphazardness novel energy and information transmission fruitful communicate
conspire named the energy-effective communicate plot, which can adjust to lively
changing structure topology and channel abnormality [12]. The structure of the
communicate plot depends upon a through and through assessment of the favorable
circumstances and insufficiencies of the by and to a great extent utilized scourge
communicate plans.
Merits:
• An energy-productive communicate contrive is proposed, affected by the appraisal
of the information dispersing process utilizing the SIR plot;
• Analytical results are appeared on the bit of focus focuses that get the informa-
tion communicate by a fearless focus point in a system utilizing the proposed
communicate plot.

Demerits:
• Right when the structure is familiar with empowering multi-hop interchanges
among center points, or in multi-skip networks with compelled establishment
support.
Lee et al. [13] proposed an assembled TDMA space and power orchestrating
plans which develop energy effectiveness (EE) considering quality-of-service (QoS)
utility, and this arrangement redesigns the unfaltering quality and survivability of
UVS key MANET. The proposed calculation has three stages Dinkelbach strategy,
animating the Lagrangian multiplier and the CCCP procedure. To update the EE,
the length of a TDMA design is dynamically balanced. The drawback of this show
is that as the all out concede stretches out as appeared by the edge round, it cannot
ensure the diligent transmission.
Merits:
• The proposed calculation is certified by numerical outcomes.
• Those ensure least QoS and show the really unprecedented energy productivity.

Demerits:
• Using TDMA progression is that the clients increase some predefined encounters
opening.
• When moving from one cell site to other, if all the availabilities right presently
full the client may be isolated.
Jabbar et al. [14] proposed cream multipath energy and QoS—mindful improved
association state routing show adaptation 2 (MEQSA-OLSRv2), which is made to
adjust to the challenges showed up by constrained energy assets, mobility of focus
focuses, and traffic congestion during information transmission in MANET-WSN
association conditions of IoT systems. This show utilizes a middle point rank as
exhibited by multi-criteria hub rank estimation (MCNR). This MCNR totals various

parameters identified with energy and nature of administration (QoS) into a cautious
estimation to fundamentally reduce the multifaceted thought of different obliged
contemplations and dodge the control overhead acknowledged via independently
communicating different parameters. These estimations are the middle’s lifetime,
remaining battery energy, focus’ inert time, focus’ speed, and line length. The MCNR
metric is used by another association quality assessment work for different course
computations.
Merits:
• MEQSA-OLSRv2 maintained a strategic distance from the assurance of focuses
with high flexibility.

Demerits:
• Audiences whine about information over-burden, and they can be overpowered
and feel that it is annoying.
• The quickly changing of progression has upset the gathering’s exercises.
Kushwaha et al. [15] proposed a novel response for move server load starting with
one server then onto the accompanying server. Energy effectiveness is a basic factor
in the activity of specially appointed systems. The issue of sorting out routing show
and overpowering nature of impromptu headway may decrease the life of the middle
point like the life of the system.
Merits:
• MANETs over networks with a fixed topology join flexibility (an impromptu
system can be made any place with versatile devices).
• Scalability (you can without a lot of a stretch add more focuses to the system) and
lower organization costs (no persuading inspiration to gather a framework first).

Demerits:
• Mobile focus focuses allow the land to pass on and plan a brief system.
• The significant issue with the impromptu community focuses is resource goals.

3 Proposed Work

3.1 Dynamic Congestion Control Routing Algorithm

DCCR is a unicast routing protocol for mobile ad hoc networks. It reduces network
congestion by decreasing unnecessary flooding of packets and finding a congestion-free
route between the source and the destination. This section presents the overall design and
a detailed description of the DCCR protocol. When a source has a data packet to transmit to
a destination, the DCCR protocol first builds a congestion-free set (CFS) relating both
one-hop and two-hop neighbours. The source then begins the route discovery procedure, using
the CFS to identify a congestion-free path to the destination. If the DCCR protocol cannot
build a CFS because the network is already congested, it cannot begin the route discovery
process; once a new route has been established, the transmission of data packets continues.
The essential objective of DCCR is thus to find a congestion-free route between the source
and the destination, thereby reducing overhead and packet flooding. The DCCR protocol
contains the following parts:
1. Dynamic congestion detection technique,
2. Construction of the CFS,
3. Congestion-free routing,
4. Congestion-free path discovery.
The proposed algorithm controls network congestion by reducing the futile flooding of
packets and finding a congestion-free path between the source and the destination. The
proposed framework first detects the congestion, then forms a congestion-free set (CFS)
relating both one-hop and two-hop neighbours, and the source begins the route discovery
procedure using the CFS to identify a congestion-free route to the destination. The
proposed algorithm involves three components to detect and control congestion at the MAC
layer in MANET (an illustrative CFS construction sketch follows this list):
1. Dynamic congestion detection,
2. CFS construction,
3. Congestion-free route discovery.
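An illustrative sketch of the CFS construction step, assuming each neighbour advertises a boolean congestion flag and that two-hop neighbours are admitted only through non-congested relays, is given below; the topology and flags are example data, not a DCCR trace.

```python
# Hedged sketch: build a congestion-free set (CFS) from one-hop and two-hop
# neighbours, filtering out nodes that report themselves as congested.
def build_cfs(node, adjacency, congested):
    one_hop = {n for n in adjacency.get(node, ()) if not congested.get(n, False)}
    # two-hop neighbours are only admitted through a non-congested relay
    two_hop = {v for u in one_hop for v in adjacency.get(u, ())
               if v != node and not congested.get(v, False)}
    return one_hop | two_hop

adjacency = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "D"], "C": ["A"], "D": ["B"]}
congested = {"B": True}              # node B is currently congested
print(build_cfs("S", adjacency, congested))   # e.g. {'A', 'C'}
```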

3.2 Dynamic Congestion Detection

Congestion detection is based on estimation of the link stability (LS), residual bandwidth
(RB) and residual battery power (RP).
Link Stability
The link stability degree (LSD) is used to define a link's connection strength. In MANET,
LSD is essential for improving QoS and is defined as:

LSD = Mobility factor / Energy factor

LSD characterizes the level of link reliability: the higher the value of LSD, the more
reliable the link and the longer it is expected to last. Thus, a route in which every link
satisfies LSD > LSD_thr is feasible.
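A minimal sketch of this link-stability check, using the ratio defined above and admitting a route only when every link exceeds the threshold, is shown below; the numeric values are illustrative.

```python
# Minimal sketch of the LSD-based route feasibility test (example numbers).
def link_lsd(mobility_factor, energy_factor):
    return mobility_factor / energy_factor

def route_feasible(links, lsd_thr):
    """links: iterable of (mobility_factor, energy_factor) per hop."""
    return all(link_lsd(m, e) > lsd_thr for m, e in links)

route = [(0.9, 0.5), (0.8, 0.4), (0.7, 0.6)]
print(route_feasible(route, lsd_thr=1.0))     # True only if every hop's LSD > 1.0
```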

4 Experimental Results

Energy Consumption Ratio


Figure 2 presents the comparison of the energy consumption ratio. MANETs consist of
low-power devices that are distributed in geographically confined areas, so energy
consumption is a significant concern. The energy consumption ratio values for CODA
generally lie between 67.2 and 75, and for CCF between 57 and 69, whereas the values for
the proposed DCCRA lie between 83 and 93.6. These outcomes are simulated using the NS2
simulator. This outcome

Fig. 2 Comparison chart of energy consumption ratio



shows a consistent result for the proposed novel procedure; the proposed strategy thus
delivers a noteworthy improvement in the energy consumption ratio.
Routing Overhead Ratio
Figure 3 shows the comparison of the routing overhead ratio. Routing packets and data
packets have to share the same network bandwidth most of the time, and hence routing
packets are considered an overhead in the network. The existing CODA values generally lie
between 39 and 58, the existing CCF values between 26.77 and 44.56, and the proposed DCCRA
values between 66 and 85. These outcomes are simulated using the NS2 simulator and show a
consistent result for the proposed novel procedure; the proposed technique thus delivers a
significant improvement in the routing overhead ratio.
Throughput Ratio
Figure 4 presents the comparison of the average throughput ratio, which is defined as the
proportion of packets successfully received to the total packets sent. The existing CODA
values generally lie between 0.09 and 0.3, the existing CCF values between 0.04 and 0.22,
and the proposed DCCRA values between 0.13 and 0.45. These outcomes are simulated using the
NS2 simulator and show a consistent result for the proposed novel procedure; the proposed
technique thus delivers a significant improvement in the average throughput ratio.

Fig. 3 Comparison chart of routing overhead ratio



Fig. 4 Comparison chart of throughput ratio

5 Conclusion

Congestion-aware adaptive routing can effectively improve network performance owing to its
ability to accurately anticipate network congestion and make optimal routing decisions. In
this work, we have examined the concept of congestion-aware adaptive routing, its
advantages and disadvantages, the dynamic congestion control routing algorithm, and
scheduling based on the 1-hop interference effect. We consider throughput-optimal
scheduling schemes for wireless networks in a distributed manner.

References

1. Arora B, Nipur (2015) An adaptive transmission power aware multipath routing protocol for
mobile ad hoc networks © 2015 Published by Elsevier. https://creativecommons.org/licenses/
by-nc-nd/4.0/
2. Divya M, Subasree S, Sakthivel NK (2015) Performance analysis of efficient energy routing
protocols in MANET. pp. 1877–0509 © 2015 The Authors. Published by Elsevier. https://cre
ativecommons.org/licenses/by-nc-nd/4.0/
3. Sandeep J, Satheesh Kumar J (2015) Efficient packet transmission and energy optimization in
military operation scenarios of MANET, pp 1877–0509 © 2015 The Authors. Published by
Elsevier. https://creativecommons.org/licenses/by-nc-nd/4.0/
4. Kim D, Kim J-h, Moo C, Choi J, Yeom I (2015) Efficient content delivery in mobile ad-hoc
networks using CCN. © 2015 Elsevier. https://doi.org/10.1016/j.adhoc.2015.06.007
5. Anish Pon Yamini K, Suthendran K, Arivoli T (2019) Enhancement of energy efficiency using
a transition state MAC protocol for MANET. © 2019 Published by Elsevier.
https://doi.org/10.1016/j.comnet.2019.03.013
6. Taheri S, Hartung S, Hogrefe D (2014) Anonymous group-based routing in MANETs. © 2014
Elsevier Ltd. https://doi.org/10.1016/j.jisa.2014.09.002

7. Sakthivel M, Palanisamy VG (2015) Enhancement of accuracy metrics for energy levels in
MANETs. © 2015 Elsevier Ltd. https://doi.org/10.1016/j.compeleceng.2015.04.007
8. Ragul Ravi R, Jayanthi V (2015) Energy efficient neighbour coverage protocol for reducing
rebroadcast in MANET. © 2015 The Authors. Published by Elsevier. https://creativecommons.
org/licenses/by-nc-nd/4.0/
9. Gawas MA, Gudino LJ, Anupama KR (2016) Cross layer adaptive congestion control for
best-effort traffic of IEEE 802.11e in mobile ad hoc networks. In: 2016 10th international
symposium on communication systems, networks and digital signal processing (CSNDSP).
doi: https://doi.org/10.1109/csndsp.2016.7574042
10. Shafigh AS, Veiga BL, Glisic S (2016) Cross layer scheme for quality of service aware multicast
routing in mobile ad hoc networks. Wireless Netw 24(1):329–343. https://doi.org/10.1007/s11
276-016-1349-1
11. Uddin M, Taha A, Alsaqour R, Saba T (2016) Energy efficient multipath routing protocol for
mobile ad-hoc network using the fitness function, pp 2169–3536 (c) 2016 IEEE
12. Zhang Z, Mao G, Anderson BDO (2015) Energy efficient broadcast in mobile networks subject
to channel randomness, pp 1536–1276 (c) 2015 IEEE
13. Lee JS, Yoo Y-S, Choi HS, Kim T, Choi JK (2019) Energy-efficient TDMA scheduling for
UVS tactical MANET, pp 1089–7798 (c) 2019 IEEE
14. Jabbar WA, Saad WK, Ismail M (2018) MEQSA-OLSRv2: a multicriteria-based hybrid multi-
path protocol for energy-efficient and QoS-aware data routing in MANET-WSN convergence
scenarios of IoT, pp 2169–3536 (c) 2018 IEEE
15. Kushwaha A, Doohan NV (2016) M-EALBM: a modified approach energy aware load
balancing multipath routing protocol in MANET. 978-1-5090-0669-4/16/$31.00 © 2016 IEEE
Predictable Mobility-Based Routing
Protocol in Wireless Sensor Network

G. Sophia Reena and M. Punithavalli

Abstract The routing process in a mobile wireless sensor network is one of the most complex
tasks, since it is affected mainly by the mobility behaviour of nodes. Successful routing
increases network performance by delivering packets without loss. This was confirmed in
previous research work by introducing the QoS-oriented distributed routing protocol (QOD),
which measures the load level of channels before data transmission so that successful
packet transmission is ensured. However, that method does not consider prediction of
mobility behaviour, which can cause path breakage and network failure. This is addressed in
the proposed method by presenting a predictable mobility-based routing scheme (PMRS), in
which successful data transmission is guaranteed by avoiding path breakage due to mobility.
In this work, node movement is predicted based on node direction and motion angles toward
the destination node. By predicting the node mobility in the future, it is determined
whether the node will remain near the destination or not; thus, a better route path can be
established for successful data transmission. Based on node movement, the optimal cluster
head is selected, and thereby the shortest and most reliable path can be achieved between
source and destination nodes. Cluster head selection is performed using the genetic
algorithm, which ensures reliable transmission without node failure. Finally, data
transmission is carried out through the cluster head node using the time division multiple
access (TDMA) method. The proposed scheme is implemented in NS2, and the results show that
this technique provides better results than other recent schemes.

G. Sophia Reena (B)


Department of Information Technology, PSGR Krishnammal College for Women, Peelamedu,
Coimbatore, India
e-mail: sophiareena@psgrkcw.ac.in
M. Punithavalli
Department of Computer Applications, School of Computer Science and Engineering, Bharathiar
University, Coimbatore, India
e-mail: punithavalli@buc.edu.in


Keywords Mobility prediction · Mobile wireless sensor network · Cluster head ·


Reliability · Location prediction

1 Introduction

Wireless sensor network (WSN) is becoming widespread, where a lot of recent


research work tends to be particularly focused on specific applications. Real-time
communication is one of the key research challenges and depends on the type of application:
in such applications, data packets that arrive beyond the time limit are considered to
degrade the performance and quality of the system [1]. To handle this issue, it is crucial
to analyse the appropriateness and relevance of real-time communication in WSN, together
with the underlying assumptions and the evaluation standards, so as to classify and reason
about the common issues [2].
To achieve the goals discussed above, real-time communications in WSN are categorized,
without loss of generality, into hard, firm and soft real-time classes. In hard real-time
communication, a missed deadline affects the functioning of the system and can cause the
complete system to fail [3]; as a result, in the worst case, the end-to-end delay must be
bounded within the time limit [4].
In order to concentrate on the aspect of mobility dimension of WSN, it is extremely
significant to recognize how traditional assumptions regarding dynamically orga-
nized WSNs shift with the implementation of mobile units [5]. These kind of networks
are extremely possess potential to function better that the static WSNs because they
are inclined to extend the network time span, minimize the usage of services, endow
with additional data, and achieve higher processing [6].
Mobility has turned out to be a critical field of study for the WSN community over the last
few years. The growing capabilities and dropping prices of mobile sensors make mobile
sensor networks both feasible and sensible [7]. Since the topology changes frequently,
pre-structured communication distribution does not help much in this setting; moreover,
recurrent position notifications from a mobile node may result in unnecessary drainage of
the sensor node's battery power and may also increase collisions [8].
Various routing protocols [9–11] have been suggested by many scholars, endorsing diverse
standards and design considerations [12]. Routing is usually extremely difficult in a
mobile network, and in MWSN it is even more complicated because the network nodes are
low-power, cost-efficient mobile devices with limited resources [8].
In this work, path breakage due to mobility is predicted by introducing the predictable
mobility-based routing scheme (PMRS), in which successful data transmission can be
guaranteed. This routing protocol groups the sensors into clusters; there is a cluster head
(CH) in each cluster that accumulates data from every node in its cluster. Selection of the
cluster head is rendered using the genetic algorithm (GA). The CH may regularly accumulate
data from the sensors, or TDMA scheduling may be performed to collect the data from the
sensors [13].

2 Predictable Mobility-Based Routing Protocol

In the proposed research method, predictable mobility-based routing scheme (PMRS)


is developed for effective data transmission, which is assured by avoiding path breakage
due to mobility. In this work, upcoming node movement is predicted based on node direction
and motion angles toward the destination node. By predicting the node mobility in the
future, it is determined whether the node will be nearest to the destination or not; thus,
an improved route path can be established for successful data transmission. Based on node
movement, the optimal cluster head is selected, so that the shortest and most reliable path
can be attained between source and destination nodes. Cluster head selection is carried out
using the genetic algorithm, which confirms reliable transmission without node failure.
Finally, data transmission is carried out through the cluster head node using the time
division multiple access (TDMA) method. The processes involved in the proposed research
technique are listed below:
• Nodes mobility prediction based on nodes direction and motion angles
• Mobility-based cluster head selection using genetic algorithm
• Reliable data transmission using the time division multiple access (TDMA) method (a
simple slot-assignment sketch follows this list).
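A simple slot-assignment sketch for the TDMA-based data collection step is given below; the frame length and member identifiers are assumed example values, not the scheme's actual configuration.

```python
# Toy sketch of TDMA slot assignment inside a cluster: the cluster head gives
# each member one slot per frame and members transmit only in their own slot.
def tdma_schedule(cluster_members, frame_slots):
    """Round-robin mapping of slot index -> member id (None = idle slot)."""
    return {slot: (cluster_members[slot] if slot < len(cluster_members) else None)
            for slot in range(frame_slots)}

members = ["n3", "n7", "n9"]
print(tdma_schedule(members, frame_slots=5))
# {0: 'n3', 1: 'n7', 2: 'n9', 3: None, 4: None}
```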

2.1 Nodes Mobility Prediction

The update protocol is essential for the dissemination of knowledge about geographical
location and services, as well as about measured resources such as battery power, queuing
space, processor speed and transmission range.
1. Type 1 update: A Type 1 update is produced periodically. The time between subsequent
Type 1 updates may remain fixed at a specified frequency; alternatively, the frequency of
Type 1 updates may vary linearly between a maximum (f_max) and a minimum (f_min) threshold
according to the velocity of node v. The characteristics are shown in Fig. 1.
2. Type 2 update: A Type 2 update is produced if there is a significant shift in the node's
speed or direction. The mobile node can estimate the approximate location at which it is
positioned at a certain time from its current record (specifically, from the latest update
information) (Fig. 2).
Subsequently, the anticipated position (x_e, y_e) is provided by the following equations:

x_e = x + v · (t_e − t) · cos θ   (1)



Fig. 1 Deviation of the update frequency of a Type 1 update together with the velocity of the node

Fig. 2 Check at time t_c whether a Type 2 update must be produced

y_e = y + v · (t_e − t) · sin θ   (2)

2.1.1 Predictions

When connecting to a specific target b, source a must initially determine destination
b's geographic position as well as the positions of the intermediate hops at the times
when the first packet reaches the individual nodes. This phase therefore involves
location prediction in addition to the prediction of propagation delay. It is to be
observed that location prediction is employed to determine the geographical position
of any node, either an intermediary node or the target, at the future time t_p when
the packet reaches it.
For updates containing node motion direction information, only one preceding
update is necessary if the position is to be predicted. For a given node, the calculation
of the projected position is then exactly the same as the periodic calculation of the
actual position at node b itself.
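A minimal sketch of the location prediction of Eqs. (1)-(2) and of a Type 2 update check is given below, assuming each node stores its last reported position (x, y), speed v, direction θ, and report time t. The deviation threshold d_max used to trigger a Type 2 update is a hypothetical parameter introduced only for illustration.

import math

def predict_position(x, y, v, theta, t, t_e):
    # Anticipated position (x_e, y_e) at time t_e, Eqs. (1)-(2)
    x_e = x + v * (t_e - t) * math.cos(theta)
    y_e = y + v * (t_e - t) * math.sin(theta)
    return x_e, y_e

def needs_type2_update(actual_xy, last_report, t_c, d_max=5.0):
    # Issue a Type 2 update at time t_c if the node has drifted too far
    # from the position that neighbours would predict from its last report.
    x, y, v, theta, t = last_report
    x_e, y_e = predict_position(x, y, v, theta, t, t_c)
    deviation = math.hypot(actual_xy[0] - x_e, actual_xy[1] - y_e)
    return deviation > d_max

# Example: node reported (0, 0) at t = 0 s, moving at 2 m/s along 45 degrees
last_report = (0.0, 0.0, 2.0, math.radians(45), 0.0)
print(predict_position(*last_report, t_e=10.0))
print(needs_type2_update((25.0, 3.0), last_report, t_c=10.0))

In this sketch a node compares its actual position with the position its neighbours would extrapolate and reports only when the difference is significant, which keeps update traffic and battery drain low.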

2.1.2 Genetic Algorithm

The adaptive genetic algorithm (GA) was introduced by J. Holland for use as a search
algorithm. GAs have been applied effectively in several fields and are capable of
resolving an extensive array of complicated numerical optimization problems. GAs
need no gradient information and are comparatively less likely to be trapped in
local minima on multi-modal search spaces. GAs also prove to be reasonably insensitive
to the presence of noise. The pseudocode of the GA method is given below:

Pseudocode for Genetic Algorithm

begin GA
    g = 0  (generation counter)
    Initialize population P(g)
    Compute fitness of population P(g)
    while (terminating condition is not reached) do
        g = g + 1
        Select P(g) from P(g − 1)
        Crossover P(g)
        Mutate P(g)
        Evaluate P(g)
    end while
end GA

The problem at hand is encoded by the GA into chromosomes, each of which represents
a possible solution.

2.1.3 Local Search

The combinatorial optimization problem is described by the pair (S, g), in which S
signifies the set of all possible solutions and g is the objective function, which maps
every element of S to a real value. The goal is to find a solution s ∈ S which minimizes
the objective function g. The problem is expressed through the following equation:

min g(s), s ∈ S

where N represents the neighborhood function of the problem instance (S, g), a
mapping from S to its power set:

N: S → 2^S

N(s) denotes the neighborhood of s and contains each solution that can be reached
via a single move from s. A move is an operator that transforms one solution into
another through a minor change. A solution x is called a local minimum of g with
respect to the neighborhood N if:

g(x) ≤ g(y), ∀ y ∈ N(x)

Local search minimizes the cost function g through consecutive steps, in each of
which the current solution x is exchanged for a solution y such that:

g(y) < g(x), y ∈ N(x)

Most local searches start with an arbitrary solution and end when a local minimum
is found. There are multiple ways to conduct a local search, and the computational
complexity depends on the size of the neighborhood set and the time required to
evaluate a move. As the neighborhood grows in size, the time required to search it
increases, but a better local minimum may be found. Local search makes use of the
concepts of state space, neighborhood, and objective function.
i. State space S: the collection of potential states that can be reached at some
point in the search.
ii. Neighborhood N(s): the collection of states (neighbors) that can be reached
from the state s in one step.
iii. Objective function f(s): a value that signifies the quality of the state s. The best
possible value of the function is attained at a state s that is a solution.
Pseudocode for local search is as follows:
    Select an initial state s0 ∈ S.
    While s0 is not a solution do
        Select, by some heuristic, s ∈ N(s0) such that f(s) > f(s0)
        Replace s0 by s.

2.1.4 Genetic Algorithm Using Local Search

In the genetic algorithm, four parameters are adjustable: the population size, the
crossover probability, the mutation probability, and the weight accuracy of the
influence factors. Figure 3 shows the flowchart of the proposed method.
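A minimal sketch of the GA-with-local-search loop of Fig. 3 is given below. The chromosome encoding (one candidate cluster head index per cluster), the residual-energy fitness function, and all parameter values are illustrative assumptions, not the authors' exact settings.

import random

def genetic_algorithm(fitness, chrom_len, n_genes, pop_size=30,
                      pc=0.8, pm=0.05, max_gen=100):
    # Generic GA with a hill-climbing local-search step, as in Fig. 3.
    # A chromosome is a list of gene indices (e.g. candidate cluster heads).
    pop = [[random.randrange(n_genes) for _ in range(chrom_len)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):
        # Tournament selection
        parents = [max(random.sample(pop, 2), key=fitness) for _ in range(pop_size)]
        # One-point crossover
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if random.random() < pc and chrom_len > 1:
                cut = random.randrange(1, chrom_len)
                children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            else:
                children += [a[:], b[:]]
        # Mutation
        for c in children:
            for i in range(chrom_len):
                if random.random() < pm:
                    c[i] = random.randrange(n_genes)
        # Local search: single-gene hill climbing on each child
        for c in children:
            i = random.randrange(chrom_len)
            neighbour = c[:]
            neighbour[i] = random.randrange(n_genes)
            if fitness(neighbour) > fitness(c):
                c[:] = neighbour
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# Illustrative fitness: prefer cluster heads with high residual energy
energy = [random.random() for _ in range(50)]        # 50 nodes (assumed)
fit = lambda chrom: sum(energy[g] for g in chrom)    # 5 clusters -> 5 genes
print(genetic_algorithm(fit, chrom_len=5, n_genes=50))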

3 Reliable Data Transmission Using TDMA

The proposed method uses inter-level synchronization of signal transmission times
based on the local time difference between a sensor node and its parent node, and
creates an energy-efficient and proactive TDMA schedule, rather than relying on
global clock synchronization and position information exchanged between arbitrarily
deployed sensor nodes.
According to the proposed algorithm, a sensor with an even-numbered ID can only
steal slots with odd slot numbers (Fig. 4).
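A minimal sketch of this parity-based slot-stealing rule follows, assuming a fixed-length TDMA frame and assuming that the complementary rule (odd-ID nodes steal even-numbered idle slots) also holds; the frame length and slot ownership shown are illustrative only.

def own_slot(node_id, frame_len):
    # Each node owns the slot matching its id within the frame (assumed rule).
    return node_id % frame_len

def may_steal(node_id, slot_no):
    # Even-id nodes may only steal odd-numbered idle slots
    # (and, by the assumed complementary rule, odd-id nodes steal even slots).
    if node_id % 2 == 0:
        return slot_no % 2 == 1
    return slot_no % 2 == 0

def schedule(frame_len, idle_slots, backlog):
    # Assign the idle slots of the current frame to backlogged nodes.
    grants = {}
    for slot in idle_slots:
        for node in backlog:
            if node not in grants and may_steal(node, slot):
                grants[node] = slot
                break
    return grants

print(schedule(frame_len=8, idle_slots=[1, 2, 5], backlog=[4, 7, 10]))
# e.g. node 4 (even) -> slot 1, node 7 (odd) -> slot 2, node 10 (even) -> slot 5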

4 Results and Discussion

The performance evaluation parameters considered here are packet delivery ratio,
throughput, end-to-end delay, and network lifetime, evaluated for the existing
MADAPT algorithm, the previous QoS-aware channel load-based mobility adaptive
routing protocol (QoS-CLMARP), and the proposed predictable mobility-based
routing scheme (PMRS).
The results for end-to-end delay are illustrated in Fig. 5. Compared with the existing
MADAPT and QoS-CLMARP methods, the end-to-end delay is improved considerably
by the proposed PMRS method.
The results for network lifetime are illustrated in Fig. 6. In the existing MADAPT
and QoS-CLMARP methods, the network lifetime is lower; in the proposed system,
the network lifetime is improved significantly by the PMRS method. In Fig. 7, the
packet delivery ratio is substantially improved by the PMRS approach in the proposed
system.
In Fig. 8, the proposed PMRS achieves a lower packet loss ratio compared with the
other two methods.

Fig. 3 Flowchart of the cluster head selection process (begin; chromosome coding; initial population; calculation of the chromosome fitness value combined with weight values; selection, crossover, and mutation operators; local search; generation of the new population; check whether the optimization criterion is met; obtain the optimal weight value; end)

Fig. 4 Slot sharing

Fig. 5 End-to-end delay comparison

Fig. 6 Network lifetime comparison

Fig. 7 Packet delivery ratio comparison

Fig. 8 Packet loss ratio comparison

5 Conclusion

In this work, future node movement is predicted based on the node's direction and
motion angle toward the destination node. By predicting the node's future mobility,
it is determined whether the node is moving nearer to the destination or not. Thus, a
better route path can be established for successful data transmission. Based on node
movement, the optimal cluster head is selected, so that the shortest and most reliable
path can be achieved between the source and destination nodes. In this work, cluster
head selection is completed using the genetic algorithm, which confirms reliable
transmission without node failure. Finally, data transmission is completed through
the cluster head node using the time division multiple access (TDMA) method.

References

1. Kim BS, Park H, Kim KH, Godfrey D, Kim KI (2017) A survey on real-time communications
in wireless sensor networks. Wireless Commun Mob Comput
2. Oliver R, Fohler G (2010) Timeliness in wireless sensor networks: common misconceptions.
In: Proceedings of international workshop on real-time networks, July 2010
3. Collotta M, Costa DG, Falcone F, Kong X (2016) New challenges of real-time wireless sensor
networks: theory and applications. Int J Distrib Sens Netw 12(9)
Predictable Mobility-Based Routing Protocol in Wireless … 37

4. Zhan A, Xu T, Chen G, Ye B, Lu S (2008) A survey on realtime routing protocols for wireless


sensor networks. In: Proceedings of China wireless sensor network conference, 2008
5. Amundson I, Koutsoukos XD (2009) A survey on localization for mobile wireless sensor
networks. In: Mobile entity localization and tracking in GPS-less environments, pp 235–254.
Springer, Berlin
6. Rezazadeh J, Moradi M, Ismail AS (2012) Mobile wireless sensor networks overview. Int J
Comput Commun Netw 2(1):17–22
7. Ekici E, Gu Y, Bozdag D (2006) Mobility-based communication in wireless sensor networks.
Commun Mag IEEE 44(7):56–62
8. Sara GS, Sridharan D (2014) Routing in mobile wireless sensor network: A survey. Telecommun
Syst 57(1):51–79
9. Asad M, Nianmin Y, Aslam M (2018) Spiral mobility based on optimized clustering for optimal
data extraction in WSNs. Technologies 6(1):35
10. Poulose Jacob K, Paul V, Santhosh Kumar G (2008) Mobility metric based LEACH-mobile
protocol
11. Khandnor P, Aseri T (2017) Threshold distance-based cluster routing protocols for static and
mobile wireless sensor networks. Turkish J Electr Eng Comput Sci 25(2):1448–1459
12. Chen C, Ma J, Yu K (2006) Designing energy efficient wireless sensor networks with mobile
sinks. In: Proceedings of WSW’06 at SenSys’06, Colorado, USA, 31 October 2006
13. Jain SR, Thakur NV (2015) Overview of cluster based routing protocols in static and mobile
wireless sensor networks. In: Information systems design and intelligent applications, pp 619–
626. Springer, New Delhi
Novel Exponential Particle Swarm
Optimization Technique for Economic
Load Dispatch

Nayan Bansal, Surendrabikram Thapa, Surabhi Adhikari,


Avinash Kumar Jha, Anubhav Gaba, and Aayush Jha

Abstract Due to vicious competition in the electrical power industry, growing environmental
issues, and an ever-increasing demand for electric energy, optimization
of the economic load dispatch problem has become a necessity. This paper
emphasizes a novel modified version of PSO to obtain an optimized solution of
the economic load dispatch problem. In this paper, exponential particle swarm
optimization (EPSO) is introduced, and a comparison has been performed on the basis
of convergence speed and stability. The proposed novel method of exponential
PSO has shown better performance in convergence speed and stability.

Keywords Exponential particle swarm optimization (EPSO) · Soft computing ·


Economic load dispatch (ELD) · Variants · Convergence

N. Bansal (B) · A. Gaba


Department of Electrical Engineering, Delhi Technological University, New Delhi, India
e-mail: nayan.7991@gmail.com
A. Gaba
e-mail: anubhavgaba@gmail.com
S. Thapa · S. Adhikari
Department of Computer Science and Engineering, Delhi Technological University, New Delhi,
India
e-mail: surenthapa5803@gmail.com
S. Adhikari
e-mail: suravi.prasiddi@gmail.com
A. K. Jha · A. Jha
Department of Civil Engineering, Delhi Technological University, New Delhi, India
e-mail: avinash.jha.6696@gmail.com
A. Jha
e-mail: ayushjha452@gmail.com


1 Introduction

The present-day power system comprises a stack of interconnected electrical
networks, and with inflation in the prices of the fuel used in thermal power plants,
it is imperative to minimize the cost of the generating units. The main motive
of the contemporary power system is to deliver high-grade electrical power to the
buyer at the cheapest rate, considering the different constraints of the power system
and the generating units. This creates the basis for the ELD problem, which concentrates
on finding the actual power generation of each interconnected power plant
that minimizes the total cost of fuel. The presence of varied inequality and equality
constraints turns this into a complicated problem [1].
Conventional methods like the Newton method, the gradient method, and the lambda
iteration method can handle a monotonically and linearly increasing cost function.
However, the fuel cost curve in the ELD problem becomes sharp and extremely nonlinear
because of factors like ramp rate limits, inequality constraints, the valve point effect,
generator efficiency, and prohibited operating zones, making ELD a non-convex
and complex issue that is cumbersome to solve with conventional methods. AI,
stochastic algorithms, and evolutionary algorithms like PSO can solve such highly
complicated problems in the vicinity of the global optimal solution.
PSO is an evolutionary and mathematical process that was inspired by flocks of birds
and schools of fish. Fewer parameters and high convergence speed make PSO a
highly favorable option [2]. In this paper, the application of PSO is studied by
varying its variants and applying them to the ELD problem, differentiating each
variant on the basis of convergence stability and speed [3].

2 Particle Swarm Optimization

PSO is an evolutionary and intelligent process that is inspired by schools of fish and
flocks of birds. The symbiotic cooperation between members of a society is the
principle behind this computational technique. PSO requires fewer parameters for
evaluation and has a higher convergence speed than most arithmetic techniques.
Over the years, there has been extensive analysis of the PSO technique, and improved
algorithms have been obtained based on improving population diversity and on
parameter adjustment. The first is used to find an equilibrium between global
searching and local searching [4]; it includes algorithms such as CPSO, DPSO, and
LWPSO. The second is employed to avoid premature convergence; for significant
performance improvement, techniques like natural selection are used [5]. In this
paper, the primary focus is on the first approach, due to its lower computational
cost, relatively lower complexity, and efficient parameter strategies.

In an early modification of PSO, an inertia weight coefficient was introduced into
the velocity update equation, with the inertia weight decreased in a constant linear
manner (LWPSO) [6]. This technique helped increase the convergence speed and
obtain a balance between local and global search exploitation. However, local
exploration was compromised because of the reduction of the inertia weight in a
constant linear way. Thus, a further modification was made, and the inertia weight
was decreased by a damping factor rather than in a linear manner [7]. This process
increased the convergence speed but affected the balance between local and global
probing for the global optimum value. Further research introduced a constriction
factor, thereby removing the inertia weight that had been brought up in earlier papers
from the velocity update equation; a constriction factor of 0.729 was revealed to
render the best optimum solutions [8]. This technique demonstrated that dynamic
updating of the velocity equation could improve the local search for an optimal
solution and the convergence speed without adding any complexity to the PSO
technique.
Deeply inspired by these improvements of the PSO technique, a novel method,
exponential particle swarm optimization (EPSO), is introduced in this paper. In this
method, the inertia weight eliminated by CPSO is re-introduced. The inertia weight
here depends on MaxIt, the maximum number of iterations [9]. This gives a large
decay step in the early stage of the algorithm, which enhances the convergence speed,
while in the later stage the decay step decreases considerably, allowing local
exploration and thereby balancing local and global exploration. In this paper, the
ELD problem is exposed to LWPSO, DPSO, CPSO, and EPSO, and the solutions
obtained by each algorithm, their convergence speed, and their convergence stability
are compared.

3 Exponential Particle Swarm Optimization

A population swarm of size n particles is considered. Each particle i is assigned a
velocity vector v_i and a position vector x_i, both P-dimensional vectors described as
v_i = (v_i1, v_i2, …, v_iP) and x_i = (x_i1, x_i2, …, x_iP). The convergence speed is
affected by the velocity vector v_i, and a possible solution is represented by the
position vector x_i. The speed of convergence and the exploration of the global and
local optimum values are influenced by the velocity equation, which hence affects
convergence stability. Each particle acquires a personal best position
P_b = (P_b1, P_b2, …, P_bP) during the PSO search operation. The personal best
positions of the other swarm members are collated with that of the particle, and the
global best position P_g = (P_g1, P_g2, …, P_gP) is selected by the algorithm. The
global and personal best positions are utilized to update the particle velocity. The
velocity equation can be defined as:

v_i(n + 1) = v_i(n) + c_1 r_1 (P_b(n) − x_i(n)) + c_2 r_2 (P_g(n) − x_i(n))   (1)

The position equation is realized as:

x_i(n + 1) = x_i(n) + v_i(n)   (2)

where c_1 and c_2 are positive coefficients, and r_1(·) and r_2(·) are random variable
functions.
The inertia weight was introduced into LWPSO by earlier research papers, where the
velocity equation with the inertia weight was given by

v_i(n + 1) = w·v_i(n) + c_1 r_1 (P_b(n) − x_i(n)) + c_2 r_2 (P_g(n) − x_i(n))   (3)

and the inertia weight is linearly decreased as follows:

w = w_max − (w_max − w_min) · It / MaxIt   (4)
where 'MaxIt' denotes the maximum number of iterations, 'It' is the current iteration,
and w_min and w_max are constants with values 0.4 and 0.9, respectively. This
technique achieved a balance between local and global searching and also improved
the convergence speed. Further research in this field deduced that applying a damping
factor to the inertia weight in the velocity update equation produced a better
convergence speed:

w = w ∗ w_damp   (5)

w_damp is chosen as 0.99 and the initial w is chosen as 0.9.


A further improvement of the PSO algorithm led to a velocity update equation in
which the inertia weight was eliminated and a constriction factor was introduced
instead. The velocity update equation is then defined as:
  
v_i(n + 1) = χ [v_i(n) + c_1 r_1 (P_b(n) − x_i(n)) + c_2 r_2 (P_g(n) − x_i(n))]   (6)

And,

χ = 2 / |2 − φ − √(φ² − 4φ)|   (7)

Several experiments were conducted to determine the value of φ. The value was found
to be 4.1, which results in χ = 0.729; in this case, the algorithm gives the best
performance for finding the optimal solution.
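A quick numerical check of Eq. (7) with φ = 4.1, reproducing the χ ≈ 0.729 value quoted above (a small illustrative computation, not part of the original method description):

import math

phi = 4.1
chi = 2 / abs(2 - phi - math.sqrt(phi**2 - 4 * phi))
print(round(chi, 4))   # approximately 0.7299, i.e. the 0.729 used in CPSO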
Deeply inspired by this, a new method, exponential PSO (EPSO), is introduced in
this paper. In this method, the inertia weight in (3) is modified. The new fixed
inertia weight is:

w = (1 − 1/MaxIt)^MaxIt   (8)

Now, since the maximum number of iterations is large, expression (8) can be written as

w = e^(MaxIt · (−1/MaxIt)) = e^(−1)   (9)

Thus, with the help of this algorithm, a large step can be taken in the initial stage
of the computation and a smaller one toward the end, so that an equilibrium is
maintained between local searching and global searching. This is achieved without
adding any complication to the algorithm, which is quite essential for an evolutionary
algorithm. A convergence speed better than that of damped particle swarm
optimization (DPSO), constriction particle swarm optimization (CPSO), and linear
weight particle swarm optimization (LWPSO) was obtained with this algorithm, and
the convergence stability was found to be the best when the ELD problem was
exposed to the algorithms. The numerical results for these algorithms are discussed
in the following sections.
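A minimal sketch of the EPSO velocity and position update of Eqs. (1)-(3), using the constant inertia weight w = e^(−1) from Eq. (9), is given below. The acceleration coefficients c1 = c2 = 2, the bound handling, and the one-dimensional quadratic test cost are illustrative assumptions rather than the authors' exact configuration.

import math
import random

def epso(cost, dim, lb, ub, n_particles=30, max_it=200, c1=2.0, c2=2.0):
    # Exponential PSO: standard velocity/position update with w = e**-1 (Eq. 9).
    w = math.exp(-1)
    x = [[random.uniform(lb, ub) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    gbest = min(pbest, key=cost)
    for _ in range(max_it):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] = min(max(x[i][d] + v[i][d], lb), ub)
            if cost(x[i]) < cost(pbest[i]):
                pbest[i] = x[i][:]
        gbest = min(pbest, key=cost)
    return gbest, cost(gbest)

# Illustrative test: minimise a simple quadratic cost
print(epso(lambda p: sum((pi - 3.0) ** 2 for pi in p), dim=2, lb=-10, ub=10))

In an ELD setting the cost callable would be replaced by the fuel-cost evaluation of Sect. 4.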

4 Economic Load Dispatch Problem

Power generation in a thermal power plant takes place by the rotation of the prime mover
in the turbine under the action of steam. The working fluid in the thermal power plant
is water. Water is fed to the boiler and super-heater, which convert it into steam. The
steam, which carries thermal energy, is allowed to expand in the turbine, which rotates
the rotor shaft of the generator. The steam loses energy, is condensed, and is then pumped
back to the boiler to be heated up again. The factors which affect the operating cost
include the transmission losses, fuel costs, and efficiency of the generators in action.
Usually, labor, maintenance, and operation costs are fixed. A typical fuel cost curve
of a generating unit is depicted in Fig. 1.

Fig. 1 Cost curve (fuel) locus of a generating unit

The minimum power which can be extracted from a generating unit, below which it is
not feasible to operate the plant, is P_i^min [10]. The maximum power which can
be obtained from a generating unit is P_i^max.
The main motive of ELD is to reduce the total generation cost. The problem can be
formulated as follows:

Minimise X_T = Σ_{i=1}^{n} F_i(P_i)   (10)

where
X_T = total generation cost,
F_i = cost function of the ith generating unit,
n = number of generating units,
P_i = real power output of the ith generator.
The above-stated cost function can be approximately expressed as a quadratic function
of the real power outputs of the generating units [11].

F_i(P_i) = α_i + β_i P_i + γ_i P_i²   (11)

where α_i, β_i, and γ_i represent the fuel cost coefficients of the ith generating unit.
This problem has inequality and equality constraints [12].

4.1 Equality Constraints

The cumulative real power generated by the generating units in the case under study
should be equal to the sum of the transmission losses and the system demand power,
which gives the equality constraint

Σ_{i=1}^{n} P_i = P_D + P_L   (12)

where
P_D = demand power (MW),
P_L = transmission losses (MW).

4.2 Inequality Constraints


P_i^min ≤ P_i ≤ P_i^max   (13)

Here, for the ith unit, P_i^max is the maximum possible real power and P_i^min is the
minimum possible real power [10].

4.3 Transmission Losses

The following equation describes the transmission losses:

P_L = P^T B P + B_0^T P + B_00   (14)

Here, P is a vector of length N representing the power output of every generator, B is
the square matrix of loss coefficients, B_0 is another vector of length N, and B_00 is a
constant.

4.4 Ramp Rate Limit Constraints

P_i is the power generated by the ith unit. It may not exceed the real power generated
in the preceding interval, P_i°, by more than the up ramp rate limit UR_i, and it may
not fall below it by more than the down ramp rate limit DR_i [13]. So, the following
constraints arise:

Max(P_i^min, P_i° − DR_i) ≤ P_i ≤ Min(P_i^max, P_i° + UR_i)   (15)

4.5 Valve Point Effect

The fuel cost curve of a generating unit in ELD is usually presumed to be a
monotonically increasing function, so that the input–output characteristic is quadratic.
However, because of the valve point effect, high-order nonlinearity and discontinuities
are displayed by the input–output curve [9]. Thus, the original function is modified
to consider these constraints. An added periodic sinusoidal term models the valve
point effect, which is represented as:

F_i(P_i) = α_i + β_i P_i + γ_i P_i² + |e_i × sin(f_i × (P_i^min − P_i))|   (16)

where e_i and f_i represent the fuel cost coefficients of the ith generating unit
corresponding to the valve point effect.
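A minimal sketch of the dispatch cost evaluation that a PSO fitness function could use is shown below, combining the quadratic fuel cost with the valve point term of Eq. (16) and the B-coefficient loss model of Eq. (14), plus a simple penalty on the power-balance constraint of Eq. (12). The two-unit data at the bottom is a small illustrative subset, not the full six-unit system of Tables 1–3, and the penalty weight is an assumption.

import math

def fuel_cost(P, alpha, beta, gamma, e, f, Pmin):
    # Total fuel cost with valve point effect, Eq. (16)
    return sum(alpha[i] + beta[i] * P[i] + gamma[i] * P[i] ** 2
               + abs(e[i] * math.sin(f[i] * (Pmin[i] - P[i])))
               for i in range(len(P)))

def transmission_loss(P, B, B0, B00):
    # Transmission losses P_L = P^T B P + B0^T P + B00, Eq. (14)
    n = len(P)
    quad = sum(P[i] * B[i][j] * P[j] for i in range(n) for j in range(n))
    lin = sum(B0[i] * P[i] for i in range(n))
    return quad + lin + B00

def penalised_cost(P, data, PD, penalty=1e6):
    # Fitness used by PSO: fuel cost plus a penalty on violation of Eq. (12)
    loss = transmission_loss(P, data["B"], data["B0"], data["B00"])
    balance = abs(sum(P) - PD - loss)
    return fuel_cost(P, data["alpha"], data["beta"], data["gamma"],
                     data["e"], data["f"], data["Pmin"]) + penalty * balance

# Illustrative two-unit data (not the full six-unit test system)
data = {"alpha": [230, 190], "beta": [7.1, 10.5], "gamma": [0.0075, 0.009],
        "e": [220, 145], "f": [0.03, 0.045], "Pmin": [120, 75],
        "B": [[8.5e-4, 6e-4], [6e-4, 7e-4]], "B0": [-3.9e-4, -1.27e-4],
        "B00": 0.065}
print(penalised_cost([300.0, 150.0], data, PD=400.0))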

4.6 Prohibited Operating Zones

The steam valves inside a thermal power plant generate vibrations in the shaft
bearings, which creates zones of the fuel cost function in which operation is
restricted. Shared auxiliary equipment such as boilers and feed pumps is another
reason. The shape of the fuel cost curve cannot be predicted within the prohibited
zones, so the best course is to preclude operation of the units in these regions. The
cost curve with prohibited zones is shown in Fig. 2.
This can be mathematically represented as follows:

P_i^min ≤ P_i ≤ P_{i,1}^lower   (17)

P_{i,k−1}^upper ≤ P_i ≤ P_{i,k}^lower,  k = 2, 3, …, n_i   (18)

P_{i,n_i}^upper ≤ P_i ≤ P_i^max   (19)

where the lower real power limit of the kth prohibited zone of the ith unit is denoted
by P_{i,k}^lower, the upper limit of the (k − 1)th prohibited zone of the ith unit is
denoted by P_{i,k−1}^upper, and n_i is the number of prohibited zones of the ith
generating unit [14].
Thus, these are constraints taken into consideration in the ELD Problem, and
solutions have been acquired using different versions of PSO.

Fig. 2 Prohibited operating zones shown in the cost curve locus

5 Numerical Results and Simulation

A power system with six generating units is considered to demonstrate the application
of the various modified PSO methods, and results were obtained for it. Table 1 lists
the fuel cost coefficients of the generating units, Table 2 lists the characteristics of
the generating units, and Table 3 shows the prohibited zones.
The B-coefficients given below are used to compute the transmission losses in this
power system:

Table 1 Fuel cost coefficients


Unit   α_i   β_i   γ_i   e_i   f_i
1 230 7.1 0.0075 220 0.03
2 190 10.5 0.009 145 0.045
3 225 8.2 0.0095 165 0.035
4 200 11.9 0.008 110 0.047
5 210 10.7 0.0085 185 0.032
6 180 12.2 0.0075 125 0.028

Table 2 Generating units characteristics


Unit   P_i^min   P_i^max   P_i^o   UR_i   DR_i
1 120 500 450 80 120
2 75 220 170 50 90
3 100 275 200 65 100
4 60 150 150 50 90
5 70 200 190 50 90
6 60 120 110 50 90

Table 3 Generating units prohibited zones


Unit   Prohibited zone 1                        Prohibited zone 2
       P_i^lower (MW)   P_i^upper (MW)          P_i^lower (MW)   P_i^upper (MW)
1 215 245 345 375
2 95 115 140 160
3 155 175 210 240
4 85 95 110 120
5 95 115 140 150
6 75 85 100 105
B = [  0.00085    0.0006     0.000035  −0.00005   −0.00025   −0.0001
       0.0006     0.0007     0.00045    0.00005   −0.0003    −0.00005
       0.000035   0.00045    0.00155    0.0000    −0.00005   −0.0003
      −0.0005     0.00005    0.00000    0.0012    −0.0003    −0.0004
      −0.00025   −0.0003    −0.0005    −0.0003    −0.00645   −0.0001
      −0.0001    −0.00005   −0.0003    −0.0004    −0.0001     0.0075 ]

B_0 = 1e−3 × [−0.390  −0.127  0.704  0.059  0.216  −0.663]

B_00 = [0.065]

Different modified PSO techniques are deployed to calculate the total power generation
cost and the power generated by each unit, and a comparison is drawn among them.
Two indices govern the assessment of the different optimization methods: convergence
speed and convergence stability. A better stochastic algorithm is one having better
convergence stability and speed. The computation time of the various algorithms is
also compared in this paper.

5.1 Power Generation and Total Cost

Considering all the inequality and equality constraints, the ELD problem is solved
by the different PSO techniques. Table 4 shows the results obtained for a demand
power of 1200 MW. Each PSO algorithm is run for 200 iterations with a population
size of 300.
The total power generated is 1146.37 MW, out of which 1100 MW is used to meet
the demand and 46.37 MW is lost in transmission. The mean total cost is nearly the
same in all the modified versions of PSO, as shown in Table 4.

Table 4 Power generated by various units and total costs


Unit LWPSO (MW) DPSO (MW) CPSO (MW) EPSO (MW)
1 434.72 434.73 434.73 434.73
2 140.34 140.32 140.29 140.28
3 180.76 180.78 180.76 180.76
4 126.84 126.81 126.85 126.84
5 198.17 198.19 198.15 198.15
6 65.54 65.54 65.53 65.33
Total mean cost (Rs./hr) 1431.21 1433.28 1432.39 1432.03

Fig. 3 Convergence curve of different modified PSOs

5.2 Convergence Speed

A convergent algorithm is one which reaches an optimal region after a definite number
of iterations; an algorithm is called divergent when the optimal region is not reached.
The slope of the convergence curve determines the convergence speed [15].
The convergence curves of all the versions of PSO are shown in Fig. 3, where the
vertical axis represents the total cost and the horizontal axis denotes the number of
iterations of the modified algorithms. It can be concluded that EPSO performs better
than DPSO, CPSO, and LWPSO when each of these algorithms is run for 200
iterations.

5.3 Convergence Stability

Convergence stability is defined by the dispersion of the global optimum value around
the mean value after the algorithm has been run for a certain number of iterations.
The concentration of the global optimum values is an indication of convergence
stability [16]: the better the concentration, the higher the convergence stability [17].
Each modified algorithm was run 40 times, the global best solution was recorded in
each run, and the digital analysis for the different modified PSO techniques was
derived by calculating the mean costs and standard deviations. A smaller standard
deviation reflects less divergence and better stability. Hence, from Table 5, it can be
seen that EPSO has the lowest mean and standard deviation compared with LWPSO,
DPSO, and CPSO.

Table 5 Digital analysis of various methods of PSO


Criteria LWPSO DPSO CPSO EPSO
Mean (Rs./hr) 1433.21 1433.28 1432.39 1432.03
Standard deviation (Rs./hr) 63.48 62.45 60.89 58.79

6 Conclusion

In this paper, PSO and its modified algorithms have been successfully implemented
for solving the ELD problem. PSO is a nature-inspired stochastic algorithm, and its
small number of parameters gives it a lead over other nature-inspired evolutionary
techniques. A newer version of PSO has been successfully implemented, and a
comparison has been drawn with the existing versions of PSO based on convergence
stability, convergence speed, and total mean cost. A digital analysis of the convergence
stability of the different versions has been performed. The new method (EPSO) has
better convergence stability and convergence speed than the pre-existing models
(DPSO, LWPSO, and CPSO), while its total mean cost is almost equal to that of the
existing models. The dynamic step in the modification of a particle's velocity is given
by the weighted inertia term in the velocity equation; at a later stage, the step gets
smaller, ensuring local exploration. Thus, an equilibrium has been established between
local and global exploration. The novel PSO technique can easily be employed in
various power system optimization applications.

References

1. Alam MN (2018) State-of-the-art economic load dispatch of power systems using particle
swarm optimization. arXiv preprint arXiv:1812.11610
2. Shi Y (2001) Particle swarm optimization: developments, applications and resources. In:
Proceedings of the 2001 congress on evolutionary computation (IEEE Cat. No. 01TH8546),
pp 81–86. IEEE
3. Sharma J, Mahor A (2013) Particle swarm optimization approach for economic load dispatch:
a review. Int J Eng Res Appl 3:013–022
4. Kalayci CB, Gupta SM (2013) A particle swarm optimization algorithm with neighborhood-
based mutation for sequence-dependent disassembly line balancing problem. Int J Adv Manuf
Technol 69:197–209
5. Shen Y, Wang G, Tao C (2011) Particle swarm optimization with novel processing strategy and
its application. Int J Comput Intell Syst 4:100–111
6. Abdullah SLS, Hussin NM, Harun H, Abd Khalid NE (2012) Comparative study of random-
PSO and Linear-PSO algorithms. In: 2012 international conference on computer & information
science (ICCIS), pp 409–413. IEEE
7. He M, Liu M, Jiang X, Wang R, Zhou H (2017) A damping factor based particle swarm
optimization approach. In: 2017 9th international conference on modelling, identification and
control (ICMIC), pp 13–18. IEEE

8. Eberhart RC, Shi Y (2000) Comparing inertia weights and constriction factors in particle swarm
optimization. In: Proceedings of the 2000 congress on evolutionary computation. CEC00 (Cat.
No. 00TH8512), pp 84–88. IEEE
9. Pranava G, Prasad P (2013) Constriction coefficient particle swarm optimization for economic
load dispatch with valve point loading effects. In: 2013 international conference on power,
energy and control (ICPEC), pp 350–354. IEEE
10. Mondal A, Maity D, Banerjee S, Chanda CK (2016) Solving of economic load dispatch problem
with generator constraints using ITLBO technique. In: 2016 IEEE students’ conference on
electrical, electronics and computer science (SCEECS), pp 1–6. IEEE
11. Arce A, Ohishi T, Soares S (2002) Optimal dispatch of generating units of the Itaipú
hydroelectric plant. IEEE Trans Power Syst 17:154–158
12. Dihem A, Salhi A, Naimi D, Bensalem A (2017) Solving smooth and non-smooth economic
dispatch using water cycle algorithm. In: 2017 5th international conference on Electrical
Engineering-Boumerdes (ICEE-B), pp 1–6. IEEE
13. Dasgupta K, Banerjee S, Chanda CK (2016) Economic load dispatch with prohibited zone and
ramp-rate limit constraints—a comparative study. In: 2016 IEEE first international conference
on control, measurement and instrumentation (CMI), pp 26–30. IEEE
14. Hota PK, Sahu NC (2015) Non-convex economic dispatch with prohibited operating zones
through gravitational search algorithm. Int J Electr Comput Eng 5
15. Li X (2004) Better spread and convergence: particle swarm multiobjective optimization using
the maximin fitness function. In: Genetic and evolutionary computation conference, pp 117–
128. Springer, Berlin
16. Ding W, Lin C-T, Prasad M, Cao Z, Wang J (2017) A layered-coevolution-based attribute-
boosted reduction using adaptive quantum-behavior PSO and its consistent segmentation for
neonates brain tissue. IEEE Trans Fuzzy Syst 26:1177–1191
17. Clerc M, Kennedy J (2002) The particle swarm-explosion, stability, and convergence in
a multidimensional complex space. IEEE Trans Evol Comput 6:58–73
Risk Index-Based Ventilator Prediction
System for COVID-19 Infection

Amit Bhati

Abstract The current epidemic of the coronavirus disease 2019 (COVID-19)
constitutes a public health crisis of worldwide concern. Ongoing research shows
that factors such as immunity, environmental effects, age, heart disease, and diabetes
are significant contributors to the severity of this infection. In this paper, a combined
machine learning model and rule-based framework is proposed to offer medical
decision support. The proposed system consists of a robust machine learning model
utilizing the gradient boosted tree technique to calculate a CRI index for patients
suffering from COVID-19. This index is a measure of a COVID-19 patient's mortality
risk. Based on the CRI index, the system predicts the required number of ventilators
in the forthcoming days. The suggested model is trained and evaluated using a
real-time dataset of 5440 COVID-19 positive patients obtained from Johns Hopkins
University and the World Health Organization, and a dataset of Indian COVID-19
patients obtained from the open government data (OGD) platform of India.

Keywords Ventilators · COVID-19 · CRI index · Gradient boosted tree (GBT) ·


Machine learning (ML)

1 Introduction

The current pandemic of COVID-19 is the result of a respiratory disease syndrome
caused by the virus commonly known as SARS-CoV-2. More than 118,000 individuals
around the world have died of COVID-19 infection [1]. Many infected patients
present mild influenza-like symptoms and recover rapidly [2]. As the COVID-19
emergency accelerates, equipment specialists and enthusiasts around the globe are
hearing the call of duty. Inside the clinical infrastructure, there are basic technologies
that are commonly accessible; however, these technologies do not exist in
sufficiently high quantity to deal with the large number of patients related with

A. Bhati (B)
Institute of Engineering and Technology, Dr. RML Awadh University, Ayodhya 224001, UP, India
e-mail: amitsbhati@gmail.com


pandemics [3]. Similarly, in the current scenario, critically ill COVID-19 patients all
around the globe are struggling because of the absence of access to a few of these
technologies [4]. Ventilators are one such technology that is currently in critically
short supply [5, 6]. Ventilators are required for the treatment of both flu and COVID-19
patients in severe acute respiratory failure [7, 8]. Earlier investigations have indicated
that intensive care units (ICUs) will not have adequate resources to provide proper
treatment to all patients who need ventilator support during the pandemic period
[9, 10].
A compelling report of Imperial College London assesses that 30% of hospitalized
COVID-19 patients are likely to require ventilator support [11]. As a result, the
shortage of ventilators remains unavoidable in several places of the world. Andrew
Cuomo, Governor of New York, requested 30,000 units of ventilators for the treatment
of COVID-19 patients [12]. Even in India, the government has suggested that
automobile companies produce low-cost ventilators rather than vehicles in such a
pandemic situation.
Regarding their functionality, ventilators are incredibly reliable machines comprising
sophisticated pumps which control the flow of oxygen and air to and from the
patient's lungs, supporting them while they cannot accomplish their work. As per the
World Health Organization (WHO), COVID-19 can overpower our clinical facilities
at the territorial level by causing rapid growth in death rates [13, 14].

2 Background of Study

2.1 COVID-19 and Mortality Risk

Since the severity of COVID-19 infection is firmly related to the prognosis, the
fundamental techniques to improve outcomes are the early identification of high-risk
and critically ill patients. Zhou et al. [15] reported findings from 191 COVID-19
patients during the early days of the spread in Wuhan and followed the patients'
conditions until their discharge. Their findings reported that critically affected
patients were aged over 56 years, with a high proportion of men (62%), and almost
half of the patients had at least one disease (48%) [15]. In another report from
Wuhan City, China, the mortality rate was 62% among critically ill patients suffering
from COVID-19, and 81% of those required ventilators [16].

2.2 Machine Learning in COVID-19

Machine learning (ML) is a potentially powerful technology in the battle against
the COVID-19 pandemic. ML can be valuable for diagnosing, predicting, and helping
to treat COVID-19 infections, as well as for managing the economic effects. Since
the outbreak of the pandemic, there has been a scramble to utilize and investigate
ML, and other scientific analytic methods, for these purposes. ML methods can
precisely anticipate how COVID-19 will affect resource needs such as ventilators
and ICU beds at both the individual patient level and the clinic level, giving a solid
picture of future resource utilization and empowering healthcare experts to make
well-informed decisions about how these scarce resources can be used to achieve
the greatest benefit. In this paper, a gradient boosted machine learning technique is
used to identify the optimal number of ventilators required for COVID-19 patients.

2.3 Research Gap and Objective

Advancements in machine learning have had a massive effect in the field of clinical
science. A wide range of research studies is in progress on clinical diagnosis and
prediction utilizing machine learning approaches. However, only very limited work
is available on COVID-19 mortality risk identification and its use in planning critical
life support resources. Forecasting the ventilators required using a COVID-19 risk
identification index is a multi-class problem; consequently, variables for numerous
classes must be considered. These issues create the need for a model that can
investigate and analyze several parameters, predict the event, and settle on an optimal
decision. In this manner, efficient integration of behavioural data with patient health
information offers a better basis for prediction.

3 Materials and Methods

3.1 Dataset

The proposed research work has examined a dataset containing clinical records
of 5440 COVID-19 patients collected from confirmed sources, for example, Johns
Hopkins University, the WHO, the open government data (OGD) platform, and
Government of India sites. These sites have announced the details of COVID-19
cases. In this experimentation, cases registered during February, March, and the
first week of April 2020 are considered. The patients include both women and men
with ages ranging from 21 to 91 years. The dataset comprises 9 features reporting
the age, gender, and clinical history of patients suffering from COVID-19.
In Table 1, except for the date of admission, age, and gender, all features are of a
binary nature, such as high blood pressure, cardiac disease, diabetes, nervous system
illness, respiratory disease, pregnancy/childbirth, cancer, and tuberculosis. Table 2
displays the death rate for each specific feature class

Table 1 Patient attributes with correlated coefficients to calculate the CRI index

Attributes              Coefficient
Age                     0.649
Heart disease           0.071
Respiratory disease     0.069
Pregnancy/childbirth    0.054
Neuro disease           0.046
Cancer                  0.033
High blood pressure     0.028
Tuberculosis            0.025
Gender                  0.017

Table 2 Death rate of COVID-19 disease affected by patient attributes

Attributes              Death rate (%)
Age
80+ years old           21.90
70–79 years old         8.00
60–69 years old         3.60
50–59 years old         1.30
40–49 years old         0.40
30–39 years old         0.20
20–29 years old         0.20
10–19 years old         0.20
Sex
Male                    4.70
Female                  2.80
Existing disease
Cardiovascular          13.20
Diabetes                9.20
Chronic respiration     8.00
Hypertension            8.40
Cancer                  7.60
No pre-condition        0.90

due to COVID-19 (according to the WHO report). The dataset of 18,134 COVID-19
patients is split into subsets of 70 and 30% for training and testing of the models,
respectively. For verification of the results, the trained model is tested on 5440
medical records of COVID-19 positive patients from the test dataset.

3.2 COVID-19 Risk Identification Index

In the data preparation step, the proposed framework replaces the missing values in
the input dataset with mean values, especially for numerical data. For each feature,
the correlation coefficient has been calculated as shown in Table 1.

F = {F_0, F_1, F_2, …, F_n}   (1)

A = {A_0, A_1, A_2, …, A_n}   (2)

Here, F_0, F_1, …, F_n represent the features and A_0, A_1, …, A_n represent the
coefficients of the respective features selected for training of the machine learning
model. The COVID-19 risk identification (CRI) index can be calculated as:


CRI_i = Σ_{i=0}^{10} F_i × A_i   (3)

The CRI index obtained from Eq. (3) for the ith patient is not normalized, and using
these raw CRI index values can degrade the performance of the entire learning model.
Hence, the data quality ought to be improved before training a learning model.
Normalization of the CRI can resolve this issue, so in the next step the CRI is
normalized as:

CRI(N)_i = (CRI_i − Min(CRI_{0−n})) / (Max(CRI_{0−n}) − Min(CRI_{0−n}))   (4)

The normalized CRI index value obtained from Eq. (4) is calculated for every patient
record. This processed dataset is then ready for training. For training and validation
of the model, the dataset is divided into two parts: 70% of the records are used for
training and the remaining 30% are used for validating the prediction of the CRI
index.
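A minimal sketch of the CRI computation of Eqs. (3)-(4) is given below, assuming each patient record has already been encoded numerically (binary disease flags, age scaled to [0, 1]) and using the coefficients of Table 1; the age scaling and the two example patients are illustrative assumptions.

def cri_raw(features, coefficients):
    # Weighted CRI index of one patient, Eq. (3)
    return sum(f * a for f, a in zip(features, coefficients))

def normalise(values):
    # Min-max normalisation of the raw CRI indices, Eq. (4)
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Coefficients from Table 1 (age, heart, respiratory, pregnancy, neuro,
# cancer, blood pressure, tuberculosis, gender)
A = [0.649, 0.071, 0.069, 0.054, 0.046, 0.033, 0.028, 0.025, 0.017]

# Illustrative patients: age scaled to [0, 1], remaining features binary
patients = [
    [0.82, 1, 0, 0, 0, 0, 1, 0, 1],   # older male with heart disease
    [0.35, 0, 0, 0, 0, 0, 0, 0, 0],   # younger patient, no pre-conditions
]
raw = [cri_raw(p, A) for p in patients]
print(normalise(raw))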

3.3 Gradient Boosted Tree

In order to train the model for prediction, the training operation is performed using
random forest, deep learning, gradient boosted trees, decision tree, and support vector
machine approaches. The gradient boosting technique is an ML procedure for
regression and classification problems, which delivers a prediction model as a
collection of weak prediction models, generally decision trees. That is, gradient
boosting is commonly utilized with decision trees [17]. Like other boosting techniques,
gradient boosting joins weak "learners" into a single strong learner in an iterative
fashion. It is easiest to explain in the least-squares setting, where the objective is to
"teach" a model G to predict values of the form b̂ = G(a) by minimizing the mean
squared error (1/n) Σ_i (b̂_i − b_i)², where i indexes over some training set of size n
of true values of the output variable b_i; here b̂_i is the value predicted by G(a), b_i
is the true value, and n is the number of samples in b.
Now, let us consider a gradient boosting calculation with R stages. At each stage
r of gradient boosting (where 1 ≤ r ≤ R), suppose we have some imperfect model G_r
(for low r, this model may simply return b̂_i = b̄, the mean of b). In order to improve
G_r, the algorithm adds some new estimator, E_r(a). Hence,

G_{r+1}(a) = G_r(a) + E_r(a) = b   (5)

At each step, G_{r+1} attempts to correct the error of its parent G_r.
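A minimal sketch of training and evaluating a gradient boosted tree model with scikit-learn is shown below. The synthetic patient records stand in for the encoded dataset of Sect. 3.2, and the hyper-parameters are generic defaults, not the authors' tuned settings.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic stand-in for the encoded patient records of Sect. 3.2:
# column 0 is scaled age, the rest are binary disease/gender flags.
rng = np.random.default_rng(42)
X = np.hstack([rng.random((5440, 1)), rng.integers(0, 2, (5440, 8))])
coeff = np.array([0.649, 0.071, 0.069, 0.054, 0.046, 0.033, 0.028, 0.025, 0.017])
y = X @ coeff                              # raw CRI of Eq. (3)
y = (y - y.min()) / (y.max() - y.min())    # normalised CRI of Eq. (4)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,
                                                    random_state=42)
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=42)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print("RMSE:", mean_squared_error(y_test, pred) ** 0.5)
print("AE  :", mean_absolute_error(y_test, pred))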

3.4 Ventilator Requirement Prediction

The output from the GBT model is applied to the ventilator prediction process, which
takes an adaptive threshold value based on the mortality rate in the region and
forecasts the expected number of ventilators required in the near future from the last
10 days of statistics together with the predicted CRI indices of the patients. The
adaptive threshold is computed automatically from the mortality rates in the specific
region, as the requirement for ventilators also depends on the immunity of people
living in a particular region. For example, the immunity of people living in India may
differ from that of people living in other countries. So, in order to provide a good
estimate, an adaptive threshold is utilized.
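A minimal sketch of this step follows: patients whose predicted CRI exceeds an adaptive threshold derived from the regional mortality rate are counted as likely to need a ventilator. The exact form of the threshold is an illustrative assumption, since the text specifies only that it adapts to the regional mortality rate.

def adaptive_threshold(regional_mortality_rate, base=0.5):
    # Lower the CRI cut-off when regional mortality is high, so more
    # patients are flagged (illustrative rule, not the authors' exact one).
    return max(0.1, base - regional_mortality_rate)

def ventilators_needed(daily_cri, regional_mortality_rate):
    # Forecast ventilators from the last 10 days of predicted CRI values.
    # daily_cri: list of lists, one list of patient CRI indices per day.
    thr = adaptive_threshold(regional_mortality_rate)
    last_10_days = daily_cri[-10:]
    return sum(1 for day in last_10_days for cri in day if cri > thr)

# Illustrative example: 3 days of predicted CRI indices, 4.7% mortality rate
history = [[0.82, 0.41, 0.67], [0.55, 0.23], [0.91, 0.48, 0.77, 0.12]]
print(ventilators_needed(history, regional_mortality_rate=0.047))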

4 Mathematical Support for Proof of Concept

To check the acceptability of the proposed system, the T-test statistical method is
used. In this case, the T-test allows us to compare the mean of the number of required
ventilators obtained from the prediction model with the actual number of ventilators
used for treating COVID-19 patients. The T-test for a single mean can be given as:

t = |X̄ − µ| / (S / √n)   (6)

where X̄ and µ are the sample mean (calculated from the predicted output) and the
population mean (actual output), respectively, which can be calculated using Table 3.
S represents the standard deviation of the predicted output and n is the total number of

Table 3 Predicted number of ventilators required versus actual ventilators from testing dataset
Date           No. of patients registered   Actual ventilators used during treatment   Predicted required no. of ventilators   % Accuracy
20-Mar-2020 248 38 32 84.21
04-Apr-2020 3299 331 296 89.42
12-Apr-2020 7790 372 331 88.97
19-Apr-2020 13,888 446 417 93.49
26-Apr-2020 20,483 538 510 94.79
03-May-2020 29,549 612 579 94.60
10-May-2020 43,989 752 703 93.48
17-May-2020 55,875 881 799 90.69
24-May-2020 76,809 971 901 92.79
01-Jun-2020 97,008 1024 956 93.35
09-Jun-2020 133,579 2241 2159 96.34
17-Jun-2020 160,517 2839 2607 91.82
25-Jun-2020 190,156 3512 3374 96.07
03-Jul-2020 236,832 4587 4302 93.78

samples used. The degree of freedom is (n − 1). Substituting the values, Eq. (6)
gives Eq. (7):

t = |36.1 − 33.4| / (16.07 / √10)   (7)

The degree of freedom is 9 and t_cal = 0.91. Using the T-table value for a one-tailed
test with α = 0.01, t_{9,0.01} = 2.821. Because t_cal << t_{9,0.01}, it can be said that the
proposed system is highly acceptable.
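A short check of this acceptability test, assuming a list of per-sample predicted ventilator counts is available; the sample values below are illustrative, and 2.821 is the one-tailed critical value quoted in the text.

import math
from statistics import mean, stdev

def one_sample_t(predicted, population_mean):
    # t = |x_bar - mu| / (s / sqrt(n)), as in Eq. (6)
    n = len(predicted)
    return abs(mean(predicted) - population_mean) / (stdev(predicted) / math.sqrt(n))

predicted = [32, 30, 41, 25, 52, 18, 39, 47, 28, 49]   # 10 illustrative samples
t_cal = one_sample_t(predicted, population_mean=33.4)
print(t_cal, "acceptable" if t_cal < 2.821 else "rejected")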

5 Results and Discussion

The effectiveness of the GBT learning model is extensively studied and compared with
other learning techniques. The performance of the proposed model is tested over 1632
patient records reserved for testing from the dataset obtained from the web resources
of Johns Hopkins University and the WHO. The root mean square error (RMSE) and
absolute error (AE) define how well a machine learning model performs after training.
The RMSE and AE for all machine learning models used in the experimentation are
calculated using Eqs. (8) and (9), respectively.

RMSE = √( Σ_{i=1}^{n} (Y_i − Ŷ_i)² / n )   (8)

AE = Σ_{i=1}^{n} |Y_i − Ŷ_i| / n   (9)


where Y i is the actual value of CRI index calculated using Eq. (4) and Y i is the CRI
Index value predicted by machine learning model for ith test dataset record.
Experimentation is done on hardware with an Intel Xeon processor, 32 GB RAM, and
an Nvidia GeForce GTX 1080 GPU. All the machine learning models used in the
experimentation are trained and tested with Python, utilizing the Jupyter tool and the
Scikit-learn library. In the experimentation, gradient boosted trees were found to be
the best model for training on our dataset, as they have the lowest RMSE and AE
compared to the counterpart methods, as depicted in Fig. 1. The next set of experiments
with the trained model was done on Indian COVID-19 confirmed patients to find the
effectiveness of the trained model. The dataset of Indian COVID-19 patients is made
available by open government data, India, with limited patient medical detail and with
their personal identification hidden.
Table 3 depicts the accuracy of the prediction model for the number of COVID-19
patients registered on a particular date, among which some patients actually utilized a
ventilator in the real scenario. Figure 2 depicts the predicted required number of
ventilators versus the actual ventilators used, particularly in the case of COVID-19 in
India.

Fig. 1 Performance evaluation of machine learning approach in terms of root mean squared error,
absolute error

Fig. 2 Predicted number of ventilators required and actual ventilators used in Rajasthan, India, for
COVID-19 positive cases

6 Conclusion and Future Scope

The COVID-19 pandemic has already claimed a large number of lives, and the number
is increasing step by step at an exponential rate. As healthcare resources are constrained
by the same scarcity limitations that affect every one of us, it has become imperative
to prepare intensive care to battle against such sickness. As said by the Hon'ble Prime
Minister of India, in the future more ventilators need to be produced to fight this
pandemic. This paper focused on predicting the ventilator requirement based on the
CRI index, which is calculated from the COVID-19 patient's medical history. By
finding the CRI index of COVID-19 patients, physicians can pay more attention to
their specific treatment. With the proposed model, it can be hoped that widespread
adoption by healthcare data science communities will lead to more effective
intervention strategies and ultimately help to curtail the worst effects of this pandemic.
The average performance of the proposed model could be enhanced by utilizing model
stacking and by training on a larger dataset of COVID-19 patients.

References

1. World Health Association Coronavirus disease 2019 (COVID-19) situation Report—61. Avail-
able from: https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200322-
sitrep-62-covid-19.pdf. Accessed 22 Mar 2020
2. Chen N, Zhou M, Dong X (2019) Epidemiological and clinical characteristics of 99 cases
of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet (2019).
doi:https://doi.org/10.1016/s0140-6736(20)30211-7
3. Zhang X, Meltzer M, Wortley PM (2006) FluSurge–a tool to estimate demand for hospital
services during the next pandemic influenza. Med Decis Making 26(6):617–623
4. Miller J (2020) Germany Italy rush to buy life-saving ventilators as manufacturers warn of
shortages. Technical report, Reuters
5. Neighmond P (2020) As the pandemic spreads, will there be enough ventilators. NPR
6. Rubinson L (2010) Mechanical ventilators in US acute care hospitals. Disaster Med Public
Health Prep 4(3):199–206

7. Huang HC, Araz OM, Morton DP, Jhonson GP, Damien P, Clement B, Meyers LA (2017)
Stockpiling ventilators for influenza pandemics. Emerg Infect Dis 23(6):914–921
8. Maclaren G, Fisher D, Brodie D (2020) Preparing for the most critically Ill patients with
COVID-19: the potential role of extracorporeal membrane oxygenation. JAMA
9. Smetanin P, Stiff D, Kumar A (2009) Potential intensive care unit ventilator demand/capacity
mismatch due to novel swine-origin H1N1 in Canada. Can J Infect Dis Med Microbiol
20(4):e115–e123
10. Stiff D, Kumar A, Kissoon N, Fowler R (2011) Potential pediatric intensive care unit
demand/capacity mismatch due to novel pH1N1 in Canada. Pediatr Crit Care Med 12(2):e51–
e57
11. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and
healthcare demand. Available from: https://www.imperial.ac.uk/media/imperial-college/med
icine/sph/ide/gida-fellowships/Imperial-CollegeCOVID19-NPI-modelling-16-03-2020.pdf
Accessed 25 Mar 2020
12. Coronavirus spreading in New York like ‘a bullet train’. Available From: https://www.bbc.
com/news/world-us-canada-52012048 Accessed 25 Mar 2020
13. World Health Organization (2020) Critical preparedness, readiness and response actions for
COVID-19: interim guidance. World Health Organization, 7 March 2020
14. Ramsey L (2020) Hospitals could be overwhelmed with patients and run out of beds and
ventilators as the coronavirus pushes the US healthcare system to its limits. Business Insider
15. Zhou F, Yu T, Du R, Fan G, Liu Y (2020) Clinical course and risk factors for mortality of adult
in patients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. https://doi.
org/10.1016/S0140-6736(20)30566-3
16. Yang X, Yu Y, Xu J, Shu H (2020) Clinical course and outcomes of critically ill patients
with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational
study. Lancet Respir Med. https://doi.org/10.1016/S2213-600(20)30079-5
17. Son LH, Tripathy HK, Acharya BR (2019) Machine learning on big data: a developmental
approach on societal applications. In: Big data processing using spark in cloud. Studies in big
data. Springer, vol 43, pp 143–165. doi: https://doi.org/10.1007/978-981-13-0550-4
IoT-Based Smart Door Lock
with Sanitizing System

M. Shanthini and G. Vidya

Abstract During a pandemic situation, safety and security play a major role in
maintaining a person's health and well-being. A safe and secure environment influences
social habits, reduces stress (a feeling of freedom), and increases health protection.
When people feel safe, they find it easier to relax, do the things that comfort them,
and focus on their work. The ultimate goal of this paper is the complete integration of
a sanitizer dispenser into the door lock system, so that the home can be monitored
using a smartphone while hand hygiene is enforced, providing advanced safety and
security against any anomalies detected in houses, office buildings, and various
construction sites, and anywhere else there is a need for advanced frontline security.
The door lock system offers security by allowing the owner to control the building
with a Bluetooth-connected, smartphone-controlled system using an Arduino UNO and
a developed Android application, and offers safety by using a sanitizer dispenser with
a PIR sensor that forces the user to clean their hands to open or close the door lock.
In this method, the users must provide valid login credentials in the application, which
are verified against a database over the Internet, and must sanitize their hands before
entry is permitted, thereby reducing the spread of germs. If the credentials are invalid
or the sanitizer is not used, a buzzer rings, an SMS alert is sent to the owner of the
building, and the door remains locked, which enhances security along with safety and
hygiene.

Keywords Bluetooth · Buzzer alarm · Database verification · Digital door lock


system · PIR sensor · Safety · Sanitizer · Security · Short message service (SMS)

M. Shanthini (B) · G. Vidya


PSG Institute of Technology and Applied Research, Coimbatore, India
e-mail: 18cs147@psgitech.ac.in
G. Vidya
e-mail: vidya.ganesan89@gmail.com


1 Introduction

A smart home automates the entire home for the benefit of the individuals, but for comfortable living, there is also a need for safety and security. Some diseases, such as the coronavirus disease, have been found to be easily transmittable because people fail to wash their hands [1]. Accordingly, there is a need for a disinfectant sanitizer mounted on the
wall near the main door which makes the users use it mandatorily before entering
any facility, such as professional offices, smart homes and any other locations where
the site owner desires to have every person entering or exiting through a doorway to
sanitize their hands, or the like, such as on any building that features an automated
digital door lock system for ensuring the security of the building. All the existing
door locking systems are old-fashioned ways of accessing the system with either a
traditional key or some means of radio-frequency identification (RFID) chips [2]. As
security is considered a primary concern, a solution is needed that provides reliable and automated security. This paper describes a security system that can
control the home door lock. The safety enhancements in the system should not only
improve the robustness of the system but also not complicate the system accessibility;
in other words, it should provide the ease of access. A Bluetooth module, namely
HC-05 interfaced with Arduino UNO, connects to the Bluetooth of the phone [3].
Each user will have a unique login credential and a type of authentication, i.e. either
password or pin lock stored in the firebase database which is a cloud-hosted database
and maintained by the owner. Users can access the door lock once the user credentials
are verified with the database over the Internet by using smart devices like tablets and
mobile phones with a developed application installed in it which will communicate
via Bluetooth to the lock by sending signals. After providing valid login credentials in the application, the user shows a hand below the sanitizer; the motion detector detects the hand and dispenses a pre-set amount of disinfectant, after which the door lock is opened.
The remainder of this paper is organized as follows: Section 1 includes the intro-
duction. Related works have been discussed in Sect. 2, and the proposed system is
discussed in Sect. 3. Section 4 discusses the result of the functional prototype of
the system implemented, and Sect. 5 contains the conclusion. Section 6 includes the
future scope. References are added at the end of the paper.

2 Literature Review

As per the survey, there exist many such systems to control the door. Each system
has a unique feature. The system aims [4] to develop a door security system using
an LDR sensor, ultrasonic sensor, servo motor and laser module connected with the
Arduino and Bluetooth application. Here, the Bluetooth module controls the door
through an application and the Arduino UNO receives and processes data such as the
intensity at a particular place and the distance from all these sensors continuously.

This project focussed more on the ultrasonic sensor and LDR. The main drawback
of the system is that it does not have intrusion detection. In the study [5], a smart
door lock and lighting system using IoT for the smart home is presented. The user
can control the opening and closing of the door and can also control the lighting
using the Internet. One of the demerits here is that the relay is used to lock or unlock
the door and to switch ON/OFF the lighting. Relay usually requires a high voltage
to operate, and the motors also need a higher voltage and current which cannot be
given from the microcontroller.
The paper [6] uses a biometric lock using a fingerprint sensor with the door lock
system. The Arduino Nano is the microcontroller, and a Bluetooth module will set up
a communication between the microcontroller and the smartphone. The fingerprint
scanned in the smartphone is verified with the one stored in the android application
developed in kodular and installed on the smartphone. If it matches, a unique ID
of the lock will be sent by the application via the Bluetooth to move the servo to
the unlock position. If the fingerprint does not match, the servo moves to the lock
position. Even though a security measure is included in the system, it lacks the method
of intimation to the owner in case of a mismatch. A Bluetooth-controlled Arduino-
based home automation system proposed in [7] consists of a Bluetooth module,
relay module, LCD, LM35 temperature sensor and water sensor connected to the
Arduino UNO. Once the microcontroller is powered, the water level and temperature
are displayed on the LCD. The motor is turned ON/OFF automatically if the water
crosses the defined levels. The doors, fans and lights are also controlled by the user.
The Arduino performs operations based on the information received from the user
who controls the whole system using an application. The main advantage of the paper
is that it saves electricity and reduces human effort.
Unlike other door lock systems, Bluetooth communication has been used to
transfer signals to control the door lock as it consumes less power and a database is
maintained which can be accessed only by the owner to monitor the home. To include
a measure of safety, the sanitizer dispensing system is attached along with the door
lock system. The most important feature added in the proposed system is that it can
alert the owner of the house with an SMS and the neighbours with a buzzer ring in
case of intrusion.

3 Modelling and Functions of the Proposed System

The main objective of this paper is to enhance the security and safety of the door lock
system with hand sanitizer dispenser. The hardware and software requirements for the
proposed system are shown in Table 1. The mobile device (android application) will
be sending a signal via Bluetooth to the Arduino circuit [8] that acts as a connection
between the smartphone and the servo motor. The Arduino makes decisions based
on the signal received. The use of Bluetooth on smartphones is suitable for the home
environment as it provides ease of access with better security as it covers only a shorter

Table 1 Hardware and software requirements

Category Component Function
Hardware Conventional lock Open/close door
Servo motors To control door lock and
To Dispense Sanitizer
Piezo buzzer Intrusion detector
Bluetooth module Communication channel
(HC-05)
Arduino UNO Data processor
PIR Sensor For detecting human
hands
Software MIT Application Develop an android
inventor application
Arduino IDE Write and upload
Arduino code/sketch
Wireshark To capture network
packets

range than the conventional key. The PIR sensor detects the motion of the user’s hand
when shown below the sanitizer and dispenses a pre-set amount of disinfectant.

3.1 Implementation of the System

The UNO board can be powered from either the Universal Serial Bus (USB) or
an external power supply. The Arduino UNO board is connected to the computer
using the USB cable to program the board and also to power it up. In the integrated
development environment (IDE) of Arduino, under the tools menu, select the board
as Arduino UNO and port as Arduino UNO (COM3). The Arduino sketch is written
in C++ and uploaded to the Arduino UNO from the IDE [9]. The circuit connections
of the devices interfaced with Arduino are shown in Fig. 1. The sanitizer dispenser is
connected to the servo motor interfaced with pin 6 of the Arduino, and the door lock
is connected to the servo motor interfaced with pin 9 of the Arduino. This completes
the hardware setup. Next, the android application is developed in Massachusetts
Institute of Technology (MIT) application inventor which is an online platform to
create android applications and it is installed on the smartphone and paired to HC-05
using Bluetooth. Once the Bluetooth module is paired with the phone, the user can
start using the application. The application has two types of authentication: the first type uses passwords and the second uses a pin. The pin lock is included for making
it easy to use for illiterate or aged people. The application also uses the firebase
real-time database to store the user credentials such as username, authentication and
password or pin [10]. The stored information can be updated or modified by the
owner to ensure the privacy and security of the data.

Fig. 1 Hardware circuit connections of the proposed system

Open the android application then enter the username and choose the type of
authentication, i.e. either password lock or pin lock (pre-defined by owner in the
database). Once the login credentials are provided, the username is then verified in the database; if the user credentials are valid, the next screen appears to either
enter the password or pin according to the authentication type. For pin lock, the
keypad to enter the pin will be shown, a four-digit pin (set by the owner) is entered
and the Bluetooth devices are paired automatically. For password lock, click on
CONNECT TO BLUETOOTH and select Door Lock, i.e. HC-05 from the list of
paired Bluetooth devices that appear on the screen, and then the user needs to enter
the pre-defined username and password and click on LOGIN. The entered password
or pin is verified by retrieving the user credentials from the database over the Internet.
The verification over the Internet is validated by capturing the network packets, i.e.
the requests sent by the application to the database and the responses received by the
application from the database using network packets analysing software Wireshark
[11]. The block diagram of the proposed system is shown in Fig. 2.
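The credential check itself is performed by the App Inventor blocks inside the Android application, but the request/response flow against a Firebase Realtime Database REST endpoint can be illustrated with the minimal Python sketch below; the database URL, the node layout and the field names are hypothetical and are not the actual project structure.

```python
import requests

# Hypothetical Firebase Realtime Database REST endpoint (the real project URL
# and node layout are not disclosed in the paper).
DB_URL = "https://smart-door-lock-demo.firebaseio.com/users"

def verify_credentials(username, secret, auth_type="password"):
    """Fetch the stored record for `username` and compare the password or pin.

    Returns True only when the stored authentication type and the secret both
    match, mirroring the check the application performs before enabling the
    LOCK/UNLOCK buttons.
    """
    # GET .../users/<username>.json returns the stored record as JSON
    resp = requests.get(f"{DB_URL}/{username}.json", timeout=5)
    resp.raise_for_status()
    record = resp.json()
    if record is None:                       # unknown user -> intrusion alert path
        return False
    if record.get("auth_type") != auth_type: # e.g. pin chosen for a password user
        return False
    return record.get("secret") == secret

if __name__ == "__main__":
    if verify_credentials("owner", "1234", auth_type="pin"):
        print("credentials valid: enable LOCK/UNLOCK buttons")
    else:
        print("invalid credentials: ring buzzer and send SMS alert")
```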
Once the user is verified, the buttons (LOCK and the UNLOCK buttons) to control
the door lock will be enabled for the user on the screen. If the user clicks the UNLOCK
button, the user will be notified to disinfect the hands simultaneously the application
will send a value to the servo motor interfaced with the Arduino via Bluetooth
module and the servo motor will rotate with that value and the lock will be opened
after a delay of 50s for the user to sanitize the hands with the sanitizer which is

Fig. 2 Block diagram of the proposed system

pumped automatically by a servo motor when the user’s hands are detected using the
passive infrared sensor (PIR sensor). Likewise, if the user clicks the LOCK button,
the application will send a value to the servo motor interfaced with the Arduino via
Bluetooth module, and thus, the servo motor will rotate with that value and the door
lock will be closed.
If the user credentials are incorrect or if the user does not disinfect the hands, the
buttons will not be enabled to lock or unlock the door and a signal will be sent to the
buzzer that makes it ring along with which an alert SMS will be sent from the current
user’s phone number to the house owner’s phone. A warning notification also pops
up on the screen to the user. Figure 3 shows the flowchart, i.e. the step-by-step approach followed in writing the automated door security program that enables the execution of a command from the developed android application.

4 Result

A functional prototype of the proposed system which is developed and tested is


shown in Fig. 4. The step-by-step outcomes of the implementation of the proposed
system are described in the following.
The home screen of the application, where the user is expected to enter the username and choose the authentication type, is shown in Fig. 5. A warning pops up on the screen if the chosen authentication type does not match the username, as shown in Fig. 6.
If the authentication was chosen as password type, the user needs to choose the
desired Bluetooth device to establish the connection by clicking on CONNECT TO
BLUETOOTH that is indicated with a notification about the connection status as
shown in Fig. 7. Then, the entered password was verified in the database over the
Internet with a click of the LOGIN button.

Fig. 3 Flowchart of the designed android-based door lock system



Fig. 4 A prototype of the proposed system

The verification over the Internet was validated by capturing the requests and
responses sent and received between the application and database using the Wireshark
software during the live testing of the application as shown in Figs. 8 and 9.
If the password was valid, the control buttons are enabled as shown in Fig. 10.
When the UNLOCK button was pressed, the servo motor attached to the door lock
will rotate with the value received from the application via Bluetooth, and once the
hands are disinfected, the door was unlocked.
In pin authentication, the Bluetooth devices were connected automatically as in
Fig. 11. If a valid pin was entered, the control buttons were enabled as shown in
Fig. 12. Similarly, the door lock will be unlocked in the click of the UNLOCK
button once the hands are sanitized.
The user credentials stored and maintained in the firebase database by the owner
are shown in Fig. 13.
When the UNLOCK button in the user application was pressed after database
validation, the user was notified to sanitize the hands as shown in Fig. 14 and the
corresponding values were sent to the Arduino via Bluetooth and once the hands
are disinfected, the servo motor rotates to unlock the door lock as shown in Fig. 15.
Similarly, when the LOCK button was pressed, the door lock was closed as in Fig. 16.
If the sanitizer was not used, the buzzer rings as a reminder for the user of hand
hygiene. If the user credentials were incorrect, the owner received an alert SMS along
with the generation of a buzzer ring, and the user was prompted to try again with a
notification in the application as shown in Fig. 17.

Fig. 5 Home screen of the application

The operations performed by the Arduino based on the signal received from the
application via Bluetooth were displayed in the serial monitor of the Arduino IDE
as in Fig. 18.

5 Conclusion

In this paper, considering safety and security as the main objectives, a digital door lock with a sanitizing system is proposed. This system locks or unlocks the door when the user provides valid login credentials in the installed android application and uses the disinfectant by showing the hand in front of the sanitizer. An alarm is generated along with an SMS alert and the door remains locked if invalid credentials are provided in the application; if the user misses using the disinfectant, the buzzer rings as a reminder, which enhances the safety and security of the proposed method. It is
flexible and simple to install the system at a low cost with no overhead like drafting
and construction works.

Fig. 6 Authentication and username mismatch

6 Future Scope

As a further development, a continuous monitoring system placed near the door that could take photographs and send live videos or pictures to the house owner's mobile can be added to the proposed system.

Fig. 7 Manual Bluetooth connection in password authentication

Fig. 8 Request sent by the application



Fig. 9 Response received by the application

Fig. 10 Control buttons enabled [valid password]



Fig. 11 Automatic
Bluetooth connection in pin
authentication

Fig. 12 Control buttons enabled [valid pin]

Fig. 13 User credentials stored in firebase database



Fig. 14 Unlock button is pressed

Fig. 15 A prototype of the proposed system [emphasis on unlock]

Fig. 16 A prototype of the proposed system [emphasis on lock]

Fig. 17 Warning notification and SMS alert



Fig. 18 Operations displayed in serial monitor

References

1. Brow G, Raymond CA (2013) Door locking hand sanitizer system. In: Canadian patent applica-
tion. https://patentimages.storage.googleapis.com/54/3f/d1/9f1ecf1009a2f5/CA2776280A1.
pdf
2. Gupte NN, Shelar MR (2013) Smart door locking system. Int J Eng Res Technol 2(11):2214–
2217
3. Agbo David O, Chinaza M, Jotham O (2017) Design and implementation of a door locking
system using android app. Int J Sci Technol Res 6(8):198–203
4. Rathod K, Vatti R, Nandre M, Yenare S (2017) Smart door security using Arduino and Bluetooth
application. Int J Curr Eng Sci Res 4(11):73–77
5. Satoskar R, Misrac A (2018) Smart door lock and lighting system using internet of things. Int
J Comput Sci Inf Technol 9(5):132–135
6. Patil KA, Vittalkar N, Hiremath P, Murthy MA (2020) Smart door locking system using IoT.
Int Res J Eng Technol (IRJET) 7(5):3090–3094
7. Al Mamun A, Hossain MA, Rahman Md.A, Abdullah Md.I, Hossain Md.S (2020) Smart
home automation system using Arduino and Android application. J Comput Sci Eng Softw
Test 6(2):8–12
8. Sohail S, Prawez S, Raina CK (2018) A digital door lock system for the internet of things with
improved security and usability. Int J Adv Res Ideas Innov Technol 4(3):878–880
9. Bhute LK, Singh G, Singh A, Kansary V, Kale PR, Singh S (2017) Automatic door locking
system using bluetooth module. Int J Res Appl Sci Eng Technol 5(5):1128–1131
10. Khawas C, Shah P (2018) Application of firebase in Android App development—a study. Int
J Comput Appl 179(46):49–53
11. Das R, Tuna G (2017) Packet tracing and analysis of network cameras with Wireshark. In: 5th
international symposium on digital forensic and security (ISDFS)
Aspect-Based Sentiment Analysis
in Hindi: Comparison of Machine/Deep
Learning Algorithms

T. Sai Aparna, K. Simran, B. Premjith, and K. P. Soman

Abstract With the evolving digital era, the amount of online data generated, such as product reviews in different languages shared via various social media platforms, has grown enormously. Information analysis is very beneficial for many companies such as online service providers.
This task of interpreting and classifying the emotions behind the text (review) using
text analysing techniques is known as sentiment analysis (SA). Sometimes, the
sentence might have positive as well as negative polarity at the same time, giving
rise to conflict situations where the SA models might not be able to predict the
polarity precisely. This problem can be solved using aspect-based sentiment analysis
(ABSA) that identifies fine-grained opinion polarity towards a specific aspect asso-
ciated with a given target. The aspect category helps us to understand the sentiment
analysis problem better. ABSA on the Hindi benchmark dataset, having reviews from
multiple web sources, is performed in this work. The proposed model has used two
different word embedding algorithms, namely Word2Vec and fastText for feature
generation and various machine learning (ML) and deep learning (DL) models for
classification. For the ABSA task, the LSTM model outperformed other ML and DL
models with 57.93 and 52.32% accuracy, using features from Word2Vec and fast-
Text, respectively. Mostly, the performance of classification models with Word2Vec
embedding was better than the models with fastText embedding.

Keywords Aspect-based sentiment analysis · Sentiment analysis · Machine learning · Deep learning · Support vector machine · Word embedding

T. Sai Aparna (B) · K. Simran · B. Premjith · K. P. Soman


Center for Computational Engineering and Networking, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore, India
e-mail: saiaparnasai@gmail.com
K. Simran
e-mail: simiketha19@gmail.com


1 Introduction

The volume of online data has increased tremendously in recent years, giving rise
to various new opportunities as well as challenges in the field of research. Social
media and e-commerce websites carry users' opinions or feedback about a service or a product. This information is valuable for brands to understand the sentiment and views of the customers about the product, and it helps improve the quality of the service. The prediction, analysis, and classification of the sentiment expressed in a review can be performed with the help of sentiment analysis (SA) and aspect-based sentiment analysis (ABSA).
Sentiment analysis (SA) is one way of finding the polarity based on the sentiment
associated with the reviews of an overall text. It is also known as opinion mining or
emotion AI sometimes. Further, unorganized reviews and comments are found on social media, and going through them manually is inefficient and costly for analysers. SA allows companies to sift through this mass of data and obtain insights in a more organized form, and it becomes more accurate over time when combined with ML. Sentiment is analysed in terms of polarities, the most commonly used being positive and negative. SA works well when dealing with a single polarity, whereas in ABSA the review along with its aspect term provides extra information. Using aspect terms during prediction and classification makes the analysis better, and this aspect-dependent analysis for the prediction and classification of polarity is termed aspect-based sentiment analysis (ABSA). The following review, “iska
performance kafi acha hai lekin iski banavat aur keyboard ki quality ne zarur hame
nirash kiya hai” has positive and negative polarity, respectively, due to two different
aspects (misc and hardware). For this example, ABSA can correctly predict and
classify the polarity of the review sentence, which was actually of a conflict type.
A conflict type of polarity occurs when the prediction is uncertain: for example, when a review has positive as well as negative polarity, the model has to learn to predict it as a conflict label. In the above example, the review has positive as well as negative polarity, which may produce an uncertain polarity during classification.
The problem can be solved by training the model with a large number of data or by
using some class-specific features while setting the threshold limits for classification.
Furthermore, fine classification can be done, but as of now, in this work, it was
categorized as a separate label called conflict. Demand for more accurate prediction
or classification of sentiment expressed has increased in the current scenario. Trackers
can be set by the companies to the influencer’s social media account to see over time
and how their brands feature in the influencer’s conversation or news feed and what
followers feel about the same. It is also helpful for checking the impact and immediate
reaction and to monitor carefully. ABSA can be applied for Indian languages like
Hindi, Telugu, etc. (morphologically rich languages), but comparatively little work has been done in the Hindi language, and much of it has transliterated or translated the text to English. ABSA enhances the sentiment prediction by giving the extra information
of aspect term which acts as a keyword for classification and prediction.

This paper has compared the performance of different word embedding models
with various classifiers that perform the task of ABSA in Hindi. The polarity was
predicted using the review as well as its aspect. Based on the reviews, a polarity label
of conflict type was also explored. For extracting features from the Hindi dataset, pre-
trained Hindi word embedding algorithms were used. Word embeddings algorithms
basically convert words into dense vectors with much lower dimensionality keeping
the context and semantics in place. The similarity between two words can be identified by the distance between their vectors. Two different word embedding models, namely fastText and Word2Vec, were used in this work. Different classification models such as support vector machines (SVM) [1, 2] and random forest (RF) were used for classification.
Based on the experiments done in this work, using Word2Vec together with
SVM and Word2Vec along with long short-term memory (LSTM) achieved 59.34%
accuracy for SA (reviews-polarity) and 57.93% accuracy for ABSA (review-aspect-
polarity) task. For fastText pre-trained word embedding models in machine learning
(ML), SVM had performed better by achieving 51.37% accuracy for SA, whereas
deep learning (DL) had performed better by 52.32% accuracy for ABSA. All the
mentioned accuracies of Word2Vec performed better compared to the benchmark
accuracy of 54.05% for classification addressed in [3].
Section 2 describes a brief overview of the research done in the field of SA and
issues were addressed. In Sect. 3, the dataset used in this work is discussed. In Sect. 4,
the overview of the analysis process was discussed with the help of flow diagram,
in Sect. 5 the experiment results as well as observations obtained by our work are
discussed. Finally, Sect. 6 concluded the work with future directions of our research.

2 Literature Review

Overview of the research done in the field of SA and ABSA is summarized as follows.
Akhtar et al. [3] had created an annotated benchmark Hindi SA dataset, consisting
of product reviews crawled from different online websites. They had also used a
conditional random field (CRF) for aspect term extraction and achieved an average
F-measure of 41.07% and SVM algorithm for SA with an accuracy of 54.05% [3].
Also, this paper used the same dataset as [3]. DL models such as neural networks were used for classification by only one team in the SemEval-2014 task, and that team ranked below 15 among the top-ranked teams across all the subtasks; the other top rankers of the task did not use DL for classification, though for other datasets some DL models were considered in [4]. Chen et al. proposed a transfer capsule
network (TransCap) model for ABSA [5]. The concept of the model is dynamic
routing between capsules in a capsule network and transfer learning framework to
transfer knowledge from document level to aspect level sentiment classification. In
this work, experiments were performed on two SemEval datasets demonstrating that
the TransCap model performed better by a large margin than the state-of-the-art methods [6]. Wang et al. proposed a DL-based aspect and sentiment prediction model

and outperformed the SVM model trained on the SemEval 2014 dataset and also
proposed a new method where a constituency parse tree is used to connect sentiments of the reviews with their corresponding aspects [7].
The UNITOR system participated in SemEval-2014 competition and used SVM
with different Kernel methods for various tasks addressed by Castellucci et al. in [8].
SVMhmm was used for aspect term extraction by tackling the problem as a sequential
tagging task, where multiple SVM kernels are linearly combined to generalize over several types of linguistic information. A shared task on sentiment analysis of code-mixed data pairs of Bengali-English (BN–EN) and Hindi-English (HI–EN) was conducted in 2017, and an overview of the task was provided by Patra et al. [9]. The best performing team used character-level n-grams and word-level features with an SVM classifier to obtain a maximum macro-average F-score of 0.569 for the HI–EN and 0.526 for the BN–EN datasets. For DL models, Akhtar et al. proposed a long short-term memory
(LSTM) architecture, which was built on top of bilingual word embeddings for
ABSA. The aim of the work was to reduce the data sparsity effect in resource-
poor language. The model had outperformed other state-of-the-art architectures in
two different setups, namely multilingual and cross-lingual [10]. A comprehensive
overview of DL architectures applied for ABSA was studied in [4]. Around 40
methods were summarized and categorized by taking DL architectures and tasks into
consideration. Santos et al. performed experiments using fastText word embedding
with different ML and DL frameworks. The results show that the proposed model
of CNN outperformed other ML and DL approaches [11]. The authors of [12] addressed the gap between research and existing implementations of many popular algorithms, caused by the inherent mathematical complexity of the inference algorithms, high computational demands, and the lack of a “sandbox” environment that would enable practitioners to apply the methods to their specific problems on real data. Their contribution is to fill this gap between academia and ready-to-use software packages; within the existing digital library DML-CZ, they demonstrated the practicability of their approach on a real-world scenario of computing document similarities [12].

2.1 Issues in State-of-the-Art Research Models

Table 1 shows the issues related to the state-of-the-art model, which used the dataset
[3]. It also describes the solution to those problems.

Table 1 Issues in existing research models


Issues Solution
Not captured semantic information Word2Vec and fastText embedding algorithms
Algorithms for capturing the semantic RNN, LSTM, GRU with word embeddings
association for words in a sentence algorithms

Table 2 Aspect term polarity count

Aspect term polarity Review count
Positive (pos) 2290
Negative (neg) 712
Neutral (neu) 2226
Conflict (con) 189
Total number of reviews 5417

3 Dataset Description

The dataset consists of 5417 reviews, 99 aspect terms, and four polarities [3]. The
polarity labels are positive, negative, neutral, and conflict. Conflict polarity labels make the classification model work better by assigning uncertain or multiple polarity predictions to the conflict label. The number of reviews per label is presented in Table 2. Manual annotation was performed for 1000 missing aspect labels of the respective reviews and category polarities. The dataset was taken as such in the original Hindi language, without transliterating or translating it.

4 Overview of Analysis Process

The overview of the process is given in Fig. 1. Initially, preprocessing of the dataset was performed: the dataset was cleaned, tokenized, stripped of stop-words and white space, and zero padded. These preprocessed and tokenized reviews were
sent to a word embedding algorithm for converting words into dense vector represen-
tation taking their context and sequence into account. These numeric representations
were passed to various ML models and DL models for classification. The models
were evaluated using the statistical measures given in Sect. 4.2.

4.1 Selection of Models

The state-of-the-art model for ABSA in Hindi was presented in [3], and the dataset used is explained in Sect. 3 of this paper; that work implemented ML algorithms with N-gram and other linguistic features. Further, the performance of various ML algorithms is
investigated with word embeddings generated using Word2Vec as well as fastText
features. The Word2Vec and fastText algorithms can embed semantic information
in the word vectors. It is required to check how the word embedding features are
generated using algorithms like Word2Vec and fastText to improve the classification.
The performance of DL algorithms like RNN, LSTM, and GRU for ABSA in Hindi
is also investigated because these algorithms can capture the sequential relationship

Dataset

Preprocessing:
Tokenization, White Space removal, Stop Words removal

Word Embeddings:
(Word2Vec and fastText) Hindi Word Embeddings

Tasks:
SA – Review as X and labels as Y
ABSA – Review and aspect together as X and label as Y

Classification Models:
ML – NB, DT, AB, KNN, RF, SVM
DL- RNN, LSTM, GRU

Prediction

Evaluation Metrics:
Accuracy, Precision, Recall, F1-score

Fig. 1 Overview of the process

among the words in a sentence and can generate more meaningful representation for
the sentences.

4.2 Statistical Measures

Various statistical measures were utilized in order to evaluate the performance of the
models.
Metrics like accuracy, precision, recall, F1-score can be calculated using true
positive (TP), true negative (TN), false positive (FP), and false negative (FN).
TP denotes the quantity of positive samples that are correctly classified. TN
denotes the quantity of negative samples that are correctly classified. FP denotes
the quantity of negative samples that are misclassified. FN denotes the quantity of
positive samples that are misclassified.
Accuracy measure is the total number of correctly classified samples out of all
the classified samples.

accuracy − score = (TP + TN)/(TP + TN + FP + FN) (1)

Precision measure is the ratio of true positive with respect to all the positives
predicted.

precision − score = TP/(TP + FP) (2)

Recall measure is the ratio of true positive with respect to a total number of true
actual classifications.

recall − score = TP/(TP + FN) (3)

F1-score is given by the harmonic mean between precision and recall.

f1 − score = (2 × recall − score × precision − score)/(recall − score + precision − score) (4)
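As a quick worked illustration of Eqs. (1)-(4), the short sketch below computes the four scores from raw confusion counts; the counts in the example call are arbitrary and are not taken from the experiments.

```python
def scores(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score as defined in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, precision, recall, f1

# Example with arbitrary counts -> (0.8, 0.8, ~0.857, ~0.828)
print(scores(tp=120, tn=80, fp=30, fn=20))
```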

5 Experiments and Results

The experiments consisted of four steps. First, the dataset was divided into train
data and test data with the ratio of 0.75:0.25. In the second step, the features were
extracted from the word embeddings using fastText and Word2Vec algorithms. For
the fastText algorithm, the model was trained directly using the Hindi pre-trained
model. For Word2Vec, the Hindi pre-trained model was trained on the review dataset
and was appended to the vocabulary to increase the number of data samples. In the
third step, features that were extracted from these models were used by various ML
and DL algorithms for classifying the polarity of the review. The ML models used
for classification [13] were Naive Bayes (NB), decision tree (DT), AdaBoost (AB),
K-nearest neighbours (KNN), RF, and SVM, whereas the DL models used were
recurrent neural network (RNN), LSTM, and gated recurrent unit (GRU). Keras [15] and Scikit-learn [14] were used for implementing the DL and ML algorithms, respectively. The parameters for each classification algorithm were fixed by hyperparameter
tuning. The fixed hyperparameters are given in Tables 3 and 4, respectively. The
above-mentioned steps were repeated for SA (reviews) and ABSA (review-aspect)
tasks. Further, the classification results for polarities were evaluated and compared
between SA and ABSA using various ML and DL algorithms for both fastText and
Word2Vec word embedding algorithms.
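A condensed sketch of the second and third steps for the Word2Vec + SVM configuration is given below, using the SVM settings of Table 3. The embedding file name, the mean-of-word-vectors sentence representation, and the inputs `reviews` and `labels` are assumptions made only for illustration.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical file holding 300-dimensional pre-trained Hindi Word2Vec vectors
w2v = KeyedVectors.load_word2vec_format("hi_word2vec_300d.txt")

def sentence_vector(tokens, dim=300):
    """Average the vectors of the tokens found in the embedding vocabulary."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def run_svm_experiment(reviews, labels):
    """reviews: list of token lists from preprocessing; labels: polarity per review."""
    X = np.vstack([sentence_vector(r) for r in reviews])
    # 0.75:0.25 train/test split as described in the first step
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
    # SVM with the Table 3 settings (C = 100, RBF kernel, degree 3)
    clf = SVC(C=100, kernel="rbf", degree=3).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```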
In sentiment analysis, the polarities were classified based on the reviews using
NB, DT, AB, KNN, RF, SVM, RNN, LSTM, and GRU algorithms are tabulated
in Table 5. The table shows a comparison between ML and DL algorithms using
Word2Vec and fastText algorithms. SVM with Word2Vec word embedding achieved
an accuracy of 59.34%, precision of 0.5888, recall of 0.5934, and F1-score of 0.5902.
The SVM with fastText word embedding acquired an accuracy of 51.37%, precision
of 0.5214, recall of 0.5137, and F1-score of 0.5165. These models outperformed other
ML models. In the case of DL models, LSTM performed better than other models

Table 3 Hyperparameters for ML algorithms

Algorithms Parameter Parameter values
NB Priors None
Var smoothing 1e-09
DT Random state 100
class_weight None
Criterion Gini
max_depth None
Splitter Best
AB random_state 100
KNN n_neighbours 5
Weights Uniform
Metric Minkowski
RF random_state 100
class_weight Balanced
Criterion Entropy
max_depth 10
n_estimators 50
SVM C 100
Kernel rbf
Degree 3
random_state None
class_weight None

Table 4 Hyperparameters used for DL models (RNN, LSTM, and GRU)

Parameter Parameter values
Units 64
Batch size 32
Epochs 1000
Input dim 300
Optimizer Adam (lr = 0.01)
Recurrent dropout 0.0
Recurrent activation Sigmoid
Loss function Categorical cross entropy
Dense layer activation Linear
Output layer activation Softmax

and achieved an accuracy of 55.79%, precision of 0.5523, recall of 0.5579, and F1-
score of 0.5525 using Word2Vec embedding. GRU with fastText word embedding
outperformed other models and achieved an accuracy of 51.07%, precision of 0.5018,
recall of 0.5107, and F1-score of 0.5047. Among all the experiments, it is observed
that SVM with the Word2Vec word embedding algorithm achieved the best results.
In aspect-based sentiment analysis, both review and aspect terms are taken for
the classification of polarities using different ML and DL classification algorithms.
The results of classification using both fastText and Word2Vec algorithms are shown

Table 5 Performance of sentiment analysis with ML and DL models


Algorithms Word2Vec fastText
Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score
NB 34.83 0.5020 0.3483 0.3848 24.21 0.4452 0.2421 0.2791
DT 39.85 0.4100 0.3985 0.4038 36.53 0.3757 0.3653 0.3702
AB 48.78 0.4699 0.4878 0.4681 46.72 0.4485 0.4672 0.4497
KNN 49.67 0.4879 0.4967 0.4898 44.50 0.4452 0.4450 0.4416
RF 50.26 0.4952 0.5026 0.4760 46.27 0.4586 0.4627 0.4387
SVM 59.34 0.5888 0.5934 0.5902 51.37 0.5214 0.5137 0.5165
RNN 55.13 0.5500 0.5513 0.5456 51.66 0.4937 0.5166 0.5024
LSTM 55.79 0.5523 0.5579 0.5525 50.33 0.5088 0.5033 0.5054
GRU 55.42 0.5530 0.5542 0.5435 51.07 0.5018 0.5107 0.5047

in Table 6. KNN with Word2Vec word embedding acquired 50.41% accuracy, the
precision of 0.5048, recall of 0.5041, F1-score of 0.5036, and SVM with fastText
embedding achieved an accuracy of 51.37%, precision of 0.5214, recall of 0.5137, and
F1-score of 0.5165. LSTM with Word2Vec with 57.93% accuracy, the precision of
0.5785, recall of 0.5646, and F1-score of 0.5594 performed well, whereas both GRU
and RNN with fastText algorithm achieved accuracy, precision, recall, and F1-score
of 52.10, 0.5133, 0.5210, and 0.5103%, respectively. The observation made was,
among all the results obtained, LSTM with Word2Vec word embedding algorithm
achieved better results (Table 6).
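For the DL side, a minimal Keras sketch of the LSTM classifier configured with the Table 4 settings is given below. The sequence length, the placement of the dense layer, and the use of tensorflow.keras are assumptions; the paper lists only the hyperparameters, not the exact layer arrangement.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam

MAX_LEN, EMB_DIM, NUM_CLASSES = 30, 300, 4   # assumed sequence length; 300-d vectors; 4 polarities

model = Sequential([
    # 64 recurrent units, sigmoid recurrent activation, no recurrent dropout (Table 4)
    LSTM(64, recurrent_activation="sigmoid", recurrent_dropout=0.0,
         input_shape=(MAX_LEN, EMB_DIM)),
    Dense(64, activation="linear"),            # dense layer with linear activation
    Dense(NUM_CLASSES, activation="softmax"),  # output layer
])
model.compile(optimizer=Adam(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train_onehot, batch_size=32, epochs=1000)  # Table 4 settings
```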
Table 7 shows the time taken by classifiers with various word embedding
algorithms to complete both training and testing. The time was considered in seconds.

Table 6 Performance of aspect-based sentiment analysis with ML and DL models


Algorithms Word2Vec fastText
Accuracy Precision Recall F1-score Accuracy Precision Recall F1-score
NB 19.26 0.4217 0.1926 0.1904 23.99 0.4028 0.2399 0.2453
DT 41.33 0.4234 0.4133 0.4178 41.70 0.4307 0.4170 0.4235
AB 47.16 0.4628 0.4716 0.4639 45.17 0.4331 0.4517 0.4396
KNN 50.41 0.5048 0.5041 0.5036 49.96 0.4964 0.4996 0.4961
RF 47.97 0.4951 0.4797 0.4602 45.54 0.4567 0.4554 0.4419
SVM 49.96 0.4292 0.4996 0.4530 51.37 0.5686 0.5137 0.4698
RNN 57.27 0.5639 0.5727 0.5657 52.10 0.5145 0.5210 0.5122
LSTM 57.93 0.5785 0.5793 0.5594 52.03 0.5204 0.5203 0.5086
GRU 56.46 0.5706 0.5646 0.5612 52.10 0.5133 0.5210 0.5103

Table 7 Computation time analysis


Algorithms Word2Vec fastText
Training time Testing time Training time Testing time
SA ABSA SA ABSA SA ABSA SA ABSA
NB 0.0173 00.0156 0.0103 0.0112 0.0208 0.0206 0.0047 0.0069
DT 1.5097 01.5745 0.0014 0.0014 0.6098 0.7027 0.0013 0.0019
AB 8.3158 17.9341 0.0408 0.0897 3.1625 7.9179 0.0359 0.1002
KNN 0.1261 00.1755 4.0326 1.0725 0.0425 0.0886 1.1534 0.5016
RF 6.0747 05.7343 0.0185 0.0236 3.9416 3.2080 0.0170 0.0188
SVM 6.0219 12.8812 0.0235 3.1275 5.5822 7.6883 0.9137 1.4487
RNN 182.75 196.521 0.0660 0.1696 192.91 201.12 0.0756 0.0999
LSTM 366.95 437.115 0.4103 0.4568 405.15 699.18 0.4062 0.4222
GRU 298.10 330.914 0.0820 0.3668 301.15 742.16 0.3732 0.3877

6 Conclusion

In this work, performance comparisons were made for various ML and DL models
using fastText and Word2Vec word embedding algorithms for sentiment analysis and
aspect-based sentiment analysis. For comparison, different classification algorithms
were utilized. For sentiment analysis, SVM combined with Word2Vec gave a better
performance with an accuracy of 59.34% in comparison with other ML and DL
algorithms. In the case of aspect-based sentiment analysis, LSTM combined with
Word2Vec outperformed the rest of the algorithms with an accuracy of 57.93%. An
increase in dataset size may improve the classification accuracy. In future work, for
classification of polarity, DL models such as convolutional neural networks (CNN),
CapsuleNetwork, and transfer capsule network can be taken into consideration.

Acknowledgements We take this opportunity to thank M Hari Chandana, Sanjana K, Sreelakshmi


K, from Centre of Computational Engineering and Networking, Amrita Vishwa Vidyapeetham, for
their whole-hearted support during the project.

References

1. Soman KP, Loganathan R, Ajay V (2009) Machine learning with SVM and other kernel
methods. PHI Learning Pvt. Ltd.
2. Vapnik V (2013) The nature of statistical learning theory. Springer Science & Business Media,
Berlin
3. Akhtar MS, Ekbal A, Bhattacharyya P (2016) Aspect based sentiment analysis in Hindi:
resource creation and evaluation. In: Proceedings of the tenth international conference on
language resources and evaluation (LREC’16)
4. Do HH et al (2019) Deep learning for aspect-based sentiment analysis: a comparative
review. Exp Syst Appl 118:272–299

5. Chen Z, Qian T (2019) Transfer capsule network for aspect level sentiment classification.
In: Proceedings of the 57th annual meeting of the association for computational linguistics
6. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. Advances in neural
information processing systems
7. Wang B, Liu M (2015) Deep learning for aspect-based sentiment analysis. Stanford University
Report
8. Castellucci G et al (2014) Unitor: aspect based sentiment analysis with structured learning. In:
Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014)
9. Patra BG, Das D, Das A (2018) Sentiment analysis of code-mixed Indian languages: an overview
of SAIL_Code-Mixed Shared Task@ICON-2017. arXiv preprint arXiv:1803.06745
10. Akhtar MS et al (2018) Solving data sparsity for aspect based sentiment analysis using cross-
linguality and multi-linguality. In: Proceedings of the 2018 conference of the North American
chapter of the association for computational linguistics: human language technologies, vol 1
(Long Papers)
11. Santos I, Nedjah N, de Macedo Mourelle L (2017) Sentiment analysis using convolutional
neural network with fastText embeddings. In: 2017 IEEE Latin American conference on
computational intelligence (LA-CCI). IEEE
12. Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. In:
Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks
13. Premjith B et al (2019) Embedding linguistic features in word embedding for preposition sense
disambiguation in English-Malayalam machine translation context. Recent Adv Comput Intell
341–370
14. Pedregosa F et al (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–
2830
15. Chollet F (2015) Keras documentation. Keras.io
Application of Whale Optimization
Algorithm in DDOS Attack Detection
and Feature Reduction

P. Ravi Kiran Varma, K. V. Subba Raju, and Suresh Ruthala

Abstract Distributed denial of service (DDOS) is one of the dangerous threat


vectors to the information security of any organization. The ease of launching a
DDOS attack by using readily available tools that make it more widespread. Unlike
the DOS attack, DDOS attacks are harder to be detected. Various real-time and
discrete network traffic attributes can be analyzed using machine learning tech-
niques in the observation of DDOS traffic. However, the larger dimensionality of
the datasets makes the detection algorithm more computationally complex. In this
paper, the whale optimization algorithm (WOA) is proposed to reduce the number of
dataset features in a wrapped-based approach using the accuracy of the classifier as
the fitness function. The CICDDOS2019 dataset that consists of 80 features is considered
in this work. The experiments are conducted and the results conclude that the overall
classification accuracy does not suffer at all, while the features are reduced from 80
to 11, thereby optimizing the DDOS attack detection. Random forest has given the
best accuracy of 99.94% post the reduction of features, whereas the accuracy of the
full feature was recorded to be 99.92%.

Keywords DDOS · Distributed denial of service · Feature selection · Whale optimization algorithm · Intrusion detection · Machine learning

1 Introduction

Distributed denial of service (DDOS) is a family of DOS attacks, whose primary


goal is to bring down the services provided to the legitimate users of a target. It is
a compromise of availability service. The vulnerabilities in the TCP/IP protocols

P. Ravi Kiran Varma (B) · K. V. Subba Raju · S. Ruthala


MVGR College of Engineering, Vizianagaram, AP, India
e-mail: ravikiranvarmap@gmail.com
K. V. Subba Raju
e-mail: srkakarlapudi@gmail.com
S. Ruthala
e-mail: ruthalasuresh294@gmail.com


and applications are leveraged by the attackers to launch successful DDOS attacks. In a


DOS attack, the source is unique, whereas, in DDOS attack, the attackers use a large
quantity of compromised hosts to launch an attack on the target. Unprotected hosts are turned
into zombies by the attackers who use botnets to do so. DDOS attacks are broadly
categorized into three categories, volume-related traffic, protocol-related traffic, and
application-related attacks [1, 2]. The volume-related DDOS traffic (measured in bits
per second) injects the massive amount of spoofed network traffic in the form of a
flood of protocols like ICMP and UDP, to consume the full bandwidth of the target
network. The goal of protocol DDOS attack (measured in packets per second) is to
maliciously consume the target’s resources by flooding packets like SYN, ping of
death, Smurf, etc. The intention of application floods is to bring down the target’s web
services by injecting HTTP floods or a stealthier attack like the Slowloris. Zero-day
attacks add more challenges because they are freshly baked attacks whose patches
are not yet available in the market [3].
Deka et al. [4] discussed the need to disseminate the feature selection process
in the IDS system by using parallel computing. They propose a parallel cumulating
ranking algorithm to give the best rank of each attribute in the datasets, viz. ISCX-
IDS, TU-DDOS, MITDARPA, and CAIDA. An accuracy of 97% is reported post
the feature selection process when tested on the SVM binary classifier. Aamir and
Mustafa [5] used machine learning with labeled training data to obtain subsets with
unlabeled or any partially labeled datasets. They disclosed a mapping or branching-
based approach to differentiate the representation of the data structure of network
traffic for both DDOS and regular traffic. After completion of labeling, the supervised
machine learning algorithms, viz. KNN, SVM, and RF are used for classification and
obtained an accuracy of 92, 95, and 96.6%, respectively.
Sharafaldin et al. [6] have analyzed the contemporary datasets in the literature
and proposed a new dataset for DDOS detection, the “CICDDOS2019” dataset, to
overcome the present drawbacks. They used WEKA 3.6 as a tool for classifica-
tion. They extracted 80 DDOS network traffic features from the PCAP files using a
tool called “CICFlowMeter-V3”. The extracted features are evaluated for accuracy
using the random forest classifier, and an accuracy of 96.33% is recorded. Hoque
et al. [7] discussed a synchronous distributed DOS detection with less computational
overhead. A narrative mutual relationship or connection measure is used to find the
DDOS attacks. Four frequently used datasets like TUIDS, CAIDA, DDOS 2007, and
MIT DARPA are considered for experimentation. Meng et al. [8] proposed various
machine learning methods to identify DDOS attacks. They suggested the solution of
combining sequential feature selection with MLP to select the best features during the training phase. Ahmad et al. [2] proposed consistency subset
evaluation (CSE) for identifying and selecting the best features for DDOS detection.
They used the NSL-KDD dataset here, and the simulation is done by WEKA 3.7
tool. First of all, the selection of features is made by a full feature training dataset
of 42 features, and the accuracy is 91%. Abbass et al. [9] proposed a smart and
new method for DDOS attacks and protection mechanisms. They recommended 44
statistical features that can help to identify an attack by checking the best features
from this framework. HMM and random forest are used for DDOS detection by

Mugunthan in [10]. Neural networks and SVM are explored to classify the DDOS
attack traffic by Pandian and Smys [11].
Khundrakpam et al. [12] discussed the network packet parameters like HTTP
GET and POST requests for the detection of DDOS attacks. They used classifiers,
viz. Naïve Bayes, multinominal, multilayer perception, random forest, to identify
the attacks with an accuracy of 93.67%. The intrusion detection system is a key tool for network protection, and a low-cost IDS model that also deals with sparsely labeled data has been proposed [6]. This semi-supervised IDS uses fuzzy c-means clustering and is tested with the NSL-KDD benchmark IDS dataset. A few works that used optimization methods in the networking domain are [13, 14].
Narasimha et al. [15] discussed the latest method for abnormality detection by
using a machine minder to protect the network and to identify the pattern of attacks.
They used the algorithm, viz. Naïve Bayes classifier, for finding the DDOS traffic.
For testing and training, the NSL-KDD dataset was used, and PCA is used for the
feature extraction with an accuracy of 92%. Varma et al. [16] detected the network
attacks by implementing a rough set-based filter method with ant colony optimization
(ACO). Table 1 lists a brief comparison of contemporary literature on DDOS attack
mitigation. WOA is employed for feature selection in [17]. In radial distribution networks, WOA is used in the optimization of the size and
positioning of the capacitors [3]. A binary system is followed in feature selection
using WOA in [1].
The drawbacks of the existing work with respect to DDOS attack detection using the CICDDOS2019 dataset are as follows: since this is the latest dataset, no works have yet investigated identifying the highly relevant attributes of the traffic features. The usefulness of a meta-heuristic search method like WOA in a
wrapper method is not studied on this dataset. Performance comparison of important
wrapper classifiers on WOA for feature selection is not done. There is a real need to
experiment on dynamic global meta-heuristic optimization logics like the WOA to
minimize and confirm the highly relevant and sufficient DDOS traffic features using
the CICDDOS2019 dataset. Minimal traffic attributes help to design wire speed
DDOS detection systems consuming less computing and memory resources. This
paper presents the result of applying WOA [18] in feature reduction of the DDOS
attack dataset.

2 Whale Optimization Wrapper for Feature Reduction

Very recently, Mirjalili and Lewis came up with yet another nature-inspired meta-
heuristic global exploration optimization algorithm, WOA, mimicking the bubble-net
hunting behavior of humpback whales [19]. WOA constitutes a couple of hunting methods: the first mechanism is to hunt the prey with the best or a random agent, and the second mechanism replicates the bubble-net attacking strategy. The artificial whale optimization is described below.

Table 1 Brief literature comparison


S. Cite Data set used Detection Feature selection Accuracy after
No. method used technique used feature selection
(%)
1 [4], 2019 MIT DARPA, Pearson’s Parallel 97
CAIDA, ISCX product moment cumulative
IDS correlation ranking
coefficient algorithm
(PPMCC)
2 [5], 2019 OPNET modular Semi-supervised Agglomerative 96.6
14.5 simulators, machine and K-means
CICIDS 2017 learning under principle
component
analysis
3 [6], 2019 CICDDOS2019 – Random forest –
Agressor
4 [7], 2017 CAIDA, DDOS Novel Mutual –
2007, TUIDS correlation information
measure correlation,
rough sets
5 [8], 2019 ISOT dataset Multilayer Wrapper, filter 99.4
perception and embedded
(MLP) methods
6 [2], 2017 NSL-KDD – Consistency 91.7
dataset subset evaluation
(CSE), DDOS
characteristics
features (DCF)
7 [12], 2015 DOS attack Distributed Multilayer 91.4
dataset divide and perception, RBF
conquer network
approach
8 [15], 2017 NSL-KDD – Principle 92
dataset component
analysis (PCA)

2.1 Encircling Prey

After discovering the position of the prey, the humpback whales encircle it. Since the location of the optimum in the search space is not known in advance, the whale optimization algorithm assumes that the current best candidate solution is the prey or is close to it. The remaining search agents then alter their locations towards the so-far best search agent. The whale positioning is represented as follows:

\vec{P}(t+1) = \vec{P}^{*}(t) - \vec{p} \cdot \vec{B}    (1)
\vec{B} = \left| \vec{S} \cdot \vec{P}^{*}(t) - \vec{P}(t) \right|    (2)

Here, \vec{P}(t+1) represents the updated location of the whale and \vec{P}^{*}(t) represents the location of the best solution obtained so far at iteration t. The coefficient vectors \vec{p} and \vec{S} are expressed as

\vec{p} = 2\,\vec{l} \cdot \vec{v} + \vec{a}    (3)

\vec{S} = 2 \cdot \vec{v}    (4)

2.2 Spiral Upgrading Position

After calculating the distance between the prey and the whale, a spiral equation is generated between the position of the prey and the whale. The helix-shaped movement of the whale is given as follows:

\vec{P}(t+1) = e^{gh} \cdot \cos(2\pi h) \cdot \vec{B}^{*} + \vec{P}^{*}(t)    (5)

\vec{B}^{*} = \left| \vec{P}^{*}(t) - \vec{P}(t) \right|    (6)

During optimization, both behaviors are used to change the location of the whales: there is a 50% chance of selecting the shrinking encircling mechanism and a 50% chance of selecting the spiral model. The combined update is designed as

\vec{P}(t+1) = \begin{cases} \vec{P}^{*} - \vec{p} \cdot \vec{B}, & A < 0.5 \\ e^{gh} \cdot \cos(2\pi h) \cdot \vec{B}^{*} + \vec{P}^{*}(t), & A \ge 0.5 \end{cases}    (7)

where A is expressed in the range (0, 1), which is nothing but a random number.
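A small numerical sketch of the combined update in Eq. (7) is given below; the spiral shape constant g, the dimensionality, and the random draws are arbitrary choices made only for illustration.

```python
import numpy as np

def update_position(P, P_best, p_vec, S_vec, g=1.0):
    """One whale position update following Eq. (7):
    shrinking encirclement with probability 0.5, spiral move otherwise."""
    A = np.random.rand()
    if A < 0.5:
        B = np.abs(S_vec * P_best - P)           # Eq. (2)
        return P_best - p_vec * B                # Eq. (1)
    h = np.random.uniform(-1, 1)                 # random number for the spiral
    B_star = np.abs(P_best - P)                  # Eq. (6)
    return np.exp(g * h) * np.cos(2 * np.pi * h) * B_star + P_best  # Eq. (5)

# Example in a 5-dimensional search space
P, P_best = np.random.rand(5), np.random.rand(5)
p_vec, S_vec = np.random.rand(5), 2 * np.random.rand(5)
print(update_position(P, P_best, p_vec, S_vec))
```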

2.3 Searching of the Prey

The finding of prey is known as the exploration phase. The search agents discover their
prey using random search depending on the position of each other. The mathematical
expression is derived as follows:

\vec{P}(t+1) = \vec{P}_{rpv} - \vec{p} \cdot \vec{B}    (8)
\vec{B} = \left| \vec{S} \cdot \vec{P}_{rpv} - \vec{P} \right|    (9)

where \vec{P}_{rpv} is a random position vector.
The WOA wrapper algorithm is given in Algorithm 1. The fitness function is
defined in Eq. (10), where ∂ is the tuning parameter, acc is the accuracy of the
classifier wrapper, ω is the full feature dimensionality, and L is the dimensionality
of an agent’s solution.

f(L) = \partial \times acc + (1 - \partial) \times \frac{\omega - L}{\omega}    (10)

Algorithm 1: The WOA wrapper for feature selection


Input: Whale count, w, and iteration count, i
Output: Minimized Feature Position in Binary
Initialize p, S, and l
Initialize whale population w, randomly
Every whale computes the fitness value
P ∗ = The best whale
y=1
while y <= i
for every whale
update the present whale position to a new one using Eq. (7)
for-end
p, S, and l are updated
Check the fitness of all the agents
Replace P ∗ if better fitness value is recorded
y = y + 1
while-end
solution is returned ( P ∗ )

The process of feature selection by the whales as agents in the WOA is listed in Algorithm 1. Here, binary WOA is used, wherein each agent forms a solution. The population of an agent at any given point of time is simply a set of 0s and 1s, where 0 depicts the absence of a feature and 1 depicts that the particular feature is present in the set. The count of 1s and the index values of the 1s ultimately form the solution. Initially, all the whale agents start with a population of randomly selected features from the full set. The fitness measure is computed for every agent's population, and the best one is recorded. The fitness of a whale's population depends on the wrapper classifier's classification accuracy (the higher the better) as well as on the length of the solution (the shorter the better). In each iteration, one best set of features stands as the local best, and at the end of all iterations the global best whale's population is the solution with the highest fitness measure. In each iteration, the swarm of whales traverses according to the equations given from (1) to (9).
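Since the paper does not spell out how the continuous whale positions are mapped to 0/1 feature masks, the small Python sketch below uses an S-shaped (sigmoid) transfer function, one common choice for binary WOA (cf. [1]); the function name and the probabilistic rounding are illustrative assumptions.

import numpy as np

def binarize(position):
    # Sigmoid (S-shaped) transfer: probability that each feature is selected
    prob = 1.0 / (1.0 + np.exp(-position))
    # Probabilistic rounding gives the 0/1 feature mask of this agent
    return (np.random.rand(position.size) < prob).astype(int)

# Example: an 80-dimensional whale position becomes an 80-bit feature mask
mask = binarize(np.random.randn(80))
selected_indices = np.flatnonzero(mask)   # the index values of the 1s form the solution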

3 Results and Discussion

CICDDOS2019 [6] dataset is used in this paper to test the proposed feature reduction
algorithms. This dataset carries malware and the latest common DDOS traffic, which
represents the PCAP data. Realistic background traffic is used in the generation of the dataset: the realistic behavior of 25 users over protocols such as HTTP, HTTPS, SSH, FTP and email was collected to construct it. This dataset contains six categories of DDOS attacks. It has a total of about 60 lakh (six million) samples, of which 3890 samples are taken randomly for this work.
The dataset includes 80 features. The feature selection is implemented in Python, and the wrapper classifiers are implemented with scikit-learn. By applying the proposed WOA algorithm in the wrapper method, eleven features were selected out of 80 by the random forest wrapper classifier. The decision tree wrapper resulted in 12 attributes, the Naive Bayes wrapper produced 16 attributes, and the multilayer perceptron (MLP) wrapper resulted in 19 attributes. Table 2 shows the results
of four different classifiers tested as wrappers with WOA. Random forest has given
the best accuracy of 99.94% post the reduction of features, whereas the accuracy of
the full feature was recorded to be 99.92%. Figure 1 is a graphical representation of

Table 2 Results of various WOA wrappers

Classification method | Accuracy with all attributes (%) | Accuracy after attribute selection (%) | No. of attributes selected
Random forest | 99.92 | 99.94 | 11
J48 trees | 99.89 | 99.89 | 12
Naïve Bayes | 95.69 | 94.95 | 16
MLP | 97.84 | 98.72 | 19

Fig. 1 Accuracies and features selected by wrappers: accuracy with all attributes (%), accuracy after attribute selection (%), and number of attributes selected, for Random Forest, J48 trees, Naïve Bayes and MLP

Table 3 Comparison with similar works

Paper | Feature selection used | Classifier used | Accuracy with all features (%) | No. of features selected | Accuracy post feature selection (%)
[21] | PCA | RNN | – | 20 | 99.0
[6] | Statistical | ID3 | – | 62 | 78.0
This work | WOA wrapper | Random forest | 99.92 | 11 | 99.94

Table 4 List of features selected by WOA random forest wrapper

S. No. | Feature name
1 | Tot_Backw_Packts
2 | Backw_IAT_Min
3 | Fwd_Hdr_Len
4 | Fwd_Pckts/S
5 | Min_Pckt_Len
6 | Avg_Pckt_Size
7 | SubFlow_Fwd_Pckts
8 | SubFlow_Fwd_Bytes
9 | Init_Win_Bytes_Fwd
10 | Src_prt
11 | Actl_Dat_Pckt_Fwd

the outputs of the wrappers considered in this work. Table 3 is a comparison with
similar works that used the CICDDOS2019 dataset. All experiments are run with
50 whales and 50 iterations. Due to randomness and the stochastic nature of the
WOA algorithm and the wrapper classifier, different wrappers of Table 2 produced
solutions of different lengths. In the process of feature selection with WOA and a wrapper classifier, data cleaning is performed and the noise in the data is eliminated, and hence an improvement in prediction accuracy can be observed.
The eleven features that are selected by the WOA random forest wrapper are listed
in Table 4.

4 Conclusion

DDOS attacks are one of the most challenging threat vectors that IT security teams must deal with. Real-time extraction of network traffic attributes and further analysis using machine learning techniques is a promising way to deal with the problem. There exists a problem of large dimensionality of the network traffic attributes that must be processed in real time for DDOS attack detection. Efficient attribute selection methods are required to deal with the dimensionality problem. Meta-heuristic search methods inspired by nature are of great use when combined with classifiers for evaluation. This paper proposed a WOA-based wrapper, which mimics the hunting behavior of humpback whales, for selecting a small set of appropriate attributes from the CICDDOS2019 DDOS dataset. The results showcase near 100% accuracy with as few as eleven features out of 80 attributes.

References

1. Hussien AG, Hassanien AE, Houssein EH, Bhattacharyya S, Mohamed A (2018) S-shaped
binary whale optimization algorithm for feature selection. Adv Intell Syst Comput 79–87
2. Yusof AR, Udzir NI, Selamat A, Hamdan H, Abdullah MT (2017) Adaptive feature selection for denial of services (DoS) attack. In: 2017 IEEE conference on application, information and network security (AINS), pp 81–84. Miri: IEEE
3. Prakash DB, Lakshminarayana C (2017) Optimal siting of capacitors in radial distribution
network using Whale Optimization Algorithm. Alexandria Eng J 56(4):499–509
4. Deka RK, Bhattacharyya DK, Kalita JK (2019) Active learning to detect DDoS attack using
ranked features. Comput Commun 145:203–222
5. Aamir M, Zaidi SMA (2019) Clustering based semi-supervised machine learning for DDoS
attack classification. J King Saud Univ Comput Inf Sci 1–11 (In Press)
6. Sharafaldin I, Lashkari AH, Saqib Hakak S, Ghorbani AA (2019) Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy. In: 2019 international Carnahan conference on security technology (ICCST), pp 1–8. Chennai: IEEE
7. Hoque N, Kashyap H, Bhattacharyya DK (2017) Real-time DDoS attack detection using FPGA.
Comput Commun 110:48–58
8. Wang Meng, Yiqin Lu, Qin Jiancheng (2019) A dynamic MLP-based DDoS attack detection
method using feature selection and feedback. Comput Secur 88:1–14
9. Asosheh A, Ramezani N (2008) A comprehensive taxonomy of DDoS attacks and defense
mechanism applying in a smart classification. WSEAS Trans Comput 7(4):281–290
10. Mugunthan SR (2019) Soft computing based autonomous low rate DDOS attack detection and
security for cloud computing. J Soc Clin Psychol 1(2):80–90
11. Pasumpon Pandian A, Smys S (2019) DDOS attack detection in telecommunication network
using machine learning 1(1):33–44
12. Johnson Singh K, De T (2015) An approach of DDOS attack detection using classifiers. In:
Emerging research in computing, ınformation, communication and applications. Springer, New
Delhi, pp 429–437
13. Raj JS, Basar A (2019) QoS optimization of energy efficient routing in IoT wireless sensor
networks 1(1):12–23
14. Haoxiang W (2019) Multi-objective optimization algorithm for power management in cognitive
radio networks. UCCT 1(2):97–109
15. Mallikarjunan KN, Bhuvaneshwaran A, Sundarakantham K, Mercy Shalinie S (2017) DDAM:
detecting DDoS attacks using machine learning approach. In: Computational intelligence:
theories, applications and future directions—volume I, advances in intelligent systems and
computing. Springer, Singapore, vol 798, pp 261–273
16. Ravi Kiran Varma P, Valli Kumari V, Srinivas KS (2016) Feature selection using relative fuzzy
entropy and colony optimization applied to real-time intrusion detection system. Proc Comput
Sci 85:503–510
17. Mafarja M, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing
for feature selection. Neurocomputing 260:302–312
18. Mohammed HM, Umar SU, Rashid TA (2019) A systematic and meta-analysis survey of whale
optimization algorithm. Comput Intell Neurosci 1–25

19. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
20. Hoque N, Kashyap H, Bhattacharyya D (2017) Real-time DDoS attack detection using FPGA.
Comput Commun 48–58
21. Elsayed MS, Le-Khac N-A, Dev S, Jurcut AD (2020) DDoSNet: a deep-learning model for detecting network attacks. In: 21st IEEE international symposium on a world of wireless, mobile and multimedia networks (IEEE WOWMOM 2020). Cork
Social Media Data Analysis: Twitter
Sentimental Analysis on Kerala Floods
Using R Language

Madhavi Katamaneni, Geeta Guttikonda, and Madhavi Latha Pandala

Abstract Web-based social networking has turned out to be essential specialized


devices for ordinary discussions. Because of their ubiquity, person to person commu-
nication destinations comprise a rich wellspring of data about the general supposition.
Social information analysis is the examination of individuals’ connection in Internet-
based life on how they are communicating their perspectives. The information analyzed here is gathered through the social networking site Twitter, which is one of the biggest platforms for information sharing. Sentiment analysis permits us to study individuals' perspectives in support of or against any subject. In this paper, we target the August 2018 Kerala Floods as a case study for sentiment examination. Our idea is to observe, examine and analyze how individuals express their perspectives. To get better results with these tweets, R programming is used
to develop the proposed system. In this analysis, the data from tweets are separated
utilizing catchphrase-based learning extraction. The exploratory outcomes demon-
strate that how individuals everywhere throughout the world remains for Kerala and
contribute relief reserve to CMO Kerala.

Keywords Kerala Floods · Rescue · Flood relief fund · Donate · Rebuild Kerala ·
Emotions · Save · Lives

1 Introduction

Over the past many years, with various new methods and algorithms and the approach of a new era, technology has gained a new and higher pace [1, 2]. This development

M. Katamaneni (B) · G. Guttikonda · M. L. Pandala


Department of IT, VRSEC, Vijayawada, India
e-mail: itsmadhavi12@gmail.com
G. Guttikonda
e-mail: geetaguttikonda@gmail.com
M. L. Pandala
e-mail: chinnu065@gmail.com


has driven the adjustment in individuals’ method for communicating their perspec-
tives, feelings and assessments, and furthermore, the stages in which they do as
such. Presently, the utilization of social locales, Web journals, online gatherings
have become possibly the most important factor, and therefore they create enormous
measure of information which whenever dissected can be helpful for business, inno-
vations, common issues and so forth. Information investigation is increasing immense
momentum all over the world [3–5]. One of its applications is sentiment analysis, which is an area currently under research. Sentiment analysis uses the power of data analysis to extract the emotions expressed through words used by individuals across different social platforms, comments, reviews and so on.
Every day, a large number of Twitter users express their perspectives and feelings on specific or general topics as tweets. Breaking down unstructured
information is in itself a troublesome undertaking and extricating valuable data from
it is a major test. For doing as such, there is need of ground-breaking devices [5,
6]. Furthermore, advancements which can deal with a huge number of tweets and
extracting sentiment from them. Although there are different possible ways to do so, in this paper the R language is utilized to perform the task. Sentiment analysis is a strategy to investigate whether a written text is in a positive, negative or neutral state. Generally, this consists of the expressions of the various users in social media, i.e., Twitter [7–9].

2 Analysis Architecture

In Fig. 1, the process of sentimental analysis steps is explained and how the
functionality is worked as follows:
(a) The selected application confirmation about Twitter:
We have to connect with Twitter API utilizing the login credentials of Twitter
developer application. It is vital to confirm, to interface R Studio with Twitter

Fig. 1 Process of the proposed system (Twitter application confirmation → R-Studio twitter packages → extraction of tweets → pre-processing → modelling and transforming → sentiment analysis → graphical view of emotions)



for extricating the tweets. Once the confirmation is finished, continue to further
steps.
(b) Various packages that are required:
Different types of R packages that are required to install to perform various
analysis of tweets. The selected packages consist of various functions that will
analyze the tweets.
(c) Tweets extraction:
The information is gathered from the tweets that are trending point will use hash
tag “#”.
(d) Pre-processing of data:
This process is done by expelling undesirable articulations and words.
(e) Data modeling and transforming:
After the completion of pre-processing and cleaning the data the better structure
is formatted to extract the sentiments from the tweets.
(f) Sentimental analysis:
Analysis of sentiments is done here.
(g) Graphical view:
It is the last advance, where the conclusions are plotted and are envisioned by
diagrams and word cloud.

3 Procedure for Analysis of Sentiment

The platform for using source code for free programming is R which is a language
for open source and programming essentially utilized by analysts and deep informa-
tion analysis can be used to contemplate different factual information. For example,
posts, reviews and so forth. There are several steps to be followed for analyzing the
sentiments. The steps are mentioned below:
Step 1:
Generate a Twitter developer account and make login to utilize the tweets in R Studio.
Take the “API key,” “API secret key,” “Access token” and “Access token secret” to
perform agreement with R console.
Step 2:
For the sentiment analysis, a few packages are to be downloaded. The packages used include:
twitteR—this provides the interface to the Twitter Web API.
httr—handles requests and works with URLs.
Step 3:
Once the confirmation is done, the tweets can be extracted tweets by means of hash
tag.

Step 4:
Data cleaning is an important step for every dataset. There may be inadequate tweets in the Twitter data, i.e., unnecessary data, so it is necessary to clean the data to get good results.
Step 5:
The data that is cleaned in the first step is arranged in such a manner in a data frame
and a matrix representation so as to perform the operations
Step 6:
From a collection of tokenized words, a set of feelings are separated and information
analysis is done to watch slants. In this way, the task is done.
Tweets:
The tweets analyzed concern the recent natural calamity of the Kerala Floods. Due to heavy rainfall, Kerala, the south Indian state, was largely affected by severe floods. A large number of people died and about a million people were affected. It was really the worst flood in a century. So people expressed their emotions through Twitter by tweeting about the effect on Kerala.
Tweets Extraction:
Tweets extraction, in straightforward words, implies gathering information for examination; here, it refers to the collection of tweets. The search API of Twitter is used: it returns the tweets that match the given string and writes them into an object. A total of 3000 tweets were gathered on "#keralafloods".

4 Information Preparation and Modeling

The information assembled is not pure. It contains hash tags, URLs, abbreviations, punctuation, stop words and so forth. To get better results and good information, the tweets must be cleaned in a proper way. Libraries such as the tm and stringr packages are used for data cleaning and mining [10, 11]. The corpus must not contain elements such as retweets, links, @ mentions, punctuation and other symbols which do not express any sentiment. The corpus is then used for further analysis [3, 4].
There are various functions used to remove unwanted strings from the tweets (a rough Python equivalent of this cleaning pipeline is sketched after the list):
• removePunctuation()—to remove punctuation marks.
• removeNumbers()—to remove numbers, as numbers are not important in sentences for analyzing emotions.
• tolower()—to convert the whole data to lower case.
• stopwords()—to filter out words which add no meaning to the sentence.
• stemDocument()—finds and replaces the current word with its root word.
• stripWhitespace()—to remove additional blank spaces.
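The paper's cleaning pipeline is built on R's tm and stringr packages; purely as an illustration of the same steps, a rough Python equivalent is sketched below. The function clean_tweet, the regular expressions and the use of NLTK for stopwords and stemming are assumptions for the sketch and are not part of the paper's R implementation.

import re
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Assumes the NLTK stopword list has been downloaded once: nltk.download("stopwords")
STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def clean_tweet(text):
    # Rough Python equivalent of the tm-based cleaning steps listed above
    text = re.sub(r"http\S+|@\w+|#", " ", text)                        # strip links, mentions, '#'
    text = text.translate(str.maketrans("", "", string.punctuation))   # removePunctuation
    text = re.sub(r"\d+", " ", text)                                    # removeNumbers
    text = text.lower()                                                 # tolower
    tokens = [w for w in text.split() if w not in STOP_WORDS]           # stopword filtering
    tokens = [STEMMER.stem(w) for w in tokens]                          # stemDocument
    return " ".join(tokens)                                             # collapse extra whitespace

print(clean_tweet("RT @CMOKerala: Donate to the #KeralaFloods relief fund! https://example.org"))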
After performing all the above operations, the data are transformed into a format suitable for NLP processing. The cleaned corpus is then converted into a document term matrix, which represents the frequency of each word in the corpus.
WordCloud Formation:
WordCloud is an image that consists of different words that are used to specify text
or subject, and the weight of every word represents its importance or frequency.
Figure 2 represents the most frequently occurring words in the tweets, i.e., kerala, floods, accident, saved, jineesh and so on. The color is different for each word, depending on its frequency of occurrence in the corpus; the higher the frequency, the more often that word is used in the tweets.

Fig. 2 Most frequent words



The above figure gives a graphical view of the words which are used most often in the tweets [1, 2]. It is observed that the word kerala is used most often and has the highest frequency compared to other words such as floods and accident.
Nowadays, hashtags are widely used in many social networking sites for trending news and various tweets or messages. A hashtag is used to convey the idea or message of a person to every other person who uses that hashtag, and it represents one specific word or topic. Some of the hashtags that are
present in Fig. 3 are #KeralaFloods, #KeralaFloodReliefSJM, #Kerala Relief Fund,
#ReBuildKerala. These types of hashtags represent the people’s views that will show
the deep insights of the hashtag creators.
Figure 4 shows the words with sentiments: rebuild, happens, livelihoods, lost, help, tomorrow, munnar, witness, etc. The major emotion for the word lost is sadness and for help it is trust. Some words, like keralafloods, can be taken in both a negative and a positive sense.

5 Sentimental Analysis

This paper mainly focuses on the analysis of various user emotions about the Kerala Floods. The experiments consider eight feelings and two sentiments, positive and negative [1, 2]. To represent these feelings visually, a bar graph is used to show the various sentiments in the tweets. The positive words show the highest spike, with words like help and rebuild being used; words such as save and help are used by different kinds of people wanting to save the victims of the floods. From the package syuzhet, the NRC sentiment method is used, which compares all the tokenized words with the EmoLex word-sentiment lexicon; this lexicon consists of a large number of words with a variety of emotions. If a word matches one of the pre-listed

Fig. 3 Frequent hash tags

emotions, the corresponding emotion count is increased by one. On adding up all of these values, the total feeling and sentiment can be computed.
In Fig. 5, the bar graph is plotted based on the variety of emotions present in the sentiment analysis. Positive words, which many people use to express trust, dominate. For the negative words, the bar graph shows lower values because of the situation.
Limitations:
• The emotions which are presented by emojis cannot be retrieved by the present sentiment analysis tools.
• In the sentiment dictionary, various local language words are not defined.
• It is very difficult to find the sentiments on the mixed language words and also
transliterated words.

Fig. 4 WordCloud of words with sentiment

Fig. 5 Bar graph showing sentimental analysis



• Only 3000 tweets are used for comparison with the matched content in the given dataset.
• Only recent tweets are taken into consideration which is in text format only.

Solution:
• Expansion of word net for various languages which makes analyzing the
sentiments easy.
• Developing tools or algorithm which can determine the context of humor or
sarcasm can improve analysis further.

6 Conclusion

In this paper, various tweets on #KERALAFLOODS are analyzed, showing that many people support Kerala. Sentiment analysis is performed on the tweets given by the various users, and this shows the emotions of the users on #KERALAFLOODS. Positive words show gratitude and sympathy toward the victims of the floods, and this indicates stronger sentiments for the victims. The bar graph representation shows the positive and negative tweets that are analyzed by the proposed system. With this analysis, people may raise funds for the floods. This analysis shows people's sentiments; with the development of technology, it helps in doing further research in text mining.

References

1. Sharma V, Agarwal A (2016) Suppositions mining and classification of music lyrics utilizing
SentiWordNet. In: symposium on colossal data analysis and networking
2. Keka I, Çiço B (2015) Factual treatment for trend detection and analyzing of electrical load using programming language R. In: fourth mediterranean conference on embedded computing 2015
3. Katamaneni M, Cheerala A (2014) Wordnet based document clustering. Int J Sci Res (IJSR)
3(5)
4. Madhavi K, Anush Chaitanya K, Percolate M Supremacy user walls by using Pfw. Int J Sci
Eng Adv Technol
5. Turney PD (2002) Thumbs up or thumbs down?: semantic orientation applied to unsuper-
vised classification of reviews. In: proceedings of the 40th annual meeting on association for
computational linguistics, pp 417–424
6. Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In:
proceedings of the seventh conference on international language resources and evaluation, pp
1320–1326
7. Hussein DMEDM (2016) J King Saud Univ Eng Sci. Available: http://dx.doi.org/10.1016/j.jks
ues.2016.04.002
8. Kowcika A, Guptha A (2013) Sentiment analysis for social media. Int J Adv Res Comput Sci
Softw Eng 3(7):216–221
9. Vinodini G, Chandrashekaran RM (2012) Sentiment analysis and opinion mining: a survey. Int
J Adv Res Comput Sci Softw Eng 2(6):283–294

10. Liang PW, Dai BR (2013) Opinion mining on social media data. In: IEEE 14th international
conference on mobile data management, pp 91–96, ISBN 978-1-494673-6068-5
11. Thet TT, Na IC, Khoo CS, Shakthikumar S (2009) Sentiment analysis of movie reviews on
discussion boards using a linguistic approach. In: proceedings of the 1st international CIKM
workshop on topic-sentiment analysis for mass opinion, pp 81–84
12. Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: proceedings of the 42nd annual meeting of the association for computational linguistics (ACL)
Intrusion Detection Using Deep Learning

Sanjay Patidar and Inderpreet Singh Bains

Abstract Deep learning is an artificial intelligence function which resembles the working of the human intellect in handling data and developing patterns to be used in making decisions. This paper discusses developing a deep learning model that can detect Web attacks in the system to increase the accuracy of attack detection. The goal is to make a system secure from assaults by utilizing a deep learning model for cybersecurity; by utilizing the deep learning model with the dataset, an attack can be distinguished and the system can be better secured. The proposed deep learning model, assessed with the dataset for performing attack recognition, has given an accuracy of 99.10%. It is fundamental to make a proficient intrusion detection structure that utilizes a deep learning mechanism to overcome attack issues in the framework. This convolution neural network is utilized with different convolution layers, and the accuracy is increased.

Keywords Deep learning · Machine learning · Convolution neurological systems ·


Long transient memory · Attacks · Threats

1 Introduction

Machine learning is on the road of leading research into artificial intelligence.


Machine learning’s success stems from two purposes: first, the duty that machine
must perform, and second, the duty that humans cannot perform. Machine learning is
the logical examination of calculations and theoretical models utilized by PC frame-
works to carry out a complex task without using direct instructions, relying instead
on patterns and inferences. This is called an artificial intelligence category. Machine
learning calculations assemble a model dependent on test information, called training

S. Patidar · I. S. Bains (B)


Delhi Technological University, New Delhi, India
e-mail: inder.rockstar07@gmail.com
S. Patidar
e-mail: sanjaypatidar@dtu.ac.in


data, such that observations or assumptions can be made without the function being
programmed directly. Studies of machine learning are focused in the fields of math,
computer science, and engineering and can provide solutions for many disciplines.
The two primary methods of AI are supervised learning and unsupervised learning,
where the use of a completely labeled dataset enables supervised learning. By
comparison, unsupervised learning occurs by the use of a totally unlabeled dataset
[1]. The model receives a dataset in supervised learning, which includes some feature
vectors and labels, which are the corresponding outputs of feature vectors. So, as a
result of given new input, the model learns to generate correct outputs. Classification
and regression are the most common forms of supervised learning [2].
In unsupervised learning, no supervisor supplies the labels, which are the correct results of the corresponding inputs, to train the models. So the model only has input values that keep track of the effects of its operation. In other words, unsupervised learning is a task of uncovering hidden patterns from data inputs. Clustering and
the reduction in dimensionality are two traditional unsupervised methods of learning.
The underlying deep learning model architecture consists of one input layer followed by a variety of hidden layers, which in turn feed into the output layer. Convolutional neural network (CNN) is a deep learning mechanism used primarily in computer image processing and language handling. Without any preprocessing, a raw picture is fed directly to the CNN model; it then evaluates the features through convolution operations. Recurrent neural network (RNN) is an additional form of deep learning model that has made positive ground within regions such as NLP and text processing. The long short-term memory (LSTM) structure is a development based on the RNN structure, which enables learning patterns in data sequences. An autoencoder is a sort of
artificial neural system utilized in an unsupervised way to learn efficient data coding.
This paper explores these strategies to explain the problems and possible potential
of various methods of deep learning (Fig. 1).
Here, the essential kinds of deep learning designs, namely autoencoders, convolutional neural networks, long short-term memory and recurrent neural networks, are examined. Out of these, LSTM and CNN are two of the major and most commonly utilized methodologies.

Fig. 1 Difference between machine learning and profound learning



The document is sorted according to: Section 1 depicts profound learning tech-
niques. Section 2 audits the work done in deep learning by introducing a writing
review. Section 3 presents a summary of the examination papers. Section 4 presents
the proposed work. Section 5 presents results and discussions. Section 6 gives the
end.

1.1 Deep Learning

Deep learning is a subfield of machine learning that uses larger, deeper neural networks and can be utilized for supervised, unsupervised and semi-supervised learning. The idea of deep learning was originally presented in [3], based on deep belief networks, and has been demonstrated to be powerful in fields such as image processing, natural language processing, self-driving vehicles and so forth. One downside of deep learning is the extensive training period: the bigger the training data, the greater the training time, yet in order to perform well, deep learning strategies need enormous amounts of data for training. The types of deep learning are as follows:

1.1.1 Convolutional Neural Networks (CNNs)

In deep learning, a CNN is a class of deep neural networks, most commonly used for visual image analysis. CNNs are a type of multilayer perceptron. Multilayer perceptrons generally refer to fully connected structures, where every neuron in one layer is connected to all neurons in the next layer. A convolutional neural network comprises one input and one output layer, as well as several hidden layers [4]. A CNN's hidden layers are usually a sequence of convolutional layers that convolve with a multiplication or other dot product. The activation function is commonly a RELU layer. This is then followed by additional layers such as pooling layers, fully connected layers and normalization layers, called hidden layers because their inputs and outputs are masked by the activation function and the final convolution (Fig. 2).

Fig. 2 Architecture of CNN



Fig. 3 Structure of LSTM memory cell

1.1.2 Long Short-Term Memory (LSTM)

LSTM is an artificial recurrent neural network model used in deep learning. Unlike normal feed-forward neural networks, LSTM has feedback connections. It can handle not only single data points but also whole data sequences. A typical LSTM unit comprises a cell, an input gate, an output gate and a forget gate [5]. The cell remembers values over arbitrary time intervals, and the three gates control the flow of information into and out of the cell. LSTM networks are used to classify, process and make predictions based on time series data, because there may be lags of unknown duration between important events in a time series. LSTM applications include robot control, time series prediction, speech recognition, rhythm processing, grammar processing and recognition of human behavior (Fig. 3).

2 Literature Survey

Deep learning is a mainstream examination zone among scientists. This area presents
a writing audit of the work done in this field.

A lot of work has been concluded in deep learning with the assistance of deep
learning strategies. Chen et al. [6] are exploring a profound learning way of assisting
collaborative altering in Q&A pages. The main concept in this is to help inexperienced
editors to alter posts with a wide variety of subjects and to encourage the group to edit
sentences. This exhibits the practicality of preparing a profound learning model with
community post alters and afterward utilizing the prepared model to help community
post altering.
Lu et al. [7] examined security aspects by sniffing a deep learning smartwatch
password that is a Snoopy method. Snoopy uses a uniform structure to separate
movement information portions, albeit passwords are inserted, and utilizes new
profound neurological systems directed toward surmise the real watchwords. This
system can effectively spy information on moving out of sight while entering pass-
words. Without devouring noteworthy force/computational assets, it can successfully
extract password segments of motion data on smartwatches in real time.
Pouladzadeh et al. [8] presented an app that utilizes the image of the nourish-
ment, taken by the client’s cell phone, to perceive different nourishment things in
a similar food to evaluate the calorie and sustenance of the nourishment. In this,
the client is challenged to rapidly recognize the broad territory of nourishment by
an outline a bouncing ring on the nourishment image by contacting the canopy. The
framework, at that point, utilizes picture handling and statistical insight for food item
acknowledgment.
Shone et al. [9] presented a system that plays a crucial role in defending PC systems
called network intrusion detection frameworks. This paper gives another profound
learning strategy for interference identification, addressing to the expanding levels
of human cooperation required and diminishing degrees of exactness in detection.
For future work, the primary investigation road for development will be to evaluate
and stretch out the model’s ability to deal with zero-day assaults and afterward, hope
to develop the existing assessments by using genuine backbone system freight to
exhibit the benefits of the all-encompassing representation.
Liu et al. [10] presented a portable practice, Third-eye that can transform cell
phones toward top-notch PM2.5 screens, allowing for a crowd-sensing approach to
monitor fine-grained PM2.5 in town. Then they use two profound learning represen-
tations, convolutional neural system for pictures and long transient memory system
for climate and wind contamination information, to manufacture a start to finish
PM2.5 inference models training framework. Future work is to build up its world-
wide form to enable more clients to screen the air quality conventionally and secure
their well-being.
Kang et al. [11] presented the design of a crisis alarm framework dependent on
profound learning. It is deemed appropriate for use in the existing foundation, for
example, shut circuit TV and other checking equipment. Experiments were conducted
on car accidents, and natural disasters like fire and effective results were obtained
for emergencies.
Roopak et al. [12] examined different profound learning representations for digital
surveillance in Internet of Things (IoT) frameworks. The profound learning standards
are assessed utilizing the CICIDS2017 data file, which contains generous and the

most forward-thinking normal assaults, which take after the genuine certifiable infor-
mation for identifying DDoS assault. For future work, the usage of IDS dependent on
profound learning order could be tried for haze to hub design utilizing disseminated
equal handling.
Chandran et al. [13] introduced a unique utilization of a profound learning system
for recognizing the reports missing youngsters from the photographs of a huge
number of kids accessible with the assistance of face acknowledgment. The general
population can transfer images of the skeptical kids within a typical entrance with
milestones and comments. The photograph is naturally contrasted and the enlisted
image of the lost kid from the storehouse. Convolutional neural network, which is
an exceptionally viable profound learning strategy for picture-based operation, is
utilized for surface acknowledgment.
Yahyaoui et al. [14] introduced a decision support system for diabetes predic-
tion dependent on conventional machine learning strategies with profound learning
approaches. For conventional machine learning, they utilized support vector machine
and the random forest; on the other hand, for profound learning, they utilized a fully
convolution neural system to foresee and identify the diabetes patients. The dataset
they utilized for this framework was the PIMA Indians Diabetes database. Future
work is to improve the feature extraction step by applying a programmed profound
feature extraction approach and for obtaining a superior fitting model to improve the
expected precision.
Jaradat et al. [15] introduced a way to detect the victims trapped in burning
sites. This work recommends recognizing victims in fire circumstances utilizing the
convolutional neural system model. The goal is to classify input pictures sent from
the burning site into one of the three classes: people, pets or no victims. Future work
is to assess the risk level associated with each recognized and characterize the best
way to contact those people in harm’s way. By characterizing a reasonable scoring
standard for threat level related to each detected individual, the firemen can organize
their tasks during fire conditions.
Sathesh et al. [16] presented an improved soft computing approach to identify the
intrusion that cause security issues in the social community. The proposed strategy of
the paper employs the enhanced soft computing method that consolidates the fuzzy
logic, decision tree, K means -EM and the AI in preprocessing, feature reduction,
clustering and classification, respectively, to build up a security approach that is
more viable than the conventional calculations in recognizing the abuse in the social
organizations.
Raj et al. [17] presented an investigation of the computational savvy methodolo-
gies as they appear to be reasonable choice for the man-made brainpower conquering
the mistake and drawbacks in it. Assessment of the distinctive computational method-
ologies to find the perfect one in the identification of bogus access will be a future
heading.

3 Summaries of Research Papers

See Table 1.

Table 1 Comparison of techniques in deep learning

Year/paper | Algorithm used | Dataset | Accuracy | Future work
2016 [11] | Deep learning + disaster | Fire; Car accident | 99%; 95% | To extend the EAS to distinguish crimes and explain the benefits of profound learning over traditional monitoring frameworks
2017 [8] | Deep learning + mobile multi food | FooDD | 94.11% | To achieve an elevated level of exactness of food recognition
2017 [7] | Deep learning + mobile devices | PIN (seq2dgt); APL (seq2pwd); APL (seq2dgt) | 18%; 64.5%; 42.6% | Motion sensors should have adequate countermeasures in place or require increased permissions
2018 [10] | Air quality + PM 2.5 + CNN + LSTM | W&A and PM2.5 | 81.5% | To build up its worldwide form to enable more clients to screen the air quality conventionally and secure their well-being
2018 [9] | Deep learning + anomaly detection + KDD + autoencoders | KDD Cup'99; NSL-KDD | 97.85%; 85.42% | To evaluate and stretch out the model's ability to deal with zero-day attacks and to use real-world backbone system freight to exhibit the benefits of the extended model
2019 [12] | Internet of Things + DDoS + deep learning + CNN + LSTM + RNN | CICIDS2017 | 97.16% | The usage of IDS dependent on profound learning order could be tried for the fog node design using distributed parallel processing, and also develop a deep learning model which could work on the unbalanced dataset
2019 [14] | Machine learning + deep learning + support vector machines + random forest + CNN | SVM; RF; CNN | 65.38%; 83.67%; 76.81% | Improve the feature extraction step by applying a programmed profound feature extraction
2020 [15] | CNN + image processing | One-step CNN; Two-step cascaded CNN | 96.3%; 94.6% | To evaluate the risk level related to each recognized person and characterize the best way to contact people in danger, assigning a scoring standard for threat level to each person
4 Proposed Work

In this paper, for the security of IoT systems from digital attack, deep learning
models like CNN, RNN and LSTM are implemented and contrasted with machine
learning algorithms. DDoS attacks have influenced numerous IoT systems and that
has resulted in huge losses. So deep learning models provide high precision as
compared to machine learning algorithms for attack detection. IDS are an effective
technique for detection of cyberattacks in any network. A fog-to-node computing
is utilized for the implementation of IoT systems. The dataset contains the training
set which labels the attack as benign or attack, and the test set contains the IDS

model which tests with profound learning models. The dataset utilized for this work
is CICIDS2017 which contains benign and most up-to-date common attacks, which
resemble true real-world information. The dataset is gathered for five consecutive
days with a wide range of digital attacks along with normal data. As shown in Fig. 4, the information is first divided into the training and testing parts, where 70% of the data is utilized for training and the remaining 30% for testing. In the next step, the data are labeled by assigning benign as 0 and Web attack as 1. The training data are then normalized and used to train the deep learning approaches; in the testing part, the data are likewise normalized, and the IDS model is then used to detect attacks in the system.

Fig. 4 Flowchart of the IDS process
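As a concrete illustration of this preparation step, the Python sketch below uses pandas and scikit-learn. The file name, the column name "Label", the attack label string and the use of MinMaxScaler are assumptions made for the sketch, since the paper does not state these details.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the CICIDS2017 records (file path and 'Label' column name are assumed)
df = pd.read_csv("cicids2017_webattacks.csv")
y = (df["Label"] != "BENIGN").astype(int)      # benign -> 0, web attack -> 1
X = df.drop(columns=["Label"]).values

# 70% training, 30% testing, as described in the flowchart
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Normalize features before feeding them to the deep learning model
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)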



5 Results and Discussion

For carrying out the proposed work, the most recent DDoS attack dataset, CICIDS2017, is utilized. The CICIDS2017 dataset contains realistic network traffic; it was gathered for five sequential days with distinct cyberattacks alongside ordinary data, and it contains up-to-date network data with and without attacks that is close to real-world network traffic. The main objectives are to implement a deep learning method with higher accuracy in cybersecurity and to compare its accuracy with existing methods. The proposed deep learning model is a modified CNN-based algorithm in which multiple layers of convolution and max pooling are applied to improve the accuracy. There is a convolution layer followed by a max pooling layer, after which a dropout layer is applied. The dropout layers are incorporated to save the network from overfitting. Output from the dropout layer is fed to a flatten layer, which then provides input for a dense layer, which in turn provides input to a subsequent dropout layer. Output from that dropout layer is fed to a second dense layer with sigmoid, relu and softmax activation functions. This proposed deep learning model is suitable when the requirement is for less computation, as fewer parameters are required in this model. Here, the accuracy of the model is 99.10%. The results of the proposed deep learning model are shown in Fig. 5; variation in accuracy is seen as the number of epochs increases.

Fig. 5 Epochs versus accuracy and epochs versus loss curve of the proposed deep learning model
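The paper does not name its implementation framework or exact layer sizes; purely as an illustration of the layer sequence described above (convolution, max pooling, dropout, flatten, dense, dropout, dense), a minimal Keras sketch in Python is given below, with all filter counts, kernel sizes, dropout rates and the feature count being assumed values.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_ids_model(num_features):
    # Sketch of the described CNN-based IDS model (hyperparameters assumed)
    model = models.Sequential([
        layers.Input(shape=(num_features, 1)),          # one traffic record, one channel
        layers.Conv1D(64, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.3),                            # dropout to limit overfitting
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),          # benign (0) vs. web attack (1)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_ids_model(num_features=78)                # 78 is an assumed feature count
# model.fit(X_train[..., None], y_train, epochs=20, validation_split=0.1)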

Table 2 Performance metrics evaluation

Model | Precision | Recall | Accuracy
CNN model | 94.33% | 97.62% | 98.32%
Proposed deep learning model | 96.54% | 98.44% | 99.10%

As the number of epochs increases, there are variations in the testing accuracy; this variation shows that the accuracy is not steady and continues to fluctuate. The loss versus epoch curve shows the variation in loss as the number of epochs increases. Here, loss is the loss which the proposed model incurs, epoch is the number of training rounds the model goes through, and accuracy is the precision which the model achieves. These parameters are essential to quantify intrusion detection accuracy, as they help to determine the productivity and viability of the model.
The performance metrics evaluation is shown in Table 2. In this, the CNN model
which is the base model has precision of 94.33%, recall of 97.62% and accuracy of
98.32%. In this, the proposed deep learning model has precision of 96.54%, recall
of 98.44% and accuracy of 99.10%.
For calculation, global parameters which are accuracy, recall and precision are
used to compute the values.

Accuracy = (TP + TN) / (TP + TN + FP + FN)          (1)

Precision = TP / (TP + FP)          (2)

Recall = TP / (TP + FN)          (3)

Here, TP is true positive, FN is false negative, TN is true negative, and FP is false positive. Since the dataset is imbalanced, the performance of the models is assessed in terms of precision, recall and accuracy.
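For instance, these metrics can be computed directly from the confusion-matrix counts; the following small Python function is an illustrative sketch with made-up example counts.

def evaluation_metrics(tp, tn, fp, fn):
    # Compute accuracy, precision and recall from confusion-matrix counts (Eqs. 1-3)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

print(evaluation_metrics(tp=950, tn=9000, fp=34, fn=15))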
Comparison of both the models on the basis of three parameters, accuracy, recall
and precision is shown in Fig. 6. Here, CNN model is the base model and the second
is the proposed deep learning model. It is seen that the proposed deep learning model
performed better than the base model and has a higher accuracy.

6 Conclusion

Deep learning is, for sure, a quickly developing utilization of AI. The quick utiliza-
tion of the innovation of profound learning in various fields truly shows its prosperity
and adaptability. This investigation gives a thought regarding the techniques asso-
ciated with deep learning. Besides, a similar examination of techniques utilized for

Fig. 6 Comparison of evaluation parameters (precision, recall and accuracy) for the CNN model and the proposed deep learning model

different assignments of deep learning is additionally introduced in the investiga-


tion. It is inferred that these techniques are significantly utilized for advancement to
acquire better outcomes. The outcomes acquired from different analysts imply how
the calculation utilized in mix with different methodologies on various datasets giving
diverse exactness. For examination, the usage of IDS dependent on profound learning
order could be tried for the fog to node design using distributed parallel processing
and also develop a deep learning model which could work on the unbalanced dataset.

References

1. Maloof MA (2006) Machine learning and data mining for computer safety: techniques and
applications. Springer, Berlin
2. Alpaydin E (2014) Introduction to machine learnin. MIT Press, Cambridge
3. Hinton GE (2009) Deep belief networks. Scholarpedia 4(5):5947. https://www.scholarpedia.org/article/Deep_belief_networks
4. Dan C, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high-performance
convolutional neural networks for image classification. In: Proceedings of the twenty-second
international common conference on artificial intelligence 2:1237–1242
5. Sak H, Senior A, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling
6. Chen C, Xing Z (2016) Mining technology landscape from stack overflow. In: Proceed-
ings of the 10th ACM/IEEE international symposium on empirical software engineering and
measurement. ACM, p 14
7. Harbach M, Luca AD, Egelman S (2016) The anatomy of smartphone unlocking: a field study
of android lockscreens. In: ACM conference on human factors in computing systems, CHI
8. Pouladzadeh P, Kuhad P, Peddi SVB, Yassine A, Shirmohammadi S (2016) Calorie measure-
ment and food classification using deep learning neural network. In: Proceedings of the IEEE
international conference on instrumentation and measurement technology
9. Dong B, Wang X (2016) Comparison deep learning method to traditional methods using
for network intrusion detection. In: Proceedings of 8th IEEE international conference
communication software and networks, pp 581–585
10. Al-Ali AR, Zualkernan I, Aloul F (2010) A mobile GPRS-sensors array for air deterioration
control. IEEE Sens J 10:1666–1671
11. Baek MS, Lee YH, Kim G, Park SR, Lee YT (2013) Development of T-DMB emergency
broadcasting system and trial service with the legacy receivers. IEEE Trans Consum Electron
59:38–44

12. Chadd A (2018) DDoS attacks: past, present and future. Netw Secur 2018:13–15
13. Satle R, Poojary V, Abraham J, Wakode S (2016) Missing child identification using face
recognition system. Int J Adv Eng New Technol 3(1)
14. Punthakee Z, Goldenberg R, Katz P (2018) Definition, classification, and diagnosis of diabetes,
prediabetes and metabolic syndrome. Can J Diabetes 42:S10–S15
15. Pinales A, Valles D (2018) Autonomous embedded system vehicle design on environmental,
mapping and human detection data acquisition for firefighting situations. In: IEEE 9th
annual information technology, electronics and mobile communication conference (IEMCON),
Vancouver, BC, Canada
16. Sathesh A (2019) Enhanced soft computing approaches for intrusion detection schemes in
social media networks. J Soft Comput Paradigm (JSCP) 1:69–79
17. Raj JS (2019) A comprehensive survey on the computational intelligence techniques and its
applications. J ISMAC 1(3):147–159
Secure Trust-Based Group Key
Generation Algorithm for Heterogeneous
Mobile Wireless Sensor Networks

S. Sabena, C. Sureshkumar, L. Sai Ramesh, and A. Ayyasamy

Abstract In mobile wireless sensor networks, the development of significant protection for the network by calculating a dynamic key can guard against the exploitation of data by a malicious node. All the messages exchanged during cluster-based routing also need to be protected by offering integrity and confidentiality. The proposed design comprises assorted backbone nodes that are deployed in the network using the secure trust-based group key generation (STGG) algorithm. They achieve a protected clustering procedure by using an elimination-related model. The cost value is calculated
from the nodes optimization parameters. It is used to estimate the dynamic key for its
information communication. Since the constraints used for clustering formation are
used as a supplementary throughout dynamic key generation, the protection has to be
supplied to the clustering period. When the node travels from one cluster to another
cluster, protected cluster preservation is performed, and when the data requires to
broadcast from source node to sink node, the protected route detection will be imple-
mented inside the clusters. The simulation results show that the proposed scheme is
more protected and reduces the network communication overhead.

S. Sabena
Department of Computer Science Engineering, Anna University Regional Campus, Tirunelveli,
Tamil Nadu, India
e-mail: sabenazulficker@gmail.com
C. Sureshkumar
Faculty of Information and Communication Engineering, Anna University, Chennai, Tamil Nadu,
India
e-mail: msa.suresh@gmail.com
L. Sai Ramesh (B)
Department of Information Science and Technology, Anna University, Chennai, Tamil Nadu, India
e-mail: sairamesh.ist@gmail.com
A. Ayyasamy
Department of Computer Engineering, Government Polytechnic College, Nagercoil, Tamil Nadu,
India
e-mail: samy7771@yahoo.co.in


Keywords STGG · Dynamic key · Secret key · Network · HWSN

1 Introduction

Heterogeneous wireless sensor network (HWSN) provides better performance in


terms of routing and security because of its high capability and available function-
ality [1]. With the powerful computational resources, the heterogeneous nodes can
perform complex data processing and long-term information storage. Link hetero-
geneity provides a more dependable data broadcasting [2–4]. The establishment of
optimization can optimize the processing of data in terms of power, using some
amount of sensor platforms [5]. In mobile wireless sensor networks (MWSNs), the mobile sensor nodes face several occasions where protection examinations are needed as they shift from one position to another. Because of the dynamic mobility of sensor nodes in the network, they are relevant to building fire disaster response, target tracking, and healthcare observation and monitoring. In future ubiquitous surroundings, every wireless sensor node might be mobile in nature [6].
As nodes move from one position to another position, node verification, commu-
nication privacy and data integrity have to be guaranteed [7, 8]. When two adjacent nodes attempt to deliver data on the identical channel or on partially overlapping channels, they may interfere with one another, which results in capacity reduction. Therefore, creating a proficient technique to mitigate wireless interference might facilitate accomplishing high-performance networks [10–12].
The fuzzy-based trust evaluation and energy-efficient secure routing mechanism
using machine learning approaches and soft computing techniques are discussed in
[13–18]. Figure 1 demonstrates the architecture of Heterogeneous Mobile Wireless
Sensor Networks which consists of the base station, moving sink node and the source
sensor nodes. The sensor nodes are grouped into clusters. A moving trajectory is employed for the moving sink nodes.
Secured broadcasting within a group of the network secures the group from eavesdropping. The capability of establishing secured broadcasting links in a network is used in several real-time applications. Secret group key generation is the key aspect of this proposed work. The secure trust-based group key generation algorithm
(STGG) is proposed for secure communication. A mathematical approach is created
for the group key generation for secure trust-based group key generation algorithm.
Our proposed model does not transmit the total number of bits that are used to the
eavesdroppers. The size of the key for the proposed system is the most scalable one.
The main objectives of the proposed work are
1. The shared group key is generated.
2. A mathematical modeling is created for the entropy of the shared group key to
reduce the problem.
3. The verification methodology is used to check the correctness of the created
shared group key.

Fig. 1 Heterogeneous mobile wireless sensor networks

2 Proposed System Model for Group Key Generation

Group keys are generated for existing nodes and for nodes which may join the group as intermediates. The view of the key distribution is shown in Fig. 2. A system is formed with a group of N nodes. It is observed that there are N−1 autonomous broadcasting groups available for data transmission. The mobile nodes do not have any predefined shared key or secret data other than the protocol. The broadcasting nodes are not believed to have any additional capacity to generate other kinds of keys. No pre-requisites are shared at either the physical or the MAC layer. It is assumed that packet losses in the wireless network, within the group of nodes or at the eavesdroppers, are independent. There are M attackers with the same capacity as the N nodes against any of the N−1 groups. The location of the eavesdroppers is not known to any node in the network. Eavesdroppers are capable of changing channels at the same rate as the end users who endeavor to generate the shared secret information. Figure 2
demonstrates the group key generation for the mobile nodes in the heterogeneous
mobile wireless sensor networks.

3 Proposed Work

Three nodes A, B, and C desire to establish secure transmission among themselves and produce a secret group key. There is another node E who is eavesdropping

Fig. 2 Group key generation

on communication within A, B, and C. Table 1 illustrates the Sharing information


within A, B, and C in Phase 1 and eavesdropped by E.

3.1 Phase 1—Generation of Pair-Wise Keys

Phase 1 contains three states, every state containing S rounds. In the initial state, A and B broadcast while C receives. Node A broadcasts on network NA for S rounds, and Node B broadcasts on network NB for S rounds. All the bits transmitted by A and B in every round are randomly produced. A's broadcast over the S rounds is denoted by KeyAC.

3.2 Phase 2—Generation of Group Key

A, B, and C can all produce session keys SeA, SeB, and SeC, respectively. A employs
the pair-wise symmetric keys (KeyAB/2, KeyBA/2) and (KeyAC/2, KeyCA/2) to encrypt and broad-
cast SeA to B and C accordingly. Similarly, B employs the pair-wise symmetric
keys (KeyBC/2, KeyCB/2) and (KeyAB/2, KeyBA/2) to encrypt and broadcast SeB to A and C
accordingly.

Table 1 Sharing information within A, B, and C in Phase 1 and eavesdropped by E

State | A | B | C | Er | EF
1 | [KeyAC] | [KeyBC] | (KeyAC/2, KeyBC/2) | (KeyAC/2, KeyBC/2) | [KeyAC]
2 | [KeyAB] | (KeyAB/2, KeyCB/2) | [KeyCB] | (KeyAB/2, KeyCB/2) | [KeyAB]
3 | (KeyBA/2, KeyCA/2) | [KeyBA] | [KeyCA] | (KeyBA/2, KeyCA/2) | [KeyCA]
Aggregated information | [KeyAC, KeyAB], (KeyBA/2, KeyCA/2) | [KeyBC, KeyBA], (KeyAB/2, KeyCB/2) | [KeyCB, KeyCA], (KeyAC/2, KeyBC/2) | (KeyAC/2, KeyBC/2, KeyAB/2, KeyCB/2, KeyBA/2, KeyCA/2) | [KeyAC, KeyAB], [KeyCA]

3.3 Method 1—Initialization

1. A binary tree BT contains a total of n nodes, and every node is denoted as
Node[1...n]. Every node in BT has a key Key[1...n].
2. There are m devices D[1...m], and every device is placed as a leaf node of BT
from left to right.
3. The value of the group key is GrK, and the root node Node1's key value is Key1.
4. The key value of the secret key (SeK1) is given to the NodeL and NodeR children.
5. The source node SN creates an arbitrary number ASN and calculates the group
signature key GroupSN(ASN G).
6. The adjacent node Adjy accepts the message from SN and validates the signature
from SN. If the validation is positive, then the adjacent node produces its arbitrary
number (BSN) and calculates the signature key GroupSN(ASN G|BSN G).
To calculate the secret key value of Nodei and update the GrK value, this research
work uses random methods (RMs) that take a key and an input A of length i bits and
produce an output of j bits:

RM(i→j) : Key × {0, 1}^i → {0, 1}^j    (1)

The RM is relatively quicker and more secure than other security methods
for producing secret key values. To improve security using BT, the secret
key calculation is performed as shown in Fig. 3. Before evaluating the key for the group,
the presence of malicious nodes is identified and modifications are carried out in key
generation.
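The paper does not fix a concrete construction for RM. Purely as an illustration, the following Python sketch instantiates RM(i→j) with HMAC-SHA256 used as a keyed pseudorandom function and expands the digest until j output bits are available; the function name rm and its parameters are assumptions for illustration, not the construction evaluated in this work.

```python
import hmac
import hashlib

def rm(key: bytes, data: bytes, j_bits: int) -> bytes:
    """Illustrative RM(i -> j): keyed mapping of an i-bit input to j output bits,
    instantiated here (as an assumption) with HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) * 8 < j_bits:
        out += hmac.new(key, bytes([counter]) + data, hashlib.sha256).digest()
        counter += 1
    return out[: (j_bits + 7) // 8]

# Example: derive a 128-bit secret from a node key and a parent secret
print(rm(b"Key2-example", b"SeK1-example", 128).hex())
```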

3.4 Method 2—Secret Key Calculation

1. Every node of the BT calculates its secret key value SeKi.
2. SeK2, the value of the left child of the root node, is calculated as

SeK2 = RMKey2x2(SeK1)    (2)

Key2x = RMKey2x2    (3)

where Key2 is the key value of Node2. Similarly,

SeK3 = RMKey3x3(SeK1)    (4)

SeK4 = RMKey4x4(SeK2)    (5)

SeK5 = RMKey5x5(SeK2)    (6)

Fig. 3 Flowchart for the proposed work

The devices (D1, D2, ..., D8) are situated on the leaf nodes of BT. Every leaf node
must know the value of each of its parents' secret keys; a device cannot derive its key
in the BT without knowing the secret key value of its parent nodes. For example, the SeK12
of Node14 can be calculated by knowing the secret key values of the parent nodes
SeK7, SeK3, and SeK1. Every device keeps its secret key value in an integrated circuit.

3.5 Session Key Maintenance Phase

Every device generates a session key for secure transmission within the group. The
shared secret key between the source and the destination is calculated using the
secret key values of the nodes along the path.

SessKey8,15 = (SeK9 || SeK2 || SeK14 || SeK3 || randomD1)    (7)

3.6 Algorithm—STGG
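As a rough illustration of the flow described in Methods 1 and 2 and in the session key maintenance phase, the following Python sketch numbers the binary tree nodes 1..n (children of node i at 2i and 2i+1), derives each node's secret key from its parent with a stand-in RM, and concatenates the secret keys of Eq. (7) into a session key. The rm construction, the node keys and the nonce are illustrative assumptions, not the exact STGG listing.

```python
import hashlib

def rm(key: bytes, data: bytes) -> bytes:
    # Stand-in for the keyed random method RM (an assumption, not the paper's construction)
    return hashlib.sha256(key + data).digest()

def derive_secret_keys(n_nodes, node_keys, sek_root):
    """SeK_child = RM_{Key_child}(SeK_parent) for every node of a complete
    binary tree numbered 1..n (cf. Eqs. 2 and 4-6)."""
    sek = {1: sek_root}
    for i in range(2, n_nodes + 1):
        sek[i] = rm(node_keys[i], sek[i // 2])
    return sek

# Toy tree with 15 nodes; devices D1..D8 sit on leaf nodes 8..15
keys = {i: f"Key{i}".encode() for i in range(1, 16)}
sek = derive_secret_keys(15, keys, b"SeK1-root-secret")

# Session key for the pair of leaves 8 and 15, following the shape of Eq. (7):
# SessKey_{8,15} = SeK9 || SeK2 || SeK14 || SeK3 || random_D1
random_d1 = b"nonce-D1"
sess_key = sek[9] + sek[2] + sek[14] + sek[3] + random_d1
print(sess_key.hex())
```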

3.7 Notations Used in the Proposed Work

Table 2 illustrates the symbols used in the proposed work.



Table 2 Notations used


Notation Description
SN Source node
ASN Arbitrary number of source node
GroupSN (ASN G) Group signature key
Adj y Adjacent node
BSN Secondary number of source node
BT Binary tree
NA Network
S Rounds
KeyAB Pair-wise symmetric key
M Number of attackers
E Eavesdropper
A, B, and C Nodes
SeA Session key
SeK1 Secret key
GrK Group key
NodeR Right child node
NodeL Left child node
Node1 Root node
RMs Random methods
D1 Device
SessKey8,15 Session key
random D1 Random value for Device 1
Probsuc Probability of successful transmission
Probfail Probability of failed transmission
δ Encryption function
T Transmission
Ex Energy for node x
EntropyPairwise Entropy for the pair-wise key
Entropy(1) Entropy value

4 Performance Evaluation

4.1 Experimental Setup

Table 3 demonstrates the Experimental Setup in detail.



Table 3 Experimental setup


Name of the parameter Value
Simulator type NS2
Network traffic type CBR
Physical mode Wi-Fi
Gain for the receiver −20.0 dB
Total number of nodes in every state 25 nodes
Type of radio IEEE 802.15.07
Band for frequency 2.45 GHz
Distance 10.5 m
Model of antenna Omni
Delay of propagation Constant
Type of topology Binary tree

4.2 Entropy for the Pair-wise Key


EntropyPairwise(i, j) = EntropyPairwise(j, i)    (8)

EntropyPairwise(i, j) = 2 × Entropy(1)/(M − 1)    (9)

Figure 4 demonstrates how the pair-wise keys are involved in the encryption process
within the network communication. The experiments are evaluated with four different
sets of nodes, and the entropy level is evaluated based on the total number of rounds.
Fig. 4 Entropy

4.3 Probability of Successful Transmission


Probsuc = 1 − ∏_{i=1}^{D} [1 − (1 − δTx Ex)(1 − δTx,i)(M − i − 1)/(M − i)]    (10)

Probfail = ∏_{i=1}^{D} [1 − (1 − δTx Ex)(1 − δTx,i)(M − i − 1)/(M − i)]    (11)

Probfail = 1 − Probsuc

ProbRa(z) = C(Ra, z) × Probsuc^z × Probfail^(Ra−z)    (12)

= C(Ra, z) × [1 − ∏_{i=1}^{D} (1 − (1 − δTx Ex)(1 − δTx,i)(M − i − 1)/(M − i))]^z
  × [∏_{i=1}^{D} (1 − (1 − δTx Ex)(1 − δTx,i)(M − i − 1)/(M − i))]^(Ra−z)    (13)

where C(Ra, z) denotes the binomial coefficient of Ra over z.
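To make Eqs. (10)–(13) concrete, the short sketch below evaluates them for assumed values of δTx, Ex, δTx,i, D, M and Ra; all numbers are placeholders for illustration and are not the simulation parameters of Table 3.

```python
from math import comb

def prob_success(delta_tx, e_x, delta_tx_i, D, M):
    """Eq. (10): Prob_suc = 1 - prod_{i=1}^{D}[1 - (1 - dTx*Ex)(1 - dTx_i)(M-i-1)/(M-i)]."""
    prod = 1.0
    for i in range(1, D + 1):
        prod *= 1 - (1 - delta_tx * e_x) * (1 - delta_tx_i) * (M - i - 1) / (M - i)
    return 1 - prod

def prob_ra(z, ra, p_suc):
    """Eq. (12): binomial probability of z successful rounds out of Ra."""
    return comb(ra, z) * p_suc ** z * (1 - p_suc) ** (ra - z)

# Placeholder values (assumptions, for illustration only)
p = prob_success(delta_tx=0.1, e_x=0.5, delta_tx_i=0.2, D=4, M=25)
print(round(p, 4), round(prob_ra(3, 5, p), 4))
```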

4.4 Performance Analysis

The proposed method STGG is compared with PA-SHWMP [3] and HWMP [9].
Figure 5 shows the number of malicious nodes versus the packet delivery ratio.
Figure 6 shows the number of malicious nodes versus the route acquisition delay.
Figure 7 shows the number of malicious nodes versus the average end-to-end delay.
Figure 8 shows the number of malicious nodes versus the message overhead.
Figure 9 shows the lossy links versus the false-positive rate. The simulation results
suggest that the proposed method STGG performs better than the related methods
PA-SHWMP and HWMP. The entropy is computed for the pair-wise group key in the
Wi-Fi physical channel, and the MAC protocol is used for the performance analysis.

Fig. 5 Number of malicious nodes versus packet delivery ratio

Fig. 6 Number of malicious nodes versus route acquisition delay

Fig. 7 Number of malicious nodes versus average end-to-end delay

Fig. 8 Number of malicious nodes versus message overhead

Fig. 9 Lossy links versus false-positive rate

5 Conclusion

This paper proposes the STGG algorithm to generate a shared group key. For the
security analysis of the proposed algorithm against related algorithms, entropy-related
security is used as the performance metric. The false-positive rate is used to analyze
the unconnected failed links in the network. Basically, packets are dropped in the
network due to the unwanted behavior of an active attacker or due to poor connectivity.
The route acquisition delay is calculated as the time from the RREQ message to the
RREP message between the source node and the destination node.

References

1. Vivek K, Narottam C, Soni S (2010) Clustering algorithms for heterogeneous wireless sensor
network: a survey. Int J Appl Eng Res 1:273–287
2. Chun-Hsien W, Yeh-Ching C (2007) Heterogeneous wireless sensor network deployment and
topology control based on irregular sensor model. Adv Grid Pervasive Comput 4459:78–88
3. Sathiyavathi V, Reshma R, Parvin SS, SaiRamesh L, Ayyasamy A (2019) Dynamic trust
based secure multipath routing for mobile ad-hoc networks. In: Intelligent communication
technologies and virtual mobile networks. Springer, Cham, pp 618–625
4. Vivek M, Catherine R (2004) Homogeneous vs heterogeneous clustered sensor networks: a
comparative study. IEEE Int Conf Commun 6:3646–3651
5. Andreas R, Daniel B (2013) Exploiting platform heterogeneity in wireless sensor networks
by shifting resource-intensive tasks to dedicated processing nodes. In: IEEE international
symposium on a world of wireless, mobile and multimedia networks (WoWMoM), pp 1–9
6. Yu Y, Peng Y, Yu Y, Rao T (2014) A new dynamic hierarchical reputation evaluation scheme
for hybrid wireless mesh networks. Comput Electr Eng 40(2):663–672
7. Selvakumar K, Karuppiah M, SaiRamesh L, Islam SH, Hassan MM, Fortino G, Choo KKR
(2019) Intelligent temporal classification and fuzzy rough set-based feature selection algorithm
for intrusion detection system in WSNs. Inf Sci 497:77–90
8. Kamalanathan S, Lakshmanan SR, Arputharaj K (2017) Fuzzy-clustering-based intelligent
and secured energy-aware routing. In: Handbook of research on fuzzy and rough set theory in
organizational decision making. IGI Global, pp 24–37
9. Bansal D, Sofat S, Singh G (2010) Secure routing protocol for Hybrid Wireless Mesh Network
(HWMN). In: Proceedings of the international conference on computer and communication
technology (ICCCT’ 10), pp 837–843, Allahabad, India
10. Akyildiz IF, Wang X, Wang W (2005) Wireless mesh networks: a survey. Comput Netw
47(4):445–487. https://doi.org/10.1016/j.comnet.2004.12.00
11. Basilico N, Gatti N, Monga M, Sicari S (2014) Security games for node localization through
verifiable multilateration. IEEE Trans Dependable Secure Comput 11(1):72–85
12. Arora A, Sang L (2009) Dialog codes for secure wireless communication. In: Proceedings of
the IPSN conference
13. Selvakumar K, Ramesh LS, Kannan A (2016) Fuzzy based node trust estimation in wireless
sensor networks. Asian J Inf Technol 15(5):951–954
14. Thangaramya K, Logambigai R, SaiRamesh L, Kulothungan K, Ganapathy AKS (2017) An
energy efficient clustering approach using spectral graph theory in wireless sensor networks.
In: 2017 Second international conference on recent trends and challenges in computational
models (ICRTCCM). IEEE, pp 126–129
15. Selvakumar K, Sairamesh L, Kannan A (2017) An intelligent energy aware secured algorithm
for routing in wireless sensor networks. Wirel Pers Commun 96(3):4781–4798
16. Selvakumar K, Sairamesh L, Kannan A (2019) Wise intrusion detection system using fuzzy
rough set-based feature extraction and classification algorithms. Int J Oper Res 35(1):87–107
17. Smys S (2019) Energy-aware security routing protocol for WSN in big-data applications. J
ISMAC 1(01):38–55
18. Haoxiang W, Smys S (2020) Soft computing strategies for optimized route selection in wireless
sensor network. J Soft Comput Paradigm (JSCP) 2(01):1–12
A Study on Machine Learning Methods
Used for Team Formation and Winner
Prediction in Cricket

Manoj S. Ishi and J. B. Patil

Abstract Sports prediction is becoming popular day by day as a huge amount of
data is generated after every single match. A number of methodologies are available
for the classification of sports data. Machine learning is one of the best techniques
and obtains good results in sports prediction. In the world of sports, cricket has
gained huge popularity over the last few decades. Team prediction and winner
prediction emerge as challenging tasks in the game of cricket. The winning team
combination for a tournament is based on the analysis and evaluation of players from
past studies. This type of model is built with several features such as previous records
and the recent performance of players. The performance analysis of players is a basic
problem in every sport, including cricket, and is used to find the strengths and
weaknesses of the players. This work is useful for team management and captains for
the selection of players. The prediction of a winner in cricket matches is also a complex
research problem. Many features are needed to decide the winner of a match, such as
a proper team combination, venue, and weather conditions. Cricket is a dynamic game
where the probability changes with different phases of the game, making it a
multi-criteria decision problem. In this paper, a study of some existing methods used
for team formation and winner prediction in cricket is carried out.

Keywords Team formation · Winner prediction · Machine learning · TOPSIS ·
Neural network

M. S. Ishi (B) · J. B. Patil


Department of Computer Engineering, R. C. Patel Institute of Technology, Shirpur 425405, MS,
India
e-mail: ishimanoj41@gmail.com
J. B. Patil
e-mail: jbpatil@hotmail.com


1 Introduction

In today's world of sport, a large amount of statistical information is generated.
Statistics are generated for players, teams, and series played between two teams.
Data mining techniques can be useful for mining such large data for experts, coaches,
and statisticians. Team prediction, result prediction, and identifying the emerging
player of a tournament are some research problems related to the sports domain.
Team formation and result prediction are among the interesting areas for researchers.
These problems depend upon many parameters such as batsman performance, bowler
performance, and weather conditions [1].
A team is made of players, and better players define a good team. According to
the strength of players, a ranking is assigned to them: if players perform consistently
well, a higher rank is assigned to them, and a higher ranking increases the chance
of their selection. While selecting the team using batting and bowling parameters,
the ranking of players can be useful. Team selection is a critical problem. It involves
numerous parameters such as playing conditions, fitness, performance, and the current
form of the players. A large number of players are available throughout the country,
and the team is selected from this pool of available players depending on the opposite
team's strength and the conditions. To make this process unbiased and error-free,
there is a need to analyze the strength of players based on their statistics or performance.
The process becomes efficient if an automatic team selection tool is available. It may
suffer from problems if a player does not have enough statistics from international
matches or from that particular format of cricket. In that case, the player's selection
criterion needs to be adjusted to a certain level with respect to existing players. A
careful study shows that for selecting opening batsmen, run-scoring ability with a good
strike rate needs to be considered. In the case of middle-order batsmen, running between
the wickets and the ability to rotate the strike to other batsmen are considered. This is
helpful because in the middle overs there is a need to build the innings without losing
wickets. In the last phase of the game, there is a need for pinch hitters who can move
the scoreboard quickly. Wicket-taking capability and dot-ball percentage in the power
play, middle overs, and death overs play a significant role in the selection of bowlers.
Naive Bayes, random forest, SVM, decision tree classifiers, Bayesian networks,
simulated annealing, and binary integer programming are some of the methods used
previously for balanced cricket team selection [2–4].
Winner prediction is considered a numerical prediction problem. In cricket, a
number of features are collected based on the previous records of teams, the results of
previous matches, and team strength. If the result of a match is predicted correctly,
it is useful for fans, bookmakers, and TV shows. This type of prediction model
can be useful for sports analysis shows that try to predict the result of matches
before the game or in the middle of the game. Experts, commentators, and former
players try to predict the result of matches for such sports analysis shows. Previously,
mathematical and statistical models were used to predict the result of matches. Due to
the self-learning nature of machine learning, it is now used for sports prediction models.
Typical statistical algorithms predict the result of matches based on team strength. This
decision-making problem uses past data to predict future games. These prediction
models primarily depend upon statistical-based and simulation-based methods. In the
case of a simulation-based model, a sport-specific simulation engine is used to predict
the outcome of a game by running the algorithm multiple times, while the statistical
model is based on the statistics or strengths of a team under different conditions [5, 6].

Fig. 1 Machine learning model steps
Machine learning is useful for designing analytical models and is a subfield of
artificial intelligence. Classification and regression in machine learning are used to
solve prediction problems. The typical machine learning model is shown in Fig. 1.
With the help of a dataset of previous matches, common patterns can be found in
historical data. The primary aim of the classification model is to build a model based
on training data and then use this model to evaluate other data; it is used to predict
the target variable from previous data. Prediction of a winner in cricket is a
classification problem where one predicts a class label of win, loss, or draw. Support
vector machine, Naive Bayes, logistic regression, cricket outcome predictor (COP),
random forest, K-nearest neighbor, C4.5, bagging, and boosting algorithms are used
for the classification of such data [7, 8]. In this paper, several methods used for team
formation and winner prediction in cricket are studied.

2 Related Work

Bunker and Thabtah [9] have focused on artificial neural networks for sports result
prediction. This model considers the result of existing matches, performance indica-
tors for players and opposite team information. They study the existing methodolo-
gies, data used, prediction model, and challenges that occurred during result predic-
tion. Machine learning is used as a learning framework for this prediction model.
The artificial neural network is a more recent area used for statistical and operational
research for sport prediction. An 'SRP-CRISP-DM' type framework is proposed using
this NN technique. This framework is based on six steps: domain understanding,
data understanding, data preparation (including feature extraction), model preparation,
model evaluation, and finally model deployment. This method is useful for
researchers, bookmakers, sports fan, and students who are interested in sports predic-
tion using neural network. Match prediction requires more accuracy, for that case
rather than using traditional statistical or analytical models one can go for the NN
model. Machine learning is preferred here because it generates a more accurate
prediction model using the already defined feature and previous dataset. ‘SRP-
CRISP-DM’ framework provides a solution for the most complex problem of sports
prediction.
Asif and McHale [10] have proposed a generalized nonlinear forecasting model
(GNLM) for predicting the number of runs to be scored in an inning of cricket. A
number of wickets and overs left are considered during this model. This model is
useful for any format of cricket. The aim of this model is used to predict total runs
in Twenty-20 international cricket. This model calculates run difference between the
two teams while the match in progress. The difference between runs can be used
to find the closeness of the game. It can be used to calculate the rating of the team
with the help of a margin of victory between two teams. This model can be useful
for target resetting in case of interrupted matches, and prediction but primarily the
main focus of this model is to determine top-20 greatest victory and accordingly
assigning a ranking to the team. This model works on the principle of the number of
wickets lost with remaining runs as a non-increasing function and runs on next ball
is non-decreasing function during inning progress. In case of overs left, the expected
runs and run on next ball are a non-increasing function of wicket lost as inning
progresses. To obtain an accurate team rating, margin of victory is considered for
this model. The problem arises when the team batting second wins the match. Then,
the margin of victory considers as wickets remaining instead of runs. This type of
problem can be resolved by increasing team two scores if they allow continuing to bat.
This is a mathematical model based on truncated survival function based on Weibull
distribution. The current system of ICC ranking does not consider the margin of
victory for team rankings. This module produces some properties for the run-scoring
pattern and accurate behavior of this model. This GNLFM model is not limited to
team rating but this can be also useful for other issues of limited over cricket. It can
be used for target resetting for interrupted matches. The score prediction model can
be designed using this framework.
Chakraborty et al. have used TOPSIS method with MCDM tool for the selection of
players in the cricket team. This tool obtains single response value as performance
measure from multiple features considered for decision making. It gets a lot of
popularity because it considers a smaller number of a parameter as input with high
consistency and less complexity [11]. The selection is based on the shortest Euclidean
distance from a positive ideal solution, and far from a negative ideal solution. A
positive ideal solution obtains if attribute receives a maximum response from the
database. In the case of a negative ideal solution, the attribute receives a minimum response.
Poor results get balanced with good results if a poor criterion is compensated by
some other positive criterion. This method applies to all forms of one-day and T20
cricket teams. The entropy method is used to assign a weight to the players by
cricket teams. The entropy method is used to assign a weight to the players by
the corresponding criterion. The players are selected from shortlisted data to find
possible team composition. They prefer to use the augment entropy method, where
information is available in the form of decision and evaluation matrix. They maintain
a relationship between the criterion from the available information and weight. The
primary advantage of this technique is that it finds weight from the decision matrix.
It does not consider the view of decision-makers. For the selection of players, it
measures the uncertainty of random information available in the decision matrix.
The combination of decision matrix and weight is used to get the right composition
of players.
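As a rough sketch of the TOPSIS scoring step described above, the following Python fragment normalizes a small decision matrix, applies weights, and ranks players by relative closeness to the positive ideal solution. The player statistics and weights are made-up numbers, not data or weights from [11].

```python
import numpy as np

# Rows: candidate players; columns: benefit criteria (e.g., batting average, strike rate)
X = np.array([[45.0, 88.0], [38.0, 95.0], [50.0, 80.0]])
w = np.array([0.6, 0.4])                       # assumed criterion weights

R = X / np.sqrt((X ** 2).sum(axis=0))          # vector-normalized decision matrix
V = R * w                                      # weighted normalized matrix
v_pos, v_neg = V.max(axis=0), V.min(axis=0)    # positive / negative ideal solutions

d_pos = np.sqrt(((V - v_pos) ** 2).sum(axis=1))  # Euclidean distance to ideal
d_neg = np.sqrt(((V - v_neg) ** 2).sum(axis=1))  # distance to anti-ideal
closeness = d_neg / (d_pos + d_neg)              # TOPSIS relative closeness

print(np.argsort(-closeness))                    # player ranking, best first
```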
Jayanth et al. [12] have proposed method for team recommendation and outcome
prediction in cricket. The supervised learning method is used with linear, nonlinear,
and RBF kernel to predict the outcome of a cricket match. Group of players are
formed at a different level for both the teams before predicting the outcome of the
match. The player’s contribution is measured at the same level of player into that
group. K-means clustering is used to recommend players using past data. K-nearest
neighbor classifier is used with five neighbors to find the nearest player. Unstructured
data is extracted from a sports Web site and stored in the database. Team selection is
done from the historical data by measuring the winning contribution of the players.
SVM with linear and nonlinear techniques is used to predict the outcome of the
match. This SVM model is trained with a ranking index of batsmen and bowlers.
It is treated as a binary problem by considering win and loss as a class with a
finite dimension. The player ranking index is calculated using statistics extracted
from the particular tournament. Data from the n-dimensional space is not linearly
separable; hence, SVM with nonlinear RBF kernel performs better as compared
to linear or poly kernel. The dimensions of the feature vectors are reduced with
principal component analysis (PCA). PCA converts this feature set into a new set of
variables called principal components. Accuracy, precision, and recall rate for SVM
with RBF kernel outperform other SVM models for winner prediction. For team
recommendation, k-means clustering is used, where similar players are found with
k-nearest classifier.
Chand et al. [13] have provided a model for assembling a team for a particular
tournament. The stochastic optimization technique is used for team selection. This
method of team selection is not useful when stakes are high. Here, multi-objective
integer programming model is proposed for optimal team formation. They suggest
a partial team construction model by selecting a few members and keeping other
members unchanged in the squad. Players are assigned with rank based on their
importance. The integer linear programming model is used to select an optimal
team to get a multi-objective formulation of the team. Multiple ILP modules are
used to provide a solution using a multi-objective procedure. They used a classical
constraint approach where one objective minimized or maximized to check the effect
on other objectives. A binary vector is used to provide solutions for the available set
of players. This method guarantees optimality for team selection with minimum
time. This approach is scalable using five objectives. To make this decision-making
process effective, a player ranking is obtained. The current form of the players, not
their previous record, defines whether the team is optimal. Batting and bowling objectives
need to be maximized or minimized according to the objective formulation method. A player
to maximized or minimized according to objective formulation method. A player
contribution is calculated using the measurement of the hypervolume of the team.
The higher difference in hypervolume indicates the high contribution of players. Two
way, three-way, and five-way objective optimization method based on batting and
bowling constraints used for team formation in T20 matches.
Ahmad et al. [14] have provided a method for finding the superior team. The first
aim of this technique is to focus on batting and bowling ability of the team. After
that, it uses new features to find the quality of a team. This mechanism considers the
actual performance of batsmen and bowlers in one-day international matches. For
calculating the precedence of batsmen, they considered the top-six batting positions.
This is assigned with label batting productivity precedence. Bowling productivity
precedence (BoPP) is used for the last six positions as a bowler. Overlapping players
are called as an all-rounder. These two are added together to get team productivity
precedence (TPP) over the other team. The productivity precedence algorithm (PPA)
is used here to provide productivity weights for batting, bowling, and team prece-
dence. This efficient mechanism is used to find features of batsmen, bowler, and
cricket team. Around 8 to 9 features are used and aggregate to calculate the weight
of that particular domain. The productivity precedence algorithm (PPA) is applied to
get team precedence over other teams. Bidirectional productivity graph BPG (T, I) is
designed between two teams after the PPA algorithm. In BPG, T represents a node of
the team and I refer to interaction during a match against each team. The productivity
precedence algorithm uses a network structure to find an important feature for team
precedence. For better outcome, fielding parameter can be added with batting and
bowling parameters to provide a rank for players in a team.
Khatua and Khatua [15] have proposed a method for winner prediction in 2015
world cup using Twitter users. This type of problem is not a binary classification
problem because multiple teams are involved. The logistic regression model is applied
over 3.5 million tweets to get the relationship between classification and tweeting
patterns. Structured information is used for this model, and it is tested with eight logistic
regression models. The user orientation with mixed tweet patterns is used for the
formation of a team. The model is statistically tested with the likelihood ratio and the
independent variable coefficients for all eight logistic regression models. Positive and
negative tweets are used to check the positive effect of the model for winner prediction.
Verma and Izadi [16] have proposed a new analytical framework called as cricket
prognostic system (CPS). Advanced machine learning and statistical methods are
used to predict the result. Around thirty dynamic features are considered based on real
statistics and historical data. Three classifiers are used in this technique for prediction
of win of a particular cricket team and also to find the distribution of players for each
match simulation. The first classifier is used to predict players dismissal, the second
classifier for remaining runs to score, and the third classifier used for calculation of the
number of extra runs scored by a team. Simulation of a cricket match is done with risk
model CPS. If the wickets do not fall then runs and extra runs are calculated on each
ball. The matches which are interrupted and shortened are not considered by CPS
system. This classifier model is designed with hundreds of indicators like batsman,

bowler, ground, teams, and current state of the game. The current game state variable
includes current over, wickets fallen, and run rate in second winning. Consistency of
player, pressure index, and impact of a player are considered as indicators for CPS.
Central tendency and variability are selected as a parameter in this model. For the
number of wickets binary logistic regression model is used. Runs scored and extra
runs awarded are evaluated with six independent logistic regression models with a
probability score of 0 to 6. This probability is finally normalized into one final value.
This study is used to predict winning probabilities and to evaluate a player on their
benchmark value. Player’s comparison is based on metrics and strategies.
Singh and Kaur [17] have presented the tool for the evaluation of players using
visualization of performance. They identified the key variables required to evaluate
batsmen and bowler’s performances and also adds some extra weight. HBase tool is
used to evaluate the performance of players based on historical data. It is a distributed,
open-source tool used for storing a non-relational database. It is used to store billions
of rows and columns of data into the table. This work focuses on providing statistical
analysis using different characteristics of players. After that, it considers the statistics
of the players to predict the winning probability of the team. This tool is based on
machine learning algorithms. Semi-structured and unstructured data are stored in
HBase tool. K-nearest algorithm with four neighbors is used to predict the winner
of the match. The KNN algorithm is compared with a decision tree, random forest,
support vector machine, and logistic regression to evaluate the accuracy of the model.
Multidimensional feature space with a class label is used for every value of a training
set of KNN algorithm. During classification, k is assigned with value four, and label
for training samples to the nearest point. Euclidean distance is used for designing
distance metric. Hamming distance with a discrete variable is used for overlap metric
for classification of text. In this method, ten features are considered for prediction of
winner with a primary focus on toss and venue of the match. KNN enables data fitting
accurately by avoiding overfitting and underfitting of data with maximum accuracy.
A non-relational database is used to provide a dynamic approach to address this
problem of outcome prediction.
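A minimal scikit-learn sketch of the k-nearest-neighbour classification step described above (k = 4, Euclidean distance) is shown below; the feature matrix, labels and split are random placeholders, not the ten-feature dataset used in [17].

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder match features (e.g., toss won, home venue, team ratings) and outcomes
rng = np.random.default_rng(0)
X = rng.random((200, 10))          # 200 past matches, 10 features
y = rng.integers(0, 2, 200)        # 1 = team A wins, 0 = team B wins

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=4, metric="euclidean")
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```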
Dey et al. [18] have evaluated network properties for team formation and to find
whether that player belongs to a team network or not. Players are considered as nodes
of a network, and the interactions of those players are denoted with edges. Intra-country
networks are considered for team selection, which inherit all characteristics from
the past performance of players. For the calculation of weights during team selection,
fielding, running between the wickets, and partnerships are considered as the more
important parameters for the evaluation of players. The social network analysis method is
used for checking the effectiveness of players. The network analysis is performed
with a bidirectional weighted network from the data of T20 cricket matches. The
clustering coefficients and centrality measurement methods of network analysis are
used for checking a player’s efficiency before adding to the squad. This approach
works on three steps: The first formation of T-20 network, second identifying the
properties of network, and finally the formation of network based on high centrality
measurement of the clustering coefficient. The players are assigned with rank, node
degree distribution, and clustering coefficient with centrality. The path length of the

network from central specifies the players rank in a free-scale network. This approach
provides information about player performance and the bonding between teammates.
Some players which are having high centrality value and clustering coefficient can
help for the formation of a team.
Irvine and Kennedy [19] have provided a method to determine the performance
indicator of players. They also study the effect of performance indicator on the
outcome of the cricket match. Innings run rate, the total number of wickets taken,
and the number of dot balls is used to get magnitude-based interference for a perfor-
mance indicator. This magnitude-based interference allows selecting bowler with
good wicket-taking capability during the attacking field. The aggressive batsmen
are selected based on high boundary percentage and strike rate. The purpose of this
study is to evaluate the performance indicator of players which has positive effects on
match results. This study concludes that the difference between winning and losing
team is the number of wickets taken, dot balls, and inning run rate. It also finds the
significance of performance indicator. This study is carried out around four envi-
ronment conditions such as sub-continent condition and eastern condition. The run
rate in English condition is better in swinging conditions, but stroke play is diffi-
cult during the start of an inning. In case of sub-continent condition, initially run
rate is high, but as inning progresses run rate gets slowdown due to reverse swing.
The performance indicator for batsmen, bowler, and the team is used separately to
determine the outcome of the match.
Bandulasiri et al. [20] have used principal component analysis method for studying
batting, bowling parameter, and finding ranking of the team. The rank of a team,
number of fifties, partnership of players, number of spinners, number of fasters,
and number of all-rounders are considered as characteristic for analyzing the team.
Numerical feasibility adds an extra effect to this technique for making it more popular.
The primary purpose of the PCA is to reduce the number of variables. Here, a large
number of correlated variables are converted into linear uncorrelated variables in
the best possible way. These variables are called as a principal component. Vector
transformation is used to convert a higher dimension variable into a smaller dimen-
sion so that a small number of variables are sufficient to explain the output of this
method. Batting, bowling, and decision making are used as three factors to calculate
the principal components. The partnership between players is impacting more as
compared to other parameters if proper attention is given to this parameter. CBR and
CBA are used to measure the performance of bowlers and batsmen. Small CBA and
CBR values indicate poor performance of batsmen and bowlers. In PCA dataset size
is checked with KMO value which indicates how data can be used for analysis.
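The variable-reduction step described above can be sketched with scikit-learn's PCA as follows; the match feature matrix is random placeholder data, and keeping two components is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.random((50, 8))                        # 50 matches, 8 correlated team features

X_std = StandardScaler().fit_transform(X)      # PCA is usually run on standardized data
pca = PCA(n_components=2)
components = pca.fit_transform(X_std)          # uncorrelated principal components
print(pca.explained_variance_ratio_)
```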
Daud et al. [21] have evaluated the strength of the team based on the concept
of team rank. The current ranking method for cricket team considers the number
of matches win or loss by a particular team. It does not consider the margin of
remaining runs or remaining wickets for assigning ranks to the team. The concept of
H-index and page rankings are proposed for identifying the weakness of previously
used methods. The network of teams is formed, where each team acts as a node and
weighted directed edge is drawn between two teams. The team is awarded more points
if they won against a strong team. Team index (T-index) is proposed similar to the

concept of H-index to form a non-weightage graph, where weight is not assigned to


nodes. T-index is calculated using the number of wickets and margin of runs to define
the strength and weakness of teams. The page rank algorithm is used to design the
team rank algorithm with the help of a graph. Graph nodes are assigned with weight
after matches are played between two teams. Weighted team rank algorithm is used to
calculate the strengths and weaknesses of the team. WTR is merged with the T-index
algorithm to form a hybrid method. The damping factor is added to find the page
rank algorithm for ranking. The main aim of this work is to provide a ranking based
on a graph with or without weight. An extra parameter can be added to the graph
for providing a unique solution. Hybrid ranking algorithm is proposed for assigning
weight to the graph. It is a very useful algorithm for ranking teams by finding nodes
of the graph. T-index is taken from the concept of H-index for ranking of teams, as
H-index used to check the quality of papers. Teams with more runs and wickets will
have high index value. Page rank is used for team ranking based on the page ranking
algorithm. The link values in the TR method updated with matches won by teams.
In-link represents the number of matches won by the team, while out-link refers the
number of matches lost by the team. In TR method, weight-based enhancement is
done for assigning a rank to team with the number of runs and wickets as an input.
The hybrid method of T-index and WTR proposed known as UWTR. UWTR finds
the power of team without a weighted graph with a margin of the number of runs
and wickets remaining. The weight factor helps to solve the problem if the team
wins the same number of matches. Combined TR, WTR, and UWTR methods used
for ranking of teams. The temporal dimension can also be added for the ranking of
players using the time-weighted page rank algorithm.
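A small sketch of the graph-based team-ranking idea using the PageRank implementation in networkx is given below; the teams, the loser-to-winner edge direction and the margin-based edge weights are invented for illustration and do not reproduce the exact TR/WTR/UWTR formulations of [21].

```python
import networkx as nx

# Directed edge from the losing team to the winning team; weight ~ margin of victory
G = nx.DiGraph()
results = [("IND", "AUS", 5), ("ENG", "IND", 3), ("AUS", "ENG", 2), ("ENG", "AUS", 7)]
for loser, winner, margin in results:
    if G.has_edge(loser, winner):
        G[loser][winner]["weight"] += margin
    else:
        G.add_edge(loser, winner, weight=margin)

ranks = nx.pagerank(G, alpha=0.85, weight="weight")   # damping factor 0.85
for team, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 3))
```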
Mukherjee [22] has proposed a method for team selection in international matches.
The traditional method uses batting and bowling average for determining the perfor-
mance of batsmen and bowler for team selection. Here, the players are rated with
quality of runs scored by batsmen and quality of overs bowled by the bowler. Social
network analysis (SNA) method is used with a directed network of batsman and
bowler, based on historical information. One-mode projected network is generated
between batsmen and bowler who bowls to the same batsman. The network is gener-
ated due to gradient link information. The procedure for the generation of gradient
network is as follows: Firstly, a single node is created with some potential in the
network. This network is constructed with a direct edge to each point with high
potential value. The edge thickness determines the quality of batsmen or bowlers
into that network. After the construction of the network, the page rank algorithm is
applied to that projected network to find the value or quality of each player. This
complex network is evaluated and page rank score is assigned to each player. During
strength distribution of players, page rank is used to measure the performance of
players against an opposite team member. The greater value indicates better perfor-
mance of players or individual. In this page, rank algorithm only batting and bowling
ability of the players are considered. Fielding abilities are not considered. Bowlers are
assigned with more page rank if he takes more wickets on the batting-friendly pitch.
Same for the batsman, if batsman scores more run on bowling-friendly pitch, then he
must be awarded more points. It is used as an alternative method for team selection to

replace the traditional method. The proposed method is static, someone can modify
this method to make this method dynamic by obtaining a detailed analysis.
Bhattacharjee and Saikia [23] have proposed binary integer programming method
for the formation of a balanced squad. Batting performance is measured with a batting
average, strike rate, and contribution of batsmen to team total. All values are normal-
ized and assigned with some weight for relative importance. All these normalized
scores are multiplied with some weight and added together to get an optimum value
of batsmen. The number of catches taken, run out in series are used to evaluate the
quality of fielders in the team. These values are also normalized and assigned with
some weight as a multiplication value to evaluate the strength of bowlers. Bowling
average, economy rate, and strike rate are considered for the evaluation of bowlers.
This factor value is normalized and multiplied with some weight to calculate the
performance of bowlers. Finally, the wicketkeeper is evaluated with the number of
catches, stumpings, and byes runs conceded in the match. This factor values are also
normalized and assigned with weight as a multiplication value to evaluate the strength
of the players. This performance measurement is a linear combination of statistics
of batting, fielding, bowling, and wicketkeeper. The normalization method tries to
avoid diversity of variables and remove the unit of measurement. For simplicity, it
maintains a range of 0 to 1 for evaluation. The performance measurement factors
are classified into positive and negative factors. Batting average catches taken are
positive factors for players. The number of byes runs conceded and economy rate
is a negative factor related to the ability of the players. Therefore, it needs to be
extra cautious while designing a formula for normalization. A composite index is
obtained for a particular player linearly with the multiplication factor as a corre-
sponding weight. Binary variables are defined to form the objective function and
constraints. If the constraints are changed, then squad also gets change from the
pool of available players. The team formation is a close operation between cricket
statisticians and selectors. This method tries to simplify the method of team selection
with some simple criteria.
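The normalization and weighted composite-index idea described above can be sketched as follows; the statistics, weights and the inversion of the negative factor are illustrative assumptions rather than the exact formulation of [23].

```python
import numpy as np

# Columns: batting average, strike rate (positive factors), economy rate (negative factor)
stats = np.array([[42.0, 90.0, 5.2], [35.0, 110.0, 4.6], [28.0, 130.0, 6.0]])
weights = np.array([0.4, 0.3, 0.3])
is_positive = np.array([True, True, False])

lo, hi = stats.min(axis=0), stats.max(axis=0)
norm = (stats - lo) / (hi - lo)                      # scale every factor to the range 0-1
norm[:, ~is_positive] = 1 - norm[:, ~is_positive]    # invert negative factors

composite = norm @ weights                           # linear combination of normalized scores
print(composite)                                     # higher index = stronger candidate
```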
Amin and Sharma [24] have used data envelopment analysis method for cricket
team selection. Different capabilities with multiple outputs are features of this method
for the evaluation of cricket players. DEA score is calculated for each player, then
categorization of efficient and not efficient players is done. This method considers
multiple factors related to the performance of players. Linear programming DEA
model is used to get the aggregate score of particular players. The score of the players
is obtained objectively instead of subjective computation. Aggregation method is
needed for evaluating the players with multiple capabilities. Subjective model of
DEA with linear programming is proposed in this method to evaluate the quality
of players. This method is capable of providing a solution to players if they are
not performing well. The typical DEA model supports multiple inputs and output,
but some of the cases are there with multiple inputs and no output or vice versa.
This model is solved n times to calculate the score of players and then select the
players into the team. The issue with this technique is an evaluation of multiple
performances with aggregation method and measuring the effectiveness of players
using available statistics. Hence, the linear programming model with the aggregation

method is proposed to get the best DEA score to check the efficiency of the players.
Finding a team is similar to solve the integer programming model. Team efficiency
is directly proportional to the sum of the efficiency of players. Optimization-based
DEA model is obtained with an aggregation of multiple performances.
All the methods studied in this paper are presented in the form of summary in
Table 1. The methods used for team prediction and winner prediction are represented
in the table with their author’s name, methods used, advantage, and disadvantage of
methods.

3 Conclusion

The study of several methods for team formation and winner prediction is performed
in this paper. The findings of the study show that cricket is a game of planning; it
needs to be divided into segments to evaluate the effects of the parameters that are
needed for team formation and winner prediction. There is a need to classify players
according to the strength of the opposite team. There is also a need to search for more
parameters to increase the effectiveness of the team and to study individual factors that
affect the winner prediction model. Proper weights need to be assigned for forming the
team and predicting a winner in cricket, and player evaluation needs to be done by
considering the relative ranking of teams. In this paper, a study of team strength
evaluation using the concept of ranking is also done. A bulk of data is available, from
which the quality of players needs to be evaluated in suitable ways. Machine learning
algorithms like KNN, Naive Bayes, support vector machine, and logistic regression are
used for player selection and winner prediction in cricket. Some of the authors used
neural network methods which employ graph-based networks for team formation. The
NSGA-II genetic algorithm and the binary programming model are used to evaluate
the strength of the team. The concept of weight optimization is used to evaluate the
strength of players, and normalization methods are used to normalize the values of
players and teams. In this way, the study of a number of algorithms that are used for
team formation and winner prediction in cricket using different methodologies is
performed, the research gaps are identified, and the methods are studied with their
merits and demerits. An effective model with maximum accuracy needs to be created
for team formation and winner prediction in cricket.

Table 1 Summary of methods used for team formation and winner prediction

Sr No. | Author(s) | Method(s) used | Advantage | Disadvantage
1 | Rory P. Bunker and Fadi Thabtah [9] | SRP-CRISP-DM using ANN for result prediction | Solution to complex problem | Working on pinpoint accuracy
2 | M. Asif and I.G. McHale [10] | GNLFM model for winner prediction | Advanced team rating system | Home advantage feature not used
3 | Shankar Chakraborty et al. [11] | TOPSIS method with MCDM tool for team prediction | Use of decision matrix with weights | Not bias free
4 | Sandesh Bananki Jayanth et al. [12] | K-means clustering for team formation and SVM for winner prediction | Good result for SVM with RBF kernel | Poor result for linear and poly kernel SVM
5 | Shelvin Chand et al. [13] | Multi-objective integer programming model for team formation | Optimal team with less time | Result may not be consistent with form of players
6 | Haseeb Ahmad et al. [14] | Productivity precedence algorithm for team formation | Use of network approach | Fielding parameter not considered
7 | Apalak Khatua and Aparup Khatua [15] | Logistic regression with Twitter data for winner prediction | Use of mix tweets pattern | Not a generalized model
8 | Aman Verma and Masoumeh Izadi [16] | Cricket prognostic system for winner prediction | Good accuracy | More than one model to predict winner
9 | Shubhra Singh and Parmeet Kaur [17] | HBase tool with KNN algorithm for team formation | Use of HBase tool | More features can be added
10 | Paramita Dey et al. [18] | Weighted network for team formation | Network-based model | Complex approach
11 | Scott Irvine and Rodney Kennedy [19] | Performance indicator with magnitude-based interface for team formation | Use of magnitude-based interference | Additional parameters for bowlers are needed
12 | Ananda Bandulasiri et al. [20] | PCA method for team selection | PCA used to reduce variable size of data | Phase-wise analysis is not possible due to data
13 | Ali Daud et al. [21] | T-index and page rank algorithm for team ranking | Use of graph for better ranking for players | Concept of temporal dimensions can be added
14 | Satyam Mukherjee [22] | Social network analysis (SNA) for team formation | Page rank concept for players evaluation | Use of static approach
15 | Dibyojyoti Bhattacharjee and Hemanta Saikia [23] | Binary integer programming method for team formation | Normalization concept used for weight calculation | Need to add more parameters for evaluation
16 | Gholam R. Amin and Sujeet Kumar Sharma [24] | Data envelopment analysis method for team selection | Objective evaluation is used instead of subjective evaluation | Comparison of players with own team is not done

References

1. Swartz TB (2017) Research directions in cricket. In: Handbook of statistical methods analysis
sport, pp. 445–460. https://doi.org/10.1201/9781315166070
2. Passi K, Pandey N (2018) Increased prediction accuracy in the game of cricket using machine
learning. Int J Data Min Knowl Manage Process (IJDKP) 8:19–36. https://doi.org/10.5121/
ijdkp.2018.8203
3. Saikia H, Bhattacharjee D, Radhakrishnan UK (2017) A new model for player selection in
cricket. Int J Perform Anal Sport 16:373–388. https://doi.org/10.1080/24748668.2016.118
68893
4. Ahmad H, Daud A, Wang L, Hong H, Dawood H, Yixian Y (2017) Prediction of rising stars
in the game of cricket. IEEE Access 5:4104–4124. https://doi.org/10.1109/ACCESS.2017.268
2162
5. Pathak N, Wadhwa H (2016) Applications of modern classification techniques to predict the
outcome of ODI cricket. Procedia Comput Sci 87:55–60. https://doi.org/10.1016/j.procs.2016.
05.126
6. Asif M, McHale IG (2016) In-play forecasting of win probability in one-day international
cricket: a dynamic logistic regression model. Int J Forecast 32:34–43. https://doi.org/10.1016/
j.ijforecast.2015.02.005
7. Jhanwar MG, Pudi V (2016) Predicting the outcome of ODI cricket matches: a team composi-
tion based approach. In: European conference on machine learning and principles and practice
of knowledge discovery in databases (ECML-PKDD) proceedings, vol 1842, pp 111–126
8. Sankaranarayanan VV, Sattar J, Lakshmanan LVS (2014) Auto-play: a data mining approach
to ODI cricket simulation and prediction. Int Conf Data Min SDM 2:1064–1072. https://doi.
org/10.1137/1.9781611973440.121
9. Bunker RP, Thabtah F (2019) A machine learning framework for sport result prediction. J Appl
Comput Inf 15:27–33. https://doi.org/10.1016/j.aci.2017.09.005
10. Asif M, McHale IG (2019) A generalized non-linear forecasting model for limited overs
international cricket. Int J Forecast 35:634–640. https://doi.org/10.1016/j.ijforecast.2018.
12.003
11. Chakraborty S, Kumar V, Ramakrishnan KR (2019) Selection of the all-time best World XI
Test cricket team using the TOPSIS method. Decis Sci Lett 8:95–108. https://doi.org/10.5267/
j.dsl.2018.4.001

12. Jayanth SB, Anthony A, Abhilasha G, Shaik N, Srinivasa G (2018) A team recommendation
system and outcome prediction for the game of cricket. J Sports Anal 4:263–273. https://doi.
org/10.3233/jsa-170196
13. Chand S, Singh HK, Ray T (2018) Team selection using multi-/many-objective optimization
with integer linear programming. In: 2018 IEEE congress on evolutionary computation CEC
2018—Proceedings, pp 1–8. https://doi.org/10.1109/CEC.2018.8477945
14. Ahmad H, Daud A, Wang L, Ahmad I, Hafeez M, Yang Y (2017) Quantifying team precedence
in the game of cricket. J Cluster Comput 21:523–537. https://doi.org/10.1007/s10586-017-
0919-z
15. Khatua A, Khatua A (2017) Cricket world cup 2015: predicting user’s orientation through
mix tweets on twitter platform. In: Proceedings 2017 IEEE/ACM international conference on
advances in social networks analysis and mining ASONAM 2017, pp 948–951. https://doi.org/
10.1145/3110025.3119398
16. Verma A, Izadi M (2017) Cricket prognostic system: a framework for real-time analysis in
ODI cricket. In: International conference on large scale sports analytics
17. Singh S, Kaur P (2017) IPL visualization and prediction using HBase. Procedia Int Conf Inf
Technol Quant Manage 122:910–915. https://doi.org/10.1016/j.procs.2017.11.454
18. Dey P, Ganguly M, Roy S (2017) Network centrality based team formation: a case study on
T-20 cricket. J Appl Comput Inf 13:161–168. https://doi.org/10.1016/j.aci.2016.11.001
19. Irvine S, Kennedy R (2017) Analysis of performance indicators that most significantly affect
international Twenty20 cricket. Int J Perform Anal Sport 17:350–359. https://doi.org/10.1080/
24748668.2017.1343989
20. Bandulasiri A, Brown T, Wickramasinghe I (2016) Factors affecting the result of matches in
the one day format of cricket. J Oper Res Decis 26:21–32. https://doi.org/10.5277/ord160402
21. Daud A, Muhammad F, Dawood H, Dawood H (2015) Ranking cricket teams. J Inf Proces
Manage 51:62–73. https://doi.org/10.1016/j.ipm.2014.10.010
22. Mukherjee S (2014) Quantifying individual performance in cricket—a network analysis of
batsmen and bowlers. J Phys Stat Mech Appl 393:624–637. https://doi.org/10.1016/j.physa.
2013.09.027
23. Bhattacharjee D, Saikia H (2014) On performance measurement of cricketers and selecting
an optimum balanced team. Int J Perform Anal Sport 14:262–275. https://doi.org/10.1080/247
48668.2014.11868720
24. Amin GR, Sharma SK (2014) Cricket team selection using data envelopment analysis. Eur J
Sport Sci 14:37–41. https://doi.org/10.1080/17461391.2012.705333
Machine Learning-Based Intrusion
Detection System with Recursive Feature
Elimination

Akshay Ramesh Bhai Gupta and Jitendra Agrawal

Abstract With prevalent technologies like cloud computing, big data and the Internet
of things (IoT), a huge amount of data is being generated day by day. Presently,
most of the data are stored in digital form and transferred to others by means
of digital communication media. Hence, providing security to data and networks is
one of the main concerns for everyone. Several intrusion detection systems (IDS)
have been proposed in the last few years, but accuracy and false alarm rate are
still the most challenging issues for researchers. Nowadays, intruders design new types
of attacks day by day, which are challenging to identify. Recently, machine learning
has emerged as a powerful tool for the development of IDS. This paper discusses
three different machine learning approaches, namely decision tree, random forest and
support vector machine (SVM). The KDD-99 dataset is used to train the model. Due to
the unbalanced data and duplicate features, a recursive feature elimination technique
is used to reduce the number of features. The experimental results show that the
proposed IDS performs well as compared to the base model, with an accuracy of 99.1%.

Keywords Intrusion detection system (IDS) · SVM · Decision tree · Random


forest tree · Neural network · KDD Cup-99

1 Introduction

A huge amount of data is being generated every second due to social media, IoT
devices and technology reform [1, 2]. All information is stored on a server or host
machine in digital form and transferred from one machine to another. Providing
security to user data against intruders or attackers is a challenging task due to the
advancement in technology and attackers' domain expertise.
A. R. B. Gupta (B) · J. Agrawal


School of Information Technology, RGPV, Bhopal, India
e-mail: akshaygupta2406@gmail.com
J. Agrawal
e-mail: jitendra@rgtu.net

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes in Networks and Systems 173, https://doi.org/10.1007/978-981-33-4305-4_13

Fig. 1 Intrusion detection system

in technology and attackers' domain expertise. An attacker uses an intrusion or virus to attack a server or network and steal useful information. An intrusion can be used by an attacker to harm a machine out of revenge or to steal useful information for the attacker's own benefit. Nowadays, an intrusion detection system (IDS) is used to protect user data from intruders [3]. An IDS is an application or hardware which regularly monitors system behaviour. If the host deviates from its normal behaviour, the IDS generates an alarm to inform that some attacker is trying to attack the system. A system manager can also generate logs and activity reports which can be used for further investigation and to optimize security policies. Figure 1 depicts the working of an IDS [4, 5].
The effectiveness of an IDS depends upon its accuracy and false alarm rate: an IDS must have high accuracy and a low false alarm rate. Researchers have proposed a number of IDS in the past few years, but accuracy and false alarm rate are still a challenge for IDS developers. Figure 2 shows the different types of IDS. To detect abnormal behaviour, statistical approaches apply data mining techniques, whereas knowledge-based IDS use human expert knowledge.
Machine learning is a technique which enables a machine to take decisions similar to the human brain and to improve its performance from past data. Machine learning approaches are either supervised or unsupervised. Most IDS are designed using supervised techniques because supervised learning methods give more accurate results than unsupervised learning methods.
It is observed that machine learning-based IDS are more effective than traditional IDS. In a machine learning-based IDS, a learning algorithm is used to train the IDS. All machine learning approaches split the data into training and testing parts. During the training phase, the training data is given to the model and the learning algorithm tries to find a relation between input and output data, whereas the testing data is used to test model performance. Several machine learning algorithms are available for training an IDS, such as decision tree, support vector machine (SVM), Naïve Bayes, logistic regression, random forest, etc. [4–8]. In this paper, decision tree, SVM and random forest are used to train our models.

Fig. 2 Classification of anomaly-based IDS

1.1 Support Vector Machine (SVM)

It is a maximum-margin classifier which classifies data based on support vectors. All samples are plotted in an N-dimensional space, where N is the number of features, and the data is separated into classes by a hyperplane. Figure 3 shows the working of SVM. For a linearly separable problem, the margin constraints of the hyperplane can be written as

(w · x + b) ≥ +1 for samples of Class 1

(w · x + b) ≤ −1 for samples of Class 2
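For illustration, the decision value w · x + b of a fitted linear SVM can be inspected directly. Below is a minimal scikit-learn sketch on a small toy dataset (the data and variable names are illustrative only and are not taken from this work):

# Minimal illustration of the linear SVM decision rule w.x + b (toy data only)
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 1.4],   # class 0 samples
              [3.0, 3.2], [3.5, 2.8], [3.2, 3.6]])  # class 1 samples
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # hyperplane parameters
scores = X @ w + b                       # w.x + b for every sample
print(np.sign(scores))                   # the sign decides the predicted class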

1.2 Decision Tree

The decision tree is a tree-like structure where each internal node represents a question, an edge represents an answer, and a leaf node represents a class label. The decision tree uses information gain and entropy to choose the root node: a feature with higher information gain and lower entropy is chosen as the root node. The following equations are used to calculate entropy and information gain

Fig. 3 Support vector machine


Entropy = − Σ_{i=1}^{n} P(c_i) log2(P(c_i))    (1)

Gain(T, X) = Entropy(T) − Entropy(T, X)    (2)

where P(c_i) represents the probability that an instance belongs to class c_i, T represents the target variable, and X represents a candidate feature.
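As a rough illustration of Eqs. (1) and (2), the entropy of a target column and the information gain of a discrete feature can be computed as follows (the column names are placeholders, not the actual KDD feature names):

# Entropy and information gain of Eqs. (1) and (2) for a discrete feature
import numpy as np
import pandas as pd

def entropy(target):
    p = target.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def information_gain(df, feature, target):
    # Entropy(T, X): weighted entropy of the target after splitting on the feature
    weights = df[feature].value_counts(normalize=True)
    conditional = sum(w * entropy(df.loc[df[feature] == v, target])
                      for v, w in weights.items())
    return entropy(df[target]) - conditional

data = pd.DataFrame({"protocol": ["tcp", "tcp", "udp", "udp"],
                     "label": ["normal", "attack", "normal", "normal"]})
print(information_gain(data, "protocol", "label"))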

1.3 Random Forest Tree

Random forest is an ensemble method in which multiple trees are constructed. As opposed to the decision tree, where only one tree is built for each problem, in the random forest approach nodes (features) are selected randomly and multiple trees are built. The final decision is taken by voting, and a sample is assigned to the class chosen by the majority of trees. Entropy and information gain are used to select the nodes in the different trees.

2 Literature Survey

Anish et al. [9] proposed a machine learning-based IDS. The paper uses SVM and Naïve Bayes classifiers to train the IDS, and the NSL-KDD dataset is used to train the model.

The maximum accuracy claimed by the authors is 97.29%, obtained using SVM. The main limitation of this approach is that duplicate features are not eliminated from the KDD dataset. As the model is trained mostly on normal transactions, the final model also tends to classify most transactions as normal.
Rahaman et al. [10] introduced a deep learning-based IDS which can be used for smart cities. The KDD Cup-99 dataset is used as training data. In this paper, they design a deep neural network and apply SVM at the classification layer: the deep neural network extracts features from the given data and passes them to the SVM for classifying the data into a class. Information gain and J48 are used to extract features. Since this model uses a deep neural network for feature extraction and then SVM for classification, it increases the training time and the chance of overfitting.
Almseidin et al. [11] evaluated the performance of various machine learning-based IDS. The paper trains IDS using multiple machine learning approaches like random forest, decision table, multi-layer perceptron (MLP), Naive Bayes and Bayes network. The KDD Cup dataset is used for training and testing. Accuracy, precision and recall are used to measure the performance of all trained IDS. The maximum accuracy claimed by the authors is 93.77%, for the random forest classifier. The main limitation of this approach is that no feature selection approach is applied, which may lead to low accuracy. The paper also does not evaluate performance with SVM, although SVM gives better efficiency when features are normalized.
Kumar et al. [12] summarized different ensemble methods that can be used for IDS development. As each classifier has its limitations, instead of using a single classifier, the paper suggests using an ensemble method for the development of IDS. Since the output of one classifier is given to the next classifier, it helps to achieve higher accuracy, but at the same time it increases training and testing time.
Alrowaily et al. [13] designed an IDS trained with several machine learning approaches like random forest, KNN, Naive Bayes, decision tree, MLP and AdaBoost. In that work, the CICIDS2017 dataset is used for training and testing. KNN gives the highest accuracy of 99.46% on the given dataset.

3 Proposed Work

An intrusion detection system is software which is used to detect any unauthorized or malicious activity in a network or host machine. One of the most challenging issues faced by researchers in IDS is to achieve higher accuracy and a minimum false alarm rate. Recent studies indicate that machine learning-based approaches can help to design more robust and secure IDS. Machine learning is a technology where a computer performs a task without explicit programming by extracting information from a large dataset. In a machine learning-based IDS, the system is first trained on existing attacks in such a way that it is able to differentiate between attack and legitimate traffic by extracting patterns from the given dataset, and then

Fig. 4 Flow diagram of machine learning-based IDS

the system is tested on new attacks. In a machine learning-based IDS, accuracy mainly depends upon the feature selection from the given dataset. Figure 4 shows the different steps involved in the development of the IDS. The following steps are included in the development of a machine learning-based IDS.
(a) Data Collection
(b) Data Preprocessing
(c) Model Training
(d) Model Evaluation

3.1 Data Collection

In this work, the KDD Cup-99 dataset, downloaded from Kaggle [14], is used to train the machine learning-based IDS. The dataset consists of 494,020 rows (instances) and 42 features, and classifies the transactions into 23 different types of attacks.

Table 1 New group with their class number
New group    Unique number
Normal 0
DoS 1
Probe 2
R2L 3
U2R 4

3.2 Data Preprocessing

This is one of the most important and challenging phases of any machine learning model. The accuracy of the model mainly depends on data preprocessing, which is the process of converting raw data into a machine-consumable form. The following preprocessing is done on the KDD dataset before feeding the data to a model:
i. Identify all numeric, categorical and text data
ii. Convert all text data into numbers
iii. Convert all ordinal variables into numbers
iv. Convert all nominal variables into dummy variables.

3.2.1 Identified Categorical Attributes

KDD dataset has 41 features where three features, namely protocol_type, service
and flag, are categorial. To change these features into a number, first apply the label
encoder and then apply one-hot encoding to all features. To avoid dummy trap first
column is drop.
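A minimal pandas sketch of this encoding step is given below; the file name is a placeholder and the column names are assumed to match the KDD CSV layout. pandas.get_dummies with drop_first=True plays the role of label encoding followed by one-hot encoding while avoiding the dummy-variable trap:

# One-hot encode the three categorical KDD columns, dropping the first dummy
# of each feature to avoid the dummy-variable trap (file/column names assumed)
import pandas as pd

df = pd.read_csv("kddcup99.csv")
df = pd.get_dummies(df, columns=["protocol_type", "service", "flag"],
                    drop_first=True)
print(df.shape)   # the feature count grows after one-hot encoding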

3.2.2 Group Target Variable

The KDD dataset has 23 different types of attack. Based on the attack properties and behaviour, the 23 attacks are categorized into 5 groups, namely denial of service (DoS), remote to local (R2L), user to root (U2R), normal and probe, and a unique number is assigned to each of the five groups. Table 1 shows the unique number assigned to each group, and Table 2 depicts the assignment of attacks to the groups.

3.2.3 Features Selection

KDD data have 41 features as shown in Fig. 5. After applying the one-hot encoding,
the total number of features is 117. During the training, it is observed that most
of the features are redundant and can be removed to get higher accuracy and

Table 2 Mapping of each attack with group
Attack name    Group number
Normal 0
NEPTUNE 1
Back 1
POD 1
Teardrop 1
Land 1
Smurf 1
Buffer_Overflow 4
Load module 4
PERL 4
Rootkit 4
FTP_Write 3
Guess_Passwd 3
IMAP 3
MulitHop 3
PHF 3
SPY 3
Warez client 3
Warez master 3
IPSWEEP 2
NMAP 2
PORTSWEEP 2
SATAN 2

minimize training time. To remove useless and redundant features, the recursive feature elimination (RFE) approach is used. After applying RFE, the top 13 features are chosen from the 117 features, and the model is trained only for these selected features.

Recursive feature elimination algorithm


1: Train your model with a full feature set
2: Call RFE with the selected model and calculate a score for all features
3: Sort all features into the descending order according to the score value
4: Select top 13 features according to the score value
5: Remove all other features
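A possible scikit-learn realization of this procedure is sketched below; synthetic data stands in for the encoded KDD feature matrix, and the ranking estimator is an assumption:

# Recursive feature elimination keeping the 13 highest-ranked features
# (synthetic data stands in for the encoded feature matrix and labels)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=2000, n_features=117, n_informative=15,
                           random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]

rfe = RFE(estimator=RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=13)
rfe.fit(X, y)

selected = [name for name, keep in zip(feature_names, rfe.support_) if keep]
print(selected)               # the 13 retained features
X_reduced = rfe.transform(X)  # reduced feature matrix passed to the classifiers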

Fig. 5 Different feature of KDD Cup dataset

3.2.4 Data Normalization

To normalize the input data, standard normalization is used, which scales every feature to zero mean and unit standard deviation. The formula for standard normalization is

Z = (x − μ) / σ    (3)

where x represents the feature value, μ represents the mean, and σ represents the standard deviation.

3.3 Training Model

After the preprocessing, the feature matrix and target vector are passed to the model for training. In this paper, the model is trained with the decision tree classifier, random forest and SVM classifier. All models are trained only on the selected features. Due to the removal of redundant features and the smaller number of features, the proposed model takes less time to train.

IDS algorithm
1: Load dataset
2: Preprocessed data
– Convert all categorical variable into a number using label encoder and one-hot encoder
– Map all attacks of KDD dataset into five clusters
– Normalize feature matrix
3: Split your data into train and test
4: Call RFE algorithm for the feature selection
5: Train your model for the selected features
6: Evaluate the model performance
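One possible end-to-end sketch of this algorithm with scikit-learn is shown below; the CSV path, the label column name and the attack-to-group mapping are assumptions that must be adapted to the actual KDD file:

# Sketch of the IDS training algorithm: encode, normalize, split, select features,
# train three classifiers and report their accuracy (file/column names assumed)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("kddcup99.csv")                                    # 1: load dataset
# (the 23 attack labels are assumed to be already mapped to the 5 groups)
y = df.pop("label")
df = pd.get_dummies(df, columns=["protocol_type", "service", "flag"],
                    drop_first=True)                                # 2: encode
X = StandardScaler().fit_transform(df)                              # 2: normalize

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)  # 3: split

rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=13).fit(X_tr, y_tr)                  # 4: RFE
X_tr, X_te = rfe.transform(X_tr), rfe.transform(X_te)

for model in (DecisionTreeClassifier(), SVC(), RandomForestClassifier()):
    model.fit(X_tr, y_tr)                                           # 5: train
    print(type(model).__name__,
          accuracy_score(y_te, model.predict(X_te)))                # 6: evaluate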

3.4 Model Performance Evaluation

The task of a machine learning-based IDS is to find the class of each transaction, i.e., normal, probe, R2L, U2R, DoS, etc. So, the performance of each IDS is evaluated using the following metrics.

3.4.1 Confusion Matrix

The confusion matrix gives a summary of the actual and predicted results. It is used to analyse the performance of the classifier, which helps us to improve model performance.

3.4.2 Accuracy

It represents the overall accuracy of the given model. The formula for calculating
accuracy of the model is given by

Accuracy = (TP + TN) / (TP + FP + TN + FN)

3.4.3 Precision

Precision is defined as a ratio of TP and (TP + FP).

Precision = TP / (TP + FP)

3.4.4 F1-Measure

F1 measure is the weighted average of recall and precision.

F1-measure = (2 × Precision × Recall) / (Precision + Recall)
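For reference, all of these metrics are available in scikit-learn; a tiny sketch with illustrative labels follows:

# Confusion matrix, accuracy, precision and F1 with scikit-learn (toy labels)
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score)

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred, average="macro"))
print(f1_score(y_true, y_pred, average="macro"))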

4 KDD Dataset Description and Preprocessing

KDD Cup-99 is one of the most popular datasets used to design intrusion detection systems (IDS). Although this dataset was prepared in 1999, it is still one of the most popular choices of IDS developers [14]. The dataset consists of 494,020 rows (instances) and 42 features and covers 23 different types of attacks, as shown in Fig. 5. The main limitation of this dataset is that it is unbalanced and has several duplicate entries, so preprocessing plays a vital role in achieving higher accuracy. As the dataset classifies transactions into 23 different types of attack, these attacks are first divided into five different categories. Table 3 shows the different types of KDD attacks and the newly assigned categories.
It is clear from Table 3 that the data is highly unbalanced: most transactions will be classified as normal or DoS, because the dataset contains a large number of records for normal and DoS and the machine learning algorithm will therefore find the normal and DoS patterns easily. So the dataset needs to be preprocessed before being passed to the machine learning model.

5 Result Evaluation

To evaluate the performance of the proposed model, it is compared with the existing approach named "evaluation of machine learning algorithms for intrusion detection system" [11]. That paper implements several machine learning algorithms like support vector machine (SVM), random forest, J48, decision tree, Naïve Bayes, etc., and the maximum accuracy claimed is 93.77%, using random forest. The performance of

Table 3 KDD attacks and their categories
Attack category    Attack name    Instance count
DOS SMURF 280,790
NEPTUNE 107,201
Back 2203
POD 264
Teardrop 979
Land 21
U2R Buffer_Overflow 30
Load module 9
PERL 3
Rootkit 10
FTP_Write 8
R2L Guess_Passwd 53
IMAP 12
MulitHop 7
PHF 4
SPY 2
Warez client 1020
Warez master 20
PROBE IPSWEEP 1247
NMAP 231
PORTSWEEP 1040
SATAN 1589
Normal 97,277

our proposed approach has also been evaluated using the decision tree, random forest and SVM. The proposed algorithm is implemented on a machine with an i5 (9th generation) processor, 8 GB RAM and a 4 GB Nvidia GTX 1650 graphics card (Tables 4, 5 and 6; Figs. 6, 7, 8 and 9).

6 Conclusion

Due to social media and IoT, a huge amount of data is transferred from one device to another, and it is increasing day by day. So there is a need to develop an effective IDS that identifies unauthorized access and takes appropriate action. One of the primary requirements of any IDS is high accuracy and a low false alarm rate. Recently, machine learning-based IDS have performed better than traditional IDS. This paper proposed a machine learning-based IDS which uses recursive

Table 4 Classification report for decision tree


Class Precision Recall F1-score Support
0 1.00 0.91 0.95 24,354
1 1.00 1.00 1.00 97,816
2 0.50 0.98 0.66 1028
3 0.95 0.67 0.79 301
4 0.00 0.67 0.01 6
Accuracy 0.95 123,505
Macro avg 0.69 0.85 0.68 123,505
Weighted avg 0.99 0.98 0.99 123,505

Table 5 Classification report for the SVM


Class Precision Recall F1-score Support
0 1.00 0.98 0.95 24,203
1 1.00 1.00 1.00 98,005
2 0.67 0.99 0.66 1025
3 0.99 0.95 0.97 260
4 0.89 0.67 0.76 12
Accuracy 0.97 123,505
Macro avg 0.91 0.92 0.68 123,505
Weighted avg 1.00 1.00 1.00 123,505

Table 6 Classification report for random forest tree


Class Precision Recall F1-score Support
0 0.99 1.00 1.00 24,337
1 1.00 1.00 1.00 97,816
2 1.00 0.96 0.98 1015
3 0.94 0.80 0.86 274
4 0.00 0.00 0.00 18
Accuracy 0.99 123,505
Macro avg 0.79 0.75 0.77 123,505
Weighted avg 1.00 1.00 1.00 123,505

feature elimination to reduce redundant and useless features. The proposed approach is implemented with three machine learning classifiers, namely decision tree, SVM and random forest. The experimental results show that the random forest performs well compared to SVM and the decision tree. One of the main advantages

Fig. 6 Confusion matrix for decision tree

Fig. 7 Confusion matrix for SVM

of the random forest is that it predicts fewer fraudulent transactions as normal transactions compared to the decision tree and SVM.

Fig. 8 Confusion matrix for random forest tree

Fig. 9 Comparative analysis of different classifiers: accuracy of the proposed approach versus the base approach for decision tree, SVM and random forest

References

1. Magán-Carrión R et al (2020) Towards a reliable comparison and evaluation of network


intrusion detection systems based on machine learning approaches. Appl Sci 1–21
2. Buczak AL, Guven E (2017) A survey of data mining and machine learning methods for cyber
security intrusion detection. IEEE Commun Surv Tutor 18(2):1153–1176
3. Khraisat A et al (2019) Survey of intrusion detection systems: techniques, datasets and
challenges. Cybersecurity 2–20
4. Shen C, Liu C, Tan H, Wang Z, Xu D, Su X (2018) Hybrid-augmented device fingerprinting for
intrusion detection in industrial control system networks. IEEE Wirel Commun 25(6):26–31
5. Jabbar MA, Aluvalu R, Reddy SS (2017) RFAODE: a novel ensemble intrusion detection
system. Procedia Comput Sci 115:226–234
6. Gupta ARB, Agrawal J (2020) A comprehensive survey on various machine learning methods
used for intrusion detection system. In: Proceeding of IEEE 9th international conference on
communication systems and network technologies, April 2020, pp 282–289
7. Raj Jennifer S (2019) A comprehensive survey on the computational intelligence techniques
and its applications. J ISMAC 1(03):147–159
8. Mugunthan SR (2019) Soft computing based autonomous low rate DDOS attack detection and
security for cloud computing. J Soft Comput Paradig (JSCP) 1(02):80–90

9. Anish et al (2019) Machine learning based intrusion detection system. In: Proceedings of the
third international conference on trends in electronics and informatics, pp 916–920
10. Rahaman A et al (2020) Scalable machine learning-based intrusion detection system for IoT-
enabled smart cities. Sustainable Cities and Society
11. Almseidin M et al (2017) Evaluation of machine learning algorithms for intrusion detec-
tion system. In: Proceeding of IEEE 15th international symposium on intelligent systems and
informatics, September 2017, pp 277–282
12. Kumar G et al (2020) MLEsIDSs: machine learning-based ensembles for intrusion detection
systems—a review. J Supercomput 76(2), Feb 2020
13. Alrowaily M et al (2019) Effectiveness of machine learning based intrusion detection systems.
In: Proceeding of international conference on security, privacy and anonymity in computation,
communication and storage, pp 277–288
14. Bay SD (1999) The UCI KDD archive. Department of Information and Computer Science,
University of California, vol 404, pp 405. http://kdd.ics.uci.edu.irvine.ca
An Optical Character Recognition
Technique for Devanagari Script Using
Convolutional Neural Network
and Unicode Encoding

Vamsi Krishna Kikkuri, Pavan Vemuri, Srikar Talagani, Yashwanth Thota,


and Jayashree Nair

Abstract This paper describes an optical character recognition technique to convert


scanned Sanskrit text images scripted in Devanagari into digital documents. The
segmentation mechanism, an adaptation from existing literature, identifies and sepa-
rates upper and lower modifiers in a character. It also recognizes fused Devanagari
letters. The segmented characters are fed to a convolutional neural network classi-
fier which is trained upon a dataset with about 1.2 lakhs images belonging to 85
classes for the core part of a character. Each character from the segmentation phase
is predicted and mapped to the respective Unicode representation. These Unicode
values for characters are added to reconstruct the desired word. By keeping track
of spaces between words and lines, a document can be reconstructed to an editable
format.

Keywords Optical character recognition · Segmentation · Devanagari script ·


Sanskrit · Fused character · Convolutional neural network · Unicode encoding ·
Digitization

1 Introduction

Optical character recognition is a procedure by which printed or handwritten texts,


scanned documents, or an image with texts are converted to machine-encoded
characters. This method finds its use many applications of text processing such
as information retrieval process, character recognition from various objects, and
book scanning. The OCR process is generally divided into two phases: The first

V. K. Kikkuri (B) · P. Vemuri · S. Talagani · Y. Thota


Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham, Amritapuri,
India
e-mail: vamsikikkuri@gmail.com
J. Nair
Department of Computer Science and Applications, Amrita Vishwa Vidyapeetham, Amritapuri,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 173
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_14

Fig. 1 Sections of a word in Devanagari script (source [1])

phase is character detection, and the second phase is character recognition. The first
phase includes the pre-processing and segmentation of the image. The second phase
includes training a model that can predict the result from the first phase. Using good
image pre-processing techniques will give a better output to the following phases.
Pre-processing includes noise reduction, skew correction, and converting a given
image to binary format. The classifier that predicts characters should be trained in
such a way that it can be generalized over any font and size.
Devanagari script is used to scribe languages like Sanskrit, Hindi, Marathi, etc.
In Devanagari script, there is a line called “Sirorekha” or the header line which
connects all the characters in a word. Languages following Devanagari script have
a very large set of consonants, vowels, and combinations of consonants with vowels
and the combinations of consonants among themselves (the left part will be the pure form of one consonant and the right part will be a full consonant). In the Devanagari script,
a character can have an upper modifier in the top strip, a lower modifier in the bottom
strip, a pure form of consonant, and a full consonant in core strip. Ref: Fig. 1.
This paper presents a methodology to create an OCR framework for Sanskrit char-
acters by using the segmentation algorithm proposed in [1]. This technique separates
upper modifier, lower modifier, and also does fused character segmentation to sepa-
rate pure form and full consonant separately. Histograms are used for line, word, and
character segmentation as proposed in [2]. To build a robust classifier that can accu-
rately predict characters with different sizes, fonts, and strokes, convolutional neural
network (CNN) is used instead of traditional classification algorithms like SVM,
KNN, ANN, etc. [3]. CNN is a multi-layered architecture which does feature extrac-
tion without information loss by using several layers of convolutional layers and then
feed the extracted features into a classifier which predicts the output [4]. To train the
model, an artificially synthesized dataset consisting of around 1.2 lakh images with
85 classes for the core part of a character, is developed using different fonts avail-
able at [5]. Initially, CNN is used for object detection purpose, but later on, this is
being used in other domains also. Each component in the separated segmentation
phase will be fed into its respective trained CNN model. Using the Unicode values
provided at [6–8], the scanned document is reconstructed to a machine-encoded text.
The workflow of the proposed framework is depicted in Fig. 2. An image, with
Devanagari script (Sanskrit text), taken as input is first pre-processed. The segmen-
tation algorithm deals with the pre-processed image. Segmentation phase consists of
line, word, character, fused character segmentation, and also the separation of modi-
fiers. All characters identified in the segmentation phase are fed to train CNN, and
these predictions are mapped to their respective Unicode values and added together

Fig. 2 The proposed OCR framework

to reconstruct the scanned document. Each of the modules in Fig. 2 is exhaustively


explained in the sections following.

2 Existing Works

Several works have been done in the past to build an OCR for Devanagari scripts.
Research work proposed in [1, 9, 10] segments a given image to character level, separates upper and lower modifiers, and segments the fused characters using the structural properties of the script.
Research work presented in [11] discusses OCR for printed Malayalam text using
singular value decomposition (SVD) for dimensionality reduction and Euclidean
distance measure for character recognition, whereas [12] discusses OCR for Telugu
text using SVD, projection profile and discrete wavelet transform for feature
extraction and K-nearest neighbors and support vector machine for character
recognition.
Advancements in computer technology urged developers to adopt machine
learning and deep learning algorithms that need high computational capabilities
when compared to rule-based or template-based mechanisms and at the same time
producing better results [13, 14]. Using such algorithms in tasks like OCR proved
to produce better results and better generalization over a wide range of font styles
and sizes. A work by Dineshkumar and Suganthi [15] does handwritten character
recognition using ANN. Another work by Jawahar [16] does character recognition
for languages like Hindi and Telugu using principal component analysis followed by
SVM.
Sankaran and Jawahar [17] proposed a work using the Bidirectional Long Short-
Term Memory (BLSTM) approach which can recognize text at word level. BLSTM

Fig. 3 a Depicts the number of black pixels in every row. b Depicts the number of black pixels in every column. Both a, b graphs are calculated using the binary image of the word “ ”

uses previous and present word context to make predictions. The Sanskrit OCR
proposed by Avadesh and Goyal in their work [3] does character-level segmentation
and employs a CNN model that is trained on a dataset consisting of 602 classes which
include fused characters also.

3 The Segmentation Operators

This section explains the segmentation operators used in the algorithm proposed in
[1].

3.1 Horizontal Projection

To the binarized image, horizontal projection computes the total number of black
pixels in every row, which can be done by calculating a horizontal histogram. Pixel rows with no black pixels are considered to be white space, so the start of a run of black-pixel rows is the top boundary and the start of the following white space is the bottom boundary. Ref: Fig. 3a.

3.2 Vertical Projection

To the binarized image, vertical projection computes the total number of black pixels
in every column which can be done by a vertical histogram. The pixel columns with
no black pixels are considered to be the white space. So, the starting of a black pixel
column can be considered as the left boundary of text and start of white space column
can be considered as the right column of text. Ref: Fig. 3b.
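Both projections reduce to simple row and column sums of the binary image; a short NumPy sketch (assuming text pixels are 1 and background pixels are 0) is given below:

# Horizontal/vertical projections and extraction of text bands from a binary image
# (assumes text pixels = 1 and background = 0)
import numpy as np

def projections(binary):
    h_proj = binary.sum(axis=1)   # black pixels in every row    (Fig. 3a)
    v_proj = binary.sum(axis=0)   # black pixels in every column (Fig. 3b)
    return h_proj, v_proj

def runs(profile):
    # (start, end) pairs of consecutive non-zero entries, i.e. text bands
    bands, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands

binary = np.zeros((5, 8), dtype=int)     # toy image: one line, two word blobs
binary[1:4, 1:3] = 1
binary[1:4, 5:7] = 1
h, v = projections(binary)
print(runs(h))   # [(1, 4)]         -> line boundaries (top/bottom)
print(runs(v))   # [(1, 3), (5, 7)] -> word boundaries (left/right)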

3.3 Vertical bar Position

To find the position of the vertical bar, find the height of character from Sirorekha
to the bottom of the image using vertical projection. The column which has black
pixels count more than 80% of the height, that column is considered as the vertical
bar.

3.4 Collapsed Horizontal Projection

Unlike in horizontal projection, collapsed horizontal projection checks for the occur-
rence of at least one black pixel in a row. If there is a black pixel in a row, CHP of
that row is set to 1, else it is set to 0.

3.5 Height and Continuity of Character

To find the height and continuity of a character, denote “R1” as the pixel row where
CHP is equal to 1 for the first time, “R2” as the pixel row where CHP is equal to 0
in the subsequent rows, and “R3” as the pixel row where again in subsequent rows
CHP is equal to 1. Now there are 3 possibilities for R2, R3 (since R1 is always there).
They are:
1. Both R2 and R3 do not exist: This means the character is continuous and the height
is (total number of rows − R1)
2. R2 exists but R3 does not exist: This means, the character is continuous and
height is (R2 − R1)
3. Both R2 and R3 exist: This means, there is discontinuity and height is (R2 − R1)

3.6 Pen Width

It is the width of Sirorekha. The rows with maximum black pixels are considered to
be Sirorekha. It is estimated using the outcome of horizontal projection.

4 Image Pre-processing

The image that contains Devanagari text is prepared for the further steps in this stage by using a Gaussian filter, which smoothens the image while preserving its edges. The resulting image, which is an RGB image, is then converted to grey scale

and then to a binary image. Binarization is done using OTSU global thresholding
algorithm. This binary image is used in the further segmentation process.
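A minimal OpenCV sketch of this pre-processing stage (Gaussian smoothing, greyscale conversion and Otsu binarization) is given below; the input file name is a placeholder:

# Pre-processing: Gaussian smoothing, greyscale conversion, Otsu binarization
# ("page.png" is a placeholder for the scanned input image)
import cv2

img = cv2.imread("page.png")                        # BGR image of the scanned page
smoothed = cv2.GaussianBlur(img, (5, 5), 0)         # Gaussian smoothing
gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)   # convert to grey scale
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # OTSU threshold
# THRESH_BINARY_INV makes text pixels white, which is convenient for projections
cv2.imwrite("binary.png", binary)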

5 The Segmentation Phases

An image that has undergone pre-processing is segmented to character level. To


achieve this, go through below-mentioned phases, proposed by Bansal and Sinha
[1].

5.1 Line Segmentation

In this phase, each line is identified and separated from the image. This is done using
the horizontal projection algorithm. Using the top and bottom boundaries obtained
from horizontal projection algorithm, the lines are cropped from the original image,
and each line is saved as a different image.

5.2 Word Segmentation

In this phase, each word from every line is identified and separated that is obtained
from the previous phase. This is done using the vertical projection algorithm. Using
the left and the right boundaries obtained from vertical projection algorithm, a line
can be segmented to words. This is done for every line that is segmented in the
previous phase.

5.3 Character Segmentation

In this phase, each character is identified and separated from every word that is
obtained from the previous phase. For this, first, the “Sirorekha” is identified. To find
Sirorekha for each word, find the row with the maximum number of black pixels
using the horizontal projection algorithm and this row is considered as Sirorekha.
Now remove the Sirorekha and apply the vertical projection algorithm which
separates each character individually. Each character is saved as a separate image.
For better segmentation, find the rows with maximum and second maximum black
pixels and whiten all pixel values in between these rows. To find Sirorekha, horizontal
projection algorithm counts black pixels in every row only up to half the height of
the image. For every segmented character, check for the presence of upper modifier
and lower modifier and separate them from the core part of the character.

5.4 Upper Modifier Identification

This can be done by making a vertical projection from the top of the image till
Sirorekha. If there are any black pixels present, it can be concluded that there is an
upper modifier for that character image and cropped the image from top till Sirorekha.

5.5 Lower Modifier Identification

To check the presence of a lower modifier, find the height of each character from
Sirorekha and find the maximum height. Using below-mentioned rules categorize
characters to 3 categories.
1. If the character height is more than 80% of the maximum height, then classify
them to category-1.
2. If the character height is less than 80% and more than 64% of the maximum
height, then classify them to category-2.
3. If the character height is less than 64% of the maximum height, then classify
them to category-3.
To check the possible presence of a lower modifier, find the category with the
maximum number of images and find the average height of the images in this category.
These images in this category will not be having characters with lower modifiers.
The average height calculated is denoted as a threshold to check the presence of
lower modifier that is, character images from other categories with height more than
the threshold are sent for lower modifier segmentation, where lower modifier gets
separated from the core part of the character. The presence of a lower modifier is
considered only if its height is greater than one-fifth of the height of the character.
After identifying and separating lower and upper modifier, the character image is
sent for further character-level segmentation if there are any unsegmented characters
left due to overlapping of pixels because of the presence of modifiers.

5.6 Fused Character Segmentation

A final set of characters remained in the core part of separating upper and lower
modifiers are checked for the presence of fused character. To check whether a char-
acter image has a fused character or not, the same method that is used to check the
possible presence of lower modifier is employed. But here instead of height, consider
the width of the character image. Similarly, categorize every image and find a cate-
gory with the maximum number of images. Then estimate the average width for the
images in this category and threshold is set to this average width. All images from
the other two categories which have width more than a threshold are sent for fused
character segmentation.

For fused character segmentation, a column needs to be found that separates the
pure form from the full form. It is known that in a fused character, pure forms always
occur on the left side and full consonant on the right side. To find this separating
column follow the steps given below.
1. Find the vertical bar in the rightmost end of the fused character and ignore the
whole part to the right of vertical bar (including vertical bar pixel column). The
vertical bar position is now considered to be the extreme right of the character
image. Now take the column which is pen width columns left to the vertical bar,
denoted as C1. Find continuity and height of the character inscribed between C1
and the column before the vertical bar. If the part of image inscribed between
these boundaries is discontinuous and its height (refer 3.5 for continuity and
height of a character) is greater than one-third of complete character height then
finalize that column as C1 and stop the process. Else, move C1 by one column
to its left and repeat the process.
2. Now find another column C2 from the left end of the character image that is where
pure forms of consonants are positioned. Based on the heights of a pure form
of consonants they are classified into two categories. If the height of character
inscribed between the left most column of image (left_bound_C2) and the column
that is at one-third position of the character width (right_bound_C2), is less than
or equal to 80% of the consonant height is classified to H1 ( ,
etc.), else it is classified to H2 ( etc.). For each class, a different
method is followed to find the column C2, which is compared to C1 and to
estimate the final segmenting column for the fused character.

Estimating C2 for H1 category C2 is initialized to right_bound_C2 and compute


the height of the image inscribed between left_bound_C2 and C2. If the height
of character inscribed between these boundaries is less than one-third of complete
character height then move C2 by one column to the right provided the number of
black pixels in the C2 is more than the number of black pixels in the next column (C2
+ 1). If the height of character inscribed in these boundaries is more than one-third
of complete character height stop the process.
Estimating C2 for H2 category Vertical projection is made to image inscribed
between left_bound_C2 and right_bound_C2. C2 is set to the column with a
maximum number of black pixels on the rightmost side for the part of image inscribed
in these boundaries. C2 is moved by one column to its right provided the number of
black pixels in the C2 column is more than the number of black pixels in the next
column.
C2 must not go beyond half the width of the image for both categories. By
comparing C1 and C2 results as mentioned below, the final segmentation column
is identified.
1. C1 < C2: This means that no segmentation is needed for the character image.
2. |C1 − C2| < pen width: In this case, C1 is moved by |C1 − C2| units to its left and
that will be the final segmentation column.

3. |C1 − C2| > pen width: This means that there are more than two characters present
in the character image. Segment the image using C1 as segmentation column and
the remaining image is sent for further segmentation.

6 Dataset Description

Based on the segmentation methodology followed, the segmented characters are


grouped into core part, pure consonant, and the modifiers (upper and lower). A dataset
has been prepared for the core part using different fonts like “Martel”, “Karma”,
“Sarala”, “Gleego”, “Poppins”, “Eczar” available at [5]. On the whole, there are
85 classes which include the pure and full forms of the consonants, vowels, the
combinations of “ ” coming after consonants ( , etc.) as they cannot
be separated as a lower modifier and excluding the vowels like , etc.,
as the modifiers can be identified and separated in the segmentation phase. There
are around 1.2 lakh images in the dataset with each image having the size 32 × 32
× 3. This dataset is split into training and validation sets by randomly selecting 20% of images from each class. The images are artificially synthesized by changing contrast, blurring, brightening and darkening.

7 CNN Model Architecture and Training

The model built for recognition of the core part of a character takes an image of size 32 × 32 × 3 as input. This image is passed through several convolution layers in such a way that the number of convolution filters (each of size 3 × 3) increases with depth. Max-pooling layers with a pool size of 2 × 2 are also added after every few convolution layers to reduce the spatial dimension and the number of training parameters. The obtained feature matrix is flattened, and hidden layers with the rectified linear unit activation function are added as shown in Table 1; the output layer of the network has 85 nodes (i.e., the number of classes) with the “SoftMax” activation function. A batch normalization layer is also added to the network before flattening the feature matrix, and a dropout layer is added after the first dense layer to make sure that the model does not overfit. To generalize the model better, a data augmentation technique is used while training the model: more images are generated on the fly by changing the shear range, brightness range, height and width shift ranges, etc.
The model is trained for 20 epochs with a batch size of 128. The optimizer used is Adam with a learning rate of 0.001, and the performance metric used is accuracy. Since the dataset used here is balanced, the accuracy metric is claimed to be the best for evaluation of such datasets. Accuracy is the ratio of true results to the total number of cases. The accuracy measures for both training and validation data exceed 90%, as shown in Fig. 4.

Table 1 Layer-wise CNN model architecture
Layer name    No. of feature matrices / Pool size / No. of nodes    Activation function
Convolution_1 4
Convolution_2 4
Convolution_3 8
Convolution_4 8
MaxPooling_1 (2, 2)
Convolution_5 16
Convolution_6 16
MaxPooling_2 (2, 2)
Convolution_7 32
Convolution_8 32
Convolution_9 32
MaxPooling_3 (2, 2)
Convolution_10 64
Convolution_11 64
Convolution_12 64
batch_normalization_1
MaxPooling_4 (2, 2)
Flatten_1
Fully_Connected_1 256 Rectified linear unit
Dropout_1 0.3
Fully_Connected_1 256 Rectified linear unit
Fully_Connected_1 128 Rectified linear unit
Fully_Connected_2 128 Rectified linear unit
Fully_Connected_3 64 Rectified linear unit
Fully_Connected_4 (output) 85 SoftMax
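A Keras sketch approximating the stack in Table 1 is given below; the repeated convolution blocks are abbreviated (one Conv2D per filter width instead of two or three), the 3 × 3 kernels use same-padding, and the training call is only indicated, so this is an approximation rather than the exact model used here:

# Abbreviated Keras approximation of the CNN in Table 1 (85-class SoftMax output)
from tensorflow.keras import layers, models, optimizers

def build_model(num_classes=85):
    model = models.Sequential([layers.Input(shape=(32, 32, 3))])
    for filters, pool in [(4, False), (8, True), (16, True), (32, True), (64, False)]:
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        if pool:
            model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.BatchNormalization())
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.3))
    for units in (256, 128, 128, 64):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
model.summary()
# model.fit(train_images, train_labels, epochs=20, batch_size=128,
#           validation_data=(val_images, val_labels))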

8 Unicode Encoding

The segmented characters from the segmentation phase are predicted using the trained CNN model. The results obtained are mapped to their corresponding Unicode values and added (Ref: Fig. 5). Devanagari text uses Unicode values ranging from 0900 to 097F [6–8]. In the segmentation phase, images are named in such a way that the position of the character in a word and the position of the word in a line can be identified.
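A small sketch of this mapping-and-addition step is shown below; the class-to-code-point dictionary is illustrative and would in practice be read from the .csv mapping file mentioned in Sect. 9.5:

# Mapping predicted class names to Devanagari code points and joining them
unicode_map = {"ka": 0x0915, "ma": 0x092E}   # illustrative entries (U+0915, U+092E)

def to_text(predicted_classes):
    return "".join(chr(unicode_map[c]) for c in predicted_classes)

print(to_text(["ka", "ma"]))   # reconstructs the two-character word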

Fig. 4 Accuracy measures of training and validation data for 20 epochs

Fig. 5 This figure shows examples of Unicode additions

9 Results

Stepwise results for segmentation algorithm presented in this paper are shown for a
sample test image (Fig. 6).

Fig. 6 Sample test image



Fig. 7 After line segmentation

Fig. 8 After word segmentation

9.1 Results After Line Segmentation

Top and bottom boundaries are estimated using an algorithm in Sect. 5.1. Using these
boundaries, the test image is cropped out to separate lines Ref: Fig. 7.

9.2 Results After Word Segmentation

Left and right boundaries of each word in every line are estimated as in Sect. 5.2. Using these boundaries, each word from every line is cropped out and stored (Ref: Fig. 8).

9.3 Results After Character Segmentation and Modifier


Separation

Position of Sirorekha is estimated for every word and is whitened. Figure 9a shows
the word#9 of Fig. 8 without Sirorekha.

Fig. 9 a After Sirorekha removal b Separated top modifiers. c Separated lower modifiers. d Core
part characters

Left and right boundaries for every character is estimated and are cropped from
their respective word images. Upper modifiers are also separated during character
segmentation and resulting characters are subjected to identification of lower modi-
fier after which further segmentation happens if there are any combined characters
because of overlapping of the modifiers. Figure 9d depicts the final result after sepa-
rating modifiers, and Fig. 9b, c depicts the separated top and lower modifiers for
word#9 of Fig. 8.

9.4 Results After Fused Character Segmentation

Using the technique presented in Sect. 5.6, characters are selected for fused character
segmentation. Final segmentation column estimated by using results of C1 and C2
is used to crop the fused character. The left part of character will be the pure form
and the right part will be full form. Figure 10 depicts the pure form and full form of
the fused character from the word#9 of Fig. 8.

Fig. 10 After fused character segmentation

9.5 Core Part Prediction

The core part of each character image is fed into the trained CNN model. The predicted class is mapped to its Unicode value, using which reconstruction of the test image can be done. For example, if images 9_4_1 and 9_5_0 from Fig. 9d are fed into the model, the outputs of the model are “ka” and “ma”, respectively. The classifier outputs are mapped to the Unicode values “0915” for “ka” and “092E” for “ma”, which are added together to produce the word “ ”. For the Unicode mapping, a .csv file is used which has the class names and their respective Unicode values.

10 Conclusion and Future Scope

This paper presents an approach to building a robust OCR for Devanagari text. The algorithms used for line, word, character, lower modifier and fused character segmentation in the segmentation phase are adapted from [1]. A CNN model was trained for core part recognition using a character dataset synthesized using different Devanagari fonts. A Unicode addition approach to reconstruct the image into machine-encoded text is also exhibited. Future work can include developing datasets for top and lower modifiers and building CNN models for these data samples. The pre-processing stage can also include a skew correction technique to adjust the orientation of the text. Object detection techniques can also be employed to detect text from images containing both Devanagari text and other figures.

References

1. Bansal V, Sinha RMK (2002) Segmentation of touching and fused Devanagari characters.
Pattern Recogn 35:875–893. https://doi.org/10.1016/S0031-3203(01)00081-4
2. Vijay K, Sengar P (2010) Segmentation of printed text in devanagari script and Gurmukhi
script. Int J Comput Appl. https://doi.org/10.5120/749-1058
3. Avadesh M, Goyal N (2018) Optical character recognition for Sanskrit using convolution neural
networks. In: 2018 13th IAPR international workshop on document analysis systems (DAS),
Vienna, 2018, pp 447–452
4. Sultana F, Sufian A, Dutta P (2018) Image classification using CNN
5. https://fonts.google.com/?subset=devanagari
6. Chandrakar R (2004) Unicode as a multilingual standard with reference to Indian languages.
Electr Libr 22:422–424. https://doi.org/10.1108/02640470410561947
7. https://unicode.org/charts/PDF/U0900.pdf
8. Nair J, Sadasivan A (2019) A Roman to Devanagari back-transliteration algorithm based on
Harvard-Kyoto convention. In: 2019 IEEE 5th international conference for convergence in
technology (I2CT), Bombay, India, 2019, pp 1–6. doi: https://doi.org/10.1109/I2CT45611.
2019.9033576
9. Bansal V, Sinha R (2001) A Complete OCR for printed hindi text in devanagari script, pp
800–804. https://doi.org/10.1109/ICDAR.2001.953898.

10. Bag S, Krishna A. (2015) Character segmentation of Hindi unconstrained handwritten words.
https://doi.org/10.1007/978-3-319-26145-4
11. H. P. M. (2014) Optical character recognition for printed Malayalam documents based on
SVD and Euclidean distance measurement. In: International conference on signal and speech
processing. ICSSP 2014
12. Jyothi J, Manjusha K, Kumar MA, Soman KP (2015) Innovative feature sets for machine
learning based Telugu character recognition. Indian J Sci Technol 8(24)
13. Neena A, Geetha M (2018) Image classification using an ensemble-based deep CNN. Adv Intel
Syst Comput 709:445–456
14. Shah P, Bakrola V, Pati S (2018) Optimal approach for image recognition using deep convo-
lutional architecture. In: Sa P, Bakshi S, Hatzilygeroudis I, Sahoo M (eds) Recent findings
in intelligent computing techniques. Advances in intelligent systems and computing, vol 709.
Springer, Singapore
15. Dineshkumar R, Suganthi J (2015) Sanskrit character recognition system using neural network.
Indian J Sci Technol 8:65. https://doi.org/10.17485/ijst/2015/v8i1/52878
16. Jawahar CV, Kumar MNSSK, Kiran SS (2003) A bilingual OCR for Hindi-Telugu documents
and its applications. 1:408–412. https://doi.org/10.1109/IC-DAR.2003.1227699
17. Sankaran N, Jawahar CV (2012) Recognition of printed Devanagari text using BLSTM
neural network. In: Proceedings of the 21st international conference on pattern recognition
(ICPR2012), pp 322–325
A Machine Learning-Based
Multi-feature Extraction Method
for Leather Defect Classification

Malathy Jawahar, L. Jani Anbarasi, S. Graceline Jasmine,


Modigari Narendra, R. Venba, and V. Karthik

Abstract Automatic inspection for detecting defects in leather is an inevitable


task for grading the leather. Researchers from different parts of the globe have
developed many leather defect classification models to address the problems of
manual inspection. Discriminating defective and non-defective patterns in the leather substrate is challenging due to the inherent texture variations. Performance of the
feature extraction and classifier plays a vital role in the recognition of the relevant
patterns. Histogram of oriented gradients (HOG) and grey-level co-occurrence matrix
(GLCM) along with Hu moments and HSV are implemented to extract the features
from the leather images. The pivotal process is the extraction of these local and global
features from the leather images. To detect and classify various leather defect types
efficiently, a multi-feature algorithm that combines GLCM and Hog features is also
investigated. Leather defect classification is performed using linear regression (LR),
linear discriminant analysis (LDA), K-nearest neighbour (kNN), classification and
regression tree (CART), random forest (RF), support vector machine (SVM) and

M. Jawahar (B) · R. Venba · V. Karthik


Leather Process Technology Division, CSIR-Central Leather Research Institute, Adyar, Chennai
600020, India
e-mail: malathy.jawahar@gmail.com
R. Venba
e-mail: venbakasi1@gmail.com
V. Karthik
e-mail: karthik.vijayarangan@gmail.com
L. Jani Anbarasi · S. Graceline Jasmine
School of Computer Science and Engineering, VIT University, Chennai 600127, India
e-mail: janianbarasi.l@vit.ac.in
S. Graceline Jasmine
e-mail: graceline.jasmine@vit.ac.in
M. Narendra
Department of Computer Science and Engineering, VFSTR Deemed to be University, Guntur,
India
e-mail: narendramodigari@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 189
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_15

multi-layer perceptron neural network (MLP). Experimental results show that the
highest classification accuracy (89.75%) is achieved using GLCM along with Hu
moments, HSV colour features and random forest classifier.

1 Introduction

Leather industry is one of the ancient industries in the world, wherein the by-products
of slaughterhouses were utilized and the raw materials were transformed into various
types of leather and high value products. Leather has unique properties like breatha-
bility, feel, comfort, durability and elegance. The leather industry holds a prominent
place in the Indian economy. It is in the top ten foreign exchange earners for the
country. Likewise, it is the second-largest producer of footwear and leather garments in the world. From its own raw material sources, about 3 billion ft² of leather is produced annually, as shown in Fig. 1 [1].
The raw materials of the leather industry, hides and skins, suffer from various defects that downgrade the quality of the leather products. Leather defects can be classified into ante-mortem and post-mortem: defects caused while the animal is alive are ante-mortem, and defects caused after the death of the animal are post-mortem. Other kinds of defects are caused during leather processing. Brand
marks, tick marks, pox marks, insect bite, wounds, scratches, growth marks, flay cuts,
veininess, wrinkles, fat folds, salt stain, lime blast, chrome or dye patches, drawn
grain, open cuts, pinhole damage, etc. are the common leather defects.
Quality inspection in the assessment of the useful area of leathers is an important
step. The manual inspection system is currently used in leather processing industries.
Manual inspection requires expert knowledge and is highly subjective, tedious and
time-consuming whereas automatic detection leads to reliable, consistent, accurate
and avoids dispute between buyer and seller.
Leather pieces are graded based on the cutting value and the price varies according
to the size and location of surface defects. Grading has to be done carefully since the

Fig. 1 a Profile of leather product manufacturing sector; b Country-wise share of leather and leather
products

Fig. 2 a Leather processing. b Leather defects. c Good leather

price depends upon the quality of the leather. Digital image processing is effectively
used to accurately identify the defect and classify the quality of the leather. The texture
of leather is unique due to its hair pore arrangements and natural grain indentations,
so separating the defect region from the background is a challenging task [2–4].
Leather processing from raw skin to finished leather comprises of four major unit
operations that include pertaining, tanning, post tanning and finishing are illustrated
in Fig. 2a. Few defective leather images and good leather images are shown in Fig. 2b,
c, respectively.

2 Related Work

Various works have been carried out in recent years on classifying leather defects from leather surfaces. It is a challenging task due to the inherent texture variations, which vary from piece to piece. Several studies have developed defect classification models based on computer vision algorithms [20]. A lot of variations such as grain surface indentations, colour, texture and brightness exist within the leather substrate, and strong variation within the defective region may lead to inaccuracy.
Moreover, the nature of leather defect features like shape and/or size and/or orienta-
tion and varied distribution of defects also increases the complexity of the problem.
The various types of leather defects and the image size used by the researchers were
illustrated in Table 1.

Table 1 Various leather defects and image sizes used for analysis
Paper Total no. of images Image size No. of defects Types of defects
[5] 140 200 × 300 5 Lines, holes, stains, wears and
knots
[6] 387 2.5 × 0.5 m2 8 Scars, the mite nests, warts, the
open fissures, the healed scars,
the holes, the pinholes and the
fat folds
[7] 30 8 Scars, mite nests, warts, open
fissures, healed scars, holes,
pinholes and fat folds
[8] 8 Background, no-defect,
hot-iron marks, ticks, open
cuts, closed cuts, scabies and
botfly larvae
[9] 2000 64 × 64 4 Brand marks, made from a hot
iron, tick marks, cuts and
scabies
[10] 80 2048 × 2048 12
[11] 256 × 256 Barbed wires, shingles
[12] 15 600 × 450 Tick marks, brand marks from
hot iron, cuts and scabies
[4] 700 256 × 256 20 Open defects, closed cuts,
ticks, brand marks, Thorn
marks, scratches, bacterial
damage, mechanical damage,
fungal attack, pox mark,
growth marks, insect bites,
lime blast, wrinkles, pipiness,
chrome patch, grain cracks,
dye patch, fatty spew, finish
peel

In this study, an attempt has been made to extract GLCM, HOG, HU, HSV and
multi-variate features. The extracted features were given as training input and classi-
fied using the classifiers like LF, LDA, RF, CART, SVM and MLP. The overall flow
structure of the proposed leather defect classification system is given in Fig. 3.

2.1 Image Acquisition

Leather images are captured using the image acquisition system shown in Fig. 4. The
system used a Sony industrial CCD colour camera with USB 3.0 interface mounted
on a moving carriage. The images are captured as the camera is moved horizontally
from left to right. The captured images are of resolution 1200 × 1600 pixels. The

Fig. 3 Proposed leather defect classification system

Fig. 4 Leather image acquisition system

dataset used in the proposed work included 577 defective and 100 non-defective images, with a split ratio of 80:20 used for the training and testing datasets.

2.2 Image Preprocessing

Leather has large variations in its intensity grey level due to its inherent texture, so a Wiener filter is used to preprocess the images. The Wiener filter estimates the target feature by observing a linear time-invariant process, assuming a known stationary signal as well as additive noise. This preprocessing is based on minimizing the mean square error between the estimated signal Ŵ(ω) and the original signal W(ω). Lim and Oppenheim defined the Wiener filter [13] as

S(ω) = W_x(ω) / (W_x(ω) + W_y(ω))    (1)

where W_x(ω) and W_y(ω) represent the noise-free signal and the noisy background signal, which are stationary and uncorrelated. The preprocessed image can be computed, after computing the transfer function, as

Ŵ(ω) = X(ω)S(ω)    (2)
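As an illustrative sketch of this preprocessing step (not the authors' exact implementation), the adaptive Wiener filter available in SciPy can be applied to a greyscale leather image; the 5 × 5 neighbourhood size below is an assumed parameter.

```python
# Minimal Wiener-filter preprocessing sketch; scipy.signal.wiener applies a
# local-statistics (spatial) Wiener filter rather than the frequency-domain
# form of Eqs. (1)-(2), but serves the same noise-smoothing purpose.
import numpy as np
from scipy.signal import wiener
from skimage import io, img_as_float

def preprocess_leather_image(path, neighbourhood=5):
    image = img_as_float(io.imread(path, as_gray=True))  # grey-level image in [0, 1]
    return wiener(image, mysize=neighbourhood)
```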

2.3 Feature Extraction

In this process, a feature set is found that can accurately distinguish between defec-
tive and non-defective regions. Texture-based features refer to the local intensity
variations in a region of a digital image from pixel to pixel.

2.3.1 Statistical Texture Feature Extraction

Local features of an image identify the relationship between the spatial distribution
and the pixel grey values. Based on the pixel level, statistical-based methods are
categorized as shown in Fig. 5.

Grey-Level Co-occurrence Matrix

A statistical method of examining texture that considers the spatial relationship of pixels is the grey-level co-occurrence matrix (GLCM), also known as the grey-level spatial dependence matrix. GLCM is based on second-order statistics (Fig. 5b) that characterize the texture of an image by calculating how often pairs of pixels with specific values and in a specified spatial relationship occur in an image, creating a GLCM and then extracting statistical measures from this matrix. GLCM is a second-order statistical texture function in which the frequency of occurrence in an image

Fig. 5 a Statistical-based features. b Design of GLCM matrix computation

is derived from two adjacent neighbouring pixels of the image. The function f (i,
j|d, θ ) in the image describes the likelihood of a pair of grey levels occurring at
the distance d with direction θ. GLCM is calculated based on two terms: neighbour
pixels displacement d and pixel orientation θ. Grey-level co-occurrence matrix can
reveal certain properties about the spatial distribution of the grey levels in the texture
image. For example, if most of the entries in the GLCM are concentrated along the
diagonal, the texture is coarse. Initially, Haralick et al. [14] analysed 14 parameters
that include autocorrelation, contrast, correlation, cluster prominence, cluster shade,
dissimilarity, energy (uniformity), entropy, homogeneity, maximum probability, sum
of square (variance), difference variance, information measure of correlation, inverse
difference normalized (INN) and inverse difference moment normalized.
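As a hedged illustration of how such second-order features can be computed (the exact distances, angles and Haralick measures used in the paper are not assumed here), scikit-image's GLCM utilities can be used as follows.

```python
# GLCM texture-feature sketch; the patch is expected as an 8-bit greyscale
# array, and the offset d = 1 with four orientations is an illustrative choice.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch_uint8):
    glcm = graycomatrix(patch_uint8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity", "dissimilarity"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])
```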

Histogram of Oriented Gradients (HOG)

The HOG feature descriptor uses the distribution (histograms) of gradient directions (oriented gradients) as the major characteristics. The gradients (x and y derivatives) of an image are useful because they carry a large amount of information around corners and edges, where the gradients are large. HOG calculates the gradient images by filtering the greyscale image with the filter kernels Dx = [−1 0 1] and Dy = [−1 0 1]^T. The gradient magnitude and orientation are found using

g = sqrt(g_x² + g_y²)    (3)

θ = arctan(g_y / g_x)    (4)

where g_x and g_y are the gradient components, and g and θ are the gradient magnitude and the orientation.
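A brief sketch of this step, assuming a greyscale image array; the gradient kernels mirror Eqs. (3)–(4), while the HOG cell and block sizes are illustrative defaults rather than the paper's settings.

```python
# Gradient magnitude/orientation per Eqs. (3)-(4) and a HOG descriptor via
# scikit-image; parameters below are assumed, not taken from the paper.
import numpy as np
from scipy.ndimage import convolve
from skimage.feature import hog

def gradient_magnitude_orientation(gray):
    gx = convolve(gray, np.array([[-1.0, 0.0, 1.0]]), mode="nearest")
    gy = convolve(gray, np.array([[-1.0], [0.0], [1.0]]), mode="nearest")
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def hog_features(gray):
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys", feature_vector=True)
```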

2.4 Image Classification Using Machine Learning Algorithms

Machine learning algorithms are used to extract class information from a large set of data. Two types of classification can be done using machine learning techniques: (a) supervised and (b) unsupervised classification. Supervised classification uses labelled information associated with each class during the classification process, whereas unsupervised classification finds the classes without human intervention.

2.4.1 Linear Regression (LR)

Linear regression is a supervised machine learning algorithm that performs a regression task to predict target values from independent variables. Linear regression tries to identify the relationship between the forecast and the variables; it predicts the dependent value (X) based on the independent value (Y). The hypothesis function is given in Eq. 5.

x = θ1 + θ2 · Y    (5)

where θ1 and θ2 are the intercept and the coefficient of Y. The cost function of linear regression is the mean squared error between the predicted value (P) and the true value (T), given in Eq. 6. Gradient descent is used in the linear regression model to reduce this cost and to update θ1 and θ2.

C = (1/n) Σ_{i=1}^{n} (P_i − T_i)²    (6)
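Purely as a toy illustration of this gradient-descent update (the learning rate, iteration count and variable names are assumptions, not values from the paper):

```python
# Fits x = theta1 + theta2 * Y by gradient descent on the squared-error cost
# of Eq. (6); y_feat is the independent variable, x_target the dependent one.
import numpy as np

def fit_linear(y_feat, x_target, lr=0.01, epochs=1000):
    theta1, theta2 = 0.0, 0.0
    n = len(x_target)
    for _ in range(epochs):
        error = (theta1 + theta2 * y_feat) - x_target
        theta1 -= lr * (2.0 / n) * error.sum()               # dC/d(theta1)
        theta2 -= lr * (2.0 / n) * (error * y_feat).sum()    # dC/d(theta2)
    return theta1, theta2
```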

2.4.2 Linear Discriminant Analysis (LDA)

LDA is similar to analysis of variance (ANOVA) [15]; it tries to express the dependent variable as a combination of other features, and it uses continuous independent variables and a categorical dependent variable. It can also be used for dimensionality reduction. LDA handles data with unequal within-class frequencies, and its performance can be evaluated using randomly generated test data [16]. LDA can be used for classifying images, speech, etc. Class-dependent and class-independent transformations are the two different approaches of LDA. The class-dependent transformation involves maximizing the ratio of between-class variance to within-class variance, while the class-independent transformation involves maximizing the ratio of the overall variance to the within-class variance. The mean of each class and of the entire dataset are computed as given in Eq. 7, where β refers to the mean value and C refers to the probabilities of the classes of data.

β_x = C_1 β_1 + C_2 β_2    (7)

The within-class scatter W_s and the between-class scatter B_s can be computed as given in Eqs. 8 and 9.

W_s = Σ_i C_i γ_i    (8)

B_s = Σ_i (β_i − β_x)(β_i − β_x)^T    (9)

where C_i is the covariance and γ_i is the probability factor. The class-dependent transform and the class-independent transform are computed as given in Eqs. 10 and 11.

CR_i = inv(γ_i) B_s    (10)

CR = inv(W_s) B_s    (11)

2.4.3 KNN–K-Nearest Neighbour (KNN)

K-nearest neighbour is one of the oldest and simplest methods for classification. It achieves competitive results in most domains when trained with proper knowledge. KNN classifies unlabelled data by the majority label among its k nearest neighbours. When prior information is not available, distance metrics such as the Euclidean distance are used to measure the similarities and dissimilarities between the data. However, the distance metric for kNN has to be chosen based on the problem statement [5–7]. The distance can be computed as in Eq. 12 and the cost function as in Eq. 13.

D = ||L(x_i − x_j)||²    (12)

ε(L) = Σ_{ij} η_{ij} ||L(x_i − x_j)||² + c Σ_{ijl} η_{ij} (1 − y_{il}) [1 + ||L(x_i − x_j)||² − ||L(x_i − x_l)||²]    (13)

2.4.4 Classification and Regression Tree (CART)

Classification and regression tree is a predictive model that analyses how an outcome value can be predicted from the other given data. It is a decision tree model in which each fork is split on a predictor variable and each leaf node includes a prediction for the outcome variable.

2.4.5 Random Forest (RF)

Breiman proposed the random forest (RF) classifier based on multiple decision trees. Every tree can be considered a single classifier, and the trees are combined as a unit to identify the final classification for a given input. Random forest splits each node by means of randomly selected features. The error is predicted on the out-of-bag portion of every tree, and the same computation is carried out after permuting each feature variable in order to assess its importance. Splitting stops when the feature variable standard deviation difference is equal to 0 [17].
The node impurity measure in the random forest classifier is computed using the Gini index. Gini(T) is defined as

Gini(T) = 1 − Σ_{j=1}^{n} P_j²    (14)

where P_j represents the relative frequency of class j in dataset T, which has n classes.
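A small sketch of this impurity measure, assuming `labels` holds the class labels of the samples reaching a node:

```python
# Gini impurity of Eq. (14): 1 minus the sum of squared class frequencies.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)
```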

2.4.6 Support Vector Machine (SVM)

Initially, SVMs were developed for classification and were later extended to regression and other learning tasks. SVM is fundamentally a binary classifier whose result is either positive or negative; it was later improved by combining multiple binary classifiers for multi-class problems. In addition, SVM can also handle nonlinear cases by mapping the input space, so that a linear classification in the mapped space becomes equivalent to a nonlinear one in the original input space. The input features are mapped to higher feature dimensions and separated through maximal-margin hyperplanes in SVM. Better accuracy can be achieved depending on the kernel parameters and hyperparameters; for optimizing over the data and identifying the best model, the kernel plays a major role [18].

2.4.7 Multi-layer Perceptron Neural Network (MLP)

A multilayer feed-forward neural network consists of an input layer, one or more hidden layers and an output layer. During the feed-forward phase, each input node receives and then transmits the input signal to each of the hidden nodes. Each hidden node calculates its activation function and transmits its signal to the output nodes. The output unit calculates the activation function for a certain input pattern as the network response. During backpropagation, each output node compares its activation with the desired response to produce an error signal for a specific input pattern. This is repeated for all samples in each training epoch. The error signal at each output unit is distributed back to all units in the previous layer, and the weights and biases are updated to minimize the error in each training epoch. The total squared error of the output is minimized by the gradient descent method known as backpropagation [19]. The error function, the gradient function and the weight change are shown in Eqs. 15–17, where J is the Jacobian matrix:

H = J^T J    (15)

g = J^T e    (16)

ω(n + 1) = ω(n) − [J^T J + μI]^{−1} J^T e    (17)
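A compact sketch of one such weight update, assuming the Jacobian of the network errors with respect to the weights and the error vector are already available; the damping factor mu is an assumed illustrative value.

```python
# One Levenberg-Marquardt-style update following Eqs. (15)-(17).
import numpy as np

def lm_update(weights, jacobian, errors, mu=0.01):
    hessian_approx = jacobian.T @ jacobian            # Eq. (15)
    gradient = jacobian.T @ errors                    # Eq. (16)
    step = np.linalg.solve(hessian_approx + mu * np.eye(weights.size), gradient)
    return weights - step                             # Eq. (17)
```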

3 Experiment Results and Discussion

In this research work, multi-feature and multi-classifier analyses were used for leather defect classification. The leather database consists of good (non-defective) leather images and defective leather images. The acquired image size was 1200 × 1600, resized to 400 × 600 without losing the major field of view. Out of 577 defective and 100 non-defective crust leather images, 542 sample images were used for training and the remaining 135 samples for testing. The dataset was trained on a computer with a 2.80 GHz Intel Core i7 CPU using Python SciPy.
Leather images were initially preprocessed using the Wiener filter to smooth the intrinsic background noise, and from the preprocessed data, GLCM and HOG features were extracted. Similarly, Hu moments and HSV colour features were also extracted. The extracted GLCM and HOG features were classified using machine learning algorithms such as LR, LDA, KNN, CART, RF, SVM and MLP. The parameters of the classifiers were tuned by a trial and error method. Table 2 shows the mean and standard deviation of the classification accuracy results for the GLCM (Fea1) and HOG (Fea2) feature extraction methods with the above seven classifiers, using a tenfold cross-validation method.
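The comparison can be sketched as below, assuming X holds the extracted feature vectors and y the defective/non-defective labels; scikit-learn defaults (and logistic regression standing in for the linear-regression classifier) are assumptions rather than the paper's tuned settings.

```python
# Ten-fold cross-validation over the seven classifiers used in the paper.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000),
}

def evaluate(X, y):
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10)
        print(f"{name}: {scores.mean():.6f} ({scores.std():.6f})")
```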
Using the GLCM feature extraction method, the highest classification accuracy
was obtained for random forest, followed by KNN, SVM, CART, MLP, LDA and
LR. Similarly, RF obtained the highest classification accuracy using the HOG feature
extraction method followed by MLP, SVM, LR, CART, LDA and KNN. It can be

Table 2 Mean and standard deviation of the classification accuracy results for GLCM and HOG

GLCM feature vector (Fea1) | HOG feature vector (Fea2)
LR: 0.830048 (0.316997) | LR: 0.827195 (0.301932)
LDA: 0.846598 (0.257363) | LDA: 0.775878 (0.211287)
KNN: 0.871971 (0.210713) | KNN: 0.690979 (0.086188)
CART: 0.851493 (0.164304) | CART: 0.777458 (0.228294)
RF: 0.888367 (0.190104) | RF: 0.852327 (0.292470)
SVM: 0.858275 (0.290307) | SVM: 0.843284 (0.328036)
MLP: 0.849956 (0.200788) | MLP: 0.848275 (0.27029)

Table 3 Mean and standard deviation of the classification accuracy results of the multi-feature set

GLCM + Hu + HSV feature vector (Fea3) | HOG + GLCM + Hu + HSV feature vector (Fea4)
LR: 0.845040 (0.279990) | LR: 0.852634 (0.251463)
LDA: 0.863323 (0.167780) | LDA: 0.770040 (0.222580)
KNN: 0.871971 (0.210713) | KNN: 0.873442 (0.211344)
CART: 0.847015 (0.191423) | CART: 0.804214 (0.185203)
RF: 0.897567 (0.191973) | RF: 0.858253 (0.292667)
SVM: 0.873354 (0.237516) | SVM: 0.849342 (0.310073)
MLP: 0.833538 (0.242540) | MLP: 0.833538 (0.425402)

observed that the random forest outperforms the other classifiers for both the feature
extraction methods.
Subsequently, GLCM features were combined with Hu moments and HSV colour features (Fea3), and finally HOG, GLCM, Hu and HSV features were combined (Fea4); these multi-features were given as input to the seven classifiers. Table 3 shows the mean and standard deviation of the classification accuracy results of the multi-feature set. As can be seen from Table 3, random forest was found to be the best classifier for the multi-feature vectors as well.
Figure 6 illustrates the distribution of the classification accuracy data for the seven classifiers trained using the four feature extraction methods. Leather being a natural material with inherent texture variations, the non-defective images which had a bold grain or other prominent variations were often misclassified as defective. These misclassifications can be seen as outliers in Fig. 6 for all four feature extraction methods trained with the seven classifiers. Random forest achieved superior classification accuracy (89.75%) with the Fea3 (GLCM + Hu + HSV) feature vector (Fig. 7c). Nevertheless, the classification success rate of the GLCM (Fea1) feature vector was also found to be remarkable (88.83%). Experimental results show that GLCM texture features trained using the random forest classifier can be successfully used for leather defect classification. Furthermore, the multiple feature set fusing GLCM features with invariant Hu moments and HSV, trained with random forest, improves the classification accuracy.

Fig. 6 Classification accuracy using multi-class classifiers: a GLCM feature set; b HOG feature set; c GLCM + Hu + HSV (Fea3) feature set; d HOG + GLCM + Hu + HSV (Fea4) feature set

4 Conclusion

In this paper, a multi-feature vector that combines GLCM, invariant Hu moments and HSV colour features was proposed for leather defect classification. A dataset comprising 577 defective and 100 non-defective images was used in this study. Experimental results show that the classification accuracy increases as the dimension of the features increases. The GLCM features along with Hu and HSV features, classified with the random forest classifier, achieved an accuracy of 89.75%, which is a significant improvement compared to the existing classification schemes.

Acknowledgements The authors gratefully acknowledge the Ministry of Electronics and Informa-
tion Technology (MeitY), Government of India for funding this research and Director, CSIR-CLRI
for his support during the project (A/2020/LPT/GAP1811).

References

1. Council For Leather Exports (CLE). https://leatherindia.org/indian-leather-industry/


2. Jawahar M et al (2016) Compression of leather images for automatic leather grading system
using multiwavelet. In: 2016 IEEE international conference on computational intelligence and
computing research (ICCIC). IEEE

3. Jawahar M, Vani K (2019) Machine vision inspection system for detection of leather surface defects. J Am Leather Chemists Assoc 114(1)
4. Jawahar M, Chandra Babu NK, Vani K (2014) Leather texture classification using wavelet feature extraction technique. In: 2014 IEEE international conference on computational intelligence and computing research. IEEE
5. Kwak C, Ventura JA, Tofang-Sazi K (2001) Automated defect inspection and classification of leather fabric. Intell Data Anal 5(4):355–370
6. He F, Wang W, Chen Z (2006) Automatic defects detection based on adaptive wavelet packets for leather manufacture. In: Technology and innovation conference, 2006. ITIC 2006. International, pp 2024–2027. IET
7. Pölzleitner W, Niel A (1994) Automatic inspection of leather surfaces. In: Proceedings, machine vision applications, architectures, and systems integration III, vol 2347
8. Amorim WP, Pistori H, Jacinto MAC, Sudeste EP. A comparative analysis of attribute reduction algorithms applied to wet-blue leather defects classification
9. Pistori H, Amorim WP, Martins PS, Pereira MC, Pereira MA, Jacinto MAC (2006) Defect detection in raw hide and wet blue leather. In: CompIMAGE, pp 355–360
10. Yeh C, Perng DB (2001) Establishing a demerit count reference standard for the classification and grading of leather hides. Int J Adv Manuf 18:731–738
11. Peters S, Koenig A (2007) A hybrid texture analysis system based on non-linear & oriented kernels, particle swarm optimization, and kNN vs. support vector machines. In: 7th international conference on hybrid intelligent systems, 2007. HIS 2007, pp 326–331. IEEE
12. Viana R, Rodrigues RB, Alvarez MA, Pistori H (2007) SVM with stochastic parameter selection for bovine leather defect classification. In: Pacific-Rim symposium on image and video technology, pp 600–612. Springer, Berlin
13. Bahoura M, Rouat J (2001) Wavelet speech enhancement based on the teager energy operator. IEEE Signal Process Lett 8(1):10–12
14. Haralick RM, Shanmugam K, Dinstein IH (1973) Textural features for image classification. IEEE Trans Syst Man Cybern 6:610–621
15. Balakrishnama S, Ganapathiraju A (1998) Linear discriminant analysis - a brief tutorial. Inst Signal Inf Process 18
16. Utpal B, Dev Choudhury R (2020) Smartphone image based digital chlorophyll meter to estimate the value of citrus leaves chlorophyll using Linear Regression, LMBP-ANN and SCGBP-ANN. J King Saud Univ Comput Inf Sci
17. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
18. Sharon JJ, Jani Anbarasi L, Edwin Raj B (2018) DPSO-FCM based segmentation and classification of DCM and HCM heart diseases. In: 2018 Fifth HCT information technology trends (ITT). IEEE
19. Haykin S (1994) Neural networks: a comprehensive foundation. Prentice Hall PTR
20. Sahin EK, Colkesen I, Kavzoglu T (2020) A comparative assessment of canonical correlation forest, random forest, rotation forest and logistic regression methods for landslide susceptibility mapping. Geocarto Int 35(4):341–363
Multiple Sclerosis Disorder Detection
Through Faster Region-Based
Convolutional Neural Networks

Shrawan Ram and Anil Gupta

Abstract Multiple sclerosis is a leading brain disorder that highly affects the normal functions of the human body. Due to this disorder, the protective coverings of neuron cells get damaged, which disrupts the information flow inside the brain and to other body parts. The early detection of multiple sclerosis helps healthcare practitioners to suggest a suitable treatment for the disease. The detection of multiple sclerosis is a challenging task. Many types of approaches have been proposed by researchers and academicians for accurately detecting brain lesions, yet precisely detecting brain lesions is still a big challenge. Due to recent innovations in the field of image processing and computer vision, healthcare practitioners are using advanced disease diagnosis systems for the prediction of disorders and diseases. The magnetic resonance imaging approach is used for the detection of various brain lesions by neurosurgeons and neurophysicians. Computer vision approaches are playing a major role in the automatic detection of various disorders. In this research paper, the faster region-based convolutional neural networks approach, based on computer vision and deep learning and using transfer learning, is proposed for the detection of multiple sclerosis as a brain disorder. The proposed approach detects the damaged area inside the brain with high precision and accuracy. The proposed model detects multiple sclerosis brain lesions with 99.9% accuracy. Three DAGNetworks are used for training: Alexnet, Resnet18, and Resnet50. Compared to the Alexnet and Resnet18 deep networks, the Resnet50 pre-trained network performed better, with higher detection accuracy.

Keywords Multiple sclerosis · Magnetic resonance imaging · Brain lesions · Computer vision · Convolutional neural networks · Deep learning

S. Ram (B) · A. Gupta


Department of Computer Science and Engineering, MBM Engineering College, Jai Narain Vyas
University, Jodhpur, Rajasthan, India
e-mail: shrawanbalach@jnvu.edu.in
A. Gupta
e-mail: anilgupta@jnvu.edu.in


1 Introduction

Human beings have been fighting against various diseases since the beginning of human civilization. Many types of healthcare systems have been developed and improved from time to time as per human requirements. Various types of brain-related diseases have been found and investigated by healthcare scientists. Multiple sclerosis is one type of neurological brain disorder that causes disability in men and women of every age [1]. The symptoms of multiple sclerosis were first described by Jean-Martin Charcot, a French professor of anatomical pathology [2]. This type of disorder highly affects parts of the central nervous system, including the main parts of the brain such as the spinal cord, cerebrum, cerebellum, and optic nerves [3]. A recent study by the National Multiple Sclerosis Society estimates that more than one million people are living with multiple sclerosis brain disorders in the USA [4]. The National Multiple Sclerosis Society also found that more than 2.3 million people are living with multiple sclerosis across the world. Researchers found that the ratio of women suffering from multiple sclerosis is higher than that of men [5].
The main cause of multiple sclerosis is damage to the myelin sheath, which is an insulating cover around the nerves [6]. Multiple sclerosis lesions mostly affect the white matter or grey matter inside the brain [7]. Magnetic resonance images (MRIs) have become the most important source of disease diagnosis. Various MRI modalities, such as axial, coronal, and sagittal views, are used by healthcare practitioners for reference. With MRI, medical experts can detect brain disorders and control the progression of the disease through proper treatment. Magnetic resonance images clearly show disease activity and active lesions. Neurologists compare the scanned images based on the distribution of white and dark areas to find out the damaged and healthy tissues [8]. MRI scans are very useful for detecting various brain tumors, traumatic brain injuries, Alzheimer's disease [9, 10], Parkinson's disease, brain strokes, dementia, brain infections, and multiple sclerosis brain lesions [8]. The commonly used MRI imaging sequences are T1-weighted, T2-weighted, and fluid-attenuated inversion recovery (FLAIR). The contrast and brightness of the images are controlled through the echo time (TE) and the repetition time (TR). Both T1-weighted and T2-weighted MRI images are used by neurological experts for the diagnosis of the disease [8]. In this research paper, T2-weighted MRI images of multiple sclerosis downloaded from [11–14] are used.
The recent advancement in the fields of artificial intelligence and machine learning has opened the door for healthcare experts to use automatic disease diagnosis tools and systems to find out the nature and effects of various diseases on human beings [15]. The deep learning approach, a subfield of machine learning, is playing a major role in the domain of medical image analytics [16, 17]. Through the deep learning approach, large volumes of medical imaging records can be explored and analyzed. Higher computing power, such as graphics processing units (GPUs) manufactured by various leading companies, is also playing a dramatic role in the field of machine learning. GPUs are used as the main hardware for the implementation of deep learning algorithms. Object detection through deep

learning using convolutional neural networks is influencing the area of automatic


disease classification and detection through medical robots [18]. The convolutional
neural networks [19], are used to extract the features from the large volumes of
image datasets through the convolution operation by filter weights learned as a part
of the training process [20]. The architecture of the convolutional neural network
consists of various layers for automatic feature extraction and classification. Pre-
trained neural networks are also used as a directed acyclic graph networks for deep
learning. A directed acyclic graph network is a kind of neural network designed
with many layers having inputs from multiple previous layers and output for the
next multiple layers [21]. The pre-trained deep networks are Alexnet, Googlenet,
Resnet18, Resnet50, Resnet101, VGG16, VGG19, and inceptionv3, etc. [21]. All
these large deep networks are trained by the designers with a large volume of images
collected from different sources. These networks can be trained for the classification
and detection of objects [22]. The faster region with convolutional neural network (R-
CNN) object detection method dramatically improved in the state of the art in object
recognition and object detection. Faster region with convolutional neural networks
(R-CNN) approach with pre-trained deep networks is widely used for object detection
in many areas. This approach of object detection is more powerful, efficient, and
accurate [21].
An object detection network such as faster R-CNN is designed for feature extraction using pre-trained networks such as Resnet18, Resnet50, and Inceptionv3. Researchers have shown that convolutional feature maps are used by region-based detectors such as faster R-CNN [23]. A region proposal network is trained to generate the object proposals where the object of interest exists, and the next subnetwork is designed and trained to predict the object of interest. In this research paper, an object detection model has been designed using the faster R-CNN object detection approach for the detection of multiple sclerosis brain lesions. The faster R-CNN network model takes as input an entire image and a set of object proposals [24]. The faster R-CNN network is trained with the help of ground truth values of images through pre-trained deep networks such as AlexNet, Resnet18, and Resnet50. The proposed model is implemented in MATLAB version R2020a using deep learning and computer vision approaches.
The rest of the paper is organized into four further sections. Section 2 focuses on the related work carried out by researchers and academicians for the detection and classification of brain tumors through fast R-CNN and faster R-CNN. Section 3 explores the research methodology adopted for the implementation of the proposed model using the datasets collected online. Section 4 explains the implementation strategy with transfer learning through pre-trained deep networks. The last section presents the conclusion about the results obtained through the experiments and suggests the future scope of the study.

2 Related Work

Academicians and researchers have proposed various object recognition and object detection approaches using convolutional neural networks [25, 26] with transfer learning. Pre-trained neural networks, designed and trained with large image datasets, are becoming the most suitable networks for classification and object detection tasks, since training a large network from scratch is a very expensive and tedious task. Through the literature review, it is found that various approaches to object detection have been proposed by academicians.
Shaoqing Ren et al. proposed an approach for real-time object detection with region proposal networks. They designed a region proposal network that shares the features of a convolutional neural network with the detection network. The region proposal network is a fully convolutional network that is used to predict object bounds and objectness scores at each position. The authors explored approaches for object detection with pre-trained networks [23].
Ross Girshick explored the fast region-based convolutional neural network (Fast R-CNN) for object detection. The author observed that, compared to the image classification process, object detection is a more challenging task, and the object detection process requires more complex methods. The author proposed an algorithm based on single-stage training that jointly learns to classify object proposals and refine their corresponding spatial locations. The VGG16 deep network was trained through the above-said method [24].
R. Ezhilarasi and P. Varalakshmi proposed a model for brain tumor detection using the faster R-CNN approach. The Alexnet pre-trained network was used as the base model for the classification of various tumors, along with a region proposal network, through the faster R-CNN approach. Transfer learning was used during the training of the network. The faster R-CNN was used for the detection of brain tumors by creating a bounding box around the tumor area along with the tumor type [27].
Ercan Avsar and Kerem Salcin proposed an approach for the classification and detection of brain tumors from MRI images through faster R-CNN. The authors applied the faster R-CNN approach to brain MRI images to detect and locate the tumor area. They state that the approach used for detection and classification is more efficient and accurate compared to the simple R-CNN and fast R-CNN methods. They achieved a 91.66% classification accuracy [28].

3 Research Methodology

This part of the research paper explains the research methodology used to detect multiple sclerosis brain lesions. The multiple sclerosis detection approach is carried out systematically, from data collection to preprocessing of the image datasets and finally implementation of the proposed approach. The most important step of the research is the selection of a suitable dataset for implementing the model, since the performance of the model mostly depends on the quality of the image datasets. After the selection of suitable datasets and preprocessing of the MRI images, the selection of the convolutional neural network architecture is a very essential and important step. The research methodology adopted for the proposed research is based on labeling the datasets with ground truth values; the step of labeling the datasets is a very important part of the research methodology. After labeling the image dataset, four pre-trained DAGNetworks are used to train the model to extract features from the image datasets. The region proposal network is trained with the features extracted from the pre-trained networks. Different training parameters are selected to train the proposed model.

3.1 The Dataset Collection

The multiple sclerosis MRI images were collected online from [11, 13, 14, 29] and contain T2-weighted MRI images of 38 patients in TIFF and BMP image file formats. These images were collected at a first and a second examination, taken at the very beginning and after a 6–12 month interval. The total number of MRI images is 718. Some images of the FLAIR modality are also included in the dataset. The dataset was prepared with the help of online downloaded images as well as images collected from the above-cited sources.

3.2 Pre-processing of MRI Images

The image datasets originally collected from the sources were of different formats. All the images are converted to the PNG file format with a size of 512 × 512. After conversion into the PNG file format, an image datastore is created through MATLAB image processing tools. All the MRI images are labeled using the image labeler tool of MATLAB. A ground truth datastore is created with the label "multiple sclerosis." A few sample images are shown in Fig. 1.

3.3 Multiple Sclerosis Brain Disorder Detection Through Pre-trained Deep Networks

Deep learning is one of the most powerful machine learning approaches; it automatically extracts image features through learnable filter weights [21]. The faster region-based convolutional neural network approach proposed by Ren et al. is widely used for object detection and classification [23]. The faster R-CNN approach is

Fig. 1 Multiple sclerosis sample images

based on a pre-trained convolutional neural network and a region proposal network (RPN) [23]. In this research work, the Alexnet, Resnet18, Googlenet, and Resnet50 pre-trained deep networks are used as the base models with the faster R-CNN object detection approach. The base models are trained with ground truth label images. AlexNet is a pre-trained network with 25 layers, having five convolutional layers and three fully connected layers. The second pre-trained DAGNetwork used for training is Resnet18, a deep network with 71 layers whose output layer is named "ClassificationLayer_predictions" [21]. The third pre-trained DAGNetwork used for training is Resnet50. All the above-mentioned pre-trained networks are trained with ground truth labels to extract the features from the labeled images. Two subnetworks are used after extracting the features from the labeled images. The region proposal network (RPN) is a subnetwork used after the feature extraction process, and it is trained to generate the object proposals [21]. The object proposals are the areas inside the image where the object of interest exists. The next subnetwork is trained

Fig. 2 Proposed detection steps

to predict the actual class of each object proposal [21]. The region proposal network is a kind of convolutional neural network that consists of convolutional layers and a proposal layer.
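The detector in this paper is built and trained in MATLAB; purely as a hedged illustration of the same idea (a pre-trained backbone, a region proposal network and a detection head for one foreground class plus background), a comparable setup in Python with torchvision would look roughly as follows.

```python
# Illustrative faster R-CNN setup: ResNet-50 backbone with an RPN, and a box
# predictor replaced for 2 classes (background + "multiple_sclerosis").
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
# The model can then be fine-tuned on (image, {"boxes", "labels"}) ground-truth pairs.
```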
The bounding boxes with detection probabilities are drawn within the image, showing the region of interest (ROI), through the evaluation of an object detector trained using the region proposal network. The performance of the model is measured through the accuracy of detecting the infected part within the image by drawing a bounding box around the infected part. The rectified linear unit (ReLU) activation function is used after each convolution operation to keep positive values. It is the most commonly used activation function in deep learning algorithms.

f(x) = max(0, wx + bias)    (1)

where w is the learnable weight and x is the input value in the form of an image matrix. The other performance measurements of how accurately the information is identified are the precision and recall values obtained through each testing step. The precision indicates the positive predictive result: the true positive predictions divided by the sum of true positive and false positive predictions. It can be written as the following formula:

Precision = True Positive / (True Positive + False Positive)    (2)

The recall represents the true positive predictions as a fraction of the sum of true positive and false negative predictions. It can be written using Formula 3 as given below.

Recall = True Positive / (True Positive + False Negative)    (3)
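A small helper mirroring Eqs. (2) and (3), assuming the true positive, false positive and false negative counts have been collected from the detector evaluation:

```python
# Precision and recall from raw detection counts.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```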

4 Experimental Setup and Results

The model is trained on an HP Z6 workstation running the Windows 10 Pro for Workstations operating system, with two 2.1 GHz Intel Xeon Silver 4110 CPUs and 32 GB RAM. The workstation is equipped with an NVIDIA Quadro P5000 GPU having 16 GB of GDDR5X GPU memory, 2560 NVIDIA CUDA cores, and 8.9 teraflops of computing power. The model is implemented in the MATLAB programming environment with the help of deep learning and computer vision approaches, using MATLAB version R2020a [23]. The stochastic gradient descent with momentum (SGDM) optimizer is used as part of the training options. SGDM is one of the powerful optimization algorithms generally used to train convolutional neural networks. The dataset contains 718 MRI images of multiple sclerosis; 602 images are labeled with the ground truth label "Multiple_Sclerosis" in the form of a rectangular region of interest using the image labeler approach in MATLAB version R2020a. The ground truth labels are saved in the .mat file format.
The ground truth labels are loaded into the MATLAB programming environment. After loading the ground truth labels, the gTruth value is generated with three data values, namely the data source, the label definition table, and the ground truth label data stored in table form. After storing the ground truth label data in a table, the dataset is split into a training dataset and a testing dataset to train and evaluate the detector. The first step is used to train the pre-trained deep network with the help of a transfer learning approach. In the second step, the region proposal network is trained to generate the region proposals (either object or background). The final step is used to train the network to detect the actual class of the object (either brain disorder or background).

4.1 The Results Generated Through Pre-trained Deep Networks

All the steps described in Sect. 4 are repeated to train the AlexNet, Resnet18, Googlenet, and Resnet50 pre-trained deep networks.

4.1.1 Results Generated Through AlexNet Pre-trained Network

The Alexnet pre-trained network is trained on the above-mentioned hardware. During the training period, the negative overlap range and positive overlap range values are decided so that the training samples tightly overlap with the actual ground truth rectangular region of interest. The AlexNet pre-trained network is trained using the train faster R-CNN object detection function. The network is trained in 1 h, 17 min, and 33 s. Through the training process, a training table is generated which consists of the training epochs, the iterations required for training, the time elapsed, the mini-batch loss, the mini-batch accuracy, the mini-batch root mean squared error, the region proposal network mini-batch accuracy, the region proposal network mini-batch root mean squared error, and the base learning rate. Figure 3 shown below displays a graph plotted between precision and recall. The average precision graph between precision and recall depicts how the detector's performance varies with the level of recall.
Figure 4 depicts the mini-batch accuracy and mini-batch loss; the mini-batch accuracy can be seen increasing as the loss decreases.
Figure 5 shows the mini-batch root mean squared error and the region proposal network root mean squared error; both decrease together.
As seen in Fig. 6, the region proposal network mini-batch accuracy is higher compared to the mini-batch accuracy during the training period.
116 images are used for testing the detection accuracy of the object detector. Figure 7 shows a part of these 116 images, with a bounding box and detection probability printed inside each image, obtained through testing of the object detector network trained with the AlexNet pre-trained network. The detector precisely detected the infected area with high accuracy, as shown in Fig. 7.

Fig. 3 Graph between precision and recall (Alexnet)



Fig. 4 Mini-batch accuracy and mini-batch loss (Alexnet)

Fig. 5 Mini-batch RMSE and RPN mini-batch RMSE (Alexnet)

Fig. 6 Mini-batch accuracy and RPN mini-batch accuracy (Alexnet)



Fig. 7 Infected area detected by multiple sclerosis detector with Alexnet

4.1.2 Results Generated Through Resnet18 Pre-trained Network

The Resnet18 pre-trained network is trained using the train faster R-CNN object detection function. The training is completed in 4 h, 3 min, and 12 s. Through the training process, a training table is generated which consists of the training epochs, the iterations required for training, the time elapsed, the mini-batch loss, the mini-batch accuracy, the mini-batch root mean squared error, the region proposal network mini-batch accuracy, the region proposal network mini-batch root mean squared error, and the base learning rate.
Figure 8 displays a graph plotted between precision and recall. The precision and recall are calculated from the true positive, false positive and false negative counts using Formulas (2) and (3). The Resnet18 network has higher precision and accuracy compared to Alexnet.

Fig. 8 Graph between precision and recall (Resnet18)
Figure 9 depicts the mini-batch accuracy and mini-batch loss during the training period of the Resnet18 deep network.
Figure 10 depicts the mini-batch root mean squared error and the region proposal network root mean squared error; both decrease with respect to each other.
The region proposal network mini-batch accuracy is seen to be higher compared to the mini-batch accuracy during the training period.
The multiple sclerosis detector trained with the Resnet18 pre-trained network is tested on 116 images of multiple sclerosis, and a few of these 116 images are shown below in Fig. 12, with the detection accuracy printed inside each image in the form of a detection probability.

Fig. 9 Mini-batch accuracy and mini-batch loss (Resnet18)

Fig. 10 Mini-batch RMSE and RPN mini-batch RMSE (Resnet18)



Fig. 11 RPN mini-batch accuracy and mini-batch accuracy (Resnet18)

Fig. 12 Infected area detected by multiple sclerosis detector with Resnet18



4.1.3 Results Generated Through Resnet50 Pre-trained Network

The Resnet50 pre-trained network is trained using the train faster R-CNN object detection function on the same hardware platform as used for training Alexnet and Resnet18. In the proposed research, the DAGNetwork [21] is retrained on grayscale images. Through the training process, a training table is generated which consists of the same fields as the table created through the Resnet18 training process [21]. Figure 13 depicts the graph plotted between precision and recall.
Figure 14 displays the mini-batch accuracy and mini-batch loss of the pre-trained deep network.
Figure 15 depicts the mini-batch RMSE and the region proposal network mini-batch RMSE.

Fig. 13 Graph between precision and recall (Resnet50)

Fig. 14 Mini-batch accuracy versus mini-batch loss



Fig. 15 Mini-batches RMSE versus RPN mini-batch RMSE

There is a very small variation between the values of the mini-batch accuracy and the RPN mini-batch accuracy, as can be seen in Fig. 16.
The disorder detection accuracy of the multiple sclerosis detector trained with Resnet50 is comparatively higher than that of the Alexnet and Resnet18 pre-trained deep networks. 36 images with the detection probability printed inside the images are shown in Fig. 17.
The values of precision and recall of all three networks are generated during the training process of the pre-trained networks. The values of precision and recall are generated in each epoch, and the average of all values is calculated. A table of precision and recall is created during the training period of each pre-trained network.

Fig. 16 Mini-batch accuracy versus RPN mini-batch accuracy



Fig. 17 Infected area detected by multiple sclerosis detector with Resnet50

5 Conclusion

This research paper explored transfer learning approaches with the help of pre-trained deep networks to provide more accurate results for the detection of a brain disorder. The pre-trained DAGNetworks are trained on grayscale images, and higher brain disorder detection accuracy is achieved by comparing the performance of three deep networks. The performance of all three pre-trained networks is compared based on precision, recall, and detection accuracy with bounding boxes. The Resnet50 deep network has a higher precision value compared to the Alexnet and Resnet18 networks. A 99.9% detection accuracy is achieved for multiple sclerosis brain disorder detection. The model can be used to detect the brain disorder in real-life MRI images of multiple sclerosis. Compared to other models proposed by researchers, our model for brain disorder detection has higher detection accuracy. As a future scope, research work in the domain of medical image processing applications can be explored.

Table 1 Performance comparisons

                    Alexnet     Resnet18    Resnet50
Average precision   0.928418    0.966239    0.977778
Average recall      0.47114     0.480835    0.482276

References

1. Multiple Sclerosis available [online]. https://apps.who.int/iris/bitstream


2. Multiple sclerosis: the history of a disease. J. R. Soc. Med. 98(6):289. https://www.ncbi.nlm.
nih.gov/pmc/articles/PMC1142241
3. Multiple Sclerosis Report available [online]. https://www.who.int/mental_health/neurology/
Atlas_MS_WEB.pdf
4. Multiple sclerosis: facts, statistics, and you. Available https://www.healthline.com/
5. Prevalence and incidence of multiple sclerosis available [online]. https://www.mstrust.org.uk/
a-z/prevalence-and-incidence-multiple-sclerosis
6. Myelin sheath available [online]. https://medlineplus.gov/ency/article/002261.htm.
7. Uijlings JRR, van de Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for
object recognition. Int J Comput Vision 104(2):154–171
8. Magnetic resonance imaging (MRI) of the brain and spine: basics available. https://casemed.
case.edu/clerkships/neurology/NeurLrngobjectives/MRI.htm
9. Gunawardena KANNP, Rajapakse RN, Kodikara ND (2017) Applying convolutional neural
networks for pre-detection of Alzheimer’s disease from structural MRI data. In: 2017 24th
international conference on mechatronics and machine vision in practice (M2VIP), Auckland,
2017, pp 1–7. https://doi.org/10.1109/M2VIP.2017.8211486
10. Shahar A, Greenspan H (2004) A probabilistic framework for the detection and tracking in
time of multiple sclerosis lesions. In: 2004 2nd IEEE international symposium on biomedical
imaging: nano to macro (IEEE Cat No. 04EX821), Arlington, VA, USA, 2004, pp 440–443,
vol 1. https://doi.org/10.1109/ISBI.2004.1398569
11. Loizou CP, Murray V, Pattichis MS, Seimenis I, Pantziaris M, Pattichis CS (2011) Multi-scale
amplitude modulation-frequency modulation (AM-FM) texture analysis of multiple sclerosis
in brain MRI images . IEEE Trans. Inform. Tech. Biomed. 15(1):119–129
12. Loizou CP, Kyriacou EC, Seimenis I, Pantziaris M, Petroudi S, Karaolis M, Pattichis CS (2013)
Brain white matter lesion classification in multiple sclerosis subjects for the prognosis of future
disability. Intell. Decis. Technol. J. (IDT) 7:3–10 (2013)
13. Loizou CP, Pantziaris M, Pattichis CS, Seimenis I (2013) Brain MRI image normalization in
texture analysis of multiple sclerosis. J. Biomed. Graph. Comput. 3(1):20–34
14. Loizou CP, Petroudi S, Seimenis I, Pantziaris M, Pattichis CS (2014) Quantitative texture
analysis of brain white matter lesions derived from T2-weighted MR images in MS patients
with clinically isolated syndrome. J Neuroradiol 2015 Apr 42(2):99–114
15. Manoharan S, Ponraj N (2019) Precision improvement and delay reduction in surgical telerobotics. J Artif Intell 1(01):28–36
16. Zitnick CL, Dollar P (2014) Edge boxes: locating object proposals from edges. In: European
conference on computer vision, Zurich, Switzerland, Sept 2014, pp 391–405
17. LeCun Y, Bengio Y, Hinton G (2015) Deep Learning. Nature 521:436–444 (2015)
18. Smys S, Ranganathan G (2019) Robot assisted sensing, control and manufacture in automobile industry. J ISMAC 1(03):180–187

19. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of advances in neural information processing systems, vol 25, pp 1090–1098
20. Ram S, Gupta S, Agrawal B (2018) Devanagari character recognition model using deep convo-
lution neural networks. J. Stat. Manag. Syst. 21(4):593–599. https://doi.org/10.1080/09720510.
2018.1471264
21. MATLAB R2020a, The MathWorks, Inc., Natick, Massachusetts, United States
22. Ettinger GJ, Grimson WEL, Lozano-Perez T, Wells WM, White SJ, Kikinis R (1994) Automatic
registration for multiple sclerosis change detection. In: Proceedings of IEEE workshop on
biomedical image analysis, Seattle, WA, USA, pp 297–306. https://doi.org/10.1109/BIA.1994.
315885
23. Ren S, He K, Gershick R, Sun J (2017) Faster R-CNN: towards real-time object detection with
region proposal networks. IEEE Trans Patt Anal Mach Intell 39(6):1137–1149
24. Girshick R (2015) Fast R-CNN. In: Proceedings of the 2015 IEEE international conference on computer vision, Santiago, Chile, Dec 2015, pp 1440–1448
25. Solanki D, Ram S. Object detection and classification through deep learning approaches
26. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object
detection and semantic segmentation. In: Proceedings of the 2014 IEEE conference on computer
vision and pattern recognition, Columbus, OH, June 2014, pp 580–587
27. Ezhilarasi R, Varalakshmi P (2018) Tumor detection in the brain using faster R-CNN. In: Proceedings of the second international conference on I-SMAC
28. Avsar E, Salcin K (2019) Detection and classification of brain tumors from MRI images using faster R-CNN. Tehnički Glasnik 13(4):337–342
ımages using faster R-CNN. Tehnıckı Glasnık 13(4):337–342
29. Loizou CP, Kyriacou EC, Seimenis I, Pantziaris M, Petroudi S, Karaolis M, Pattichis CS (2013)
Brain white matter lesion classification in multiple sclerosis subjects for the prognosis of future
disability. Intell Decis Technol J (IDT) 7:3–10
Retaining Named Entities for Headline
Generation

Bhavesh Singh, Amit Marathe, Ali Abbas Rizvi, and Abhijit R. Joshi

Abstract Text summarization is a way of generating a more succinct block of text which retains noteworthy data and the general significance of the source text. This paper overcomes some of the concerns of three existing text summarization architectures (transformer, pointer generator, attention), specifically used for converting circumlocutory news articles into crisp headlines. The dataset was scraped from a popular news Web site, Inshorts, where a variety of news articles and headlines are readily available (Inshorts.com in Breaking news headlines: read all news updates in English—Inshorts, 2020 [1]). A major challenge in headline summarization is the named entities, viz. names of people, places, products, etc., which are lost in this process. These named entities are important for better headline summarization. In this paper, an approach is proposed that preserves these named entities in the output headline, resulting in a better summarization. The paper also presents an evaluation of these abstractive approaches based on ROUGE metrics.

Keywords Summarization · Inshorts · Seq2Seq · Transformer · Pointer generator · Attention

B. Singh (B) · A. A. Rizvi · A. R. Joshi


Department of Information Technology, D.J. Sanghvi College of Engineering, Mumbai, India
e-mail: bhavesh.singh@djsce.edu.in
A. A. Rizvi
e-mail: aliabbas.rizvi@djsce.edu.in
A. R. Joshi
e-mail: abhijit.joshi@djsce.ac.in
A. Marathe
Department of Electronics and Telecommunication, Xavier Institute of Engineering, Mumbai,
India
e-mail: amitmarathe13@gmail.com


1 Introduction

Due to an exponential rise in the amount of data available on the Web, it can get really difficult for an individual to collect relevant information. It not only takes a lot of effort to read through this enormous data, but also consumes a lot of valuable time. A challenging task is to obtain as much of the relevant information as possible in the least amount of time. A need arises for tools that can quickly brief or summarize the important points of a large document, to ensure that human effort and time are saved.
Majorly, there are two approaches followed for text summarization, namely
extractive and abstractive summarization. The former uses the content verbatim and
reforms it into a shorter version, while the latter uses an internal language repre-
sentation to generate more human-like summaries, paraphrasing the content of the
original text.
Natural language processing (NLP) is a huge domain, and one of its branches extends down to text processing. One of the main applications under text processing is text summarization. A large number of research works in this field use a sequence-to-sequence model, a deep neural network technique, which gives an excellent output accuracy on well-trained data.
As technology and science have advanced to such a great extent, and with the sudden boom in the fields of AI and ML, automation of work has been undertaken in several domains; lately, this has been adopted by a few news companies for news summarization. However, in such applications of text summarization, a major problem that arises is retaining named entities. News headlines, report summaries and all similar use cases have an important requirement of not losing important facts and figures during summarization.
Inshorts [1], a company which built an app that offers the latest news stories, has recently developed a new algorithm, Rapid60, that can automatically summarize a full-length news article into a 60-word news brief, also creating a headline and a card image.
This paper explores three models of text summarization. The first is a vanilla encoder–decoder model with the recurring unit being an LSTM layer for long-term dependencies [2]. This model is useful in figuring out the impact of every word from the input article on the headline, also called attention [3]. In this model, one can see the encoder as a mechanism that captures the text of an article, which is then decoded to generate the headline.
The second model, the pointer generator, has a similar architecture with a mechanism that points at words directly from the context. As the named entities are crucial to the credibility of the news, this system points the words that affect the semantics significantly from the article directly to the headline, to better preserve the meaning of the article [4].
The transformer, a completely different model, uses the concept of self-attention and parallelization rather than recurrence for finding dependencies and boosting

speed. This model has achieved good results in many NLP tasks, and its n-headed self-attention is a desirable feature for the summarization task at hand [5].
In this paper, the focus is on the evaluation of the accuracy of generated headlines using the above-mentioned text summarization models. In addition to this, a new method is proposed for retaining named entities, which are useful in generating accurate headlines [6].
The rest of the paper is organized as follows. Section 2 throws light on the related works carried out on text summarization. Section 3 explores the proposed approach along with the design and implementation details of the system. Section 4 presents an analysis of the results obtained from the system, evaluated based on ROUGE metrics. The paper ends with a conclusion and directions for future work.

2 Literature Survey

Apart from being dependent on an optimal function, text summarization also relies on a sentence similarity measure to a certain extent, which can significantly improve the efficiency of abstractive summarization techniques. Masum et al. modified existing models and algorithms to generate headlines [7]. In addition to this, some further processing, like forming classifications for named entities, has been carried out by them to improve system accuracy and minimize possible problems [7].
Hanunggul and Suyanto presented a comparison between two types of attention: global and local. They observed that a larger number of words pertaining to the original summary were produced by the global attention-based model, while the local attention-based model produced more sets of words from the original summary. The reason behind such an outcome is that subsets of input words get considered instead of the entire input in the local attention implementation [8].
In 2014, Sutskever et al. made the first attempt to summarize text using recurrent networks based on an encoder–decoder architecture [2]. Later in 2016, the model designed by Luong et al. on the famous CNN/Daily Mail dataset was the stepping stone for abstractive summarization [3]. The next model to give promising results was the pointer generator [4], a hybrid model that not only points words directly to the summary but also generates new ones from the vocabulary.
Masum et al. developed an efficient way of summarizing text using sequence-to-
sequence RNNs [9]. They proposed a method of successfully reducing the training
loss that occurs while training the model. The steps involved in their methodology
include data preprocessing, counting the vocabulary size, adding word embeddings and passing them to the encoder–decoder layer with LSTM. One of the
limitations of their work is that it does not provide good results for large text inputs.
A paradigm shift in natural language processing occurred with the introduction
of the transformer [5]. It uses self-attention as a way to find dependencies instead of
recurrence. This design gave state-of-the-art results on a plethora of NLP tasks and
was efficient due to the parallelization in the architecture.

2.1 Observations on Existing Work

Following are the observations on existing works:


• Named entities are not retained after summarization, which is mainly because word embeddings for named entities are not easily available.
• The same words appear consecutively in the summarized output.
• The generated summarizations are sometimes grammatically incorrect.
• LSTM is computationally inefficient as it processes text word by word instead of
processing them simultaneously.
• Very large text sources produce meaningless summarizations because the long-term dependencies in sentence formation are not preserved. For instance, to develop software for predicting the next word in a sentence, the Seq2Seq network must have good knowledge of the words prior to it. If there are too many words prior to it, then the dependency on the word that occurred earliest in the sentence would be very low.
Considering the need for better headline summarization from news articles, it was decided to explore this domain of text summarization and try to address the problems mentioned above. The details of the proposed methodology are presented in the next section.

3 Methodology

In this section, the approach followed in the development of the system is presented in detail. This provides a clearer view of the proposed system through diagrammatic representations and explanations.

3.1 Data Processing

If there is a lot of unnecessary and redundant information present, or noisy and unreliable data, then knowledge discovery during the training phase turns out to be more difficult. Analyzing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data come first and foremost before working on any model.
Data preprocessing is the first step in the overall process, which starts with data collection followed by cleaning the dataset, word embeddings, identifying and classifying named entities and tokenization, as shown in Fig. 1. Let us see each of these steps in data preprocessing in detail.
Getting the Dataset: It is axiomatic that for any model to be successful, the dataset
needs to be of good quality and at the same time, as clean as possible. Two major
Fig. 1 Data preprocessing steps
datasets were identified after scrutiny, namely, the CNN daily mail news dataset and
the Amazon food review dataset. The main challenge in this model was summarizing
the dataset into a short, crisp headline, which also retained most of the important
information. It was found that the Amazon food review dataset was not fit for the
model because the labels contained only 2–3 words which did not meet the criteria
of a headline. Some marked inconsistencies were observed in the CNN daily mail
dataset where the data present was in the form of summaries of the article instead of
headlines, and thus, both datasets were discarded. Eventually, a dataset was created
using Web scraping where multiple articles and their headlines were acquired from
Inshorts [1]. The acquired news was not restricted to any single domain; rather, it consisted of a profusion of domains, viz. sports, business, politics, fashion, health,
etc. The final dataset holds 64,094 news articles along with their headlines, which
were preprocessed and split into training, validation and test sets, respectively.
Cleaning the Dataset: After acquiring the dataset, it is cleaned to get the best possible
accuracy on the model. The Web scraped dataset contained various ambiguous char-
acters, and thus for cleaning, these were processed with the use of regular expres-
sions. The dataset had frequent occurrences of “...”, new lines which were replaced
by a simple space. It is a common practice to dump punctuations before training
the models. Instead, for training this model, these punctuations are considered as
individual words as they play a vital role in the semantic and impact of the headline.
Thus, pretrained word embeddings file contained the embeddings of the punctua-
tions as well, which helps in improving the quality of the language of the generated
headline. Punctuations and symbols like (“.”, “!”, “?”, “:”, “;”, “*”, “,”, “|”, “/”)
are kept, and every word/character is left space separated.
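
As a rough illustration of this cleaning step (the regular expressions and function name below are assumptions for the sketch, not the authors' code), the replacements described above might look like this in Python:

```python
import re

# Punctuation retained as separate tokens, following the set listed above (illustrative).
KEPT_PUNCT = r".!?:;*,|/"

def clean_text(text: str) -> str:
    """Normalise a scraped article/headline string."""
    text = text.replace("\n", " ")                       # newlines -> space
    text = re.sub(r"\.{3,}|…", " ", text)                 # runs of dots / ellipsis -> space
    # Space-separate the retained punctuation so each symbol becomes its own token.
    text = re.sub(f"([{re.escape(KEPT_PUNCT)}])", r" \1 ", text)
    text = re.sub(r"\s+", " ", text)                      # collapse repeated whitespace
    return text.strip()

print(clean_text("India reports 941 new cases... officials said: stay home!"))
# -> "India reports 941 new cases officials said : stay home !"
```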
Word Embeddings: The word embeddings used for the model are acquired from
the Stanford NLP Web site. The famous Global Vectors for word Representations
or Glove [10] embeddings each of dimensionality 300 are used. These vectors are
trained using word co-occurrence statistics on the Common Crawl corpus. The word
embeddings of more than 94% of the words of the dataset’s vocabulary are present in
the glove.840B.300d.zip file (This number is achieved only after the steps to handle
the named entities are taken, as mentioned in the next paragraph). Random vectors of
size 300 are assigned to the words whose embeddings are not present. The embedding
layer of the model is given these pretrained vectors and is set to further train them to
figure out the latent meanings of these words with randomly assigned embeddings
and further refine the ones that already existed.
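
A minimal sketch of how the 300-dimensional GloVe vectors could be loaded and how random vectors might be assigned to words without embeddings; the file name follows the glove.840B.300d release mentioned above, while the function and variable names are illustrative:

```python
import numpy as np

EMB_DIM = 300

def build_embedding_matrix(vocab, glove_path="glove.840B.300d.txt", seed=0):
    """Return a (len(vocab), 300) matrix: GloVe vector if available, else a random one."""
    rng = np.random.default_rng(seed)
    glove = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if len(vec) == EMB_DIM:                 # skip malformed lines
                glove[word] = np.asarray(vec, dtype=np.float32)
    matrix = np.zeros((len(vocab), EMB_DIM), dtype=np.float32)
    for idx, word in enumerate(vocab):
        # Words absent from GloVe get a random vector that the embedding layer can refine.
        matrix[idx] = glove.get(word, rng.normal(scale=0.1, size=EMB_DIM))
    return matrix
```

The resulting matrix can then initialise a trainable embedding layer, as the paragraph above describes.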
Identifying the named entity: An important issue that arises when using abstractive
text summarization, especially dealing with news datasets, is that the named entities
are abundant and not well retained during summarization which can compromise the
credibility of the news. This happens because named entities occur in large numbers
and comprise the majority of the vocabulary size. With each one being assigned a
random word embedding, there are simply not enough occurrences of that named entity for the model to learn the embedding. To tackle this problem, all the named entities
are categorized into 19 disparate classes as shown in Table 1, and each one is assigned
a token which would replace the named entity in the sentence. Using the open-source
spaCy library, for advanced natural language processing [11], various named entities

Table 1 Description of named entities [11]
Named entities Description
PERSON People, including fictional
NORP Nationalities or religious or political groups
FAC Buildings, airports, highways, bridges, etc.
ORG Companies, agencies, institutions, etc.
GPE Countries, cities, states
LOC Non-GPE locations, mountain ranges, water bodies
PRODUCT Objects, vehicles, foods, etc.
EVENT Named hurricanes, wars, sports events, etc.
WORK OF ART Titles of books, songs, etc.
LAW Named documents made into laws
LANGUAGE Any named languages
DATE Absolute or relative dates or periods
TIME Times smaller than a day
PERCENT Percentage, including “%”
MONEY Monetary values, including unit
QUANTITY Measurements, as of weight or distance
ORDINAL “First”, “second”, etc.
CARDINAL Numerals that do not fall under another type
INTERNET Web sites and Hashtags

in the dataset are identified and replaced by the category tokens [12]. The ones which are not identified are found manually and processed using regular expressions. With the entire set of named entities reduced down to 19 categories, the vocabulary size and the number of unknown embeddings reduced radically. This ensured that the tokens assigned to the categories would occur frequently, and their embeddings captured the latent meanings of the tokens. A dictionary is assigned to
every preprocessed article with the keys being the detected categories of the named
entities, and their corresponding values are lists of named entities of that category.
This is done so that the named entities could be replaced in the headline effortlessly.
After all these preprocessing steps, a clean dataset is ready, which would help in
obtaining a more accurate model to generate the best possible patterns and results for
a given input. The only thing left to do now is to replace the tokens in the generated
headline with the named entities of the article of the same category [13].
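
A compact sketch of this masking step using the open-source spaCy library; the specific model name (en_core_web_sm) and the helper function are assumptions, while the <LABEL> token format follows the description above:

```python
import spacy
from collections import defaultdict

nlp = spacy.load("en_core_web_sm")   # any spaCy English pipeline with an NER component

def mask_named_entities(text):
    """Replace every named entity with its category token and keep a lookup dictionary."""
    doc = nlp(text)
    entity_dict = defaultdict(list)
    masked, last = [], 0
    for ent in doc.ents:
        entity_dict[ent.label_].append(ent.text)
        masked.append(text[last:ent.start_char])
        masked.append(f"<{ent.label_}>")          # e.g. <GPE>, <CARDINAL>, <DATE>
        last = ent.end_char
    masked.append(text[last:])
    return "".join(masked), dict(entity_dict)

article = "The coronavirus death toll in India has risen to 420."
masked, entities = mask_named_entities(article)
print(masked)     # with a typical English model: "... in <GPE> has risen to <CARDINAL>."
print(entities)   # e.g. {'GPE': ['India'], 'CARDINAL': ['420']}
```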
Table 2 shows an article and its headline, before and after replacing the named
entities. The named entities of the article are identified by spaCy. The second row
shows these named entities categorized depending on their type. The category tokens
are simply the category name encapsulated by “<” and “>”. This token then replaces the
named entity in the article and the headline. The named entities of the Article that
occur in the Headline are highlighted in italic.
Tokenization: The final step left before preprocessing is to create tensor objects
out of the pairs of articles and headlines. The “<SOS>” (start-of-sentence) and
“<EOS>” (end-of-sentence) tokens are added at the beginning and the end of each
article and headline, respectively. All articles and headlines are truncated to a length
of 80 and 25 characters, respectively (punctuations included). An additional “<PAD>
” (padding) token is added to the articles and headlines until they met the desired size
of 80 and 25, respectively. When there is an appearance of an unknown word during
testing, the “<OOV>” (out of vocabulary) token is used. But the preprocessing with
spaCy mitigates the use of an OOV token. Every word and every token are assigned
a number in the vocabulary which is then used to create the tensors that could be
taken by the embedding layer.
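
The padding and numericalization just described could be sketched as follows; whether the special tokens count toward the 80/25 limits is not stated, so the snippet simply assumes they do, and all names are illustrative:

```python
SOS, EOS, PAD, OOV = "<SOS>", "<EOS>", "<PAD>", "<OOV>"
MAX_ARTICLE, MAX_HEADLINE = 80, 25     # lengths used above

def encode(tokens, word2idx, max_len):
    """Truncate, wrap with <SOS>/<EOS>, pad to max_len and map words to vocabulary indices."""
    tokens = [SOS] + tokens[: max_len - 2] + [EOS]
    tokens += [PAD] * (max_len - len(tokens))
    return [word2idx.get(tok, word2idx[OOV]) for tok in tokens]

vocab = {tok: i for i, tok in enumerate([PAD, SOS, EOS, OOV, "india", "reports", "cases"])}
print(encode(["india", "reports", "5000", "cases"], vocab, MAX_HEADLINE))
# "5000" is unknown here, so it falls back to the <OOV> index; the list has length 25
```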

3.2 Architecture

The mechanisms of the three approaches, namely (1) the Seq2Seq with attention model, (2) the transformer and (3) the pointer generator model, are described in this section.
Sequence to sequence with Attention: It is a classic example of an encoder–decoder
model [2] with the encoder responsible for creating context vector representations
from the news articles that are provided and the decoder for generating a headline.
The headline is generated word by word, as the model calculates the attention given
to the encoder representations at every instant.
The Encoder: It consists of a trainable embedding layer for which the Glove [10]
300-dimensional word embedding is used. The recurrent layer is chosen to be of

Table 2 An article and its headline before and after the preprocessing steps and the corresponding named entity dictionary

Original text
Article: The total number of coronavirus cases in India has risen to 12,759 after over 5000 cases were reported in last five days. Meanwhile, the coronavirus death toll in India has risen to 420, while 1515 Covid-19 patients have been cured, discharged or migrated. Earlier today, the Health Ministry revealed that 941 new cases and 37 deaths were reported on Wednesday
Headline: India reports more than 5000 coronavirus cases in 5 days, total cases rise to 12,759

Recognition and classification of named entities
Article: GPE—[India], CARDINAL—[12,759, 5000, 420, 1515, 37], DATE—[5 days, Wednesday], TIME—[Earlier today], ORG—[the Health Ministry]
Headline: GPE—[India], CARDINAL—[5000, 12,759], DATE—[5 days]

After replacing the tokens
Article: The total number of coronavirus cases in <GPE> has risen to <CARDINAL> after over <CARDINAL> cases were reported in last <DATE>. Meanwhile, the coronavirus death toll in <GPE> has risen to <CARDINAL>, while <CARDINAL> Covid-19 patients have been cured, discharged or migrated. <TIME>, <ORG> revealed that <CARDINAL> new cases and <CARDINAL> deaths were reported on <DATE>
Headline: <GPE> reports <CARDINAL> coronavirus cases in <DATE>, total cases rise to <CARDINAL>

bi-directional long short-term memory networks (LSTM) in order to better preserve long-term dependencies. The article is given to the encoder word by word, x_1 … x_j.
The hidden representations at every instant of the encoder are used to calculate the
attention later in the decoder part as shown in Fig. 2.
The Decoder: Every word of the headline (y_1, y_2, …) is generated one by one in the decoder as the attention mechanism calculates the attention over the encoder inputs [3]. At every instant of the decoder, for generating the new word, the significance e_jt (the importance of the jth input word at the tth time-step of the decoder) is calculated using the hidden states of the encoder h_j and the previous state of the decoder s_{t−1}.

Fig. 2 Encoder–decoder LSTM model with attention [3]

Equation 1 [3] gives the actual representation of the significance. The matrices U_att and W_att are used to project the vectors s_{t−1} and h_j, respectively, to the same dimension. V_att^T reduces the result to a scalar e_jt. Softmax is applied to the values of e_jt, which gives the attention weights (alphas). The output vectors at every instant are multiplied by their corresponding scaling factor and are added to form one vector.
The final dense layer is of the size of the vocabulary, and cross-entropy loss is the loss function. During training, there is a 50% chance of the predicted word being sent in as the next input, to implement teacher forcing. This makes the model not completely reliant on the proper input while training and helps it function
better in the testing scenarios.

$$e_{jt} = V_{att}^{T}\,\tanh\!\left(U_{att}\, s_{t-1} + W_{att}\, h_{j}\right) \qquad (1)$$
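
A short PyTorch-style sketch of the additive attention in Eq. (1); layer names and dimensions are illustrative rather than taken from the authors' implementation:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Computes e_jt = V^T tanh(U s_{t-1} + W h_j) and the softmax weights (alphas)."""
    def __init__(self, enc_dim, dec_dim, att_dim):
        super().__init__()
        self.U = nn.Linear(dec_dim, att_dim, bias=False)   # projects decoder state s_{t-1}
        self.W = nn.Linear(enc_dim, att_dim, bias=False)   # projects encoder states h_j
        self.V = nn.Linear(att_dim, 1, bias=False)         # reduces to a scalar e_jt

    def forward(self, s_prev, enc_states):
        # s_prev: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.V(torch.tanh(self.U(s_prev).unsqueeze(1) + self.W(enc_states)))
        alphas = torch.softmax(scores.squeeze(-1), dim=-1)               # (batch, src_len)
        context = torch.bmm(alphas.unsqueeze(1), enc_states).squeeze(1)  # weighted sum
        return context, alphas

attn = AdditiveAttention(enc_dim=512, dec_dim=512, att_dim=256)
ctx, a = attn(torch.randn(2, 512), torch.randn(2, 7, 512))
print(ctx.shape, a.shape)   # torch.Size([2, 512]) torch.Size([2, 7])
```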

The Transformer: The model (Fig. 3) chosen is the exact rendition of the one from the paper “Attention is all you need” [5]. The article is positionally encoded after every word is assigned its 300-dimensional embedding. This is important because the transformer does not rely on recurrence; hence, a notion of order is required for the model to understand sequence data.
Each side of the transformer consists of six encoders and decoders, having multi-
headed attention of eight heads for better focus over the article. For calculating

Fig. 3 Transformer model [5]

self-attention, a set of three matrices are multiplied with every input word producing
three vectors, i.e., query, key and value. The query and key vectors of every input
word are used to calculate constants that scale the value vector. This determines the
impact of every other word on the current word. This represents a single head of
attention, and n such sets of matrices are used to find n-headed attention.
The transformed vectors are added to the original ones and are followed by normalization. This occurs in each of the six encoders, which leads to a context-rich vector that is fed to every decoder in the decoder stack. The output of the
final decoder from the stack is fed to a dense layer of the size of the vocabulary to
predict the next word using the categorical cross entropy loss.
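
The single-head self-attention described above can be sketched as follows; this follows the generic scaled dot-product formulation of [5], with illustrative dimensions, rather than the authors' exact code:

```python
import math
import torch

def self_attention_head(x, Wq, Wk, Wv):
    """One attention head: scale the value vectors by query-key similarity."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                 # (seq_len, d_head) each
    scores = q @ k.T / math.sqrt(q.size(-1))          # impact of every word on every other word
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                                 # context-enriched representations

d_model, d_head, seq_len = 300, 64, 10
x = torch.randn(seq_len, d_model)                      # positionally encoded embeddings
Wq, Wk, Wv = (torch.randn(d_model, d_head) for _ in range(3))
print(self_attention_head(x, Wq, Wk, Wv).shape)        # torch.Size([10, 64])
```

In the multi-headed setting, n such heads are computed with separate projection matrices and their outputs concatenated, as the paragraph above notes.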
Pointer Generator: The paper [4] presents a new architecture for abstractive text
summarization that augments the standard sequence-to-sequence attentional model.
In this method, a hybrid pointer generator network is used that can not only
copy words from the source text via pointing, which aids accurate reproduction
of information, but also produces novel words through the generator. Further, the
generation probability p_gen ∈ [0, 1] for time-step t is calculated from the context vector h_t^*, the decoder state s_t and the decoder input x_t using Eq. 2 [4].

$$p_{gen} = \sigma\!\left(w_{h^{*}}^{T} h_{t}^{*} + w_{s}^{T} s_{t} + w_{x}^{T} x_{t} + b_{ptr}\right) \qquad (2)$$

where the vectors w_{h*}, w_s, w_x and the scalar b_ptr are learnable parameters and σ is the sigmoid function.
Now, this value of p_gen is used to determine whether the words should be picked from the article directly or from the original vocabulary distribution. One of the main advantages of the pointer generator model is its ability to produce out-of-vocabulary words; by contrast, other text summarization models are restricted to their pre-set vocabulary.
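
An illustrative computation of Eq. (2) and of the resulting blend of the vocabulary and copy distributions; all tensor names and sizes are assumptions for the sketch, not the reference implementation of [4]:

```python
import torch

def final_distribution(p_vocab, attn, src_ids, context, s_t, x_t, w_h, w_s, w_x, b_ptr):
    """Blend generation and copying: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on w."""
    p_gen = torch.sigmoid(context @ w_h + s_t @ w_s + x_t @ w_x + b_ptr)    # Eq. (2), scalar in [0, 1]
    dist = p_gen * p_vocab                                                   # generator part
    # Copy part: add the attention weight of each source position onto its word id.
    dist = dist.scatter_add(0, src_ids, (1 - p_gen) * attn)
    return dist

vocab_size, src_len, hid = 50, 6, 8
p_vocab = torch.softmax(torch.randn(vocab_size), dim=0)
attn = torch.softmax(torch.randn(src_len), dim=0)
src_ids = torch.randint(0, vocab_size, (src_len,))
out = final_distribution(p_vocab, attn, src_ids,
                         torch.randn(hid), torch.randn(hid), torch.randn(hid),
                         torch.randn(hid), torch.randn(hid), torch.randn(hid),
                         torch.tensor(0.0))
print(float(out.sum()))   # ~1.0: the blended result is still a valid probability distribution
```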

4 Result

This paper primarily focuses on addressing the issues in existing architectures of text summarization by extending them to improve the accuracy of the generated headlines. The emphasis on identifying the named entities provides better accuracy in news article summarization.
In this section, a walk-through of the system demonstrating the experimental results obtained is presented first. The section ends with an evaluation study using the ROUGE score to assess the efficiency of these models.
Walk-through of the system: During preprocessing, every article is assigned a
dictionary in which the keys are the named entity tokens and the values are lists of
named entities of that category. This step requires the spaCy library and is carried
out for every article that is preprocessed. Now, when the model predicts any of the
named entity tokens as the next word, it can be replaced with the named entities of
that article of the same category. If there exists more than one named entity of the
same category, then it simply finds the most suitable permutation with a sentence
similarity between the generated headline and the original article. Table 3 provides
an overall process of predicting the headline for a particular article.
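
A simplified sketch of this post-processing step: category tokens in the generated headline are filled with the article's entities of the same category, and where several candidates exist, the combination most similar to the article is kept. The token-overlap measure below is only a stand-in, since the paper's exact sentence-similarity function is not specified here:

```python
import re
from itertools import product

def token_overlap(a, b):
    """Crude stand-in for sentence similarity: Jaccard overlap of lowercase tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def fill_headline(headline, entity_dict, article):
    """Fill category tokens with the article's entities; keep the combination closest to the article."""
    slots = re.findall(r"<([A-Z_]+)>", headline)
    pools = [entity_dict.get(cat, [f"<{cat}>"]) for cat in slots]   # fall back to the token itself
    best, best_score = headline, -1.0
    for choice in product(*(range(len(pool)) for pool in pools)):
        candidate = headline
        for slot, cat in enumerate(slots):
            candidate = candidate.replace(f"<{cat}>", pools[slot][choice[slot]], 1)
        score = token_overlap(candidate, article)
        if score > best_score:
            best, best_score = candidate, score
    return best

entities = {"GPE": ["Rajasthan", "Nagpur"], "PERSON": ["Rahul Gandhi", "Ashok Gehlot"]}
article = "video of two young dalit men being brutally tortured in Nagpur, Rajasthan is horrific ... Rahul Gandhi"
print(fill_headline("Brutal torture of dalits in <GPE> horrific, sickening: <PERSON>", entities, article))
# -> Brutal torture of dalits in Rajasthan horrific, sickening: Rahul Gandhi
```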
Some more examples of the predicted headlines are shown in Tables 4 and 5.
Experimental Evaluation: In this section, a comparison between the various models is carried out and analyzed using the ROUGE metric, an abbreviation for recall-oriented understudy for gisting evaluation. It is a mechanism for analyzing automated summarization of text and also machine translation. Basically, it compares the generated headlines to the original headlines. In Table 6, the average of the F1 scores for ROUGE-1, ROUGE-2 and ROUGE-L is shown, which measures the word-overlap

Table 3 Obtaining usable results

Article: Congress leader Rahul Gandhi has said that the recent video of two young dalit men “being brutally tortured in Nagpur, Rajasthan is horrific and sickening” and urged immediate action. Meanwhile, Rajasthan CM Ashok Gehlot said, “Seven accused have been arrested … we will ensure that the victims get justice”. The two men were allegedly beaten up on the suspicion of stealing money
Predicted headline: Brutal torture of dalits in <GPE> horrific, sickening: <PERSON>
After replacing with named entities of the same category: Brutal torture of dalits in [Rajasthan, Nagpur] horrific, sickening: [Rahul Gandhi, Ashok Gehlot]
Best permutation according to sentence similarity: brutal torture of dalits in Rajasthan horrific, sickening: Rahul Gandhi

Table 4 Example 1
Original article Delhi-based diabetes management app BeatO has raised over
11 crore in a pre-Series A funding round led by Orios Venture
Partners. The funding round also saw participation from
existing investors Blume Ventures and Leo Capital. Founded
in 2015 by Gautam Chopra, Yash Sehgal and Abhishek
Kumar, BeatO offers diabetes management programmes to
users via a smartphone app
Original headline Diabetes management app BeatO raises 11 crore led by Orios
Transformer headline Diabetes management app BeatO raises 11 crore led by Orios
Pointer generator headline Diabetes management app BeatO raises 11 crore in series by
Seq2Seq with attention headline Diabetes management app

Table 5 Example 2
Original article The TMC is leading in Kharagpur Sadar and Karimpur seats
in the West Bengal Assembly by poll. Meanwhile, BJP is
leading in the Kaliaganj seat. The Kaliaganj by poll was
necessitated following the death of sitting Congress MLA
Pramatha Nath Roy, while Kharagpur Sadar and Karimpur
seats had fallen vacant after the sitting MLAs were elected as
MPs in the LS polls
Original headline TMC leading in 2 of 3 seats in West Bengal by poll
Transformer headline TMC leading in 2 seats in West Bengal by poll
Pointer generator headline TMC leading in 2 TMC seats in West Bengal assembly
Seq2Seq with attention headline TMC TMC company seats in polls

Table 6 Comparison of results on the basis of ROUGE metrics
Architectures ROUGE-1 ROUGE-2 ROUGE-L
Transformer 0.335 0.162 0.521
Pointer generator 0.369 0.157 0.493
Seq2Seq with attention 0.216 0.091 0.225

(unigram, bigram, etc.) between the generated headlines and original headlines over
the entire test set.
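
For reference, such averaged F1 word-overlap scores can be obtained with the open-source rouge-score package (a stand-in choice; the paper does not state which ROUGE implementation was used). The example pair is taken from Table 4:

```python
# pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

references = ["Diabetes management app BeatO raises 11 crore led by Orios"]
predictions = ["Diabetes management app BeatO raises 11 crore in series by"]

totals = {"rouge1": 0.0, "rouge2": 0.0, "rougeL": 0.0}
for ref, pred in zip(references, predictions):
    scores = scorer.score(ref, pred)
    for key in totals:
        totals[key] += scores[key].fmeasure          # F1 of the word overlap

averages = {k: v / len(references) for k, v in totals.items()}
print(averages)
```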
From Table 6, it can be inferred that the pointer generator performs better than the transformer and Seq2Seq based on the ROUGE-1 metric, as the pointer generator has a special mechanism of pointing at single words directly from the article. But on comparing the other metrics, the transformer model outperforms the rest in preserving the semantics and also the named entities over the entire headline. The basic Seq2Seq model not only lacks a mechanism to point important words directly to the output, but also has no extensive self-attention architecture like the transformer. Hence, its ROUGE scores are low for both short- and long-term dependencies.

5 Conclusion and Future Work

This paper has presented an approach for adapting existing text summarization models to generate crisp headlines by taking news articles as input. It is observed that the Seq2Seq with attention and pointer models have a problem of repetitions occurring during headline generation. Even so, the pointer model has been found to perform well under most circumstances. The transformer model has been seen to give the
best results out of all three. A new technique for retaining important named entities
has been presented here and produces more natural and meaningful headlines. The
proposed system would be a stepping stone toward automating the process of foolproof headline generation, to be used in the latest automated AI-based news platforms
like Inshorts. Also, the same system can be modified and trained for similar, but other
use cases like legal document analysis, stock market prediction based on news or
summarization of customer feedback on products, where retaining named entities is
essential.

References

1. Inshorts.com (2020) Breaking news headlines: Read All news updates in English—Inshorts.
Available at: https://inshorts.com/en/read. Accessed 4 August 2020
2. Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks
3. Luong M-T, Pham H, Manning CD (2015) Effective approaches to attention-based neural
machine translation
4. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator
networks
5. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I
(2017) Attention is all you need
6. ETtech.com (2020) Inshorts debuts ai-based news summarization on its app—
Ettech. Available at https://tech.economictimes.indiatimes.com/news/startups/inshorts-debuts-
aibased-news-summarization-on-its-app/64531038. Accessed 4 Aug 2020
7. Masum KM, Abujar S, Tusher RTH, Faisal F, Hossain SA (2019) Sentence similarity measure-
ment for Bengali abstractive text summarization. In: 2019 10th international conference on
computing, communication and networking technologies (ICCCNT), Kanpur, India, 2019, pp
1–5. https://doi.org/10.1109/ICCCNT45670.2019.8944571
8. Hanunggul PM, Suyanto S (2019) The impact of local attention in LSTM for abstractive text
summarization. In: 2019 international seminar on research of information technology and
intelligent systems (ISRITI), Yogyakarta, Indonesia, 2019, pp 54–57. https://doi.org/10.1109/
ISRITI48646.2019.9034616
9. Mohammad Masum K, Abujar S, Islam Talukder MA, Azad Rabby AKMS, Hossain SA (2019)
Abstractive method of text summarization with sequence to sequence RNNs. In: 2019 10th inter-
national conference on computing, communication and networking technologies (ICCCNT),
Kanpur, India, 2019, pp 1–5. https://doi.org/10.1109/ICCCNT45670.2019.8944620
10. Pennington J, Socher R, Manning C (2014) Glove: global vectors for word representation.
EMNLP 14:1532–1543. https://doi.org/10.3115/v1/D14-1162
11. Spacy.io (2020) Industrial-strength natural language processing. Available at: https://spacy.io/.
Accessed 4 August 2020
12. Partalidou E, Spyromitros-Xioufis E, Doropoulos S, Vologiannidis S, Diamantaras KI (2019)
Design and implementation of an open source Greek POS Tagger and Entity Recognizer

using spaCy. In: 2019 IEEE/WIC/ACM international conference on web intelligence (WI),
Thessaloniki, Greece, 2019, pp 337–341
13. Li J, Sun A, Han J, Li C (2018) A survey on deep learning for named entity recognition. IEEE
Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2020.2981314
14. Janjanam P, Reddy CP (2019) Text summarization: an essential study. In: 2019 international
conference on computational intelligence in data science (ICCIDS), Chennai, India, 2019, pp
1–6. https://doi.org/10.1109/ICCIDS.2019.8862030
15. Partalidou E, Spyromitros-Xioufis E, Doropoulos S, Vologiannidis S, Diamantaras KI (2019)
Design and implementation of an open source Greek POS Tagger and entity recognizer
using spaCy. In: 2019 IEEE/WIC/ACM international conference on web intelligence (WI),
Thessaloniki, Greece, pp 337–341
16. Modi S, Oza R (2018) Review on abstractive text summarization techniques (ATST) for single
and multi-documents. In: 2018 international conference on computing, power and communi-
cation technologies (GUCON), Greater Noida, Uttar Pradesh, India, pp 1173–1176. https://
doi.org/10.1109/GUCON.2018.8674894
Information Hiding Using Quantum
Image Processing State of Art Review

S. Thenmozhi , K. BalaSubramanya, S. Shrinivas,


Shashank Karthik D. Joshi, and B. Vikas

Abstract The bottleneck of the digital image processing field narrows down to memory consumption and processing speed problems, which can be resolved by performing image processing in the quantum state. In this Internet era, all the information
is exchanged or transferred through the Web of things, which necessitates maintaining
the security of transmitted data. A variety of techniques are available to perform
secret communication. A quantum steganography scheme is introduced to conceal
a quantum secret message or image into a quantum cover image. For embedding of
secret data into the quantum cover, many algorithms like LSB qubits and QUALPI are available. This paper discusses secret data transmission using quantum image steganography.

Keywords Quantum image steganography · Quantum secure communication ·


Quantum log-polar image · Quantum image expansion · Lsbqubits

1 Introduction

Quantum computation uses the properties of quantum mechanics, such as entanglement and superposition, to store, process and retrieve data. Considering an electron as a basic element, it has two states, namely spin up (Bit 1) and spin down (Bit 0).
According to quantum mechanics, the total angular momentum of an electron can
be represented as a superposition of both spins up and spin down. The representation

S. Thenmozhi (B) · K. BalaSubramanya · S. Shrinivas · S. K. D. Joshi · B. Vikas


ECE, Dayananda Sagar College of Engineering, Bangalore, India
e-mail: thenmozhirayan@gmail.com
K. BalaSubramanya
e-mail: balasubramanya58464@gmail.com
S. K. D. Joshi
e-mail: shashankkarthikdjoshi92@gmail.com
B. Vikas
e-mail: vikas.bhargav01@gmail.com


of quantum bits is shown in Fig. 1. The concept of quantum computation was first
proposed by Richard Feynman in the year 1982. Shor’s quantum integer factoring
methodology, proposed in the year 1994, and a search algorithm proposed by L. K. Grover in the year 1996 (named after him) opened up a new and practical way of computation. By the dawn of the late 1990s, the development of quantum computing
became a hot topic in Information Science and Technology. Quantum information
hiding, a part of quantum computing, is divided into two parts, namely quantum
watermarking and quantum image steganography. Due to advancement in infor-
mation technology with tons of data being transferred through the Internet, it is
essential to have secure communication between the end users. This led to the rapid
development of quantum multimedia technology and quantum image steganography.
Quantum image steganography deals with information hiding inside an image in such
a way that the information is completely masked and the eavesdropper will never
know about its existence. Many quantum image steganography models have been
developed ever since [1]. In this work, the author proposed a Qubit lattice in the year
2003. Further, in the year 2010, he improved his model and proposed an intertwined
representation to store statistical information [2]. In [3], the authors projected a real ket
model of quantum steganography in the year 2005. A new Flexible Representation
for Quantum Images (FRQI) model was projected by the authors in the year 2011
[4]. So, for the first time, a method that took into consideration both position and intensity was proposed. In the year 2014, Yi Zhang et al. proposed a methodology called
novel enhanced quantum representation (NEQR) [5]. This method is similar to FRQI
except for the fact that the FRQI method considers only one qubit sequence, whereas NEQR considers a superposition of all the qubits. To enhance the perfor-
mance of quantum steganography, a novel method called quantum log-polar image
representation (QUALPI) was proposed by Yi Zhang et al. in the year 2013 [6].
The advantages of quantum steganography over conventional methods are briefly
described in Table 1.

Fig. 1 Representation of quantum bits: Bit 1, Bit 0 and a superposition of Bit 0 and 1

Table 1 Comparison of classical and quantum steganography

Conventional steganography | Quantum steganography
The basic memory unit in classical steganography is expressed in bits | The basic memory unit in quantum steganography is expressed in qubits
A bit can take on a value of ‘0’ or ‘1’ at a particular time | A qubit can take the superposition of ‘0’ and ‘1’ simultaneously
It is possible to represent only one of the 2^n states using ‘n’ bits | If there are ‘n’ qubits, it is possible to simultaneously represent all 2^n states
Classical steganography has many vulnerabilities; for example, a secret key used during the embedding process can be compromised | In quantum steganography, due to the principle of the non-cloning theorem, an exact copy of the secret key cannot be obtained
Computational speed in classical steganography is linear; taking the example of a database search, it takes a time complexity of O(n) to identify the required data | When a database search happens in a quantum computer, the time complexity involved is of the order of O(√n), which is possible by making use of Grover’s search algorithm
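
As a small numerical aside to the comparison above, an n-qubit register can be simulated classically as a vector of 2^n amplitudes, and a uniform superposition covers all of those basis states at once; the NumPy sketch below is illustrative and not tied to any specific paper cited here:

```python
import numpy as np

def uniform_superposition(n_qubits):
    """State vector of n qubits in an equal superposition of all 2^n basis states."""
    dim = 2 ** n_qubits
    return np.full(dim, 1 / np.sqrt(dim))          # H applied to every qubit of |0...0>

state = uniform_superposition(3)
print(len(state))                                   # 8 basis states represented simultaneously
print(np.isclose(np.sum(np.abs(state) ** 2), 1.0))  # probabilities sum to 1

n = 2 ** 20
print(n, int(round(np.pi / 4 * np.sqrt(n))))        # classical O(n) lookups vs roughly (π/4)√n Grover iterations
```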

2 Literature Review

Quantum image steganography has been implemented using a variety of techniques. They can broadly be divided based on (1) the embedding technique used and (2) the type of cover chosen. In this paper, the type of cover chosen is an image. The literature review was performed for an image cover, concentrating on the various embedding techniques.

2.1 LSQbits-Based Methods

The paper [7] proposed a new matrix-coded quantum steganography algorithm which made use of a quantum color image. Here, covert and quantum-secure communication is established by exploiting the good invisibility and higher embedding efficiency of matrix coding. Two embedding methods were
applied in this paper. The first being single pixel-embedded (SPE) coding where
three least significant qubits (LSQbs) of single quantum carrier image pixel were
embedded with two qubits of secret message. The second method used was multiple
pixels-embedded (MPsE) coding where three least significant qubits (LSQbs) of
different pixels of the carrier quantum image were embedded with two qubits of secret
message. The PSNR values of the embedding methods used here were found to be
higher than other methods referenced in the paper. Combining PSNR and histogram
analysis, it is shown that this protocol achieves very good imperceptibility. The
protocol is also shown to have good security against noises in the quantum channel
238 S. Thenmozhi et al.

and various attacks. The embedding efficiency and capacity of single-pixel embedded coding are shown to be 2.67 and 2^(2n+1), respectively, and those of MPsE are shown to be 2.67 and 2^(2n+1)/3. In [8], the authors proposed three quantum color image steganography algorithms which involved the “Least Significant bit” technique.
Algorithm one utilized a generic LSB technique. Here, information bits of secret
data were substituted in place of the pixel intensity’s LSB values. This utilized a
single image channel to hide secret information. Algorithm two made use of least
significant bit (LSB) Xoring technique and utilized a single image channel to cover
secret data. Algorithm three made use of two channels of the cover image to cover the
color image for hiding secret quantum data. As the number of channels increased, the
capacity of the third algorithm also increased. An image key was used in all the three
algorithms in the process of embedding the secret data and extraction of secret data.
The evaluation parameters considered here were invisibility, robustness and capacity.
The PSNR values observed in the first algorithm were around 56 dB, the second
algorithm was around 59 dB, and the third algorithm was around 52 dB. The quality
of the stego image obtained by making use of the second algorithm was better than the
other two. As the third algorithm made use of two channels of the cover image to cover
the secret data, the capacity was enhanced. The capacity of the third algorithm was 2 bits/pixel, whereas that of the other two algorithms was 1 bit/pixel. In the work proposed in [9], initially, a quantum carrier image was prepared using the “NEQR” model by
employing two-qubit patterns to accommodate the grayscale intensities as well as
the position of each pixel. In EMD embedding, a group of N pixels is formed, and every secret digit of the hidden message, belonging to a (2N + 1)-ary notation system, was embedded into that group. During this embedding of the secret
digit, a single pixel of the cover image alone might be modified or the cover image
pixels remain as such. If the cover pixel value was to be modified, then it was either
incremented or decremented by a unit value. This implies that for N cover pixels, (2N + 1) different transformations need to be performed to acquire the (2N + 1) possible values of a secret digit. The advantage of EMD embedding is that it provides good quality
of the image with a PSNR exceeding 52 dB. This algorithm achieves high embedding
efficiency, security and imperceptibility of the secret information. However, as ‘N’
becomes larger, the embedding rate reduces. In the paper [10], initially, a (2^n × 2^n)-sized cover image and a (2^(n−1) × 2^(n−1))-sized watermark image were modeled by the “Novel Quantum Representation of Colour Digital Images (NCQI)” model.
The watermark was scrambled into an unordered form through image preprocessing
technique to simultaneously change the position of the pixels while changing the
color pixel information based on the “Arnold transformation”. The (2^(n−1) × 2^(n−1))-sized scrambled watermark image with a gray intensity range of 24 qubits was subjected to expansion to acquire a (2^n × 2^n)-sized image with a gray intensity range of 6 qubits using the “nearest-neighbour interpolation” method. This watermark image
was embedded onto the carrier by LSB steganography scheme, by substituting the
least significant bits of the pixels of the cover image of three channels, i.e., red, green
and blue. In the meantime, a (2^n × 2^n)-sized key image with information of 3 qubits was also created to retrieve the actual watermark image. The extraction process is
just the inverse process of embedding. The PSNR value for the algorithm exceeds

54 dB, which indicates that the imperceptibility of the cover image is not affected
by the embedding of a watermark. The proposed scheme, thus, provides good visual
quality, robustness, steganography capacity and lower computational complexity.
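
The least-significant-(qu)bit substitution underlying several of the schemes above has a direct classical analogue; the toy NumPy sketch below shows LSB embedding and extraction on an 8-bit grayscale cover (illustrative only, not the quantum-circuit realisation used in these papers):

```python
import numpy as np

def embed_lsb(cover, bits):
    """Overwrite the least significant bit of the first len(bits) pixels with message bits."""
    stego = cover.copy().ravel()
    stego[: len(bits)] = (stego[: len(bits)] & 0xFE) | bits   # clear the LSB, then set it
    return stego.reshape(cover.shape)

def extract_lsb(stego, n_bits):
    """Read the message back from the pixel LSBs."""
    return stego.ravel()[:n_bits] & 1

cover = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # toy 8-bit grayscale cover
secret = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
stego = embed_lsb(cover, secret)
print(np.array_equal(extract_lsb(stego, len(secret)), secret))   # True: message recovered
print(int(np.abs(stego.astype(int) - cover.astype(int)).max()))  # at most 1 per pixel, hence high PSNR
```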

2.2 FRQI-Based Methods

This work [11] proposes three strategies which involve the design of new geometric transformations performed on quantum images. The proposed design focused on
affected regions in an image, separability and smooth transformations by representing
an image in a quantum computer. The first strategy considered transformations that
considered parts of a quantum image. More controls were added to show information
about parts present in a quantum image. The second method took the separability
present in classical operation to transformation in the quantum state. By making use
of the flexible representation for quantum image (FRQI) model, it was feasible to
examine and define separable and geometric transformations. Third method aimed at
the transformations taking place smoothly. Multi-level controls which were used by
the cyclic shift transformations were the primary technique in obtaining smooth trans-
formation. The methods proposed in the paper provided top-level tools for expanding
the number of transformations required for building practical applications dealing
with image processing in a quantum computer. It is also shown that the design of a
quantum circuit with lower complexity for inconsistent geometric transformations is feasible. In [6], the authors proposed FRQI, a method in which images
are mapped onto its quantum form, in a normalized state which captures information
about colors and positions. The quantum image compression algorithm starts with
the color group. From the same color group, Boolean min-terms are factored. By
combining all the min-terms, a min-term expression is created. In the next step, the
minimization of min-terms is done. At the final step, minimized Boolean expression
of output is obtained. The paper has three various parameters evaluated depending
upon the unitary transformation on FRQI dealing only with color images, color
images with its current positions and a union of both color and pixel points. Consid-
ering the application of QIC algorithm on a unit-digit binary image, the compression
ratios vary from 68.75 to 90.63%, and considering a gray image, the value varies
from 6.67 to 31.62%. The paper [12] focuses on estimating the similarity between
quantum images based on probabilistic measurements. The similarities between the
two images were determined by the possibility of amplitude distribution from the
quantum measurement process. The methodology utilized in this paper for repre-
senting the quantum state is FRQI. The obtained quantum image was then passed
on through a Hadamard gate to recombine both the states, and then, it is followed
by quantum measurement operation. The result of the measurement was dependent
on the differences in the two quantum images present in the strip. The probability of
getting a 0 or 1 was dependent on the pixel differences among the two quantum images
in the strip, and this was determined through quantum measurements. Comparing
a 256 × 256 grayscale original image with the same size watermarked image, the

similarity value was found to be 0.990. When the same-sized darkened image was compared with the original image, the similarity was found to be 0.850. Hence, the similarity between two images is higher when the similarity value is nearer to one. The protocol used in [4] enhances the existing FRQI model by using a qubit sequence to store the grayscale information. This method
starts by converting the intensity values of all the pixels into a ket vector. Then, a
tensor product of respective position and its intensity was done to form a single qubit
sequence, thereby, successfully converting a traditional image into a quantum image.
At the receiver, a same but inverse operation called quantum measurement is done
to retrieve back the classical image. The computational time of NEQR is found to be very low. The compression ratio was also found to be better than that of FRQI.
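
To make the NEQR idea concrete, the sketch below classically simulates an NEQR-style state for a tiny grayscale image, placing each pixel's 8-bit intensity and position into one basis state of an equal superposition (NumPy illustration, not the circuit construction of [4]):

```python
import numpy as np

def neqr_state(image):
    """Classical simulation of an NEQR-style encoding: an equal superposition over pixels,
    each basis state |value>|position> holding the 8-bit intensity and the pixel location."""
    rows, cols = image.shape
    q_pos = int(np.log2(rows)) + int(np.log2(cols))      # position qubits
    state = np.zeros(2 ** (8 + q_pos))                    # 8 intensity qubits + position qubits
    for r in range(rows):
        for c in range(cols):
            index = (int(image[r, c]) << q_pos) | (r * cols + c)
            state[index] = 1 / np.sqrt(rows * cols)
    return state

img = np.array([[0, 128], [200, 255]], dtype=np.uint8)
psi = neqr_state(img)
print(np.count_nonzero(psi), round(float(np.sum(psi ** 2)), 6))   # 4 pixels encoded, norm 1
```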

2.3 NEQR-Based Methods

This paper deals with a methodology for hiding a grayscale image into a cover image
[13]. An (n/2 × n/2)-sized secret grayscale image with a gray intensity value of 8-
bits was expanded into a (n × n)-sized image with a gray value of 2-bits. This secret
gray image and a (n × n)-sized cover image were represented using NEQR model
which stores the color information and position of every pixel in the image. The
obtained secret image, in quantum form, was scrambled using “Arnold Cat map”
before starting the process of embedding. Later, the quantum secret image, which
underwent scrambling, was embedded onto a cover image in quantum form using two
“Least Significant Qubits (LSQb)”. The process of extracting requires the stegano-
graphic image alone to extract the secret image embedded. This scheme achieves high
capacity, i.e., 2-bits per pixel which is significantly higher compared to other schemes
in the field of quantum steganography. The security of this scheme is enhanced since
this method involves scrambling the image before embedding. The PSNR achieved
by this scheme accounts to a value around 43 dB which is higher when compared with
Moiré pattern-based quantum image steganography, but less when compared with
other LSB techniques. In this proposed paper, initially, the image to be encrypted
was mapped onto a NEQR model which stores pixel values and pixel positions in an
entangled qubit sequence [5]. A chaotic map called logistic map was used to generate
chaotic random sequences. The process of encryption of the carrier image includes
three stages, namely, intra bit permutation and inter bit permutation and chaotic
diffusion. The intra bit permutations and inter bit permutations were operated on the
bit planes. The intra bit permutation was accomplished by sorting a chaotic random
sequence, which modified the position of the bits, while the pixel weight remained
the same. As the percentage of bit 0 and bit 1 was roughly the same in each and
every bit plane, all the bits were uniformly distributed due to the permutation oper-
ations. The inter bit permutation was operated between different bit planes, which,
simultaneously, modified the grayscale information as well as the information of the
pixel. This was achieved by choosing two-bit planes and performing Qubit XOR
operations on them. Finally, a chaotic diffusion procedure was put forth to retrieve

the encrypted text image, which was facilitated using an XORing of the quantum
image. The chaotic random sequence generated from a logistic map determined the
controlled-NOT gates, which was significant in realizing the XOR operations. The
parameters to the logistic map were found to be sensitive enough to make the keyspace
value large enough. Larger the keyspace value, the more difficult it is to perform the
brute-force attack. This methodology not just altered the grayscale intensities and the
positions of the pixels, yet, in addition, the bit distribution was observed to be more
uniform progressively. According to the simulation output, the proposed technique
was found to be more proficient than its classical equivalent. The security accom-
plished is confirmed by the measurable examination, the sensitivity of the keys and
keyspace investigation. When compared with the classical image cipher techniques,
mathematical entanglement of the proposed approach was found to be lesser. The
PSNR value for a grayscale image of size 256 × 256 was found to be 8.3956 dB
as opposed to an image cipher algorithm implemented using no linear chaotic maps
and transformation whose value was found to be 8.7988 dB [14]. This paper intro-
duces a novel, keyless and secure steganography method for quantum images dealing
with Moiré pattern. Here, the proposed methodology consists of two steps. Initially,
they carried out the embedding operation where a secret image was embedded onto
a preliminary Moiré grating of the original cover image which resulted in Moiré
pattern. Here, the preliminary Moiré grating was modified in accordance with the
secret image to result in a final Moiré pattern. The workflow of the embedding oper-
ation consisted of three steps. First, a preliminary Moiré grating was under consid-
eration, and the user had the flexibility in choosing the same. Second, a deformation
operation was performed to generate a Moiré pattern by making use of the prelimi-
nary grating and the image which was needed to be hidden. Finally, denoising was
performed which transformed the obtained Moiré pattern to a steganographic image.
The second phase of the methodology dealt with the extraction of the secret image
by making use of the preliminary grating and an obtained Moiré pattern. Evaluation
parameters considered here were visual effects and robustness. PSNR was performed
in displaying the steganography scheme’s accuracy. Even though the PSNR value
was observed to be around 30 dB, not much noticeable change was found between
the cover image and stego image. For the sake of understanding robustness of the
proposed scheme, the addition of salt & pepper noise with various densities was done
to stego image. The secret image extracted was easily identifiable and robust against
the addition of the salt and pepper noises. The stego image was under the influence
of cropping attack, and the extracted secret image from the cropped stego image
consisted of a few non-adjacent parallel black lines attached. Even though they had
observed the appearance of parallel black lines, the meaning and content of a hidden
image were observed conveniently.

2.4 QUALPI-Based Methods

The paper [15] proposes a new quantum image steganography method which intro-
duced a quantum image representation called QUALPI that makes use of log-polar
images in preparing the quantum image model. This was followed by quantum
image expansion where an atlas consisting of various quantum image copies are
superimposed. The expanded quantum image was subjected to the embedding of
the secret information. This was done by choosing one particular image copy out
of the atlas followed by embedding the secret information onto the chosen image
copy. At the receiver, Grover’s search algorithm, an algorithm that aimed at reducing
the time complexity of searching a record present in an unsorted database, was
utilized in the retrieval of secret information. This work included three perfor-
mance parameters namely imperceptibility, capacity and security. The secret infor-
mation is embedded onto one of the many image copies and with a smaller angle
of image expansion, a complex atlas was obtained showing better imperceptibility
and thus greater security against eavesdroppers [16]. This paper introduced a novel
representation for a quantum image named quantum log-polar image (QUALPI)
which involved processing and storing of a sampled image in log-polar coordinates. QUALPI involved the following preparation procedure. Initially, it dealt with the conversion of a classical image to an image sampled in log-polar coordinates. For an image of size 2^m × 2^n with 2^q grayscale values, a register consisting of (m + n + q) qubits in the quantum state was defined for the storage of image informa-
tion as qubit sequences also referred to as ket. Later, an empty ket was initialized by
keeping all the grayscale intensities as zero followed by setting all the pixels with
their appropriate intensities. This constituted a final image representation in quantum
state named QUALPI. The time complexity involved in storing a 2^m × 2^n log-polar image having grayscale values of 2^q was O(q(m + n) · 2^(m+n)). Common geometric
transformations such as rotational transformations and symmetric transformations
were performed conveniently with the help of QUALPI when compared with other
representations, for example, NEQR and FRQI.

2.5 Other Methods

In the paper [17], a new technique for constructing substitution boxes dealing with the nonlinear properties of quantum walks was presented. Quantum walks are universal quantum computational models used for designing quantum algorithms. The perfor-
mance of this method was evaluated by an evaluation criterion called S-box. Also, a
novel method for steganography of images was constructed using the S-boxes. The
proposed method consists of a mechanism involving data hiding (traditional) and
quantum walks. This technique is shown to be secure for data which is embedded.
It is also seen that the secret message of any type can be used in this technique.

During the extraction process, only the preliminary values for the S-boxes genera-
tion and steganographic image are found to be required. This method has a greater
capacity of embedding and good clarity with greater security [18]. In this work, a
new quantum steganography protocol was proposed using “Pixel Value Differenc-
ing” (PVD) which satisfactorily adheres to edge effects of image and characteristics
of the human optical system. The whole process was divided into three parts namely
quantization of two-pixel blocks based on the difference in their grayscale values,
data embedding and extraction. Here, the cover image was embedded with the oper-
ator’s information and secret image based on pixel value differencing. Based on
the pixel value difference level, information about the operator with different qubit
numbers was embedded. The difference in pixel values was not a concern while
embedding a secret image. Secret image and information about the operator were
embedded by swapping the pixel difference values belonging to the two-pixel blocks
of the cover image with similar ones where embedded data qubits are included. Secret
information traceability is realized by extracting information about the operator. The
extraction process is seen to be completely blind. There were two parameters taken
into account while checking for the invisibility of the secret image. During histogram
analysis, it is seen that the histograms of steganographic images are very similar to
the original ones. Considering “Peak Signal-to-Noise Ratio” (PSNR), it is seen that
the algorithm proposed obtains good clarity. It is also seen that the scheme allows for
good embedding capacity and is found to be highly robust [19]. This paper discusses
a new protocol which is based on quantum secure direct communication (QSDC).
The protocol is used to build a concealed channel within the classical transmission
channel to transmit hidden information. The protocol discussed in this paper uses
QSDC as its basis. The technique adopts the entanglement swapping of Bell-basis states to embed concealed messages. This protocol contains six steps which are
crucial for the preparation of large numbers, mode selection by a receiver, control
mode, information transmission, covert message hiding mode and concealed data
retrieving mode. The protocol uses IBF which is the extension and a more secured
method over BF coupled with QSDC. It was seen that the protocol can reliably
deal with the intercept-resend attack and auxiliary particle attack and also a man-in-
the-middle attack. This protocol also shows great imperceptibility. Compared to the
previous steganography protocols based on QSS and QKD, this protocol has four
times more capacity in hidden channels, thereby increasing the overall capacity of the
channel [20]. In this proposed work, a quantum mechanical algorithm was proposed to perform three main operations: creating a configuration in which the amplitude of the system being in any one of the 2^n states is identical, performing a Fourier transformation, and rotating the selected states by the intended angle. The paper presents a method having O(√n) time complexity for identifying a record present in a database with no prior knowledge of the structure in which the database is organized.

2.6 Validation Parameter

As in the case of classical image steganography, the performance of a quantum steganography method also needs to be evaluated. Evaluation can be done through qualitative as well as quantitative analysis. Under quantitative analysis, the metrics used are (1) PSNR, (2) NCC and (3) SSIM, among others, and under qualitative analysis, the visual quality of the method is checked using the histogram. The histograms of both the cover and stego images should not show any trace of the embedded data. In addition to this, key sensitivity and keyspace analysis can also be used as metrics to measure the performance of the algorithm used. For better security, the algorithm should have a very large keyspace.
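
A short sketch of the PSNR metric mentioned above for 8-bit images (NumPy, illustrative):

```python
import numpy as np

def psnr(cover, stego, peak=255.0):
    """Peak signal-to-noise ratio in dB between a cover image and its stego version."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")               # identical images
    return 10 * np.log10(peak ** 2 / mse)

cover = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
stego = cover.copy()
stego[::7, ::7] ^= 1                       # flip a few LSBs, as an LSB scheme would
print(round(psnr(cover, stego), 2))        # a high PSNR indicates imperceptible change
```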

3 Conclusion

Quantum computation possesses remarkable properties, namely superposition, entanglement and parallelism, because of which quantum image processing technologies offer performance and capabilities that are unrivalled by their classical equivalents. Guaranteed security, computing speed and minimal storage require-
ments are the improvements that could be achieved. There is a need for a faster
and more efficient way of storing and processing data. Quantum image processing
which inherently possesses parallelism and superposition properties could be used
to satisfy this need. This pertains to operations in image processing such as expan-
sion, reconstruction or image recognition, which is an even more difficult problem
to handle.
Based on the literature survey, it is seen that Phuc Q. Le et al. proposed FRQI in the year 2009. However, this representation is not efficient enough. Hence, Yi Zhang
et al. proposed a better mapping model from classical to quantum form called novel
enhanced quantum representation (NEQR) in the year 2014. This representation
model comprises the quantum image in the form of qubit series to store gray intensity
values rather than the random magnitude of a single qubit for representing grayscale
values as in FRQI. Therefore, this model has very less computational time, better
compression ratio and accurate retrieval of the classical image. Yi Zhang et al. later enhanced the above model when they proposed the quantum representation for log-polar images (QUALPI). This model inherits the properties of NEQR while representing the quantum images in the log-polar (ρ, θ) coordinate system. This offers more security against third-party attackers. By comparing with other steganography algorithms, it is
seen that expanding the quantum image to make an atlas and retrieving with the well-known Grover search gives greater PSNR values, better security and larger payload capacity than the existing methods. Therefore, by combining the methods cited in the literature, a more secure communication protocol can be developed in the quantum state.

References

1. Venegas-Andraca SE, Bose S (2003) Storing, processing, and retrieving an image using
quantum mechanics. Proc SPIE 5101:1085–1090
2. Venegas-Andraca SE, Ball JL (2010) Processing images in entangled quantum systems. Quant
Inf Process 9(1):1–11
3. Latorre JI (2005) Image compression and entanglement, pp 1–4. Available https://arxiv.org/
abs/quant-ph/0510031
4. Zhang Y, Lu K, Gao Y, Wang M (2013) NEQR: a novel enhanced quantum representation of digital images. Quant Inf Process 12(8):2833–2860
5. Liu X, Xiao D, Xiang Y (2019) Quantum image encryption using intra and inter bit permutation based on logistic map. https://doi.org/10.1109/ACCESS.2018.2889896
6. Le PQ, Dong F, Hirota K (2011) A flexible representation of quantum images for polynomial
preparation, image compression, and processing operations. Quant. Inf. Process. 10(1):63/84
7. Qu Z, Cheng Z, Wang X (2019) Matrix coding-based quantum ımage steganography algorithm.
IEEE Access 1–1 (2019). https://doi.org/10.1109/access.2019.2894295
8. Heidari S, Pourarian MR, Gheibi R, Naseri M, Houshmand M (2017) Quantum red–green–blue
image steganography. Int. J. Quant. Inf. 15(05):1750039. https://doi.org/10.1142/s02197499
17500393
9. Qu Z, Cheng Z, Liu W, Wang X (2018) A novel quantum image steganography algorithm based
on exploiting modification direction. Multimedia Tools Appl. https://doi.org/10.1007/s11042-
018-6476-5
10. Zhou R-G, Hu W, Fan P, Luo G (2018) Quantum color image watermarking based on Arnold
transformation and LSB steganography. Int. J. Quant. Inf. 16(03):1850021. https://doi.org/10.
1142/s0219749918500211
11. Le P, Iliyasu A, Dong F, Hirota K (2011) Strategies for designing geometric transformations
on quantum images. Theor. Comput. Sci. 412:1406–1418. https://doi.org/10.1016/j.tcs.2010.
11.029
12. Yan F, Le P, Iliyasu A, Sun B, Garcia J, Dong F, Hirota K (2012) Assessing the similarity
of quantum ımages based on probability measurements. In: 2012 IEEE world congress on
computational ıntelligence
13. Zhang T, Abd-El-Atty B, Amin M, Abd El-Latif A (2017) QISLSQb: a quantum ımage
steganography scheme based on least significant qubit. https://doi.org/10.12783/dtcse/mcsse2
016/10934
14. Jiang N, WangL (2015) A novel strategy for quantum ımage steganography based on moire
pattern. Int J Theor Phys 54:1021–1032. https://doi.org/10.1007/s10773-014-2294-3
15. Qu Z, Li Z, Xu G, Wu S, Wang X (2019) Quantum image steganography protocol based on
quantum image expansion and grover search algorithm. IEEE Access 7:50849–50857. https://
doi.org/10.1109/access.2019.2909906
16. Zhang Y, Lu K, Gao Y, Xu K (2013) A novel quantum representation for log-polar images.
Quant Inf Process 12(9):31033126
17. EL-Latif AA, Abd-El-Atty B, Venegas-Andraca SE (2019) A novel image steganography tech-
nique based on quantum substitution boxes. Opt Laser Technol 116:92–102. https://doi.org/10.
1016/j.optlastec.2019.03.005
18. Luo J, Zhou R-G, Luo G, Li Y, Liu G (2019) Traceable quantum steganography scheme based
on pixel value differencing. Sci Rep 9(1). https://doi.org/10.1038/s41598-019-51598-8
19. Qu Z-G, Chen X-B, Zhou X-J, Niu X-X, Yang Y-X (2010) Novel quantum steganography
with large payload. Opt Commun 283(23):4782–4786. https://doi.org/10.1016/j.optcom.2010.
06.083
20. Grover L (1996) A fast quantum mechanical algorithm for database search. In: Proceedings of
the 28th annual acm symposium on the theory of computing, pp 212–219 (1996)
Smart On-board Vehicle-to-Vehicle
Interaction Using Visible Light
Communication for Enhancing Safety
Driving

S. Satheesh Kumar, S. Karthik, J. S. Sujin, N. Lingaraj, and M. D. Saranya

Abstract Li-Fi technology has emerged as one of the sound standards of communication, where light sources such as LEDs and photodiodes are used as the data medium. This technology is predominantly used in various modes for facilitating many types of data communication. In the automobile field, Li-Fi technology is highly essential for achieving vehicle-to-vehicle interaction in a smart environment, and this smart communication finds user-end applications even in traffic light control systems. Both the transmitter and the receiver sections take advantage of using LEDs as the light source; their fast switching nature allows the entire system to be realized at low cost and with greater efficiency. In this paper, an intelligent transport system is proposed using a Li-Fi technique, which is essentially visible light communication, for facilitating secured vehicle-to-vehicle interaction in dynamic situations. The receiver design is robust and dynamic; it interprets the light waves transmitted from the other vehicle into data with the help of solar panels and amplifiers. The overall data throughput is good, and the system is found to be an appropriate replacement for typical RF communication systems in automobiles.

Keywords Li-Fi · Vehicle-to-vehicle communication · Visible light communication · Automobiles

S. Satheesh Kumar (B) · M. D. Saranya


Department of Electronics and Communication Engineering, KPR Institute of Engineering and
Technology, Coimbatore, India
e-mail: ssatheeshkumarpsg@gmail.com
M. D. Saranya
e-mail: saranmdme@gmail.com
S. Karthik · J. S. Sujin
Department of ECE, Sri Krishna College of Engineering and Technology, Coimbatore, India
e-mail: karthikheyram@gmail.com
J. S. Sujin
e-mail: sujinjs123@gmail.com
N. Lingaraj
Department of Mechanical Engineering, Rajalakshmi Institute of Technology, Chennai, India
e-mail: lingarajleelavathy@gmail.com


1 Introduction

Nowadays, vehicle-to-vehicle (V2V) communication has found great importance in the automobile domain for enhancing safe driving and providing reliable road transport. The means of communication is mostly wireless, in which two different vehicles communicate with each other over an ad hoc mesh network. This kind of safe-driving measure can prevent the large share of accidents that happen merely due to unknown facts about a neighbouring vehicle's status. By incorporating V2V communication in on-board vehicles, details about speed, emergency warnings and location can be communicated to each vehicle, enabling the driver to take appropriate decisions for safe driving. Various automobile technologies such as adaptive cruise control (ACC), embedded lane departure systems, blind spot detection and smart parking were introduced to improve the smartness of a vehicle rather than to ensure safety. Li-Fi-based smart transportation, originally a safe communication measure initially funded by the US Department of Transportation and NHTSA, allows road transport vehicles to get connected using roadside communicating elements such as traffic light signs and RF towers. In future, driverless cars will be realizable not only due to artificial intelligence but also through the application of light-based V2V communication. Today, almost all leading automobile manufacturers are trying to incorporate the wide-scope advantages of V2V communication in their future models.

2 Review of Related Works

In 2004, Komine and Nakagawa [1] proposed a visible light communication technique that provides an adaptable light-dimming mechanism for vehicles, exploiting the fast modulation of optical sources such as LEDs, together with the visible light communication standard (IEEE 802.15.7), for effective short-range wireless communication. Noof Al Abdulsalam et al. (2015) identified a novel approach for designing a Li-Fi module using normal LEDs for vehicular automation. Turan et al. [2] proposed a novel modulation and coding scheme to achieve a minimum bit error rate (BER) for low-latency secured communication using VLC. In 2017, Cailean and Dimian [3] addressed the challenges in implementing Li-Fi-based vehicle communication, analysing distance measures and visible light positioning. Poorna Pushkala et al. [4] projected a solution for radio-frequency congestion in which reliable audio and image data were communicated using Li-Fi without involving microcontrollers and other peripheral devices. In 2018, Jamali et al. [5] proposed a methodology based on Li-Fi technology to avoid road accidents due to vehicle collisions. Satheesh Kumar et al. [6] reviewed various advancements in recent automobiles that enhance vehicular communication for human-centred interactions and also deal with emotions expressed through the driver's actions and gestures. Gerardo Hernandez-Oregon et al. (2019) analysed the performance of V2V and V2I communication by modelling the road infrastructure as a Markov process, which benefits the accurate calculation of data throughput. In 2020, Subha et al. [7] explained an OFDM-based Li-Fi architecture for 5G-and-beyond wireless communication, enhancing optical attocells for wireless channel capacity. In 2015, Satheesh Kumar et al. [9] and in 2017, Christabel Pravin et al. [8] explained automation techniques which can be incorporated into continuous speech recognition systems. In 2019, Ganeshprabhu et al. [11] discussed a solar-powered robotic vehicle, and in 2019, Sujin et al. [10] explained the impact of public e-health monitoring systems. As a matter of ensuring individual safety, in 2018, Nagaraj et al. [12] proposed an alcohol-impaired vehicle tracking system using a wearable smart helmet. In 2015, Allin Christe et al. [13] implemented a novel 2D wavelet transform approach for image retrieval and segmentation, which paved the way for effective motion capture while driving the vehicle. With the same intention, in 2020, Mazher Iqbal et al. [15] implemented an MWT algorithm for effective analysis of medical images. In 2014, Satheesh Kumar et al. [14] proposed the rapid expulsion of acoustic soft noise using the RAT algorithm, which was also found to be effective for removing soft noise in images; to make this more convenient, region-based scheduling is practiced with certain sensor networks, as focused on by Karthik et al. [16] in 2019.

3 Smart Vehicle-to-Vehicle Interaction for Safety Driving

As discussed in the previous sections of this paper, smart vehicle-to-vehicle interaction is an effective solution for on-board automobile communication that reduces the frequency of road accidents caused by the driver's indecision when the situation goes out of control. Vehicle-to-vehicle communication is a radical approach that brings phenomenal outcomes when it comes to automation. People still dwell on the scope of AI and data analytics but fail to consider visible light communication. The proposed system brings out a powerful and simple approach to visible light communication that eliminates heavy protocol-based Li-Fi technology, saving implementation cost and energy.

3.1 Transmitter Section

The basic functionality of the transmitter section is discussed here. The sensor module integrated with the transmitter section acquires the data being sensed from the vehicle. Due to the dynamic nature of vehicle movement, the variation in the sensing element is generally a fluctuating (AC) voltage. This is converted by the sensor module into a DC voltage level that is readable by the microcontroller unit. The microcontroller unit is a processing unit which compares the current data with the previous one and provides the output to the LED driver (Fig. 1).

Fig. 1 Transmitter section using Li-Fi technology

Once the output reaches the LED driver circuit, the data is ready for transmission via wireless mode. On the other side, a photodiode detects the transmitted light and converts it into a current, and an LCD displays the output appropriately. This approach will help reduce road accidents at least to a smaller extent. Push buttons are generally used to take care of the contact establishment between the different modules. The motor is interfaced with the brake shoe and the other primary controlling units of the automobile. The usage of LEDs results in the simplest possible transmitter module for facilitating speed control of a vehicle for smart transport.
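
A minimal, hardware-agnostic sketch of the transmitter logic described above is given below, assuming a simple on-off keying scheme. The set_led() stub, the 8-bit speed value, the 1 kbit/s bit rate and the start/stop framing are all illustrative assumptions and not the exact scheme used in the prototype.

```python
import time

BIT_PERIOD = 1e-3  # 1 kbit/s on-off keying; well above the flicker threshold of the eye

def set_led(on: bool):
    """Stub: drive the LED driver input high or low (replace with the actual GPIO/driver call)."""
    pass

def send_byte(value: int):
    """Frame one byte as a start bit (1), 8 data bits MSB first and a stop bit (0)."""
    bits = [1] + [(value >> i) & 1 for i in range(7, -1, -1)] + [0]
    for b in bits:
        set_led(bool(b))
        time.sleep(BIT_PERIOD)

def transmit_speed(speed_kmph: float):
    send_byte(int(max(0, min(255, speed_kmph))))  # clamp the sensed speed to one byte

if __name__ == "__main__":
    transmit_speed(42.0)
```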

3.2 Receiver Section

The receiver section has been implemented with the same set-up, where LED blinking can be detected at frequencies above 1 kHz. An ultrasonic sensor measures the distance between the two vehicles. If the distance goes below the threshold range, which is generally the safe distance level, an appropriate alert is transmitted by the Arduino module. The collected data is processed by a PIC microcontroller unit for reliable control actions. The ultrasonic sensor unit is tuned for the safe distance; violating that distance results in a chance of colliding with other vehicles. The Arduino Pro Mini module takes care of the other peripheral sensors for further processing of the sensed information (Fig. 2).
All these separate processes help in achieving a rapid response during an unsafe situation. The auxiliary systems help the receiver unit take appropriate actions and reduce the computation and processing burden of the PIC microcontroller. The Bluetooth system conveys the state information to the driver and helps him handle the situation manually to some extent. If no action is taken within a span of time, automatic actions are triggered by the microcontroller, which helps to avoid an accident situation.
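
The receiver-side safety logic described above can be summarised by the small sketch below: the measured headway is compared against a safe-distance threshold, the driver is warned first, and an automatic action is triggered only if no response follows. The sensor stub, the 1.5 m threshold and the 2 s reaction window are illustrative assumptions rather than the tuned values used in the prototype.

```python
import time

SAFE_DISTANCE_M = 1.5      # assumed safe headway threshold
DRIVER_REACTION_S = 2.0    # assumed time allowed for a manual response

def read_ultrasonic_m() -> float:
    """Stub: return the distance to the leading vehicle in metres."""
    return 3.0

def warn_driver(msg):
    print("ALERT:", msg)   # e.g. pushed over Bluetooth or shown on the LCD

def apply_brake():
    print("Automatic braking engaged")

def safety_loop():
    unsafe_since = None
    while True:
        distance = read_ultrasonic_m()
        if distance < SAFE_DISTANCE_M:
            if unsafe_since is None:
                unsafe_since = time.time()
                warn_driver(f"Headway {distance:.2f} m is below the safe limit")
            elif time.time() - unsafe_since > DRIVER_REACTION_S:
                apply_brake()              # the driver did not act in time
        else:
            unsafe_since = None            # headway is safe again
        time.sleep(0.1)
```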

Fig. 2 Receiver section using Li-Fi technology

4 Experimental Results

4.1 Major Task

The entire system is connected in a coherent fashion to take care of the major tasks the V2V system adopts during controlling actions. The user is the deciding authority, who can choose the option most recommended in the situation to access the vehicle.

4.2 Bluetooth Connectivity

The entire set-up is well connected, and the on-board Bluetooth allows information to be transferred over a short range, preferably inside the vehicle, so that all devices are connected. The LCD module indicates the information that really needs to be displayed.

4.3 Switching Control

Manual control is essentially required to control the vehicle actions based on the level of comfort. Special buttons can be provided to perform the switching actions.

4.4 Visible Light Communication for Li-Fi

Through VLC, information, video and sound are moved over the light medium. Visible light communication (VLC) operates by modulating the current to the LEDs repeatedly at a very high rate, too fast for the naked eye to perceive, so that there is no flicker. While Li-Fi LEDs must be kept on to transmit information, they can be dimmed to below human perception while still radiating enough light for the information to be carried. Because the technology relies on the visible spectrum, the light cannot penetrate walls, so the range is much shorter, but the link is far less prone to hacking than Wi-Fi. Techniques that allow roaming between different Li-Fi cells, also called handover, make it possible for the Li-Fi link to change seamlessly. A direct line of sight is not essential for Li-Fi; 70 Mbit/s can be obtained from light that reflects off the walls (Fig. 3).
Li-Fi uses light emitted from light-emitting diodes (LEDs) as a networked, mobile, high-speed link in a manner similar to Wi-Fi. The Li-Fi market was projected to grow at an annual rate of 82% from 2013 to 2018 and to be worth more than 6 billion dollars per year by 2018. However, the market has not developed to this extent, and Li-Fi has remained primarily a subject of research evaluation (Fig. 4).
These sorts of V2V frameworks are required because humans can commit errors while driving, which can cause accidents; such systems also help the journey to be completed effectively and in a secure way (Figs. 5, 6 and 7).

Fig. 3 Process flow of Li-Fi-based VLC



Fig. 4 LED intensity versus transmission range

Fig. 5 Safety units implemented in vehicle interaction

Compared with traditional lighting such as fluorescent and incandescent lamps, LED lighting is up to 80% more efficient: about 95% of the energy in LEDs is converted into light and only 5% is wasted as heat. This is in contrast to incandescent lights, which turn about 95% of the energy into heat and only 5% into light. LEDs are extremely energy efficient and consume up to 90% less power than incandescent bulbs; since LEDs use only a fraction of the energy of an incandescent light, there is a dramatic decrease in power costs.
The colour rendering index (CRI) is a measurement of a light's ability to reveal the actual colour of objects as compared with an ideal light source (natural light). A high CRI is generally a desirable characteristic (although, of course, it depends on the required application). LEDs generally have high (good) ratings when it comes to CRI. Perhaps the best way to appreciate CRI is to look at a direct comparison between LED lighting (with a high CRI) and a traditional lighting solution such as sodium vapour lamps (which generally have poor CRI ratings and are at times practically monochromatic). See the accompanying picture to compare the two cases (Fig. 8).

Fig. 6 Signal propagated at various frequencies using an LED source

5 Conclusion

The project targets designing a model for the transfer of data from the vehicle in front to the one behind. This can be controlled remotely by means of an application that provides switch-mode features; the application runs on an Android device. The framework can be utilized in a wide range of areas. The system, integrated with various features, can be applied in the following fields.
• Vehicles can be safely rerouted to alternative streets and routes, which will essentially eliminate traffic congestion.
• Because of frameworks such as the early accident alert, which lets vehicles share speed, heading and location with one another, there will be fewer occurrences of accidents.
• Because of lower congestion and less time spent in rush-hour gridlock, the pollution caused by the vehicle will be lower as well.
• The technology is incorporated with the vehicle during its original production and frequently gives both audio and visual alerts about possible issues with the vehicle or its surroundings.

Fig. 7 Signal received at various frequencies using an LED source

Fig. 8 Hardware set-up



• When the technology is added after the original assembly, aftermarket devices are ordinarily not as fully integrated as those applied during production; V2V aftermarket devices can be installed by vendors or approved dealers. Other aftermarket devices could be independent, portable devices that can be carried by the passenger or driver.
• Some devices are based on road infrastructure items, for example, road signs and traffic signals. The vehicles would have the option to get data from infrastructure devices, which will help prevent mishaps and give environmental advantages; this communication procedure is called V2I, for short. This sort of communication could give a warning when a vehicle disregards a red light or a stop sign, has excessive speed, enters a reduced speed zone, enters a spot with unexpected weather changes, and the like.
As of now, the application is made for Android smartphones; other OS platforms do not support our application. Looking at the current circumstances, a cross-platform framework can be built so that the application can be developed for other platforms such as iOS and Windows.

References

1. Komine T, Nakagawa M (2004) Fundamental analysis for visible-light communication system using LED lights. IEEE Trans Consum Electron 50:100–107
2. Turan B, Narmanlioglu O, Ergen SC, Uysal M (2016) Physical layer implementation of standard compliant vehicular VLC. In: IEEE vehicular technology conference. https://doi.org/10.1109/VTCFall.2016.7881165
3. Cailean A-M, Dimian M (2017) Current challenges for visible light communication usage in vehicle applications: a survey. IEEE Commun Surveys Tutor. https://doi.org/10.1109/COMST.2017.2706940
4. Poorna Pushkala S, Renuka M, Muthuraman V, Venkata Abhijith M, Satheesh Kumar S (2017) Li-Fi based high data rate visible light communication for data and audio transmission. Int J Electron Commun 10:83–97
5. Jamali AA, Rathi MK, Memon AH, Das B, Ghanshamdas, Shabeena (2018) Collision avoidance between vehicles through Li-Fi based communication system. Int J Comput Sci Netw Secur 18:72–81
6. Satheesh Kumar S, Mazher Iqbal JL, Sujin JS, Sowmya R, Selvakumar D (2019) Recent advancements in automation to enhance vehicle technology for human centered interactions. J Comput Theor Nanosci 16
7. Subha TD, Subash TD, Elezabeth Rani N, Janani P (2020) Li-Fi: a revolution in wireless networking. Elsevier Mater Today Proc 24:2403–2413
8. Christabel Pravin S, Satheesh Kumar S (2017) Connected speech recognition for authentication. Int J Latest Trends Eng Technol 8:303–310
9. Satheesh Kumar S, Vanathi PT (2015) Continuous speech recognition systems using reservoir based acoustic neural model. Int J Appl Eng Res 10:22400–22406
10. Sujin JS, Gandhiraj N, Selvakumar D, Satheesh Kumar S (2019) Public e-health network system using Arduino controller. J Comput Theor Nanosci 16:1–6
11. Ganesh Prabhu S, Karthik S, Satheesh Kumar S, Thirrunavukkarasu RR, Logeshkumar S (2019) Solar powered robotic vehicle for optimal battery charging using PIC microcontroller. Int Res J Multidisc Technovat 4:21–27
12. Nagaraj J, Poongodi P, Ramane R, Rixon Raj R, Satheesh Kumar S (2018) Alcohol impaired vehicle tracking system using wearable smart helmet with emergency alert. Int J Pure Appl Math 118:1314–3395
13. Allin Christe S, Balaji M, Satheesh Kumar S (2015) FPGA implementation of 2-D wavelet transform of image using Xilinx system generator. Int J Appl Eng Res 10:22436–22466
14. Satheesh Kumar S, Prithiv JG, Vanathi PT (2014) Rapid expulsion of acoustic soft noise for noise free headphones using RAT. Int J Eng Res Technol 3:2278–0181
15. Mazher Iqbal JL, Narayan G, Satheesh Kumar S (2020) Implementation of MWT algorithm for image compression of medical images on FPGA using block memory. Test Eng Manag 83:12678–12685
16. Karthik V, Karthik S, Satheesh Kumar S, Selvakumar D, Visvesvaran C, Mohammed Arif A (2019) Region based scheduling algorithm for pedestrian monitoring at large area buildings during evacuation. In: International conference on communication and signal processing (ICCSP). https://doi.org/10.1109/ICCSP.2019.8697968
A Novel Machine Learning Based
Analytical Technique for Detection
and Diagnosis of Cancer from Medical
Data

Vasundhara and Suraiya Parveen

Abstract Cancer is a dreadful disease which has been affecting the human race for decades. Cancer is of different types, affecting people severely and ruining thousands of lives per year; the most common forms are breast cancer in females and lung sarcoma in males. Globally, breast cancer has turned into a cyclone of a disease that strikes many females, particularly those of middle age (30–40 years). Cancer has proved to be a nationwide disease due to which millions of lives are being taken; as recorded by the National Cancer Registry Programme of the Indian Council of Medical Research (ICMR), more than 13,000 individuals lose their lives every day. Breast cancer contributes to the majority of deaths of women, with the maximum percentage, i.e., around 60% of the affected women, being declared dead due to breast cancer. In this paper, the main point of implementation is to develop more precise and accurate techniques for the diagnosis and detection of cancer. Machine learning algorithms have been taken into account for the betterment and advancement of the medical field. Support vector machine, naïve Bayes, KNN, decision tree, etc., which are various types of machine learning algorithms, have been used for classification.

Keywords Breast cancer · Machine learning · Dreadful disease · Women

1 Introduction

Machine learning is an application of artificial intelligence which has the potential to enhance a system and learn from earlier experience without being explicitly programmed [1]. It mainly focuses on improving over the dataset and making the model more pronounced in comparison with the earlier one. Machine learning is defined as a process of creating

Vasundhara · S. Parveen (B)


Department of Computer Science, School of Engineering Science and Technology, Jamia
Hamdard, New Delhi, India
e-mail: husainsuraiya@gmail.com
Vasundhara
e-mail: vasundhara301993@gmail.com


Fig. 1 Comparison of mammogram images of normal and abnormal breast

models which can perform a certain task without a human explicitly programming them to do so [2]. For example, if a person listens to a piece of music on YouTube and hits the like button, similar music is suggested whenever he or she listens again; this high pace and smooth experience are provided only by machine learning (Fig. 1).
Machine learning has evolved the way of treating and diagnosing breast cancer in the most appropriate manner through the utilization of its various algorithms [3]. It has proven to be an essential tool for early detection and for classifying breast cancer based on the severity of risk factors. A number of machine learning algorithms, such as neural networks, Bayesian networks, decision trees and support vector machines, are taken into account on a large scale and have proved to be an asset, as they allow doctors to detect cancer and classify it into stages of different severity.

2 Related Work

Breast cancer is one of the most dangerous cancers; it has taken lakhs of lives and has affected the female population the most. In [4], a comparison of different types of algorithms, such as support vector machine, ART, naïve Bayes and K-nearest neighbour (KNN), is presented. The algorithms are applied to the Wisconsin breast cancer database, where the KNN technique gives the most accurate results; the other algorithms also perform well in this experiment. SVM is a strong technique, and when used with a Gaussian kernel it is the most appropriate technique for predicting the occurrence and non-occurrence of breast cancer. The SVM used is applicable only when the class variable is binary [4].
In [5], the authors focus on proposing an adaptive ensemble voting method for diagnosing breast cancer. The aim is to carry out a comparative study and explain how ANN and logistic algorithms, when combined with ensemble machine learning algorithms, provide more accurate figures. The dataset used is the Wisconsin breast cancer database. The accuracy of the ANN used is 98.05%, higher than that of the other algorithms. While testing the accuracy on this database, the authors came to the conclusion that diagnosing breast cancer at early stages is more beneficial and that many lives can be saved.
The paper [6] attempts to bring out a comparison between different machine learning algorithms such as support vector machine, random forest and naïve Bayes for more accurate detection of breast cancer. These machine learning algorithms are used along with the database, and the results give an exact idea of which machine learning algorithm to use for more accurate outcomes. Machine learning techniques are being used on a large scale to serve as a useful diagnostic tool along with other medical equipment, which helps in many ways. Based on the results, each machine learning algorithm is best in its own way, but the one which outshines the others is the support vector machine, having higher accuracy and precision, while random forest classifies tumors better than the support vector machine.
As discussed in [7], many women are fighting this dreadful disease and their lives are being taken away; due to unawareness about the diagnosis of this disease, it is increasing at a wider level. The practical analysis in [7] shows that, with 97.8% accuracy, the support vector machine is the most appropriate algorithm. The paper describes various algorithms for predicting breast cancer. The application of technology in the medical field has become an essential asset, since it provides more efficiency along with medical knowledge.
As noted in [8], breast cancer is the most common cancer occurring in women, and machine learning techniques are used for classification and thus for early diagnosis and detection of cancer. In this work, the authors compare how effectively artificial neural networks and SVM can be used as machine learning algorithms for classification and determine their accuracy.
The paper [9] emphasizes the implementation of IoT technology in medical health care for enhancing the quality of care and minimizing the cost required through automation and optimization of resources. The use of IoT technology in medical imaging ensures correct and more accurate information about a particular symptom related to a particular disease. Digitalization and the use of modern technology in the field of medicine have paid off.

3 Methodology

In this research, various machine learning techniques have been surveyed and analyzed for diagnosing medical data, and it has been found that the following techniques are very beneficial for this purpose.

3.1 Support Vector Machine

Support vector machine (SVM) is an efficient and accurate machine learning algorithm [1]. The main role of SVM is to help minimize the upper bound on the generalization error by increasing the margin between the separating hyperplane and the dataset. It can perform linear or nonlinear classification [10] and helps in the early detection of breast cancer. Since its invention, SVM has played an essential role and has proven to work efficiently, with an efficiency of about 98.5% on medical data; due to its high efficiency, it is capable of classifying up to 90% of amino acids of different compounds, and it has the potential of detecting various cancers at their initial stage [11]. When this machine learning algorithm is applied in the medical field, it consistently provides high accuracy and helps detect cancer stages as early as possible, saving thousands of lives across the globe.
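
As a hedged illustration of how such an SVM classifier can be applied, the sketch below trains a Gaussian-kernel SVM on the scikit-learn copy of the Wisconsin breast cancer data (569 instances, 30 attributes). The 80/20 split, the feature scaling and the RBF kernel parameters are assumptions made for the example and do not represent the authors' exact pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Wisconsin breast cancer data: 569 samples, 30 attributes, benign/malignant labels
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_tr)
svm = SVC(kernel="rbf", C=1.0, gamma="scale")   # Gaussian (RBF) kernel SVM
svm.fit(scaler.transform(X_tr), y_tr)

print("SVM accuracy:", accuracy_score(y_te, svm.predict(scaler.transform(X_te))))
```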

3.2 Naïve Bayes

The naïve Bayes classifier is another type of machine learning algorithm and is one of the most effective and simple probabilistic classifiers; it works on the principle of Bayes' theorem with strong independence assumptions [12]. Being simple, naïve Bayes has proven to be a precise method in medical data mining. Its requirements are modest, as only a small piece of data is required for the detection and diagnosis of breast cancer [2]. One of the most important facts is that naïve Bayes focuses on providing a decision based on the available data; to deliver results to the maximum of its capabilities, it takes into account all the necessary and important evidence to give out the results transparently [13].
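
For comparison, the same split can be evaluated with a Gaussian naïve Bayes classifier, as in the hedged sketch below; again, the split and the choice of the Gaussian variant are assumptions for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_tr, y_tr)   # fits one Gaussian per class for every feature
print("Naive Bayes accuracy:", accuracy_score(y_te, nb.predict(X_te)))
```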

3.3 PCA

PCA refers to principal component analysis. It is a machine learning technique which is taken into account when extracting data from a high-dimensional space into a lower-dimensional space. It mainly focuses on retaining the components of the data having more variation and discarding non-essential components having little variance [14].
It can be applied to the working fields given below.
Data visualization: Considering various types of data having non-essential inputs in a high-dimensional space, PCA proves to be an asset, as it mainly plays the role of converting high-dimensional data into low-dimensional data [3].

Speeding up machine learning algorithms: PCA helps in speeding up machine learning algorithm training and testing time.
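
A minimal sketch of combining PCA with a downstream classifier (here a decision tree, matching the combination evaluated later in this paper) is given below; the choice of 10 principal components and the tree depth are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale, project the 30 attributes onto 10 principal components, then classify
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      DecisionTreeClassifier(max_depth=5, random_state=42))
model.fit(X_tr, y_tr)
print("PCA + decision tree accuracy:", accuracy_score(y_te, model.predict(X_te)))
```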

4 Case Study for the Design and Implementation of the Proposed Methodology

Cancer is one of the most dreaded diseases, due to which millions of lives are taken with each passing year. Cancerous cells are cells which have lost the property of contact inhibition: they keep dividing mitotically even when they come in contact with their neighbouring cells, giving rise to cancerous or tumour cells. A cancer gene is known as an oncogene, and such cells are called onco-cells. Cancer cells often form lumps of unregulated growth which turn out to be dangerous for the whole body. Tumours are mainly of two types: (i) benign tumours and (ii) malignant tumours (Fig. 2).
Among the male population, lung cancer is the most common, whereas in the female population, breast cancer is the most dreaded one, taking millions of lives of women all over the globe. Breast cancer is cancer of the breast in which the alveolar lobes, which are 15–20 in number, become cancerous; in extreme cases, the entire affected breast needs to be removed so that the cancer does not spread to other parts. It comprises four stages. Diagnosis is done by biopsy, in which a piece of tissue is cut and cultured in a culture medium for the detection of cancerous cells.
Alpha interferon is available and, along with chemotherapy and radiation, is used for its treatment. Nowadays, many tablets are being developed using the principles of genetic engineering for curing this disease, but a finalized formulation has not yet been discovered which can be used as a substitute in place of the painful

Fig. 2 Malignant breast cyst



chemotherapy. So, to deal with this, machine learning algorithms are being tried out to cope with the uncertainties which arise during the treatment of cancer. Chemotherapy is a treatment procedure in which a certain drug is injected to destroy the cancerous cells.

4.1 Stages of Cancer

With the use of machine learning algorithms, it is possible to distinguish between the risk factors related to the various stages of breast cancer [15]. These technologies have provided results that make it far easier to describe the four stages of breast cancer depending upon the severity and size of the tumour. These four stages are shown in Fig. 3.

4.1.1 Stage Zero

This stage indicates the smallest tumour size. The survival rate is 100% in this situation; it is the earliest indication of cancer developing in the body, and it does not require much intervention to be cured. It does not comprise any invasive carcinoma.

4.1.2 Stage One

It is also known as the invasive stage. In this stage, the tumour size is not large, but the cancerous cells have broken into the fatty adipose tissue of the breast. It comprises two sub-stages:

Fig. 3 Survival rate in different stages of breast cancer



Stage 1A

The tumor size is 2 cm or smaller.

Stage 1B

The tumour size is 2 cm or smaller; it cannot be seen in the breast lobules but is seen in the lymph nodes, with a size of about 2 mm.

4.1.3 Stage Two

In this stage, the tumour grows and also starts spreading to associated organs. The tumour size remains small, but it keeps growing and its shape becomes like that of a walnut. It comprises two sub-stages:

4.1.4 Stage 2A

The tumour size is 2 cm or smaller, but cancer can be found in 1–3 lymph nodes under the arm.

4.1.5 Stage 2B

The tumour size is larger than 2 cm but less than 5 cm, and the cancer cells have reached the internal breast, that is, the mammary gland, and the axillary lymph nodes.

5 Stage Three

In this stage, the size of the tumour is quite prominent; it does not spread to organs and bone but starts to spread to 9–10 more lymph nodes. This stage is very hard to fight for a patient undergoing treatment. It comprises two sub-stages:

5.1 Stage 3A

The tumour size is larger than 5 cm, and the cancer has reached 4–9 axillary lymph nodes.

Fig. 4 Different stages of breast cancer

5.2 Stage 3B

The tumour size is larger than 5 cm, up to approximately 9 cm.

5.3 Stage Four

The tumour is about 10–20 cm. This is the last stage, and survival is very low because the tumour cells have spread to organs other than the breast; hence, this stage is known as the metastatic stage (Fig. 4).

6 Proposed Methodology

In this review paper, the dataset used has been compiled from different sources of data, retrieving information such as the radius of the tumour, the stage of cancer and the diagnostic techniques, such as mammogram and PET-SCAN, which have been used to detect cancer at the earliest possible stage; the outcomes are based on these techniques.
Mammogram: This technique is the most crucial step in breast cancer detection, since it reveals the early symptoms and signs of breast cancer. It is a type of X-ray of the breast: the breast is placed on a plate which is scanned, and the scan indicates cancer symptoms or lumps formed [16] in the breast (Fig. 5).

Fig. 5 Mammogram imprint

PET-SCAN: It stands for positron emission tomography. It is an advancement in the medical field which gives clarity over the cancerous cells that are growing at a faster rate, and it indicates the correct radius of the tumour and the part of the body which is being adversely affected by the metastatic property of the cancerous cells [17] (Fig. 6).
In this section, a detailed study on the classification and extraction of data from the dataset is dealt with, utilizing the various machine learning algorithms that have come to light. The steps involved in the classification and extraction of data are shown in Fig. 7.
Data collection: It is the process of fetching data from different sources and accumulating it in the form of a .csv file, which is preconditioned as an input to the proposed model.

Fig. 6 PET-SCAN report



Fig. 7 Proposed model (data extraction/mining → data processing → data classification → performance evaluation → results)

Data processing: It is the step in which the data is processed, mainly focusing on handling missing values, reducing noise and picking relevant data.
Data classification: In this step, the data is categorized using different machine learning algorithms: support vector machine, naïve Bayes and decision tree.
Performance evaluation: In this step, the processed data is evaluated, the machine learning algorithms are applied to it, and the most accurate and efficient result is selected for further medical diagnosis [18].
It uses four measures of performance, i.e., accuracy, precision, recall and F1-measure:
Accuracy = (TP + TN)/(TP + FP + TN + FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1-measure = 2 × Precision × Recall/(Precision + Recall)
Here, TP denotes true positives, TN true negatives, FP false positives and FN false negatives [19].
Results: This step provides the output that can be applied to the analysis of the whole data.
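
A small worked example of these four measures, computed directly from the TP/TN/FP/FN counts of a confusion matrix, is given below; the counts themselves are invented for illustration.

```python
def evaluation_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

if __name__ == "__main__":
    # hypothetical confusion-matrix counts for a binary (benign/malignant) classifier
    acc, prec, rec, f1 = evaluation_metrics(tp=52, tn=58, fp=1, fn=2)
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F1={f1:.4f}")
```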

Table 1 Experimental results

Machine learning algorithm    Accuracy    Precision    Recall
Support vector machine        97.78       98.13        96.29
Naïve Bayes                   95.98       94.23        93.21
Decision tree                 96.45       94.69        95.37
PCA with decision tree        98.03       97.74        97.89

Dataset
In this research paper, the dataset taken into account is a breast cancer dataset extracted from a machine learning repository. The dataset contains 569 instances, categorized as benign and malignant, and 30 attributes have been used.

7 Results

This research is based on various machine learning algorithms being applied to the data in order to extract the required output, i.e., a more accurate and precise diagnosis and detection of breast cancer at an early stage (Table 1).

8 Conclusion

This research paper attempts to identify the most appropriate methodology for the diagnosis and detection of breast cancer with the support of machine learning algorithms such as support vector machine, naïve Bayes, decision tree and PCA. The main point of focus is the prediction of the early stages of cancer using the most efficient and precise algorithms. It is hereby concluded that PCA (combined with a decision tree) outshines the other machine learning algorithms, with an accuracy of 98.03%, recall of 97.89% and precision of 97.74%. Areas of improvement have also been identified, and PCA could work at a more comfortable rate with a 1–2% improvement in its methodology.

References

1. Ray S (2019) A quick review of machine learning algorithms. In: 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon), Faridabad, India, pp 35–39
2. Mello F, Rodrigo P, Antonelli M (2018) Machine learning: a practical approach on the statistical learning theory
3. Zvarevashe K, Olugbara OO (2018) A framework for sentiment analysis with opinion mining of hotel reviews. In: 2018 Conference on information communications technology and society (ICTAS), Durban, pp 1–4
4. Bharat A, Pooja N, Reddy RA (2018) Using machine learning algorithms for breast cancer risk prediction and diagnosis. In: Third international conference on circuits, controls, communication and computing (I4C), pp 1–4
5. Khuriwal N, Mishra N (2018) Breast cancer diagnosis using ANN ensemble learning algorithm. In: 2018 IEEMA engineer infinite conference (eTechNxT), pp 1–5
6. Bazazeh D, Shubair R (2016) Comparative study of machine learning algorithms for breast cancer detection. In: 2016 Fifth international conference on electronic devices, systems and applications (ICEDSA), pp 1–4
7. Khourdifi Y, Bahaj M (2018) Applying best machine learning algorithms for breast cancer prediction and classification. In: 2018 International conference on electronics, control, optimization and computer science (ICECOCS), pp 1–5
8. Bayrak EA, Kırcı P, Ensari T (2019) Comparison of machine learning methods for breast cancer diagnosis. In: 2019 Scientific meeting on electrical-electronics & biomedical engineering and computer science (EBBT), Istanbul, Turkey, pp 1–3
9. Chandy A (2019) A review on IoT based medical imaging technology for healthcare applications. J Innov Image Process (JIIP) 1(01):51–60
10. Potdar K, Kinnerkar R (2016) A comparative study of machine learning algorithms applied to predictive breast cancer data. Int J Sci Res 5(9):1550–1553
11. Huang C-J, Liu M-C, Chu S-S, Cheng C-L (2004) Application of machine learning techniques to web-based intelligent learning diagnosis system. In: Fourth international conference on hybrid intelligent systems (HIS'04), Kitakyushu, Japan, pp 242–247. https://doi.org/10.1109/ICHIS.2004.25
12. Ray S (2019) A quick review of machine learning algorithms. In: 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon), Faridabad, India, pp 35–39. https://doi.org/10.1109/COMITCon.2019.8862451
13. Seref B, Bostanci E (2018) Sentiment analysis using naive Bayes and complement naive Bayes classifier algorithms on Hadoop framework. In: 2018 2nd International symposium on multidisciplinary studies and innovative technologies (ISMSIT), Ankara, pp 1–7
14. Li N, Zhao L, Chen A-X, Meng Q-W, Zhang G-F (2009) A new heuristic of the decision tree induction. In: 2009 International conference on machine learning and cybernetics, Hebei, pp 1659–166
15. Kurniawan R, Yanti N, Ahmad Nazri MZ, Zulvandri (2014) Expert systems for self-diagnosing of eye diseases using naïve Bayes. In: 2014 International conference of advanced informatics: concept, theory and application (ICAICTA), Bandung, pp 113–116
16. Pandian AP (2019) Identification and classification of cancer cells using capsule network with pathological images. J Artif Intell 1(01):37–44
17. Vijayakumar T (2019) Neural network analysis for tumor investigation and cancer prediction. J Electron 1(02):89–98
18. Rathor S, Jadon RS (2018) Domain classification of textual conversation using machine learning approach. In: 2018 9th International conference on computing, communication and networking technologies (ICCCNT), Bangalore, pp 1–7
19. Douangnoulack P, Boonjing V (2018) Building minimal classification rules for breast cancer diagnosis. In: 2018 10th International conference on knowledge and smart technology (KST), Chiang Mai, pp 278–281
Instrument Cluster Design for an Electric
Vehicle Based on CAN Communication

L. Manickavasagam, N. Krishanth, B. Atul Shrinath, G. Subash,


S. R. Mohanrajan, and R. Ranjith

Abstract Electric vehicles are the need of the hour due to prevailing global conditions such as global warming and increasing pollution levels. For a driver, controlling an EV is the same as controlling a conventional IC engine automobile. Similar to the instrument cluster of a conventional vehicle, an EV also has an instrument cluster that acts as an interface between the human and the machine, but the latter displays more critical parameters that are essential for controlling the EV. This paper deals with the development of an EV instrument cluster that displays vital parameters by communicating with the different ECUs of the vehicle using the industry-standard CAN bus. Speedometer and odometer details are displayed on a touch-screen panel which is designed with a user-friendly interface; Python-based GUI tools are used to design the interface.

Keywords Electric vehicle · Instrument cluster · Motor control · BLDC motor · CAN communication · UI design

1 Introduction

Global warming has become a serious threat to the existence of human beings. One
of the main reasons for global warming is carbon dioxide (CO2) emission through
various man-made sources. One such man-made source is the internal combustion
(IC) engine that powers a variety of automobiles worldwide. Electric vehicles (EVs)
of different types are replacing IC engine vehicles. The different types of EVs are

L. Manickavasagam · N. Krishanth (B) · B. Atul Shrinath · G. Subash · S. R. Mohanrajan ·


R. Ranjith
Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore 641112, India
e-mail: krishanthraju@gmail.com
S. R. Mohanrajan
e-mail: sr_mohanrajan@cb.amrita.edu
R. Ranjith
e-mail: r_ranjith@cb.amrita.edu


battery electric vehicles (BEV), hybrid electric vehicles (HEV), and plug-in hybrid
electric vehicles (PHEV). BEV is the main focus area as it involves core electrical
and electronic components. EVs also come in four-wheelers and two-wheelers. Two-
wheeler EV is the main area of focus.
The main components of an EV are the drive motor, the battery and many control units, such as the electronic control unit (ECU), the battery management system (BMS) and the instrument cluster, which work in harmony to make the EV a working engineering marvel. The instrument cluster is one of the most important parts of a vehicle, as it acts as an interface between the human and the machine. Based on the information given by the instrument cluster, the driver can take the necessary decisions and actions, so a proper interface between the user and the machine is required. On the motor control front, there are various methods to control the drive motor efficiently. Motor control is the heart of the EV; it involves choosing the appropriate control strategy, implementing it using a microcontroller, and integrating the motor and the inverter while ensuring the proper basic functionality of the EV.
The instrument cluster is the main focus of this work. In the early days, mechanical instrument clusters were used; these clusters used gear arrangements to infer speed and displayed it using a needle. The accuracy of these clusters was low, and they were prone to damage. These drawbacks gave way to electronic clusters, which are very accurate and far less prone to damage. Instrument cluster design involves the calculation of the data to be displayed in the instrument cluster, the development of the user interface and the programming of the microcontroller for the instrument cluster. There is a need to establish communication between the two control units, as the data is calculated in one unit and displayed in the other. Among the many communication protocols available, the CAN bus protocol, which is the industry-standard communication in vehicles, is used to establish the communication. Parameters such as the distance covered and the speed of the vehicle are displayed. The objective of the project can be divided into three areas: motor control, instrument cluster design, and communication between the ECUs.

2 Instrument Clusters for BLDC Motor

The BLDC motor is preferred over the DC motor because the BLDC motor employs electronic commutation, which avoids the wear and tear associated with the mechanical commutation that a DC motor employs. There are various control methods for the BLDC motor aimed at reducing torque ripple, such as sinusoidal, trapezoidal and field-oriented control. The back electromotive force (back-emf) waveform of a permanent magnet brushless hub motor is nearly sinusoidal, which is suitable for sinusoidal current control. To minimize the torque ripple, field-oriented control based on Hall-effect sensors is applied. The authors D. Li and D. Gu propose field-oriented control of the BLDC motor for four-wheel drive vehicles; the rotor position is estimated through an interpolation method using the signals of the Hall-effect sensors [1]. S. Wang has written about torque ripple reduction using modified sinusoidal PWM; to reduce the torque ripple, a feed-forward controller is developed to eliminate the instantaneous large torque change of the motor [2].
The paper by G. Vora and P. Gundewar presents the design advantages of a digital instrument cluster and explains in detail the communication protocols involved in automotive applications. J. Pešić, K. Omerović, I. Nikolić and M. Z. Bjelica explain the process of using open-source techniques in the development of a cluster application [3–5]. The paper by H. Chen and J. Tian introduces the CAN protocol and helps with the process of fault identification [6].
An EV may use one or more electric motors for propulsion. Other main components are the power converter for the electric motor, the high-voltage battery pack, the charger and the battery management system (BMS). The paper by L. A. Perişoară, D. L. Săcăleanu and A. Vasile deals with the usage of two interfaces for monitoring an electric vehicle. The first interface is based on an Arduino Uno with a 20 × 4 LCD; the second is a virtual instrument cluster designed in LabVIEW. Both interfaces communicate over CAN buses. The instrument clusters can be designed according to the requirements of the user: the hardware cluster is a low-cost solution, while the virtual cluster needs a more expensive hardware interface for the communication bus [7].

3 Proposed Instrumentation System

The overall system is designed to do the following tasks as discussed in the objective:
• BLDC motor control
• Display of vehicle parameters in a display
• CAN bus communication between ECUs

3.1 BLDC Motor Control

The battery, inverter and motor (along with sensors) form the part of the system which is readily present in the EV (Fig. 1). To start with, the first task is to make the motor run, which is achieved using the motor control unit (MCU). The inputs to this MCU are the Hall-effect sensor signals of the BLDC motor, and its outputs go as PWM signals to the three-phase inverter. The calculations of speed and distance travelled are also computed here. All these ECUs should communicate in real time to have a real-time dashboard. There are two ECUs in the design which need to communicate with each other; the CAN bus is the standard communication protocol used in vehicles, and therefore the CAN bus is chosen. The computed values are communicated to the instrument cluster controller, a Raspberry Pi. A digital display is interfaced with the Pi, which acts as the dashboard. The required components and their specifications are listed (Table 1).
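
As a hedged sketch of how the instrument-cluster controller could read such values over CAN, the snippet below uses the python-can library on the Raspberry Pi side. The arbitration ID 0x101 and the payload layout (speed in units of 0.1 km/h followed by distance in metres) are assumptions chosen for illustration, not the frame format actually defined for the vehicle.

```python
import struct
import can  # python-can

SPEED_ODO_ID = 0x101  # assumed CAN identifier of the speed/odometer frame

def decode_frame(msg: can.Message):
    """Unpack an assumed payload: uint16 speed (0.1 km/h) followed by uint32 distance (m)."""
    speed_dkmph, distance_m = struct.unpack(">HI", bytes(msg.data)[:6])
    return speed_dkmph / 10.0, distance_m

def main():
    bus = can.interface.Bus(channel="can0", bustype="socketcan")  # e.g. MCP2515 via SocketCAN
    while True:
        msg = bus.recv(timeout=1.0)
        if msg is not None and msg.arbitration_id == SPEED_ODO_ID:
            speed, dist = decode_frame(msg)
            print(f"speed = {speed:.1f} km/h, odometer = {dist} m")  # update the GUI here

if __name__ == "__main__":
    main()
```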

Fig. 1 Block diagram of electric vehicle

Fig. 2 Closed-loop control by PWM method



Fig. 3 Basic UI

Fig. 4 Graphical UI

For speed control of the BLDC motor, there are three methods, namely trapezoidal control, sinusoidal control and field-oriented control. In trapezoidal control, the stator poles are excited based on a commutation logic, which is described in detail later in this section. The pulses are chopped into smaller pulses, and the width of these smaller pulses is varied based on the speed. This method is comparatively easier to implement than the other methods, but it requires the rotor position to be known in order to excite the stator, and it produces ripples in the torque. It is called trapezoidal control because the back-emf has the shape of a trapezium.

Fig. 5 CAN test hardware setup

Fig. 6 CAN frame seen in an oscilloscope

In sinusoidal control, the back-emf is made to resemble a sinusoid. The motor coils are excited by three sinusoids, each phase shifted by 120°. In this method, torque ripples are reduced, but the rotor position must be known to an accuracy of 1°; therefore, this method is complex to implement, as many estimations have to be made.

Fig. 7 Implementation of motor control

Fig. 8 Communication implementation

The third method is field-oriented control. The main principle of this method is that the torque of a motor is at a maximum when the stator and rotor magnetic fields are orthogonal to each other; this method tries to maintain both magnetic fields perpendicular to each other. This is done by controlling the direct-axis current and the quadrature-axis current of the motor, which are obtained from the line currents using the Park and Clarke transformations. This method provides the best torque results but is very complex to implement. Hence, trapezoidal control is implemented here.
An electric vehicle is operated using a three-phase inverter whose outputs are connected to the three input terminals of the motor. In Fig. 2, each motor phase is represented as a series combination of resistance, inductance and back-emf. The gating pulses for the inverter are given from the microcontroller by sensing the position of the rotor with the help of the Hall-effect sensors. The stator phases (A, B, C) are phase shifted from each other by 120°, and at each phase there is one Hall-effect sensor embedded in the stator. A Hall-effect sensor can sense the presence of the rotor if the rotor is within 90° to the left or 90° to the right of the sensor's position, so each sensor conducts for 180°. Let us assume that the rotor is at phase A. Here, hall B and hall C cannot sense the presence of the rotor, since the rotor is not within 90° of their positions, and only hall A can sense it; the logic for the Hall sensors will therefore be 100. Each sensor is active for 180° in each cycle. As can be seen, the back-emf generated is trapezoidal in shape, and the phase voltage is varied accordingly by taking the Hall-effect sensor states. There are six different states.

Fig. 9 System overview

Based on the Hall-effect sensor outputs, the required phase of the BLDC motor is excited to move the motor forward; thus, the excitation table shown in Table 2 is obtained.
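
The six-step commutation implied by this table can be summarised in a few lines of lookup logic, as in the hedged Python sketch below. The particular Hall-state-to-phase mapping shown is one common convention chosen for illustration; the mapping actually used in this work is the one given in Table 2.

```python
# Map each valid 3-bit Hall state to the (high-side, low-side) phases to excite.
HALL_TO_PHASES = {
    (1, 0, 0): ("A", "B"),
    (1, 1, 0): ("A", "C"),
    (0, 1, 0): ("B", "C"),
    (0, 1, 1): ("B", "A"),
    (0, 0, 1): ("C", "A"),
    (1, 0, 1): ("C", "B"),
}

def commutate(hall_state):
    """Return the (high, low) phase pair for a valid Hall state, or None to coast."""
    return HALL_TO_PHASES.get(hall_state)  # (0,0,0) and (1,1,1) are invalid sensor states

if __name__ == "__main__":
    for state in [(1, 0, 0), (1, 1, 0), (0, 0, 0)]:
        print(state, "->", commutate(state))
```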
Model-based development is an embedded software approach where a model is used to verify the control requirements and to ensure that the code runs on the target electronic hardware. When software and hardware implementation requirements are included, code for embedded deployment can be generated automatically, saving time and avoiding the introduction of manual coding errors. There is no need to write code by hand; the code for the controller is regenerated automatically. Model-based development can result in average cost savings of 25–30% and time savings of 35–40% [7]. Instead of hand-coding a microcontroller to generate the PWM signals for the inverter switching operation, here the STM32F4 microcontroller is programmed through model-based development (Fig. 2).
In closed-loop speed control, the actual speed is fed back and compared with the
reference speed to produce an error signal. This error can be reduced by tuning the PI controller accordingly.
The output of the PI controller is given to the PWM generator, which generates a
pulse based on the duty ratio. When this pulse is ANDed with the gating
pulses of the inverter, PWM pulses are obtained. By generating PWM signals, the
average voltage that is applied to the motor can be varied. As the voltage applied to
the motor changes, the speed also changes. If the average voltage applied to the motor
increases, the speed also increases, and vice versa.
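The following is a minimal Python sketch of this closed-loop idea (not the authors' Simulink/Waijung implementation); the gains Kp and Ki and the sampling period Ts are assumed values used only to make the sketch concrete.

# Discrete PI speed loop producing a PWM duty ratio; Kp, Ki, Ts are illustrative.
Kp, Ki, Ts = 0.8, 2.0, 0.001
integral = 0.0

def pi_speed_controller(reference_rpm, measured_rpm):
    """Return a PWM duty ratio in [0, 1] from the speed error."""
    global integral
    error = reference_rpm - measured_rpm
    integral += error * Ts
    duty = Kp * error + Ki * integral
    return min(max(duty, 0.0), 1.0)  # clamp to a valid duty ratio

# The duty ratio scales the chopped pulses that are ANDed with the commutation
# (gating) signals, which varies the average voltage applied to the motor.
print(pi_speed_controller(300, 250))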
To generate the PWM signals, the first step is to create a simulation in MATLAB.
However, MATLAB by itself does not support the STM32F4 board, so it is necessary to install
software which allows the STM32F4 to interface with the PC. Waijung is a blockset
which interfaces the STM32F4 Discovery board with MATLAB. After installing the Waijung
blocks in MATLAB, the simulation is run and built. Once the simulation
builds, the Waijung target setup automatically generates the code, and it is dumped into
the board from which the gating signals are taken. Hence, without writing code manually,
MATLAB automatically generates the code using the Waijung blockset. Thus, the
STM32 Discovery board was programmed using the Waijung blockset and MATLAB to
implement the above speed control method.
The motor always needed an initial push to start, so, in order to avoid this, various
methods were discussed.
The second method addressed the fact that the inverter is triggered based on the position
of the rotor. As the rotor position cannot be obtained accurately to 1° precision, the triggering
of the inverter always had an uncertainty in exciting the correct phases. So, the
main aim was to generate the Hall-effect sensor output pulses manually, with the
same lag and the correct time period based on the speed. The idea was to
excite the motor in a way that makes it move initially and then hand over the triggering
to the original Hall-effect sensors present in the motor. This method was implemented, and
the motor started to run at a slow speed, just enough for the Hall-effect sensors to give
a proper output, after which the Hall-effect sensors took care of the normal running. The
third method addressed the lag between the Hall-effect sensor inputs
and the triggering pulses. To improve this, the sampling time of the board, as set in
the Waijung blockset, was reduced even further. This method was also tried, and it
drastically improved the response of the whole system. The second and third
methods were implemented on the STM32 Discovery board using MATLAB and the
Waijung blockset.

3.2 Instrument Cluster Design

To display a parameter in the instrument cluster, the parameters to be displayed need
to be calculated first, which involves computation upon the data acquired from other ECUs. The important parameters
that are calculated are distance and speed. Hall-effect sensors play a very important
role in the proposed design, as most of the calculations depend on the output of these
sensors.
The distance is computed by finding the number of revolutions the wheel has
undergone and then multiplying it by the circumference of the wheel. This
logic has been implemented in Simulink using a counter block, a function
block that adds the circumference recursively, and a memory block to
facilitate the recursive action.
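A minimal Python sketch of this distance logic follows (the Simulink counter, function and memory blocks are mimicked by a per-revolution callback and a global accumulator); the wheel diameter below is an assumed value, not taken from the paper.

# Count wheel revolutions and add the wheel circumference on each one.
import math

WHEEL_DIAMETER_M = 0.4                        # assumption, not from the paper
CIRCUMFERENCE_M = math.pi * WHEEL_DIAMETER_M

distance_m = 0.0                              # plays the role of the Simulink memory block

def on_wheel_revolution():
    """Called once per detected wheel revolution (the counter block in Simulink)."""
    global distance_m
    distance_m += CIRCUMFERENCE_M             # recursive accumulation of the circumference
    return distance_m

print(on_wheel_revolution())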

One of the main computations to be done is the speed. The RPM data acquired from the motor
control unit is converted to speed by considering the diameter of the wheel. The main
approach to obtain the RPM is based on the frequency of the Hall-effect sensor pulses.
The relation between the frequency and the RPM of the motor is obtained, and this relational
constant is used to get the RPM from the frequency of the Hall-effect pulses. The
RPM is then converted to kilometers per hour (km/h) using a formula.
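A minimal sketch of this computation chain is shown below; the number of pulses per mechanical revolution (pole pairs) and the wheel diameter are assumptions, since the paper only states that a relational constant links pulse frequency and RPM.

# Hall pulse frequency -> RPM -> km/h.
import math

POLE_PAIRS = 4            # assumption: pulses per mechanical revolution depend on the motor
WHEEL_DIAMETER_M = 0.4    # assumption

def rpm_from_hall_frequency(pulse_freq_hz):
    # One Hall sensor produces POLE_PAIRS pulses per mechanical revolution.
    return pulse_freq_hz * 60.0 / POLE_PAIRS

def kmph_from_rpm(rpm):
    # Wheel circumference (m) per revolution, converted to km/h.
    return rpm * math.pi * WHEEL_DIAMETER_M * 60.0 / 1000.0

print(kmph_from_rpm(rpm_from_hall_frequency(20.0)))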
The main objective of the project is to create a useful user interface (UI) for the driver
to get the required data from the vehicle. A Raspberry Pi is used as the controller for
the instrument cluster. The UI is developed in the Python language, and the software
tool mainly used to design the interface is the Python 3 IDLE. Tkinter is the Python
module used for building the UI. A meter class with the basic structure of a gauge is
defined. The class has the definitions for creating the structure of a gauge, moving
the meter needle as per the value, setting the range of the meter, and other display
specifications such as height and width. Tkinter is then used to create a canvas,
and two objects of the meter class are placed for displaying RPM and speed. It is an
analog-type meter which forms the basic structure for displaying speed and RPM
(Fig. 3).
Here, two gauges were developed separately for speed and RPM. Initial values,
indicated by the needle positions, are given in the Python program.
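A minimal sketch of such a Tkinter-based meter class is given below, assuming a 240-degree gauge arc; the geometry, ranges and labels are illustrative and this is not the exact class used by the authors.

import math
import tkinter as tk

class Meter:
    def __init__(self, canvas, x, y, radius, max_value, label):
        self.canvas, self.x, self.y = canvas, x, y
        self.radius, self.max_value = radius, max_value
        # Gauge structure: a 240-degree arc plus a label and a needle.
        canvas.create_arc(x - radius, y - radius, x + radius, y + radius,
                          start=-30, extent=240, style=tk.ARC, width=3)
        canvas.create_text(x, y + radius * 0.6, text=label)
        self.needle = canvas.create_line(x, y, x, y - radius, width=3, fill="red")

    def set(self, value):
        # Map value in [0, max_value] onto the arc (210 degrees down to -30 degrees).
        angle = math.radians(210 - 240 * min(value, self.max_value) / self.max_value)
        self.canvas.coords(self.needle, self.x, self.y,
                           self.x + self.radius * math.cos(angle),
                           self.y - self.radius * math.sin(angle))

root = tk.Tk()
canvas = tk.Canvas(root, width=600, height=300)
canvas.pack()
speed_meter = Meter(canvas, 150, 150, 100, 60, "Speed (km/h)")
rpm_meter = Meter(canvas, 450, 150, 100, 350, "RPM")
speed_meter.set(25)
rpm_meter.set(180)
root.mainloop()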
The initial UI design was very basic, so, to make the UI more graphical and
better looking, various Python packages were explored. An interactive graphing library
called Plotly has many indicators similar to the requirement, so the
Plotly library was used, and the UI below, with two meters for speed and RPM and
a display for distance, was formed. This module is displayed in a Web browser (Fig. 4).
Here, two gauges were developed using the Plotly module in the Python program.
This presents the speed, RPM, and distance in a more graphical manner than the previous
design.
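A minimal sketch of the Plotly version follows, assuming the plotly.graph_objects Indicator traces; the values and ranges are illustrative placeholders, not the authors' exact layout.

import plotly.graph_objects as go

fig = go.Figure()
# Gauge for speed, gauge for RPM, and a numeric display for distance.
fig.add_trace(go.Indicator(mode="gauge+number", value=25, title={"text": "Speed (km/h)"},
                           gauge={"axis": {"range": [0, 60]}},
                           domain={"x": [0.0, 0.3], "y": [0, 1]}))
fig.add_trace(go.Indicator(mode="gauge+number", value=180, title={"text": "RPM"},
                           gauge={"axis": {"range": [0, 350]}},
                           domain={"x": [0.35, 0.65], "y": [0, 1]}))
fig.add_trace(go.Indicator(mode="number", value=12.4, title={"text": "Distance (km)"},
                           domain={"x": [0.7, 1.0], "y": [0, 1]}))
fig.show()  # opens the dashboard in a Web browser, as described above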

3.3 CAN Communication

To communicate data between the motor control unit (MCU) (STM32 Discovery
board) and the instrument cluster, the controller area network (CAN) is chosen as it is the
current industry standard. The CAN protocol is robust and frame-based; it
allows communication between ECUs without any complex wiring
in between. CAN uses a differential signal, which makes it more resistant to noise,
so that messages are transmitted with fewer errors.
The controller area network is a serial communication bus designed to allow control units
in vehicles to communicate without a host computer. The CAN bus consists of two wires,
named CAN high (CAN H) and CAN low (CAN L), and two 120 Ω resistors at the
ends for termination. Bus contention is resolved since the messages are transmitted based
on their priority, and the entire network meets the timing constraints. Each device can
decide whether a message is relevant or needs to be filtered. Additional non-transmitting
nodes can be added to the network without any modifications to the system.

The messages are of broadcast type, and there is no single master for the bus; hence, it
is a multi-master protocol. The speed can be varied from 125 kbps to 1 Mbps.
For communication over CAN, two additional hardware components are required: a CAN
controller and a CAN transceiver. At the transmitting end, the CAN controller
converts the data into CAN messages, which are then turned
into differential signals by the CAN transceiver. At the receiving end, the CAN transceiver takes
the differential signals and changes them back into a CAN message, which is then converted
back to data by the CAN controller. The STM32 board has an inbuilt CAN controller, so only a CAN
transceiver is required. However, the Raspberry Pi does not have an inbuilt CAN controller,
so one needs to be provided separately along with the transceiver. For communication
between the CAN controller and the microcontroller, SPI communication is used.
The STM board and the Raspberry Pi were made to communicate via the CAN bus.
Since the Raspberry Pi does not have an inbuilt CAN controller, a CAN module was used, and
communication between the Raspberry Pi and the CAN module was established using SPI (Fig. 5).
SPI communication was enabled on the Raspberry Pi with the oscillator frequency set to that of the
oscillator present in the CAN module, and CAN communication was then brought
up at the chosen baud rate. A CAN message was sent from the STM board, which was programmed
using Simulink (Fig. 6).
The CAN message sent by the STM board was then received on the Raspberry
Pi by setting the baud rate to the value used while programming the STM. It was viewed
in the terminal of the Raspberry Pi and also in the Python shell. The CAN library for Python
was used to get the CAN message and display it.
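A minimal sketch of the receiving side is given below, assuming the python-can library on top of a SocketCAN interface named can0 (e.g., an MCP2515-style SPI CAN module); the interface name and bitrate are assumptions, not values stated in the paper.

# Bring the interface up first, for example:  sudo ip link set can0 up type can bitrate 500000
import can

bus = can.interface.Bus(channel="can0", bustype="socketcan")

message = bus.recv(timeout=1.0)       # blocks until a frame arrives or the timeout expires
if message is not None:
    print("ID: 0x%X  data: %s" % (message.arbitration_id, message.data.hex()))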

4 Results

The integration of all three separate parts results in the intended hardware system,
the electric vehicle.
As discussed in the previous sections, the trapezoidal speed control was implemented
using the STM32 Discovery board and the Waijung blockset to control the drive motor.
The Hall-effect sensor data was received through the input pins, and triggering pulses were given to
the inverter switches based on the commutation algorithm. A potentiometer is used to set the
reference speed, and the inbuilt ADC is used for giving the potentiometer reading to the
controller. Thus, speed control is implemented. The parameters to be displayed are
also calculated in the motor control unit, as discussed earlier (Fig. 7).
As discussed in the previous sections, Raspberry Pi is used as the instrument
cluster controller unit. Raspberry Pi display is used as the display for the instrument
cluster. Python language is used to create the UI for the cluster. There are many
packages like Tkinter and wxPython for UI creation. Plotly graphs provided a better
design for UI. Plotly graphs have inbuilt gauge meters which are used in the display.
As discussed in the previous sections, communication between the control units is
established using the CAN bus. The CAN bus is set up between the Discovery board
(MCU) and the Raspberry Pi (instrument cluster). The Discovery board has an inbuilt CAN
controller, so it is connected to the CAN bus using only a CAN transceiver module. For
the Raspberry Pi, external CAN controller and CAN transceiver modules are used. Thus,
communication is established (Fig. 8).
The calculated values are sent as a message on the CAN bus by the motor control
unit. The CAN package in the Python language is used to retrieve the CAN message sent
by the motor control unit. The data is extracted from the message and then given
to the UI for display. The data is separated into speed and distance based
on the bit positions set on the MCU while sending the data. The values are then converted
from hexadecimal to decimal and given to the UI as the appropriate variables
(Fig. 9).
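A minimal sketch of this unpacking step follows; the byte layout used below (one byte of speed, four bytes of distance) is an assumption chosen only for illustration, since the actual positions are defined on the MCU.

def decode_cluster_frame(data):
    """Split a received CAN payload (bytearray) into speed and distance values."""
    speed_kmph = data[0]                           # assumed: one byte of speed
    distance_m = int.from_bytes(data[1:5], "big")  # assumed: four bytes of distance
    return speed_kmph, distance_m

# Example: a frame carrying 25 km/h and 1500 m
speed, distance = decode_cluster_frame(bytearray([25, 0, 0, 5, 220]))
print(speed, distance)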

5 Conclusions

An instrument cluster for an electric vehicle was implemented. The implemented motor
control is primitive, which brings disadvantages such as torque ripples; a better
control algorithm can be implemented. The inverter used for the implementation is very
bulky and undermines the core principle of a two-wheeler, which is its compactness, so a
better single-chip inverter is required to maintain the compactness of the vehicle. The
algorithms used for calculating parameters such as the speed and
distance work: the results appear correct, as the RPM, which is the main parameter
for calculating the speed and distance, is in accordance with the tachometer reading.
However, they can be improved using better algorithms which give more precision and
accuracy. The proposed design proves that high-level programming languages can be
used to design user-friendly interfaces instead of sophisticated software tools. The CAN
communication established between the ECUs was troublesome because
of many issues, such as garbage values, that were faced during its implementation.
Many important parameters, like the SOC of the battery, the distance left until the charge drains,
and the economy speed, can be added as future work.

Table 1 Components and their specifications

Battery: The electrical energy source for all the EV system components. Capacity: 7 Ah
BLDC motor: The brushless DC motor is the drive for the vehicle. Voltage: 48 V; Current: 5.2 A; Power: 250 W; Speed: 250–350 rpm
Three-phase inverter: The inverter is used to supply the energy from the battery to the BLDC motor. Voltage: 48 V; Current: 18 A
Motor control unit (MCU): This ECU controls the basic operation of the vehicle. It takes the input from the accelerator and brake and gives appropriate triggering pulses to the inverter based on the position of the rotor. The STM32 Discovery board is used as the motor control unit; it uses an STM32F407VG processor based on the high-performance ARM® Cortex®-M4 32-bit RISC core
Instrument cluster: The instrument cluster displays the necessary information to the user using the display connected to it. It has been implemented using a Raspberry Pi, whose high processing capability allows it to act as a standalone minicomputer

Table 2 Excitation table


Hall1 Hall2 Hall3 Sw1 Sw2 Sw3 Sw4 Sw5 Sw6
0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 1 0 0
0 1 0 0 1 0 0 1 0
0 1 1 1 1 0 0 0 0
1 0 0 0 0 1 0 0 1
1 0 1 0 0 1 1 0 0
1 1 0 0 0 0 0 1 1
1 1 1 0 0 0 0 0 0

References

1. Lu D, Li J, Gu J (2014) Field oriented control of permanent magnet brushless hub motor in four-wheel drive electric vehicle. In: 2014 8th international conference on future generation communication and networking, Haikou, pp 128–131
2. Singh J, Singh M (2019) Comparison and analysis of different techniques for speed control of brushless DC motor using MATLAB Simulink
3. Pešić J, Omerović K, Nikolić I, Bjelica MZ (2016) Automotive cluster graphics: current approaches and possibilities. In: 2016 IEEE 6th international conference on consumer electronics-Berlin (ICCE-Berlin), Berlin, pp 12–14
4. Krivík P (2018) Methods of SoC determination of lead acid battery. J Energy Storage 15:191–195. ISSN: 2352-152X
5. Choi J, Kwon Y, Jeon J, Kim K, Choi H, Jang B (2018) Conceptual design of driver-adaptive human-machine interface for digital cockpit. In: 2018 international conference on information and communication technology convergence (ICTC), Jeju, pp 1005–1007
6. Chen H, Tian J (2009) Research on the controller area network. In: 2009 international conference on networking and digital society, Guiyang, Guizhou, pp 251–254
7. Perişoară LA, Săcăleanu DL, Vasile A (2017) Instrument clusters for monitoring electric vehicles. In: 2017 IEEE 23rd international symposium for design and technology in electronic packaging (SIITME), Constanta, pp 379–382
Ant Colony Optimization: A Review
of Literature and Application in Feature
Selection

Nandini Nayar, Shivani Gautam, Poonam Singh, and Gaurav Mehta

Abstract Ant colony optimization (ACO) is a meta-heuristic inspired by real ants,
which are capable of exploring shortest paths; this inspires researchers to
apply it for solving numerous optimization problems. Outstanding and acknowledged
applications are derived from biologically inspired algorithms like ACO,
which originate from artificial swarm intelligence, in turn motivated
by the collective behavior of social insects. ACO is influenced by the natural ant
system: the behavior of ants, their team planning and organization, and their cooperation in seeking
and finding the optimal solution while preserving the data of each ant. Currently,
ACO has emerged as a popular meta-heuristic technique for solving
combinatorial optimization problems and is beneficial for finding shortest paths via
construction graphs. This paper highlights the behavior of ants and the various ACO
algorithms (their variants as well as hybrid approaches) that are used successfully
for performing feature selection, along with applications of ACO and current trends. The fundamental
ideas of ant colony optimization are reviewed, including its biological background
and application areas. This paper portrays how the current literature utilizes the
ACO approach for performing feature selection. By analyzing the literature, it can
be concluded that ACO is a suitable approach for feature selection.

Keywords Ant colony optimization · Feature selection · Swarm intelligence

N. Nayar (B) · G. Mehta


Department of Computer Science and Engineering, Chitkara University Institute of Engineering
and Technology, Chitkara University, Himachal Pradesh, India
e-mail: nandini.nayar@chitkarauniversity.edu.in
G. Mehta
e-mail: gaurav.mehta@chitkarauniversity.edu.in
S. Gautam · P. Singh
Department of Computer Science and Applications, Chitkara University School of Computer
Applications, Chitkara University, Himachal Pradesh, India
e-mail: shivani.gautam@chitkarauniversity.edu.in
P. Singh
e-mail: poonam.cse@chitkarauniversity.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 285
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_22

1 Introduction

In social insects' colonies, every insect typically performs its tasks autonomously.
However, the tasks performed by individual insects are correlated because the whole
colony is able to solve even complex problems through cooperation. Without any
kind of central controller (supervisor), these insect colonies can solve numerous
survival-related issues, e.g., the selection and pick-up of material, and the exploration and storage of
food. Although such activities require sophisticated planning, these issues are
resolved by insect colonies devoid of any central controller. Hence, the collective
behavior emerging from a group of social insects is termed "swarm intelligence."
During the last few decades, ever-increasing research on these algorithms suggests
that nature is an extraordinary source of inspiration for developing intelligent systems
and for providing superior solutions for numerous complicated problems.
In this paper, the behavior of real ants is reviewed. Owing to the fact
that these ants are capable of exploring the shortest path between their food source
and their nest, many ant-based algorithms have been proposed by researchers. The ant colony
optimization (ACO) algorithm is a prominent and successful swarm intelligence
technique. During the past decades, a considerable amount of research has been
conducted to develop ACO-based algorithms as well as their pragmatic
applications to tackle real-world problems. Inspired by real ants' behavior and their
indirect communication, the ant colony optimization algorithm was proposed by
Marco Dorigo. Since then, it has been gaining tremendous research attention.
In ACO algorithms, ants (simple agents) collaborate to achieve
a combined behavior for the system, thereby developing a "robust" system that can
find superior-quality solutions for a variety of problems comprising large search
spaces.
The paper reviews the basis of ACO. In Sect. 1, the behavior of real ant colonies is
depicted. In Sect. 2, the feature selection concepts are presented. In Sect. 3, numerous
existing ACO algorithms are reviewed.

1.1 Biological Background

While walking, individual ants deposit a chemical known as "pheromone" on the
ground. Through the accumulation of this chemical, a trail is created by the ants to mark
their path. When an ant discovers a food source, it creates a trail marking
the path from its nest toward the food source and vice versa.
The other ants can detect the presence of pheromone, and they prefer the path
having the higher pheromone concentration. As the intensity of pheromone is expected
to be higher on the shortest paths toward a food source, the other ants are
anticipated to follow the shortest path. Thus, the shorter pathway attracts more ants.
Individual ants can find a solution to a problem; however, by cooperating with one
another, they can find superior solutions.

1.2 Ant Colony Optimization (ACO) Algorithm

Ants are considered to be "social" insects, and they live in colonies. Ants drop
pheromone on the ground while traveling, which helps them to explore the shortest
route. Probabilistically, every ant prefers to follow the path that is rich in pheromone
density. However, pheromone decays with time, leading to a lower pheromone
intensity on less popular paths. Therefore, the shortest route will have more ants
traversing it, whereas the other paths will be diminished until all ants pursue the
same shortest path, leading the system to converge to a "single" solution. Over time,
the pheromone intensity decreases automatically. Practically, this pheromone evaporation
is required to avoid a speedy convergence of the algorithm towards a
sub-optimal region.
Inspired by the behavior of real ants, artificial ants were designed to solve
numerous optimization problems, as they are capable of moving through the problem
states and making decisions at every step.
In ACO, the basic rules are defined as:
• The problem is depicted as a graph, with "nodes" representing features and "edges"
representing the choice of the subsequent feature.
• η denotes the "heuristic information," i.e., the goodness of a path.
• The "pheromone updating rule" updates the pheromone level on the edges.
• The "probabilistic transition rule" gives the probability of an ant traversing to
a subsequent node (Table 1); the standard forms of these two rules are sketched after this list.
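For concreteness, the standard formulations of these two rules from the ACO literature are sketched below; the symbols $\alpha$, $\beta$ and $\rho$ (the pheromone weight, heuristic weight and evaporation rate) and $N_i^k$ (the feasible neighborhood of ant $k$ at node $i$) are not defined in this paper and follow the usual convention.

$$p_{ij}^{k} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in N_i^{k}} [\tau_{il}]^{\alpha}\,[\eta_{il}]^{\beta}}, \qquad \tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \sum_{k} \Delta\tau_{ij}^{k}$$

Here $\tau_{ij}$ is the pheromone on edge $(i, j)$, $\eta_{ij}$ the heuristic information, and $\Delta\tau_{ij}^{k}$ the pheromone deposited by ant $k$ on that edge.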

1.3 Advantages of ACO

• ACO algorithms are robust in nature; i.e., they are flexible with respect to
changing dynamic applications.
• They have the advantage of distributed computation.
• They use positive feedback, which, in turn, leads to the discovery of optimal
solutions that may be further used in dynamic applications.
• They allow dynamic re-routing via shortest-path algorithms if any node is broken.
• While analyzing real-dimension networks, ACO algorithms allow network flows
to be calculated faster than traditional static algorithms [33].

Some more advantages of ACO are summarized in Fig. 1.

1.4 Generic Structure of the ACO Algorithm

The generic ACO algorithm is depicted below [34]; it comprises four major steps:
1. Initialization

Table 1 ACO applications


Application area Authors
Water resource management Maier, Holger R., Angus R. Simpson, Aaron C. Zecchin, Wai
Kuan Foong, Kuang Yeow Phang, Hsin Yeow Seah, and Chan
Lim Tan [1]
López-Ibáñez, Manuel, T. Devi Prasad, and Ben Paechter [2]
Zheng, Feifei, Aaron C. Zecchin, Jeffery P. Newman, Holger R.
Maier, and Graeme C. Dandy [3], Sidiropoulos, E., and D. Fotakis
[4], Shahraki, Javad, Shahraki Ali Sardar, and Safiyeh Nouri [5]
Protein structure prediction Do Duc, Dong, Phue Thai Dinh, Vu Thi Ngoc Anh, and Nguyen
Linh-Trung [6], Liang, Zhengping, Rui Guo, Jiangtao Sun, Zhong
Ming, and Zexuan Zhu [7]
Tele-communication Özmen, Mihrimah, Emel K. Aydoğan, Yılmaz Delice, and M.
Duran Toksarı [8], Di Caro, Gianni, and Marco Dorigo [9], Khan,
Imran, Joshua Z. Huang, and Nguyen Thanh Tung [10]
Feature selection Shunmugapriya, P., and S. Kanmani [11], Sweetlin, J. Dhalia, H.
Khanna Nehemiah, and A. Kannan [12], Mehmod, Tahir, and
Helmi B. Md Rais [13], Wan, Youchuan, Mingwei Wang, Zhiwei
Ye, and Xudong Lai [14], Ghosh, Manosij, Ritam Guha, Ram
Sarkar, and Ajith Abraham [15], Dadaneh, Behrouz Zamani,
Hossein Yeganeh Markid, and Ali Zakerolhosseini [16], Peng,
Huijun, Chun Ying, Shuhua Tan, Bing Hu, and Zhixin Sun [17],
Nayar Nandini, Sachin Ahuja, and Shaily Jain [18], Rashno,
Abdolreza, Behzad Nazari, Saeed Sadri, and Mohamad Saraee
[19], Saraswathi, K., and A. Tamilarasi [20]
Vehicle routing problems Ding, Qiulei, Xiangpei Hu, Lijun Sun, and Yunzeng Wang [21],
Yu, Bin, Zhong-Zhen Yang, and Baozhen Yao [22], Wu, Libing,
Zhijuan He, Yanjiao Chen, Dan Wu, and Jianqun Cui [23],
Huang, Gewen, Yanguang Cai, and Hao Cai [24], Xu, Haitao, Pan
Pu, and Feng Duan [25], Huang, Ying-Hua, Carola A. Blazquez,
Shan-Huen Huang, Germán Paredes-Belmar, and Guillermo
Latorre-Nuñez [26], Zhang, Huizhen, Qinwan Zhang, Liang Ma,
Ziying Zhang, and Yun Liu [27]
Robot path planning Brand, Michael, Michael Masuda, Nicole Wehner, and Xiao-Hua
Yu. [28], Chia, Song-Hiang, Kuo-Lan Su, Jr-Hung Guo, and
Cheng-Yun Chung [29], Cong, Yee Zi, and S. G. Ponnambalam
[30], Liu, Jianhua, Jianguo Yang, Huaping Liu, Xingjun Tian, and
Meng Gao [31], Deng, Gao-feng, Xue-ping Zhang, and Yan-ping
Liu [32]

In this step, all the pheromones and parameters are initialized.

2. Formulate Ant Solutions
In this step, a group of ants formulates solutions to the problem being
solved by making use of the pheromone values and other related information.
3. Local Search (optional)
In this step, the constructed solutions are optionally improved by a local search procedure.
4. Global Pheromone Update

Fig. 1 Benefits of ant colony optimization algorithm

In the last step, the pheromone variables are updated based on the
search history reflected by the ants.
Begin ACO Algorithm
  Initialization;
  while (end criterion is not satisfied) do
    Formulate ant solutions;
    Perform local search;
    Global pheromone update;
  end while
End ACO Algorithm
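A minimal, self-contained Python sketch of this loop is given below, applied to a toy feature selection instance; the feature merits, the subset penalty and the construction rule are invented for illustration only and do not come from any of the reviewed papers (the optional local search step is omitted).

import random

MERIT = [0.9, 0.1, 0.8, 0.2, 0.7, 0.05]          # assumed usefulness of each feature
PENALTY = 0.25                                   # assumed cost per selected feature

def quality(subset):
    return sum(MERIT[f] for f in subset) - PENALTY * len(subset)

def aco_feature_selection(num_ants=10, iterations=30, alpha=1.0, beta=1.0, rho=0.1):
    n = len(MERIT)
    tau = [1.0] * n                              # initialization: one pheromone value per feature
    best, best_q = None, float("-inf")
    for _ in range(iterations):
        solutions = []
        for _ in range(num_ants):                # formulate ant solutions
            subset = []
            for f in range(n):
                attractiveness = (tau[f] ** alpha) * (MERIT[f] ** beta)
                if random.random() < attractiveness / (1.0 + attractiveness):
                    subset.append(f)
            solutions.append(subset)
        for s in solutions:                      # evaluate and keep the best subset so far
            q = quality(s)
            if q > best_q:
                best, best_q = s, q
        tau = [(1 - rho) * t for t in tau]       # global pheromone update: evaporation
        for f in best:                           # deposit on features of the best subset
            tau[f] += best_q if best_q > 0 else 0.0
    return best, best_q

print(aco_feature_selection())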

2 Feature Selection

As a consequence of technological innovations, a massive amount of data is generated


that leads to an increased number of dimensions in a dataset. For the discovery of
knowledge from these massive datasets, feature selection is imperative [35, 36].
As the number of features surpasses a certain limit, a substantial number of them
become redundant or irrelevant, leading to poor classifier performance. The purpose
of feature selection is to reduce dimensionality for performing meaningful data analysis.
Feature selection is a significant task in data mining as well as in pattern recognition.
The goal of feature selection is to choose an optimal feature subset having the maximum
discriminating ability as well as minimum redundancy [14]. It is imperative to develop
a classification model suitable for dealing with problems having different sample sizes
as well as different dimensionality.
The vital tasks include the selection of valuable features and an apt classification
method. The idea of feature selection involves choosing a minimal feature subset,
i.e., the best feature subset comprising k features yielding the least amount of generalization
error. Feature selection techniques are utilized either as a
preprocessing step or in combination with a learning model for the task of classification.
The set of all original features is given as input to the feature selection method,
which subsequently generates "feature subsets." The selected subset
is then evaluated by a learning algorithm or through consideration of the data characteristics.

2.1 Need for Feature Selection

1. If the number of features is extremely large, it becomes a complicated task to work with all available features.
2. Most of the available features are redundant, noisy or irrelevant to the classification or clustering task.
3. If the number of features exceeds the number of input data points, it becomes a problem.
4. To decrease the computation cost as well as the training time.
5. To avoid the curse of dimensionality.
6. To provide better model interpretability and readability.

2.2 ACO for Feature Selection

Motivated by the numerous benefits possessed by the ACO algorithm, it is widely
used for performing the task of feature selection. The advantages of ACO
include its powerful search ability and its ability to converge expeditiously, thereby leading
to efficient exploration of a minimal feature subset [37]. ACO-based methods for
feature selection are fairly prominent as they apply knowledge from previous iterations
and thus achieve optimum solutions [38]. Compared to other conventional
approaches, ACO is fast and simple, and it is considered to be one of the most preferred
approaches for resolving various complex problems [39]. In ACO, the problem is
represented as a graph, where nodes correspond to features and edges denote the
choice of a subsequent feature.

3 Literature Review: ACO for Feature Selection

Several ACO-based algorithms have been proposed by researchers. In this section,
a review of ACO-based algorithms used for feature selection is presented.
A memory-based subset evaluation is introduced in [15] for keeping the best ants,
together with a feature-dimension-dependent pheromone update, to select a feature
set in a multi-objective manner. The approach was tested on numerous real-life datasets
making use of multi-layer perceptron and K-nearest neighbors classifiers.
In [40] a hybrid approach comprising of ant colony optimization and k-nearest
neighbor (KNN) is proposed for the selection of valuable features from customer
review dataset. The performance of this algorithm was evaluated and validated
by parametric significance tests. Results prove that this technique is valuable for
providing a superior feature set for customer review data.
A computer-aided diagnostic system is developed in [12] for detecting "pulmonary
hamartomas" from lung CT images. For selecting relevant features, the ACO algorithm
is used, which trains SVM and naïve Bayes classifiers to mark the existence
or nonexistence of the hamartoma nodule. The results demonstrate that the features selected
by the ACO-RDM approach yield superior accuracy (94.36%) with the SVM classifier.
Using the CT images of the lungs, [41] proposed an ACO-based method for
selecting relevant features for enhancing accuracy for diagnosis of pulmonary bron-
chitis. The approach comprises of ACO with cosine similarity and SVM classifier.
Furthermore, the tandem run recruitment strategy assists in choosing the best features.
Results demonstrate that the ACO algorithm, with a tandem run strategy, yields an
accuracy of 81.66%. In [19], the feature space is reduced by ACO, which decreases the
time complexity that is a vital concept in Mars on-board applications. The proposed
method reduces feature size to a great extent (up to 76%) and yields high accu-
racy (91.05%), thereby outperforming the conventional approach that yields 86.64%
accuracy.
By combining traits of ant colony and bee colony, [11] proposed a hybrid AC-ABC
algorithm for optimizing feature selection. The approach eliminates the stagnated
behavior of ants, as well as the time-consuming search for the initial solutions of
bees. The approach is evaluated on 13 datasets and shows promising results in terms
of optimal features selected and classification accuracy.
Reference [20] proposed an ant colony optimization-based algorithm for feature
selection, extracting feature sets from reviews by making use of term frequency–inverse
document frequency (TF-IDF). Furthermore, the selected features are classified
by an SVM or naïve Bayes classifier. The results obtained demonstrate that the
approach is efficient for classifying opinions.
Reference [42] presented SMMB-ACO method by combining Markov blanket
learning with stochastic and ensemble features, thereby guiding the stochastic
sampling process by including ACO. The experimental results demonstrate that the
proposed method is more stable as compared to SMMB. [14] presented an approach
for feature selection, which is based on modified binary-coded ant colony optimiza-
tion (MBACO) algorithm integrated with genetic algorithm. There are two models:

“pheromone density model” and “visibility density model.” The approach is vali-
dated on ten datasets obtained from the UCI repository. The results demonstrate that
the proposed method is capable of keeping a fair balance on classification accuracy
as well as efficiency, thereby making it apt for feature selection applications.
For enhancing the stability of feature selection, [17] proposed FACO, an improved
ACO-based algorithm for feature selection. It uses a two-stage pheromone
updating rule that prevents the algorithm from falling into a premature local optimum and
is validated on the KDD CUP99 dataset. The outcomes demonstrate that the algorithm
has great practical significance as it enhances the classification accuracy as well as the
efficiency of a classifier.
From the literature review, some gaps can be identified: it can be
concluded that "construction algorithms" are not able to provide superior-quality
solutions and may not remain optimal under "minor" changes. Some challenges in
ACO include exploring superior pheromone models and upholding an apt balance between
"exploration" and "exploitation" of the search space. Moreover, there is a need
for adaptive intelligent models capable of automatically identifying
dynamic alterations in a dataset's characteristics and thereby upgrading the algorithms in
a self-directed manner.
After carrying out an extensive literature review of ACO in feature selection, the
accuracies achieved by various ACO variants are summarized in Table 2.

4 Current Trends

The recent developments of ACO algorithms involve hybridization with other


renowned meta-heuristics or with other mathematical programming techniques.
Moreover, many capable schemes for parallelization of ACO algorithms are also
being proposed.
Problems comprising multiple, conflicting objectives also need
to be addressed by exploring solutions that provide a superior compromise among
the various objectives. According to [53], ACO algorithms are effectively used in
numerous technology fields. ACO consequently has made an effectual contribution in
digital image processing and monitoring of structural damage. Furthermore, ACO has
gained much attention in solving the issues related to economic dispatch problems.
Researchers are also exploring the capability of ACO for data clustering, scheduling
and routing problems.
Present ACO applications fall under two classes of combinatorial optimization problems: static and dynamic [54].
• Static problems can be defined as problems wherein the topology and costs do
not change while the problem is being solved, e.g., the traveling salesman problem,
where the city locations and intercity distances do not change during the run-time of
the algorithm.

Table 2 Comparison of classification accuracy for test datasets


Dataset Algorithm Accuracy achieved (%)
Sonar ACO-BSO 99.09
Statlogheart PSO [43] 95.53
Zali ACO-BSO 86.56
PSO [43] 84.89
ACO-BSO 88.35
PSO [43] 84.29
Cleveland dataset ACO-HKNN [44] 99.2
SVM [44] 97.74
Naïve Bayes [44] 96.21
STULONG dataset ACO [45] 80.75
ACO/PSO [45] 78.1
ACO/PSO with new Q [45] 87.43
Labor ACO-CE 78.94
Ionosphere PSO [46] 70.17
ACO-CE 91.16
PSO [46] 86.32
Sonar BACO 86.0
Ion ABACO [47] 91.0
Vehicle BACO 92.1
ABACO [47] 93.3
BACO 76.9
ABACO [47] 78.7
Soyabean-large-MD Genetic search 98.33
Heart ACO [48] 99.02
SPECT Genetic search 84.81
ACO [48] 85.92
Genetic search 70.03
ACO [48] 75.65
Perturbed-breast dataset ACO-based search 97.14
Perturbed-dermatology dataset PSO-based search [49] 97.14
ACO-based search 88.28
PSO-based search [49] 80.9
Harvard
NSL-KDD PSO [50] 96.04
ACO [50] 98.13
ABC [50] 98.9
Reuter’s dataset ACO [51] 79.02
ACO-ANN [51] 81.35
GA [51] 78.27
German traffic signs ACO + ANN [52] 82.22
ACO + SVM [52] 88.95
ACO + EFSVM [52] 92.39
(continued)

Table 2 (continued)
Dataset Algorithm Accuracy achieved (%)
Hepatitis ACO-PSO [11] 75.34
cancer ABC-DE [11] 71.26
AC-ABC [11] 79.29
ACO-PSO [11] 87.06
ABC-DE [11] 96.01
AC-ABC [11] 99.43

• Dynamic problems are those problems where the topology and cost change
even while the solutions are being built, for example, the routing problem in
telecommunication networks, where traffic patterns keep changing everywhere.
The ACO-based algorithms for addressing these kinds of problems are the same in
general, but they vary considerably in implementation details. Of late, ACO algorithms
have attracted a lot of attention among researchers. Nowadays, there are
various successful implementations of ACO algorithms, which are applied to a broad
scope of combinatorial optimization problems. Such applications
come under two broad application areas:
• NP-hard problems: For these problems, the best-known algorithms have exponential
worst-case time complexity. Most ant-based algorithms are equipped with additional
abilities, like problem-specific local optimizers, that take the ant solutions to local
optima.
• Shortest-path problems: In these problems, the properties of the problem's graph representation may
vary over time, synchronously with the optimization method, which needs to
adapt to the problem dynamics. In such a scenario, the graph may be available,
but its properties (the cost of components and connections) may vary over time.
In such cases, it can be concluded that the use of ACO algorithms is recommended
as the variation rate of the cost increases but the knowledge of the variation process
decreases.

5 Conclusion and Future Scope

From the literature studied, it can be inferred that the identification of pertinent
and valuable features for training the classifier impacts the performance of the classifier
model. ACO has been, and continues to be, a productive paradigm for structuring
powerful combinatorial solutions for optimization problems. In this paper,
the origin and biological background of the ACO algorithm are presented, along with several
application areas of ACO. Finally, a survey of ACO used in the domain of feature
selection is presented. The ACO algorithm has become one of the most popular meta-heuristic
approaches to resolve various combinatorial problems. The early ACO
versions were not good enough to compete with other well-known algorithms, but
the outcomes were promising enough to open new avenues for exploring this area.
Since then, many researchers have explored the basic ACO algorithm and updated
it to obtain promising results. This paper focuses on outlining the latest ACO
developments in terms of algorithms as well as ACO applications. Applications like
multi-objective optimization and feature selection are the main targets of recent ACO
developments. For enhancing the performance of ACO algorithms, these algorithms
are further combined with existing meta-heuristic methods and mathematical programming
techniques. A clear improvement in results for different problems has been shown by the
hybridization of ACO algorithms. The implementation of parallel versions of ACO algorithms
has also been seen in the latest trends. Due to the use of multi-core CPU architectures
and GPUs, the creation of enhanced parallel versions of ACO algorithms is possible.

References

1. Maier HR, Simpson AR, Zecchin AC, Foong WK, Phang KY, Seah HY, and Tan CL (2003)
Ant colony optimization for design of water distribution systems. J Water Resour Plan Manage
129(3):200–209
2. López-IbáñezM, Prasad TD, Paechter B (2008) Ant colony optimization for optimal control of
pumps in water distribution networks. J Water Resour Plann Manage 134(4):337–346
3. Zheng F, Zecchin AC, Newman JP, Maier HR, Dandy GC (2017) An adaptive convergence-
trajectory controlled ant colony optimization algorithm with application to water distribution
system design problems. IEEE Trans Evol Comput 21(5):773–791
4. Sidiropoulos E, Fotakis D (2016) Spatial water resource allocation using a multi-objective ant
colony optimization. Eur Water 55:41–51
5. Shahraki J, Sardar SA, Nouri S (2019) Application of met heuristic algorithm of ant Colony
optimization in optimal allocation of water resources of Chah-Nime of Sistan under managerial
scenarios. IJE 5(4):1
6. Do Duc D, Dinh PT, Anh VTN, Linh-Trung N (2018) An efficient ant colony optimization
algorithm for protein structure prediction. In: 2018 12th international symposium on medical
information and communication technology (ISMICT), pp 1–6. IEEE
7. Liang Z, Guo R, Sun J, Ming Z, Zhu Z (2017) Orderly roulette selection based ant colony
algorithm for hierarchical multilabel protein function prediction. Math Prob Eng
8. Özmen M, Aydoğan EK, Delice Y, Duran Toksarı M (2020) Churn prediction in Turkey’s
telecommunications sector: a proposed multiobjective–cost-sensitive ant colony optimization.
Wiley Interdisc Rev Data Min Knowl Disc 10(1):e1338
9. Di Caro G, Dorigo M (2004) Ant colony optimization and its application to adaptive routing
in telecommunication networks. PhD thesis, Faculté des Sciences Appliquées,
Université Libre de Bruxelles, Brussels, Belgium
10. Khan I, Huang JZ, Tung NT (2013) Learning time-based rules for prediction of alarms from
telecom alarm data using ant colony optimization. Int J Comput Inf Technol 13(1):139–147
11. Shunmugapriya P, Kanmani S (2017) A hybrid algorithm using ant and bee colony optimization
for feature selection and classification (AC-ABC Hybrid). Swarm Evol Comput 36:27–36
12. Sweetlin JD, Nehemiah HK, Kannan A (2018) Computer aided diagnosis of pulmonary hamar-
toma from CT scan images using ant colony optimization based feature selection. Alexandria
Eng J 57(3):1557–1567
13. Mehmod T, Md Rais HB (2016) Ant colony optimization and feature selection for intrusion
detection. In: Advances in machine learning and signal processing, pp 305–312. Springer,
Cham

14. Wan Y, Wang M, Ye Z, Lai X (2016) A feature selection method based on modified binary
coded ant colony optimization algorithm. Appl Soft Comput 49:248–258
15. Ghosh M, Guha R, Sarkar R, Abraham A (2019) A wrapper-filter feature selection technique
based on ant colony optimization. Neural Comput Appl:1–19
16. Dadaneh BZ, Markid HY, Zakerolhosseini A (2016) Unsupervised probabilistic feature
selection using ant colony optimization. Expert Syst Appl 53:27–42
17. Peng H, Ying C, Tan S, Bing Hu, Sun Z (2018) An improved feature selection algorithm based
on ant colony optimization. IEEE Access 6:69203–69209
18. Nandini N, Ahuja S, Jain S (2020) Meta-heuristic Swarm Intelligence based algorithm for
feature selection and prediction of Arrhythmia. Int J Adv Sci Technol 29(2):61–71
19. Rashno A, Nazari B, Sadri S, Saraee M (2017) Effective pixel classification of mars
images based on ant colony optimization feature selection and extreme learning machine.
Neurocomputing 226:66–79
20. Saraswathi K, Tamilarasi A (2016) Ant colony optimization based feature selection for opinion
mining classification. J Med Imaging Health Inf 6(7):1594–1599
21. Ding Q, Xiangpei Hu, Sun L, Wang Y (2012) An improved ant colony optimization and its
application to vehicle routing problem with time windows. Neurocomputing 98:101–107
22. Yu B, Yang Z-Z, Yao B (2009) An improved ant colony optimization for vehicle routing
problem. Eur J Oper Res 196(1):171–176
23. Wu L, He Z, Chen Y, Dan Wu, Cui J (2019) Brainstorming-based ant colony optimization for
vehicle routing with soft time windows. IEEE Access 7:19643–19652
24. Huang G, Cai Y, Cai H (2018) Multi-agent ant colony optimization for vehicle routing problem
with soft time windows and road condition. In: MATEC web of conferences, vol 173, p 02020.
EDP Sciences
25. Xu H, Pu P, Duan F (2018) Dynamic vehicle routing problems with enhanced ant colony
optimization. Discrete Dyn Nat Soci 2018
26. Huang Y-H, Blazquez CA, Huang S-H, Paredes-Belmar G, Latorre-Nuñez G (2019) Solving the
feeder vehicle routing problem using ant colony optimization. Comput Ind Eng 127:520–535
27. Zhang H, Zhang Q, Ma L, Zhang Z, Liu Y (2019) A hybrid ant colony optimization algorithm
for a multi-objective vehicle routing problem with flexible time windows. Inf Sci 490:166–190
28. Brand M, Masuda M, Wehner N, Yu X-H (2010) Ant colony optimization algorithm for robot
path planning. In: 2010 international conference on computer design and applications, vol 3,
pp V3–436. IEEE
29. Chia S-H, Su K-L, Guo J-R, Chung C-Y (2010) Ant colony system based mobile robot path
planning. In: 2010 fourth international conference on genetic and evolutionary computing, pp
210–213. IEEE
30. Cong YZ, Ponnambalam SG (2009) Mobile robot path planning using ant colony optimization.
In: 2009 IEEE/ASME international conference on advanced intelligent mechatronics, pp 851–
856. IEEE
31. Liu J, Yang J, Liu H, Tian X, Gao M (2017) An improved ant colony algorithm for robot path
planning. Soft Comput 21(19):5829–5839
32. Deng G-F, Zhang X-P, Liu Y-P (2009) Ant colony optimization and particle swarm optimization
for robot-path planning in obstacle environment. Control Theory Appl 26(8):879–883
33. Deepa O, Senthilkumar A (2016) Swarm intelligence from natural to artificial systems: ant
colony optimization. Networks (Graph-Hoc) 8(1):9–17
34. Akhtar A (2019) Evolution of ant colony optimization algorithm—a brief literature review. In:
arXiv: 1908.08007
35. Nayar N, Ahuja S, Jain S (2019) Swarm intelligence for feature selection: a review of literature
and reflection on future challenges. In: Advances in data and information sciences, pp 211–221.
Springer, Singapore
36. Manoharan S (2019) Study on Hermitian graph wavelets in feature detection. J Soft Comput
Paradigm (JSCP) 1(01):24–32
37. Aghdam MH, Kabiri P (2016) Feature selection for intrusion detection system using ant colony
optimization. IJ Netw Secur 18.3:420–432

38. Aghdam MH, Ghasem-Aghaee N, Basiri ME (2009) Text feature selection using ant colony
optimization. Expert Syst Appl 36(3):6843–6853
39. Shakya S, Pulchowk LN, A novel bi-velocity particle swarm optimization scheme for multicast
routing problem
40. Ahmad SR, Yusop NMM, Bakar AA, Yaakub MR (2017) Statistical analysis for vali-
dating ACO-KNN algorithm as feature selection in sentiment analysis. In: AIP conference
proceedings, vol 1891(1), p 020018. AIP Publishing LLC
41. Sweetlin JD, Nehemiah HK, Kannan A (2017) Feature selection using ant colony optimization
with tandem-run recruitment to diagnose bronchitis from CT scan images. Comput Methods
Programs in Biomed 145:115–125
42. Sinoquet C, Niel C (2018) Ant colony optimization for markov blanket-based feature selec-
tion. Application for precision medicine. In: International conference on machine learning,
optimization, and data science, pp 217–230. Springer, Cham
43. Liang H, Wang Z, Liu Yi (2019) A new hybrid ant colony optimization based on brain storm
optimization for feature selection. IEICE Trans Inf Syst 102(7):1396–1399
44. Sowmiya C, Sumitra P (2020) A hybrid approach for mortality prediction for heart patients
using ACO-HKNN. J Ambient Intell Humanized Comput
45. Mangat V (2010) Swarm intelligence based technique for rule mining in the medical domain.
Int J Comput Appl 4(1):19–24
46. Naseer A, Shahzad W, Ellahi A (2018) A hybrid approach for feature subset selection using
ant colony optimization and multi-classifier ensemble. Int J Adv Comput Sci Appl IJACSA
9(1):306–313
47. Kashef S, Nezamabadi-pour H (2013) A new feature selection algorithm based on binary ant
colony optimization. In: The 5th conference on information and knowledge technology, pp
50–54. IEEE
48. Jameel S, Ur Rehman S (2018) An optimal feature selection method using a modified wrapper-
based ant colony optimisation. J Natl Sci Found Sri Lanka 46(2)
49. Selvarajan D, Jabar ASA, Ahmed I (2019) Comparative analysis of PSO and ACO based feature
selection techniques for medical data preservation. Int Arab J Inf Technol 16(4):731–736
50. Khorram T, Baykan NA (2018) Feature selection in network intrusion detection using
metaheuristic algorithms. Int J Adv Res Ideas Innovations Technol 4(4)
51. Manoj RJ, Praveena MDA, Vijayakumar K (2019) An ACO–ANN based feature selection
algorithm for big data. Cluster Comput 22(2):3953–3960
52. Jayaprakash A, KeziSelvaVijila C (2019) Feature selection using ant colony optimization
(ACO) and road sign detection and recognition (RSDR) system. Cogn Syst Res 58:123–133
53. Nayyar A, Le DN, Nguyen NG (eds) (2018) Advances in swarm intelligence for optimizing
problems in computer science. CRC Press (Oct 3)
54. Dorigo M, Stützle T (2019) Ant colony optimization: overview and recent advances. In:
Handbook of metaheuristics, pp 311–351. Springer, Cham
Hand Gesture Recognition Under
Multi-view Cameras Using Local Image
Descriptors

Kiet Tran-Trung and Vinh Truong Hoang

Abstract Hand gesture recognition has had various applications in recent years, such as
robotics, e-commerce, human–machine interaction, e-sports, and assisting hearing-impaired people.
The latter is the most useful and interesting application
in our daily life. Nowadays, cameras can be installed easily and everywhere, so
gesture recognition faces a major challenge when images are acquired
by multiple cameras. This paper introduces an approach for hand gesture recognition
under multi-view cameras. The proposed approach is evaluated on the HGM-4
benchmark dataset by using local binary patterns.

Keywords Hand gesture recognition · Local image descriptor · Multi-view cameras

1 Introduction

The hand gesture is a typical and basic tool of humans for conversation. It is very
difficult to train someone to learn and understand all the gestures of a sign language in
a short time, so many intelligent systems have been proposed to automatically recognize and
understand those gestures. Hand gesture recognition has received a lot of attention
from many vision scientists in the last decade. It is a core process of smart home,
contactless device, and multimedia systems [1, 2]. Various methods based on image analysis
have been proposed in the literature. Dinh et al. [3] proposed a method
for analyzing hand gesture sequence images by using the hidden Markov model and
evaluated it on one- and two-hand gesture databases. Tavakoli et al. [14] recognize
hand gestures based on EMG wearable devices and SVM classifiers. Chaudhary et al.
[2] introduced a method based on light invariance for hand gesture recognition,
applying a technique for extracting features by orientation histogram on the region of

K. Tran-Trung (B) · V. T. Hoang


Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
e-mail: kiet.tt@ou.edu.vn
V. T. Hoang
e-mail: vinh.th@ou.edu.vn

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 299
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_23

interest. Chansri et al. [1] presented a method based on HOG descriptor and neural
network for Thai sign language recognition.
Since cameras can be installed at any outdoor or indoor position, modern hand
gesture recognition faces a challenging issue due to the multiple views at different acquisition
angles. Figure 1 illustrates an example of one hand gesture under four cameras at
different positions; distinct views, and hence different appearances, arise from a single hand
gesture. The problem of hand gestures under multiple views has been investigated in
[11, 12], where the authors introduced a method to fuse features extracted from different
cameras for two-hand gestures. There exist a few public hand gesture datasets in the
literature [10, 13], but all images are usually captured by one camera. Recently, Hoang
[4] surveyed different hand gesture databases with multiple views and released a new
dataset (HGM-4) captured by four different cameras for Vietnamese sign language
recognition. This paper presents a preliminary result on the HGM-4 dataset based on a local
image descriptor. The local binary pattern (LBP) [8, 9] is considered to represent a
hand gesture image since it is an efficient and fast approach for characterizing
texture images [7]. The remainder of this paper is organized as follows. Section 2
introduces the LBP descriptor, the proposed approach, and the experimental results
on the HGM-4 dataset. Finally, the conclusion is given in Sect. 3.

Fig. 1 An example of one-hand gesture captured by four different cameras



2 Proposed Approach

2.1 Local Binary Patterns

Local binary patterns (LBP) are obtained by computing the local neighborhood structure for representing the texture around each pixel of the image, from a square neighborhood of 3 × 3 pixels. The $LBP_{P,R}(x_c, y_c)$ code of each pixel $(x_c, y_c)$ is calculated by comparing the gray value $g_c$ of the central pixel with the gray values $\{g_i\}_{i=0}^{P-1}$ of its $P$ neighbors, by this formula:

$$LBP_{P,R}(x_c, y_c) = \sum_{i=0}^{P-1} s(g_i - g_c) \times 2^i \qquad (1)$$

where $s$ is a threshold function which is computed as:

$$s(g_i - g_c) = \begin{cases} 1 & \text{if } (g_i - g_c) \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
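A minimal Python sketch of Eqs. (1) and (2) for the common case P = 8, R = 1 (the 3 × 3 neighborhood) is given below; the neighbor ordering used is one possible convention and border pixels are not handled.

def lbp_code(image, x, y):
    """Return the 8-bit LBP code of the pixel at (x, y) of a grayscale image."""
    gc = image[y][x]
    # Neighbors taken clockwise starting from the top-left pixel.
    neighbors = [image[y - 1][x - 1], image[y - 1][x], image[y - 1][x + 1],
                 image[y][x + 1], image[y + 1][x + 1], image[y + 1][x],
                 image[y + 1][x - 1], image[y][x - 1]]
    code = 0
    for i, gi in enumerate(neighbors):
        if gi - gc >= 0:            # threshold function s(.) of Eq. (2)
            code += 2 ** i          # weighting by 2^i as in Eq. (1)
    return code

# Example: LBP code of the central pixel of a small patch
patch = [[10, 20, 30],
         [40, 25, 15],
         [ 5, 35, 45]]
print(lbp_code(patch, 1, 1))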

2.2 Proposed Approach

Several LBP patterns occur more frequently in texture images than others. The
authors in [15] proposed to define the "LBP uniform pattern" $LBP^{u2}_{P,R}$, which is a
subset of the original LBP [8]. For this, they consider a uniformity measure of a
pattern, which counts the number of bitwise transitions from 0 to 1 or vice versa
when the bit pattern is traversed circularly. An LBP is named uniform if it contains
at most two such transitions. For example, the patterns 11111111 (0 transitions),
00011110, and 11100111 are uniform, and the pattern 00110010 is not. The final
features obtained from image patches are better and more representative than those extracted
from the global image [7, 15]. To extract features from multiple blocks, each original
image is divided into blocks. The features extracted from these blocks of each color component
are then fused to create the final feature vector; for example, an original image without
division yields a vector with 59 × 3 = 177 features. An illustration of the proposed
approach is presented in Fig. 2.
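A minimal sketch of the two ingredients described above follows: the uniformity test and per-block histogram concatenation. It reuses the lbp_code function sketched earlier, uses full 256-bin histograms rather than the 59-bin uniform mapping, and the block boundaries are a simple approximation.

def is_uniform(code, bits=8):
    """True if the circular bit pattern of the code has at most two 0/1 transitions."""
    pattern = [(code >> i) & 1 for i in range(bits)]
    transitions = sum(pattern[i] != pattern[(i + 1) % bits] for i in range(bits))
    return transitions <= 2

def block_histograms(image, k):
    """Split the image into k x k blocks and concatenate one LBP histogram per block."""
    h, w = len(image), len(image[0])
    features = []
    for by in range(k):
        for bx in range(k):
            hist = [0] * 256
            for y in range(max(1, by * h // k), min(h - 1, (by + 1) * h // k)):
                for x in range(max(1, bx * w // k), min(w - 1, (bx + 1) * w // k)):
                    hist[lbp_code(image, x, y)] += 1
            features.extend(hist)
    return features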
The HGM-4 [4] is a benchmark dataset for hand gestures under multiple cameras.
Table 1 presents the characteristics of this database. Four cameras are installed at
different positions to capture hand gesture images, and there are 26 distinct gestures
performed by five different persons. Figure 3 illustrates different images of the
same gesture under one camera (the left camera). Since all images are segmented to have
a uniform background, this problem would be more challenging with a complex background.

Fig. 2 Proposed approach

Table 1 Characteristics of HGM-4 benchmark dataset

Camera   Gestures × images per gesture   Number of performing persons   Total
Below    26 × 8                          5                              1040
Front    26 × 8                          5                              1040
Left     26 × 8                          5                              1040
Right    26 × 8                          5                              1040

Fig. 3 A gesture performing by different volunteers under one camera

2.3 Results

A cross-validation method is applied by a hold-out technique on the initial dataset to
create training and testing subsets. Different train:test ratios are considered: 50:50, 70:30,
80:20, and 90:10. Seven strategies are applied to divide the whole image into multiple
blocks. Table 2 illustrates the classification results by using the LBP uniform features
extracted from color images. The first column indicates the number of blocks used
to split the original image. For the decomposition at 50:50, the best accuracy is
achieved at 78.43% with 7 × 7 blocks. Similarly, better results are always obtained

Table 2 Classification performance by 1-NN classifier and LBP uniform on the HGM-4 dataset
Decomposition (Train:Test)
Number of blocks 50:50 70:30 80:20 90:10
1×1 57.35 61.90 63.38 62.29
2×2 72.87 77.84 77.91 79.58
3×3 75.81 79.51 81.30 81.97
4×4 77.39 80.90 83.28 83.54
5×5 77.83 82.46 83.33 85.16
6×6 78.35 83.22 84.01 85.79
7×7 78.43 83.02 85.73 86.58

by using this number of blocks. This confirms the extraction approach based on
block division for extracting LBP uniform features, as in [15].
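A minimal sketch of this evaluation protocol is given below, assuming scikit-learn; the feature matrix and labels are random placeholders standing in for the LBP features (177 per image) and the 26 gesture labels of one camera.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import numpy as np

X = np.random.rand(1040, 177)                 # placeholder LBP feature vectors
y = np.random.randint(0, 26, size=1040)       # placeholder gesture labels

for test_size in (0.5, 0.3, 0.2, 0.1):        # 50:50, 70:30, 80:20, 90:10 splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)   # 1-NN classifier as in Table 2
    print(test_size, accuracy_score(y_te, clf.predict(X_te)))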

3 Conclusion

This paper presented an approach for hand gesture recognition under multi-view
cameras. The LBP uniform descriptor is used to perform the feature extraction from
color images of the HGM-4 benchmark dataset. As future work, this study will first be extended
to enhance the recognition rate by fusing several local image descriptors and deep
features. Second, a fusion scheme should be proposed to capture all the information
from the different cameras.

References

1. Chansri C, Srinonchat J (2016) Hand gesture recognition for Thai sign language in complex
background using fusion of depth and color video. Procedia Comput Sci 86:257–260
2. Chaudhary A (2018) Light invariant hand gesture recognition. In: Robust hand gesture
recognition for robotic hand control, pp 39–61. Springer
3. Dinh DL, Kim JT, Kim TS (2014) Hand gesture recognition and interface via a depth imaging
sensor for smart home appliances. Energy Procedia 62:576–582
4. Hoang VT (2020) HGM-4: a new multi-cameras dataset for hand gesture recognition. Data
Brief 30:105676
5. Just A, Marcel S (2009) A comparative study of two state-of-the-art sequence processing
techniques for hand gesture recognition. Comput Vis Image Underst 113(4):532–543
6. Lee AR, Cho Y, Jin S, Kim N (2020) Enhancement of surgical hand gesture recognition using a
capsule network for a contactless interface in the operating room. Comput Methods Programs
Biomed 190:105385 (Jul 2020)
7. Nhat HTM, Hoang VT (2019) Feature fusion by using LBP, HOG, GIST descriptors and
Canonical Correlation Analysis for face recognition. In: 2019 26th international conference on
telecommunications (ICT), pp 371–375 (Apr 2019)

8. Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with


classification based on featured distributions. Pattern Recogn 29(1):51–59
9. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution grayscale and rotation invariant
texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell
24(7):971–987
10. Pisharady PK, Saerbeck M (2015) Recent methods and databases in vision-based hand gesture
recognition: a review. Comput Vis Image Underst 141:152–165
11. Poon G, Kwan KC, Pang WM (2018) Real-time multi-view bimanual gesture recognition. In:
2018 IEEE 3rd international conference on signal and image processing (ICSIP), pp 19–23.
IEEE, Shenzhen (Jul 2018)
12. Poon G, Kwan KC, Pang WM (2019) Occlusion-robust bimanual gesture recognition by fusing
multi-views. Multimedia Tools Appl 78(16):23469–23488
13. Ruffieux S, Lalanne D, Mugellini E, Abou Khaled O (2014) A survey of datasets for human
gesture recognition. In: International conference on human-computer interaction, pp 337–348.
Springer
14. Tavakoli M, Benussi C, Alhais Lopes P, Osorio LB, de Almeida AT (2018) Robust hand gesture
recognition with a double channel surface EMG wearable armband and SVM classifier. Biomed
Signal Process Control 46:121–130
15. Van TN, Hoang VT (2019) Kinship verification based on local binary pattern features coding
in different color space. In: 2019 26th international conference on telecommunications (ICT),
pp 376–380 (Apr 2019)
Custom IP Design for Fault-Tolerant
Digital Filters for High-Speed Imaging
Devices

Somashekhar Malipatil, Avinash Gour, and Vikas Maheshwari

Abstract Digital filters are most commonly used in signal processing and communication
systems. Fault-tolerant filters are required when the system is unreliable.
Many methodologies have been proposed to protect digital filters from errors. In
this paper, a fault-tolerant finite impulse response (FIR) filter has been designed using
error-correcting codes and Hamming codes, efficiently coded in the hardware description
language Verilog. A custom IP for the fault-tolerant digital filter has also been designed
with reduced power dissipation and high speed. This work concentrates
on creating and packaging custom IP. The proposed custom IP fault-tolerant
digital filter is synthesized in Xilinx Vivado 2018.3 targeting the Xilinx Zynq-7000
SoC ZC702 evaluation board.

Keywords Custom IP · Vivado 2018.3 · Xilinx Zynq-7000 · Fault tolerance · FIR


filter · VLSI · Verilog · ECC · Hamming codes

1 Introduction

Digital filters are essential devices in digital signal processing systems and are
used in several applications such as video processing, wireless communications,
image processing, and many imaging devices. The use of digital circuits is
increasing exponentially in space, automotive, and medical applications, where
reliability is critical. In such designs, a designer has to adopt some degree of
fault tolerance. This requirement increases further in modern CMOS technologies,
which are prone to soft errors and manufacturing variations [1].
The generally used hardware redundancy techniques are double modular redundancy
(DMR) and triple modular redundancy (TMR) [2]. These methods are suitable to

S. Malipatil (B) · A. Gour


Department of Electronics & Communication Engineering, Sri Satya Sai University of
Technology & Medical Sciences (SSSUTMS), Sehore, Madhya Pradesh, India
e-mail: somashekhar49@gmail.com
V. Maheshwari
Department of Electronics & Communication Engineering, Bharat Institute of Engineering &
Technology, Hyderabad, India


identify or detect the errors but consume more area to implement. As their names
indicate, these techniques require two or three identical structures operating in
parallel to detect faults.
In [3], the authors proposed an FIR filter using reduced-precision replicas, designed
to minimize the cost of implementing modular redundancy. Other researchers used
different implementation methodologies that rely on only one redundant module
to rectify errors [4].
A newer method to protect parallel filters applies ECC, widely used in modern
signal processing, to the outputs of the parallel filters to identify and correct
errors. A discrete-time filter [5] is implemented by Eq. (1), where Y[n] represents
the output, x[n] is the input signal, and h[i] represents the impulse response.

Y[n] = Σ_{i=0}^{∞} x[n − i] · h[i]   (1)
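For illustration only, the sketch below implements the finite-length (FIR) form of Eq. (1) in Python, truncating the sum to the N taps of h; the names x, h, and Y follow the notation above, and the example filter is an assumed two-tap moving average.

def fir_filter(x, h):
    # Y[n] = sum over i of x[n - i] * h[i], direct-form computation.
    N = len(h)
    Y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(N):
            if n - i >= 0:      # samples before the start of x are taken as zero
                acc += x[n - i] * h[i]
        Y.append(acc)
    return Y

print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))  # two-tap moving average -> [0.5, 1.5, 2.5, 3.5]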

In a system on chip, custom IP blocks are used to improve productivity. Custom
IP consists of pre-designed blocks that are easy to reuse in larger designs. Custom
IP is divided into two types: hard IP and soft IP. Hard IP usually comes with a
pre-designed layout, whereas soft IP is delivered as a synthesizable module [6].

2 Proposed Work

In the world of embedded systems, it is the engineer's responsibility to minimize
cost and increase performance. To achieve this, our intellectual properties (IPs)
have been designed.
In this work, a fault-tolerant FIR filter has been designed, along with a custom
IP for it. First, the FIR filter has been designed using error-correcting codes
and Hamming codes, with efficient coding in Verilog. The block diagram of the
fault-tolerant FIR filter is shown in Fig. 2. This design includes five main blocks:
code generation, syndrome, memory, bit, and info block. A seven-bit codeword is
used in this design. The block diagram of the information block is shown in Fig. 3;
internally, it consists of multiplexers. The block diagram of the syndrome block is
shown in Fig. 5. Table 1 shows the properties of the designed custom IP (Figs. 1 and 4).
In this work, a simple custom IP has been packaged and added to a user repository
location. In the process, the vendor, library, name, and version (VLNV) information
was specified, device compatibility was added, IP parameters were specified, the
possible file groups were explored (though only the basic file groups were used),
ports were specified, and interfaces were defined.
After packaging, the IP was configured from the Vivado IP catalog for use in a design
and instantiated directly. A block design was also used to instantiate the IP along with
the clock, which demonstrated the benefit of graphically connecting interfaces (without
needing to know the details of the individual signals).

Table 1 Properties of IP
IP properties
fault_tolerant_v1_0
Version 1.0 (Rev. 2)
Description fault_tolerant_v1_0
Status Production
License Included
Change log View change log
Vendor Xilinx, Inc
VLNV xilinx.com:user.fault_tolerant:1.0

Fig. 1 Block diagram of FIR filter

Fig. 2 Block diagram of fault-tolerant FIR filter

Fig. 3 Block diagram of


information block
308 S. Malipatil et al.

Fig. 4 Internal structure of the information block

Fig. 5 Syndrome

In this paper, a fault-tolerant digital FIR filter has been designed with reduced
power and improved area efficiency by using ECC and avoiding the TMR and DMR
methodologies. A custom IP for the fault-tolerant FIR filter has also been designed
using Xilinx Vivado 2018.3 and implemented on the Xilinx Zynq-7000 SoC ZC702
evaluation board. The proposed custom IP produces the same outcomes as the existing
module. The total on-chip power consumption is 1.803 W, including dynamic and
static power. The area has been analyzed based on resource utilization.

In this design, error-correcting codes and Hamming codes have been used to
design the fault-tolerant FIR filter and its custom IP. The designed IP is reusable
and cost efficient. The check bits are produced by an XOR tree corresponding to the
G matrix, and the syndrome is generated by an XOR network corresponding
to the H matrix. If the syndrome is the zero vector, no error is detected.
G = [ 1 0 0 0 1 1 1
      0 1 0 0 1 1 0
      0 0 1 0 1 0 1
      0 0 0 1 0 1 1 ]   (2)

H = [ 1 1 1 0 1 0 0
      1 1 0 1 0 1 0
      1 0 1 1 0 0 1 ]   (3)

The encoding is obtained by Eq. (4), and error is detected by computing Eq. (5).

out = x · G (4)

s = out · H^T (5)

The syndrome structure internally consists of XOR gates, multiplexers, and latches.
It scans for errors, and if no error is found, the signals memen1 and write are set to
logic '1' (Figs. 6 and 7).
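The following Python sketch (ours, for illustration; the actual design is in Verilog) reproduces the Hamming(7,4) encoding of Eq. (4) and the syndrome computation of Eq. (5) with the G and H matrices of Eqs. (2) and (3), working over GF(2).

import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1, 0, 1],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def encode(x):
    # Eq. (4): 4-bit data word -> 7-bit codeword, out = x . G (mod 2).
    return (np.array(x) @ G) % 2

def syndrome(out):
    # Eq. (5): s = out . H^T (mod 2); the all-zero syndrome means no error detected.
    return (np.array(out) @ H.T) % 2

code = encode([1, 0, 1, 1])
print(code, syndrome(code))   # error-free codeword -> syndrome [0 0 0]
code[2] ^= 1                  # inject a single-bit error
print(code, syndrome(code))   # non-zero syndrome indicates (and, for a single-bit error, locates) the fault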

Fig. 6 Package IP

Fig. 7 Final designed custom IP for fault-tolerant FIR filter

3 Results and Discussions

3.1 Synthesis Results

See Table 2 and Graph 1.

Table 2 Resource utilization summary

Resource Estimation Available Utilization in %
LUT 17 53,200 0.03
FF 7 106,400 0.01
BRAM 0.50 140 0.36
IO 36 200 18.00
BUFG 1 32 3.13

Graph 1 Resource utilization summary



Fig. 8 Power analysis

3.2 Power Analysis

The power analysis is shown in Fig. 8. A total on-chip power of 1.803 W is achieved,
which is the combination of dynamic and static power consumption. Dynamic power
accounts for 92% of the total and is consumed only while the device is operating;
static power consumption has been reduced to the remaining 8%. The following
figures are reported for the low-power design:
Total on-chip power: 1.803 W.
Junction temperature: 45.8 °C.
Thermal margin: 39.2 °C (3.3 W).
Effective θJA: 11.5 °C/W.
Power supplied to off-chip devices: 0 W.

4 Implementation Results

The implementation has been done on the Xilinx Zynq-7000 ZC702 evaluation board,
using the Zynq-7000 device, the clg484 package, and speed grade −1. Table 3 shows
that all nets are fully routed and there are no unrouted nets (Figs. 9, 10 and 11).

Table 3 Implementation summary

Route status
Conflict nets 0
Unrouted nets 0
Partially routed nets 0
Fully routed nets 59

Fig. 9 Floor planning

5 Conclusion

In this paper, a fault-tolerant digital FIR filter has been designed with reduced power
and improved area efficiency by using ECC and avoiding the TMR and DMR methodologies,
and a custom IP for the fault-tolerant FIR filter has also been designed using Xilinx
Vivado 2018.3 and implemented on the Xilinx Zynq-7000 SoC ZC702 evaluation board. The
proposed custom IP produces the same outcomes as the existing module. The
total on-chip power consumption is 1.803 W, including dynamic and static power.
The area has been analyzed based on resource utilization, and our
intellectual properties (IPs) have been designed. This type of fault-tolerant filter is
used in space, automotive, and medical applications, where reliability is critical.

Fig. 10 IO planning

Fig. 11 Simulation results



References

1. Gao Z et al (2014) Fault tolerant parallel filters based on error correction codes. IEEE Trans
Very Large Scale Integr (VLSI) Syst
2. Somashekhar, Vikas Maheshwari, Singh RP (2019) A study of fault tolerance in high speed
VLSI circuits. Int J Sci Technol Res 8(08) (Aug)
3. Shim D, Shanbhag N (2006) Energy-efficient soft error-tolerant digital signal processing. IEEE
Trans Very Large Scale Integr (VLSI) Syst 14(4):336–348 (Apr)
4. Reviriego P, Bleakley CJ, Maestro JA (2011) Structural DMR: a technique for implementation
of soft-error-tolerant FIR filters. IEEE Trans Circuits Syst Exp Briefs 58(8):512–516 (Aug)
5. Oppenheim AV, Schafer RW (1999) Discrete time signal processing. Prentice-Hall, Upper
Saddle River, NJ, USA
6. Software manual Vivado Design Suite Creating and Packaging Custom UG973 (v2018.3)
December 14, 2018, [online] Available: www.xilinx.com
7. Vaisakhi VS et al (2017) Fault tolerance in a hardware efficient parallel FIR filter. In: Proceeding
of 2018 IEEE international conference on current trends toward converging technologies. 978–
1–5386–3702–9/18/$31.00 © 2017 IEEE
8. Nicolaidis M (2005) Design for soft error mitigation. IEEE Trans Device Mater Rel 5(3):405–
418 (Sept)
9. Kanekawa N, Ibe EH, Suga T, Uematsu Y (2010) Dependability in electronic systems: mitiga-
tion of hardware failures, soft errors, and electro-magnetic disturbances. Springer, NewYork,
NY, USA
10. Lin S, Costello DJ (2004) Error control coding, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ,
USA
11. Cheng C, Parhi KK (2004) Hardware efficient fast parallel FIR filter structures based on iterated
short convolution. IEEE Trans Circuits Syst I: Regul Pap 51(8) (Aug)
12. Somashekhar, Vikas Maheshwari, Singh RP (2019) Analysis of micro inversion to improve fault
tolerance in high speed VLSI circuits. Int Res J Eng Technol (IRJET) 6.03 (2019):5041–5044
13. Gao Z, Yang W, Chen X, Zhao M, Wang J (2012) Fault missing rate analysis of the arithmetic
residue codes based fault-tolerant FIR filter design. In: Proc. IEEE IOLTS, June 2012, pp
130–133
14. Somashekhar, Vikas Maheshwari, Singh RP (2020) FPGA implementation of fault tolerant
adder using verilog for high speed VLSI architectures. Int J Eng Adv Technol (IJEAT) 9(4)a.
ISSN: 2249–8958 (Apr)
15. Hitana T, Deb AK (2004) Bridging concurrent and non-concurrent error detection in FIR
filters. In: Proc. Norchip Conf., Nov 2004, pp 75–78. https://doi.org/10.1109/NORCHP.2004.1423826
16. Pontarelli S, Cardarilli GC, Re M, Salsano A (2008) Totally fault tolerant RNS based FIR filters.
In: Proc. 14th IEEE Int On-Line Test Symp (IOLTS), July 2008, pp 192–194
17. Kumar NM (2019) Energy and power efficient system on chip with nanosheet FET. J Electron
1(01):52–59
A Novel Focused Crawler
with Anti-spamming Approach & Fast
Query Retrieval

Ritu Sachdeva and Sachin Gupta

Abstract The volume of Web pages is growing by terabytes or even petabytes day
by day. For a small Web, answering a query is an easy task, whereas
robust techniques for storage, searching, and anti-spamming are needed for large
volumes of data. This study gives a novel approach for the detection of malicious
URLs and fast query retrieval. The proposed focused crawler checks each URL before
it enters the search engine database: it discards malicious URLs but allows benign
URLs to enter. The detection of malicious URLs is done
via a proposed URL feature set created by selecting those attributes of
URL features which are susceptible to spammers. Thus, a non-malicious database
is created. The searching process is then performed through this search engine database
by triggering a query. The search time taken by the proposed crawler is less than that of
the base crawler because the proposed focused crawler uses a trie
data structure for storing fetched results in the Web repository instead of the HashSet
data structure used by the base crawler. Based on the computed average search time
(for ten queries), it is observed that the proposed focused crawler is 12% faster than the
base crawler. To check the performance of the proposed focused crawler, the quality
parameters precision and recall are computed and found to be 92.3% and 94.73%,
respectively. Detection accuracy is found to be 90% with an error rate of 10%.

Keywords HashSet · Trie · Focused crawler · Base crawler · Search engine ·


Malicious URL · Lexical · Content

1 Introduction

Technically, information retrieval (IR) [1] is the discipline of searching for specific data
within a document, across documents, and in metadata (text, image, or sound data, or

R. Sachdeva (B) · S. Gupta


Department of Computer Science, MVNU, Palwal, India
e-mail: ritusach08@gmail.com
S. Gupta
e-mail: sachin.gupta@mvn.edu.in


Fig. 1 Process of crawling (in general)

databases). A search engine is a type of information retrieval system that lists the relevant
documents for specified keywords with the help of a spider robot [2].
The process of crawling is initiated with a list of URLs, called seed URLs. A Web
crawler maintains a queue of pages to be downloaded, called the frontier. The
seed set initializes the frontier (this is done manually). A URL from this seed collection
is selected and submitted to the downloader to fetch the corresponding Web page. The
indexer module utilizes the fetched pages. This is a continuous process in which URLs
extracted from downloaded pages are fed to the URL frontier for further downloading
until the frontier becomes empty. Figure 1 illustrates how a Web crawler functions.
The main components of the crawler are the URL frontier, DNS resolution, fetch
module, parsing module, and URL duplicate eliminator. The URL frontier is the collection
of URLs that are to be fetched next in the crawl. The DNS resolution module
determines the IP address of the Web server specified by a URL in the URL
frontier. The fetch module uses the hypertext transfer protocol (HTTP) to retrieve the
Web page. The parsing module takes the Web page as input and extracts from it the text
and the collection of hyperlinks. The URL duplicate eliminator checks whether a link is
already present in the frontier and discards it if it has already been fetched. The robots
template (robots.txt) is used to determine whether or not a Web page is allowed to be fetched.
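A schematic sketch of this crawl loop is given below (our illustration, assuming the requests and BeautifulSoup libraries are available; DNS resolution, robots.txt checking, indexing, and politeness are omitted).

from collections import deque
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

def crawl(seed_urls, max_pages=50):
    frontier = deque(seed_urls)   # URL frontier: pages waiting to be fetched
    seen = set(seed_urls)         # URL duplicate eliminator
    while frontier and max_pages > 0:
        url = frontier.popleft()
        try:
            page = requests.get(url, timeout=5)          # fetch module
        except requests.RequestException:
            continue
        max_pages -= 1
        soup = BeautifulSoup(page.text, "html.parser")   # parsing module
        for link in soup.find_all("a", href=True):
            absolute = urljoin(url, link["href"])
            if absolute not in seen:                     # discard already-seen links
                seen.add(absolute)
                frontier.append(absolute)
    return seen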

1.1 Focused Crawler

Chakrabarti et al. proposed the focused (topic-oriented) crawler [3]. It is composed of a hypertext
classifier, a distiller, and a crawler. The classifier makes appropriate decisions about
the expansion of links on crawled pages, while the distiller calculates a measure of
the centrality of crawled pages to determine visit priorities. The search function of

the crawler uses dynamically reconfigurable priority controls that are governed by the
distiller and the classifier.

1.2 Spamming in Consonance with Malicious URLs

Today's spammers target the URL to induce spamming in Web pages; such URLs are
called malicious URLs. These URLs are difficult for the end user to detect, and user
data is illegitimately accessed through them. Malicious URLs have resulted in
cyberattacks and unethical behavior such as breaches of confidential and secure
content and the installation of ransomware on user devices, causing massive
losses worldwide each year. Benign URLs can be converted into malign URLs by
obfuscation, a technique used to mask malicious URLs. It is reported
that about 50 million Web site users visit malicious Web sites. Blacklisting,
heuristic classification, and other traditional filtering mechanisms based on keyword
and URL syntax matching are used to reveal malicious
URLs, but these techniques are inefficient at coping with modern technologies and Web
access techniques and at detecting modern malicious URLs.

1.3 Query Processing & Role of Data Structure in Searching

In response to queries, the crawler locates the related Web pages and downloads their
content, which is stored on disk for further processing. These results are usually
stored in a database repository in the form of an inverted index, hash tables, etc.,
so that future user queries can be processed. But the generated Web index must
be compact, i.e., the memory requirements for index storage should be small. The
main challenges are improving query performance by handling queries faster and
providing faster results over trillions of items of Web data. Kanagavalli [4] discussed
several data structures, categorized by storage, process, and description, that are used
for storing the data. The author stated that hash tables are mostly used as the storage
structure. An efficient storage data structure leads to optimization of the search engine,
and ultimately it accelerates the whole process of generating the final results.

1.4 HashSet Versus Trie

A HashSet is an unordered, specialized collection of elements implemented by means of
a hash table. A HashSet contains a set of objects stored in such a way that the user
can quickly and easily decide whether an object is already in the set or not. It does this by
managing an array internally and storing each object at an index calculated from
the object's hashcode. A HashSet also provides standard set operations such as union,

symmetric, and intersection. The methods add, delete, and contain are of constant
time complexity O (1). A HashSet has an internal structure (hash), in which objects
can be easily searched and defined. It does not preserve the order of elements. There
is no access by indices. But either enumerator or built-in functions can be used to
access elements. Built-in functions convert the HashSet to a list and iterate through
it. Iterating through a HashSet (or having an object by index) is therefore very slow
particularly in the case of large text queries. Moreover, HashSets are not cache
friendly.
A trie is a dynamic ordered tree data structure used to store a set or associative array
in which the keys are normally strings. These strings are arranged in lexicographic order.
The search complexity for a key of length m is O(m) in the worst case. Updating
a trie is quite simple: insertion starts with a search, and when a node with no matching
edge to follow is reached, a new node is added with the remaining string on the edge
to this node. A trie can be represented more efficiently in a compressed or compact
form. A compressed or compact representation of a trie merges every chain
of edges that has no branches (the nodes between these edges are of degree one) into
a single edge, labeled with the string of characters of the merged edges or labeling the
resulting path. In the particular case of a compact binary trie, the total number of
nodes is 2n − 1, as in a full binary tree, where n strings are represented
by the trie.

1.5 Advantages of Trie Data Structure Over HashSet

Tries are a special and highly functional data structure organized around string prefixes.
They can therefore help in searching for a value whose key shares the longest possible
prefix with a given key. Tries can also be used to associate a value with a group of keys
that share a common prefix. The name signifies the "Retrieval" of data. Strings are placed
top to bottom in a trie based on their prefixes: all prefixes of length 1 are stored at level 1,
all prefixes of length 2 at level 2, and so forth. It is therefore considered a
better data structure than a HashSet for fast string searching.
Tries are typically useful when dealing with a group of strings rather than
individual strings. The complexity of the search, insert, and delete operations is O(L),
where L is the length of a key. A trie is fast because of the way it is implemented:
there is no need to compute a hash function and no collision handling. It can output all
words in alphabetical order. Tries are space efficient when storing many words
that start with a similar pattern, since they may reduce the overall storage cost by storing
shared prefixes once. Thus, a trie can quickly answer queries about words with shared
prefixes, resulting in efficient prefix queries.
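The following minimal Python sketch (ours, not the paper's implementation) shows the two trie operations relied on later, insertion and prefix lookup, both O(L) in the key length L; the sample words are taken from the query domains used in the experiments.

class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> child node
        self.is_word = False  # True if a stored key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        # Return True if any stored key begins with the given prefix.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return True

t = Trie()
for w in ("cab", "cabs", "care", "food"):
    t.insert(w)
print(t.starts_with("ca"), t.starts_with("sports"))  # True False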

Table 1 Studies on URL feature-based crawlers


S. no Author Study done
1 Justin [14] Proposed a real-time framework that gathers lexical & host-based
URL features and pairs it to a wide Webmail provider with a
real-time stream of labeled URLs.
2 Xu [15] Put limelight on malicious Web sites & their detection via a novel
cross-layer method, handling adaptive attacks as well as statistical
characteristics with the help of lexical, host, and content features
3 Choi [16] Proposed a technique that spans discriminative features like link
structures, textual properties, Web page contents, network traffic,
DNS information, etc., considering URL lexical features
4 Popescu [17] Focuses on machine learning for detecting malicious URLs.
Moreover, FPR & detection rate is also calculated
5 Mamun [18] Described an approach using lexical analysis. Blacklist as well as
obfuscation methods are also discussed
6 Vanhoenshoven [19] Detection of malicious URLs using a machine learning technique.
Performance is checked via reliability, TP, TN, FP & FN
7 Sridevi [20] Worked on malicious URL at browser level using blacklisting,
heuristic, as well as machine learning techniques
8 Chong [21] Detection of malicious URLs using lexical and Javascript features
of URL via support vector machine
9 Patil [22] Proposed a multi-classification technique considering 42 features
of spam, malware URLs, and phishing.
10 Naveen [23] Detection of malicious URLs using keyword matching and URL
syntax matching, i.e., syntactic, lexically, as well as semantically
11 Sahoo [24] Framed a malicious URL detection system using lexical, context,
host, as well as popular features of URL

2 Literature Survey

2.1 Studies Done on URL Feature-Based Crawlers

A URL has many features like lexical features, host-based Features, content-based
features, popularity features, and context features, on behalf of which spammed URL
can be detected. Table 1 shows a summary of related work.

2.2 Studies Were Done on the Usage of Data Structures


as Storage Unit in Searching

Shishibori [5] discussed the implementation of binary tries as a fast access method
by converting binary tries into a compact bitstream for reducing memory space.

Moreover, he explained that searching (due to the pre-order bitstream) and updating
take ample time for large key sets, thereby increasing the time and cost of each
operation. Kanagavalli [4] discussed the various data structures required in
information retrieval owing to the trillions of items of data. The author explains that
data structures in this context can be process-oriented, descriptive, or storage-oriented,
and that response time as well as the quality of the system defines their performance.
Heinz [6] proposed a new data structure for string keys called the burst trie, which is
faster than a trie but slower than a hash table. Shafiei [7] discussed the Patricia tree
in the context of shared-memory systems by implementing insertion, deletion, and
replacement operations. This work also supports the storage of unbounded-length
strings with flags but is avoided due to its high memory consumption. Thenmozhi [8] analyzed
the efficiency of various data usage models for tree- and trie-based implementations
under different hardware and software configurations such as the size of RAM &
cache, as well as the speed of physical storage media. Andersson [9] discussed the
string searching problem using a suffix tree with level, path, and data compression.
It is very effective for large texts because it decreases the number of accesses to slow
secondary memory while simultaneously limiting main memory usage. Roberto
Grossi [10] proposed fast compressed tries through path decompositions with less
memory space and latency. Nilsson [11] implemented a dynamic compressed trie,
i.e., LPC trie, with level and path compression. A comparison with balanced BST
showed that search time is better due to small average depth, but memory usage of
balanced BST and LPC trie is similar. So, LPC trie is a good choice for a data structure
that preserves order, where very quick search operations are necessary. Shishibori
[12] proposed a strategy for compressing Patricia tries into a compact data structure,
i.e., bitstream. But, compact Patricia stores information about eliminated nodes, so
large storage is required to implement it. The study also evaluates the space and time
efficiency. Nakavisute [1] suggested an approach for optimizing information
retrieval (IR) time, i.e., database search time, using a BST and a doubly linked list. Mangla
[13] proposed a method named context-based indexing in IR system using BST that
solves the large search space problem by indexing, stemming, and removal of stop
words in case of large documents.

3 Proposed Focused Crawler

The proposed focused crawler or classifier is based on selected attributes of different
URL features, such as lexical features, JavaScript features, content features, and
popularity-based features. The selected features are susceptible to spammers to a lesser
or greater degree. An experimental analysis across different Web sites is done for each
chosen feature, and a weight is assigned based on the occurrence of these attributes.
If an attribute exists in most malicious URLs, its severity is high, so the
weight assigned to that attribute is high, and vice versa; for fewer occurrences of an
attribute, less weight is assigned. Then, an average value is set
for each feature unit (as given in Table 2). The total sum of the average weights of

Table 2 Feature representation of proposed focused crawler


Features Category Average value Status assumption
Count of dots in URL Lexical 0.1 >3
Length of primary domain Lexical 0.1 >10
Length of URL Lexical 0.1 >30
Keywords like “Confirm”/“Banking” Popularity-based 0.1 Existing
Escape() Javascript 0.4 Equal to 4
Eval() Javascript
Link() Javascript
Unescape() Javascript
Exec() Javascript
Search() Javascript
“Scripting.FileSystemObject” DHTML 0.2 Equal to 2
“WScript.Shell” DHTML
“Adodb.Stream” DHTML

all the attributes is 1. Then, for the detection of malicious URLs, multiple malicious
URLs are analyzed, and the threshold value is determined from the sum of the
weights of the attributes that occur in the provided URLs; it is found to be 0.7.
Thus, the system operates on a mathematical range from 0 to 1 and differentiates
benign and malign URLs statistically: a score of zero depicts that the URL is benign,
while a value greater than 0.7 (the threshold value) shows that the URL is malign.
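To make the weighting scheme concrete, the sketch below (a hypothetical illustration with assumed helper names, not the authors' implementation) sums the Table 2 weights triggered by a URL and its page source and compares the score with the 0.7 threshold; the exact status conditions of Table 2 are simplified here.

from urllib.parse import urlparse

WEIGHTS = {"dots": 0.1, "domain_len": 0.1, "url_len": 0.1, "keyword": 0.1,
           "javascript": 0.4, "dhtml": 0.2}

def score_url(url, page_source=""):
    parsed = urlparse(url)
    score = 0.0
    if url.count(".") > 3:                                    # count of dots in URL
        score += WEIGHTS["dots"]
    if len(parsed.netloc) > 10:                               # length of primary domain
        score += WEIGHTS["domain_len"]
    if len(url) > 30:                                         # length of URL
        score += WEIGHTS["url_len"]
    if any(k in url.lower() for k in ("confirm", "banking")):
        score += WEIGHTS["keyword"]
    js = ("escape(", "eval(", "link(", "unescape(", "exec(", "search(")
    if sum(f in page_source for f in js) >= 4:                # Javascript feature unit
        score += WEIGHTS["javascript"]
    dhtml = ("Scripting.FileSystemObject", "WScript.Shell", "Adodb.Stream")
    if sum(o in page_source for o in dhtml) >= 2:             # DHTML feature unit
        score += WEIGHTS["dhtml"]
    return score

def is_malign(url, page_source="", threshold=0.7):
    # A score of 0 indicates benign; a score above the threshold indicates malign.
    return score_url(url, page_source) > threshold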

4 Methodology of Proposed Focused Crawler

The proposed focused crawler works in two steps. The first step filters benign and
malign URLs based on the selected feature set: malicious URLs are rejected, while
benign URLs are added to the search engine database. Then, the query is triggered,
and the results are displayed.
• Filtration of Benign & Malign URLs
This step filters malign and benign URLs on the interface of the multifeatured malicious
URL detection system. Detected malicious URLs are blocked, and benign URLs pass the
filter and are entered into the database.
• Fast Query Retrieval
Later, searching is performed by triggering a query on the search interface. As it is
a focused crawler, it limits the search to a domain. This interface leads to a window
that not only gives the search results but also compares the search times of the base
crawler and the proposed focused crawler. The base crawler uses HashSet, and the

Fig. 2 Design of proposed focused crawler

proposed focused crawler uses a trie as the storage unit during searching. Moreover, the
domain-based categorization of the focused crawler improves the search results. It also
reduces crawling time and saves database space (Fig. 2).

5 Pseudo Code of the Proposed Methodology



Fig. 3 Interface for detecting malign and benign URLs

6 Experimental Results and Discussion

6.1 Data Source and Dataset

A dataset of malign and benign URLs has been downloaded
from https://www.kaggle.com/antonyj453/urldataset/data. In the implementation of
this classifier, 50 URLs from different domains are tested on the malicious URL
detection interface of the crawler (Fig. 3).
This interface filters benign and malign URLs. Malign URLs are blocked and are
not allowed to enter the database, while benign URLs pass the anti-malicious
barrier and are saved in the database. Then, these URLs take part in the searching process.
A number of queries or keywords are passed into the search space on the searching
interface, which leads to a search engine result page after searching. This window
shows the comparison of the base crawler and the proposed focused crawler in terms of
search time, based on the storage data structure used during searching: HashSet by the
base crawler and trie by the proposed focused crawler.

6.2 Experimental Results

6.2.1 Detection of Malign and Benign URLs

See Table 3.

Table 3 Record of tested URLs


Sr. no Domain Statement of Web address Malicious/Non-malicious URL
domain (URL of the Detected Actual
Web site)
1 D1: Cab Different cabs http://http://dub NM NM
lincabs.com/
http://http://swi NM NM
ftcabs.com/
http://http://cha NM M
ndigarhcabs.
com/
http://www.che M NM
nnaicabs33.
com/
http://www.mus NM NM
kancabs.com/
Cabs outstation http://www.biz M M
aad.com/taxi-
tour-package
http://www.par NM NM
ultravels.com/
http://delhicarb NM NM
ooking.in/
http://www.inn M M
ovarental.in/ban
galore
http://driveg NM NM
oplus.com/
2 D2: Food Food outlets http://motima NM NM
in/near Faridabad hal.in/
http://www.bol NM NM
oco.com/
http://www.sur NM NM
uchirestaurants.
com/
http://www.amu NM NM
ldairy.com/
http://www.foo NM NM
dzonepalwal.
com/
Food items http://www.gul M M
sproductions.
com/
http://www.mer M M
rymilkfoods.
com
http://speyfoods. M M
com/
http://mcdona NM NM
ldsrestaurantsd
ucanadalte-273.
foodpages.ca/
http://o.foodpa NM NM
ges.ca/
3 D3: Books Best fiction novels http://www.mod NM NM
ernlibrary.com/
http://www.bbc. NM NM
com/
http://muh NM NM
ich.pw/
http://www.mod NM NM
ernlibrary.com/
http://www.boo NM NM
kspot.com/
Role of books in http://mcxl.se/ NM NM
our life http://www.kli NM NM
entsolutech.
com/
http://www.rus NM NM
evec.com/
http://www.myn NM NM
ewsdesk.com/
http://lifestyle. M NM
iloveindia.com/
4 D4: Care Animal care http://www.pfa NM NM
centers faridabad.com/
http://www.san NM NM
jaygandhianimal
carecentre.org/
http://smallanim NM NM
alcarecenter.
com/
http://abhyas NM NM
trust.org/
http://www.ani NM NM
malandbirdvet.
com/
Health insurance http://www.app M M
leton-child-care.
com/
http://sunkeyins M M
urance.com/
http://nycfootdr. NM M
com/
http://insurance M NM
companiesinn
ewyork.com/
http://www.kai NM NM
serinsuranceonl
ine.com/
5 D5: Sports Sports arena http://richsport M M
smgmt.com/
http://2amspo M M
rts.com/
http://opensport NM NM
sbookusa.com/
http://raresport NM NM
sfilms.com/
http://www.sch NM NM
ultesports.com/
Sports famous in http://www.wal NM NM
India kthroughindia.
com/
http://www.ilo NM NM
veindia.com/
http://www.ias NM NM
lic1955.org/
http://www.ind NM NM
iaonlinepages.
com/
http://www.icc NM NM
rindia.net/
*Acronyms used in the table—M (malicious) & NM (non-malicious)

Table 4 Parameter values of performance factors


True positive (TP) True negative (TN) False positive (FP) False negative (FN)
36 9 3 2

6.2.2 Computed Parameter Values

Based on the data tested, the values of true positives (TP), true
negatives (TN), false positives (FP), and false negatives (FN) are obtained, where
TP = case was positive and predicted positive, i.e., benign URLs correctly identified
TN = case was negative and predicted negative, i.e., malign URLs correctly identified
FP = case was negative but predicted positive, i.e., malign URLs classified as benign
FN = case was positive but predicted negative, i.e., benign URLs classified as malign
The obtained values of TP, TN, FP, and FN are given in Table 4.

6.2.3 Computed Search Time of Proposed Focused Crawler & Base


Crawler During Searching

See Table 5.

Table 5 Output window of searching


Sr. no Domain Statement of domain Search time (milliseconds)
Using trie Using HashSet
1 D1: Car Cab 0.0012 0.6083
2 D2: Food Food 0.0016 0.0201
3 D3: Books Books 0.0016 0.0192
4 D4: Care Care 0.0012 0.0201
5 D5: Sports Sports 0.0028 0.0192
6 D6: Education Code 0.0049 0.0254
7 D7: Travel Cab 0.0024 0.0246
8 D8: Insurance policies Insurance 0.002 0.0143
9 D9: Animals Animals 0.002 0.0197
10 D10: Animal care center Animals care center 0.002 0.0201
Total 0.092 ms 0.7207 ms

7 Analytical Study

The proposed focused crawler is a binary classifier, as it differentiates only two
classes, i.e., benign and malign. The available binary evaluation metrics include precision,
recall, false positive rate (FPR), false negative rate (FNR), detection accuracy, F-
measure, and AUC. This work uses three parameters: accuracy, precision, and recall.

7.1 Accuracy

This parameter is calculated to observe the overall performance in terms of accuracy
and error rate. It is determined by dividing the number of correct predictions by the
total number of instances. In the absence of any mistake (FP and FN being zero),
the accuracy will be 1 (100%).

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Thus, accuracy = (36 + 9)/(36 + 9 + 3 + 2) = 0.9 or 90%.

7.2 Precision (Positive Predictive Value)

It is the ratio of correctly classified positive predictions. It is determined by dividing
the number of correct positive predictions by the total number of positive predictions.

Precision = TP / (TP + FP)

Thus, precision = 36/(36 + 3) = 0.923 or 92.3%.

7.3 Recall (Sensitivity)

It is the ratio of actually positive cases that are also identified as such. It is calculated
by dividing the number of correct positive predictions by the total number of actual positives.

Recall = TP / (TP + FN)

Thus, recall = 36/(36 + 2) = 0.9473 or 94.73%.
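The three figures above can be reproduced directly from the Table 4 counts, as the short illustrative check below shows.

TP, TN, FP, FN = 36, 9, 3, 2

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)

print(f"accuracy  = {accuracy:.3f}")    # 0.900 -> 90%
print(f"precision = {precision:.3f}")   # 0.923 -> 92.3%
print(f"recall    = {recall:.3f}")      # 0.947 -> 94.73%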



Fig. 4 Graphical analysis of search time (ms) using HashSet and trie for queries such as outstation cab, Books, Sports, cab, and Animals

7.4 Graphical Analysis of HashSet & Trie

Several queries are made to compare the search time using the HashSet and trie
storage data structures. Figure 4 graphically compares the search time taken to search the
same keyword using the HashSet and trie data structures.

8 Conclusion & Future Research

Studies by McGrath and Gupta (2008) and Kolari et al. (2006) suggested that a combination
of URL features should be used to develop an efficient classifier. The proposed
focused crawler is based on this idea. The classifier is developed from a combination
of lexical features, JavaScript features, DHTML features, and popularity-based
features. It successfully detects malign URLs with an accuracy of 90% and an error
rate of 10%. The other metrics, precision and recall, are computed as 92.3% and 94.73%.
Moreover, storing the fetched data in a trie data structure during searching leads to
less search time compared to the base crawler, which uses a HashSet data structure.
Thus, it speeds up the query retrieval process.
This focused crawler can be made more resistant to spamming by adding a more
robust URL feature set. A study on short URLs can be carried out for effective detection
and attack-type identification, because short URLs are the fastest growing trend today on
microblogging sites and online social networks like Facebook, Twitter, Pinterest, etc.
Implementing a machine-learning approach would make this classifier more dynamic.

References

1. Nakavisute I, Sriwisathiyakun K (2015) Optimizing information retrieval (IR) time with doubly
linked list and binary search tree (BST). Int J Adv Comput Eng Netw 3(12):128–133. ISSN
2320-2106
2. Lewandowski D (2005) Web searching, search engines and information retrieval. Inf Serv Use
25(3):137–147. IOS - 0167-5265
3. Soumen C, Van Den BM, Byron D (1999) Focused crawling: a new approach to topic-specific
web resource discovery. Comput Net J 1623–1640

4. Kanagavalli VR, Maheeha G (2016) A study on the usage of data structures in information
retrieval. http://www.researchgate.net/publication/301844333A
5. Shishibori M et al (1996) An efficient method of compressing binary tries. Int J Sci Res (IJSR)
4(2):2133–2138 (IEEE). 0-7803-3280-6
6. Heinz S, Zobel J, Williams HE (2002) Burst tries a fast, efficient data structure for string keys.
ACM Trans Inf Syst 20(2):192–223
7. Shafiei N (2013) Non-blocking Patricia tries with replace operations. In: Proc. Int. Conf. Distrib.
Comput. Syst. IEEE, 1063-6927, pp 216–225
8. Thenmozhi M, Srimathi H (2015) An analysis on the performance of Tree and Trie based
dictionary implementations with different data usage models. Indian J Sci Technol 8(4):364–
375. ISSN 0974-5645
9. Andersson S, Nilsson A (1995) Efficient implementation of suffix trees. Softw Pract Exp CCC
25(2):129–141. 0038-0644
10. Grossi R, Ottaviano G (2014) Fast compressed tries through path decompositions. ACM J Exp
Algorithms 19(1):1.8.2–1.8.19
11. Nilsson S, Tikkanen M (1998) Implementing a dynamic compressed trie. In: Proc. WAE’98,
pp 1–12
12. Shishibori M et al (1997) A key search algorithm using the compact patricia trie. In: Int. Conf.
Intell. Process. Syst. ICIPS ’97. IEEE, pp 1581–1584
13. Mangla N, Jain V (2014) Context based indexing in information retrieval system using BST.
Int. J. Sci. Res. Publ 4(6):4–6. ISSN-2250-3153
14. Justin MA, Saul LK, Savage S, Voelker GM (2011) Learning to detect malicious URLs. ACM
Trans Intell Syst Technol 2(3):2157–6904
15. Xu S, Xu L (2014) Detecting and characterizing malicious websites. Dissertation, Univ. Texas
San Antonio, ProQuest LLC
16. Choi H, Zhu BB, Lee H (2011) Detecting malicious web links and identifying their attack
types. In: Proc. 2nd USENIX Conf. Web Appl. Dev. ACM, pp 1–12
17. Popescu AS, Prelipcean DB, Gavrilut DT (2016) A study on techniques for proactively iden-
tifying malicious URLs. In: Proc. 17th Int. Symp. Symb. Numer. Algorithms Sci. Comput.
SYNASC 2015, 978-1-5090-4/16, IEEE, pp 204–211
18. Mamun MSI et al (2016) Detecting malicious URLs using lexical analysis, vol 1. Springer Int.
Publ. AG, pp 467–482. 978-3-319-46298-1_30. http://www.researchgate.net/publication/308365207
19. Vanhoenshoven F, Gonzalo N, Falcon R, Vanhoof K, Mario K (2016) Detecting malicious
URLs using machine learning techniques. http://www.researchgate.net/publication/31158202
20. Sridevi M, Sunitha KVN (2017) Malicious URL detection and prevention at browser level
framework. Int J Mech Eng Technol 8(12):536–541. 0976-6359
21. Chong C, Liu D, Lee W (2009) Malicious URL detection, pp 1–4
22. Patil DR, Patil JB (2018) Feature-based Malicious URL and attack type detection using multi-
class classification. ISC Int J Inf Secur 10(2):141–162. ISSN 2008-2045
23. Naveen INVD, Manamohana K, Verma R (2019) Detection of malicious URLs using machine
learning techniques. Int J Innov Technol Explor Eng (IJITEE) 8(4S2):389–393. ISSN 2278-
3075
24. Sahoo D et al (2019) Malicious URL detection using machine learning: a survey. Association
Comput Mach 1(1):1–37. arXiv:1701.07179v3
A Systematic Review of Log-Based Cloud
Forensics

Atonu Ghosh, Debashis De, and Koushik Majumder

Abstract Inexpensive devices that leverage cloud computing technology have proliferated
in the current market. With this increasing popularity and huge user base, the
number of cybercrimes has also increased immensely, and the forensics of the cloud
has become an important task. However, due to the geographically distributed nature
and multi-device capability of the cloud computing environment, cloud forensics is
also a challenging task. The logs generated by the cloud infrastructure
provide the forensics investigator with major hints that may be followed to reconstruct
the crime scene chronology, which is highly critical for investigating the case.
But the logs are not easily accessible, or they often fail to provide
any critical clues due to poor logging practices. In this paper, initially, the importance
of log-based cloud forensics is discussed. Then, a taxonomy based on
the survey of the literature is furnished. Finally, the issues in the existing
log-based cloud forensics schemes are outlined and open research problems
are identified.

Keywords Cloud forensics · Digital forensics · Log forensics · Log-based cloud


forensics · Issues in cloud forensics · Cloud forensics taxonomy

1 Introduction

The untoward exploitation of the capability and flexibility of the cloud computing
environment has brought in the need for cloud forensics [1]. The cloud computing
environment is not only capable of meeting minor general-purpose computing
requirements; its tremendous power can also be exploited by malicious users
to procure gigantic computing resources and network bandwidth to launch various
attacks on devices and applications. Thus, there is a need for forensics investigation
in the cloud computing environment. The commonly used digital forensics practices
do not apply to cloud forensics due to the

A. Ghosh · D. De · K. Majumder (B)


Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of
Technology, Kolkata, West Bengal, India
e-mail: koushik@ieee.org


inherent properties of the cloud computing environment. The multitenant, volatile,
geographically distributed nature and the complex architecture of cloud computing
hinder the forensics process in the cloud. Nevertheless, forensics investigators
currently extend the use of digital forensics tools to the cloud, which makes the
whole process of cloud forensics investigation less promising. There is a pressing
need for extensive research and development in the field of cloud forensics.
In this work, the literature on log-based cloud forensics published between 2011
and 2020 has been reviewed. A taxonomy based on the survey of the literature has
been provided in Sect. 2. In Sect. 3, the challenges that log-based cloud forensics
faces have been identified. Finally, in Sect. 4, the open research areas have been
provided.

1.1 Log-Based Cloud Forensic

Logs are records generated by software under execution in some format as specified
by the developer. Each application, platform, or infrastructure usage is logged by the
CSP for various purposes such as but not limited to troubleshooting and malicious
activity tracking. In each of the cloud service models, logs are generated for every
possible access and execution of applications or any other services provided by the
CSP. The generated logs are maintained by the CSP. These logs have the potential
to reveal an enormous amount of information that might be required in various
scenarios as mentioned earlier [2]. Thus, these cloud infrastructure-generated logs
are used by cloud forensics investigators to reconstruct the sequence of activities
that have taken place in a cloud crime scene. A use case of logs generated by various
systems in a network is depicted in Fig. 1. The logs may be needed to track down
unauthorized access to a network from an unauthorized IP. In this scenario, the network
logs from a router or a firewall can be of tremendous help in finding the details of
such an intrusion and possibly prosecuting the intruder in a court of law. In the cloud
computing environment, unauthorized access and other malicious activities,
such as sensitive data theft and causing damage to other businesses over the cloud, have
become quite common. In a cloud investigation, the generated logs can give a promising
direction to the investigation and may help the prosecution to succeed, as they provide
details of the activities that have taken place at the cloud crime scene. The cloud forensics
activity that takes the help of the logs generated in the cloud is known as log-based cloud
forensics. Users rarely have full access to logs. The CSP holds exclusive access
to the logs generated in her cloud service infrastructure. But she may or may not be
obliged to grant access to the logs to her users [3]. As mentioned in earlier sections,
the cloud computing environment is spread all over the globe. It is a mammoth task
to track down the exact location where the generated logs sit. In a cloud forensics
scenario, the CSP may provide access to the logs on an order by the court of law.
Since the cloud computing environment encompasses multi-jurisdiction property,
it again, in turn, becomes very tough to acquire the desired logs for the forensics

Fig. 1 Log-based investigation scenario

investigation. Granting investigators access to the logs may also lead to sensitive data
leaks of other cloud service users. This is one of the main reasons why CSPs tend not
to disclose the logs, as doing so might lead to a breach of the SLA between the CSP
and its users. Such a breach, in turn, may defame the CSP and drive it out of business,
let alone the jurisdictional chaos that the CSP might have to face in case a cloud service
user reports a breach of the SLA to the court of law. As per a report from the DPCI, 74%
of forensics investigators have raised dependency on the CSP as a concern. Also, the
non-uniform logging structure makes identification and segregation of logs difficult
even if access is granted.

2 Comparison with Related Work and Taxonomy

In this section, this work of review has been compared with similar works by other
researchers. Additionally, the proposed taxonomy constructed based on the literature
review for log-based cloud forensics has been provided (Table 1).

Table 1 Comparison with related work


Contribution Solution taxonomy Coverage of tools Research gap Scope of literature
identification review
This work ✓ ✓ ✓ 2011–2020
[4] ✗ ✓ ✓ 2010–2017
[5] ✗ ✗ ✗ 2011
[6] ✗ ✗ ✗ 2011–2014
[7] ✓ ✓ ✗ 2011–2018
[8] ✗ ✗ ✓ 2011–2016

2.1 Proposed Taxonomy

Log-based cloud forensics research is categorized into continual forensics (discussed


in Sect. 3) and sporadic forensics (discussed in Sect. 4). This high-level categorization
focuses on the practice of log acquisition for forensics investigations. Forensics
process models, frameworks, forensics systems, and other security and integrity
systems can be the contributions by researchers. During this literature survey, it
was found that trust in third parties and the integrity of logs have been emphasized
as some of the major issues in log-based cloud forensics. Logs are generated in
almost all systems in the cloud infrastructure, the client devices, and the intermediate
communication systems. Thus, it is not surprising that vast literature focusing on such
log generating systems was encountered and their significance being highlighted
(Fig. 2).

Fig. 2 Proposed taxonomy



2.1.1 Continual Forensics

Continual forensics puts the forensicability of a system under test. It is the practice
of continuous analysis of the logs generated by a forensics sound system. Unlike the
post-incident forensics analysis of logs, in continuous forensics, the system logs are
continuously monitored for ill behavior in the system. This is a result of the develop-
ment in cloud services and the research in forensics readiness of the cloud systems.
Due to the ever-broadening of the cybercrime landscape, several contributions have
been made in the “forensic-by-design” attribute of cloud systems. Simou et al. [9]
in their work proposed a methodology for forensic readiness of the cloud systems.
They coined the term “forensicability” to describe a system or a service that can
be designed in a forensic sound manner. They have further identified the forensic
constraints which are the concepts to be realized to enable a cloud as forensic ready.
These constraints when implemented increase the forensicability of a cloud service.
Kebande et al. [10, 11] have proposed a botnet-based solution for gathering logs
from live systems in the cloud computing infrastructure. They proposed infecting
the virtual machines with non-malicious bots that would collect live logs from the
user virtual machines and provide logs for live analysis. Park et al. [12] described the
work environments incorporating cloud services as smart work environments. They
suggested that cloud services for such work environments must implement forensic
readiness as a pro-active measure of threat preemption. They further proposed their
pro-active forensics model that was based on their investigation of the literature. They
identified and analyzed 50 components of digital forensic readiness and designed 7
detailed areas. For validating the model designed by them, they undertook surveys
of their models by digital forensics professionals. Finally, they deduced the areas
that can be worked on for the existing systems to gain forensic readiness. Datta et al.
[13] proposed a machine learning-based system that ranks malicious users in a crime
scene. This ranking of the suspect IPs helps eliminate the need to investigate all the
IPs in a crime scene (Fig. 3).
De Marco et al. [14] stated that breaches in the cloud by the cloud client
happen through violations of the service level agreements (SLAs). Thus, the pro-active
detection of SLA violations has the potential to lead a
forensic investigation to confidence. Based on this, they emphasized the need for
automating the detection of SLA breaches. They further proposed a framework for
the development of forensic-ready cloud systems. Their framework considered the
technical aspects of the SLA and monitored whether the system fulfilled the service obli-
gations. Baror et al. [15] emphasized the need for forensic readiness in the cloud. The
authors stated that there must be some form of communication in natural human-
understandable language be it friendly or vindictive. Thus, they proposed a natural
language processing-based framework that analyzes the language of the malicious
entity and detects an ongoing cyber-attack.

Fig. 3 Continual and sporadic forensics

2.1.2 Sporadic Forensics

Sporadic forensics refers to the forensics process in which activities are carried out
in response to an incident, rather than as a continuous process of forensics
preparedness activities in the cloud infrastructure. Here, the
forensics data is acquired at a later stage, as opposed to the continuous data acquisition
for future incidents performed in continual forensics. Dykstra and Sherman [16]
proposed "Forensic Open-Stack Tools" (FROST) for trustworthy log extraction from
Infrastructure-as-a-Service (IaaS) cloud platforms. FROST is capable of extracting
logs from virtual disks, APIs, and guest firewalls. The distinct feature of FROST
is that it operates in the cloud management plane and does not interact with the
guest operating system. It also ensures log data integrity by maintaining hash trees.
Marty [17] proposed a log-based cloud forensic framework that focuses solely on the
logs generated at different levels of the cloud computing environment. The proposed
model is carefully designed keeping in mind—when there is a need for logging, what
is being logged and how an event is being logged. The author also emphasizes
the absence of any standard log format and proposes the must-haves of a log
record, such as a timestamp, a key-value pair format for the log entry, and
normalized values that make the logs ready for analysis. The author has
also focussed on log transportation, storing logs in centralized storage, archiving the
logs, and retrieving the logs when needed. Anwar and Anwar [18] showed that the
system-generated logs of a cloud system can be used in a forensic investigation.
They generated their log dataset by launching known attacks on Eucalyptus. Then,
they analyzed the data generated by the attacks and built a system that could detect
further such attacks on Eucalyptus. Roussev et al. [19] showed that the traditional
forensic practices on the client-side are inefficient to be employed in cloud forensics

and it requires a new toolset for efficient forensics in the cloud computing envi-
ronment. Further, the authors developed and demonstrated tools for forensics anal-
ysis of the cloud. They proposed “kumodd” for remote acquisition of cloud drives,
“kumodocs” for acquisition and analysis of Google docs, and “kumofs” for remote
visualization of cloud drives. Ahsan et al. [20] stated that the existing systems focus
on the forensics of the traditional systems rather than the cloud computing environ-
ment. The authors then proposed their logging service system called “CLASS: cloud
log assuring soundness and secrecy scheme for cloud forensics.” In their proposed
logging system, individual users encrypt their logs with their public
keys so that only they can decrypt the logs using their private
keys. To avert unsanctioned alteration of logs, the authors generate a “Proof of Past
Log (PPL)” by implementing Rabin’s fingerprint and a Bloom filter. Park et al. [21]
affirmed that despite the extensive research and development in the field of cloud
forensics, there still exist problems that have not yet been addressed. Maintaining
the integrity of the log is one such area. To solve this problem of log integrity, they
proposed their blockchain-based logging and integrity management system. Khan
and Varma [22] in their work proposed a framework for cloud forensics taking into
consideration the service models in the cloud. Their proposed system implemented
pattern search and used machine learning for the extraction of features in the cloud
logs. This use of machine learning enabled the prioritization
of pieces of evidence collected from the virtual machines in the cloud. Rane
and Dixit [23] emphasized that log-based cloud forensics places a great deal of trust in
third parties, for example when acquiring the logs from the cloud service provider
(a third party in the cloud forensics investigation). Stakeholders may collude to alter
the logs for their benefit. Thus, to solve this problem, the authors proposed their
forensic aware blockchain-based system called “BlockSLaaS: blockchain-assisted
secure Logging-as-a-Service” for the secure storage of logs and solving the collusion
problem. Further, they claimed that their proposed system provides the preservation
of log integrity (Table 2).
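As a minimal illustration of the tamper-evident logging idea behind the blockchain-assisted schemes above, the following Python sketch chains each log record to the hash of its predecessor; the class, field and function names are illustrative and are not taken from any of the cited systems.

```python
import hashlib
import json
import time

def _hash(entry: dict) -> str:
    # Deterministic hash of a log entry (sorted keys for stable serialization).
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

class HashChainedLog:
    """Append-only log in which every record carries the hash of its predecessor,
    so any later modification breaks the chain and is detectable."""

    def __init__(self):
        self.records = []

    def append(self, message: str, source: str) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else "0" * 64
        body = {"ts": time.time(), "source": source, "message": message,
                "prev_hash": prev_hash}
        record = dict(body, hash=_hash(body))
        self.records.append(record)
        return record

    def verify(self) -> bool:
        prev_hash = "0" * 64
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev_hash"] != prev_hash or _hash(body) != rec["hash"]:
                return False
            prev_hash = rec["hash"]
        return True

log = HashChainedLog()
log.append("user login from 10.0.0.5", source="vm-42")
log.append("sudo invoked by user alice", source="vm-42")
assert log.verify()
log.records[0]["message"] = "tampered"   # any alteration...
assert not log.verify()                  # ...is detected
```

In the cited systems, the chain (or the record hashes) would additionally be anchored in a blockchain or another append-only store so that no single party can rewrite the whole history.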

2.1.3 CSP Provided Logs

The logs in the cloud computing environment are generated in all the levels of the
service model. In the SaaS level, the user gets little or no logs at all. In the PaaS
level, the user has access to some of the logs but the degree of detail in such logs is
limited. The IaaS level gives the highest access to logs that can be instrumental to the
process of forensics investigation in the cloud. It is the cloud service provider (CSP)
who determines what logging information is provided to the cloud user. The CSP has
exclusive control of the logs in the cloud computing environment. For this
reason, in a forensics investigation, the CSP is requested to provide the relevant
logs for the investigation to proceed. The CSP is not an entity that can be completely
trusted as there is a chance of collusion and alteration of logs. In the case of collusion
among the CSP and the adversary, the investigation might not lead to confidence and
even the wrong person might get framed [24].

Table 2 Recapitulation of contributions in continual and sporadic forensics


Continual forensics
[9] Proposed forensic readiness model
[10, 11] Proposed botnet-based logging system
[12] Proposed forensics readiness model based on identified components of
digital forensics
[13] Proposed machine learning-based suspect ranking
[14] Proposed system that alerts breach of service level agreement
[15] Proposed NLP-based forensics system
Sporadic forensics
[16] Proposed “FROST” for log extraction in IaaS
[17] Provided and demonstrated logging guidelines
[18] Emphasized the usefulness of system logs
[19] Proposed and demonstrated forensics acquisition tools
[20] Proposed “CLASS: cloud log assuring soundness and secrecy scheme”
[21] Proposed blockchain-based system that ensures the integrity of logs
[22] Proposed machine learning-based system for prioritization of evidence
[23] Proposed “BlockSLaaS: blockchain-assisted secure Logging-as-a-Service”

VM Logs

These logs provide detailed clues about what an adversary has done during an
attack. The logs might not be readily available, in which case a legal notice needs to be sent out;
even then, multijurisdictional issues might lead to the non-availability of such logs. Due to
the pivotal role of logs from VMs, several researchers have contributed various
methods and suggestions for VM log gathering and analysis. Thorpe et al. [25]
proposed a tool called “Virtual Machine Log Auditor (VMLA).” VMLA can be used
by a forensics investigator to generate a timeline of events in the virtual machine
using the virtual machine logs. Zhang et al. [26] proposed a method for the detection
of active virtual machines and the extraction of system logs, process information,
user accounts, registry, loaded modules, network information, etc. They have exper-
imented and have been successful with current CPUs and operating systems such as
Fedora. Lim et al. [27] have emphasized the role of VMs in forensics investigation
and have presented suggestions on how to forensically investigate a virtual machine
based on their findings. Wahyudi et al. [28] demonstrated that even
when a virtual machine is destroyed, forensically relevant data can be recovered
from the host machine using Autopsy tools and FTK.

Resource Logs

Performing forensics in the cloud computing environment not only requires the logs
from virtual machines and host machines, but the logs from other resources such as
load balancers, routers, and network firewalls are also essential. These resource logs
help the forensics investigator in the reconstruction of the crime scene. The acquisition
of such logs is a challenge and demands trust in the cloud service provider, as
the cloud infrastructure is, after all, owned and maintained by the provider. While surveying
the literature, it was found that there has been a significant amount of research and
development in this area too. Mishra et al. [29] emphasized the usefulness of the
resource logs along with the logs from the virtual machines and the host system. In
their work, they proposed the collection of virtual machine logs along with resource
logs and stored the logs in a database. They further demonstrated the implementation
of a dashboard for monitoring the logs for the identification of unusual activities in
the virtual machines and the resources that are being logged. They performed their
experiment in a private cloud implemented using Eucalyptus. Gebhardt and Reiser
[30] in their research outlined the need for network forensics in the cloud computing
environment. Additionally, they have emphasized the challenges in network foren-
sics. To solve these problems, they have proposed a generic model for forensics
of the network in the cloud computing environment. They validated their proposed
model by implementing a prototype with “OpenNebula” and the analysis tool called
“Xplico.”

2.1.4 LaaS Provided Logs

Logs are extracted from various levels of the cloud infrastructure as well as from
the devices that reside with the Internet service provider. During forensics investiga-
tion, logs are requested from the cloud service provider as well as from the Internet
service provider. But the issue of placing trust in a third party persists in such a log
acquisition process. To mitigate this issue, Logging-as-a-Service has emerged.
This scheme of service gathers logs and provides access
to the logs to the forensics investigator through trusted, secure, and privacy-aware
mechanisms, thus keeping the dependency on untrusted parties to a minimum.
Khan et al. [31] have emphasized the importance of cloud logs and have proposed
a “Logging-as-a-Service” scheme for the storage of externally gathered logs. Because
deploying a logging system is expensive owing to the long-term persistence of the gathered logs, they
have opted for a cloud-based solution. They deployed the logging service in the cloud
where they implement “reversible watermarking” for securing the logs. This kind of
watermarking is very efficient, and any tampering of the logs can be easily detected
by the virtue of it. The logs are collected using Syslog and the logs thus collected
are stored for a longer stretch of time. Muthurajkumar et al. [32] have accentuated
the pivotal role that logs play in forensics and the usefulness of extended storage of
logs. In their work, they have implemented a system using Java and Google Drive
that stores logs in a secure, integrity-preserving manner. The authors have implemented the

“Temporal Secured Cloud Log Management Algorithm” for maintaining log trans-
action history. The logs that they store are encrypted before storage. Batch storage of
logs is implemented by the authors for seamless retrieval of the stored logs. Liu et al.
[33] have outlined the importance and vulnerability of logs in the cloud computing
environment. Considering the risks that persist for the log databases, the authors have
proposed a blockchain-based solution for such log storage. The authors have imple-
mented the logging system where the integrity of the logs to be stored is first verified
and then the logs are stored in the log database and the hash of the logs are stored
in the blockchain. Users retrieve the hashes from the blockchain and store them in
a separate database called the “assistant database.” Then, the users send acceptance
of the logs to the cloud service provider. Finally, the cloud service provider stores
the acceptance in the log database. Patrascu and Patriciu [34] discuss the problem
of logs not being consolidated. They further propose a system for the consolidation
of cloud logs to help the forensics process in the cloud computing environment.
The framework proposed by the authors consists of five layers. The “management
layer” consists of a cloud forensics module and other cloud-related services. The
“virtualization layer” consists of all virtual machines, workstations, etc. The third
layer consists of the log data storage that is sent from the “virtualization layer.”
The raw log data is then analyzed in the fourth layer. Finally, in the fifth layer, the
analyzed and processed data are stored. Rane et al. [35] proposed an interplanetary
file system (IPFS)-based logging solution. The IPFS system is used to store network
and virtual machine log meta-data. The authors claim that their system provides
“confidentiality,” “integrity,” and “availability.” The authors maintain an index of
hashes from the IPFS system. Any tampering with the data will result in a new hash which
will not be present in the index, thus preserving the integrity of the logs (Table 3).
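The hash-index idea described above can be sketched in a few lines of Python; SHA-256 is used here purely for illustration (IPFS itself uses content identifiers built on multihash), and the file names and log contents are invented.

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Content-addressed identifier; SHA-256 stands in for an IPFS-style CID here.
    return hashlib.sha256(data).hexdigest()

# Index of hashes kept by the logging service when the log chunks are stored.
stored_chunks = {
    "vm7-2020-10-01.log": b"10:02:11 sshd accepted password for root from 172.16.4.9\n",
    "lb-2020-10-01.log":  b"10:02:13 forwarded request to backend pool A\n",
}
hash_index = {name: content_hash(data) for name, data in stored_chunks.items()}

def verify_chunk(name: str, data: bytes) -> bool:
    """A retrieved chunk is trusted only if its recomputed hash is in the index."""
    return hash_index.get(name) == content_hash(data)

assert verify_chunk("vm7-2020-10-01.log", stored_chunks["vm7-2020-10-01.log"])
# Any alteration of the chunk yields a new hash that is absent from the index.
assert not verify_chunk("vm7-2020-10-01.log", b"10:02:11 nothing suspicious happened\n")
```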

2.1.5 Client Device Logs

Cloud services exhibit multi-device capability, i.e., the cloud services can be accessed
from different kinds of devices. A quite common example of such a cloud service
is cloud storage services such as Google Drive, Dropbox, etc. Through the lens of
forensics investigation, all such devices accessing the cloud services need to be
examined for pieces of evidence. Several developments have been made in this area
of research. The logs found in such client devices have been termed “somatic logs”
in the proposed taxonomy. Satrya and Shin [36] in their work proposed a method
for forensics investigation of the client devices and the Google Drive application on
Android devices. They demonstrated the forensics investigation by following the six
steps of a digital forensics investigation. They compared various logs generated in
the system such as login and logout data, install and uninstall data. Amirullah et al.
[37] performed forensics analysis of client-side applications on Windows 10 devices
to find remnant data on the device related to the crime scene. They discovered various
kinds of data in the device such as deleted data, install and uninstall data, browser
data, and memory data. They claim a success rate of 82.63% but were unable
to analyze the remaining data on the device (Table 4).

Table 3 Recapitulation of contributions in CSP provided logs and LaaS provided logs
CSP provided logs
[25] Proposed system that generates timeline of events using VM logs
[26] Proposed system for extraction of logs
[27] Provided suggestions on how to perform forensics investigation of VMs
[28] Recovered evidence from deleted VMs using Autopsy tools and FTK
[29] Proposed system for acquisition and consolidation of logs
[30] Proposed a generic model for forensics of network using Xplico tool
LaaS provided logs
[31] Proposed “Logging-as-a-Service” system using “reversible
watermarking” in cloud
[32] Proposed system for secure and integrity preserving persistence of logs
using Google Drive
[33] Proposed blockchain-based solution for log storage and anonymous
authentication
[34] Proposed an extensible system for consolidation of logs for existing
clouds
[35] Proposed IPFS-based log storage system

Table 4 Recapitulation of contributions in client device logs


Client device logs
[36] Proposed and demonstrated forensics investigation of Google Drive client
Android application
[37] Performed forensics analysis of client-side applications on Windows 10
devices

3 Challenges in Log-Based Cloud Forensics

Log-based cloud forensics faces several challenges. In this section, the challenges
faced by an investigator in log-based cloud forensics are discussed.
• Synchronization of Timestamps
Timestamps in the cloud logs enable the forensics investigator to reconstruct the
chain of activities that have taken place in a crime scene in the cloud. By design,
cloud infrastructure is spread across the globe, so logs from different
systems carry timestamps in their respective time zones. Thus, when logs from
different time zones are analyzed, correlating the timestamps becomes a mammoth
task (a minimal normalization sketch is given after this list).
• Logs Spread Across Layers
Moving along in the order IaaS, PaaS, SaaS, access to logs decreases, i.e., in
SaaS, the CSP provides the user with little or no log data. In PaaS, the user
gets access to some extent. The (comparatively) highest level of access to the logs
is given to the user in IaaS. There is no centralized access to the logs in the cloud
computing environment. Moreover, the IaaS user is only granted access to logs
that the cloud service provider deems suitable. For detailed logs of the network
and hardware, the cloud service provider must be requested and trusted.
• Volatile Logs
In the cloud environment, if the user can create and store data, then she can also
delete it. Because of the multi-locational nature of cloud computing, the
data present at different locations are mapped to provide abstraction and
the illusion of unity to the user. When data is deleted, its mapping is also
erased; this removal of the mapping happens in a matter of seconds, making
it impossible to gain remote access to the deleted data in an investigation scenario
that partly relies on deleted data recovery.
• Questionable Integrity of Logs
The cloud service provider is the owner of most of the crucial logs and must be
requested for access to the logs in a forensics investigation. But the integrity of logs
that are provided by the cloud service provider is questionable. There is always
a chance of collusion among the parties involved in the forensics investigation.
Moreover, cloud service providers are bound by service level agreements
covering the privacy and integrity of their clients. Thus, a cloud service provider will
not be inclined to breach the service level agreements for fear of being put out of
business by such a breach.
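As referenced in the timestamp-synchronization challenge above, a minimal Python sketch of normalizing per-time-zone timestamps to UTC before correlating events is shown below; the hosts, offsets and messages are invented for illustration.

```python
from datetime import datetime, timezone

# Timestamps as they might appear in logs gathered from systems in different
# time zones (formats and offsets are illustrative).
raw_events = [
    ("vm-eu",   "2020-10-01T10:02:11+02:00", "login accepted"),
    ("vm-us",   "2020-10-01T04:02:09-04:00", "outbound connection opened"),
    ("lb-apac", "2020-10-01T16:02:13+08:00", "request forwarded"),
]

def to_utc(stamp: str) -> datetime:
    # fromisoformat understands the "+HH:MM" offset, so conversion to UTC is direct.
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)

# Normalizing every timestamp to UTC lets events from different zones be ordered
# on a single timeline for crime-scene reconstruction.
timeline = sorted((to_utc(ts), host, msg) for host, ts, msg in raw_events)
for ts, host, msg in timeline:
    print(ts.isoformat(), host, msg)
```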

4 Open Research Areas in Log-Based Forensics

Despite the extensive research and development surveyed in this paper, there still
exist several unaddressed issues. Some of the open research areas that can be worked
on for the maturing of log-based cloud forensics are as follows.
• Forensics-as-a-Service
The volume of data that must be analyzed is huge. Analysis of such a huge volume
of data requires high computing power. The typical workstation-based practice
of forensics analysis needs to be changed for an improved turnaround time of
the cases. The elasticity of the cloud computing environment can be exploited to
process such huge volumes of data. Thus, cloud-based Forensics-as-a-Service can
play a pivotal role in the reduction of pending cases.
• Innovation of Tools
Cloud forensics is still practiced using traditional digital forensics tools. This
makes the process inefficient and at times leads to a complete halt of the cases.
Thus, there is an urgent need for specialized tools that suit the cloud computing
environment, can handle the huge volumes to be analyzed, and automate the
tasks in the process of forensics analysis.
• Integrity of Log Files

As discussed in Sect. 3, the integrity of the logs provided by the cloud service
provider is questionable. There is an urgent need to come up with solutions that help
preserve the integrity of the log files. This is because, if the logs are modified
by any party, the case will not lead to confidence and there is a chance of an
innocent person being punished.
• Prioritization of Evidence to Be Analyzed
One of the major reasons for the high turnaround time of the cases is the exces-
sively high volumes of data that need to be examined. Thus, discarding irrelevant
data will speed up the process of examination. Hence, there is a need for smarter,
automated tools and techniques that prioritize the data relevant to a case.
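A toy Python sketch of the evidence-prioritization idea is shown below: each log entry is scored for relevance and the highest-scoring entries are examined first. The keyword weights are invented for illustration; the works cited above use machine learning models rather than fixed keyword lists.

```python
# Illustrative relevance scoring for triaging log evidence before deep analysis.
SUSPICIOUS_TERMS = {"failed password": 3, "root": 2, "wget": 2, "chmod 777": 4, "outbound": 1}

def relevance(entry: str) -> int:
    # Sum the weights of every suspicious term found in the entry.
    return sum(w for term, w in SUSPICIOUS_TERMS.items() if term in entry)

log_entries = [
    "10:01:02 sshd failed password for root from 203.0.113.7",
    "10:01:05 cron daily backup completed",
    "10:01:09 wget http://203.0.113.7/payload.sh && chmod 777 payload.sh",
    "10:01:11 user alice opened outbound connection to 198.51.100.23",
]

# Examine the most relevant entries first; irrelevant data sinks to the bottom.
for score, entry in sorted(((relevance(e), e) for e in log_entries), reverse=True):
    print(score, entry)
```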

5 Conclusion

The field of computing is evolving rapidly due to the introduction of cloud computing.
Cloud computing has given computing a new dimension which humankind can
leverage. Misuse of this tremendous power and flexibility also exists, but the cloud
itself can be used to deal with the adversaries. Cloud computing has the potential
to give a significant boost to forensic investigations, be they on or off the cloud.
Overall, the potential is phenomenal. In this review, it has been explained why
traditional digital forensics fails in a cloud computing environment. Based on
the survey, a taxonomy has been proposed with continual and sporadic forensics as
the two main types of cloud forensics. The sub-types of the proposed taxonomy have
been discussed in detail along with the relevant tools. The challenges in log-based
cloud forensics have been identified and discussed in detail. Finally, the areas of
open research and development in the field of log-based cloud forensics have been
identified.

References

1. Santra P, Roy A, Majumder K (2018) A comparative analysis of cloud forensic techniques in
IaaS. Advances in computer and computational sciences. Springer, Singapore, pp 207–215
2. Santra P et al (2018) Log-based cloud forensic techniques: a comparative study. Networking
communication and data knowledge engineering. Springer, Singapore, pp 49–59
3. Datta S, Majumder K, De D (2016) Review on cloud forensics: an open discussion on challenges
and capabilities. Int J Comput Appl 145(1):1–8
4. Baldwin J et al (2018) Emerging from the cloud: a bibliometric analysis of cloud forensics
studies. Cyber threat intelligence. Springer, Cham, pp 311–331
5. Ruan K et al (2011) Cloud forensics. IFIP international conference on digital forensics. Springer,
Berlin
6. Sibiya G, Venter HS, Fogwill T (2015) Digital forensics in the cloud: the state of the art. In:
2015 IST-Africa conference. IEEE
7. Studiawan H, Sohel F, Payne C (2019) A survey on forensic investigation of operating system
logs. Dig Invest 29:1–20

8. Khan S et al (2016) Cloud log forensics: foundations, state of the art, and future directions.
ACM Comput Surv (CSUR) 49(1):1–42
9. Simou S et al (2019) A framework for designing cloud forensic-enabled services (CFeS).
Requirements Eng 24(3):403–430
10. Kebande VR, Venter HS (2015) Obfuscating a cloud-based botnet towards digital forensic
readiness. In: ICCWS 2015—the proceedings of the 10th international conference on cyber
warfare and security
11. Kebande VR, Venter HS (2018) Novel digital forensic readiness technique in the cloud
environment. Austral J Forens Sci 50(5):552–591
12. Park S et al (2018) Research on digital forensic readiness design in a cloud computing-based
smart work environment. Sustainability 10(4):1203
13. Datta S et al (2018) An automated malicious host recognition model in cloud forensics. In:
Networking communication and data knowledge engineering. Springer, Singapore, pp 61–71
14. De Marco L et al (2014) Formalization of SLAs for cloud forensic readiness. In: Proceedings of
ICCSM conference
15. Baror SO, Hein SV, Adeyemi R (2020) A natural human language framework for digital forensic
readiness in the public cloud. Austral J Forensic Sci 1–26
16. Dykstra J, Sherman AT (2013) Design and implementation of FROST: digital forensic tools
for the OpenStack cloud computing platform. Digital Invest 10:S87–S95
17. Marty R (2011) Cloud application logging for forensics. In: Proceedings of the 2011 ACM
symposium on applied computing
18. Anwar F, Anwar Z (2011) Digital forensics for eucalyptus. In: 2011 Frontiers of information
technology. IEEE
19. Roussev V et al (2016) Cloud forensics–tool development studies & future outlook. Digital
Investigation 18:79–95
20. Ahsan MAM et al (2018) CLASS: cloud log assuring soundness and secrecy scheme for cloud
forensics. IEEE Trans Sustain Comput
21. Park JH, Park JY, Huh EN (2017) Block chain based data logging and integrity management
system for cloud forensics. Comput Sci Inf Technol 149
22. Khan Y, Varma S (2020) Development and design strategies of evidence collection framework in
cloud environment. In: Social networking and computational intelligence. Springer, Singapore
23. Rane S, Dixit A (2019) BlockSLaaS: blockchain assisted secure logging-as-a-service for cloud
forensics. In: International conference on security & privacy. Springer, Singapore
24. Alex ME, Kishore R (2017) Forensics framework for cloud computing. Comput Electr Eng
60:193–205
25. Thorpe S et al (2011) The virtual machine log auditor. In: Proceedings of the IEEE 1st
international workshop on security and forensics in communication systems
26. Zhang S, Wang L, Han X (2014) A KVM virtual machine memory forensics method based
on VMCS. In: 2014 tenth international conference on computational intelligence and security.
IEEE
27. Lim S et al (2012) A research on the investigation method of digital forensics for a VMware
Workstation’s virtual machine. Math Comput Model 55(1–2):151–160
28. Wahyudi E, Riadi I, Prayudi Y (2018) Virtual machine forensic analysis and recovery method
for recovery and analysis digital evidence. Int J Comput Sci Inf Secur 16
29. Mishra AK, Pilli ES, Govil MC (2014) A prototype implementation of log acquisition in
private cloud environment. In: 2014 3rd international conference on eco-friendly computing
and communication systems. IEEE
30. Gebhardt T, Reiser HP (2013) Network forensics for cloud computing. In: IFIP international
conference on distributed applications and interoperable systems. Springer, Berlin
31. Khan A et al (2017) Secure logging as a service using reversible watermarking. Procedia Comput
Sci 110:336–343
32. Muthurajkumar S et al (2015) Secured temporal log management techniques for cloud. Procedia
Comput Sci 46:589–595

33. Liu J-Y et al (2019) An anonymous blockchain-based logging system for cloud computing. In:
International conference on blockchain and trustworthy systems. Springer, Singapore
34. Patrascu A, Patriciu V-V (2015) Logging for cloud computing forensic systems. Int J Comput
Commun Control 10(2):222–229
35. Rane S et al (2019) Decentralized logging service using IPFS for cloud infrastructure. Available
at SSRN 3419772
36. Satrya GB, Shin SY (2018) Proposed method for mobile forensics investigation analysis of
remnant data on Google Drive client. J Internet Technol 19(6):1741–1751
37. Amirullah A, Riadi I, Luthfi A (2016) Forensics analysis from cloud storage client application
on proprietary operating system. Int J Comput Appl 143(1):1–7
Performance Analysis of K-ELM
Classifiers with the State-of-Art
Classifiers for Human Action Recognition

Ratnala Venkata Siva Harish and P. Rajesh Kumar

Abstract Recent advances in computer vision have drawn much attention toward
human activity recognition (HAR) for numerous applications such as video games,
robotics, content retrieval, video surveillance, etc. Identifying and tracking
human actions with the wearable sensor devices (WSD) generally used
today is difficult to do precisely and rapidly because of continual
changes in body movement. Primarily, the HAR system preprocesses
the WSD signal, and then six sets of features that are viable from the computational
viewpoint are extracted from the wearable accelerometer data. In the end, after
the crucial dimensionality reduction process, the selected features are utilized by the
classifier to ensure high human action classification results. In this paper, the
performance of a K-ELM classifier-based deep model on the selected features
is analyzed against state-of-the-art classifiers such as artificial neural
network (ANN), k-nearest neighbor (KNN), support vector machines (SVM) and
convolutional neural network (CNN). The experimental results, evaluated using
metrics such as precision, recall, F-measure, specificity and
accuracy, show that K-ELM outperforms most of the above-
mentioned state-of-the-art classifiers while requiring comparable or less time.

Keywords Kernel extreme learning machine (K-ELM) · Human action recognition


(HAR) · Wearable sensor devices (WSD) · Classifiers

1 Introduction

The persuasive development in the field of computer vision is utilized by various
smart applications today; principally, the concept of human action recognition is

R. V. S. Harish (B) · P. Rajesh Kumar


Department of Electronics and Communications Engineering, Au College of Engineering
(Autonomous), Visakhapatnam 530 003, Andhrapradesh, India
e-mail: rvsharish@gmail.com
P. Rajesh Kumar
e-mail: rajeshauce@gmail.com


exploited for security as well as for various smart environmental applications [1].
Human action can be recognized only through constant monitoring
using the approaches shown in Fig. 1; among these, recognition has recently been attained
mostly by manipulating a wearable sensor device (WSD) [2].
Furthermore, among the various HAR systems developed with internal and external
sensors for posture and motion estimation, accelerometers and gyroscopes are the most
widely used by researchers [3]. Of these, the accelerometer is the sensor most
commonly used in wearable devices, owing to its noted merits such as miniature
size, low cost and power requirements, and its ability to deliver data directly
related to the motion of the wearer [4]. The signal logged by the accelerometer
depends on the human activity and the device location, and the increasing use
of accelerometers for HAR must contend with certain inadequacies such as positioning
issues and usability concerns [5].
A reliable accelerometer sensor-based HAR system requires an efficient classifier
to speed up the recognition process and improve its accuracy, and the time taken by each
classifier is a major constraint [6]. Quick classification of human
action is therefore necessary to overcome the drawback of conventional classifiers used in
processing signals, as the signal is processed as a time series and should remain as continuous
as possible [7]. Furthermore, recent research studies related to HAR make use of
classifiers such as k-nearest neighbor (kNN), support vector machines (SVM), super-
vised learning Gaussian mixture models (SLGMM), random forest (RF), k-means,
Gaussian mixture models (GMM) and hidden Markov models (HMM) [8]. Although
advances in recognizing daily living activities like standing, sitting, sitting
on the ground, lying down, lying, walking, stair mounting and standing up have been made
through various approaches, automated HAR is still inadequate due to residual classi-
fication inaccuracies [9]. These issues have drawn us toward a standardized
evaluation of classifiers based on WSD for multiple applications, given the difficulty of
characterizing a promising classifier for a human action recognition system [4].
The main contribution of this paper is to evaluate the performance of the K-ELM
classifier-based deep model by comparing it with that of the conventional state-of-art
classifiers by using the real-world dataset, which was collected by W. Ugulino’s team
using wearable accelerometers. The human action recognition includes the following
process: (i) accelerometer sensor placement, (ii) preprocessing, (iii) feature extraction,
(iv) feature selection and (v) classification (Fig. 1 gives an overview of HAR approaches).

Fig. 1 HAR approaches (vision based; sensor based: wearable, object tagged, dense sensing)

The results obtained by the classifiers are
evaluated based on metrics such as F-measure, recall, precision and accuracy. This
paper is organized as follows: In Sect. 2, background on human action recognition
systems from various research scholars is addressed. In Sect. 3, the adopted K-ELM
classifier-based HAR with the above five steps is described. In Sect. 4, the exper-
imental results for the proposed and state-of-the-art classifiers are presented. Finally,
the perspectives of the paper are briefly concluded in Sect. 5.

2 Related Work

In recent times, owing to the striking accomplishments of various classifiers in computer
vision tasks, research scholars are keen to use them for HAR systems. Some of
these research works are reviewed in this section, and the classification
accuracies reported for state-of-the-art classifiers are tabulated in Table 1.
Table 1 Classification accuracy reported for the state-of-the-art classifiers reviewed in Sect. 2
Author | Year | Classifier | Classification accuracy
Sheng et al. [10] | 2020 | Extended region-aware multiple kernel learning (ER-MKL) | about 70.5%
Weiyao et al. [11] | 2016 | Kernel-based extreme learning machine classifier | about 94.5%
Jaouedi et al. [12] | 2019 | Recurrent neural networks to predict human action | about 86%
Xiao et al. [13] | 2019 | Convolutional neural network | about 0.6212, 0.6637, 0.9216 and 0.894 for different datasets
Zhang et al. [14] | 2019 | Deep belief network as classifier | about 74.59%, 89.53%, 87.03% and 90.66% depending on the features
Zerrouki et al. [15] | 2018 | AdaBoost algorithm | about 96.56% and 93.91% for different datasets
Feng-Ping et al. [16] | 2018 | SVM classifier | about 92.1%, 91.3%, 91.2%, 79.8%, 88.3% and 55.2% for different datasets

Sheng et al. in [10] have introduced an improved extended region-aware multiple kernel
learning (ER-MKL) scheme for HAR that fuses human and contextual visual cues
(multilayer deep features) based on pre-learned CNN classifiers and prior knowledge.
They make use of the JHMDB and UCF Sports datasets to evaluate
the performance of the proposed ER-MKL strategy in comparison with other
conventional classifiers.
Weiyao et al. in [11] have suggested an effective framework that models a
multilevel frame select sampling (MFSS) scheme to sample the input images for
recognizing human action. The motion and static maps (MSM) method, a block-
based LBP feature extraction approach and Fisher kernel representation are then used to
obtain the motion and static history, extract texture and combine the block features,
respectively. By analyzing key parameters such as τ and the MSM thresholds, it was
shown that the 3-level temporal sampling was more effective in recognizing human action
than the others. The evaluation of the proposed approach was carried out on three
publicly available datasets, and as future work the use of a convolutional neural network
and the NTU dataset was recommended.
Jaouedi et al. in [12] have introduced a HAR strategy using GMM-KF based
motion tracking and a recurrent neural network model with gated recurrent units
for video sequencing. An important tactic used in this approach is to extract
features from every frame of the video under analysis to achieve better
human action recognition. The experimental outcome shows a high classification
rate, and minimizing the video classification time for challenging
datasets like UCF Sports and UCF101 is suggested as future work.
Xiao et al. in [13] have suggested a new HAR approach that includes spatial decom-
position by a three-level spatial pyramid feature extraction scheme and deep repre-
sentation extraction by a dual-aggregation scheme. Then, by fusing both the local
and deep features, CXQDA, based on the cosine measure and cross-view quadratic
discriminant analysis (XQDA), is utilized to categorize the human action. The
experimental outcome shows more effective performance than the conventional
strategies.
Zhang et al. in [14] have suggested a DBN-based electromyography (EMG) signal
classifier using time-domain features for 20 human muscular actions. By means of the
best set of features, a 4-class EMG signal classifier was designed for a user interface
system aimed at potential applications. Because of the high variance of the EMG signal
across multiple features, it was difficult to choose the optimal classifier; hence, they suggest
optimizing the structural parameters of the DBN with dominant features for real-time
multi-class EMG signal recognition of human muscular actions.
Zerrouki et al. in [15] have introduced a video camera monitoring and adaptive
AdaBoost classifier-based human action recognition strategy. By
partitioning the human body into 5 parts, six classes of activities such as walking,
standing, bending, lying, squatting and sitting are analyzed during the recognition
process. To evaluate the performance, the Universidad de Malaga fall detection dataset
(URFDD) was utilized, and to demonstrate its effectiveness they compared it with
conventional classifiers such as a neural network, K-nearest neighbor, support vector
machine and naive Bayes. Finally, as a future direction, they suggest using an automatic
updating method and infrared or thermal cameras to ease the recognition
process in dim environments.

Feng-Ping et al. in [16] have developed a deep learning model based on MMN
and the Maxout activation function for human action recognition. The suggested
approach guarantees stable gradient propagation, avoids a slow convergence process
and improves image recognition performance. Here, high-level space-time
features are extracted from the sequences and finally classified with a support vector
machine trained on a two-layer neural network structure. The type of human action
and multi-class action recognition can be obtained through an RBM-NN approach.
The multi-class human action recognition was evaluated on 3 sets of
datasets and proves to be quicker and more accurate than the conventional
multi-class action recognition approaches.

3 Proposed Methodology

The main objective here is to carry out a performance analysis of a K-ELM deep
model aimed at human action recognition (HAR) from wearable sensor
device motion analysis, using the selected set of features. The features are
extracted with the help of a multilayer extreme learning machine (ML-ELM) and
finally classified with a kernel extreme learning machine (K-ELM) classifier,
which has the advantage of a convolutional neural network in overcoming the
instability of ELM. The proposed strategy is described in detail in the
following sections.

3.1 Extreme Learning Machine (ELM)

ELM is a successful feed-forward regression classifier that suits large-
scale video or motion analysis tasks well. Conventional neural networks involve
hidden layers and learn the mapping through back-propagation and least-
squares approaches. In ELM, by contrast, the learning problem is converted
into a linear scheme whose weight matrix is evaluated through a generalized
inverse operation (Moore-Penrose pseudoinverse), i.e., only the number of
hidden neurons is assigned, and the weights and biases between the input and
hidden layers are randomized, to evaluate the output matrix during execution. Finally, the
Moore-Penrose pseudoinverse method, under the principle of least squares,
helps to attain the weights between the final hidden and output layers. This direct
learning scheme, with a small norm of weights and low error, processes quickly and with
superior classification competence compared with conventional learning strategies.
Figure 2 shows the ELM network with ‘n’, ‘l’ and ‘m’ neurons in the input,
hidden and output layers, respectively.
Let the training samples of the network be

$[x, y] = \{x_i, y_i\}, \quad i = 1, 2, \ldots, Q$  (1)



Fig. 2 Structure of the extreme learning machine (ELM): input layer, hidden layer with output H = G(ωX + b), and output layer connected through the weights β

The input features of the above samples x and the desired output matrix y are represented
as follows:

$x_i = (x_{i1}, x_{i2}, \ldots, x_{iQ})$  (2)

$x = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1Q} \\ x_{21} & x_{22} & \cdots & x_{2Q} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nQ} \end{bmatrix}$  (3)

$y_i = (y_{i1}, y_{i2}, \ldots, y_{iQ})$  (4)

$y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1Q} \\ y_{21} & y_{22} & \cdots & y_{2Q} \\ \vdots & \vdots & & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mQ} \end{bmatrix}$  (5)

In the above equations, ‘n’ and ‘m’ denote the input and output matrix dimensions. The
randomized weight $w_{ij}$ between the input and hidden layers is expressed as follows:

$w_{ij} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & & \vdots \\ w_{l1} & w_{l2} & \cdots & w_{ln} \end{bmatrix}$  (6)

Likewise, the weight $\beta_{jk}$ formed by the ELM between the hidden and output
layers is represented as follows:

$\beta_{jk} = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1m} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2m} \\ \vdots & \vdots & & \vdots \\ \beta_{l1} & \beta_{l2} & \cdots & \beta_{lm} \end{bmatrix}$  (7)

The biases assigned by the ELM to the hidden layer neurons are expressed as
$B = [b_1\ b_2 \ldots b_l]^T$, the network activation function is represented as $g(x)$, and the
output matrix is represented as $T = [t_1\ t_2 \ldots t_Q]_{m \times Q}$, i.e.,

$t_j = \begin{bmatrix} t_{1j} \\ t_{2j} \\ \vdots \\ t_{mj} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{l} \beta_{i1}\, g(\omega_i x_j + b_i) \\ \sum_{i=1}^{l} \beta_{i2}\, g(\omega_i x_j + b_i) \\ \vdots \\ \sum_{i=1}^{l} \beta_{im}\, g(\omega_i x_j + b_i) \end{bmatrix}, \quad j = 1, 2, 3, 4, \ldots, Q$  (8)

By using the above equations, it can be formulated that

$H\beta = T'$  (9)

where $H$ represents the hidden layer output and $T'$ is the transpose of $T$. To evaluate the
weight matrix $\beta$ with minimum error, the least squares method is utilized:

$\beta = H^{+} T'$  (10)

To regularize the term $\beta$, for hidden layer neurons with fewer training samples and
stabilized output results, $\beta$ is represented as follows:

$\beta = \left(\dfrac{1}{\lambda} + H^{T} H\right)^{-1} H^{T} T$  (11)

Similarly, for hidden layer neurons with more training samples, $\beta$ is
represented as follows:

$\beta = H^{T} \left(\dfrac{1}{\lambda} + H H^{T}\right)^{-1} T$  (12)
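A minimal NumPy sketch of the ELM training procedure described by Eqs. (1)-(12) is given below, assuming a sigmoid activation and the ridge-regularized solution of Eq. (11); the function names, hidden-layer size and synthetic data are illustrative only.

```python
import numpy as np

def train_elm(X, T, n_hidden, lam=1e-3, seed=0):
    """Minimal ELM sketch: random input weights/biases, least-squares output weights.
    X: (Q, n) training inputs, T: (Q, m) targets; lam is the regularization term
    playing the role of 1/lambda in Eq. (11)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((X.shape[1], n_hidden))  # input-to-hidden weights (random, fixed)
    b = rng.standard_normal(n_hidden)                # hidden biases (random, fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))           # hidden layer output, sigmoid g(x)
    # Ridge-regularized least squares for the output weights, cf. Eq. (11).
    beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ T)
    return w, b, beta

def predict_elm(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return H @ beta

# Tiny usage example with synthetic data (purely illustrative).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))                    # 200 samples, 6 features
T = np.eye(3)[rng.integers(0, 3, size=200)]          # one-hot targets, 3 classes
w, b, beta = train_elm(X, T, n_hidden=50)
pred = predict_elm(X, w, b, beta).argmax(axis=1)
```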

3.2 Multilayer Extreme Learning Machine (ML-ELM)

The multilayer extreme learning machine (ML-ELM) consists of two or more hidden
layers with ‘l’ neurons each and a single output layer. The activation function g(x) is selected for the network
layers; the bias evaluation and weight updating for all the layers between the
input and output layers are then done using the following equations.

Let us assume that the two-hidden-layer ML-ELM shown in Fig. 3 has training samples
$(X, T) = \{x_i, t_i\},\ i = 1, 2, 3, \ldots, Q$, where $x$ denotes the input and $t$ the desired
output sample. The hidden layer output can be evaluated using

$H = g(wx + b)$  (13)

where $w$ and $b$ signify the randomly initialized weights and biases of the hidden layers.

Fig. 3 Structure of the multilayer extreme learning machine (ML-ELM): input layer, first hidden layer $H = g(w_H X)$, second hidden layer $H_2 = g(w_H H)$, and output layer weighted by $\beta_{\mathrm{new}}$

The final layer output matrix is evaluated using

$\beta = H^{+}T$  (14)

where $H^{+}$ signifies the Moore-Penrose inverse matrix of $H$. Assuming that the ML-ELM
is designed with three hidden layers, its expected output and weight matrices are
evaluated using

$H_2 = T\beta_{\mathrm{new}}^{+}$  (15)

where $\beta_{\mathrm{new}}^{+}$ signifies the inverse of the weight matrix $\beta_{\mathrm{new}}$.

$W_{H1} = [B_1\ W_1]$  (16)

$H_3 = g(H_1 w_1 + B_1) = g(W_H H)$  (17)

where $w_{H1} = g^{-1}(H_2)H_1^{+}$; $W_2$ and $H_2$ are the weight and output between the second and
third hidden layers; $H_1^{+}$ is the inverse of $H_1 = [1\ H_2]^{T}$, where $1$ signifies a column vector
of size $Q$; and $g^{-1}(H_2)$ signifies the inverse activation function. Here, to evaluate the
performance, the logistic sigmoid function is adopted:

$g(x) = \dfrac{1}{1 + e^{-x}}$  (18)

Finally, the output of the last (second) hidden layer and the output weight matrix, for the
cases of fewer and more hidden neurons than training samples, are evaluated using

$H_4 = g(W_{H1}H_1)$  (19)

$\beta_{\mathrm{new}} = \left(\dfrac{1}{\lambda} + H_3^{T}H_3\right)^{-1} H_3^{T}T$  (20)

$\beta_{\mathrm{new}} = H_3^{T}\left(\dfrac{1}{\lambda} + H_3H_3^{T}\right)^{-1} T$  (21)

$f(x) = H_3\,\beta_{\mathrm{new}}$  (22)

where $f(x)$ is the actual final hidden layer output after parameter optimization through
all the inner layers present.

$f(x) =
\begin{cases}
h(x)H^{T}\left(\dfrac{I}{C} + HH^{T}\right)^{-1}T, & N < l \\[2mm]
h(x)\left(\dfrac{I}{C} + H^{T}H\right)^{-1}H^{T}T, & N \geq l
\end{cases}$  (23)

The six deep-learned time-domain features, namely mean value, standard deviation,
min-max, skewness, kurtosis and correlation, extracted by means of the above
equations, help to gain a better action recognition rate in the subsequent classifica-
tion process. Although specific features are captured during different aspects
of the actions, synthesizing the features before classification yields distinct
characteristics.

3.3 Kernel Extreme Learning Machine (K-ELM)

The ELM is a highly efficient method with a faster classification process than
conventional back-propagation strategies, owing to its capability of generating the
weights and biases randomly. The kernel-based extreme learning machine proposed by
Huang et al. in [17], following Mercer’s conditions, has been utilized here for the human action
classification process. The kernel matrix, with unknown mapping function h(x), is
defined as follows:
$\Omega_{ELM} = HH^{T} = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{bmatrix}$  (24)

$K(x_i, x_j) = h(x_i) \cdot h(x_j)$  (25)

At last, the K-ELM output function is expressed as

$f(x) = h(x)\beta = \left[k(x, x_1), \ldots, k(x, x_N)\right] \left(\dfrac{I}{C} + \Omega_{ELM}\right)^{-1} T$  (26)

By considering the above equations (24) to (26), the output weight of the
K-ELM is evaluated using Eq. (27), in which $\Omega$ is the kernel
matrix of the input matrix given to the K-ELM classifier:

$\beta = \left(\Omega + \dfrac{I}{C}\right)^{-1} Y$  (27)

Our proposed K-ELM-based HAR strategy combines the benefits of the convolutional
neural network and multilayer extreme learning machine methods. The
time-domain features of the wearable sensor data (WSD) are extracted by means of a
two-layer ML-ELM and then correlated before being sent to the K-ELM classifier.
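A minimal NumPy sketch of the kernel ELM classification step of Eqs. (24)-(27) is given below, assuming a Gaussian (RBF) kernel; the kernel choice, hyperparameters and synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian (RBF) kernel matrix between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_kelm(X, T, C=100.0, gamma=0.5):
    """Kernel ELM sketch following Eqs. (24)-(27): beta = (Omega + I/C)^-1 * T."""
    omega = rbf_kernel(X, X, gamma)                  # Omega_ELM = H H^T via the kernel
    beta = np.linalg.solve(omega + np.eye(len(X)) / C, T)
    return beta

def predict_kelm(Xtrain, Xnew, beta, gamma=0.5):
    # f(x) = [k(x, x1) ... k(x, xN)] * beta, cf. Eq. (26) with beta absorbed.
    return rbf_kernel(Xnew, Xtrain, gamma) @ beta

# Illustrative use with synthetic feature vectors (e.g., the six time-domain features).
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 6))
T = np.eye(5)[rng.integers(0, 5, size=300)]          # 5 action classes, one-hot
beta = train_kelm(X, T)
labels = predict_kelm(X, X[:10], beta).argmax(axis=1)
```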

3.4 Proposed K-ELM-Based HAR Strategy

The accelerometer data collected by the WSD is given as input to the HAR
system; a four-dimensional sequence with respect to time is assumed
here for action recognition, taken from the real-life dataset collected by W. Ugulino’s team
using four wearable accelerometers. After data acquisition, preprocessing
is carried out, which includes dimensionality reduction and segmentation
of the moving parts, i.e., sequencing the signal data into subsequences, commonly
termed the sliding window process, is applied for sequential data partitioning.
Subsequently, after partitioning, the input sensor data proceeds to the
feature extraction process. Here, time-domain features are extracted for human
action recognition by an ML-ELM. The accelerometer signal time
integrals are evaluated by means of a heterogeneous metric, the so-called integral of
the modulus of accelerations (IMA), expressed as follows:

$\mathrm{IMA} = \sum_{t=1}^{N} |a_x|\,dt + \sum_{t=1}^{N} |a_y|\,dt + \sum_{t=1}^{N} |a_z|\,dt$  (28)

where
$a_x, a_y, a_z$: orthogonal acceleration components,
$t$: time,
$N$: window length.
From the extracted features, a set of six time-domain features is then selected,
comprising mean value, standard deviation, min-max, skewness, kurtosis and
correlation, to differentiate the actions from the original set of samples and to ease the further
classification process in less time. Finally, the K-ELM classifier is used to classify human
actions based on the selected set of features with low error. The performance of the
K-ELM hierarchical classifier is compared with those of standard classifiers such
as an artificial neural network (ANN), k-nearest neighbor (KNN), support vector
machines (SVM) and a convolutional neural network (CNN) based on classification
accuracy, as discussed in the next section.
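The windowing and feature extraction step can be sketched as follows in NumPy/SciPy. The exact feature definitions and window overlap used in the paper are not spelled out, so the computations below (per-axis statistics, pairwise axis correlations and the IMA of Eq. (28)) are a plausible reading rather than a reproduction.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def window_features(acc, axis_pairs=((0, 1), (0, 2), (1, 2))):
    """Six time-domain features for one sliding window of tri-axial accelerometer
    data `acc` of shape (N, 3), plus the IMA value of Eq. (28)."""
    feats = {
        "mean": acc.mean(axis=0),
        "std": acc.std(axis=0),
        "minmax": acc.max(axis=0) - acc.min(axis=0),
        "skewness": skew(acc, axis=0),
        "kurtosis": kurtosis(acc, axis=0),
        "correlation": np.array([np.corrcoef(acc[:, i], acc[:, j])[0, 1]
                                 for i, j in axis_pairs]),
        "IMA": np.abs(acc).sum(),      # sum over t of |ax| + |ay| + |az|
    }
    return feats

def sliding_windows(signal, length=128, step=64):
    # Partition the (T, 3) accelerometer stream into overlapping windows.
    for start in range(0, len(signal) - length + 1, step):
        yield signal[start:start + length]

# Illustrative use on synthetic data; the dataset uses 128 readings per window.
acc_stream = np.random.default_rng(3).standard_normal((1024, 3))
feature_rows = [np.concatenate([np.atleast_1d(v) for v in window_features(w).values()])
                for w in sliding_windows(acc_stream)]
```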

4 Result and Discussion

The performance analysis of the K-ELM classifier in HAR is implemented in
MATLAB 2015 and compared here with the conventional standard classifiers based
on W. Ugulino’s wearable HAR dataset. W. Ugulino’s team makes use of four tri-
axial ADXL335 accelerometers, an ATmega328V microcontroller and a LilyPad Arduino
toolkit, and by placing the accelerometers on the waist, left thigh, right ankle and right
arm, the HAR data was collected. Here, W. Ugulino’s dataset is split into
training and testing data in an 80–20% ratio with 128 readings/window for each dimension of the
data. The performance evaluation criteria used here for analysis include precision,
recall, F-measure, specificity and accuracy.

$\text{Accuracy} = \dfrac{T_p + T_n}{T_p + T_n + F_p + F_n}$  (29)

where
$T_n$ (true negative): correctly classified negative samples,
$T_p$ (true positive): correctly classified positive samples,
$F_n$ (false negative): positives faultily classified as negative,
$F_p$ (false positive): negatives faultily classified as positive.
The F-measure is the integration of both recall and precision, and these are expressed
as follows:

$\text{Precision} = \dfrac{T_p}{T_p + F_p}$  (30)

$\text{Recall} = \dfrac{T_p}{T_p + F_n}$  (31)

$\text{F-Score} = \dfrac{(1 + \beta^2) \cdot \text{recall} \cdot \text{precision}}{\beta^2 \cdot \text{recall} + \text{precision}}$  (32)

$\text{Specificity} = \dfrac{T_n}{T_n + F_p}$  (33)

where $\beta$ represents the weighting factor.
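Equations (29)-(33) can be computed directly from the confusion counts, as in the following Python sketch (one class treated as positive in a one-vs-rest fashion; the labels are illustrative).

```python
import numpy as np

def classification_metrics(y_true, y_pred, positive=1, beta=1.0):
    """Eqs. (29)-(33) computed from true/false positive and negative counts
    for one class treated as 'positive'."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_score": (1 + beta**2) * recall * precision / (beta**2 * recall + precision),
        "specificity": tn / (tn + fp),
    }

# Example: "walk" encoded as 1, all other actions as 0.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```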


Here, the performance of ANN, KNN, SVM and CNN is analyzed against the
proposed K-ELM approach for a human action recognition system with wearable
sensor data. The six features are extracted (mean value, standard deviation, min-max,
skewness, kurtosis and correlation), and these features are used to train the K-ELM
classifier for more stable performance than the conventional classifiers. The
execution outcomes of these approaches and their comparative differences are discussed
through Table 2 and Figs. 4, 5, 6 and 7.
As shown in Table 2, every classifier under analysis attains a certain level of accuracy
in a certain amount of time for HAR. Here, SVM and ANN have lower F-scores of
about 90.66% and 90.23%, with classification times of about 11 and 20 min,
respectively.

Table 2 Comparison between the different approaches for detecting HAR
Classifier | Time | F-score | Recall | Accuracy | Specificity
ANN | 20 min | 90.23 | 91.21 | 91.83 | 97.07
KNN | 15 min | 94.60 | 94.57 | 94.62 | 99.67
SVM | 11 min | 90.66 | 90.98 | 90.33 | 96.56
CNN | 30 min | 93.21 | 93.45 | 95.72 | 96.57
K-ELM | 20 min | 98.243 | 97.458 | 98.97 | 98.56

Fig. 4 Comparative F-score analysis of classifiers

Fig. 5 Comparative recall analysis of classifiers

Fig. 6 Comparative specificity analysis of classifiers

Fig. 7 Comparative accuracy analysis of classifiers

Similarly, KNN and CNN achieved F-scores of about 94.60% and 93.21%, with classification
times of about 15 and 30 min, respectively. Our proposed K-ELM classifier has an
F-score of about 98.24% with a classification time of about 20 min; from this analysis,
K-ELM attains the highest F-score in a comparable or shorter time than the
other classifiers under analysis.
Similarly, the recall, specificity and accuracy of the proposed K-ELM approach in
comparison with the state-of-the-art classifiers are shown in Figs. 5, 6 and 7, respectively.
The selected set of features helps to characterize the human actions (sit, walk, upstairs,
stand and downstairs) better than the conventional classifiers, for reasons such
as the sensors used and procedural variances in validation. For ANN,
KNN, SVM, CNN and K-ELM, recall values of about 91.21%, 94.57%, 90.98%, 93.45%
and 97.458% are obtained while recognizing the actions. Similarly, for
specificity and accuracy, the values obtained are about 97.07%, 99.67%, 96.56%,
96.57%, 98.56% and 91.83%, 94.62%, 90.33%, 95.72%, 98.97%, respectively,
for the classifiers under analysis. It should be noted that, in this study, the analysis using the selected
six features as input to the K-ELM classifier shows better performance than
the others in recognizing the human actions.

5 Conclusion

In this paper, a performance analysis of the proposed K-ELM classifier has been
presented, using a selected set of features, against the conventional state-of-the-art classi-
fiers on W. Ugulino’s accelerometer dataset. The human action recognition
process is described along with the equations utilized for feature extraction
and classification. Finally, a comparative analysis of K-ELM using
the selected set of time-domain features (mean value, standard deviation, min-max,
skewness, kurtosis and correlation) as input shows more effective results than
the other ANN, KNN, SVM and CNN approaches. From the analysis, the integration
of the above classifiers would, as a future direction, perform better through accurate
complementary decisions. However, our approach has the drawback of computational
complexity, and when implementing it in a real-time application, because of the Wi-Fi
signal required by the WSD, effective action recognition without any interruption
is difficult.
signal requirement by WSD, an effective action recognition without any interruption
is solely difficult.

References

1. Bayat A, Pomplun M, Tran D (2014) A study on human activity recognition using accelerometer
data from smartphones. Procedia Comput Sci 34:450–457
2. Casale P, Oriol P, Petia R (2011) Human activity recognition from accelerometer data using a
wearable device. In: Iberian conference on pattern recognition and image analysis. Springer,
Berlin, pp 289–296
3. Pantelopoulos A, Bourbakis N (2010) A survey on wearable sensor-based systems for health
monitoring and prognosis. IEEE Trans Syst Man Cybern Part C (Appl Rev) 40:1–12
4. Jordao A, Antonio C, Nazare Jr., Sena J, Schwartz WR (2018) Human activity recognition
based on wearable sensor data. A standardization of the state-of-the-art. arXiv preprint arXiv:
1806.05226
5. Cleland I, Kikhia B, Nugent C, Boytsov A, Hallberg J, Synnes K, McClean S, Finlay D
(2013) Optimal placement of accelerometers for the detection of everyday activities. Sensors
13:9183–9200
6. Poppe R (2010) A survey on vision-based human action recognition. Image Vis Comput
28:976–990
7. Vishwakarma S, Agrawal A (2012) A survey on activity recognition and behavior understanding
in video surveillance. Vis Comput 29:983–1009
8. Tharwat A, Mahdi H, Elhoseny M, Hassanien A (2018) Recognizing human activity in mobile
crowdsensing environment using optimized k-NN algorithm. Expert Syst Appl 107:32–44
9. Ignatov A (2018) Real-time human activity recognition from accelerometer data using
convolutional neural networks. Appl Soft Comput 62:915–922
10. Weiyao X, Muqing W, Min Z, Yifeng L, Bo L, Ting X (2019) Human action recognition using
multilevel depth motion maps. IEEE Access 7:41811–41822
11. Jaouedi N, Boujnah N, Bouhlel M (2020) A new hybrid deep learning model for human action
recognition. J King Saud Univ Comput Inf Sci 32:447–453
12. Xiao J, Cui X, Li F (2020) Human action recognition based on convolutional neural network
and spatial pyramid representation. J Vis Commun Image Rep 71:102722
13. Zhang J, Ling C, Li S (2019) EMG signals based human action recognition via deep belief
networks. IFAC-Pap OnLine 52:271–276
14. Zerrouki N, Harrou F, Sun Y, Houacine A (2018) Vision-based human action classification
using adaptive boosting algorithm. IEEE Sens J 18:5115–5121
15. An F (2018) Human action recognition algorithm based on adaptive initialization of deep
learning model parameters and support vector machine. IEEE Access 6:59405–59421
16. Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and
multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 42:513–529
17. Niu X, Wang Z, Pan Z (2019) Extreme learning machine-based deep model for human activity
recognition with wearable sensors. Comput Sci Eng 21:16–25
Singular Value Decomposition-Based
High-Resolution Channel Estimation
Scheme for mmWave Massive MIMO
with Hybrid Precoding for 5G
Applications

V. Baranidharan, N. Praveen Kumar, K. M. Naveen, R. Prathap,


and K. P. Nithish Sriman

Abstract Channel estimation is a very challenging task for on-grid massive
MIMO over the mmWave band with hybrid precoding. The problem arises because
the number of radio frequency chains is comparatively smaller than the number of
antennas used in such systems. Conventional massive MIMO channel estimation
over the mmWave band is based on off-grid learning of model parameters and
virtual channel estimators. Resolution loss arises from the high computa-
tional complexity of channel estimation under its Bayesian learning framework. To
improve the accuracy of on-grid channel estimation, this work proposes a singular value
decomposition-based iterative reweighted channel estimation scheme for massive
MIMO over the mmWave band. An objective function-based gradient descent method
is used for the optimization. The proposed objective function can be used to opti-
mize the channel estimation by iteratively refining the estimates of the angles of arrival
and departure. The iteratively reweighted parameters are used to optimize the
trade-off between errors.
and departs. The iteratively reweighted parameters are used for the optimization of
trade-off errors.

Keywords Massive MIMO · Angle of arrival · mmWave band · Channel


estimation · Angle of departure

V. Baranidharan · N. Praveen Kumar (B) · K. M. Naveen · R. Prathap · K. P. Nithish Sriman


Department of Electronics and Communication Engineering, Bannari Amman Institute of
Technology, Sathy, India
e-mail: praveenkumarn.ec16@bitsathy.ac.in
V. Baranidharan
e-mail: baranidhar@hotmail.com
K. M. Naveen
e-mail: naveenkm.ec16@bitsathy.ac.in
R. Prathap
e-mail: prathap.ec16@bitsathy.ac.in
K. P. Nithish Sriman
e-mail: nithishsriman.ec18@bitsathy.ac.in


1 Introduction

In 5G wireless communication, mmWave massive MIMO has emerged as a key enabling technology for modern 5G applications. In such systems, hybrid precoding is recommended to reduce hardware cost and high power consumption: a large number of antennas is driven by a much smaller number of radio frequency (RF) chains. Hybrid precoding involves the co-design of analog and digital precoders; since fewer RF chains are used, the antennas are not directly accessible from the digital baseband. Accurate channel state information (CSI) is needed for hybrid precoding [1], yet it is very hard to estimate the high-dimensional MIMO channel matrix.
Recently, several novel channel estimation schemes have been proposed for massive MIMO systems over the mmWave band with hybrid precoding. In particular, in channel sounding schemes based on an adaptive codebook, the best beam pair is searched by both transmitter and receiver by adjusting predefined precoders. The codebook size is limited in such channel estimation schemes; within this limited codebook, the best angle estimate is determined by comparing the amplitudes of the beam pairs. Alternatively, by exploiting the sparsity of the angular channel, the channel can be estimated with low training overhead using on-grid compressive methods. However, these methods assume that the angles of arrival and departure (AoAs/AoDs) always lie on discrete points of the angle domain. In practice, AoAs/AoDs are continuously distributed rather than confined to grid points [2]. The use of on-grid AoAs/AoDs leads to the power leakage problem, which critically reduces the accuracy of channel estimation schemes. To solve the problems caused by on-grid angle estimation, the present SVD-based iterative reweighted channel estimation is proposed to formulate the off-grid AoAs and AoDs [3].
First, the proposed work iteratively optimizes the estimates of the angles of arrival and departure to minimize a weighted sum of the sparsity of the estimated path gains and the data-fitting error. SVD-based preconditioning is then introduced to decrease the computational complexity of the scheme. The SVD step reduces the computational complexity of the iteratively reweighted procedure and makes mmWave channel estimation practical in real time. The paper is organized as follows: Sect. 2 reviews related works on massive MIMO systems, Sect. 3 explains the system model and the proposed SVD-based high-resolution channel estimation scheme, Sect. 4 gives the simulated results and their discussion, and the paper is concluded in Sect. 5.

2 Related Works

This section explores existing techniques that address the computational complexity and feedback overhead challenges. Shao et al. have proposed new channel estimation schemes through parameter learning [7] for mmWave massive MIMO systems. This work is based on an off-grid channel model and uses a spatial-sample-mismatch characterization with the discrete Fourier transform (DFT) for mmWave massive MIMO channel estimation. The main limitation of this channel estimation scheme is that it only estimates the off-grid parameters of the mmWave massive MIMO channel.
Qi et al. have proposed an off-grid method to estimate the channel for massive MIMO systems over the mmWave band [5]. The major advantage of this method is that the pilot overhead is decreased. The system employs an off-grid sparse signal reconstruction scheme, and the accuracy of channel estimation is considerably improved. The separation of channel estimation into stages, AoA/AoD estimation followed by path gain estimation, works well. However, the accuracy of the algorithm is comparatively low in grid point construction, and the minimization of the objective function is not refined by suppressing the off-grid effect. Wang et al. have proposed a multi-panel scheme with hybrid precoding for mmWave massive MIMO [8]. In this method, the channel vector is converted into the angular domain and the CSI is then restored from the formulated angular CSI. Exploiting the structural features of mmWave MIMO channels in the angular domain is, however, very difficult, and the major disadvantage of this method is that the computational complexity is not reduced. Qin et al. have proposed time-varying channel estimation for millimeter wave massive MIMO systems [6]. In this method, the time-varying scattering nature of mmWave channels is exploited when the AoAs/AoDs are estimated, and an adaptive angle estimation method is used to formulate the AoA/AoD estimation. Even so, the computational complexity of the separated channel estimation stages is very high. Zhao et al. have proposed an angle-domain hybrid precoding and channel tracking method for mmWave massive MIMO systems [9]. The angle-domain hybrid precoding and mmWave channel tracking method is used to exploit the structural features of the millimeter wave MIMO channel, and all users can be scheduled based on their directions of arrival [9, 10]. The major limitation of this method is that its gain at high SNR values is limited, and the effect of DoA tracking does not improve when retraining the system.

3 Proposed SVD-Based High-Resolution Channel


Estimation Scheme for mmWave Massive MIMO Systems

3.1 System Model

Consider a mmWave massive MIMO system with hybrid precoding and arbitrary array geometry. Let N_T, N_T^RF, N_R and N_R^RF denote the number of transmit antennas, transmit RF chains, receive antennas and receive RF chains, respectively [11–15]. In practical 5G systems with hybrid precoding, the number of RF chains is smaller than the number of antennas, i.e., N_T^RF < N_T and N_R^RF < N_R. The system model is given by

r = Q^H H P s + n    (1)

where Q is the hybrid combining matrix, r is the received signal, H is the matrix of channel coefficients, P is the precoding matrix applied before transmission, n is the noise received in the channel and s is the transmitted signal. The channel model is

H = Σ_{l=1}^{L} z_l a_R(φ_{R,l}^azi, φ_{R,l}^ele) a_T^H(φ_{T,l}^azi, φ_{T,l}^ele)    (2)

In this massive MIMO system, it is assumed that the number of propagation paths L ≪ min(N_R, N_T); z_l and (φ^azi, φ^ele) denote the path gains and the azimuth/elevation AoAs and AoDs, while a_R(·) and a_T(·) are the steering vectors at the receiver and transmitter, which are determined by the array geometry. For a uniform planar array,

a(φ^azi, φ^ele) = [1, e^{j2πd sin φ^azi sin φ^ele/λ}, …, e^{j2π(N_1−1)d sin φ^azi sin φ^ele/λ}]^T ⊗ [1, e^{j2πd cos φ^ele/λ}, …, e^{j2π(N_2−1)d cos φ^ele/λ}]^T    (3)

where ⊗ denotes the Kronecker product, d is the spacing between antennas and λ is the wavelength. For a uniform linear array, the steering vector reduces to

a(φ) = [1, e^{j2πd sin φ/λ}, …, e^{j2π(N_1−1)d sin φ/λ}]^T    (4)

The channel matrix H in (2) is given as

H = A_R(θ_R) diag(z) A_T^H(θ_T)    (5)
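To make the model concrete, the following minimal numpy sketch (an illustration only, with hypothetical 8 × 8 array dimensions and randomly drawn angles and gains, not code taken from the paper) builds the uniform-planar-array steering vectors of (3) and assembles the channel matrix as in (5).

```python
import numpy as np

def upa_steering(phi_azi, phi_ele, n1, n2, d=0.5, lam=1.0):
    """UPA steering vector as in (3): Kronecker product of the two phase
    progressions; d is the antenna spacing and lam the wavelength (d = lam/2)."""
    k = 2j * np.pi * d / lam
    a1 = np.exp(k * np.arange(n1) * np.sin(phi_azi) * np.sin(phi_ele))
    a2 = np.exp(k * np.arange(n2) * np.cos(phi_ele))
    return np.kron(a1, a2)

# Hypothetical setup: 8 x 8 UPAs at both ends and L = 3 propagation paths.
N1 = N2 = 8
L = 3
rng = np.random.default_rng(0)
z = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)  # path gains
aoa = rng.uniform(0, np.pi, (L, 2))   # (azimuth, elevation) angles of arrival
aod = rng.uniform(0, np.pi, (L, 2))   # (azimuth, elevation) angles of departure

A_R = np.column_stack([upa_steering(az, el, N1, N2) for az, el in aoa])
A_T = np.column_stack([upa_steering(az, el, N1, N2) for az, el in aod])
H = A_R @ np.diag(z) @ A_T.conj().T   # channel matrix as in (5)
```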

Here, the ith element of the pilot vector x_p is the signal transmitted from the ith transmit antenna. In the mth time frame, the combining matrix W_m is applied at the receiver, yielding an N_R^RF-dimensional received pilot vector

Y_{p,m} = W_m^H H x_p + n_{p,m}    (6)

Collecting the received pilots over M time slots, the overall received signal is

Y = W^H H X + N    (7)

Estimating the channel matrix H in (7) is equivalent to estimating the number of paths, the path gains and the normalized AoA/AoD angles. Exploiting the angle-domain sparsity, the channel matrix H can be obtained by solving

min_{z, θ_R, θ_T} ‖ẑ‖_0   s.t.  ‖Y − W^H Ĥ X‖_F ≤ ε    (8)
Singular Value Decomposition-Based High-Resolution Channel Estimation … 369

where ‖ẑ‖_0 is the number of nonzero elements of ẑ, and ε is the error tolerance parameter.
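To illustrate how the observation in (7) is formed, the short numpy sketch below generates a received pilot matrix under assumptions made purely for readability: random phase-only pilot and combining beams, a stand-in dense channel (in practice the sparse channel built in the previous sketch would be used), and the pilot dimensions of Table 1.

```python
import numpy as np

N_ant = 64          # antennas per side (8 x 8 UPA)
N_X = N_Y = 32      # number of transmit pilot and receive combining vectors
rng = np.random.default_rng(1)

# Random phase-only analog beams, one column per pilot / combining vector.
X = np.exp(2j * np.pi * rng.random((N_ant, N_X))) / np.sqrt(N_ant)   # pilot matrix
W = np.exp(2j * np.pi * rng.random((N_ant, N_Y))) / np.sqrt(N_ant)   # combining matrix

# Stand-in channel; replace with the sparse H = A_R diag(z) A_T^H of (5).
H = (rng.standard_normal((N_ant, N_ant)) + 1j * rng.standard_normal((N_ant, N_ant))) / np.sqrt(2)

sigma_n = 0.1
noise = sigma_n * (rng.standard_normal((N_Y, N_X)) + 1j * rng.standard_normal((N_Y, N_X))) / np.sqrt(2)

Y = W.conj().T @ H @ X + noise    # received pilot matrix as in (7)
```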

3.2 Proposed Optimization Formulation

The major disadvantage of the above formulation is that the l0-norm problem is not computationally tractable. It is therefore relaxed using a log-sum surrogate:

min_{z, θ_R, θ_T} F(z) ≜ Σ_{l=1}^{L} log(|z_l|² + δ)   s.t.  ‖Y − W^H Ĥ X‖_F ≤ ε    (9)

where δ > 0 ensures that Eq. (10) below is well defined, and z, θ_R and θ_T are the parameters that determine Ĥ. The problem is further converted into an unconstrained optimization by introducing a regularization parameter λ > 0:

min_{z, θ_R, θ_T} G ≜ Σ_{l=1}^{L} log(|z_l|² + δ) + λ‖Y − W^H Ĥ X‖_F²    (10)

Replacing the log-sum function with an iterative surrogate function, minimizing the surrogate is equivalent to minimizing G(z, θ_R, θ_T):

min_{z, θ_R, θ_T} S^(i) ≜ λ⁻¹ z^H D^(i) z + ‖Y − W^H Ĥ X‖_F²    (11)

where D^(i) is given by

D^(i) ≜ diag{ 1/(|ẑ_1^(i)|² + δ), 1/(|ẑ_2^(i)|² + δ), …, 1/(|ẑ_L^(i)|² + δ) }    (12)

and ẑ^(i) is the estimate of z at the ith iteration.

3.3 Iterative Reweight-Based Channel Estimation

The constrained optimization problem is solved by the iterative reconstruction procedure summarized in Fig. 1. The objective S^(i) is the sum of two parts: z^H D^(i) z promotes the sparsity of the estimate, while ‖Y − W^H Ĥ X‖_F² measures the data-fitting error; the regularization parameter λ controls the trade-off between data-fitting error and sparsity.

Fig. 1 Flowchart of the proposed algorithm to find AoAs and AoDs

In the iterative reweighted procedure (10), λ is not fixed but is updated in every iteration. If the previous iteration was poorly fitted, a smaller λ is chosen to make the estimate sparser; a larger value is chosen to speed up the search for the best-fitting estimate. In the proposed algorithm, λ is given as

λ = min{ d · r^(i), λ_max }    (13)
where λ_max is selected to keep the problem well conditioned, d is a constant scaling factor and r^(i) is the squared residue, i.e.,

r^(i) = ‖Y − W^H A_R(θ̂_R^(i)) diag(ẑ^(i)) A_T^H(θ̂_T^(i)) X‖_F²    (14)

The update of λ follows (13). The iteration of the proposed algorithm begins at the angle-domain grids. The main aim is to find estimates θ̂_R^(i+1) and θ̂_T^(i+1) in the neighborhood of the previous estimates θ̂_R^(i) and θ̂_T^(i) that yield a smaller objective function S^(i). This is done by the gradient descent method (GDM):

θ̂_R^(i+1) = θ̂_R^(i) − η · ∇_{θ_R} S_opt^(i)(θ̂_R^(i), θ̂_T^(i))    (15)

θ̂_T^(i+1) = θ̂_T^(i) − η · ∇_{θ_T} S_opt^(i)(θ̂_R^(i), θ̂_T^(i))    (16)

The step length η is chosen based on the gradient values to ensure that the objective value at the new estimate is less than or equal to that at the previous estimate. During the iterative search, the estimate becomes more and more accurate, until the previous estimate is the same as the new estimate. In the proposed scheme, the initial coarse on-grid values (θ_R, θ_T) are moved toward their actual off-grid positions. The flowchart of the proposed algorithm to find AoAs and AoDs is shown in Fig. 1. It is also important to determine the unknown sparsity level. In this scheme, the sparsity level is initialized to a value greater than the real channel sparsity; paths whose gains are too small are then treated as noise generated in the channel rather than as real paths. The proposed algorithm prunes these channel paths to make the result sparser than in existing systems, and during the iterations the predicted sparsity level decreases toward the true number of paths. The computational complexity of each iteration lies in the gradient calculation, which is on the order of O(N_X N_Y (N_R + N_T) L²). The total number of starting candidates L^(0) is therefore critical; to make the computation affordable, L^(0) should be small. The method used to select effective initial values of θ_R^(0) and θ_T^(0) before the iteration is discussed in detail in the next section.
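As an illustration of the overall procedure, the following self-contained numpy sketch runs one outer iteration of the reweighting loop for a simplified one-dimensional (uniform linear array) model. The least-squares gain estimate, the numerical gradients used in place of the closed-form gradients of (15)-(16), and all parameter values are assumptions made for readability, not the implementation used in the paper.

```python
import numpy as np

def ula_steering(theta, n, d=0.5):
    """ULA steering vector as in (4); d is the antenna spacing in wavelengths."""
    return np.exp(2j * np.pi * d * np.arange(n) * np.sin(theta))

def dictionary(thetas, n):
    """Stack the steering vectors of the current angle estimates as columns."""
    return np.column_stack([ula_steering(t, n) for t in thetas])

def surrogate(y, thetas, z, D, lam, n):
    """Surrogate objective of (11): reweighted sparsity term plus data-fitting error."""
    r = y - dictionary(thetas, n) @ z
    return (z.conj() @ D @ z).real / lam + np.linalg.norm(r) ** 2

def reweighted_step(y, thetas, n, delta=1e-3, d_scale=0.1, lam_max=1e4,
                    eta=1e-3, prune_tol=1e-2):
    """One outer iteration: re-estimate gains, update D and lambda,
    refine the angles by gradient descent and prune negligible paths."""
    A = dictionary(thetas, n)
    z = np.linalg.lstsq(A, y, rcond=None)[0]         # gain estimate for the current angles
    D = np.diag(1.0 / (np.abs(z) ** 2 + delta))      # reweighting matrix, cf. (12)
    resid = np.linalg.norm(y - A @ z) ** 2
    lam = min(d_scale * resid, lam_max)              # lambda update, cf. (13)
    s0 = surrogate(y, thetas, z, D, lam, n)
    eps = 1e-6
    grad = np.zeros_like(thetas)
    for k in range(len(thetas)):                     # numerical gradient of the surrogate
        tp = thetas.copy()
        tp[k] += eps
        grad[k] = (surrogate(y, tp, z, D, lam, n) - s0) / eps
    thetas = thetas - eta * grad                     # gradient-descent refinement, cf. (15)-(16)
    keep = np.abs(z) > prune_tol * np.abs(z).max()   # prune paths with negligible gain
    return thetas[keep], z[keep]
```

In the full two-dimensional scheme the same loop operates jointly on the azimuth/elevation AoA and AoD pairs and stops once the estimates no longer change.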

3.4 Preconditioning Using SVD Techniques

To reduce the computational complexity of the iterative reweighted channel estimation, SVD-based preconditioning is introduced in this scheme. The angle-domain grids that lie nearest to the AoAs/AoDs are identified using this preconditioning, which significantly reduces the computational burden compared with using all N_R and N_T grid points as initial candidates. Applying the singular value decomposition to the matrix Y gives Y = UΣV^H, where Σ = diag(σ_1, σ_2, …, σ_min(N_X,N_Y)) with diagonal entries σ_1 ≥ σ_2 ≥ … ≥ 0 being the singular values of Y and U^H U = I. From the above equations,

Y = (W^H A_R(θ_R)) diag(z) (X^H A_T(θ_T))^H + N    (17)

Since the noise is comparatively small, the L dominant singular values and their singular vectors correspond to the L paths, i.e., for l = 1, 2, …, L. For a uniform planar array with an N_1 × N_2 receive antenna array, the set of angle-domain grids is Θ_R = {(i/N_1, j/N_2) | i = 0, 1, …, N_1 − 1; j = 0, 1, …, N_2 − 1}; Θ_T is determined similarly for the transmitter.
The algorithm for SVD-based preconditioning is described in this section. Without preconditioning, the initial candidates of Fig. 1 would be set to all grid points, i.e., L^(0) = N_R N_T; when N_T and N_R are large, the resulting computational complexity O(N_X N_Y (N_R + N_T) N_R² N_T²) is unaffordable. With the singular value decomposition-based preconditioning, the initial candidates of Fig. 1 are coarse estimates, i.e., L^(0) = N_init ≈ L, so the computational complexity becomes O(N_X N_Y (N_R + N_T) L²). The computational burden is therefore much lower than when applying the scheme directly.
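A minimal numpy sketch of this preconditioning step is given below. It assumes, purely for simplicity, that the combining and pilot matrices are identity (a fully digital observation), so the left and right singular vectors of Y can be correlated directly against a dictionary of grid steering vectors; the function and parameter names are hypothetical.

```python
import numpy as np

def coarse_grid_candidates(Y, A_R_grid, A_T_grid, L):
    """Pick coarse initial AoA/AoD grid indices from the top-L singular vectors of Y.

    A_R_grid / A_T_grid hold the steering vectors of all angle-domain grid points
    as columns. With identity combining/pilot matrices, the l-th left (right)
    singular vector of Y is aligned with the receive (transmit) steering vector
    of the l-th path, so a simple correlation picks the nearest grid point."""
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    rx_idx, tx_idx = [], []
    for l in range(L):
        rx_idx.append(int(np.argmax(np.abs(A_R_grid.conj().T @ U[:, l]))))
        tx_idx.append(int(np.argmax(np.abs(A_T_grid.conj().T @ Vh[l, :].conj()))))
    return rx_idx, tx_idx
```

The returned grid indices provide the L^(0) ≈ L coarse starting candidates for the gradient-based refinement of Fig. 1.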

Fig. 2 Comparison of SNR with normalized mean square error (NMSE)


4 Simulation Results and Discussion

This section investigates the performance metrics using simulation results obtained in MATLAB. The proposed SVD-based high-resolution channel estimation scheme with hybrid precoding is compared with the existing system. The initial simulation parameters considered for mmWave massive MIMO with precoding are listed in Table 1.
The path gain of the lth channel path is denoted α_l and is assumed to be complex Gaussian, i.e., α_l ∼ CN(0, σ_α²), with its phase uniformly distributed between 0 and 2π. The SNR is defined as the average power of the transmitted signal divided by the average noise power across the system.
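For reference, the two performance measures used below can be computed as in the following short numpy sketch; the NMSE expression is the usual normalized Frobenius-norm definition, assumed here because the paper does not state it explicitly.

```python
import numpy as np

def nmse(H_hat, H):
    """Normalized mean square error between estimated and true channel matrices."""
    return np.linalg.norm(H_hat - H, 'fro') ** 2 / np.linalg.norm(H, 'fro') ** 2

def snr_db(signal_power, noise_power):
    """SNR in dB: average transmitted signal power over average noise power."""
    return 10.0 * np.log10(signal_power / noise_power)
```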

4.1 Comparison of NMSE with Respect to SNR

For this mmWave channel estimation scheme, the SNR is given by SNR = σ_α²/σ_n², where σ_n² is the noise variance. Figure 2 plots the normalized mean square error (NMSE) against SNR; both line-of-sight (LOS) and non-line-of-sight (NLOS) channels are considered when estimating the channel, comparing the proposed high-resolution channel estimation scheme with hybrid precoding against the existing spatial-mismatch-based DFT method.
The SNR is varied from −5 to 10 dB. In the figure, the red and blue lines indicate the hybrid precoding method and the DFT spatial-mismatch technique, respectively. The Rician K factor is set to 20.
In both cases, the proposed channel estimation scheme outperforms the existing method, and its normalized mean square error is comparatively lower. This result is achieved by considering a uniform planar array in the proposed scheme: a 64-antenna uniform planar array with 8 rows and 8 columns is used at both the transmitter and the receiver, and both azimuth and elevation angles are estimated. The estimates of the proposed and existing channel estimation schemes for the massive MIMO system over mmWave communication can thus be compared directly. Table 2 gives the comparison of the SNR and NMSE statistics of the proposed and existing schemes; it shows that the NMSE statistics of the proposed scheme are comparatively lower than those of the existing system.

Table 1 Initial simulation parameters

Simulation parameters                              Values
Number of propagation paths (L)                    3
Antenna spacing (d)                                λ/2
Number of transmit antennas                        64
Number of receiver antennas                        64
Number of transmitting RF chains                   4
Number of receiving RF chains                      4
Number of transmitting pilot sequences (N_X)       32
Number of receiving pilot sequences (N_Y)          32

Table 2 Comparison of SNR and NMSE of the proposed work with existing systems

Parameters            SNR      NMSE (Spatial mismatch DFT)   NMSE (High-resolution hybrid coding)
Min                   −5       0.01339                       0.003572
Max                   10       0.2136                        0.2009
Mean                  2.5      0.07807                       00,657
Median                2.5      0.07112                       0.005486
Standard deviation    5.4001   0.0674                        0.06611
Range                 15       0.2002                        0.1974

4.2 Comparison of SNR Versus Squared Residue of Samples

The squared residue value is defined as the sum of the squares of the residuals, i.e., the deviations of the predicted values from the actual empirical data values. The squared residue error values are measured for different numbers of samples: 1, 10, 20, 30, 40 and 50.
The SNR versus average squared residue error values for different samples of the
existing (spatial mismatching using DFT method) and proposed (high-resolution-
based hybrid precoding) methods are shown in Figs. 3 and 4.
The ideal channel state information (CSI) of both the existing and the proposed systems is used to find the estimation errors of the azimuth and elevation angles of the massive MIMO mmWave channels. The average residue errors are estimated to obtain the AoAs and AoDs of the channels. Table 3 shows the average residue values at samples 1, 10, 20, 30, 40 and 50 for both the existing and the proposed schemes.
From the table, the average residual values decrease for the proposed scheme as the number of samples increases. At 50 samples, the existing system shows a value of 32.47, whereas the proposed high-resolution channel estimation scheme shows only 30.04. The average residue error is therefore reduced, which leads to lower computational complexity and CSI suitable for good communication systems over mmWave MIMO channels. The proposed estimation scheme can achieve better channel accuracy compared with the existing system.
Fig. 3 Comparison of SNR versus squared residue of samples of DFT-based spatial mismatching
channel estimation schemes

Fig. 4 Comparison of SNR versus squared residue of samples of SVD-based high-resolution channel estimation schemes with hybrid coding

Table 3 Comparison of average residue values (error values) of the proposed work with existing schemes

Iteration samples    Spatial mismatching DFT    High resolution with hybrid precoding
Sample 1             32.75                      34.32
Samples 10           32.65                      33.22
Samples 20           31.93                      34.36
Samples 30           32.15                      32.43
Samples 40           32.59                      31.39
Samples 50           32.47                      30.04
5 Conclusion

The SVD-based high-resolution channel estimation scheme for mmWave MIMO systems with hybrid coding has been analyzed and critically compared with existing systems. First, an efficient optimization problem with a new objective function is proposed, formulated as a weighted sum of the channel sparsity and the data-fitting error. The proposed channel estimation scheme starts from on-grid points in the angle domain and iteratively moves them toward the neighboring actual off-grid points via gradient descent. The increased accuracy of the proposed high-resolution channel estimation scheme is confirmed by the simulation results. Better, high-resolution estimation of the angles of arrival and departure also yields better spectral efficiency. In future work, extending the high-resolution channel estimation scheme to high-mobility, multi-cell mmWave MIMO systems remains a challenging topic that needs to be investigated.

References

1. Gavrilovska L, Rakovic V, Atanasovski V (2016) Visions towards 5G: technical requirements


and potential enablers. Wireless Pers Commun 87(3):731–757. https://doi.org/10.1007/s11277-
015-2632-7
2. Hu C, Dai L, Mir T, Gao Z, Fang J (2018) Super-resolution channel estimation for mmWave
massive MIMO with hybrid precoding. IEEE Trans Veh Technol 67(9):8954–8958. https://doi.
org/10.1109/TVT.2018.2842724
3. Mumtaz S, Rodriguez J, Dai L (2016) mmWave massive MIMO: a paradigm for 5G, pp 1–351
4. Pazdanowski M (2014) SVD as a preconditioner in nonlinear optimization. Comput Assist
Methods Eng Sci 21(2):141–150
5. Qi B, Wang W, Wang B (2019) Off-grid compressive channel estimation for mm-wave massive
MIMO with hybrid precoding. IEEE Commun Lett 23(1):108–111. https://doi.org/10.1109/
LCOMM.2018.2878557
6. Qin Q, Gui L, Cheng P, Gong B (2018) Time-varying channel estimation for millimeter wave
multiuser MIMO systems. IEEE Trans Veh Technol 67(10):9435–9448. https://doi.org/10.
1109/TVT.2018.2854735
7. Shao W, Zhang S, Zhang X, Ma J, Zhao N, Leung VCM (2019) Massive MIMO channel
estimation over the mmWave systems through parameters learning. IEEE Commun Lett
23(4):672–675. https://doi.org/10.1109/LCOMM.2019.2897995
8. Wang W, Zhang W, Li Y, Lu J (2018) Channel estimation and hybrid precoding for multi-
panel millimeter wave MIMO. Paper presented at the IEEE international conference on
communications. https://doi.org/10.1109/ICC.2018.8422137
9. Zhao J, Gao F, Jia W, Zhang S, Jin S, Lin H (2017) Angle domain hybrid precoding and
channel tracking for millimeter wave massive MIMO systems. IEEE Trans Wireless Commun
16(10):6868–6880
10. Hur S, Kim T, Love DJ, Krogmeier JV, Thomas TA, Ghosh A (2013) Millimeter wave
beamforming for wireless backhaul and access in small cell networks. IEEE Trans Commun
61(10):4391–4403
11. Alkhateeb A, Ayach OE, Leus G, Heath RW (2014) Channel estimation and hybrid precoding
for millimeter wave cellular systems. IEEE J Sel Top Signal Process 8(5):831–846
12. Zhu D, Choi J, Heath RW (2017) Auxiliary beam pair enabled AoD and AoA estimation
in closed-loop large-scale millimeter-wave MIMO systems. IEEE Trans Wireless Commun
16(7):4770–4785
13. Lee J, Gil GT, Lee YH (2016) Channel estimation via orthogonal matching pursuit for hybrid
MIMO systems in millimeter wave communications. IEEE Trans Commun 64(6):2370–2386
14. Marzi Z, Ramasamy D, Madhow U (2016) Compressive channel estimation and tracking for
large arrays in mm-wave picocells. IEEE J Sel Top Signal Process 10(3):514–527
15. Fang J, Wang F, Shen Y, Li H, Blum RS (2016) Super-resolution compressed sensing for line
spectral estimation: an iterative reweighted approach. IEEE Trans Sig Process 64(18):4649–
4662
Responsible Data Sharing in the Digital
Economy: Big Data Governance
Adoption in Bancassurance

Sunet Eybers and Naomi Setsabi

Abstract Bancassurance organizations, with their origin in Europe, have become a contemporary phenomenon in developing countries. Bancassurance organizations are typically formed by companies that expand their services to selling insurance products and services to their current customer base. The benefits are realized by both the bank and its customers: banking customers are potential customers of the insurance house, and insurance policyholders may have an interest in banking accounts. To enable this process, data is shared between the bank and the insurance
house. Typically, information technology (IT) infrastructure and data resources inter-
changeably connect to enable data sharing. This might introduce not just infrastruc-
ture challenges but also considerations for governance dictating what data can be
shared and the format of datasets. This case study investigated the big data gover-
nance structures currently adopted by bancassurance organizations in a developing
country focusing on three main areas identified in literature. These areas include
basic, foundation level big data governance structures, data quality and the adop-
tion of guidelines and frameworks with subsequent business value calculations. The
results indicated the existence of data governance structures for structured and semi-
structured operational data but highlighted the need for governance catering for
unstructured big data structures. This also applies to data quality checking proce-
dures. Additional education and training for the various roles responsible for organi-
zational data governance can increase the quality of interoperability of data among
entities.

Keywords Data governance · Data sharing · Big data · Bancassurance · Data · Decisions

S. Eybers (B) · N. Setsabi


University of Pretoria, Private Bag X20, Hatfield, Pretoria, South Africa
e-mail: Sunet.eybers@up.ac.za
N. Setsabi
e-mail: naomi_nkosi@yahoo.com


1 Introduction

The data, information, knowledge, understanding and wisdom (DIKW) pyramid is one of the prevalent concepts in the information systems field of study [1]. The concept drives the idea that information does not exist without data, knowledge is not gained without information, and no wisdom is created without learning how to apply the knowledge. The symbiotic relationship is depicted using a triangle and referred to as the DIKW pyramid, with data as the foundation [2].
Organizations depend heavily on data to survive because, without data, no
informed decisions can be made. Increased demand for rapid business decisions leads
to the need to decrease data processing time. Furthermore, an increase in innova-
tions and changes in the digital society lead to the availability of higher data volumes
from diverse data sources that can be collected and analyzed. These datasets have
unique characteristics such as volume (datasets in large files), velocity (speed of data throughput), variety (data outputs in various file formats), veracity (accuracy of data) and value (social and monetary gain), and are often referred to as big data [3, 4].
Large, diverse datasets from different data sources imply stricter data governance
requirements. The management of the data as well as how it is consumed and stored
has a direct impact on the quality of organizational decisions due to the quality of the
data used [5]. The availability of vast amounts of data calls for great responsibility for
users of the data in how they should engage with information. As a result, data gover-
nance has received renewed attention to protect consumer rights. According to [6]
big data governance is the management of large volumes of structured (contained in
columns and rows according to data model specifications), semi-structured (data that
can be structured but doesn’t conform to data model specifications) and unstructured
data (no predefined structure).
The main objective of this research is to investigate how data governance is
enforced in big data implementations focusing on bancassurance organizations in a
developing country such as South Africa. The research question under investigation
is, therefore, what elements should be considered when bancassurance organizations,
in a developing country, adopt data governance during big data implementation
projects?

2 Bancassurance and Data Governance

In bancassurance, a bank and an insurance company hold a significant number of


customers shared between the two entities [7]. In isolation each organization only
has a partial customer view and understanding, placing them at a disadvantage in
serving their client needs.
Typically, each organization holds data on its respective information technology (IT) platform; as such, variances in the product and customer data being shared are inevitable. Furthermore, consumers can perform many financial transactions using the digital platforms offered on mobile devices and can use social media platforms for escalations or compliments.
One of the focus areas in the banking industry in South Africa is a drive to
identify and understand the characteristics of their customers and their needs with the
main objective of pre-empting customer needs. Due to the fast pace of information
flow by consumers, the banks have expanded this notion with predictive analysis
focusing on customer behavior, demographics, financial status, social standing and
health. This analysis has been based on large volumes of data collated from customer
profiles sourced from various data sources over an extensive period. For example,
unstructured customer data is collected from voice recordings, images and identity
documents (to name a few). This is combined with structured data, which refers
to data generated by enterprise systems used in operational and executive decision
making in this instance. The output of this is usually in the form of reports. Strict rules,
regulations, standards and policies are required to govern these big datasets to protect
the customer, ensure the streamlining of reports to provide accurate information
fostering decision making as well as conceptualization of innovations [3].
To ensure accountability in decisions over the organization’s data assets, [8] refers
to data governance as a business program that should be enforced in the entire orga-
nization. The research further suggests that effective data governance programs are
key to ensuring that processes are correctly followed when managing data assets,
similar to financial controls when monitoring and controlling financial transactions.
In a bancassurance model, a company uses the existing channels and customer
base to offer insurance services. Figure 1 depicts how bancassurance relates to big
data, what typical data is shared between the bank and insurance organization and
the need for governance.
Figure 1 starts by highlighting the different types of big data as well as the various
decision domains in data governance. Big data is then classified according to its
features, which is the 5 Vs (volume, variety, velocity, veracity and value), while the
scope of data governance is defined by the business model, the stakeholders, content
and federation. The scope of data governance and the data governance programs will
be discussed later.
The model highlights the different types of structured and unstructured data found
in each of the bancassurance entities. A decision is further required to determine
which of the different types of big data will need to be shared across entities. This
includes, but not limited to, mostly structured data (e.g., customer details like name,
age, gender, marital status, race, source of funds, residential address, banking patterns
(deposits and withdrawals), credit bureau risk indicator, insurance products sold,
banking core product held, customer personal identification number, bank account
number, bank balance, customer segment, bank account status, policy activation
status) and sometimes may include unstructured data such as FICA documents which
includes a copy of the national identity document, vehicle license document, proof
of address, company registration documents.
Fig. 1 A bancassurance model depicting big data sharing and data governance (the diagram shows the big data types and the 5 Vs; the data governance decision domains of data quality, interoperability, metadata, policies and principles, data storage, privacy and security, data access and use, and analytics; and the structured and unstructured insurance and banking data shared between the two entities)

In the context of big data, in terms of volume, as at December 2016 Bank A had 11.8 million active customers. A competitor bank showed that they were getting on average 120,000 new customers per month in 2016.
In terms of velocity, about 300,000 large transactions on average per minute are processed in the evening (7 pm) and about 100,000 transactions per minute are processed at midday (12 pm). Multiple data sources make for variations in data, especially in the insurance entity (referring to the big data characteristic of variety). Data sources are multichannel within customer interactions, including call (voice) recordings from contact centers, digital interactions via online channels (including social media), voice recordings at branch sites, image scans of customer personal data via mobile devices, video recordings of car/building inspections uploaded via mobile devices and textual conversations using USSD technology. Veracity has been addressed by the bank using various tools to determine the data quality of its customer base. The FAIS and Banks Act highlight the importance for insurance and banking organizations of ensuring that they present the right products to the right customer according to the identified need. This process is carried out by business compliance officers, risk operations officers as well as data stewards.
Data monetization is evident through data analytics. According to [4], the value characteristic of big data is demonstrated through the utilization of analytics to foster decision making. As a result, using data analytics, decisions can pre-empt
the next move organizations should take to obtain or retain a competitive advantage.
The more analytics the organization applies to data, the more it fits the definition of
big data.
In his book, [8] remarks that three components affect the scope of data governance
(DG) programs, namely: (1) understanding the business model of your organization,
the corporate structure and its operating environment; (2) understanding the content
to be controlled such as data, information, documents, emails, contacts and media
and where it adds the most business value; and lastly, (3) the intent, level and extent
to which the content is monitored and controlled.
The business model of the bancassurance organization is the partnership of an insurance organization selling its products using the bank's distribution channel. The nature of this business model is that the organization has different business units which do not share common information. It is in this instance that [8] suggests that data governance programs are implemented per business unit. The insurance house and the bank hold a significant number of customers shared between the two entities. In isolation, each organization only has a partial customer view and understanding, placing it at a disadvantage in understanding and serving its clients' needs. The two organizations hold data on their respective IT platforms, and differences in the product and customer numbers for shared customers are noted.
Content control: Bancassurance as a role player in the financial services industry is
highly regulated by national as well as international banking industry laws, insurance
laws as well as financial services legislation and regulations such as the Financial
Intelligence Centre Act 38 of 2001 (FICA), the Financial Advisory and Intermediary
Services Act (FAIS), Anti-Money Laundering, Code of Ethics, Code of Conduct,
Protection of Personal Information and National Credit Regulator (NCR) (to name a
few). As such, structured data (such as financial transactions and premium payments),
as well as unstructured data (such as emails and insurance policies), may require
governance that would focus on identifying the data used and how it should be
used. Furthermore, archival and protection of personal information should also be
considered.
Federation: Federation as a process that involves “defining an entity (the data gover-
nance program) that is a distinct blend of governance functions where the various
aspects of data governance interact with the organization” [8]. As a result, a clear
understanding of the extent to which various governance standards will be applied
and implemented across various business units, divisions and departments within an
organization is important. The governance processes applied and implemented must
be driven by the same organizational motives and involve the same processes.

3 Research Method and Approach

A case study approach was adopted to investigate how data governance was instilled
during a big data implementation project. A case study is a heuristic investigation
of a phenomenon within a real-life context to understand its intricacies [9]. In this
instance, a qualitative research approach was adopted using data collected from
interviews, heuristic observations and secondary documentation to create an in-depth
narrative describing how data governance was considered during the big data project.

3.1 Interviews

Due to the nature of bancassurance organizations (the bank and insurance), different
teams manage big data and as such different policies were implemented. As a result,
the governance of big data is a joint responsibility, largely dependent on the stakeholders working together toward a common decision-making process and common decision-making standards on big data-related matters. There also need to be clearly defined roles and responsibilities in such decisions [10].
To determine which policies and governance standards were implemented in
different teams and the logic used, interviews were conducted with various role
players with varying data governance responsibilities, as well as interdepartmental
teams working with big data such as enterprise information management, business
intelligence, customer insights and analytics.
A convenience sampling method was adopted as the participants were selected
based on stakeholders’ availability, geographic location and reachability, as well as
the geographical location of the researcher. All the selected participants, as well as
the researcher, were based in Johannesburg, South Africa. Johannesburg is part of
Gauteng, which is one of nine provinces in South Africa and the largest by population
with the biggest economic contribution.
Table 1 Summary of interview participants

Job title                                   Job level         Governance focus   Bancassurance link
CIO for data analytics and AI               Executive         Group level        Bank
Head [divisional] measurement                                 Localized          Bank—business unit
Head of customer insights and analytics                       Group level        Bank—global
Release train engineer                      Senior manager    Group level        Bank
Head of [divisional] data                                     Localized          Insurance
Data engineer                               Middle manager    Localized          Insurance
Data scientist                              Junior            Localized          Insurance
Analytics and modeling manager              Senior manager    Localized          Insurance

The participants have been working in their respective roles for a minimum of
two years (junior role) and five years in a senior role. They were, therefore, all well
informed on the topic of data governance. Based on the evidence of current data
governance implementation in the organization under study (governance structures
implemented), the data governance maturity was classified as in its infancy. A total
of eight in-depth semi-structured interviews were conducted: seven interviews in
person which was recorded and one using email. Table 1 contains a summary of
the positions of the various research participants as well as job level, the level on
which the participants focus with regard to governance-related matters as well as an
indication of their involvement in the bank or insurance-related business units.
In-depth, face-to-face interviews were conducted using a predefined interview
template. Twenty-five interview questions were grouped into three main focus areas
identified during an extensive systematic literature review on the topic of data
governance. The focus areas are described as part of the case study section.
In instances where the response from participants was not clear, second and third
round of interviews were conducted. Each interview was scheduled for one hour and
the research participants were invited to the interview via a detailed email which
provided context and background to the request for their time and why they were
identified as the ideal candidates for this research.

3.2 Heuristic Observations

With regard to observations, [11] mentions that, when planning an observational research method, the researcher should prepare a list of possible outcomes of expected animal behavior based on a list of common behaviors. A similar concept has been applied to this study: a prepared sheet with hypothetical questions that entails the hypothetical outcome of each big data governance structure was used, supplemented by a sheet with comments from observations. Grove and Fisk [12] refer to this observation method as a structured observation, more positivist than interpretivist in philosophy. Due to the limited time frame in which to conclude the study, only a few in-person (direct) live observations were made. Observations were done based on the various data governance framework themes identified by the researcher.
Organizational interdepartmental teams frequently attend scheduled forums
where all issues related to big data, including governance, were discussed. The output
and decisions taken at these forums were implemented over time and as such overrode
any other decisions taken within the business units. The researcher attended three of
these meetings to observe the dynamic in the group as well as to get feedback on
data governance-related topics.

3.3 Documents

The data obtained from participant interviews and observations were supplemented
by various secondary documentation, including but not limited to: documented
outputs from the respective discussion forums and divisional meetings focusing on
data governance (minutes of the meetings as well as supporting documentation); data
architecture documents; data flow diagrams; visualization and information architec-
ture documentation as well as conceptual design documents for data-related concepts
(including data storage); consumer and data engineering training plans (for big data);
business area enrollment form requesting to become a data consumer; principles and
best practices (for data asset management and data scrubbing, data governance struc-
tures) as well as metadata management (for data tokenization); progress meetings
of data artifact development teams. Thematic analysis was used to evaluate all the
documentation, in particular, the transcriptions of the interviews. Atlas.ti was used
to substantiate the manual coding process.

4 Case Study

Bank A is currently the second-largest banking financial service provider in South


Africa, employing more than 48,000 employees in various branches with their head
office in the economic hub (Johannesburg) in South Africa. The bank has an estimated
annual revenue of $7.9B (as of December 2019). The bank is currently moving toward
a digital economy. As a result, big data interventions have been earmarked and data
governance forums established with the mandate to ensure the implementation of
the enterprise-wide digitization strategy across all platforms. The strategy is largely
influenced by the Protection of Personal Information Act (PoPI), FICA, the Banks Act as well as FAIS, which encourage financial service providers to provide evidence of their operational
ability. The data governance forums in the insurance part of the business highlighted
the need for big data governance accountability due to a lack of data-related policies
and standards. As a result, departmental insurance data governance forums were


mandated with this task as they develop, support and maintain their own, departmental
level business systems and subsequent data entities.
Discussion forums were attended by one exco member and a combination of senior
data governance role players. Being in the minority, the exco member had difficulty
influencing insurance data requirements discussed during the forums. Additional
topics of discussion were data quality and big data analytics. Unfortunately, these
forums were discontinued after a while due to reasons such as:
Lack of identification and involvement of relevant stakeholders: IT partners were not
invited to discussion forums. They can assist with the clarification of technological
infrastructure and other technology requirements in support of big data information
requirements. Key role players were not involved such as data stewards, risk and
compliance officers. In some instances, third-party applications were acquired but
system owners lacked the understanding of data elements, including data correctness
and validity.
Terminology and requirement clarification: Forum members did not have a clear understanding of the meaning of big data and its attributes, of the business requirements for decision-making purposes, or of the data lifecycle.
Mandate clarification: The forum was not provided with a clear mandate, and policies and procedures were nonexistent. No key success factors were established, leaving the discussion forums without a clear goal.
Roles and responsibilities: Data governance forum attendees, including data stewards
and stakeholders, were not clear on what their tasks, duties, roles and responsibilities
were.
Benefit clarification: Forum participants did not have insights into the anticipated monetization of big data implementations, for example, the cost of change and the potential revenue to be gained from just-in-time analytics.
Training: There was no formal training available for forum participants to educate
them on the key aspects of data-related topics such as data sharing and data quality
checking processes applicable to insurance.
Existing big data governance documentation, outside of the discussion forum,
was consulted to shed light on the planned big data intervention as part of the digi-
tization project. Surprisingly, a lot of work has already been done on drafting and
implementing big data governance structures, such as data quality checkpoints. This
indicates an awareness of big data governance on the group executive level of both the
bank and the insurance organization. Unfortunately, this was not transparent enough
to raise awareness among forum participants.
Interview questions focused on three main focus areas identified in current
academic literature, namely: (1) basic, foundation level big data governance elements
that should be implemented to support enterprise-wide data governance (referring
to big data policies and standards, ethical prescriptions, data governance chain of
command and organizational hierarchies, auditing checks and balances and lastly
storage structures) [6, 10, 13–15]; (2) data governance elements focusing on data
quality (which refer to data dictionaries and data libraries, metadata and processes
followed to ensure data quality) [5, 10, 14–17]; and (3) the adoption of big data
governance guidelines and frameworks (pertaining to the data lifecycle, and safe-
guarding of data, including during interoperability procedures that will maximize
the big data value proposition) [5, 8, 13, 15–19]. These three focus areas were used
to describe the findings of the research.

4.1 Focus Area 1: Data Governance Foundational Elements

At the executive level, the majority of participants in both the banking and insurance
functional levels indicated that they were aware of big data governance structures that
were in place. Linked to the organizational structure, one executive mentioned that
he was unsure if the existing governance policies were specific to the handling of big
data or data in general. On a lower functional level, engineers were not aware of data
governance policies, while one out of three senior managers shared a similar view:
“yes, having a group-wide forum where having a set data governance standards. A
framework have been put together on how our data is governed as an organization.
As a business unit, what the rest of the group are doing is being adopted. This
was drafted as a document after functional business unit inputs and shared as the
organizational standard”.
Organizational data governance structures: Executives confirmed that they were
aware of the current group data governance structure. This could be attributed to the
seniority level they have in the organization which is privy to the group’s structures.
Senior managers shared the same view, adding that “there is an executive data office
(EDO) which governs the usage of data”. Another senior manager added that an
enterprise data committee (EDC) was formed: “individual business units have their
own data forums and own data committee. This is mandated from the EDC. What-
ever have been agreed upon can be taken up to EDC for noting”. Importantly data
stewards, acting as business unit request coordinators and business unit represen-
tatives, play an integral part in the data committees. Interestingly, junior interview
participants were not aware of these structures.
Key data stakeholders: Executive level interview participants highlighted that a lot
of interest has been shown lately in data-related projects and as a result, many partic-
ipants volunteered to become part of the big data project. Unfortunately, participants
were not necessarily skilled in the area of data science and therefore lacked skills
of data scientists. However, various data-related roles existed within the organi-
zation namely data analysts, information management analysts, domain managers,
data stewards, provincial data managers and data designers. Executive participants
alluded that “…specific roles are critical in driving this contract [Big data project],
e.g., data engineers and data stewards in the information management space”. Apart
from these resources, business stakeholders were also included in data governance
forums and of vital importance: “…the owner of the data is the one who signs off
on the domain of the data. A business will thus be the owner in that domain of that
data. For example, head of the card division will be the domain holder of credit card
data”. In contrast, one executive claimed that all participants were data practitioners
(and not business stakeholders).
All senior managers, as well as the middle manager and the junior resource, agreed
that stakeholders with specific roles were invited to the data governance forums.
Although these roles were involved in the forums, resources fulfilling these roles
were not sure of what was expected of them. Other roles that were included in the
forums were “support casts” (as postulated by [20]) and include IT representatives,
compliance officers, IT security and business analysts.
Data storage: Participants acknowledge the importance of the ability of current tech-
nologies, as part of the IT infrastructure, to cater for organizational data storage
and easy retrieval. Importantly, one of the executives mentioned that regulatory and
compliance data storage requirements might differ from operational, business unit
requirements. “The analytics of financial markets comes in quick and fast and this
requires a reliable storage system… From an analytics perspective, you would need
as much of historic data as possible, way beyond the five years”. The current ability
of organizational data storage was debated among research participants. Although
one advised that the current technologies do cater to the storage needs of the organi-
zation, it can be improved. Furthermore, other participants indicated that the storage
facilities and infrastructure were adequate as they adhered to regulatory and compliance prescriptions. However, value can still be derived from stored data that is not necessarily required to meet regulatory requirements. This can be meaningful
in the analysis of current customer retention. It was also felt that the organization
should focus more on the processes prescribed to store data and not necessarily data
storage technology ability. Junior resources indicated their frustration with delays when requesting data, which were mainly attributed to processes.

4.2 Focus Area 2: Data Quality

All interview participants believed that a data dictionary for each business unit is of
vital importance. This was attributed to the nature of the bancassurance organization
(the difference between the meaning of data entities). Data dictionaries were “living
documents” authored by data stewards and referred to as an “information glossary”. A
data reference tool was being created at the insurance group level to assist the business
of updating the glossary, but currently under development. This will, in particular,
assist with the need to include data source to target mapping requirements.
An important feature of Bank A’s current big data implementation strategy is
the focus on metadata. Metadata existed for structured and semi-structured data
(particularly) in different toolsets. Despite the availability, the additional meaning
was not derived from data entities. For example, referring to a personal identity
number, “knowingly that it’s an id number but if they derive age and date of birth
… that level of maturity is not there yet. It’s on the cards”. No metadata existed for unstructured data. The senior managers all concurred that there was metadata available for the different types of data content at both group and business unit level. The biggest concern was that metadata was not frequently updated. The junior resource mentioned that they had not been exposed to unstructured metadata and as such believed it did not exist. The resource suggested that this could be due to the large
volume of insurance data.
On the topic of big data quality, executives mentioned that a data quality process
exists that ensures data quality throughout the data lifecycle. One of the executives
added that “there is no one size fits all”, referring to the standard data quality process
across business functions, but that measures or weightings applied to data elements
might differ. Research participants did not concur when asked about data quality. Although raw data was thoroughly checked during the data generation phase (during customer interaction), not enough data checks were performed after the acquisition
phase. However, junior resources who actively engaged with the data felt that the
data quality checks performed during the data generation phase were insufficient as
they had to frequently perform data cleanup tasks.
Research participants agreed on the existence and enforcement of data checks to
ensure data accuracy and completeness. The process has been seen as “fail-proof
and well executed”. For example, a clear set of criteria was identified against which
datasets were checked, including data models. Tools and infrastructure technologies
from the IT partners have been employed to assist with third-party data sources.
Additional data checks are performed whereby personal data (e.g., national identity numbers) are verified against the central government database and vehicle details are verified against the national vehicle asset register (NaTIS system). Trend analysis is also used to perform data validity checks. For example, if funeral sales were 100 with a general 10% monthly increase, and a sudden monthly increase of 40% occurs with no change in business strategy or business processes, the dataset is flagged.
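A minimal sketch of such a trend-based validity check is given below; the product, sales figures, and thresholds are purely illustrative assumptions and are not taken from the case organization.

# Hypothetical sketch of a trend-based data validity check:
# flag a dataset when month-on-month growth deviates far from the expected trend.
monthly_sales = [100, 110, 121, 133, 186]   # assumed funeral-policy sales figures
expected_growth = 0.10                       # the usual ~10% monthly increase
tolerance = 0.15                             # allowed deviation before flagging

def flag_anomalies(series, expected, tol):
    """Return the indices of months whose growth deviates from the expected trend."""
    flagged = []
    for i in range(1, len(series)):
        growth = (series[i] - series[i - 1]) / series[i - 1]
        if abs(growth - expected) > tol:
            flagged.append(i)
    return flagged

print(flag_anomalies(monthly_sales, expected_growth, tolerance))  # -> [4]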
Communication and data management: To manage changes to datasets, the
respondents highlighted that various tools were used to communicate changes and the
introduction of new datasets, mainly by data stewards. Only data owners are included
in the communication process at various stages of the data lifecycle management on
an event management basis. Should there be a need, data stewards will escalate data-related issues to higher levels of governance, such as the information governance forum. Not surprisingly, research participants felt that the data management process is far from perfect, as it is reactive in nature: only when there is an impact on business or on systems is communication effected.
Big data policies and standards—audit checks: The organization is forced to apply strict auditing checks as prescribed by industry compliance regulations. As a result,
respondents indicated that they have employed “…. an information risk partner,
compliance and risk and legal. Depends on how frequently the data is used, they’ll
either audit you every quarter or once a year. There’s also internal and external
audit initiatives". Another executive, in charge of data analytics, mentioned frequent "periodic deep sticks" into sample datasets. Furthermore, it was highlighted that Bank A also leverages its data supporting cast, such as IT security, to run risk checks on the data.
Apart from infrequent, ad hoc data checks, the compliance criteria were programmed
into reporting artifacts such as dashboards. An interesting finding was that most data
practitioners were aware of the policies and standards, but business stakeholders
lacked knowledge on the topic.

4.3 Focus Area 3: Guidelines, Frameworks and Value Proposition

Focusing on the issue of big data privacy and security, the majority of executives
explained there were adequate measures to guard against unauthorized access to
data. The general notion was that the financial services industry was very mature
when dealing with data due to strong regulatory prescriptions of data being handled
in real time, near real time and batches. One of the three executives, however, high-
lighted that although the measures were in place and validated in conjunction with
IT partners, he was not sure that measures were sufficient.
One of the senior managers questioned the adequacy of access to internal data
as well as the access to bank data available to third parties. According to the senior
manager, internal data access is adequate. However, third parties have been lacking
adherence to safeguarding principles set out to safeguard the bank’s data against
unauthorized access. The rest of the senior managers concurred that measures were
in place and maintained in the entire organization; however, the competence and capa-
bility of those measures are sometimes inadequate. Junior staff members supported
this viewpoint and elaborated that measures are prone to human discretion to grant
access to data. As a result, predefined quality checkpoints can be ignored. The middle manager felt that, at the localized level (insurance), adequate measures were communicated to external parties who request access to data.
Big data interoperability: Executives indicated that terms of references have been
agreed at the Bank’s group architecture level when sharing data. Predefined methods
and procedures exist for the sharing of data to ensure data integrity during the move-
ment process. One of the executives, however, sounded a note of caution, highlighting that even though data is securely transported between landing areas, the integrity of the data is sometimes compromised between the landing areas.
Research participants were confident that data security and integrity measures
were successfully employed when sharing data. Training was provided to data users
as well as the senders of the data. Software security tools were approved by group
information security to ensure that data was not compromised during live streaming.
In addition, "failsafe" methods are currently being developed. Apart from this, additional sign-off procedures were employed at each stage of data movement, which ensures integrity and safe transportation. This can also be attributed to the source-to-target mapping exercise, which sets a baseline on what to expect from the source data and thereafter.
Big data analytics access: Only one of the executives mentioned that their role did not require access to data analysis capabilities. However, the other executives as well as the senior managers indicated that data analytics was important to them, and they therefore drive data analytics interventions at business unit level. The junior and middle managers confirmed that they have access to data analytics and employed data mining and analytical tools such as SAS. Data analytics was used to predict customer behavior, based on a large historical dataset, and to identify fraudulent activities. Prescriptive analytics was still in its infancy. An example of prescriptive analytics in short-term insurance would be Google's driverless car: knowing that there will be no driver as part of the underwriting questions, what would be the next course of action? Similarly, a number of algorithms act as input to Google's self-driving car to determine the next course of action it needs to take, for example, take a sharp left turn, drive slowly for a sharp curve ahead, or pick up speed when going up the mountain.
The general look and feel of data visualization artifacts is governed by a data visualization committee. This committee provides guidance and standard practice on the design of the dashboards and the visualization tools to be used, and who needs to use them is discussed at the guilds.
Scope of big data: Currently, the scope of big data intervention projects is clearly
defined. Senior-level research participants remarked that there is no need for data governance in instances where the business unit can attend to their big data request locally, supported by the data steward and IT. Only in instances where there is "inter-sharing of data including from 3rd parties, then the Governance process will kick in". For enterprise-level big data projects, the scope of the projects was defined at the executive data office level.
Business value: Research participants indicated that the value of (correct) data is
measured as an information asset. Subsequently, it is important to understand the
objective of data analysis. For example, does the organization want to increase
revenue by considering existing customer trends, or to save costs by predicting fraudulent activities? Quality data drives meaningful decisions. One of the executives mentioned that, by looking at the output of the data, a four-dimensional matrix comes into
play—“(1) I’m expecting an outcome and there’s an outcome (2) I’m not expecting an
outcome but there’s an outcome (3) I’m expecting an outcome but I’m seeing some-
thing different (4) I’m not expecting an outcome and there is no outcome. …Mea-
suring the value of data looks at the quantum of the opportunity that sits on the
data." Another executive highlighted that measuring the value of data is done through the prioritization of initiatives within the business or the organization: "to ensure that it works on the most impactful drivers of the organization is needed".

5 Summary of Findings and Conclusion

An academic literature review on the topic of data governance and big data high-
lighted three main data governance focus areas that should be considered in the
implementation of big data projects. These three focus areas were used in an in-
depth case study to identify the data governance elements that should be considered
in bancassurance organizations when implementing big data projects.
Focus area one, in general, highlighted the fact that the current data governance structures in the bancassurance organization under study did not cater to big data interventions per se but to data in general. It therefore seems as if some unique elements of the planned big data intervention might be missed. Previous research [16] has indicated that big data interventions might need special intervention and definitional clarification.
A lack thereof can have a huge effect on focus area three, the value proposition of
big data implementations. Data governance specifications also need to cater for the
unique characteristics of big data such as volume, velocity and variety.
Focus area two indicated that formal education and training should be included as
a formal structure or at the very least a part of the communication decision domain
within the big data governance structures. This is because business stakeholders
required to attend the data governance structures are either new to the business, new
to the role or simply not aware of the importance of having big data governance
structures in place. Education and training on big data via "big data master classes" and "insurance data sharing" sessions held by the bancassurance organization under study are a stepping stone toward making every stakeholder working with data aware of their role in the decision making around the data asset.
of different data governance roles and responsibilities and subsequent educational
background was highlighted by extensive research by Seiner [20].
The researcher also noted that most big data governance structures have been adopted for structured data but not for unstructured data. Metadata for unstructured data is nonexistent, and as such the management of unstructured data is pushed over to another guild referred to as "records management". It is also difficult to apply the big data quality processes to unstructured data due to its nature. Thus, a lot more work will need to be put in to ascertain the standardized processes that will be required to govern unstructured data. Governance structures ensuring the data quality of current structured and semi-structured data were well enforced and adequate. The need for quality assurance of unstructured datasets remained.
The researcher noted some limitations in this study, as the content dealt with is highly classified and several governance processes had to be followed to obtain it. At some point, only one contact at the executive level was used to verify the accuracy of the data obtained. The availability of research participants was also limited as they were based in different buildings; as such, non-face-to-face meetings were held with most of them in the interest of time.
Finally, focus area three highlighted adequate data interoperability governance structures. Although research participants took cognizance of the value of big data, no evidence of such value calculations at group level (both banking and insurance) could be found.
An unintended finding of the research concerned the reasons for the failure of data governance discussion forums. It would be interesting to investigate this matter in future research work.

References

1. Data Management Association (2009) The DAMA guide to the data management body of
knowledge (DMA—DMBok Guide). Technics Publications
2. Rowley J (2007) The wisdom hierarchy: Representations of the DIKW hierarchy. J Inf Sci
33:163–180
3. Munshi UM (2018) Data science landscape: tracking the ecosystem. In: Data science landscape:
towards research standards and protocols. Springer, Singapore
4. Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19:171–209. https://doi.
org/10.1007/s11036-013-0489-0
5. Ghasemaghaei M, Calic G (2019) Can big data improve firm decision quality? The role of data
quality and data diagnosticity. Decis Support Syst 120:38–49
6. Al-Badi A, Tarhini A, Khan AI (2018) Exploring big data governance frameworks. Procedia
Comput Sci 141:271–277. https://doi.org/10.1016/j.procs.2018.10.181
7. Elkington W (1993) Bancassurance. Chart Build Soc Inst J
8. Ladley J (2012) Data governance: how to design, deploy, and sustain an effective data
governance program. Morgan Kaufmann, Elsevier
9. Yin RK (2014) Case study research design and methods. Sage
10. Kuiler EW (2018) Data governance. In: Schintler LA, McNeely CL (eds) Encyclopedia of
big data, pp 1–4. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-
32001-4_306-1
11. Ice GH (2004) Technological advances in observational data collection: the advantages and
limitations of computer-assisted data collection. Field Methods 16:352–375
12. Grove SJ, Fisk RP (1992) Observational data collection methods for service marketing: an
overview. J Acad Mark Sci 20:217–224
13. Soares S (2014) Data governance tools: evaluation criteria, big data governance, and alignment
with enterprise data management. MC Press online
14. Mei GX, Ping CJ (2015) Design and implementation of distributed data collection management
platform. Presented at the 2015 international conference on computational intelligence and
communication networks, Jabalpur, India, 12 Dec 2015
15. Ballard C, Compert C, Jesionowski T, Milman I, Plants B, Rosen B, Smith H (2014) Information
governance principles and practices for a big data landscape. RedBooks
16. Al-Badi A, Tarhini A, Khan AI (2018) Exploring big data governance frameworks. In: Procedia
computer science, pp 271–277
17. Khatri V, Brown CV (2010) Designing data governance. Commun. ACM 53:148–152
18. Almutairi A, Alruwaili A (2012) Security in database systems. Glob J Comput Sci Technol
Netw Web Secur 12:9–13
19. Davenport TH, Dyche J (2013) Big data in big companies
20. Seiner R (2014) Non-invasive data governance: the path of least resistance and greatest success.
Technics Publications, USA
A Contextual Model for Information
Extraction in Resume Analytics Using
NLP’s Spacy

Channabasamma , Yeresime Suresh , and A. Manusha Reddy

Abstract An unstructured document such as a resume can come in different file formats (pdf, txt, doc, etc.), and there is also a lot of ambiguity and variability in the language used in resumes. Such heterogeneity makes the extraction of useful information a challenging task and gives rise to the need for understanding the context in which words occur. This article proposes a machine learning approach to phrase matching in resumes, focusing on the extraction of special skills using spaCy, an advanced natural language processing (NLP) library. The approach can analyze and extract detailed information from resumes like a human recruiter. It keeps a count of the matched phrases while parsing in order to categorize candidates based on their expertise. The decision-making process can be accelerated through data visualization using matplotlib, and a relative comparison of candidates can be made to filter them.

Keywords NLP · spaCy · Phrase matcher · Information extraction

1 Introduction

While the Internet has taken up a significant part of everyday life, finding jobs or employees online has become a crucial task for job seekers and employers. It is vastly time-consuming to store millions of candidates' resumes in an unstructured format in relational databases, and it requires a considerable amount of human effort. On the other hand, a computer that parses candidate resumes has to be constantly trained and must adapt itself to deal with the continuous expressivity of human language.

Channabasamma (B) · A. Manusha Reddy


VNRVJIET, Hyderabad 500090, India
e-mail: channu.ar@gmail.com
A. Manusha Reddy
e-mail: manushareddy.a@gmail.com
Y. Suresh
BIT&M, Ballari 583104, India
e-mail: suresh.vec04@gmail.com


Resumes may be represented with different file formats (pdf, txt, doc, etc.) and
also with different layouts and contents. Such diversity makes the extraction of useful
information a challenging task. The recruitment team puts a lot of time and effort in
parsing resumes and pulling out the relevant data. Once it is extracted, matching of
resumes to the job descriptions is carried out appropriately.
This work proposes a machine learning approach to phrase matching in resumes,
focusing on the extraction of special skills using spaCy [1], an advanced natural
language processing (NLP) library. It can analyze and extract detailed information
from resumes like a human recruiter. It keeps a count of the phrases while parsing to
categorize persons based on their expatriation. It improves recruiter’s productivity,
reduces the time in the overall candidate selection process, and improves the quality
of selected candidates.
The rest of the paper is organized as follows: Section 2 highlights the literature
survey, Sect. 3 presents the proposed work of extracting relevant information from the
unstructured documents like resumes, Sect. 4 discusses the implementation and the obtained output, and Sect. 5 concludes the work with the scope for future research.

2 Literature Survey

This section summarizes the contributions made by various researchers toward the
extraction of relevant information for resume parsing.
Sunil Kumar introduced a system for automatic information extraction from
the resumes. Techniques for pattern matching and natural language processing are
described to extract relevant information. Experimental results have shown that the
system can handle different formats of resumes with a precision of 91% and a recall
of 88% [2].
Papiya Das et al. proposed an approach to extract entities to get valuable informa-
tion. R language with natural language processing (NLP) is used for the extraction
of entities. In this paper, the authors briefly discussed the process of text analysis and the extraction of entities using different big data tools [3].
Jing Jiang worked on information extraction and described two fundamental tasks: named entity recognition (NER) and relation extraction. The NER concept is to find names of entities, for instance, people, locations, and organizations. A named entity is often context dependent. Relation extraction aims at finding semantic relations among entities [4].
Harry Hassan et al. introduced an unsupervised information extraction framework based on mutual reinforcement in graphs. This framework is mainly used to acquire extraction patterns for content extraction, relation detection and the subsequent characterization task, which is one of the difficult tasks in the process of information extraction due to inconsistencies in the available data and the absence of large amounts of training data. This approach achieved better performance than supervised techniques trained with reasonable amounts of training data [5].

Sumit Maheshwari et al. identified resumes by analyzing "skills"-related information. Their approach improved the performance of resume processing by extracting the required special skill types and even special skill values. It was observed that the recruiting team can reduce 50–94% of the features while selecting an appropriate resume [6].
Chen Zhang et al. proposed ResumeVis, a visualization system to extract and visualize resume data. To extract semantic information, a text mining method is presented. Then, to show the semantic information from various perspectives, a set of visualizations is presented. This paper focused on accomplishing the following tasks: tracing out individual career development paths, mining underlying social relations among individuals, and even keeping a full representation of the combined mobility across enormous numbers of resumes. It was observed that the system was effectively demonstrated on several government officer resumes [7].
Eirini Papagiannopoulou et al. presented keyphrase extraction methods using
supervised and unsupervised learning. Keyphrase extraction is the task of textual
information processing which deals with automatic extraction of characteristic and
representative phrases from a document that expresses all the key aspects of its
content. According to this paper, among the listed unsupervised methods, graph-based methods work better for short documents, while statistical methods work better for long documents [8, 9].
Khaled Hasan et al. focused on an in-depth approach of automatic keyword extrac-
tion from Bangla scientific documents employing a hybrid method, which utilizes
both unsupervised and supervised machine learning models. The hybrid approach exhibited 24 and 28% improvements over two existing approaches, a neural-based approach and an approach using part-of-speech tagging along with a named entity recognizer, respectively, for the extraction of five keywords [10].
Xavier et al. evaluated and compared five named entity recognition (NER) software packages (StanfordNLP, OpenNLP, NLTK, SpaCy and Gate) on two different corpora. NER plays a major part in finding and classifying entities in NLP applications [11].
It has been observed that past research on resumes generally focused on information extraction to filter millions of resumes down to a few hundred potential ones. If these filtered resumes are similar to each other, examining each resume to identify the right candidates becomes challenging. For a given set of similar resumes, none of the above approaches tried to extract special skills and keep a count of them.
In our work, an effort has been made to extend the notion of special skills related to particular domains like data science, machine learning, the R language, big data, etc., to improve the recruiter's productivity, to reduce the time taken in the overall candidate selection process, and to improve the quality of selected candidates.

3 Methodology

3.1 Natural Language Processing (NLP) Background

NLP is an artificial intelligence (AI) technique which allows software to understand human language, whether spoken or written. The resume parser works on the keywords, formats, and pattern matching of the resume. Hence, resume parsing software uses NLP to analyze and extract detailed information from resumes like a human recruiter.
The raw data needs to be preprocessed by the NLP algorithm before the subsequent data mining algorithm is used for processing. The NLP algorithmic process involves various sub-tasks such as tokenization of raw text, part-of-speech tagging, and named entity recognition.
Tokenization: In this process, the text is first split into small individual tokens such as words and punctuation. This is done by the implementation of rules specific to each language. Based on a specified pattern, strings can also be broken into tokens using regular expressions. The pattern used in this work (r'\w+') keeps only word characters and thereby removes punctuation during processing. The .lower() string method can be applied in a lambda function to convert the text to lowercase.
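A minimal tokenization sketch is shown below; it assumes spaCy is installed, and the sample sentence and regular expression are illustrative.

import re
import spacy

nlp = spacy.blank("en")                  # blank English pipeline: tokenizer only
text = "Worked on NLP, Python and spaCy!"
doc = nlp(text.lower())                  # convert to lowercase before tokenizing
print([token.text for token in doc])     # rule-based tokens; punctuation becomes separate tokens

# Regex-based alternative: keep word characters only, dropping punctuation
print(re.findall(r"\w+", text.lower()))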
Stopword Removal: Stopwords are a group of frequently used words in a language; English, for example, has several stopwords such as "the", "a", "is", and "are". The idea behind removing these low-information words from the text is that it allows the analysis to focus more on the important words. spaCy has an inbuilt stopword list. Based on the context, the list can be modified; for example, in sentiment analysis the words "not" and "no" are important to the meaning of a text such as "not good".
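The following sketch, assuming spaCy is installed, illustrates stopword removal with the built-in English list and how the list can be adjusted for context-sensitive words such as "not"; the example sentence is illustrative.

import spacy
from spacy.lang.en.stop_words import STOP_WORDS

nlp = spacy.blank("en")
print(len(STOP_WORDS), "built-in English stopwords")

doc = nlp("the candidate is not good at statistics but is strong in python")
print([t.text for t in doc if not t.is_stop])      # stopwords (including "not") removed

# Keep the negation for sentiment-style analysis by clearing its stopword flag
nlp.vocab["not"].is_stop = False
doc = nlp("the candidate is not good at statistics but is strong in python")
print([t.text for t in doc if not t.is_stop])      # "not" is now retained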
Stemming and Lemmatizing: Both stemming and lemmatization shorten words in the text to their root form. Stemming is the process of reducing or removing the inflection of words to reach their root form (for instance, performs/performed to perform). In this case, the "root" might not be a true root word, but simply a canonical form of the original word. It strips the affixes of words based on common patterns. In some cases this is helpful, but not always, as the resulting word may lose its actual meaning. Lemmatization is the process of converting a word into its base form, for example, "caring" to "care". spaCy's lemmatizer has been used to obtain the lemma (base) form of the words. Unlike stemming, it returns an appropriate word that can easily be found in the dictionary.
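A small sketch contrasting stemming and lemmatization is given below; it assumes NLTK and the spaCy model en_core_web_sm are installed, and the words and sentence are illustrative.

import spacy
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ["caring", "performed", "studies"]])
# -> ['care', 'perform', 'studi']; a stem is not always a valid dictionary word

nlp = spacy.load("en_core_web_sm")
doc = nlp("The candidate performed well in data science studies")
print([(t.text, t.lemma_) for t in doc])
# lemmas such as 'perform' and 'study' are valid dictionary forms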
Part-of-speech tags and dependencies: After the process of tokenization, spaCy parses and assigns tags to a given document. At this point, statistical models are used, which enable spaCy to predict the label or tag that most likely applies in the context. A model consists of binary data produced by showing the system enough examples for it to make predictions that generalize across the language; for example, a word following "the" in English is most often a noun.
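The sketch below shows how spaCy exposes the predicted tags and dependencies, assuming the en_core_web_sm model is available; the sentence is illustrative.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The recruiter shortlisted the resume")

for token in doc:
    # pos_ is the coarse part-of-speech tag, dep_ the syntactic relation to the head token
    print(token.text, token.pos_, token.dep_, token.head.text)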

Named entity recognition (NER): NER is possibly the first step in information extraction; it identifies and classifies the named entities in a document into a set of pre-defined categories such as person names, expressions of time, locations, organizations, monetary values, quantities, and percentages [11]. The more accurate the recognition of named entities as a preprocessing step, the more information on relations and events can be extracted. There are two types of NER approaches, a rule-based approach and a statistical approach [4], and a combination of both (hybrid NER) has also been introduced. The hybrid approach provided better results than relying only on the rule-based method in recognizing names [12, 13].
The rule-based approach defines a set of rules that determines the occurrence
of an entity with its classification. To represent the cluster of relatively independent
categories, ontologies are also used. These systems are most useful for the specialized
entities and their categories, which have a fixed number of members. The quality of
the rules determines performance.
Statistical models use supervised learning that is built on very large training sets
of data in the classification process. Algorithms use real-world data to apply rules;
rules can be learnt and modified. The learning process can be accomplished in a fully supervised, unsupervised, or semi-supervised manner [14, 15].
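A minimal sketch of statistical NER with spaCy's pretrained pipeline is shown below; it assumes en_core_web_sm is installed, and the names in the sentence are illustrative.

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Anita worked at Robert Bosch in Bengaluru from 2015 to 2019")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. PERSON, ORG, GPE, DATE; labels depend on the model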

3.2 The Approach

There is a lot of ambiguity and variability in the language used in the resume.
Someone’s name can be an organization name (e.g., Robert Bosch) or can be an
IT skill (e.g., Gensim). Such heterogeneity makes the extraction of useful informa-
tion a challenging task. It gives rise to the need to understand the context in which the words occur.
Semantics and context play a vital role while analyzing the relationship between
objects or entities. The most difficult task for unstructured data is extracting the rele-
vant data because of its complexity and quality. Hence, semantically and contextually
rich information extraction (IE) tools can increase the robustness of unstructured data
IE.
The Problem: In various scenarios, running a CV parser over a person's resume and looking for data analytical skills will help you find candidates with knowledge in data science. The parsing fails if the search is more specific, for example, if you are looking for a Python developer who is good at server-side programming and has good NLP knowledge in a particular development environment for building software systems in the healthcare domain. This is because the parsing of job descriptions and resumes does not bring quality data out of unstructured information.
The Solution:
• Have a table or dictionary which covers various categorized skill sets.
• An NLP algorithm to parse the entire document, searching for the words mentioned in the table or dictionary.
• Count the occurrences of the words belonging to different categories.
In the proposed system, spaCy [1], an advanced natural language processing (NLP) library, is used. It can analyze and extract detailed information from resumes like a human recruiter.
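A minimal sketch of this dictionary-driven phrase matching is shown below, assuming spaCy is installed; the skill dictionary and the resume text are illustrative and much smaller than Table 1.

import spacy
from spacy.matcher import PhraseMatcher
from collections import Counter

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")   # case-insensitive phrase matching

skills = {
    "NLP": ["nlp", "word2vec", "named entity recognition", "spacy"],
    "Machine Learning": ["logistic regression", "random forest", "xgboost"],
    "Python": ["python", "pandas", "flask"],
}
for category, phrases in skills.items():
    matcher.add(category, [nlp.make_doc(p) for p in phrases])

resume_text = "Built a Flask service in Python using spaCy for named entity recognition."
doc = nlp(resume_text)

# Count matched phrases per skill category for this resume
counts = Counter(nlp.vocab.strings[match_id] for match_id, start, end in matcher(doc))
print(counts)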

4 Experimental Results and Discussions

To evaluate the information extraction system, 250 resumes with different formats and templates, downloaded from Kaggle, have been considered in this work. These are parsed with the advanced NLP library spaCy, which has a feature called "PhraseMatcher". It can analyze and extract detailed information from resumes after preprocessing.
When raw text is fed as input, spaCy tokenizes it, processes the text, and produces a Doc object. The Doc is then processed in several different steps; this is known as the processing pipeline. The pipeline depicted in Fig. 1 includes a tagger, a parser and an entity recognizer (detailed in Sect. 3). The Doc that has been processed in each phase of the pipeline is fed as input to the next component.
The proposed model is designed as shown in Fig. 2; the steps to implement the module are as follows:
1. A dictionary is created (Table 1) which includes various skill sets categorized by domain. The list of words under each category is used to perform the phrase matching against the resumes.
2. Documents are parsed by the advanced NLP library spaCy, which has a feature called "PhraseMatcher".
3. The matcher in turn parses the entire document to search for the words listed in the table or dictionary.
4. The matching phrases are found.
5. The frequency of the words of the different categories is then counted.

Fig. 1 NLP pipeline



Fig. 2 Proposed design

6. Data Visualization: Matplotlib is used to represent the above information visually so that it becomes easy to choose the candidate (see Fig. 3).
In Fig. 3, it can be seen that two candidate resumes satisfy the job requirements for a data scientist.
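The kind of comparison shown in Fig. 3 can be sketched with matplotlib as follows; the candidate names and counts below are illustrative placeholders, and in practice the counts come from the phrase matching step.

import matplotlib.pyplot as plt

# Illustrative counts of matched skill phrases per category for two shortlisted candidates
candidates = {
    "Candidate A": {"Statistics": 4, "Machine Learning": 6, "NLP": 2},
    "Candidate B": {"Statistics": 1, "Machine Learning": 2, "NLP": 7},
}
categories = ["Statistics", "Machine Learning", "NLP"]
names = list(candidates)

fig, ax = plt.subplots()
bottom = [0] * len(names)
for cat in categories:
    values = [candidates[n].get(cat, 0) for n in names]
    ax.bar(names, values, bottom=bottom, label=cat)   # stacked bar per candidate
    bottom = [b + v for b, v in zip(bottom, values)]

ax.set_ylabel("Matched skill phrases")
ax.legend()
plt.show()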

4.1 Merits

1. The code automatically opens the documents and parses the content.
2. While parsing, it keeps a count of phrases for easy categorization of candidates based on their expertise.
3. The decision-making process can be accelerated using data visualization.
4. Relative comparison of candidates can be made to filter out the job applicants.

Table 1 Dictionary of the various skills for different categories

Statistics: statistical models, statistical modeling, probability, normal distribution, poisson distribution, survival models, hypothesis testing, bayesian inference, factor analysis, forecasting, markov chain, monte carlo
Machine learning: linear regression, logistic regression, k means, random forest, xgboost, svm, naive bayes, pca, decision trees, svd, ensemble models, boltzmann machine
Deep learning: neural network, keras, theano, face detection, convolutional neural network (cnn), recurrent neural network (rnn), object detection, yolo, gpu, cuda, tensorflow, lstm, gan, opencv
R language: r, ggplot, shiny, cran, dplyr, tidyr, lubridate, knitr
Python language: python, flask, django, pandas, numpy, scikitlearn, sklearn, matplotlib, scipy, bokeh, statsmodel
NLP: nlp, natural language processing, topic modeling, lda, named entity recognition, pos tagging, word2vec, word embedding, lsi, spacy, gensim, nltk, nmf, doc2vec, cbow, bag of words, skip gram, bert, sentiment analysis, chat bot
Data engineering: aws, ec2, amazon redshift, s3, docker, kubernetes, scala, teradata, google big query, aws lambda, aws emr, hive, hadoop, sql

Fig. 3 Visualization of shortlisted candidates with the frequency count of special skills

5 Conclusion

Heterogeneity in unstructured data like resumes makes the extraction of useful information a challenging task. Once the relevant data has been extracted, the process of matching skills from resumes to job descriptions can be easily carried out. This paper explores previous approaches to extracting meaningful information from unstructured documents. Furthermore, it provides background knowledge and covers a basic understanding of NLP. Finally, it presents how documents are parsed by the advanced NLP library spaCy, which has a feature called "PhraseMatcher". spaCy parses the entire document to search for the words listed in the table or dictionary. The next step is to count the occurrences of the words of the various categories. For data visualization, Matplotlib is used to represent the information visually so that it becomes easy to choose the candidate.
With the fast-growing big data on the Internet, further research on information extraction can deal with noisy and diverse text. The work can be extended further by generating recommendations to the shortlisted candidates to avoid predicted metric deviation.

References

1. https://spacy.io/usage/spacy-101. Last accessed 2020/3/15


2. Sunil Kumar K (2010) Automatic extraction of usable information from unstructured resumes
to aid search. 978-1-4244-6789-1110/$26.00©2010 IEEE
3. Das P, Pandey M, Rautaray SS (2018) A CV parser model using entity extraction process and big data tools. Int J Inf Technol Comput Sci (IJITCS) 10(9):21–31. https://doi.org/10.5815/ijitcs.2018.09.03
4. Jiang J (2012) Information extraction from text. In: Aggarwal CC, Zhai CX (eds) Mining text
data. https://doi.org/10.1007/978-1-4614-3223-4_2 © Springer Science+Business Media, LLC
2012

5. Hassan H et al (2006) Unsupervised information extraction approach using graph mutual


reinforcement. In: Proceedings of the 2006 conference on empirical methods in natural
language processing (EMNLP 2006), pp 501–508, Sydney, July 2006. c 2006 Association
for Computational Linguistics
6. Maheshwari S et al (2010) An approach to extract special skills to improve the performance of
resume selection. In: DNIS 2010, LNCS 5999, pp. 256–273, z2010. _c Springer-Verlag Berlin
Heidelberg
7. Zhang C, Wang H (2018) ResumeVis: a visual analytics system to discover semantic infor-
mation in semi-structured resume data. ACM Trans Intell Syst Technol 10(1), Article 8, 25 p.
https://doi.org/10.1145/3230707
8. Papagiannopoulou E, Tsoumakas G (2019) A review of keyphrase extraction. WIREs Data
Mining Knowl Discov 2019:e1339. https://doi.org/10.1002/widm.1339
9. Vijayakumar T, Vinothkanna R () Capsule network on font style classification. J Artif Intell 2(02):64–76
10. Sazzad KH et al (2018) Keyword extraction in Bangla scientific documents: a hybrid approach.
Comput Sci Eng Res J 11. ISSN: 1990-4010
11. Schmitt X et al (2019) A replicable comparison study of NER software: StanfordNLP, NLTK,
OpenNLP, SpaCy, Gate. 978-1-7281-2946-4/19/$31.00 ©2019 IEEE
12. Balgasem SS, Zakaria LQ (2017) A hybrid method of rule-based approach and statistical
measures for recognizing narrators name in Hadith. In: 2017 6th international conference on
electrical engineering and informatics (ICEEI), Langkawi, pp 1–5
13. Kumar, Saravana NM (2019) Implementation of artificial intelligence in imparting education and evaluating student performance. J Artif Intell 1(01):1–9
14. Zaghloul T (2017) Developing an innovative entity extraction method for unstructured data.
Int J Qual Innov 3:3. https://doi.org/10.1186/s40887-017-0012-y
15. Nedeau D, Sekine S (2007) A survey of named entity recognition and classification.
Linguisticae Investig 30(1):3–26
16. https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-
processing-understanding-text-9f4abfd13e72, 2020/3/15
Pediatric Bone Age Detection Using
Capsule Network

Anant Koppar, Siddharth Kailasam, M. Varun, and Iresh Hiremath

Abstract Convolutional neural network (CNN) is a state-of-the-art method that is widely used in the field of image processing. However, one major limitation of CNN is that it does not consider the spatial orientation of the image. Capsule network, proposed by Geoffrey E. Hinton et al., was an attempt to solve this limitation. However, the architecture was designed for discrete data. This paper modifies the architecture appropriately to make it suitable for continuous data. It works on the dataset of the RSNA Pediatric Bone Age Challenge (2017) (RSNA Pediatric Bone Age Challenge in Stanford medicine, dataset from https://www.kaggle.com/kmader/rsna-bone-age (2017) [1]) to detect the bone age of a patient from an X-ray, where the maximum age is restricted to 228 months. In order to achieve this purpose, mean squared error (MSE) was used for backpropagation. The 20 most significant outputs were taken from the network to address the problem of diminishing gradients. The results were validated to check whether the network is biased toward a particular age range, which could be a characteristic of running continuous data on an architecture that supports the classification of only discrete data. Since the validation held true, one could infer that this network could be more suitable for continuous data than the original capsule network.

Keywords Capsule network · Pediatric bone age · Bone age detection · Continuous data · Convolutional neural network · Neural network · Mean squared error

1 Introduction

Convolutional neural network (CNN) is a state-of-the-art method that has been rele-
vant for quite a long time in today’s image processing field. It tries to identify an
object by identifying its subcomponents, which are further identified by key points.
It has been widely used as it is known to yield very good accuracy.

A. Koppar · S. Kailasam (B) · M. Varun · I. Hiremath


Computer Science Engineering Department, Engineering Department, PES University, 100 Feet
Ring Road, Banashankari Stage III, Dwaraka Nagar, Banashankari, Bengaluru, Karnataka
560085, India
e-mail: k.siddharthm13@gmail.com


However, CNN has its own limitations. One major limitation is that it does not
consider spatial information when it tries to detect the object. When a CNN identifies
a key point somewhere, it is always considered as a match irrespective of where it
found the key point and in what direction it was found. This could hence lead to a
few misclassifications.
Capsule network [2] tries to address this particular issue. It obtains vectors as
an output from each neuron instead of a single intensity value. The neuron then
communicates with vectors obtained from other neurons, before it decides on how
confidently it was a part of a bigger subcomponent. This way, it ensures that the
spatial orientations are taken into consideration.
Capsule network [2] was originally designed for discrete data. This paper attempts
to modify the architecture so that it could be used for continuous data. It tries to predict
the bone age of a patient based on an X-ray of the person’s wrist bone. It tests its
validity using the dataset provided by the Radiological Society of North America
(RSNA) called RSNA Pediatric Bone Age Challenge [1].

2 Literature Survey

The capsule network was introduced in the paper “Dynamic Routing Between
Capsules” by Hinton et al. [2]. It performed handwritten digit recognition on the
MNIST dataset. The input image is fed into a convolution layer with RELU acti-
vation function. The output was normalized by using a safe norm that scales the
length of the probability vectors between 0 and 1. Then the vector with the highest
estimated class probability was taken from the secondary capsule and the digit class
is predicted. The loss function consists of margin loss and reconstruction loss added
together with a learning rate alpha. The dataset was split into two parts, where 80% of
the dataset was used for training and 20% of the dataset was used for validation. The
model achieved a final accuracy of 99.4% on the validation dataset after 10 epochs.
This paper was hence ideal to classify discrete data, although it may not be suitable
for continuous data.
The paper “Capsule Networks and Face Recognition” by Chui et al. [3] talks
about performing the task of face recognition by using a capsule network. It used the
dataset Labelled Faces in the Wild (LFW) [4]. A total of 4324 images was sampled
from the dataset where 3459 were used as training images and 865 as testing images.
In another attempt to train the network, the dataset comprised 42 unique faces from
a collection of 2588 images with at least 25 faces for every unique person. The train-
test split is similar to the way mentioned above. The model achieved an accuracy of
93.7% on the whole test dataset.
The paper “Pediatric Bone Age Assessment Using Deep Convolutional Neural
Networks” by Iglovikov et al. [5] approaches the problem of pediatric bone age
assessment by using a deep convolution neural network where the CNN performs
the task of identifying the skeletal bone age from a given wrist bone X-ray image.
This paper implements a stack of VGG blocks. It used the exponential linear unit

(ELU) as the activation function. The output layer was appended with a softmax
layer comprising 240 classes as there are 240 bone ages which result in a vector of
probabilities. A dot product of the probabilities with the age was taken. The model
used mean squared error (MSE) as the loss function. This way, CNN was used to
predict continuous data.
The paper “the relationship between dental age, bone age and chronological age
in underweight children” by Kumar et al. [6] talks about the relationship between
dental age, bone age, and chronological age in underweight children. It was experi-
mentally proven that a normal female has a mean difference of 1.61 years between
the chronological age and the bone age and a mean difference of 1.05 years for males.
In addition, the study concludes by saying that bone age and chronological age have
a positive correlation with each other which are the maturity indicators of growth.
Therefore, any delay incurred between the bone age and the chronological age was a significant attribute in the sample of 100 underweight children.
The paper "Bone age assessment with various machine learning techniques" by Luiza et al. [7] discusses, among many other topics, the traditional approaches to assessing the bone age of an individual. Traditional methods include the Fels method, which is based on radio-opaque density, bony projection, shape changes and fusion. It also discusses the Fischman technique, which is based on the width of the diaphysis, the gap of the epiphysis, and the fusion of the epiphysis and diaphysis between the third and fifth fingers. However, manual methods are usually very time consuming and prone to errors as humans are involved.

3 Proposed Methodology

This paper proposes an approach based on capsule network [2] to detect the bone age.
Capsule network is an object recognition tool that is a modification of a convolutional
neural network (CNN) [8]. It imparts an additional property of making it robust to
spatial orientation. Capsule network follows a hierarchical approach for detection
just like a CNN. For example, in facial recognition of a three-layered network, the
first layer may detect the types of curves. The second layer may use the detected
curves to identify features such as an eye, a nose, or an ear. The third layer may use
these detected subparts and identify a face.
The difference lies in the fact that a CNN outputs a single value from each neuron
which represents the confidence of the particular shape. The confidence value may
or may not be on a scale of one. However, a capsule network outputs a vector which
not only represents the confidence, but also the direction in which the shape was
identified. The neurons in each layer communicate with each other to decide on the
magnitude of the vector. The vector is then passed to the next layer, which tries to
see the bigger picture of the image based on the output vectors from the previous
layer.

An issue with capsule network is that it has been designed to work only on discrete
data. This paper modifies it to detect continuous data, which is the bone age of the
wrist X-ray of the given patient.

4 About Capsule Network

Capsule network was proposed by Hinton et al. [2]. It consists of the following layers
1. Primary capsules—This is a set of convolutional layers that are applied to the
image. Each neuron represents a particular shape. The output from this layer is
a feature map from which n vectors of m dimensions are derived, where m and n
are constants depending on the architectural decision by the user. There is usually
more than one convolutional layer in the neural network.
2. Squash function—This acts as an activation layer and imparts nonlinearity to the
network so that it could effectively learn from state-of-the-art backpropagation
algorithms, which depend on nonlinearity. It is given by the formulae
v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|} \qquad (1)


3. Digit capsule—This is the output layer that gives the probability of occurrence
of each value. For example, in handwritten digit recognition, there are 10 digit
capsules as there are 10 outputs between 0 and 9. Similarly, this paper proposes
to use 228 digit capsules as the age range of the pediatric bone age dataset as
given by RSNA [1] is between 1 and 228 months
4. Decoder—This component tries to reconstruct the original image from the digit
capsule. This reconstructed image is used to calculate the reconstruction loss,
which is the loss of image data after it passes through the network.
One of the most important features in a capsule network is routing by agreement.
After obtaining the output from each convolutional layer in the primary capsules,
this operation is performed before the output goes to the next convolutional layer.
This enables communication across neurons to see if a feature identified by a neuron
has an equivalent feature identified by other neurons in the same layer.
Let the outputs of the lower capsule layer be u_1, u_2, u_3, …, u_n, let the outputs of the next layer be the m vectors v_1, v_2, v_3, …, v_m, and let the weights from u to v be W_{1,1}, W_{1,2}, …, W_{n,m}. The following prediction vectors are obtained:

\hat{u}_{1|1} = W_{1,1}\, u_1
\hat{u}_{2|1} = W_{1,2}\, u_1
\quad \vdots
\hat{u}_{m|n} = W_{n,m}\, u_n \qquad (2)

The network includes another set of values b1,1 , b1,2 … bn,m whose ultimate goal
is to indicate how the vector outputs of the neurons from the previous layer correlate
to the input of the neurons from the next layer based on other vector outputs from
the next layer. These are initialized to the same value at the beginning. The weights
c1,1 , c1,2 … cn,m are then calculated by applying a softmax function on the values
b1,1 , b1,2 … bn,m .
 
(c_{1,1}, c_{1,2}, \ldots, c_{1,m}) = \mathrm{softmax}(b_{1,1}, b_{1,2}, \ldots, b_{1,m})
(c_{2,1}, c_{2,2}, \ldots, c_{2,m}) = \mathrm{softmax}(b_{2,1}, b_{2,2}, \ldots, b_{2,m})
\quad \vdots
(c_{n,1}, c_{n,2}, \ldots, c_{n,m}) = \mathrm{softmax}(b_{n,1}, b_{n,2}, \ldots, b_{n,m}) \qquad (3)

The values v_1, v_2, v_3, …, v_m are then calculated using the following formula

v_j = \mathrm{squash}\left( \sum_i c_{i,j}\, \hat{u}_{j|i} \right) \qquad (4)

Following this, b_{i,j} is updated using the following formula

b^{\mathrm{new}}_{i,j} = b_{i,j} + \hat{u}_{j|i} \cdot v_j \qquad (5)

The term \hat{u}_{j|i} \cdot v_j indicates how strongly v_j agrees with \hat{u}_{j|i}. The routing is then run again with the new b_{i,j}. This is done for a fixed number of iterations, so that the final v_j appears as if all the neurons have communicated with each other to decide the final output vector.
The routing by agreement algorithm of a capsule network is given in Fig. 1.

Fig. 1 Routing by agreement algorithm in a capsule network
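A compact numpy sketch of the squash function (Eq. 1) and the routing-by-agreement loop (Eqs. 2 to 5) is given below; the capsule dimensions, the number of routing iterations, and the random prediction vectors are illustrative assumptions rather than the authors' exact implementation.

import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Scale vector s to length ||s||^2 / (1 + ||s||^2), preserving its direction (Eq. 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def routing_by_agreement(u_hat, iterations=3):
    """u_hat: prediction vectors of shape (n_in, n_out, dim), i.e. W_{i,j} u_i from Eq. (2)."""
    n_in, n_out, dim = u_hat.shape
    b = np.zeros((n_in, n_out))                                    # routing logits, initialised equal
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)       # softmax over output capsules (Eq. 3)
        s = (c[:, :, None] * u_hat).sum(axis=0)                    # weighted sum over input capsules
        v = squash(s)                                              # output vectors v_j (Eq. 4)
        b = b + (u_hat * v[None, :, :]).sum(axis=-1)               # agreement update (Eq. 5)
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))        # 8 input capsules, 3 output capsules, 4-D vectors
print(routing_by_agreement(u_hat).shape)  # (3, 4)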



5 Proposed Preprocessing

Before the image is fed to the neural network, it is always important to ensure that
it is in its best form and could be easily understood by the neural network. The
preprocessing in this paper has 3 main goals
• To identify the edges in the image.
• To remove the background as much as possible with minimal loss of data.
• To highlight the edges of the wrist bone.
Before the goals were achieved through various preprocessing techniques, the
image was first resized to a standard size of 1000 * 1000 to ensure that the effect of
any technique applied on the image was similar for every image.
The edges were then identified using adaptive thresholding. This method iden-
tifies the areas of interest based on the intensity of the neighboring pixels. In
order to strengthen the working of adaptive thresholding, contrast enhancement was
performed, so as to widen the intensity difference between the pixels. This was
followed by smoothing, using a Gaussian filter to remove noise and hence reduce
the chance of salt and pepper noise in the output of adaptive thresholding.
Once all the edges are obtained, the next aim is to ensure that the background edges, such as the frame of the X-ray, are removed as much as possible from the image, so that the network can focus on the wrist bone features. These were removed by applying a closing filter on the image using kernels with long horizontal and vertical lines. To cope with real-world data, random white spots were added to these kernels. These kernels were applied 10 times on the image, each time with white spots at different places.
Following this, the image was converted to grayscale using a Gaussian filter, which can produce intermediate values depending on the surroundings. Color inversion was also performed for human convenience when evaluating the quality of the output, as humans are generally more accustomed to seeing X-rays as white bone on a black background; hence, one can genuinely see whether the quality has improved. Following this, the contrast is enhanced to ensure a maximum difference between pixels.
The image was then sharpened two times to make the edges glow. In between, the image was smoothed using an edge-preserving filter. An edge-preserving filter is a recent smoothing tool that smoothes pixels while identifying and preserving edges instead of blurring across them; hence, it is ideal in this case.
Once the image runs through this pipeline, it is ready to be used by the neural
network (Fig. 2).
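A rough OpenCV sketch of this pipeline is given below. The file path, kernel sizes, and filter parameters are illustrative assumptions, and the randomized white spots added to the closing kernels are omitted for brevity.

import cv2
import numpy as np

img = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE)          # assumed input file
img = cv2.resize(img, (1000, 1000))                          # standard size

# Contrast enhancement, smoothing, then adaptive thresholding to pick out edges
img_eq = cv2.equalizeHist(img)
img_blur = cv2.GaussianBlur(img_eq, (5, 5), 0)
edges = cv2.adaptiveThreshold(img_blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                              cv2.THRESH_BINARY, 11, 2)

# Closing with long horizontal and vertical line kernels to suppress the X-ray frame
h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (101, 1))
v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 101))
edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, h_kernel)
edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, v_kernel)

# Invert for readability, smooth with an edge-preserving filter, then sharpen
inverted = cv2.bitwise_not(edges)
smooth = cv2.edgePreservingFilter(cv2.cvtColor(inverted, cv2.COLOR_GRAY2BGR))
sharpen_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
result = cv2.filter2D(smooth, -1, sharpen_kernel)

cv2.imwrite("xray_preprocessed.png", result)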

6 Neural Network

The neural network is based on capsule network architecture. Capsule network [2]
is a modification of the convolutional neural network [8] that imparts the property of considering the spatial orientation of the image in addition to the properties provided by a CNN [8].

Fig. 2 Preprocessing architecture
However, capsule network has been designed to classify discrete data. This paper
uses it to predict continuous data. It tries to make sure that the accuracy is not biased
to a particular age range as is usually the case when a network that classifies discrete
data is applied on continuous data. The original capsule network architecture has
two loss functions—margin loss and reconstruction loss.
The margin loss is given by the following formula

L_k = T_k\, \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\, \max(0,\, \|v_k\| - m^-)^2 \qquad (6)

where

L_k is the margin loss,

T_k = 1 if k is the correct class and T_k = 0 if k is incorrect,

m^+ = 0.9, m^- = 0.1, and

λ is a constant

The reconstruction loss is a function that indicates how well it has coded the image
to represent the original image. It is based on the output obtained from the decoder.
Final loss function is given by

Final loss = margin loss + α ∗ reconstruction loss (7)

where alpha was the learning constant and was taken as 0.0005 in the original capsule
network architecture [2].
In the network, the margin loss function tries to ensure that the network gets as
close as possible to the original distribution, while the reconstruction loss tries to
ensure that the final layer represents as much information of the original image as
possible.
In discrete data, like handwriting recognition, when an image of the digit 3 is given, recognizing the digit as 4 is just as wrong as recognizing it as 5. However, in continuous data, if the original age is 15 months, predicting it as 18 months is much better than predicting it as 30 months. It is hence clear that the goal of the network on continuous data is to get as close to the value as possible, while in the case of discrete data, if it cannot reach the exact value, it does not matter what value is predicted.
Let us now examine the margin loss function. Let the correct value be k. Consider three points k − α, k − (α − γ), and k + α, where 0 < γ < α < k. It is known that most backpropagation algorithms update the network based on the loss function. Hence, for a particular iteration, when all other coefficients are constant, if the loss function varies, then backpropagation takes different steps. However, if the loss function is the same, the neural network propagates by the same step in the same direction.
Case 1—The Predicted Value for the Iteration is (k − α)

T_k = 0 as the prediction is incorrect. From Eq. (6),

L_k = T_k\, \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\, \max(0,\, \|v_k\| - m^-)^2
    = 0 \cdot \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - 0)\, \max(0,\, \|v_k\| - m^-)^2
    = \lambda\, \max(0,\, \|v_k\| - 0.1)^2 \qquad (8)

Case 2—The Predicted Value for the Iteration is (k − (α − γ))

T_k = 0 as the prediction is incorrect. From Eq. (6),

L_k = T_k\, \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\, \max(0,\, \|v_k\| - m^-)^2
    = 0 \cdot \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - 0)\, \max(0,\, \|v_k\| - m^-)^2
    = \lambda\, \max(0,\, \|v_k\| - 0.1)^2 \qquad (9)

Case 3—The Predicted Value for the Iteration is (k + α)

T_k = 0 as the prediction is incorrect. From Eq. (6),

L_k = T_k\, \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\, \max(0,\, \|v_k\| - m^-)^2
    = 0 \cdot \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - 0)\, \max(0,\, \|v_k\| - m^-)^2
    = \lambda\, \max(0,\, \|v_k\| - 0.1)^2 \qquad (10)

From Eqs. (8), (9), and (10), it is evident that the step is taken in the same direction and with the same magnitude irrespective of how far away, or in which direction, the correct value lies. Hence, convergence to any minimum depends on the order in which the data is fed to the network. The problem here is evident: although the margin loss function indicates whether the value is correct or incorrect, it does not indicate how close the prediction is to the actual value. Hence, the capsule network as proposed by Hinton et al. [2] is not suitable for continuous data and needs modifications for usage on continuous data.
For this purpose, this paper uses the mean squared error as the loss function, scaled to a range of 3, using the following formulae

y\_norm_k = \frac{y_k}{228} \times 3

y\_pred\_norm_k = \frac{y\_pred_k}{228} \times 3

L_k = (y\_norm_k - y\_pred\_norm_k)^2 \qquad (11)

where y_k is the actual age, y\_pred_k is the predicted age, and 228 is the highest expected age. This is then added to the reconstruction loss using Eq. (7).
There are 228 output layers called digit capsules in the network, with each layer
representing the confidence value of the respective output from 1 to 228 months.
These were made into probabilities by passing them to a softmax layer. From here,
20 highest probabilities were taken and scaled such that they add up to one. This was done by dividing each value by the sum of these 20 probabilities, which can be denoted as

P'_i = \frac{P_i}{\sum_{j \in \text{top-20}} P_j} \qquad (12)
where

P'_i is the updated probability at age i,
P_i is the initial probability at age i, and
i ranges over the 20 selected ages (a subset of all j values).
These probability values were then multiplied by the ages they represent and added together to obtain the predicted age.
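A small numpy sketch of this prediction step and the scaled MSE loss of Eq. (11) is shown below; the capsule activations and the ground-truth age are random or assumed placeholders for illustration.

import numpy as np

rng = np.random.default_rng(0)
capsule_scores = rng.random(228)                 # one confidence value per age (1..228 months)

probs = np.exp(capsule_scores) / np.exp(capsule_scores).sum()   # softmax over the 228 outputs

top20 = np.argsort(probs)[-20:]                  # indices of the 20 most significant outputs
top_probs = probs[top20] / probs[top20].sum()    # rescale so the 20 probabilities sum to 1 (Eq. 12)

ages = np.arange(1, 229)                         # ages represented by the digit capsules
predicted_age = float((top_probs * ages[top20]).sum())

actual_age = 120.0                               # assumed ground-truth bone age in months
loss = ((actual_age / 228) * 3 - (predicted_age / 228) * 3) ** 2   # scaled MSE (Eq. 11)
print(predicted_age, loss)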
This paper proposes to take the top 20 outputs instead of all the probabilities
in order to address the problem of vanishing gradients during the initial phase of
training that eventually leads to network collapse. At the beginning of the training,
due to a large number of output neurons with Gaussian initialization, the probabilities
are almost equal to each other. Hence, it outputs the same value for every image.
Later, when the neural network begins to alter weights during backpropagation, it
still continues to output the same values as there are too many weights to alter. In
the end, because multiple alterations do not affect the result, the network collapses
and stops learning.
In order to make sure that these values were not subjected to excessive variance,
each batch of size 3 was given 3 consecutive tries, which tried to make the image
get as close as possible to the actual distribution. The value 3 was obtained using
parameter tuning.
When top 20 probabilities are taken, it is made sure that each time different digit
capsules are taken, thus resulting in different values based on the image. The top 20
probabilities represent 20 most significant outputs of the distribution of the neural
network X  and should effectively represent most of X  . It is expected that the neural
network learns such that the top one or two values are much higher than the rest.
Hence, this setup is expected to work well in the long run too.
Another modification made to the network was to change the ReLU [9] activations in the convolution layers to Leaky ReLU [10]. This helped to solve the problem of "dying ReLU", where once a neuron's output reaches zero, ReLU [9] provides no gradient through which its weights could be altered again, which implies the neuron has effectively died.
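For clarity, the substituted activation can be sketched as below (the slope of 0.01 is a common default and an assumption here, not a value stated by the authors):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # A small negative slope keeps a non-zero gradient for x < 0,
    # so a "dead" neuron can still recover, unlike plain ReLU.
    return np.where(x > 0, x, alpha * x)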
On using four layers in the network, as proposed by Hinton et al. [2], there are
too many weights in the neural network. This introduces the possibility of exploding
gradients. Hence, this paper proposes to use only two layers in order to address this
problem.
To summarize, the following modifications are proposed to the capsule network
to make sure it handles continuous data
1. The backpropagation was done using mean squared error (MSE) scaled to three in
place of margin loss. This makes the model try to minimize the distance between
the predicted value and the actual value instead of trying to focus on getting the
exact value. The reconstruction loss was still added to this network.
2. The values from the 228 digit capsules were passed through the softmax layer
and the probabilities were obtained. Following this, top 20 values were taken and
were scaled such that these 20 probabilities add up to 1. The top 20 probabilities
were multiplied with their respective values.

Fig. 3 Size of filters in neural network

3. In order to make sure that these values were not subjected to excessive variance,
each batch of size 3 was given 3 consecutive tries, for the neural network to get
as close as possible to the actual distribution.
4. The ReLU [9] was changed to Leaky ReLU [10] to address the problem of “dying
ReLU”
5. To address the problem of exploding gradients, only two layers were used.
The specifications of the filter size are given in Fig. 3.

7 Convergence

Here are 2 prerequisite theorems to prove the convergence


Theorem 1 A function is convergent to a minimum if it is convex. In other words, a function f is convergent if

f(x + x') ≥ f(x) + f(x')   (13)

Theorem 2 A composite function f(g(x)) is convex if f(x) and g(x) are convex and the range of g(x) is within the domain of f(x).

Let the loss function be L(x) and the optimizer be O(x)


The capsule network uses Adam optimizer [11] to optimize the loss function. It
has been proven by Diederik P. Kingma and Jimmy Lei Ba that Adam optimizer is
a convex function. Hence, O(X) is a convex function.
Let the correct value be x. Consider two points x + y and x + z, where

0 < y < z   (14)

L(x, x + y) = y²   (15)

L(x, x + z) = z²   (16)

From Eqs. (14) to (16), it could be inferred that

L(x, x + y) > L(x, x + z) when y > z   (17)

For example, with y = 2 and z = 3, Eq. (15) gives L(x, x + y) = 4 while Eq. (16) gives L(x, x + z) = 9, so the loss grows monotonically with the distance from the correct value.

Hence, it is proven that L(x, x') is convex in (x − x'), where x' is a sample from the neural network distribution corresponding to the sample x from the original dataset.

Both L(x, x') and O(x) operate on real numbers. L(x, x') and O(x) are known functions built from multiplication, addition, and subtraction on x and x'. Since the real numbers are closed under addition, subtraction, and multiplication, it could be inferred that the range of L(x, x') is within the domain of O(x).

Since L(x, x') is convex in (x − x'), O(x) is convex in x, and the range of L(x, x') belongs to the domain of O(x), it could be inferred that O(L(x, x')) is convex in (x − x') using Theorem 2.

Since O(L(x, x')) is convex in (x − x'), it could be inferred that (x − x') is convergent to a minimum using Theorem 1.

Hence, backpropagation tries to pull the distribution of the neural network X' toward X + mu, where X is the distribution of the originally provided dataset and mu is the convergence point of the function.
Now let us see how the network works when top k outputs are considered.
Let the distribution of the original dataset be X and the distribution of the neural
network be X  . Let the loss function be expected to converge at X + mu.
Consider 2 cases
1. When X  has reached close to X + mu
2. When X  is far away from X + mu

Case 1 (When X' has reached close to X + mu)

The significance of a neuron's output decreases as its output probability decreases. Hence, for an optimal value of k,

significance of the top k neurons >> significance of the other neurons.

Hence, when the top k values are taken, X' has been effectively sampled. X' is therefore still close to X + mu, and the change made to the distribution X' by backpropagation is not major, as it should rightfully be.
Case 2 (When X' is far away from X + mu)
There are two sub-cases here:
1. When the top k neurons taken do not change in the next iteration, the probability of the appropriate neurons is still reduced, as X' is far away from X + mu.
2. When the top k neurons taken do change in the next iteration:
• If the top k significant neurons of X' obtained are such that X' is closer to X + mu, it is converging.
• If the next highest probability is farther from X + mu, then the neuron is propagated with a large loss in the next iteration. Hence, the most significant outputs of X' are propagated with a bigger loss function in the next trial, or when a similar sample is taken again.

8 Dataset Used

The dataset used was the RSNA Pediatric Bone Age Challenge (2017) [1]. It was developed by Stanford University and the University of Colorado and was annotated by multiple expert observers. The dataset contains wrist bone X-ray images in multiple orientations, captured using multiple X-ray scanners, each resulting in a different X-ray texture.

9 Results

The experiments were conducted on Google Cloud on a TensorFlow VM. The system was configured with two virtual CPUs, 96 GB of RAM, and a Tesla P100 GPU to support the comprehensive computations.
The training set was split into 9000 images for training and 3000 images for
validation. The results of the model were analyzed on the validation dataset.
Figure 4 was plotted on the results obtained when all the 228 output layers were
taken into consideration instead of the top 20 probabilities. From Fig. 4, one can
observe that the algorithm outputs a constant value within a very narrow range of
113–114 months for random samples. This happens because of vanishing gradients,
as a large number of weights are learned. Hence, it is justified why top 20 probabilities
are taken and scaled to 1 instead of taking all 228 probabilities.
Figure 5 depicts the parameter tuning performed to identify the optimal number of trials to be given for the network to come close to the actual distribution with a batch of three images. It can be seen that the lowest point in the graph is obtained at three trials, which is therefore best for training. In the same graph, the "not a number

Fig. 4 Scatterplot of a random sample predicted bone age samples for first 100 iterations when all
228 probabilities were taken

Fig. 5 Average MSE for number of trials with a batch size of three images

(NaN)" values obtained for one trial and two trials also show why the network needs to be given a few trials to get close to the actual distribution.
Figures 6 and 7 and the images following it are the results plotted on the validation
set after training with the proposed methodology. One can observe in Figs. 6 and 7
that the deviation is unbiased to any particular age range in general. In other words,
it could be observed that the ratio of the number of patients with age deviation >15
and the number of patients with age deviation <15 is somewhat constant across most
age ranges.
One can observe from Fig. 8 that the actual age and the predicted age have a positive correlation. This further confirms that the network has effectively been adapted for continuous bone age data.

Fig. 6 Actual age in months of individuals having an age deviation of less than or equal to 15 months

Fig. 7 Actual age in months of individuals having an age deviation greater than 15 months

Fig. 8 Scatterplot between actual age and predicted age



10 Inference

Hence, the above results show that the predictions are not biased toward any particular age range in general. Therefore, this network could be more suitable for continuous data than the capsule network proposed by Hinton et al. [2] [11–15].

References

1. Stanford Medicine (2017) RSNA pediatric bone age challenge. Dataset from https://www.kag
gle.com/kmader/rsna-bone-age
2. Hinton GE, Sabour S, Frosst N (2017) Dynamic routing between capsules. Code from https://
github.com/ageron/handson-ml/blob/master/extra_capsnets.ipynb
3. Chui A, Patnaik A, Ramesh K, Wang L. Capsule Networks and Face Recognition
4. Labelled faces in the wild (LFW). University of Massachusetts, https://vis-www.cs.umass.edu/
lfw/
5. Iglovikov V, Rakhlin A et al (2018) Pediatric bone age assessment using deep convolutional
neural networks. Published by DLMIA, 19 June 2018. In: Ahmad F, Najam A, Ahmed Z (2018)
Image-based face detection and recognition. Published by IJCSI, 26 Feb 2013. University
of Waterloo, https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=
rja&uact=8&ved=2ahUKEwjAu_zVgoXuAhXS4jgGHQi0B8YQFjAAegQIARAC&url=
https%3A%2F%2Flindawangg.github.io%2Fprojects%2Fcapsnet.pdf&usg=AOvVaw1GB
oE1a_eSnUMnkLQpUdKE
6. Kumar V, Venkataraghavan K, Krishnan R (2013) The relationship between dental age, bone
age and chronological age in underweight children. US National Library of Medicine National
Institutes of Health
7. Dallora AL, Anderberg P, Kvist O, Mendes E, Ruiz SD, Berglund JS (2019) Bone age
assessment with various machine learning techniques: a systematic literature review and
meta-analysis. Published by Plos one, 25 July 2019
8. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional
neural networks, 1097–1105
9. Agarap AFM (2019) Deep learning using rectified linear units (ReLU)
10. Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in
convolutional network
11. Kingma DP, Ba JL (2017) Adam: a method for stochastic optimization
12. Vijayakumar T (2019) Comparative study of capsule neural network in various applications. J Artif Intell 1(01):19–27
13. Patrick MK, Adekoya AF, Mighty AA, Edward BY (2019) Capsule networks—a survey
14. Wang Y, Huang L, Jiang S, Wang Y, Zou J, Fu H, Yang S (2020) Capsule networks showed
excellent performance in the classification of hERG blockers/nonblockers
15. Mughal AM, Hassan N, Ahmed A (2014) Bone age assessment methods: a critical review
Design High-Frequency and Low-Power
2-D DWT Based on 9/7 and 5/3
Coefficient Using Complex Multiplier

Satyendra Tripathi, Bharat Mishra, and Ashutosh Kumar Singh

Abstract The distinct 1-D and 2-D discrete wavelet transform (DWT) architectures that exist in the literature include systolic, parallel-filter, folded, flipping, and iterative structures. The designs differ with respect to the computation and hardware required and the memory needed to store the input image and the intermediate coefficients. The main focus of this investigation is to design efficient VLSI structures for the hardware implementation of the 9/7 and 5/3 DWT using a complex multiplier (CM), improving the speed and hardware complexity of existing designs.

Keywords 1-D DWT · 2-D DWT · CM

1 Introduction

Some of the highlights of the DWT are adaptable time-frequency windows and lower aliasing.
Of late, there have been significant efforts by industry and researchers to improve computed tomography (CT) imaging systems in terms of new processing and reconstruction algorithms. CT scanner development has paid broad attention to improving both hardware and software so that high-quality CT images can be produced. The quality and speed with which they are created are partly a result of improvements in efficient image reconstruction algorithms [1].
This field is still evolving, and new algorithms are now in use that adapt to a variety of issues such as modeling and handling the projection noise, low detector counts, non-uniform arrangements of the sensors, scatter, etc.
Two important classes of image reconstruction algorithms are analytical and iterative estimation. An analytical model is computationally efficient and fast

S. Tripathi (B) · B. Mishra


MGCGV, Chitrakoot, India
e-mail: satyendra.mgcgv16@gmail.com
A. K. Singh
IIT-A, Prayagraj, India


with a few assumptions about the scanner geometry and the raw data, for instance, the consistency of the projections and the noise statistics, etc. [2, 3].
To achieve better image quality from the same raw data, more realistic assumptions about the scanner geometry and the noise statistics must be made. This is done in the more computationally complex iterative reconstruction methods. Such iterative reconstruction techniques may result in longer reconstruction times, but also in considerably less image noise from the same raw data, through a more detailed modeling of the detector response and the statistical behavior of the measurements. Iterative reconstruction is much more capable than analytical computation, and hence iterative reconstruction is considered in this work. Nowadays, iterative reconstruction plays a genuine role in computed tomography to improve image quality and reduce the appearance of artifacts. Accordingly, a lot of research work has been carried out to improve the reconstructed image in both visual and error assessment [4].
The DWT is the best-established methodology in the fields of image compression and image coding. The Joint Photographic Experts Group (JPEG) standard is the primary method for image compression. The coding efficiency and picture quality of the DWT are better when compared with the standard DCT. JPEG 2000 adopted the irreversible form of the discrete wavelet transform for efficient image compression. Digital imaging is one of the standard requirements for both real-time applications and research areas. The requirement for image compression is generally high because of the traffic generated by multimedia sources. The one-dimensional and two-dimensional discrete wavelet transforms are the key functions for image processing. Multi-resolution signal analysis is accomplished in both the time and frequency domains by the DWT. The DWT is extensively used for image compression in JPEG 2000 as a result of its time and frequency properties [5].
Image reconstruction is described as the process of combining two-dimensional projections into an image by examining the state of the image. Image reconstruction is widely used in various applications such as medicine, autonomous systems, and gaming. In the DWT, a set of wavelet functions is used for the compression, noise reduction, and reconstruction process. In general, all communication channels carry random noise because of their characteristics, and these channels are affected by poor correlation from the source of the channel. The image reconstruction is performed by up-sampling followed by digital filtering [6].
Multi-resolution wavelet reconstruction is the traditional reconstruction technique. The essential drawback of the conventional approach is the very high hardware requirement to store the intermediate values. The computational delay of the conventional approach is also significant. To overcome these concerns, the multiband wavelet transform is used for the image reconstruction process. By using the proposed multiband wavelet transform, the frequency overlap of the hardware is reduced. Summation filters are used to assemble the reconstruction block. The image contrast and strength are better in the multiband wavelet transform when compared with the standard multiresolution wavelet transform [7, 8].

2 CM

The structure of the N × N CM is shown in Fig. 1. Rr and Ri form the first input, Ir and Ii form the second input, and Or and Oi form the output of the CM; the subscripts r and i denote the real and imaginary parts, respectively.
The CM consists of four N × N multipliers, one KSA, and one subtractor, where N is the number of bits. The CM plays an important role in different fields, e.g., wireless communication and speech processing.

R = Rr + j·Ri
I = Ir + j·Ii

When R is multiplied by I, then

O = R × I
O = (Rr × Ir − Ri × Ii) + j·(Rr × Ii + Ri × Ir)
Or = Rr × Ir − Ri × Ii
Oi = Rr × Ii + Ri × Ir

Fig. 1 Structure of the N × N CM
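As a behavioural sketch of the structure in Fig. 1 (a software illustration only, not the VLSI implementation), the CM computes the real and imaginary outputs with four multiplications, one addition, and one subtraction:

def complex_multiply(rr, ri, ir, ii):
    """R = rr + j*ri, I = ir + j*ii; returns O = O_r + j*O_i."""
    p1 = rr * ir
    p2 = ri * ii
    p3 = rr * ii
    p4 = ri * ir
    o_r = p1 - p2   # subtractor path of Fig. 1
    o_i = p3 + p4   # adder (KSA block) path of Fig. 1
    return o_r, o_i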

3 DWT

The multiresolution analysis capability and the time-frequency localization properties of the DWT have established it as a powerful tool for different applications, for instance, signal analysis, image compression, and numerical analysis, as stated by Mallat. This has driven different research groups to develop algorithms and hardware models to implement the DWT.
In the standard convolution method for the DWT, several finite impulse response (FIR) filters are applied in parallel to determine the high-pass and low-pass filter coefficients. Mallat's pyramid algorithm can be used to compute the wavelet coefficients of a sample in a couple of spatial directions.
The designs are generally decomposed and can be broadly classified into serial and parallel structures as discussed in [7]. The architecture discussed implements the filter bank structure efficiently, using digit-serial pipelining. This architecture forms the basis for the hardware implementation of subband decomposition, using the convolution-based DWT for JPEG 2000. The general scheme by which the DWT decomposes the input image is shown below in Fig. 2.
Each decomposition level shown in Fig. 2 comprises two stages: stage 1 performs horizontal filtering, and stage 2 performs vertical filtering. In the first-level decomposition, the input image of size N by N is divided into four subbands L_L, H_H, L_H, and H_L, where L denotes low frequency and H denotes high frequency. The four subbands are of size N/2 by N/2. The L_L subband carries more information than the other subbands, because the L subband is the average value of the pixels while the H subband is the difference value of the pixels. The H_H subband carries the least information. All the subbands are derived as below:


x_LL^J(n1, n2) = Σ_{i1=0}^{K−1} Σ_{i2=0}^{K−1} h(i1) h(i2) x_LL^{J−1}(2n1 − i1, 2n2 − i2)

x_LH^J(n1, n2) = Σ_{i1=0}^{K−1} Σ_{i2=0}^{K−1} h(i1) g(i2) x_LL^{J−1}(2n1 − i1, 2n2 − i2)

x_HL^J(n1, n2) = Σ_{i1=0}^{K−1} Σ_{i2=0}^{K−1} g(i1) h(i2) x_LL^{J−1}(2n1 − i1, 2n2 − i2)

x_HH^J(n1, n2) = Σ_{i1=0}^{K−1} Σ_{i2=0}^{K−1} g(i1) g(i2) x_LL^{J−1}(2n1 − i1, 2n2 − i2)

Fig. 2 Three-level DWT of an image

where x_LL is the 2-D input (sub)image, J denotes the decomposition level, and h and g denote the low-pass and high-pass filter coefficients, respectively.
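A rough software sketch of one decomposition level under these equations is given below (illustration only, not the proposed hardware; the LeGall 5/3 analysis filters and the simplified boundary handling are assumptions):

import numpy as np

def analysis_1d(signal, h, g):
    # Low-pass and high-pass filtering followed by downsampling by 2.
    lo = np.convolve(signal, h, mode="same")[::2]
    hi = np.convolve(signal, g, mode="same")[::2]
    return lo, hi

def dwt2_one_level(image, h, g):
    # Separable 2-D DWT: filter the rows, then the columns of each result.
    L = np.array([analysis_1d(r, h, g)[0] for r in image])
    H = np.array([analysis_1d(r, h, g)[1] for r in image])
    bands = {}
    for name, band in (("L", L), ("H", H)):
        lo = np.array([analysis_1d(c, h, g)[0] for c in band.T]).T
        hi = np.array([analysis_1d(c, h, g)[1] for c in band.T]).T
        bands[name + "L"], bands[name + "H"] = lo, hi
    return bands  # keys "LL", "LH", "HL", "HH", each of size N/2 x N/2

# Example filters: LeGall 5/3 analysis low-pass h and high-pass g
h53 = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])
g53 = np.array([-1/2, 1.0, -1/2])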
Analytical and iterative reconstruction algorithms are the two philosophies in computed tomography for the assessment of image quality. An analytical model is one that attempts to find a direct solution for the image reconstruction from the unknown projections. The analytical approach is constrained by incomplete and sparse-view projections. In iterative reconstruction, the image estimate is progressively refined toward an improved solution. To support iterative image reconstruction algorithms, many methodologies have been introduced in the literature. Among these techniques, the projection-based strategy is an efficient and distortion-free method.

4 Proposed Methodology

In the DWT, the biorthogonal wavelets are realized by using the lifting scheme. The spatial domain and the lifting structure are used to create the lifting scheme. In the lifting scheme, three main steps are generally performed, that is, split, predict, and update. The input image samples x(n) are partitioned into odd and even samples in the split block. A filter is required for the odd and even samples to prevent undesired aliasing. The lifting scheme is performed based on the type of filter. The scaling step is used to find the low-pass subbands of the odd and even samples. The filtering is converted into a product of lattice (matrix) factorizations in the lifting scheme (Fig. 3).
The image compression is performed effectively by using the lifting scheme, and the hardware usage is significantly reduced by using the filters.
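For illustration, the split-predict-update steps of the integer LeGall 5/3 lifting scheme can be sketched as follows (a minimal example under standard lifting equations; boundary handling is simplified by symmetric extension and is an assumption here):

import numpy as np

def lifting_53_analysis(x):
    """One 1-D level of 5/3 lifting: split into even/odd samples,
    predict the detail d, then update the approximation s."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_r = np.append(even, even[-1])               # extend on the right
    d = odd - ((even_r[:-1] + even_r[1:]) >> 1)      # predict (high-pass detail)
    d_l = np.insert(d, 0, d[0])                      # extend on the left
    s = even + ((d_l[:-1] + d_l[1:] + 2) >> 2)       # update (low-pass approximation)
    return s, d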
Inner-product computation can be expressed with the complex multiplier. The DWT formulation using the convolution scheme can be expressed as an inner product, whereas the 1-D DWT formulation given in (1) and (2) cannot be expressed as an inner product.
Although the convolution-based DWT demands more arithmetic resources, the convolution DWT is expected to exploit the benefits of the CM-based design. The CM formulation of the convolution-based DWT using the 5/3 and 9/7 biorthogonal filters is introduced here (Fig. 4).
As per (5) and (6), the 5/3 wavelet filter computation in convolution form is expressed as

Fig. 3 Multiplier-based 9/7 coefficient-based 1-D DWT


Y_L = Σ_{i=0}^{4} h(i) X_n(i)

Y_H = Σ_{i=0}^{2} g(i) X_n(i)

where h(i) and g(i) denote the low-pass and high-pass 5/3 filter coefficients, Y_L and Y_H denote the low-pass and high-pass filter outputs, and the index i varies from 0 to 4 for the low-pass filter and from 0 to 2 for the high-pass filter.
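Assuming the standard LeGall 5/3 analysis coefficients (an assumption for illustration, since the paper does not list the numeric values), the two inner products above can be sketched as:

import numpy as np

h = np.array([-1/8, 1/4, 3/4, 1/4, -1/8])   # low-pass 5/3 analysis filter h(i), i = 0..4
g = np.array([-1/2, 1.0, -1/2])             # high-pass 5/3 analysis filter g(i), i = 0..2

def filter_window(x_n):
    """Y_L and Y_H for one window X_n of five consecutive input samples."""
    x_n = np.asarray(x_n, dtype=float)
    y_l = float(np.dot(h, x_n))        # Y_L = sum_{i=0..4} h(i) X_n(i)
    y_h = float(np.dot(g, x_n[:3]))    # Y_H = sum_{i=0..2} g(i) X_n(i)
    return y_l, y_h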

5 Simulation Result

The CM-based 5/3 and 9/7 2-D DWT designs were implemented in Xilinx software, version 14.2i. Xilinx works in two steps, i.e., the primary and secondary design. The primary design defines the I/O part of the system, and the second part defines the relation between the I/O parts. The primary and secondary designs of the 5/3 1-D DWT are shown in Figs. 5, 6 and 7.
The primary and secondary designs of the 9/7 1-D DWT are shown in Figs. 8 and 9 (Table 1).

Fig. 4 5/3 and 9/7 2-D DWT using the CM technique: the input X(n) is demultiplexed (DMUX) into a low-pass portion with CM (LPP_CM) and a high-pass portion with CM (HPP_CM), producing YL and YH; each of these is demultiplexed and filtered again to produce the subband outputs YLL, YLH, YHL, and YHH

6 Conclusion

It is concluded that the CM-based 2-D DWT provides better results compared to the previous designs. The comparison is based on delay, adder count, frequency, and net power (Figs. 10, 11 and 12).

Fig. 5 Primary delineate of 1-D DWT including 5/3 coefficient

Fig. 6 Secondary delineate of 1-D DWT including 5/3 coefficient



Fig. 7 Summary of 1-D DWT including 5/3 coefficient



Fig. 8 Primary delineate of 1-D DWT including 9/7 coefficient

Fig. 9 Secondary delineate of 1-D DWT including 9/7 coefficient

Table 1 Comparison result


Parameter Proposed design 2017 [1] 2018 [2] 2019 [3]
Delay 19.430 – 21.233 ns –
Adder 6 – – 7
Frequency 865.801 MHz 350 MHz 800.6 MHz –
Net power 62,061 NW 74,459.9 NW – –

Fig. 10 Summary of 1-D DWT including 9/7 coefficient



Fig. 11 Secondary delineate of 2-D DWT including 9/7 coefficient

Fig. 12 Secondary delineate of 2-D DWT including 9/7 coefficient

References

1. Gardezi SEI, Aziz F, Javed S, Younis CJ, Alam M, Massoud Y (2019) Design and VLSI imple-
mentation of CSD based DA architecture for 5/3 DWT. 978-1-5386-7729-2/19/$31.00©2019
IEEE
2. Mohamed Asan Basiri M, Noor Mahammad S (2018) An efficient VLSI architecture for convo-
lution based DWT Using MAC. In: 31th international conference on VLSI design and 2018,
17th international conference on embedded systems. IEEE
3. Chakraborty A, Chakraborty D, Banerjee A (2017) A memory efficient, high throughput
and fastest 1D/3D VLSI architecture for reconfigurable 9/7 & 5/3 DWT filters. In: Interna-
tional conference on current trends in computer, electrical, electronics and communication
(ICCTCEEC-2017)
4. Biswas R, Malreddy SR, Banerjee S (2017) A high precision-low area unified architecture for
lossy and lossless 3D multi-level discrete wavelet transform. Trans Circuits Syst Video Technol
45(5)

5. Bhairannawar SS, Kumar R (2016) FPGA implementation of face recognition system using
efficient 5/3 2D-lifting scheme. In: 2016 International conference on vlsi systems, architectures,
technology and applications (VLSI-SATA)
6. Martina M, Masera G, Roch MR, Piccinini G (2015) Result-biased distributed-arithmetic-based
filter architectures for approximately computing the DWT. IEEE Trans Circuits Syst I Regul
Pap 62(8)
7. Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representa-
tion. IEEE Trans Pattern Anal Mach Intell 110:674–693
8. Alam M, Rahman CA, Jullian G (2003) Efficient distributed arithmetic based DWT architec-
tures for multimedia applications. In: Proceedings of IEEE workshop on SoC for real-time
applications, pp 333–336
9. Zhao X, Vi Y, Erdogan AT, Arslan T (2000) A high-efficiency reconfigurable 2-D discrete
wavelet transform engine for JPEG 2000 implementation on next generation digital cameras.
978-1-4244-6683-2/10/$26.00 ©2010 IEEE
10. Fan X, Pang Z, Chen D, Tan HZ (2010) A pipeline architecture for 2-D lifting-based discrete
wavelet transform of JPEG2000. 978-1-4244-7874-3/10/$26.00 ©2010 IEEE
11. Baviskar A, Ashtekar S, Chintawar A, Baviskar J, Mulla A (2014) Performance analysis of sub-
band replacement DWT based image compression technique. 978-1-4799-5364-6/14/$31.00
©2014 IEEE
12. Deergha Rao K, Muralikrishna PV, Gangadhar C (2018) FPGA implementation of 32 bit
complex floating point multiplier using vedic real multipliers with minimum path delay. https://
doi.org/10.1109/UPCON.2018.8597031@2018 IEEE
13. Lian C, Chen K, Chen H, Chen L (2001) Lifting based discrete wavelet transform architecture
for JPEG 2000. In: Proceedings of IEEE international symposium on circuits systems, vol 2,
Sydney, Australia, pp 445–448, May 2001
14. Aziz F, Javed S, Gardezi SEI, Younis CJ, Alam M (2018) Design and implementation of
efficient DA architecture for LeGall 5/3 DWT. In: IEEE international symposium on recent
advances in electrical engineering (RAEE)
Fuzzy Expert System-Based Node Trust
Estimation in Wireless Sensor Networks

K. Selvakumar and L. Sai Ramesh

Abstract Wireless sensor networks are used in most recent real-time systems globally. In WSNs, many researchers feel they are close to the destination in all aspects of security and trust, because trust is the significant factor in implementing a secure network that transmits packets in a trusted way. Most of the models evaluate only the present trust value of the nodes and do not predict the upcoming changes in the trust factor of the nodes. This paper provides a new trust estimation model that evaluates the trust value of a node with a fuzzy expert system and predicts the changes that may occur in the future based on the inference mechanism. The proposed work also concentrates on energy efficiency, optimizing the node energy level even when the nodes undergo trustworthy data transmission. Further, the results obtained from the experiments show that the proposed model outperforms the existing ones in both parameters, trust and energy efficiency.

Keywords Trust model · WSNs · Malicious node · Fuzzy expert systems ·


Fuzzification · Fuzzy rules

1 Introduction

WSNs comprise a huge number of sensor nodes that are tiny and have restricted processing capability and energy supply [1]. A node performs multiple roles, like server, router, etc., in general during active mode, and in standby mode it also switches to processing mode while detecting events in the surrounding space, particularly in bursts of spurious packets generated by malicious nodes, which increase the network traffic as well as the energy consumption problem. Moreover, the presence of malicious
K. Selvakumar
Department of Computer Applications, NIT, Trichy, India
e-mail: kselvakumar@nitt.edu
L. Sai Ramesh (B)
Department of Information Science and Technology, CEG Campus, Anna University, Chennai,
India
e-mail: sairamesh.ist@gmail.com


nodes with a magnified tendency of malfunction would worsen the network perfor-
mance. Designing an optimal path-based routing and energy consumption system in
WSNs has been proposed previously without and with a fuzzy expert system. These
research works deal with the packet routing and energy saving by establishing a trust
model to measure the trust value of nodes in WSNs.
A fuzzy logic-based research model has been proposed to efficiently manage network traffic and reduce the packet transmission loss for prioritized event-driven traffic approaches [2]. An energy consumption-based QoS packet routing algorithm for WSNs was developed to run effectively with best-effort traffic [3]. However, these models attempt to cut down the packet transmission overhead based on the trust value of the nodes in a stable environment, without considering the current trust values of the untrusted nodes which try to increase the network traffic as well as the energy consumption.
The main aim of the trust estimation model is to predict trust values that are used
to depict the trustworthiness, reliableness, or competency of each node, with the aid
of some management techniques [4]. Hence, the estimated trust information is used
for the top-level layer to perform packet routing [5–7], data accumulation [8], and
energy optimization process [9–11]. There are a variety of trust management schemes
which are planned for WSNs [12–18]; however, most of them did not establish an
inexpensive trust management theme to precise the sound judgment, uncertainty, and
transitivity of trust characteristics in WSNs. This research article proposes a fuzzy
expert system (FES)-based node trust estimation with the help of a trust estimation
model to optimize the packet transmission as well as energy consumption.

2 Related Works

Trust is defined as the belief that the trusting agent has in the trusted agent's willingness and capability to deliver a mutually agreed service in a given context and in a given timeslot [19]. It conveys the degree to which an object or a process is conceived to be true. In recent years, research work on trust management systems has been attracting more attention, and some peer-to-peer trust control models have been introduced.
Trust control models are used to manage network security issues and also incorporate the support of encryption techniques with decision-making authorities [20]. Even though much significant research work has been done on trust estimation models by various researchers, their applicability to mobile ad-hoc systems as well as wireless sensor networks has remained an open research scope.
The trustworthy protocols have merged the ideas from two various broad spec-
trums: trust estimation model as well as packet routing protocols for both mobile
ad-hoc and wireless sensor networks. Beth et al. [21] suggested a trust control
model, which introduced the idea of conveying and calculating trust; the corresponding trust relations and equations were derived and combined. This model also splits trust into the following two major types, namely direct trust and recommendation trust, which are the key measures to represent the trust relationship between subject and object and from the recommending object to the subject. A trust control model proposed by Josang [22] is based on the subjective logic model, which introduced the evidence and opinion spaces to explain and compute the concept of trust relationships. This model outlined a collection of subjective logic operators for the derivation and intensive computation of trust worth. Nevertheless, mobile ad-hoc networks have some apparent underlying features, like restricted resources, ease of deployment, lack of a centralized dedicated server, varying topology, etc. As a result, authorized delegation mechanisms as well as public-key encryption do not seem to be appropriate for this environment. Therefore, the traditional trust management models are not suitable.
Hence, in this circumstance, many trust estimation models have been proposed in the networking domain. In the mobile ad-hoc environment, trust is often thought of as the reliance of a network node on the ability of other nodes to send packets or provide services in a timely, joint, and dependable manner [23]. In this research work, a novel adaptive and intelligent trust-based model has been built to estimate the trustworthiness of the nodes existing in the network topology in order to achieve secure packet transmission between sender and receiver nodes. Also, the fuzzy rule base used to classify the nodes' trust reduces the malicious behavior of those nodes, which leads to savings of energy in the network. Hence, energy saving is achieved, and the optimal path between the sender and receiver nodes in WSNs is also predicted.

3 Trust Construction Model

Trust manifests the assumption, assurance, or anticipation of the honesty, integrity, ability, availability, and quality of service of the target node's subsequent actions. It additionally reflects the reciprocal relationships wherein a given node acts in a trustful manner and holds authentic communications only with nodes that are highly believed by the committed node. This heuristic-based trust model consists of three different types of trust values: the basic trust value, the current trust value, and the route trust value, also called the path trust value.
The proposed fuzzy trust estimation model evaluates the initial trust value of all
the nodes in the network through the interaction with the neighbor nodes. Every node
is deployed with an intelligent agent within that to maintain the modified trust table
of them and kept the information about the trust value of their neighbors. The values
are changed according to the node energy level, packet transmission rate, and average
delay achieved after each packet transmission. The parameter values are not to be
the same for all transmissions. It will change according to the range of transmission
and number of neighbor nodes it approached to transmit the packet.
Two types of trust are considered in this model; one is the basic trust BT_ij, evaluated through the initial value of energy and the direct trust from neighbor nodes. The second one is the current trust CT_ij, calculated at a specific time interval or after each data transmission based on the parameters mentioned earlier. From these two trust

values, the present trust value NTVij is evaluated as mentioned in Eq. 1. The trust
values are updated at specific time intervals which are mentioned as t 1 and t 2 .

NTV(t)_ij = α·BT(t)_ij + β·CT(t)_ij,   t1 ≤ t ≤ t2   (1)

The weights α and β (α, β ≥ 0, α > β, and α + β = 1) are assigned to BT_ij and CT_ij. Now the basic trust is computed using the relation involving SE_m(i, j) as given in Eq. 2:

BT(t)_ij^{t_k} = ( Σ_{m=1}^{N_{t_k}} SE_m(i, j) ) / N_{t_k}   (2)

The current trust (CT) value estimated in this model is the trust value of the node
in the time interval between t and t + 1. This proposed trust model from this research
work is to compute the node’s entire trust value based on the fuzzy expert system
approach. In this article, the term current trust (CT) represents the node’s current
trust value as given in Eq. 3. Another factor that evolved using threshold value based
on trust is creditability. The node which is going to be a part of the transmission path
is based on the creditability value. The value of the creditability is high when it is
higher than the specified threshold value; otherwise, it will be considered as medium
or low. The creditability value will change dynamically based on the factors used for
evaluating the creditability.

CT(t)ir j = CC(t)ir × BT(t)r j t1 ≤ t ≤ NOW (3)

In this work, current trust (CT) is computed using the mathematical representation
given in Eq. 4.

CC(t)ir = BT(t)i1 × BT(t)12 × BT(t)23 × · · · × BT(r −1)r t1 ≤ t ≤ NOW (4)

If n nodes are present in the communication, having current trust values CT(t)_{iP_1 j}, CT(t)_{iP_2 j}, ..., CT(t)_{iP_n j}, then CT(t)_ij is computed from these n values as shown in Eq. 5:

CT(t)_ij = Σ_{k=1}^{n} W_{P_k} × CT(t)_{iP_k j}   (5)

This model which estimates the trust value of the node i on node j in time interval
t + 1 is represented as (T ij (t + 1)) which is derived with the help of both basic trust
of i on j at time t (BTij (t)) and current trust on j to i by few other nodes at the time
of t as (CTij (t)) as shown in Eq. 6 as follows

T_ij(t + 1) = α × BT_ij(t) + (1 − α) × CT_ij(t),   0 ≤ α ≤ 1, t1 ≤ t ≤ NOW   (6)
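A small Python sketch of Eqs. (4) and (6) is shown below (the function names and the example weight value are illustrative assumptions, not part of the paper):

def concatenated_creditability(basic_trusts_along_path):
    """CC(t)_ir of Eq. (4): product of the basic trust values along the path i -> r."""
    cc = 1.0
    for bt in basic_trusts_along_path:
        cc *= bt
    return cc

def overall_trust(bt_ij, ct_ij, alpha=0.6):
    """T_ij(t + 1) of Eq. (6): weighted combination of basic and current trust."""
    assert 0.0 <= alpha <= 1.0
    return alpha * bt_ij + (1.0 - alpha) * ct_ij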



Table 1 Range of fuzzy values for each input trust parameter: basic trust (BT), current trust (CT), and path trust (PT)

Linguistic variable    Fuzzy values       Symbol
Low                    0.0 ≤ z ≤ 0.4      LOW
Low medium             0.3 ≤ z ≤ 0.6      LM
Medium                 0.5 ≤ z ≤ 0.8      MED
High                   0.7 ≤ z ≤ 1.0      HGH

Fig. 1 Fuzzy membership function representation of the node’s basic trust (BT)

In this research work, the proposed model incorporates Gaussian fuzzifiers for estimating the membership values of the number of packets transmitted by each node using Eq. 7.

μ_Trust-value(X) = e^(−(X − c)² / (2σ²))   (7)

Based on the knowledge of domain experts, input parameters (low, low medium,
medium, and high), as well as output parameters (low, low medium, medium, and
high), are selected. The range of fuzzy value for each linguistic variable of the
trust-based parameter is shown in Table 1. The fuzzification process begins with the transformation of the given node-based trust parameters using the function represented in Eq. 7. The fuzzy membership representations of the node's basic trust and current trust are shown in Figs. 1 and 2, respectively.
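For illustration, the Gaussian fuzzifier of Eq. (7) and the linguistic ranges of Table 1 can be sketched as follows (the centre and width values in the example are assumptions, not values given in the paper):

import math

def gaussian_membership(x, c, sigma):
    """Gaussian membership function of Eq. (7)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def linguistic_terms(trust_value):
    """Map a trust value in [0, 1] to the (overlapping) terms of Table 1."""
    ranges = {"LOW": (0.0, 0.4), "LM": (0.3, 0.6), "MED": (0.5, 0.8), "HGH": (0.7, 1.0)}
    return [term for term, (lo, hi) in ranges.items() if lo <= trust_value <= hi]

# Example: membership of a node's basic trust 0.92 in a class centred at 0.85
mu = gaussian_membership(0.92, c=0.85, sigma=0.1)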

4 Results and Discussions

The proposed model combines both global and local trust optimization and provides an acceptable and accurate prediction of malicious nodes as well as path recommendation. The environment for this experiment is created using NS2.3.5. The simulation environment considered 25 nodes in an area of 500 × 500 m². The nodes are static, and each node has an equal initial energy of 1 J. The membership

Fig. 2 Fuzzy membership function representation of the node’s current trust (CT)

values were determined by using the Gaussian fuzzy membership function discussed in Sect. 3.
In the crisp set approach, the minimum threshold value is assumed to be 0.4. If the trust value is greater than the threshold value, it is represented as 0, i.e., a trusted node, and if it is less than the threshold value, it is represented as 1, i.e., an untrusted (malicious) node, in the crisp set (Table 2). Even though the crisp set value is exact, it does not convey anything about the range of the trust value. To overcome this, the dynamism of the trust value is handled by the fuzzy expert system approach, which corresponds to low, low medium, medium, and high.
Fuzzy expert system values are more informative than the crisp set values, which do not provide anything about the range of the trust value. With the help of the fuzzy expert system, a trusted path is established for transferring data from source to destination. Hence, the fuzzy expert system-based trust evaluation model gives better, more accurate, and more reliable results than existing approaches, as depicted in Fig. 3.

5 Conclusions and Future Work

In this research article, a new extension of a fuzzy expert system-based trust model through a heuristic approach is proposed to measure the trustworthiness of nodes. The newly proposed model furnishes a versatile and feasible approach to select a better node based on the trust constraints, and energy efficiency is achieved. The trust value includes both the initial and the current trust values, which makes the system efficiently identify the actual trust value before starting the packet transmission. Future work may consider alternative inputs to the trust model for enhancing the accuracy of trust evaluation.
Table 2 FES-based trust estimation of nodes and its trust classes
Node BT BT_MF BT_LTM CT CT_MF CT_LTM TV TV_MF TRUST_CLS CRS_CL RUL_NO
N1 0.98 0.681849 MED 0.804 0.941919 HGH 0.892 0.819655 MED 0 15
N2 0.984 0.671108 MED 0.84 0.877001 HGH 0.912 0.76934 MED 0 15
N3 0.966 0.718986 MED 0.891 0.755925 MED 0.9285 0.725373 MED 0 11
N4 0.795 0.999009 HGH 0.907 0.713284 MED 0.851 0.908152 MED 0 12
N5 0.961 0.732033 MED 0.873 0.80169 MED 0.917 0.756218 MED 0 11
N6 0.911 0.852302 HGH 0.852 0.851118 HGH 0.8815 0.844413 HGH 0 16
N7 0.754 0.990311 HGH 0.975 0.524318 LMD 0.8645 0.881575 MED 0 8
N8 0.372 0.17909 LOW 0.433 0.391793 LMD 0.4025 0.252918 LOW 0 5
N9 0.977 0.689871 MED 0.773 0.979677 HGH 0.875 0.859074 MED 0 15
N 10 0.925 0.820922 HGH 0.852 0.851118 HGH 0.8885 0.828048 HGH 0 16
N 11 0.333 0.12746 LOW 0.277 0.112574 LOW 0.305 0.107043 LOW 1 1
N 12 0.819 0.98847 HGH 0.73 0.999992 HGH 0.7745 0.996703 HGH 0 16
N 13 0.923 0.825533 HGH 0.846 0.864295 HGH 0.8845 0.83747 HGH 0 16


N 14 0.937 0.792451 MED 0.715 0.99786 HGH 0.826 0.949458 MED 0 15
N15 0.437 0.294878 LOW 0.361 0.235038 LOW 0.399 0.246115 LOW 1 1
N 16 0.652 0.836519 MED 0.497 0.562255 LMD 0.5745 0.694499 LMD 0 7
N 17 0.963 0.72683 MED 0.885 0.771475 MED 0.924 0.737544 MED 0 11
N18 0.652 0.836519 HGH 0.634 0.907792 HGH 0.643 0.867305 HGH 0 16
N19 0.628 0.779817 MED 0.702 0.99215 HGH 0.665 0.911407 MED 0 15
N 20 0.935 0.797287 MED 0.839 0.879071 HGH 0.887 0.831603 MED 0 15
N 21 0.872 0.926678 HGH 0.919 0.680443 MED 0.8955 0.81113 MED 0 12
N 22 0.337 0.132171 LOW 0.244 0.080896 LOW 0.2905 0.092537 LOW 1 1
N23 0.939 0.78758 MED 0.885 0.771475 MED 0.912 0.76934 MED 0 11
N 24 1 0.627781 LMD 0.967 0.546446 LMD 0.9835 0.571087 LMD 0 6
N 25 0.575 0.640932 LMD 0.528 0.649018 MED 0.5515 0.629911 LMD 0 10

Fig. 3 Successive process of summation of all outputs of node-based trust estimation

References

1. Forghani A, Rahmani AM (2008) Multi state fault tolerant topology control algorithm for
wireless sensor networks. future generation communication and networking. In: FGCN ‘08.
Second international conference, pp 433–436
2. Munir SA, Wen Bin Y, Biao R, Man M (2007) Fuzzy logic based congestion estimation for
QoS in wireless sensor network. In: Wireless communications and networking conference,
WCNC.IEEE, pp 4336–4346.
3. Akkaya K, Younis M (2003) An Energy-Aware QoS Routing Protocol for Wireless Sensor
Networks. Distributed Computing Systems Workshops, Proceedings. 23rd International
Conference. 710–715
4. Sun YL, Han Z, Liu KJR (2008) Defense of trust management vulnerabilities in distributed
networks. Commun Mag 46(4):112–119
5. Sathiyavathi V, Reshma R, Parvin SS, SaiRamesh L, Ayyasamy A (2019) Dynamic trust
based secure multipath routing for mobile Ad-Hoc networks. In: Intelligent communication
technologies and virtual mobile networks. Springer, Cham, pp 618–625
6. Selvakumar K, Ramesh LS, Kannan A (2016) Fuzzy Based node trust estimation in wireless
sensor networks. Asian J Inf Technol 15(5):951–954
7. Thangaramya K, Logambigai R, SaiRamesh L, Kulothungan K, Ganapathy AKS (2017) An
energy efficient clustering approach using spectral graph theory in wireless sensor networks.
In: 2017 Second international conference on recent trends and challenges in computational
models (ICRTCCM). IEEE, pp 126–129
8. Poolsappasit N, Madria S (2011) A secure data aggregation based trust management approach
for dealing with untrustworthy motes in sensor networks. In: Proceedings of the 40th
international conference on parallel processing (ICPP ’11), pp 138–147
9. Feng RJ, Che SY, Wang X (2012) A credible cluster-head election algorithm based on fuzzy
logic in wireless sensor networks. J Comput Inf Syst 8(15):6241–6248
10. Selvakumar K, Karuppiah M, SaiRamesh L, Islam SH, Hassan MM, Fortino G, Choo KKR
(2019) Intelligent temporal classification and fuzzy rough set-based feature selection algorithm
for intrusion detection system in WSNs. Inf Sci 497:77–90
11. Raj JS (2019) QoS optimization of energy efficient routing in IoT wireless sensor networks. J
ISMAC 1(01):12–23
12. Claycomb WR, Shin D (2011) A novel node level security policy framework for wireless sensor
networks. J Netw Comput Appl 34(1):418–428

13. Selvakumar K, Sairamesh L, Kannan A (2017) An intelligent energy aware secured algorithm
for routing in wireless sensor networks. Wireless Pers Commun 96(3):4781–4798
14. Feng R, Xu X, Zhou X, Wan J (2011) A trust evaluation algorithm for wireless sensor networks
based on node behaviors and D-S evidence theory. Sensors 11(2):1345–1360
15. Ganeriwal S, Balzano LK, Srivastava MB (2008) Reputation-based framework for high integrity
sensor networks. ACM Trans Sens Netw 4(3):1–37
16. Kamalanathan S, Lakshmanan SR, Arputharaj K (2017) Fuzzy-clustering-based intelligent
and secured energy-aware routing. In: Handbook of research on fuzzy and rough set theory in
organizational decision making. IGI Global, pp 24–37
17. Shaikh RA, Jameel H, d’Auriol BJ, Lee H, Lee S, Song YJ (2009) Group-based trust
management scheme for clustered wireless sensor networks. IEEE Trans Parallel Distrib Syst
20(11):1698–1712
18. Selvakumar K, Sairamesh L, Kannan A (2019) Wise intrusion detection system using fuzzy
rough set-based feature extraction and classification algorithms. Int J Oper Res 35(1):87–107
19. Chang EJ, Hussain FK, Dillon TS (2005) Fuzzy nature of trust and dynamic trust modeling in
service-oriented environments. In: Proceedings of workshop on secure web services, pp 75–83
20. Guo S, Yang O (2007) Energy-aware multicasting in wireless ad hoc networks: a survey and
discussion. Comput Commun 30(9):2129–2148
21. Beth T, Borcherding M, Klein B (1994) Valuation of trust in an open network. In: Proceedings
of ESORICS, pp 3–18
22. Josang A (2001) A logic for uncertain probabilities. Int J Uncertainty Fuzziness Knowl Based
Syst 9(3):279–311
23. Darney PE, Jacob IJ (2019) Performance enhancements of cognitive radio networks using the
improved fuzzy logic. J Soft Comput Paradigm (JSCP) 1(02):57–68
Artificial Neural Network-Based ECG
Signal Classification and the Cardiac
Arrhythmia Identification

M. Ramkumar, C. Ganesh Babu, G. S. Priyanka, B. Maruthi Shankar,


S. Gokul Kumar, and R. Sarath Kumar

Abstract Electrocardiogram is an essential tool to determine the clinical condition


of cardiac muscle. An immediate and the precise detection of cardiac arrhythmia is
highly preferred for aiding good and healthy life, and it leads to healthy survival for
the humans. In this study, utilizing MATLAB tools the feature extraction is made
by various statistical parameters from both normal and the abnormal categorization
of ECG signals. These features are inclusive of variance, arithmetic mean, kurtosis,
standard deviation, and skewness. The feature vector values reveal the informational
data with respect to a cardiac clinical health state. The focus on this study is made
by utilizing the classifier of artificial neural network in order to identify the ECG
abnormalities. Levenberg–Marquardt backpropagation neural network (LM-BPNN)
technique is being utilized for the cardiac arrhythmia classification. The ECG data
are extracted from the MIT-BIH cardiac arrhythmia database, and the data are tested
which is utilized for further classification of ECG arrhythmia. The comparison for the
results of classification is made in terms of accuracy, positive predictivity, sensitivity,
and specificity. The results of experimentation have been validated based on its
accuracy of classification through tenfold cross-validation technique. It has resulted

M. Ramkumar (B) · G. S. Priyanka · B. Maruthi Shankar · R. Sarath Kumar


Department of Electronics and Communication Engineering, Sri Krishna College of Engineering
and Technology, Coimbatore, India
e-mail: mramkumar0906@gmail.com
G. S. Priyanka
e-mail: priyu3025@gmail.com
B. Maruthi Shankar
e-mail: maruthishankar@gmail.com
R. Sarath Kumar
e-mail: sarathkumar@skcet.ac.in
C. Ganesh Babu
Bannari Amman Institute of Technology, Sathyamangalamn, India
e-mail: bits_babu@yahoo.co.in
S. Gokul Kumar
Department of Technical Supply Chain, Ros Tech (A & D), Bengaluru, Karnataka, India
e-mail: gokulkumar_ece@yahoo.co.in


in an average accuracy of 99.5% in predicting cardiac arrhythmias of different classes.

Keywords Artificial neural networks · Electrocardiogram · Feature extraction ·


Classification · MIT-BIH arrhythmia database · Levenberg–Marquardt (LM)
algorithm

1 Introduction

The heart triggers minute electrical impulses at the sinoatrial node, which spread through the heart's conduction system to produce the rhythmic contraction. These impulses are recorded by the ECG instrument by sticking surface electrodes onto the skin at various parts of the chest surrounding the cardiac muscle. The electrical tracings of the heart's activity are represented as the ECG waveform, and its spikes and dips indicate the condition of the cardiac muscle. The generation of a normal ECG waveform is shown in Fig. 1. An ECG waveform is a series of positive and negative waves which result from the various deflections in each section of the cardiac beat.
The tracing of a typical ECG signal consists of the P wave, the QRS complex, and the T wave for each cycle of a cardiac beat. The ECG detects the ion transfer through the myocardium, which varies in each heartbeat. The isoelectric line is the ECG signal's baseline voltage, which is traced following the T wave and preceding the successive P wave. The

Fig. 1 Generation of normal ECG Wave



Table 1 Normal ECG signal amplitudes and time duration


S. No. Parameters of ECG Typical amplitude (mV) and time duration (s)
1 P wave 0.25 mV
2 R wave 1.60 mV
3 Q wave 25% of R wave
4 T wave 0.1–0.5 mV
5 P-R interval 0.12–0.20 s
6 Q-T interval 0.35–0.44 s
7 S-T segment 0.05–0.15 s
8 P wave Interval 0.11 s
9 QRS interval 0.09 s

heart's upper chamber initiates the P wave. The P wave is the first wave to be generated, caused by the contraction of the heart's upper chambers, followed by a flat straight line while the electrical impulse travels to the lower chambers. As discussed, the contraction of the ventricles determines the QRS complex, and finally the T wave is produced as the ventricles rest. The periodic cycle of the heart's electrical activity is denoted by the P-QRS-T sequence. The normal values of the various ECG waveform parameters are represented in Table 1.
Different data mining and machine learning methods have been formulated for improving the accuracy of ECG arrhythmia detection. Due to the non-stationary and nonlinear nature of the ECG signal, nonlinear extraction methods are considered the best candidates for extracting information from the ECG signal [1]. Because the ANN is a pattern-matching method based on mapping nonlinear input-output data, it can be efficiently utilized for detecting morphological variations in nonlinear signals such as the ECG signal [2]. This study proposes the usage of a backpropagation neural network with the Levenberg–Marquardt (LM) technique for ECG signal classification, with data acquired from the MIT-BIH arrhythmia database.

2 Review of Literature

A few studies have presented the performance of neural network systems when utilized for detecting and recognizing abnormal ECG signals [3]. The utilization of neural network systems for analyzing the ECG signal produces a few advantages over several conventional techniques. The required transformations and clustering operations can be performed by the neural network simultaneously and automatically. The neural network is also capable of

recognizing nonlinear and complex groups in the hyperspace [4]. The capability of producing distinct classification results in various conventional applications gives neural network computational intelligence systems an advantage. However, little work has been dedicated to deriving better parameters for reducing the network size while maintaining good classification accuracy.
An artificial neural network model has been utilized for the prediction of coronary cardiac disease on the basis of risk factors comprising the T wave amplitude variation and the ST segment [5]. Two stages have been adopted in the neural network for the classification of the acquired input ECG waveform into four different types of beats, which aids the improvement of diagnostic accuracy [6]. The support vector machine (SVM) is one of the machine learning algorithms utilized for classification, which performs pattern recognition based on statistical learning theory [7]. The KNN method is an instance-based learning process, which is a widely utilized data mining technique for pattern recognition and classification problems [8].
Mode, median, standard deviation, and mean are represented as the first-order probabilistic features. Variance, skewness, and kurtosis denote the higher-order probabilistic features [9]. The standard deviation is the calculated measure for quantifying the total amount of dispersion or variation in a set of data values. Kurtosis is the measurement of whether the data are flat or peaked in relation to the normal distribution. Data with a high value of kurtosis are assumed to possess a distinct peak close to the mean, decline rapidly, and have heavy tails [10]. Skewness indicates the deviation and asymmetry of the distribution relative to the normal distribution.
(a) Mean:
When the set of values possesses a sufficiently strong central tendency, the set of numbers can be related to its moments, which are sums of integer powers of the values. The arithmetic mean of the set of values x_1, ..., x_N is given by the following equation.

x̄ = (1/N) Σ_{j=1}^{N} x_j   (1)

(b) Variance:
While the mean describes the location of a distribution, the variance captures the degree or scale to which it is spread apart. The unit of variance is the square of the unit of the original variable, and the positive square root of the variance is termed the standard deviation.

\mathrm{Var}(x_1, \ldots, x_N) = \frac{1}{N-1} \sum_{j=1}^{N} (x_j - \bar{x})^2 \quad (2)

(c) Standard Deviation:
The standard deviation of a set of values is a measure of their probabilistic dispersion. It is usually denoted by the symbol σ and is defined as the square root of the variance.
  
\sigma(x_1, \ldots, x_N) = \sqrt{\mathrm{Var}(x_1, \ldots, x_N)} \quad (3)

where \sigma(x_1, \ldots, x_N) denotes the standard deviation of the distribution.
(d) Skewness:
For the statistical distribution of a real-valued random variable, skewness is a measure of its asymmetry. A positive skewness value indicates a distribution with an asymmetric tail extending toward the positive region of x, while a negative value indicates a distribution whose tail extends toward the negative region of x [11]. However, any finite set of N measured values will tend to yield a nonzero value even if the underlying distribution is symmetric (has a skewness of zero). For the estimate to be meaningful, the standard deviation must be used to normalize the skewness estimator of the underlying distribution.
\mathrm{Skew}(x_1, \ldots, x_N) = \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^3 \quad (4)

(e) Kurtosis:
Kurtosis is defined as the fourth cumulant divided by the square of the second cumulant, which equals the fourth moment about the mean divided by the square of the variance of the statistical distribution, minus 3; the subtraction of 3 yields what is termed the excess kurtosis. The conventional expression of the kurtosis is given below.
\mathrm{Kurt}(x_1, \ldots, x_N) = \left\{ \frac{1}{N} \sum_{j=1}^{N} \left( \frac{x_j - \bar{x}}{\sigma} \right)^4 \right\} - 3 \quad (5)

where the term (−3) makes the value zero for the normal distribution.
The 3rd moment (skewness) and the 4th moment (kurtosis) must therefore be used with caution. Kurtosis is a non-dimensional quantity that measures the relative flatness or peakedness of a distribution. A distribution with positive kurtosis is termed leptokurtic, a distribution with negative kurtosis is termed platykurtic, and the middle part of the distribution is

Fig. 2 Distributions whose 3rd and 4th moments vary significantly from the Gaussian or normal distribution. a 3rd moment or skewness, b 4th moment or kurtosis

termed mesokurtic [12]. Figure 2 depicts how the distribution deviates from the Gaussian in terms of skewness and kurtosis.
Skewness characterizes the asymmetry of a distribution: a distribution whose asymmetric tail extends toward the right is said to be positively skewed, and one whose tail extends toward the left is said to be negatively skewed. In this study, skewness is mainly used to measure and verify the symmetry of the data describing the statistical variable distribution. Kurtosis determines the degree of flatness of the distribution, i.e., whether it is flattened or tapered in comparison with the normal pattern. Higher kurtosis values indicate that more of the values lie far from the average. Thus, in this classification study, these measures are used to select adequate features to be processed by the ANN classifier. A small illustrative computation of these features is sketched below.
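As an illustrative sketch only (the original study was implemented in MATLAB; the function name and the use of NumPy/SciPy here are assumptions of this rewrite), the five statistical features described above can be computed from an ECG segment as follows:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def statistical_features(ecg_segment):
    """Compute the probabilistic features used as ANN inputs in the text
    (mean, variance, standard deviation, skewness, excess kurtosis, Eqs. 1-5)."""
    x = np.asarray(ecg_segment, dtype=float)
    return {
        "mean": np.mean(x),                    # Eq. (1)
        "variance": np.var(x, ddof=1),         # Eq. (2), N-1 denominator
        "std": np.std(x, ddof=1),              # Eq. (3)
        "skewness": skew(x),                   # sample skewness, cf. Eq. (4)
        "kurtosis": kurtosis(x, fisher=True),  # excess kurtosis, cf. Eq. (5) (the -3 is applied)
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segment = rng.normal(size=360)  # e.g., one second of signal at 360 Hz (MIT-BIH sampling rate)
    print(statistical_features(segment))
```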

3 Materials and Methods

The proposed approach for classifying ECG cardiac arrhythmias involves ECG signal preprocessing, feature extraction of the distinguished statistical and non-statistical parameters, and finally classification of the cardiac arrhythmias using an artificial neural network, namely the Levenberg–Marquardt backpropagation neural network (LM-BPNN). The schematic diagram for the classification of ECG arrhythmia using ANN is represented in Fig. 3.

Fig. 3 Flowchart for the ANN classification

Figure 3 shows the general flowchart of the ANN for classifying and detecting heartbeats and cardiac arrhythmias, indicating the input, hidden, and output layers. Twelve categories of ECG beats are obtained from the ANN classification system, from which the detection of arrhythmia is made. Figure 4 represents the functional block diagram of the neural network system for diagnosing cardiac arrhythmia. The ANN consists of a single input layer, a single hidden layer, and a single output layer. The input layer has 5 neurons denoting the features of mean, standard deviation, variance, kurtosis, and skewness and uses a tan-sigmoid transfer function. The hidden layer has 4 neurons with a log-sigmoid transfer function, and the output layer has 12 neurons indicating the arrhythmia classes of the ECG beats. Hence, as the sequence of the process, the raw ECG from the MIT-BIH arrhythmia database is acquired, denoising is carried out as preprocessing, statistical feature extraction follows, and with the selected features the classification by the ANN is performed. A small illustrative sketch of this network structure is given below.
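The following is a minimal sketch (an assumption of this rewrite, not the authors' MATLAB implementation) of a forward pass through the 5-4-12 network described above; the assignment of the tan-sigmoid and log-sigmoid transfer functions to the two weight layers follows the text loosely:

```python
import numpy as np

def tansig(x):
    # MATLAB-style tan-sigmoid transfer function
    return np.tanh(x)

def logsig(x):
    # Log-sigmoid transfer function
    return 1.0 / (1.0 + np.exp(-x))

class EcgAnn:
    """5-input / 4-hidden / 12-output network as described in the text.
    Weights here are randomly initialized; in the study they are learned
    with Levenberg-Marquardt backpropagation."""
    def __init__(self, n_in=5, n_hidden=4, n_out=12, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, features):
        # features: [mean, std, variance, kurtosis, skewness]
        h = tansig(self.W1 @ features + self.b1)
        y = logsig(self.W2 @ h + self.b2)
        return y  # one score per arrhythmia class

ann = EcgAnn()
scores = ann.forward(np.array([0.1, 0.9, 0.8, 2.5, 0.3]))
predicted_class = int(np.argmax(scores))  # index of the predicted beat type
```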
(a) Levenberg–Marquardt (LM) algorithm
The Levenberg–Marquardt (LM) algorithm is an iterative method that locates the minimum of a multivariate function

Fig. 4 Block diagram of the neural network system in diagnosing cardiac arrhythmia (raw ECG signal from the MIT-BIH arrhythmia database → preprocessing in terms of denoising → feature extraction → ANN classification of ECG → output categorization of cardiac beats and detection of arrhythmias)

expressed as the sum of squares of nonlinear real-valued functions [13, 14]. The LM algorithm can be considered as a combination of the Gauss–Newton method and the steepest descent algorithm. It is one of the most robust methods compared with the GN algorithm, most importantly because it finds a solution even when the initialization is far from the final minimum. During the iterations, the new weight configuration at the sequential step k + 1 is calculated as follows.
W(k+1) = W(k) - \left( J^{T} J + \lambda I \right)^{-1} J^{T} \varepsilon(k) \quad (6)

where J denotes the Jacobian matrix, λ the adjustable parameter, and ε the error vector. The λ parameter is modified based on the development of the error function (E): if the step reduces E, it is accepted; otherwise the value of λ is varied, the original value is reset, and W(k + 1) is recalculated. A brief illustrative sketch of this update step is given below.
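As a hedged illustration of Eq. (6) (not the authors' implementation; in practice the Jacobian would come from backpropagating the network errors), a single LM weight-update step could look like this:

```python
import numpy as np

def lm_update(W, J, err, lam):
    """One Levenberg-Marquardt step, Eq. (6):
    W(k+1) = W(k) - (J^T J + lambda*I)^(-1) J^T e(k).
    W   : flattened weight vector of the network
    J   : Jacobian of the residuals w.r.t. the weights (n_samples x n_weights)
    err : residual/error vector (n_samples,)
    lam : adjustable damping parameter lambda
    """
    n = W.size
    H = J.T @ J + lam * np.eye(n)          # damped approximate Hessian
    step = np.linalg.solve(H, J.T @ err)   # solve instead of forming an explicit inverse
    return W - step

# Typical damping schedule: if the error decreased, accept the step and
# reduce lambda; otherwise discard the step and increase lambda.
```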
(b) Preprocessing of data
Data preprocessing is the primary initial step in developing any model. Columns consisting of all zeros are deleted, along with columns containing missing values or mostly zero values. This yields 182 columns, of which 12 are categorical and the remaining 170 are numerical. As the next step, 32 rows comprising missing values are deleted, and the remaining 37,500 samples are considered for the analysis. The datasets are fully randomized after deleting the unwanted records, and no outliers remain in the processed data. The dataset is partitioned into three parts: 68% training set, 16% validation set, and 16% testing set. A sketch of this partitioning is given after this paragraph.
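A minimal sketch of the 68/16/16 random partitioning described above (illustrative only; function and variable names are assumptions):

```python
import numpy as np

def split_dataset(samples, seed=42):
    """Shuffle the records and split them into 68% training,
    16% validation and 16% testing subsets, as described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n = len(samples)
    n_train = int(0.68 * n)
    n_val = int(0.16 * n)
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```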
(c) Classification of arrhythmia
In this study, the following 12 classes of ECG beats are considered for classification:
1. Normal beat
2. Left Bundle Branch Block (LBBB) beat
3. Right Bundle Branch Block (RBBB) beat
4. Atrial Escape (AE) beat
5. Nodal (Junctional) Escape (NE) beat
6. Atrial Premature (AP) beat
7. Premature ventricular contraction (PVC) beat
8. Fusion of ventricular and normal (FVN) beat
9. Ventricular escape (VE) beat
10. Paced beat
11. Supra-ventricular premature beat (SP) beat
12. Nodal (junctional) Premature (NP) beat

For classifying the ECG cardiac arrhythmias with the ANN, the mean, standard deviation, variance, skewness, and kurtosis acquired from the heart rate signals are used as input variables. The suitable values of these features for the various cardiac arrhythmias are chosen as provided in Table 1 [15, 16].
(d) Method of Performance Evaluation
The classification performance of the ANN has been evaluated using four performance metrics: sensitivity, specificity, positive predictivity, and accuracy. These metrics are determined using True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts [17, 18].
1. True Positive: An instance with which the detection of cardiac arrhythmia is
being coincided with the diagnosis of the physician.
2. True Negative: An instance with which both the physician and the classifier output
has provided a suggestion that the result declares the absence of arrhythmia.
3. False Positive: An instance with which the classification system wrongly
classifies the healthy ECG as the arrhythmia.
4. False Negative: The classification system determines the result as healthy instead
of arrhythmia.
5. Classification Accuracy: It is determined as the ratio of the total count of correctly classified signals to the total count of inputs and is given by the following equation.

\mathrm{Accuracy} = \frac{TP + TN}{N} \quad (7)

where N denotes the total count of inputs.
6. Sensitivity: It denotes the rate of correctly classified positive samples and is also named the True Positive Rate. Normally, the sensitivity of a system should be as high as possible.

\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (8)

7. Specificity: It denotes how correctly the negative samples are detected and is also referred to as the True Negative Rate. Normally, the specificity of a system should be as high as possible.

\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (9)

8. Positive Predictivity: It denotes the ratio of the total count of correctly detected events (TP) to the total count of events detected by the analyzer. A small computational sketch of Eqs. (7)-(10) follows this list.

\mathrm{Positive\ Predictivity} = \frac{TP}{TP + FP} \quad (10)
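A small illustrative sketch (not from the paper) computing the four metrics of Eqs. (7)-(10) from the confusion counts:

```python
def performance_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and positive predictivity
    from the TP/TN/FP/FN counts (Eqs. 7-10)."""
    n = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "positive_predictivity": tp / (tp + fp),
    }

# Example with hypothetical counts
print(performance_metrics(tp=965, tn=5, fp=10, fn=20))
```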

4 Results and Discussions

The neural network is trained with the backpropagation algorithm, i.e., variable learning rate Levenberg–Marquardt backpropagation. The neural network training window is shown in Fig. 6, and the neural network fitting function is shown in Fig. 5. The network processes 37,500 samples for the training and testing processes: 68% of the samples are used for training the network, 16% for testing, and the remaining 16% for validating the network. The results are compared over periodically repeated iterations through the adaptive mechanism and by shuffling the sample values during training. The error histogram is the plot of the error value against the total count of instances in which that error occurred. The 20-bins error histogram, with the number of instances on the y-axis and the error value (target − output) on the x-axis, is depicted in Fig. 7. The middle of the histogram has minimum error, and the error value increases moving away from the center.
Figure 8 depicts the neural network training regression plot, which shows the relation between the target and the output. In the NN training for classifying the cardiac arrhythmias, it takes 50 iterations to complete the cycle; the regression window is depicted at the 17th epoch. The neural network training state is shown in Fig. 9, which shows the relation between the gradient and the epochs at the 14th iteration. The best validation performance is obtained at the 17th epoch with a value of 0.0016349, and its window is shown in Fig. 10. The output response of the classification has also been analyzed, with each individual element showing the characteristics of the target, output, and error with respect to time; its window is shown in Fig. 11. The output and the target are related by the regression value: the regression plot indicates how closely the output matches the target values. The network output has a strong linear relationship with the desired targets if the regression coefficient is close to unity; if the regression coefficient approaches zero, no linear relation can be established between the output and the target.
The performance plot is the plot of the mean square error against the total count of epochs. The mean square error is the average squared difference

Fig. 5 Fitting function of neural network



Fig. 6 Neural network training window

Fig. 7 20-bins error histogram plot (number of instances versus error = targets − outputs, for training, validation, test, and zero-error reference)



Fig. 8 Regression plot relating the output and the target

between the output data and the target data. An MSE of zero indicates that there is no error. As training proceeds, the error is reduced; when the mean square error reaches its minimum value, the training process stops and the network is validated with the validation samples. In the validation phase, if the behavior of the network is identified properly, then the training comes to an end and the network is ready to undergo the testing process. LM yields better performance than other methods on the basis of the calculated MSE. Table 2 lists the classification results for the 12 different types of arrhythmias along with their accuracy, sensitivity, positive predictivity, and specificity values.
The error plot is obtained from the error values acquired during the training, validation, and testing stages using the individual cycle data. The data-sharing mechanism with the statistical feature selection technique attains significantly high performance with minimum error. Hence,

Fig. 9 Neural network training state at 17th epoch

in the 20-bins histogram plot, the peak is in the middle region at 0.03687, and the minimum error is held only in this middle region. It also indicates that as the error histogram moves away from the middle region, the error increases. This shows that the ANN system classifies with high accuracy.
The regression plot relating the output and the target is shown in Fig. 8. With regression values R of 0.99387, 0.99431, and 0.9951 for training, validation, and testing, respectively, the output is almost linear and nearly unity. This linearity gives the best realization of the relationship between the training, validation, and testing values. As a whole, the processed data attains a regression value of 0.99412. At the 17th epoch, the neural network training state is shown in Fig. 9, which indicates that the neural network training process is highly efficient and has resulted in a gradient value of 0.00052892. The accuracy of the ANN system is obtained by evaluating its performance. The relationship between the MSE and the total number of iterations undergone has resulted in the best validation performance of 0.0014894 at the 17th iteration. The best realization is attained from the training, validation, and testing data in formulating the error analysis. The

Fig. 10 Best validation performance between MSE and 17 epochs

response characteristics of the output element are also shown in Fig. 11, realizing its error, target, and output responses.
From the above classification results, it can be inferred that the accuracy is highest at 98.8% for classifying the normal beats, the sensitivity is highest at 97.64% for the fusion of ventricular and normal (FVN) beat, the specificity is highest at 96.68% for the left bundle branch block (LBBB) beat, and the positive predictivity reaches its highest value of 97.63% for the fusion of ventricular and normal (FVN) beat.

5 Conclusion

The performance of the classification algorithm using the Levenberg–Marquardt backpropagation neural network (LM-BPNN) technique has been evaluated on data acquired from the MIT-BIH arrhythmia database in terms of accuracy, sensitivity, positive predictivity, and specificity. These performance metrics are defined using True Negative (TN), True Positive (TP), False Positive (FP),

Fig. 11 Response characteristics of output element (response of output element 1 for time-series 1: training, validation, and test targets and outputs, and errors = targets − outputs, plotted against time)

and False Negative (FN). The experimental results show that the classification accuracy ranges from 91.18 to 98.8% across the 12 classes of ECG arrhythmias.

Table 2 Probabilistic results of classifying ECG signal by LVQ NN showing the performance metrics of 12 classes of arrhythmias

S. No. | ECG arrhythmia beats | Accuracy (%) | Sensitivity (%) | Specificity (%) | Positive predictivity (%)
1 | Normal beat | 98.8 | 96.48 | 55.48 | 96.55
2 | Left bundle branch block (LBBB) beat | 94.48 | 93.34 | 96.68 | 93.47
3 | Right bundle branch block (RBBB) beat | 92.68 | 90.84 | 92.24 | 90.89
4 | Atrial escape (AE) beat | 91.25 | 90.97 | 91.18 | 90.92
5 | Nodal (junctional) escape (NE) beat | 95.62 | 91.68 | 39.14 | 91.67
6 | Atrial premature (AP) beat | 96.24 | 91.43 | 84.49 | 91.45
7 | Premature ventricular contraction (PVC) beat | 94.64 | 95.45 | 85.68 | 95.41
8 | Fusion of ventricular and normal (FVN) beat | 95.54 | 97.64 | 92.46 | 97.63
9 | Ventricular escape (VE) beat | 91.18 | 92.21 | 95.57 | 92.23
10 | Paced beat | 97.68 | 97.04 | 91.28 | 97.09
11 | Supra-ventricular premature (SP) beat | 94.14 | 96.67 | 85.66 | 96.68
12 | Nodal (junctional) premature (NP) beat | 96.66 | 90.01 | 78.24 | 90.09

References

1. Turakhia MP, Hoang DD, Zimetbaum P et al. (2013) Diagnostic utility of a novel leadless
arrhythmia monitoring device. Am J Cardiol 112(4):520–524
2. Perez de Isla L, Lennie V, Quezada M et al (2011) New generation dynamic, wireless and
remote cardiac monitorization platform: a feasibility study. Int J Cardiol 153(1):83–85
3. Olmos C, Franco E, Suárez-Barrientos A et al (2014) Wearable wireless remote moni-
toring system: An alternative for prolonged electrocardiographic monitoring. Int J Cardiol
1(172):e43–e44
4. Huang C, Ye S, Chen H et al (2011) A novel method for detection of the transition between
atrial fibrillation and sinus rhythm. IEEE Trans Biomed Eng 58(4):1113–1119
5. Niranjana Murthy H, Meenakshi M (2013) ANN model to predict coronary heart disease based
on risk factors. Bonfiring Int J Man Mach Interface 3(2):13–18
6. Ceylan R, Özbay Y (2007) Comparison of FCM, PCA and WT techniques for classification
ECG arrhythmias using artificial neural network. Expert Syst Appl 33(2):286–295

7. Dubey V, Richariya V (2013) A neural network approach for ECG classification. Int J Emerg
Technol Adv Eng 3
8. Zadeh AE, Khazaee A, Ranaee V (2010) Classification of the electrocardiogram signals using
supervised classifiers and efficient features. Comput Methods Prog Biomed 99(2):179–194
9. Jadhav SM, Nalbalwar SL, Ghatol AA (2010) ECG arrhythmia classification using modular
neural network model. In: IEEE EMBS conference on biomedical engineering and sciences
10. Sreedevi G, Anuradha B (2017) ECG Feature Extraction and Parameter Evaluation for
Detection of Heart Arrhythmias. I Manager’s J Dig Signal Process 5(1):29–38
11. Acharya UR, Subbanna Bhat P, Iyengar SS, Rao A, Dua S (2003) Classification of heart rate using artificial neural network and fuzzy equivalence relation. Pattern Recognit 36:61–68
12. Kannathal N, Puthusserypady SK, Choo Min L, Acharya UR, Laxminarayan S (2005) Cardiac
state diagnosis using adaptive neuro-fuzzy technique. In: Proceedings of the IEEE engineering
in medicine and biology, 27th annual conference Shanghai, China, 1–4 Sept 2005
13. Acharya R, Kumar A, Bhat PS, Lim CM, Iyengar SS, Kannathal N, Krishnan SM (2004)
Classification of cardiac abnormalities using heart rate signals. Med Biol Eng Comput
42:288–293
14. Shah Atman P, Rubin SA (2007) Errors in the computerized electrocardiogram interpretation
of cardiac rhythm. J Electrocardiol 40(5):385–390
15. Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple
way to prevent neural networks from overfitting. J Mach Learn Res 15(1): 1929–1958
16. Turakhia MP, Hoang DD, Zimetbaum P, Miller JD, Froelicher VF, Kumar UN, Xu X, Yang
F, Heidenreich PA (2013) Diagnostic utility of a novel leadless arrhythmia monitoring device.
Am J Cardiol 112(4):520–524
17. Xiong W, Droppo J, Huang X, Seide F, Seltzer M, Stolcke A, Yu D, Zweig G (2016) Achieving
human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256
18. Melo SL, Caloba LP, Nadal J (2000) Arrhythmia analysis using artificial neural network and
decimated electrocardiographic data. In: Computers in cardiology 2000, pp. 73–76. IEEE
CDS-Based Routing in MANET Using Q
Learning with Extended Episodic Length

D. S. John Deva Prasanna, D. John Aravindhar, and P. Sivasankar

Abstract Broadcast storm created during the message exchanges in MANET is a


serious issue in routing as MANET nodes have limited processing resource and
residual energy. Researches in the area of connected dominating set (CDS) in
MANETs mostly focus on centralized approaches, which require strong and stable
topological information about the MANET. Adaptation of centralized algorithms is
intangible due to the dynamic nature of the MANET and will demand extensive
control message exchange. Hence, it is required to deduce an algorithm to compute
CDS using the partial observations of the MANET. In this paper, a Q learning-based
CDS formation algorithm has been proposed with extended episodic length approach
for efficient MANET routing. In the proposed algorithm, the CDS nodes were chosen
not only based on its own Q value but also based on the Q value of its neighbours.
In this way, the episodic length of the Q learning algorithm is extended from one
hop to two hops. Here residual energy and signal stability are used to estimate the
Q values of the individual nodes. The simulation of the algorithm gives promising
results when compared to conventional CDS establishment algorithms.

Keywords Connected dominating set · Reinforcement learning · Q learning ·


Decay factor · Learning rate · Episode

1 Introduction

Routing in MANETs is challenging due to the dynamic nature of the network. The
routing information in MANET needs to be updated on regular intervals due to the

D. S. John Deva Prasanna (B) · D. John Aravindhar


CSE, HITS, Chennai, India
e-mail: johndp@hindustanuniv.ac.in
D. John Aravindhar
e-mail: jaravindhar@hindustanuniv.ac.in
P. Sivasankar
NITTTR, Chennai, India
e-mail: siva_sankar123p@yahoo.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 463
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_35

Fig. 1 Sample MANET used for illustration

mobility of the nodes. Routing information is exchanged by broadcasting control messages across the network. These message exchanges often create a broadcast storm due to multiple retransmissions of messages, which causes the nodes to receive multiple copies of the same control messages from multiple neighbours. To avoid this problem, a virtual backbone of nodes is formed so that all transactions are done only through those backbone nodes.
Figure 1 shows sample network architecture considered for explaining proposed
extended Q CDS-based MANET routing.

1.1 Advantages of Connected Dominating Sets in MANET

Connected dominating set (CDS) is a resilient technique used for the formation of a
backbone in the MANET. CDS is a graph theory concept in which every node in the
graph either will be in the dominating set of the graph or will be a one-hop neighbour
to the dominating node. The concept of CDS is used in MANET routing, as the CDS
will act as a backbone for all communications. MANET routing is usually done by
broadcasting, and hence, if a node transmits a data, then all the neighbouring nodes
will be receiving that message. By routing all the communications through CDS, this
can be avoided.
Most CDS constructions techniques follow a centralized approach, which needs
information about the node and edge weight for all the nodes in the graph. Central-
ized CDS formation approaches are hard to implement in a MANET scenario, as it
needs information about the entire MANET. CDS is constructed using nodes with
better network metrics like link stability and residual energy. Though these network
parameters are considered, the construction of CDS is done by greedy approaches in
MANET. Greedy approaches are easy to implement but might result in less efficient
CDS with an increased number of CDS nodes.

1.2 Partial Observability of MANETs and Significance of Q


Learning

In a MANET scenario, it is rarely possible to obtain the entire topographical information about the nodes and the link metrics due to the mobility of the nodes. This limitation makes centralized CDS algorithms impractical for deducing a CDS in a MANET. Any node in the MANET can obtain information only about its immediate next-hop neighbours, and any pair of nodes cannot have a full view of all the intermediate nodes between them and the quality of the links between those nodes. Though proactive routing protocols obtain routes prior to data transmission, the routes need to be recomputed often due to node mobility. This characteristic of MANETs is referred to as partial observability.
Q learning is a reinforcement learning (RL) algorithm, in which the system learns
by interacting with the environment and reinforcing the learning process using the
feedback obtained by the previous interaction with the environment. This feature of
the RL algorithm makes it suitable for environments, which are partially observable
like MANET. In the Q learning algorithm, the learning entities are called agents.
Agents interact with the environment by taking action, and the feedback of the action
will be either a reward or a penalty. The agent will be rewarded for taking an action
if the result is a success and will be penalized if the result is a failure. The entire
sequence of the agent’s interaction with the environment and obtaining the feedback
is called an episode.
Hence, this paper proposes a novel extended Q learning-based CDS formation
algorithm by considering the advantages of conventional CDS and Q learning algo-
rithms to optimize the efficient route between the source and the destination. In this
paper, the nodes of the MANET are considered as agents and the nodes learn about
their neighbours by transmitting data to them. The nodes will receive a reward if the
data transmission is successful and will receive a penalty if the transmission fails. Q
values are then estimated using the rewards values of the nodes. In this manner, any
nodes will possess a Q value, which is a cumulative representation of its previous
transmission behaviours. The nodes with higher Q values will naturally have a higher
success rate in transmission. The nodes with higher Q values are included to form
CDS in the MANET, which will act as a backbone.

2 Literature Survey

A MANET routing algorithm using the concept of reinforcement learning was


proposed in [1]. In this paper, the nodes learn about their neighbours’ continuously
using reward and penalty model. The routes with higher cumulative reward are used
for data transmission. The work establishes decay constants for identifying the old
routes. Though this work solves routing using RI, the concept is not used for CDS
constructions.

A Q learning algorithm for MANET routing was proposed in [2], which uses
residual energy, node mobility and link quality for finding the efficient intermediate
nodes in a data transfer. These parameters are used in calculating the Q values of
the nodes. The nodes in the MANET are bootstrapped with these parameters before
the learning begins. This algorithm suffers a setback when the size of the MANET
increases. The algorithm requires the nodes to know about the hop count between
any pair of nodes and hence obtaining the topological information from all the nodes
in the MANET, which is challenging in large networks. Performance evaluation
of routing algorithm for MANET based on the machine learning techniques was
proposed in [3].
A Partially Observable Markov Decision Process (POMDP) was modelled for the entire MANET scenario in [4]. Here the MANET is considered a partially observable entity since nodes cannot obtain topological information for the entire network. The nodes are marked as agents, and packet delivery operations like unicasting, multicasting, and broadcasting the packets are considered as actions. The nodes interact with the environment by sending packets in the network. The status of a node during the packet transfer, such as packet sent, packet received, or packet dropped, is considered as the state of the node. If a node successfully transfers a packet to its neighbours, it receives a reward value, and if the transmission fails, the node receives a penalty. The reward/penalty model accumulates the value of a node.
MANET path discovery was performed using an on-policy first-visit Monte Carlo method in [5]. This algorithm combines a ticket-based path discovery method with reinforcement learning. The nodes send probing tickets to the destination node and learn about the intermediate nodes and their available resources, and each node maintains a table of routes to other nodes in the MANET. Though the algorithm minimizes resource utilization, it is hard for energy-constrained nodes to probe the route in a dynamic environment. The algorithm also suffers a setback when the size of the MANET grows.
In [6], an algorithm was built to construct connected dominating set using the
nodes, which have a high packet delivery ratio and residual energy. This algorithm is
called RDBR and follows conventional beacon method exchange of network param-
eter information like packet delivery ratio and residual energy. Usage of machine
learning algorithms which can learn on partial observable conditions like MANET
is not used in this paper.
Several algorithms with centralized CDS formation approach and optimizations
were proposed in [7] and [8] by forming a maximal independent set (MIS) and
then optimizing the MIS nodes to be connected to each other in order to form a
CDS. Though this algorithm computes the minimum CDS, it follows a centralized
approach and will only be suitable for stable networks. The algorithms have higher
time complexity and may not be suitable for dynamic networks, where observing the
topological changes of the entire network is not possible within the stipulated time.
In [9], a CDS reconstruction algorithm was proposed to reconstruct the CDS
locally with minimum changes by exchanging tiny control message packets. The
algorithm reduced the control over considerably, and hence, the performance of the
MANET is increased. This works much focuses on the reconstruction of the CDS

and contributes less to the initial establishment of CDS. Adopting CDS construc-
tion strategies based on network performance metrics will further improve the
performance of this algorithm [9].
A CDS construction using the Q values of the nodes estimated using Q learning
was proposed in [10]. In this algorithm, the CDS construction is done in a greedy
fashion using the Q values of the nodes in the MANET. This algorithm suffers due
to the greedy approach as some CDS nodes might have all neighbour nodes with
low Q values. The algorithm will not have any option except to add one of the low
Q-valued nodes as CDS node.
From the literature survey, it can be inferred that many MANET routing algorithms are formulated using reinforcement learning techniques, each with its own pros and cons. Construction of a CDS using reinforcement learning has so far been done with a greedy approach. Therefore, a Q learning-based CDS construction algorithm for MANETs is proposed, and to avoid the greedy nature of the algorithm, the length of the reinforcement learning episode is extended from one hop to two hops.

3 Proposed Extended Q-CDS Construction Using Q


Learning with Extended Episode Length

This paper aims to achieve an efficient route between the source and destination by proposing an algorithm to construct a CDS using Q learning with extended episode length. Nodes in the MANET interact with their neighbours by sending messages to them. Every successful transaction earns a node a reward, and every failed transaction incurs a penalty. Every node in the MANET develops cumulative Q values of its neighbours by sending messages at various points in time. This scenario of a node sending a message and calculating the reward/penalty is called an episode.
In a MANET, nodes can assess network parameters like link quality and residual energy only for their one-hop neighbours. In conventional routing techniques, collecting parameter values beyond one-hop neighbours requires extra control message exchanges. In the proposed algorithm, the Q value of a node is estimated using its signal stability, its residual energy, and the Q value of its best neighbour. Hence, the Q value of any node now reflects its own quality as well as the quality of its best neighbour. When this Q value is used for CDS construction, nodes with both high Q values and high-quality neighbours are selected as CDS members. Through this, the visibility of a node is increased from one hop to two hops, and the obtained CDS is more efficient and stable. The conceptual diagram and workflow of the proposed extended Q CDS algorithm are shown in Fig. 2.

Fig. 2 Process flow of the proposed extended Q CDS

3.1 Issues in Greedy CDS Exploration

During the CDS exploration process, the algorithm always considers the immediate next node with the maximum Q value to be added as the next CDS node. This technique is greedy and sometimes results in a longer and inefficient CDS. Figure 3 illustrates this scenario, where the greedy approach constructs a sub-optimal solution. For illustrative purposes, the initial Q values are assumed according to the residual energy and signal stability ratio of the individual nodes in the network. The Q values are learned and estimated during the exploration phase and are updated during the exploitation phase based on signal stability, residual energy, and assigned reward/penalty values. Here the node n2 chooses node n5 as its next CDS node, as it has the highest Q value. After including the node n5 in the CDS, the only possible next

Fig. 3 MANETs with CDS comprising of the nodes n2, n5, n3, n7, n11 and n10

CDS node is n3 which has a very low Q value. Constructing CDS through this path
will only result in sub-optimal CDS, which may require frequent re-computation.
Moreover, the number of nodes in the CDS is also increased due to this technique.

3.2 Mitigating Greedy Issue by Extending the Episode Length

To solve the above issue, it is prudent to increase the episode length in the RL learning phase. The nodes learn about their one-hop neighbouring nodes through interaction; in this technique, the nodes will also learn about the neighbours of their one-hop neighbours. When a node sends a message to its neighbour node, the receiving node acknowledges the message and appends the highest Q value among all of its own one-hop neighbours. In the example scenario shown in Fig. 4, node n2 sends a message to node n7, and node n7 acknowledges along with the highest Q value of n7's one-hop neighbours, which in this case is n11. Node n2 incorporates the Q value of node n11 into the Q value of node n7. In this way, the nodes that have a high Q value and high-quality neighbour nodes are selected to form the CDS. The obtained CDS is found to be optimal in terms of CDS size and versatile in terms of CDS lifetime.

Fig. 4 MANETs with CDS comprising of the nodes n2, n7, n11 and n10

The proposed extended Q CDS construction using Q learning will be elaborated by


briefing Q value estimation, decay factor, learning rate, exploration and exploitation
of CDS, extending the episodic length to obtain an efficient CDS node in formation
of backbone, in the following section.

3.3 Q Value Estimation

Q values of the nodes are calculated by sending beacon packets to its neighbours and
verifying the status of the packet delivery. The node, which is sending the beacon
packet, is called an assessor node, and the node, which receives the packet, is called
as an assessed node. The assessor node will set a reward of one, for all the assessed
nodes, which are responding to the initial beacon with an acknowledgement. This
reward value will be further characterized by the learning rate parameter calculated
based on the residual energy and the signal stability obtained from the received
acknowledgement. The Q values calculated for every packet transmission are added
to form a single cumulative Q value for the particular node.
The Q value of the nodes will be estimated using the generic Q learning formula
personalized by learning rate and decay constant.
Q(S, a_t) = Q(S, a_t) + \rho \left( R_{t+1} + \max_{S \in \{D, F\}} \left( \beta \cdot Q(S, a_{t+1}) \right) \right) \quad (1)

In Eq. (1), 'S' refers to the set of states a node can be in at any point of time, and 'a_t' refers to the action taken at time 't' that can change the state of the node from one to another.
S = {D, F}, where the terms D and F refer to the states 'Delivered' and 'Failed', respectively.
a_t = {T, B}, where the terms T and B refer to the actions 'Transmit' and 'Buffer', respectively. When the node transmits the data, the action is referred to as 'Transmit'. If the node stalls the transmission or the transmission fails, then the data remains in the buffer and the action is referred to as 'Buffer'.
If a node successfully delivers the packet to its neighbour, then the state of the node will be Delivered, and if the packet is not delivered, then the node will be in the Failed state.
Q(S, a_t) refers to the existing Q value of the node.
\max_{S \in \{D, F\}} (\beta \cdot Q(S, a_{t+1})) refers to the policy of selecting an action which will yield the maximum reward; in our case, 'D' is the desired state, which can be obtained by taking the action of data transmission. If the data transmission fails, then the node attracts a negative reward.
R_{t+1} refers to the reward allotted for taking an action.
The learning rate ρ and the decay factor β are elaborated in the following sections, and an illustrative update sketch is given below.
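A minimal sketch of the Q update in Eq. (1) (an illustration of this rewrite, not the authors' NS2 implementation; reward values and state encoding are assumptions):

```python
def q_update(q_current, reward, q_next_best, rho, beta):
    """Eq. (1): Q(S, a_t) <- Q(S, a_t) + rho * (R_{t+1} + max(beta * Q(S, a_{t+1}))).
    q_current   : existing Q value the assessor node holds for the assessed node
    reward      : +1 if the beacon was acknowledged (Delivered), negative if Failed
    q_next_best : best Q value reachable from the next state
    rho         : learning rate (signal stability + residual energy terms)
    beta        : decay factor derived from residual energy
    """
    return q_current + rho * (reward + beta * q_next_best)
```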

3.4 Estimation of Decay Parameter

The decay parameter is used to estimate the staleness of the learned Q value in Q learning. The Q value is reduced as time passes because an estimation once made might change over a period of time. In the proposed algorithm, the residual energy of the nodes is considered as the decay parameter. The energy levels of the nodes reduce as they are involved in data transfer: the more a node takes part in data transfers, the more it drains its residual energy. The decay factor calculated using the residual energy adds a standard deduction to the node's Q value whenever the node transmits data. When a CDS node's Q value goes below the threshold, the CDS re-computation is triggered. The decay factor is represented using the symbol 'β'.

\beta = r_0 e^{-0.001 t_r} \quad (2)

where t_r refers to the number of packets sent/received to a particular node and r_0 refers to the initial energy of the node.

3.5 Estimation of Learning Rate

Learning rate is a parameter, which controls the algorithm’s speed of learning. When
a node receives acknowledgement from more than one neighbour by beaconing, not
all nodes can be assigned with the same Q value. Q learning has to ensure that the
best neighbour receives the highest Q value. The proposed algorithm identifies the
best neighbour node by using the residual energy of the node and the signal stability
between the nodes as learning rate parameters. Through the learning rate, the node
with better residual energy and link stability will receive a higher Q value than other
nodes.
Estimation of Link Stability Learning Rate ρ_ss. The signal stability between two nodes is estimated based on the bit error rate calculation.

\rho_{ss} = LS_{ij} \propto \frac{1}{BER_{ij}} \quad (3)

LS_{ij} = k \cdot \frac{1}{BER_{ij}} \quad (4)

where
k is the proportionality constant and can be assigned an integer value.
LS_{ij} → link stability between the nodes i and j.
Estimation of Residual Energy Learning Rate ρ_re. The residual energy of the node is evaluated using the initial energy of the node and the energy consumed.

\rho_{re} = 1 - E_c \quad (5)

Here E_c denotes the consumed energy of the node, which is represented by

E_c = E_t + E_r + E_p \quad (6)

E_c → Energy consumed
E_t → Energy spent to transmit a packet.
E_r → Energy spent to receive a packet.
E_p → Energy spent to process a packet.

\rho_{re} = E_{AR} + E_{PR} \quad (7)

Hence, the learning rate is calculated as

\rho = \rho_{re} + \rho_{ss} \quad (8)

A brief illustrative sketch of this learning-rate computation is given below.
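An illustrative sketch of the learning-rate estimation in Eqs. (3)-(8) (the constant k, the energy normalization, and the function names are assumptions of this rewrite):

```python
def link_stability_rate(ber_ij, k=1.0):
    """Eqs. (3)-(4): rho_ss = LS_ij = k / BER_ij (higher stability for lower BER)."""
    return k / ber_ij

def residual_energy_rate(e_tx, e_rx, e_proc):
    """Eqs. (5)-(6): rho_re = 1 - E_c with E_c = E_t + E_r + E_p,
    assuming the energy terms are normalized to the initial node energy."""
    e_consumed = e_tx + e_rx + e_proc
    return 1.0 - e_consumed

def learning_rate(ber_ij, e_tx, e_rx, e_proc, k=1.0):
    """Eq. (8): overall learning rate rho = rho_re + rho_ss."""
    return residual_energy_rate(e_tx, e_rx, e_proc) + link_stability_rate(ber_ij, k)
```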

3.6 Extended Q Value Estimation

In the extended Q learning algorithm, the Q value of a node is computed with its Q
value as well as the Q value of its best neighbour.
Hence,

Q_x(S, a_t) = w_1 Q_i(S, a_t) + w_2 \max_{j=1 \ldots n} Q_j(S, a_t) \quad (9)

where
Q_x(S, a_t) refers to the extended Q value of a node 'i' incorporating the Q value of its best neighbour.
Q_i(S, a_t) is the Q value of the node 'i' estimated through Eq. (1).
The term \max_{j=1 \ldots n} Q_j(S, a_t) refers to the highest Q value found among the neighbouring nodes of the node 'i'. The direct Q value of the node and the maximum Q value of the neighbour nodes are given the weightages w_1 and w_2, which are 0.6 and 0.4, respectively, so that w_1 > w_2. Thus, the Q value of any node reflects its own quality as well as that of its best one-hop neighbour. A small sketch of this computation is given below.
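A brief sketch of the extended Q value of Eq. (9) (illustrative; the list-based neighbour representation is an assumption):

```python
W1, W2 = 0.6, 0.4  # weights from the text, with w1 > w2

def extended_q_value(q_own, neighbour_q_values):
    """Eq. (9): Q_x = w1 * Q_i + w2 * max_j Q_j, combining a node's own Q value
    with the best Q value reported by its one-hop neighbours (two-hop visibility)."""
    best_neighbour_q = max(neighbour_q_values) if neighbour_q_values else 0.0
    return W1 * q_own + W2 * best_neighbour_q

# Example: a node with Q value 0.7 and neighbours reporting {0.5, 0.9, 0.4}
print(extended_q_value(0.7, [0.5, 0.9, 0.4]))  # 0.6*0.7 + 0.4*0.9 = 0.78
```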

3.7 Exploration and Exploitation of CDS

The process of deducing the CDS using the estimated Q values of the nodes is called the exploration process. CDS exploration happens only during the initial phase of CDS establishment and when the Q value of any one of the CDS nodes goes below the threshold value. During the exploration process, the node that initiates the CDS construction selects its neighbour node with the highest extended Q value as the next CDS node, and all other one-hop neighbour nodes are declared as covered nodes. This incremental addition of nodes with the highest extended Q values to the CDS continues until all the nodes in the MANET are covered by the CDS. Once the CDS is established, all communications go through the backbone, and this process is called exploitation. During the exploitation process, the Q value is recalculated on every transaction, and if any CDS node's Q value goes below the threshold, the CDS exploration process is triggered again.

3.7.1 Algorithm for Exploration of Q-CDS

Step 1: Initialize the MANET by placing nodes randomly with equal energy and
specified terrain dimension.
Step 2: Bootstrap the nodes with distance, BER and residual energy.
Step 3: Estimate signal stability learning rate using BER and distance.
Step 4: Estimate residual energy learning rate.
Step 5: Estimate the overall learning rate using signal stability learning rate and
residual energy learning rate.
Step 6: Assign reward and penalty values for nodes based on packet transitions.
Step 7: Calculate Q value of the neighbouring nodes and incorporate the Q value of
the two hop nodes obtained from neighbouring nodes.
Step 8: Explore the next best neighbour based on highest Q value and include it in
the CDS. All the immediate neighbours will act as covered nodes.
Step 9: Repeat step 8 to form CDS until all nodes in the network are covered. Each
and every node will update its Q value table about their neighbours.
Step 10: If the Q value of any one of the nodes decays below the threshold then
reinitiate exploration again.
Figure 5 illustrates the flowchart of the extended Q CDS.
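The exploration steps above can be sketched roughly as follows (a simplified, centralized illustration under the assumption of a static neighbour map; the actual algorithm runs distributedly over NS2 nodes):

```python
def explore_cds(neighbours, extended_q, start_node):
    """Lookahead-based CDS exploration: repeatedly add the neighbour of the
    current CDS with the highest extended Q value until every node is covered.
    neighbours : dict node -> set of one-hop neighbour nodes
    extended_q : dict node -> extended Q value (Eq. 9)
    """
    cds = [start_node]
    covered = {start_node} | neighbours[start_node]
    all_nodes = set(neighbours)
    while covered != all_nodes:
        # candidate nodes adjacent to the current CDS that are not yet CDS members
        frontier = {n for c in cds for n in neighbours[c]} - set(cds)
        if not frontier:
            break  # remaining nodes are unreachable; nothing more to add
        nxt = max(frontier, key=lambda n: extended_q[n])
        cds.append(nxt)
        covered |= {nxt} | neighbours[nxt]
    return cds
```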

Fig. 5 Flowchart of the extended Q CDS

4 Results and Discussion

The extended Q CDS is implemented using NS2, and the simulation parameters are provided in Table 1. Figures 6 and 7 show screenshots of the NS2 simulation of the algorithm. The experiments have been carried out for different seed values, and the average is used for the results and analysis.
The algorithm is evaluated by varying the number of nodes, and metrics such as packet delivery ratio, end-to-end delay, residual energy, and size of the CDS

Table 1 Simulation parameters in NS2

Parameter | Values
Number of nodes | 100
Speed | Up to 20 m/s
Mobility model | Random waypoint
Node placement | Random
Initial energy of the node | 400 J
Simulation area | 1000 m × 1000 m
Simulation time | 10 min

Fig. 6 Screenshot of NS2 implementation of extended Q CDS



Fig. 7 Screenshot of NS2 implementation of extended Q CDS trace file

were measured. The extended Q CDS algorithm is compared with the reliable CDS (RCDS) [4], cognitive CDS (CDSCR) [2], and Q CDS [10], and it performs considerably better.
Figure 8 illustrates that the extended QCDS performs better than the other algorithms with respect to packet delivery ratio. The QCDS and extended QCDS algorithms construct almost the same CDS when the number of nodes is small, but when the number

Fig. 8 Number of nodes versus packet delivery ratio (%) for RDBR, CDSCR, QCDS, and Ex QCDS



of nodes is increased, the algorithm shows improvement in the CDS construction, which is reflected in the packet delivery ratio.
Figure 9 shows that the extended Q CDS algorithm computes a very optimal and stable CDS, so its end-to-end delay is much lower than that of the other algorithms. Initially, all algorithms perform equally, but their performance degrades as the MANET size increases. RDBR and CDSCR are centralized CDS algorithms and QCDS follows the greedy approach; due to this, their performance degrades as more time is spent on route computation before data transfer.
Inspecting Fig. 10, the extended Q CDS computes a CDS with a minimum number of nodes. Though the size of the CDS increases linearly with the increase in MANET size, the extended Q CDS still computes a CDS with fewer nodes, and its CDS size remains close to that of the other algorithms. Since RDBR and CDSCR follow a centralized approach, it is easy for them to optimize the redundancy in the CDS, and these algorithms construct a CDS with fewer nodes.
Figure 11 shows that the control overhead increases linearly for all algorithms with respect to the increase in the number of nodes in the MANET. RDBR and

Fig. 9 Number of nodes versus end-to-end delay (ms) for RDBR, CDSCR, QCDS, and Ex Q CDS

Fig. 10 Number of nodes versus number of nodes in CDS for RDBR, CDSCR, QCDS, and Ex Q CDS



Fig. 11 Number of nodes versus control overhead for RDBR, CDSCR, QCDS, and Ex Q CDS

CDSCR algorithms demand control message exchanges whenever there is a change in the MANET topography due to their centralized approach. The extended Q CDS consumes less overhead, as Q values are calculated over normal data transactions. Thus, whenever there is a failure in the CDS, the Q values are readily available with the node and the CDS re-computation begins with a low number of control overhead message exchanges.

5 Conclusion

In this paper, a technique for constructing connected dominating sets using Q learning with extended episodic length is proposed. The results were found to be better when compared to existing heuristic CDS construction algorithms. In this paper, the decay factor is measured from the neighbour nodes, and the energy level of the nodes is bootstrapped into the system before the algorithm begins the learning phase. In heterogeneous environments, the nodes will have varying energy levels, and hence the decay factor needs to be estimated dynamically with respect to the neighbour nodes' energy levels. In future, the episodic length can be further increased by adopting optimization algorithms like ant colony optimization and tabu search.

References

1. Dowling S, Curran E, Cunningham R, Cahill V (2005) Using feedback in collaborative rein-


forcement learning to adaptively optimize MANET routing. IEEE Trans Syst Man Cybern Part
A Syst Hum 35(3):360–372
2. Tilwari V, Dimyati M, Hindia A, Fattouh, Amiri I (2019) Mobility, residual energy, and link
quality aware multipath routing in MANETs with Q-learning algorithm. Appl Sci 9(8):1582
3. Duraipandian M (2019) Performance evaluation of routing algorithm for Manet based on the
machine learning techniques. J Trends Comput Sci Smart Technol (TCSST) 1(01):25–38

4. Nurmi P (2007) Reinforcement learning for routing in ad hoc networks. In: 2007 5th interna-
tional symposium on modeling and optimization in mobile, ad hoc and wireless networks and
workshops
5. Usaha W, Barria J (2004) A reinforcement learning ticket-based probing path discovery scheme
for MANETs. AdHoc Netw
6. Preetha K, Unnikrishnan (2017) Enhanced domination set based routing in mobile ad hoc
networks with reliable nodes. Comput Electr Eng 64:595–604 (2017)
7. Tran TN, Nguyen T-V, An B (2019) An efficient connected dominating set clustering based
routing protocol with dynamic channel selection in cognitive mobile ad hoc networks. Comput
Electr Eng
8. Hedar AR, Ismail R, El-Sayed GA, Khayyat KMJ (2018) Two meta-heuristics designed to solve
the minimum connected dominating set problem for wireless networks design and management.
J Netw Syst Manage 27(3):647–687
9. Smys S, Bala GJ, Raj JS (2010) Self-organizing hierarchical structure for wireless networks.
In: 2010 international conference on advances in computer engineering. https://doi.org/10.1109/ace
10. John Deva Prasanna DS, John Aravindhar D, Sivasankar P (2019) Reinforcement learning
based virtual backbone construction in Manet using connected dominating sets. J Crit Rev
A Graphical User Interface Based Heart
Rate Monitoring Process and Detection
of PQRST Peaks from ECG Signal

M. Ramkumar, C. Ganesh Babu, A. Manjunathan, S. Udhayanan,


M. Mathankumar, and R. Sarath Kumar

Abstract An electrocardiogram (EKG/ECG) is a recording of the electrical impulses of the cardiac muscle, and it is utilized in investigating and detecting cardiac disease or arrhythmia. The electrical activity of the heart's cardiac muscle is translated into line tracings on paper. The dips and spikes in these tracings form a wave series consisting of six discernible waveforms of various characteristics, differentiated as the P, Q, R, S, T, and sometimes U peaks. Some of the earliest methodologies for analyzing the ECG signal and detecting the PQRST waveform are based on digital signal processing techniques such as the fast Fourier transform (FFT), the discrete wavelet transform (DWT), and artificial feedforward neural networks. However,
M. Ramkumar (B) · R. Sarath Kumar


Department of Electronics and Communication Engineering, Sri Krishna College of Engineering
and Technology, Coimbatore, India
e-mail: mramkumar0906@gmail.com
R. Sarath Kumar
e-mail: sarathkumar@skcet.ac.in
C. Ganesh Babu
Department of Electronics and Communication Engineering, Bannari Amman Institute of
Technology, Sathyamangalam, India
e-mail: bits_babu@yahoo.co.in
A. Manjunathan
Department of Electronics and Communication Engineering, K. Ramakrishnan College of
Technology, Trichy, India
e-mail: manjunathankrct@gmail.com
S. Udhayanan
Department of Electronics and Communication Engineering, Sri Bharathi Engineering College for
Women, Pudukkottai, India
e-mail: udhai.patt@gmail.com
M. Mathankumar
Department of Electrical and Electronics Engineering, Kumaraguru College of Technology,
Coimbatore, India
e-mail: mathankumarbit@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 481
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_36

the current work proposes a very dependable and simple method for detecting the P, Q, R, S, and T peak values of an ECG waveform. The technique is based on determining the mathematical relationship between the ECG signal's peak values and the time sequence. The methodology focuses on designing a graphical user interface (GUI) using the MATLAB tool for exhibiting the detection of the PQRST peaks and plotting these peak values on the ECG waveform at each respective instant of time. Such ECG signal processing techniques are intended to aid scientific research rather than medical diagnosis.

Keywords Graphical user interface · MATLAB · Analysis of QRS complex ·


Heart rate detection · ECG feature extraction · Leads of ECG

1 Introduction

The ECG or electrocardiogram is one of the simplest and most accurate tests utilized for estimating the heart's clinical condition. ECG analysis is the most widely utilized technique since it is capable of screening different abnormalities of the heart. ECG tools and devices for measuring the heart rate and plotting the waveform based on the frequency of the heartbeat are commonly available in medical centers, and the test is inexpensive and risk-free with respect to treatment. From the tracing of the ECG waveform, the following information can be determined [1].
1. Heart rhythm
2. Heart rate
3. Thickening of cardiac muscle
4. Any physical symptoms for the possibilities of heart attack
5. Detection and the estimation of coronary artery disease
6. Detection of conduction abnormal state (an abnormal state with which the
spreading of electrical impulse results across the cardiac muscle).
The above-mentioned traits can be treated as some of the most significant factors degrading cardiac functionality. The results of ECG signal tracing allow doctors to determine the exact clinical condition of the heart, and research in this field works toward enhancing non-invasive detection techniques for estimating cardiac abnormalities. Additional ECG testing is normally carried out to determine serious problems and to indicate a better way to proceed with treatment. In hospitals, tests related to the ECG most commonly include the stress test and cardiac catheterization. Subjects presenting with severe heart pain, increased blood pressure, variation in the velocity of blood flow through the arteries and veins, or unbalanced cholesterol levels in the blood are advised to undergo the ECG diagnostic test to check whether there is any thickening of the valves of the cardiac muscle that causes arrhythmia or an abnormal condition. It is easy to identify, through the variations in the
A Graphical User Interface Based Heart Rate Monitoring Process … 483

electrical flow of the signal pattern of ECG lends the reason for the abnormal condi-
tion of the heart. Hence, at the time of every cardiac cycle, an ECG is represented as
the graphical pattern of the bioelectrical signal which is being generated within the
human body [2]. It is highly possible to acquire the relevant and useful information
via the ECG chart or graphical waveform which makes the relation to the heart’s
function through the waves and the baseline denoting the variations of the voltage
of cardiac muscle at every instant of time [3]. An ECG is very much essential and it
possesses its adequate amplitude values at every instant of time [4] and it aids in the
determination of certain clinical cases after diagnosis. They are as follows.
1. Heart attack (myocardial infarction)
2. Electrolytic transformations
3. Ventricular and auricular hypertrophy
4. Pericarditis
5. Cardiac arrhythmias
6. Effects of medicines on the cardiac muscle, specifically quinidine and digitalis
7. Abnormal blood pressure and inadequate velocity of blood flow.
This study proposes a prototype software package. The software has been developed
mainly for scientific and technological research on medical diagnosis in a clinical
setting. In the proposed study, MATLAB has been utilized to develop the software
package and design a graphical user interface for analyzing the P, Q, R, S, and T
peak values of an ECG signal. All the peak-value parameters are computed from the
recordings of the ECG signal. The ECG signal can be acquired as input data in
binary or text-formatted files or simple Excel files. The structure of this paper is as
follows. The second section presents a literature review on feature extraction
techniques for the ECG signal. The third section summarizes the nature of continuous
ECG signal patterns and the placement of leads. The fourth section describes the
developed software technique which analyzes the ECG signal to obtain the P, Q, R,
S, and T peak amplitudes. The fifth section focuses on the results and discussion
obtained from the simulations, and lastly, the sixth section ends with the conclusion
and the future scope of work.

2 Review on a Related Literature Survey

The recording of the continuous ECG signal plays a vital role in the initial stage of
diagnosing cardiac disease; it is later processed and analyzed with the assistance of
a signal conditioning device. Even though the ECG analysis is based on determining
heartbeats and heart rates, abnormalities in heart functioning also result from
physical changes in the position and size of the chambers or from the continuous
consumption of drugs prescribed by physicians. Acquiring the ECG signals from

the electrocardiogram device helps in diagnosing the cardiac abnormality functions


by suitable predefinition of the parameters of ECG signals which is configured in
the device, and also by the development of optimization prediction techniques, the
abnormality conditions of the heart are analyzed more accurately with the devel-
opment of computational algorithms [5]. Many computational artificial intelligence
algorithms have been developed to classify various abnormality factors associated
with cardiac muscle functioning. It includes fuzzy logic techniques, artificial neural
networks, analysis of digital signal processing techniques, support vector machines,
genetic algorithm, self-organizing map, hidden Markov model, Bayesian approach,
slope vector waveform (SVW), difference operation method (DOM), discrete wavelet
transform (DWT), and fast Fourier transform (FFT) [6–11]. All the above-mentioned
methods analyze the P wave, QRS complex, and T wave peak amplitudes for different
cycles of ECG signals. For instance, the slope vector waveform (SVW) method
proposes an algorithm to detect the QRS complex of the ECG signal and evaluate the
R-R interval. This SVW technique offers good prediction of the R-R interval and QRS
peak with adequate accuracy, which aids in obtaining the best classification results
thanks to better feature extraction from the ECG wave component [12]. The discrete
wavelet transform (DWT) method, in contrast, performs feature extraction to acquire
adequate input data from the ECG input signal and finally classifies it into different
sets. The proposal in that work using the DWT technique consists of a series of
processing modules: preprocessing to eliminate noise and produce a noise-free
signal, feature extraction, feature selection, and finally classification. In the feature
extraction step, the wavelet transform is designed to address the non-stationary
nature of the ECG signal. It is derived from a single generating function, denoted as
the mother wavelet, by translation and dilation. By applying this technique to
feature extraction, the signal is analyzed with a variable window size, narrow at very
high frequencies and very broad at low frequencies, which leads to an optimal
resolution in the time-frequency domain over the whole frequency range [13].
Difference operation method (DOM) is utilized for making the detection of QRS
complex for an ECG signal and it is inclusive of two main processes. The first is the
difference operation process (DOP) and the second is the waves detection process.
The outline of this method is inclusive of two main steps. The initial step is to
find the peak point of R by the application of operation on the difference equation
for the ECG signal component. The second focuses on the Q and S peak points in
accordance with the R peak point to determine the QRS complex [14]. Another
variant determines the ECG waveform features by utilizing neural networks; this
integrated system uses cepstrum coefficients to extract features from artificial
neural network models and long-term ECG signals in order to perform the
classification. With this method, the features of an ECG signal can be recognized,
the signal can be classified, and arrhythmias can be detected [15]. A further
technique utilizes neural networks and the wavelet transform to classify ECG images
on the basis of their extracted features. Features are extracted by the wavelet
decomposition method from the intensity of the ECG images and then processed
further by artificial neural networks. The essential features are the median, mean,
maxima, minima, standard deviation, mean absolute deviation, and variance [16].
Yet another technique uses artificial neural networks to detect the PQRST waveform
by means of the derivative, searching for the maximum and minimum of the
derivative of the ECG wave component. The R peak, which is the highest peak, must
lie within the zero crossing between the minimum and the maximum of the
derivative. Subsequently, the Q peak must occur before the zero crossing preceding
the maximum, and the S peak must lie on the zero crossing after the minimum.
The P peak and the T peak are detected similarly, by focusing on the local maxima
in the original signal and then utilizing the derivative to identify the peak and end
points [17]. The proposed study presents a dependable and simple methodology for
detecting the P peak, QRS peaks, and T peak values of an ECG signal. The method
proceeds by determining the mathematical relationship between the maximum peak
and valley values of the ECG signal with respect to time. In the proposed study, the
GUI has been designed using the MATLAB software for the detection of the PQRST
waveform by applying a simple mathematical algorithm to acquire the PQRST peaks
and plot these values over the ECG signal with respect to time. Apart from this, a
denoising process is applied to extract a noise-free signal.

3 Description of ECG Signal

An electrocardiogram is the measurement of the electrical impulse activity of the
cardiac muscle, which can be acquired from the skin's surface from various angles
as shown in Fig. 1. When the contraction of the cardiac muscle is initiated, blood is
pumped to various parts of the body; action potentials are produced through the
mechanical process within the cardiac muscle, which then leads to the electrical
impulse activity [18].

3.1 Waveform of ECG

An ECG signal is the waveform plot obtained from the printout as a paper trace, on
which the recording captures the electrical impulse activity of the human cardiac
muscle. The normal ECG signal comprises a series of negative and positive waveform
cycles, namely the P wave, the QRS complex, and the T wave. The existence of

Fig. 1 Anatomy of cardiac muscle along with the signals acquired from different regions of the heart

P wave amplitude and the QRS complex determines the linear relationships over
distinguishing different irregularities of cardiac muscle. The typical ECG waveform
is depicted in Fig. 2, wherein which the peak amplitude of P wave denotes the state
of atrial depolarization and the initial part of deflection in the upward direction.
Whereas the QRS complex is comprising of three peaks, namely Q peak, R peak,
and S peak, which determines the state of depolarization of ventricles and the T
peak of the waveform corresponds to the ventricle repolarization and results with the
termination of ventricle systolic effect [18].
The typical ECG waveform which is depicted in Fig. 2 determines that the hori-
zontal axis of the plot denotes time variant parameter, whereas the vertical axis of
the plot denotes the depth and height of the wave and its amplitude is measured in
terms of voltage. The first timing interval over the horizontal axial line is termed to
be as P-R timing interval which denotes the time period from the P peak onset to the
initial position of the QRS complex. The interval denotes the axial time in between

Fig. 2 Typical ECG waveform [18]

Table 1 Range of normal amplitudes (in mV) of a typical ECG waveform [18]

Signal | Lead 1        | Lead 2        | Lead 3      | Lead aVR    | Lead aVL      | Lead aVF
P      | 0.015–0.12    | 0.00–0.19     | −0.073–0.13 | −0.179–0.01 | −0.085–0.140  | −0.06–0.16
Q      | 0.00–0.16     | 0.00–0.18     | 0.00–0.28   | 0.00–0.90   | 0.00–0.22     | 0.00–0.19
R      | 0.02–0.13     | 0.18–1.68     | 0.03–1.31   | 0.00–0.33   | 0.00–0.75     | 0.02–0.15
S      | 0.00–0.36     | 0.00–0.49     | 0.00–0.55   | 0.00–0.15   | 0.00–0.90     | 0.00–0.71
T      | 0.06–0.42     | 0.06–0.55     | 0.06–0.3    | 0.00–0.00   | −0.16 to 0.27 | 0.04–0.46

Signal | Lead V1       | Lead V2       | Lead V3     | Lead V4     | Lead V5       | Lead V6
P      | −0.08 to 0.18 | 0.15–0.16     | 0.00–0.18   | 0.01–0.23   | 0.00–0.24     | 0.00–0.19
Q      | –             | –             | 0.00–0.05   | 0.00–0.16   | 0.00–0.21     | 0.00–0.27
R      | 0.00–0.49     | 0.04–1.52     | 0.06–2.24   | 0.18–3.20   | 0.42–2.42     | 0.25–2.60
S      | 0.08–2.13     | 0.19–2.74     | 0.09–2.22   | 0.02–2.09   | 0.00–0.97     | 0.00–0.84
T      | 0.03–1.22     | −0.14 to 1.44 | 0.00–1.60   | 0.05–1.31   | 0.0–0.96      | 0.0–0.67

the initial position of the atrial depolarization and the initial arise of ventricular
depolarization. The QRS complex which is followed by the S-T framed segment
determines the section between the terminal point of S peak denoted as J point, and
the initial position of T peak which makes the representation over the timing space
between the depolarization and the repolarization of ventricles. The interval of Q-T
is denoted as the time scale between the initial arise of Q peak to the terminal end of T
peak over the cardiac muscle’s electrical impulse cycle. The interval of Q-T denotes
the entire duration of electrical impulse activity of the ventricle depolarization and
repolarization. Table 1 denotes the normal levels of amplitudes for an ECG signal.

3.2 Leads in ECG [19–23]

Any contraction of a muscle produces an electrical impulse variation in the form of
depolarization. This variation can be detected by a pair of electrodes placed on the
body surface using the ECG leads. Each lead constitutes an imaginary axial line
between two ECG electrodes; altogether there are 12 leads, and each lead captures
the electrical impulse activity of the cardiac muscle from a different orientation.
This results in 12 different electrical plots that show various shapes and voltage
levels depending on the electrode placement over the body surface, thereby
providing a multi-dimensional projection of the cardiac

muscle with different states of its functional response. The standard 12 ECG leads
are partitioned into two groups. The first cluster, the limb leads, comprises three
bipolar limb leads (1, 2, and 3): lead 1 is acquired between a positive and a negative
electrode, with the negative electrode located on the right forearm and the positive
electrode on the left forearm; lead 2 is acquired with the negative electrode on the
right forearm and the positive electrode on the left foot; and lead 3 is acquired with
the negative electrode on the left forearm and the positive electrode on the left foot.
The second cluster of leads is represented as the chest leads, denoted aVR, aVL, and
aVF; they are also represented as the V leads (V1, V2, V3, V4, V5, and V6), or
precordial leads. The 12 ECG leads are thus described, and the schematic
representation of the electrode mapping positions is depicted in Fig. 3.

4 Definition of Heart Rate

Heart rate is the velocity of the heartbeat, measured as the total count of heartbeats
in a specific interval of time and normally expressed in bpm (beats per minute). The
normal heart rate of a healthy human ranges from 60 to 100 beats per minute, and its
value varies with sex, age, and other relevant factors. When the heart rate is lower
than 60 beats per minute the condition is termed bradycardia, whereas when it is
higher than 100 beats per minute the condition is termed tachycardia [24, 25]. There
are several techniques to determine the heart rate from the ECG signal by utilizing
the R-R peak interval, as follows. The first one counts the total number of R peaks in
a 6-s strip of the cardiac rhythm and multiplies the value by a factor of 10. The
second counts the total number of small boxes spanning a typical R-R interval (in
mm) and divides 1500 by this count to determine the heart rate. The third counts the
number of large boxes between successive R peaks to arrive at a typical R-R interval
value and divides 300 by the resulting number to determine the heart rate [22]; a
small numerical illustration is given below.
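
As a rough numerical illustration of the two box-counting rules above (assuming the standard 25 mm/s paper speed, so one small box is 0.04 s and one large box is 0.2 s, and using a hypothetical R-R width), the following MATLAB snippet applies the small-box and large-box rules.

% Heart rate from an R-R interval using the box-counting rules.
% Assumes 25 mm/s paper speed: 1 small box = 0.04 s, 1 large box = 0.2 s.
smallBoxes = 19.2;                 % hypothetical R-R width in small boxes
largeBoxes = smallBoxes / 5;       % five small boxes per large box
hrSmall = 1500 / smallBoxes;       % rule 2: 1500 divided by the small-box count
hrLarge = 300  / largeBoxes;       % rule 3: 300 divided by the large-box count
fprintf('Heart rate: %.1f bpm (small boxes), %.1f bpm (large boxes)\n', ...
        hrSmall, hrLarge);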

5 Results and Discussion

In this study, the results have been simulated using the MATLAB software, with which
the recordings of the electrocardiogram signal have been analyzed to acquire the P peak,

(Flowchart of Fig. 4: Start → Input ECG data → Selection of lead (1, 2, 3) → Selection of file type (.mat, .xlsx, .txt) → Read the ECG signal → Removal of low-frequency components → Apply windowing filter and perform thresholding → Adjust the filter coefficients, plot the PQRST wave, and detect the heart rate → End)

Fig. 3 Precordial chest electrodes usually located on the left side of the chest

Q peak, R peak, S peak, and T peak, and finally the heart rate is detected. The
graphical user interface (GUI) makes it very simple to obtain the PQRST peak values
and plot them from the ECG analysis. Anyone testing this source code must select
the total sample count for a single ECG cardiac cycle used to detect the PQRST
peaks, and this count must equal 400 samples. The software tool provides the
following essential features for processing and analyzing the ECG signal.
1. Preliminary recordings of ECG signal have to be loaded from any informational
source as in the form of excel, binary, or text files.
2. The recordings of ECG which has been loaded has to be plotted for every lead.
3. The detection of PQRST has to be made as a unique value and also it should be
made to appear on the plot.
4. The graph has to be exported in terms of bmp or png or fig types.
5. The data has to be saved as either mat or txt or xlsx types.

6. The plotting has to be extracted for any leads of ECG as a response.


7. Finally, R peak detection and the heart rate measurement have to be established
with the following sequence as depicted in the following Fig. 4.
Figure 4 depicts the sequential steps for plotting the ECG PQRST waveform and the
heart rate detection process. Initially, the ECG data are read after selecting the lead
and the file type from any of the source files acquired as input. Once the acquisition
is made, the low-frequency components are removed, a window filtering technique is
applied, and thresholding is performed. Once the thresholding is done, the filter
coefficients are adjusted and the R peak is detected to estimate the heart rate, while
the PQRST peaks are detected with simple mathematical operations coded in
MATLAB, as sketched below.
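
The sketch below is one possible MATLAB reading of that pipeline, not the authors' GUI source code; the file name, sampling rate, window length, and threshold factor are illustrative assumptions, and findpeaks requires the Signal Processing Toolbox.

% Illustrative R-peak detection: read, de-trend, window-filter, threshold.
fs  = 360;                               % assumed sampling rate (Hz)
ecg = load('ecg_record.txt');            % hypothetical text-formatted recording
ecg = ecg(:) - mean(ecg(:));             % remove the DC / baseline offset

win  = ones(1, 5) / 5;                   % simple moving-average window filter
ecgF = filter(win, 1, ecg);              % suppress high-frequency noise

thr = 0.6 * max(ecgF);                   % amplitude threshold (60% of maximum)
[~, locs] = findpeaks(ecgF, 'MinPeakHeight', thr, ...
                      'MinPeakDistance', round(0.3 * fs));

rr  = diff(locs) / fs;                   % R-R intervals in seconds
bpm = 60 / mean(rr);                     % estimated heart rate
fprintf('Detected %d R peaks, heart rate %.0f bpm\n', numel(locs), bpm);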

Fig. 4 Sequential steps for plotting ECG PQRST waveform and the detection of heart rate

Fig. 5 Graphical user interface design for acquiring the ECG signal

Figure 5 depicts the design of the graphical user interface (GUI) for plotting the
ECG wave and the PQRST peaks. The following is the sequence of steps used to
frame the algorithm with the MATLAB tool; a minimal GUI sketch is given after the list.
1. Determine the sampling rate for the ECG waveform and estimate the calculation
of heart rate.
2. By using the window filtering technique, determine the detection of heart rate.
3. Obtain the plot and save it in the form of (.png) or (.bmp) or (.fig) type.
4. Next, the data has to be saved in the format of txt or mat or xlsx type.
5. Perform the analysis of acquired ECG informational data and estimate the values
after the detection of PQRST peak values.
6. After the plot is created for the PQRST peaks, the plot is extracted with the
marked peak values.
7. Then, the marked or acquired plot is saved in any one of the represented formats
as (.png) or (.bmp) or (.fig) type.
8. Selection based on the requirement has to be done to print the entire samples
or the specific samples alone.
9. The graph shall be finally saved and proceed with the program for the heart rate
detection and acquire the ECG plot again in the txt or mat or xlsx type.
10. Based on the selected lead, the response could be seen with the adequate plot
of ECG to read the heart rate from the waveform.
11. On entering the sampling data range for analysis, the ECG is imported to acquire
PQRST peaks.
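
The minimal sketch below shows how such a front end could be wired up in MATLAB; it is not the package described in this paper, and the figure layout, callback, and file filter are assumptions (local functions in scripts require a recent MATLAB release, so the snippet can be saved as, e.g., ecg_gui.m).

% Minimal GUI sketch: one load button and one axes for the ECG plot.
fig = figure('Name', 'ECG PQRST Viewer', 'NumberTitle', 'off');
ax  = axes('Parent', fig, 'Position', [0.10 0.25 0.85 0.70]);
uicontrol(fig, 'Style', 'pushbutton', 'String', 'Load ECG', ...
          'Units', 'normalized', 'Position', [0.10 0.05 0.20 0.10], ...
          'Callback', @(src, evt) loadAndPlot(ax));

function loadAndPlot(ax)
    % Hypothetical loader: read a text-formatted recording and plot it.
    [f, p] = uigetfile('*.txt', 'Select ECG recording');
    if isequal(f, 0), return; end
    ecg = load(fullfile(p, f));
    plot(ax, ecg(:));
    xlabel(ax, 'sample'); ylabel(ax, 'amplitude');
    title(ax, 'Raw ECG Data plotting');
end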
Figure 6 illustrates the plot of the raw ECG signal, and the filtered signal plot is
shown in Fig. 7. The detection of the R peak points used to estimate the heart rate
of the ECG signal is depicted in Fig. 8. It demonstrates how to acquire the P, Q, R, S,
and T peak values for an approximate range of about 400 samples and establish the
heart rate
Fig. 6 Plotting of raw ECG signal (raw ECG data plotting: amplitude versus time)

Fig. 7 Plotting of filtered ECG signal (amplitude versus time)

Fig. 8 Detection of peak points in the ECG signal (amplitude versus time)

detection. Figure 9 depicts the acquisition of the QRS-filtered signal along with the
identified pulse train formed by the adaptive threshold detection. From the
above-mentioned steps, the graphical user interface can be designed to acquire the
ECG signal for further processing. The main benefit of designing a GUI is that the
ECG can be acquired directly from a database such as the MIT-BIH arrhythmia
database to detect cardiac abnormalities. In this proposed study, the acquired ECG
signal can be processed only after the PQRST peaks have been detected from the
ECG; the mapping can then be directly interpreted to measure the heart rate, which
can also be monitored continuously. In Fig. 5, the GUI for acquiring the ECG is
shown. There is a virtual key for acquiring the input, which is stored in text format.
Once the ECG

Fig. 9 Sequence of acquisition of the filtered QRS signal along with the identified pulse train (panels: QRS on filtered signal; QRS on MVI signal with noise level (black), signal level (red) and adaptive threshold (green); pulse train of the found QRS on the ECG signal)

signal acquisition is completed, the abnormality state of the cardiac muscle can be
diagnosed from the PQRST peak values and the durations of the R-R, R-T, Q-T, and
S-T intervals by processing them through computational intelligence techniques. For
the detection of cardiac arrhythmias from ECG signals acquired from the MIT-BIH
arrhythmia database, machine learning algorithms such as artificial neural networks,
genetic algorithms, and fuzzy logic techniques can be developed, and these will
serve as a non-invasive technique for detecting abnormal states of the cardiac
muscle that could otherwise lead to sudden death. Preprocessing of the raw acquired
ECG signal includes denoising, dimensionality reduction and baseline wander
removal, followed by feature extraction, feature selection, and classification of the
ECG into the normal or abnormal category.
The essential aspect of designing the GUI is that once a clear plot has been made
with the detected peaks and the intervals between them, computational intelligence
techniques for classifying cardiac arrhythmias can be further developed, which will
help the doctors choose the right path for treatment. The test has been carried out
with data acquired from the MIT-BIH PhysioNet database. A normal ECG signal is
acquired from the PhysioNet database and plots of the raw and filtered ECG
components are made. From the simulated GUI, the plot of the raw ECG signal is
depicted in Fig. 6. As an initial step immediately after the ECG signal acquisition,
denoising is applied, producing a noise-free ECG signal whose plot is depicted in
Fig. 7. For the noise-free ECG signal

Table 2 Comparison of PQRST peaks and the heart rate of the normal acquired ECG signal

Parameter   | Standard PQRST values | Detected PQRST values
P           | 0.25 mV               | 0.054 mV
Q           | 25% of R wave         | −0.435 mV
R           | 1.60 mV               | 1.495 mV
T           | 0.1–0.5 mV            | 0.114 mV
Heart rate  | 60–100 bpm            | 78 bpm

component, the R peak detection is represented as a plot in Fig. 8. The peak values of
the acquired normal ECG signal are compared with the standard PQRST values, along
with the heart rate, in Table 2.
From Table 2, it can be inferred that the comparison has been made between the
standardized values and the obtained values of the PQRST peaks and the heart rate.
The detected values of the P peak, Q peak, R peak, and T peak are 0.054 mV,
−0.435 mV, 1.495 mV, and 0.114 mV, respectively. Likewise, the determined heart
rate is 78 beats per minute. Herewith, a basis has been created for detecting the
PQRST peaks and the heart rate through the GUI design, which can then be enhanced
by the development of machine learning algorithms.

6 Conclusion and Scope of Future Work

This study has proposed a technique for monitoring the heart rate and detecting the
PQRST peaks from the acquired ECG signal by designing a GUI in the MATLAB
software. This detection process can be used by clinical analysts as well as
researchers in the field of diagnosing abnormalities of the ECG signal. The earlier
techniques used to analyze the ECG signal and estimate the PQRST peaks on the
basis of digital signal processing and artificial neural networks can be enhanced in
terms of accuracy by using this MATLAB-based approach. The prediction of an
optimal heart rate value can be accomplished by the proposed GUI-based extraction
method. It can also be used for the prediction of different cardiac diseases
designated as cardiac arrhythmias. As future work, cardiac abnormality classification
algorithms using computational intelligence techniques could be implemented to
diagnose arrhythmias in a non-invasive manner. Similarly to the MIT-BIH arrhythmia
database, the processing of the ECG signal could be performed on real-time acquired
ECG signals, for which the graphical user interface could be developed by integrating
machine learning algorithms for diagnosing abnormal cardiac conditions.

References

1. Bronzino JD (2000) The biomedical engineering handbook, vol 1, 2nd edn. CRC Press LLC
2. Goldshlager N (1989) Principles of clinical electrocardiography, Appleton & Lange, 13th edn.
Connecticut, USA
3. Singh N, Mishra R (2012) Microcontroller based wireless transmission on biomedical signal
and simulation in Matlab. IOSR J Eng 2(12)
4. Acharya RU, Kumar A, Bhat PS, Lim CM, Iyengar SS, Kannathal N, Krishnan SM (2004)
Classification of cardiac abnormalities using heart rate signals. Med Biol Eng Comput 42:172–
182
5. Babak M, Setarehdan SK (2006) Neural network based arrhythmia classification using heart
rate variability signal. In: Signal Processing Issue: EUSIPCO-2006, Sept 2006
6. Beniteza D, Gaydeckia PA, Zaidib A, Fitzpatrickb AP (2001) The use of the Hilbert transform
in ECG signal analysis. Comput Biol Med 31:399–406
7. De Chazal P, O’Dwyer M, Reilly RB (2004) Automatic classification of heartbeats using ECG
morphology and heartbeat interval features. IEEE Trans Biomed Eng 51(7):1196–1206
8. Dewangan NK, Shukla SP (2015) A survey on ECG signal feature extraction and analysis
techniques. Int J Innov Res Electr Electron Instrum Control Eng 3(6):12–19
9. Dima SM, Panagiotou C, Mazomenos EB, Rosengarten JA, Maharatna K, Gialelis JV, Curzen
N, Morgan J (2013) On the detection of myocardial scar-based on ECG/VCG Analysis. IEEE
Trans Biomed Eng 60(12):3399–3409
10. Ebrahimi A, Addeh J (2015) Classification of ECG arrhythmias using adaptive neuro-fuzzy
inference system and Cuckoo optimization algorithm. CRPASE 01(04):134–140. ISSN 2423-
4591
11. Burhan E (2013) Comparison of wavelet types and thresholding methods on wavelet-based
denoising of heart sounds. J Signal Inf Process JSIP-2013 4:164–167
12. Ingole MD, Alaspure SV, Ingole DT (2014) Electrocardiogram (ECG) signals feature extraction
and classification using various signal analysis techniques. Int J Eng Sci Res Technol 3(1):39–44
13. Jeba J (2015) Classification of arrhythmias using support vector machine. In: National
conference on research advances in communication, computation, electrical science and
structures-2015, pp 1–4
14. Kar A, Das L (2011) A technical review on statistical feature extraction of ECG signal. In:
IJCA special issue on 2nd national conference computing, communication, and sensor network,
CCSN, 2011, pp 35–40
15. Kelwade JP, Salankar SS (2015) Prediction of cardiac arrhythmia using artificial neural network.
Int J Comput Appl 115(20):30–35. ISSN 0975-8887.
16. Kohler B, Hennig C, Orglmeister R (2002) The principles of software QRS detection reviewing
and comparing algorithms for detecting this important ECG waveform. IEEE Eng Med Biol
42–57
17. Kutlu Y, Kuntalp D (2012) Feature extraction for ECG heartbeats using higher order statistics
of WPD coefficients. Comput Methods Progr Biomed 105(3):257–267
18. Li Q, Rajagopalan C, Clifford GD (2014) Ventricular fibrillation and tachycardia classification
using a machine learning approach. IEEE Trans Biomed Eng 61(6):1607–1613
19. Luz EJDS, Nunes TM, Albuquerque VHCD, Papa JP, Menotti D (2013) ECG arrhythmia
classification based on optimum-path forest. Expert Syst Appl 40(9):3561–3573
20. Malviya N, Rao TVKH (2013) De-noising ECG signals using adaptive filtering algorithms. Int
J Technol Res Eng 1(1):75–79. ISSN 2347-4718
21. Markowska-Kaczmar U, Kordas B (2005) Mining of an electrocardiogram. In: Conference
proceedings, pp169–175
22. Masethe HD, Masethe MA (2014) Prediction of heart disease using classification algorithms.
In: Proceedings of the world congress on engineering and computer science WCECS 2014, vol
2, pp 22–24

23. Moavenian M, Khorrami H (2010) A qualitative comparison of artificial neural networks and
support vector machines in ECG arrhythmias classification. Expert Syst Appl 37(4):3088–3093.
https://doi.org/10.1016/j.eswa.2009.09.021
24. Muthuchudar A, Baboo SS (2013) A study of the processes involved in ECG signal analysis.
Int J Sci Res Publ 3(3):1–5
25. Narayana KVL, Rao AB (2011) Wavelet-based QRS detection in ECG using MATLAB. Innov
Syst Des Eng 2(7):60–70
Performance Analysis of Self Adaptive
Equalizers Using Nature Inspired
Algorithm

N. Shwetha and Manoj Priyatham

Abstract Through a communication channel, a sender transmits a message to a
receiver. Due to noise in the channel, the received message is not identical to the
transmitted one. Likewise, in a digital communication channel the transmitted signal
may undergo dispersion, so the transmitted and received information differ.
Inter-symbol interference (ISI) and additive noise cause this dispersion of the signal.
If the channel is known exactly, the ISI can be reduced; in practice, however,
preliminary information about the channel attributes is rarely available, and
inaccuracies also occur in physical implementations of the filters. Equalization is
utilized to counteract the resulting residual distortion. This article presents an
adaptive equalizer for data transfer through a channel that introduces ISI. One way
to decrease the impact of this challenge is to utilize a channel equalizer at the
receiver. The role of the equalizer is to create a reconstructed version of the
transmitted signal that is as close as possible to it. The equalizer is utilized to
decrease the bit error rate (BER), the proportion of received bits in error to the
overall transferred bits. In this article, a hybrid approach combining the least mean
square (LMS) and EPLMS algorithms is utilized to attain the minimum mean square
error (MSE) and an optimum convergence rate, which improves the efficiency of the
communication system.

Keywords Inter symbol interference · Least mean square · Evolutionary programming LMS · Bit error rate (BER) · Adaptive equalizer · Communication channel

N. Shwetha (B)
Department of ECE, Dr. Ambedkar Institute of Technology, Bangalore, Karnataka 560056, India
e-mail: shwethaec48@gmail.com
M. Priyatham
Department of ECE, APS College of Engineering, Bangalore, Karnataka 560082, India
e-mail: manojpriyatham2k4@yahoo.co.in


1 Introduction

With the arrival of digital technology, digital signal communication has become
essential in a wide range of applications. Such applications have driven the
development of several modulation schemes and further refinements of them [1].
However, those schemes and their refinements are highly affected by noise.
Basically, two fundamental problems occur in traditional digital transmission
methods: ISI (inter-symbol interference) and noise have a strong impact on these
methods. These errors are caused by the characteristics of the channel linking the
transmitter and receiver and by the dispersion of the transmitted pulse. The impact
of noise on the communication is determined by the channel features and may be
diminished by an appropriate selection of the channel [2–5]. Even if the channel
remains noisy, the received signal can be less affected if the SNR is kept high at the
transmitter by increasing the transmitted signal power [6]. Because of ISI, the
electrical power of one symbol spreads into the following symbol duration, which
impairs the communication and diffuses the symbol. An efficient method of
decreasing this effect is an adaptive channel equalizer.
In digital communication systems, adaptive equalization is critical to diminish the
impact of ISI; an adaptive algorithm such as the LMS algorithm adjusts the
coefficients of the equalizer. When everything is ideal at the receiver, there is no
interaction among consecutive symbols; each symbol arrives and is decoded
independently of all others [7, 8]. However, once the symbols interact with each
other, the waveform of one symbol corrupts the value of an adjacent symbol, and the
received signal becomes distorted. It is then hard to recover the message, since the
received pulses are smeared so that the signals related to the various symbols are no
longer distinguishable. This impairment is known as inter-symbol interference (ISI).
Its impact can be reduced by utilizing a channel equalizer at the receiver. Two
intensively emerging fields of digital transmission, cellular communications and
digital subscriber lines, are heavily reliant on the implementation of a reliable
channel equalizer (Fig. 1).

Fig. 1 Fundamental structure of channel equalization



where Z^(−d) is a delay function.

2 Concept of Inter Symbol Interference

In a digital communication system, if everything is right at the receiver side, there is
no interaction among successive symbols: each arriving symbol is decoded
independently of the others. When symbols do interact, the waveform of one symbol
corrupts the values of its neighbouring symbols, the received signal is distorted, and
it becomes difficult to recover the message from it. This impairment is known as
inter-symbol interference (ISI). The purpose of an equalizer is to reduce the ISI so
that the transmitted signal can be reconstructed; this also reduces the bit error rate
of the transmitted signal. The assumption of an ideal all-pass AWGN channel is
impractical. Because of the scarcity of frequency spectrum, the signal is filtered to
limit its bandwidth so that frequency-division structuring can be obtained. Many
band-pass channels are available in practice, but their response varies across the
different frequency components. The simple AWGN model therefore does not
represent practical channels very accurately, and a commonly used refinement is the
dispersive channel model shown in Fig. 2.

y(t) = x(t)∗ h c (t) + n(t) (1)

In this equation, x(t) is the transmitted signal, hc(t) is the impulse response of the
channel, and n(t) is AWGN with power spectral density N0/2. The dispersive
characteristic of the channel is modelled by the linear filter hc(t). This dispersive
channel model acts as a low-pass filter, which smears the transmitted signal in time
and causes symbols to overlap when signals are transmitted in a practical case.
Because of this, the ISI deteriorates the error performance of the communication
system. Two main methods are commonly used to eradicate this ISI deterioration
effect. In the first method, the

Fig. 2 Inter-symbol interference (dispersive channel with additive noise: x(t) in, y(t) out)



band-limited transmission pulses are used to minimize the ISI; the ISI-free pulses
obtained in this way are known as Nyquist pulses. In the second method, the
received signal is filtered to cancel the ISI introduced by the channel impulse
response. This is known as equalization. A short simulation sketch of this channel
model is given below.
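
A short MATLAB sketch of the dispersive-channel model of Eq. (1), in discrete time, is given below; the symbol count, channel taps, and noise variance are illustrative assumptions rather than values taken from this paper.

% Discrete-time version of y(t) = x(t)*hc(t) + n(t).
N  = 500;                                % number of transmitted symbols
x  = 2 * randi([0 1], N, 1) - 1;         % bipolar (+1 / -1) symbol sequence
hc = [0.3; 0.9; 0.3];                    % assumed low-pass channel impulse response
y  = filter(hc, 1, x);                   % linear dispersion -> inter-symbol interference
y  = y + sqrt(0.01) * randn(N, 1);       % additive white Gaussian noise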

3 Channel Equalization

Equalization is utilized in several communication applications in which interference
and noise occur and are impossible to eliminate. Equalization is the procedure of
matching the frequency components of the transmitted pulse at the receiver to
decrease the interference and noise produced throughout the transmission. The
device that performs this matching between the received pulse and the receiver is
referred to as an equalizer. In this article, the focus is on equalization with respect to
the channel [9–11].

3.1 Channel Equalization

Figure 1 shows the structure of channel equalization. Channel equalization is the
procedure of adapting the equalizer coefficients to the channel over which the
information is transferred. The channel can be equalized by many algorithms, and
the equalizing receiver filter reduces the impact of ISI on the channel. At present,
adaptive algorithms attract the attention of most investigators [12]. The channel can
also be equalized by several deterministic algorithms; the most powerful one utilized
to eliminate ISI prior to adaptive methods is maximum likelihood sequence
estimation, where the channel impulse response is assessed with a maximum
likelihood function. The coefficients of the receive filter are adjusted to correspond
to the channel [13–16], and the ISI and noise are reduced by modifying the
coefficients at the output. An error signal then drives the adaptation of the
equalizer. Contemporary data transmission methods
are exploiting progressively more physical phenomena to improve the flow of data.
In recent years, a huge amount of effort has gone into the advancement of
transmission. To fit within the limits expressed in the global radio guidelines, many
speed-ups and improvements were applied to earlier forms of data modulation and
encryption. The harmful transmission impacts become increasingly significant as the
data intensity in the channel expands [17]. To counteract the distortion, modern-day
radio communication equipment uses special measures that include digital signal
processing and channel analysis. Equalizers can recreate the transmitted pulse from
its altered version, although an equalization procedure that amplifies the noise will
not attain improved functioning [18]. This article concentrates on methods of
adaptive channel equalization, with the intent to replicate real communication
circumstances and

Fig. 3 Performance of channel equalizer (transmitted signal → channel → equalizer → estimated signal, with channel estimation ĥ(n))

execute a powerful error correction algorithm on dedicated computer equipment.
Equalizers are essential for the effective functioning of electronic systems such as
analogue television transmission. An equalizer is utilized to maintain the pulse
content in television applications and to cancel interruptions in radio transmissions
such as group and phase delay. Figure 3 illustrates the performance of a channel
equalizer.
Channel equalization is a process of adjusting the equalizer coefficients to channel
coefficients to reduce ISI. If a channel is considered as a filter then equalizer is an
inverse filter. An equalizer is not only going to compensate the effect of the channel
but also going to compensate all the unnecessary effects transmitted signal went
through, i.e. due to pulse shaping, transmitter filter and receiver filter to get back the
initial signal. The block diagram of channel equalization is shown in Fig. 3. When the
channel is known, then by sending a known signal through the channel error signal
can be calculated. The error signal is the difference between the desired signal and the
received signal. The error signal is the driving force for an equalizer. Equalizer will
aim to minimize the error signal. Hence optimization techniques/algorithms are used
to achieve this. There are many algorithms used for equalization. The most effective
algorithm used before adaptive algorithms is maximum likelihood sequence
estimation (MLSE), which depends on the maximum likelihood function of the
measured channel impulse response. The equalizer coefficients are adjusted, or
equalized, to nullify the effect of the channel; this adjustment of the coefficients
reduces the ISI and noise at the output. Hence, from the distorted version of the
transmitted signal, the original version can be reconstructed by the equalizer. Once
the equalizer weights are set, they do not change, and the required information can
then be sent through the channel.

3.2 Adaptive Channel Equalization

Figure 4 illustrates an adaptive filter structure, and its working is specified in the
following four stages:
1. The received signal is handled by the filter.
2. The response of the filter characterizes the relation between the received and
processed pulses.

Fig. 4 Adaptive filter

3. The filter has parameters that can be adjusted.
4. An adaptive algorithm defines how the filter parameters are modified from one
time instant to the next.
1. The filter handles the received pulse in two different modes:
(a) Training mode: Equalization is accomplished in this mode with a known
input pulse; the parameters of the adaptive approach are modified using the
known signal as the input to the filter.
(b) Testing mode: In this mode, the channel equalizer is examined with a pulse
whose content is unknown. This mode is used once the equalizer has been
trained. The equalizer is tested with the real-time signal, and a minimal error
value is expected because the equalizer has already been trained. If the error
does not converge and remains considerable, the equalizer is trained again.
2. This stage refers to the filter's impulse response, which provides the relationship
between the received noisy signal and the equalized signal. The filter forms
typically utilized are the transversal (FIR) and IIR filters.
3. This stage refers to the parameters of the filter which can be modified to equalize
the channel. These parameters are the weight values of the filter response, and
they are altered adaptively to obtain a good match between the desired and
obtained signals.
4. This stage refers to the adaptive algorithm utilized to adapt the filter parameters,
which defines how the input-output relationship is updated at each iteration.
Because the medium is dynamic, the filtering method has to adjust to a specific
environment to decrease undesirable signal distortion [19–23]. The concept behind
an adaptive filter is essentially a continuous modification of the filter's processing
kernel according to a specific criterion. Ordinarily, the important requirement is to
make the output of the system match the reference signal d(n). The adaptation starts
from the error value, which is obtained as the contrast between the reference signal
and the output. In Fig. 4, the filter works in an ordinary manner: an input pulse is
processed by the filter and sent to the output. Figure 4 illustrates a streamlined
model of an adaptive filter [24, 25].

Error signal = (desired signal) − (obtained signal)    (2)

Considering this aspect, the adaptive subsystem becomes an attempt to enhance the
channel, forming a kind of feedback loop between input and output by means of the
adaptation mechanism.

4 Problem Formulation

If the step size is large, the convergence rate of the LMS algorithm will be fast, but
the steady-state MSE (mean square error) will increase. On the other hand, if the
step size is small, the steady-state MSE will be small, but the convergence rate will
be slow. The step size therefore gives a trade-off between the convergence rate and
the steady-state MSE of the LMS algorithm. One way to increase the efficiency of
the LMS algorithm is to make the step size variable rather than fixed, which leads to
VSSLMS algorithms. With this approach, both a fast convergence rate and a small
steady-state MSE can be achieved. The step size should satisfy the condition:
0 < step size < 1/(maximum eigenvalue of the input autocorrelation matrix)
For fast convergence, the step size is set close to its maximum allowed value; a
numerical check of this bound is sketched below.
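
The bound can be checked numerically; the sketch below (assuming a received sequence y and an equalizer length N are already defined, as in the channel example earlier) forms a sample autocorrelation matrix and takes its largest eigenvalue.

% Numerical check of the LMS stability bound 0 < mu < 1/lambda_max.
N = 11;                                  % assumed equalizer length (taps)
L = numel(y) - N + 1;
X = zeros(N, L);
for k = 1:L
    X(:, k) = y(k+N-1:-1:k);             % sliding input vectors of length N
end
R     = (X * X.') / L;                   % sample autocorrelation matrix (N x N)
muMax = 1 / max(eig(R));                 % largest admissible step size
mu    = 0.5 * muMax;                     % a fast but still conservative choice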

5 Performance Criteria

The performance of the LMS adaptive filter is assessed in three ways: the adequacy
of the FIR filter, the speed of convergence of the system, and the misadjustment in
steady state.

5.1 Speed of Convergence

The rate at which the coefficients approach their optimal values is known as the
speed of convergence. The speed of convergence improves as the step size is
increased, up to step sizes close to one-half of the maximum value required for stable
operation of the system. This result can be obtained from a careful examination for
various kinds of input signals and correlation statistics. For typical signal situations,
the convergence speed of the excess MSE decreases for sufficiently large step-size
values. The speed of convergence also decreases as the length of the filter is
increased. The maximum possible speed of convergence is limited by the largest step
size that can be selected for stability, which for strongly correlated input signals is
smaller than half of the maximum value obtained when the input signal is only
mildly correlated.

6 Formulation of LMS Algorithm

The LMS i.e. least mean squares algorithm is one of the most famous algorithms in
adaptive handling of the signal. Because of its robustness and minimalism has been
the focal point of much examination, prompting its execution in numerous appli-
cations. LMS algorithm is a linear adaptive filtering algorithm that fundamentally
comprises of two filtering procedure, which includes calculation of a transverse filter
delivered by a lot of tap inputs and creating an estimation error by contrasting this
output with an ideal reaction. The subsequent advance is an adaptive procedure,
which includes the programmed modifications of the tap loads of the channel as
per the estimation error. The LMS algorithm is additionally utilized for refreshing
channel coefficients. The benefits of LMS algorithm are minimal calculations on
the sophisticated nature, wonderful statistical reliability, straightforward structure
and simplicity of usage regarding equipment. LMS algorithm is experiencing prob-
lems regarding step size to defeat that EP i.e. evolutionary programming is utilized.
Figure 5 shows the block diagram of a typical adaptive filter
where
x(n) is the input signal to a linear filter
y(n) is the corresponding output signal
d(n) is an additional input signal to the adaptive filter
e(n) is the error signal that denotes the difference between d(n) and y(n)

Fig. 5 Typical adaptive filter



The linear filter can be of different types, namely FIR or IIR. The coefficients of the
linear filter are adjusted iteratively by the adaptive algorithm to minimize the power
of e(n). Adaptive algorithms that adjust the coefficients of FIR filters include the
recursive least squares algorithm and the LMS algorithm. The LMS algorithm
performs the following operations to update the coefficients of an adaptive FIR
filter; a compact sketch is given after the list.
1. Calculates the output signal y(n) from the FIR filter:

   y(n) = u^T(n) · w(n)

   where u(n) = [x(n) x(n − 1) . . . x(n − N + 1)]^T is the filter input vector and
   w(n) = [w_0(n) w_1(n) . . . w_(N−1)(n)]^T is the filter coefficients vector.
2. Calculates the error signal e(n) by using the following equation: e(n) = d(n) − y(n)
3. Updates the filter coefficients by using the following equation:

   w(n + 1) = (1 − µc) · w(n) + µ · e(n) · u(n)

where

µ is the step size of the adaptive filter


w(n) is the filter coefficients vector
u(n) is the filter input vector.
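
Collecting the three steps, a generic MATLAB sketch of the LMS update is given below; it is not the authors' implementation, and the leakage term c is assumed to be zero so the update reduces to the standard LMS recursion.

function [w, e] = lms_equalizer(x, d, N, mu)
% Generic LMS adaptive filter sketch.
%   x : input (received) signal, column vector
%   d : desired (training) signal
%   N : number of filter taps, mu : step size
    w = zeros(N, 1);                     % coefficient vector w(n)
    e = zeros(length(x), 1);
    for n = N:length(x)
        u    = x(n:-1:n-N+1);            % u(n) = [x(n) ... x(n-N+1)]'
        y    = w' * u;                   % step 1: filter output
        e(n) = d(n) - y;                 % step 2: error signal
        w    = w + mu * e(n) * u;        % step 3: coefficient update (c = 0)
    end
end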

7 Evolutionary Programming

Evolutionary algorithms are stochastic search methods rather than deterministic
ones. In 1960, Lawrence J. Fogel in the US introduced evolutionary programming,
using simulated evolution as a learning procedure aimed at creating artificial
intelligence. Earlier methods such as linear programming and calculus-based
methods, for example the Newtonian method, have difficulty delivering the global
solution: they tend to get stuck in local solutions. To overcome this problem,
nature-inspired computation can be applied. In this approach, some characteristic
found in nature is taken as a reference to develop a mathematical model, and this
model is then used to find the solution to the problem. In this paper, the natural
characteristic taken as a reference is evolution. It is one of the most successful
mechanisms in nature, where things evolve (change) over time to adapt to the
environment, resulting in better fitness values and hence higher chances of survival;
the transformation from monkey to human is an example. A mathematical model
based on evolution is referred to as evolutionary computation.

8 EPLMS Algorithms

Based upon natural evolution, a mathematical model called evolutionary computation
has been created. In nature, things change over time to increase their fitness so that
the chances of survival are better; human evolution is an example. The algorithm
steps are as follows, with a sketch given after the list.
1. At the beginning, a random set of step sizes is defined as the population.
2. For each step size, apply the LMS update and obtain its corresponding error
value (fitness):

   e(n) = d(n) − x^T(n) · W(n)    (3)

   where e(n) is the deviation error, d(n) is the expected output value, x(n) is the
   input vector at sampling time n, and W(n) is the coefficient vector.
3. Select the step size having the minimum error with respect to the current sample
point.
4. Apply the LMS update with the selected step size to obtain the coefficient values:

   W(n + 1) = W(n) + µ e(n) x(n)    (4)

5. As each new input sample appears, a new population of step sizes is created in
EP from the previous generation and the procedure is repeated.
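
One possible reading of these steps is sketched below; the population size, the initial step-size range, and the mutation spread are assumptions for illustration rather than the exact parameters used in this work.

function [w, e] = eplms_equalizer(x, d, N, P)
% EPLMS sketch: a population of P candidate step sizes competes at every
% sample; the step size with the smallest trial error drives the update.
    w  = zeros(N, 1);
    e  = zeros(length(x), 1);
    mu = 0.001 + 0.1 * rand(P, 1);       % step 1: random initial population
    for n = N:length(x)
        u   = x(n:-1:n-N+1);
        err = d(n) - w' * u;             % error with the current weights
        fit = zeros(P, 1);               % step 2: fitness of each step size
        for k = 1:P
            wk     = w + mu(k) * err * u;
            fit(k) = abs(d(n) - wk' * u);
        end
        [~, best] = min(fit);            % step 3: pick the fittest step size
        w    = w + mu(best) * err * u;   % step 4: LMS update with that step size
        e(n) = err;
        mu = abs(mu(best) + 0.01 * randn(P, 1));   % step 5: next generation
    end
end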

9 Simulated Results

MATLAB 2014b was utilized to implement the modelling, and the resulting plots are
shown in this section. The ability of the recommended structures was determined in
terms of their convergence behaviour as described in the figures. The GUI model
showing the steps involved in implementing the EPLMS algorithm is described in
Fig. 6.
To see the performance of the evolutionary programming LMS algorithm (EPLMS)
for a given channel, 11 taps are selected for the equalizer. The input signal consists
of 500 samples whose values are generated randomly from a uniform distribution, as
shown in Fig. 7. Gaussian noise with zero mean and 0.01 standard deviation is added
to the signal at the channel output, and the channel characteristics are given by the
vector:

[0.05 − 0.063 0.088 − 0.126 − 0.25 0.9047 0.25 0 0.126 0.038 0.088]

The randomly generated input signal consists of 500 samples and is transmitted in
bipolar form (+1, −1). To make the system more realistic, random information is
generated between +1 and −1, which makes the information unpredictable at the
receiver side; a sketch of this setup is given below.
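
The setup can be reproduced along the following lines; apart from the published channel vector, the 500-sample length, the 11-tap equalizer, and the 0.01 noise standard deviation, everything here (including the hypothetical eplms_equalizer helper from the sketch above, the population size, and the omission of the decision delay) is an assumption.

% Reproduce the described experiment: 500 bipolar samples, 11-tap equalizer.
numSamples = 500;
x = 2 * randi([0 1], numSamples, 1) - 1;            % random +1 / -1 input
h = [0.05 -0.063 0.088 -0.126 -0.25 0.9047 0.25 0 0.126 0.038 0.088];
r = filter(h, 1, x) + 0.01 * randn(numSamples, 1);  % channel output + noise
d = x;                                              % training signal (delay ignored)
[w, e] = eplms_equalizer(r, d, 11, 10);             % hypothetical helper, pop. size 10
plot(e .^ 2); xlabel('sample index'); ylabel('squared error');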

Fig. 6 Overall GUI module

Fig. 7 Generated input signal and signal with noise from channel

Figure 8 shows the MSE error plot using LMS with fixed step-size values of 0.11,
0.045, and 0.0088. From observation, it is clear that the performance differs for the
different step-size values, i.e., they have different convergence characteristics along
with a different quality of convergence, and it is also very difficult to identify the
optimum step-size value.
Figure 9 shows the comparative MSE error plot using LMS and EPLMS. From
this, an observation can be made that the error with the EPLMS algorithm is reduced
when compared to the LMS algorithm.
Figure 10 shows the generated input signal, the signal from the channel after the
addition of noise, the signal from the equalizer, and the signal recovered after the
decision, respectively.

Fig. 8 Fixed step size performance of LMS with step size equal to 0.11, 0.045 and 0.0088

Fig. 9 Comparative error plot using LMS and EPLMS algorithms



Fig. 10 Original signal, signal from channel with noise, signal from equalizer (EPLMS), the
recovered signal after a decision

These observations show that the EPLMS algorithm is very efficient in providing
noise-free information, with a reduced bit error rate and a minimum mean square
error, and the EPLMS algorithm has proven to be an excellent algorithm in adaptive
signal processing.

10 Conclusion

Bandwidth-efficient data transfer through radio and telephone channels has been
made possible through the use of adaptive equalization to counteract the time
dispersion introduced by the channel. Stimulated by practical applications, a steady
research effort over the past two decades has produced a rich body of literature in
adaptive equalization and the associated, more general disciplines of system
identification, adaptive filtering, and digital signal processing. This article provides
a summary of adaptive equalization. In our design, since evolutionary programming
is used, it decides what the value of the step size should be for a particular
application so that the mean square error (MSE) is minimized and the convergence
is optimal; faster convergence is also obtained. With the design of the receiving
filters, the impact of inter-symbol interference can be reduced. Consequently, the
effectiveness of a communication system can be enhanced.

Obstacle-Aware Radio Propagation
and Environmental Model for Hybrid
Vehicular Ad hoc Network

S. Shalini and Annapurna P. Patil

Abstract The presence of physical obstacles between communicating vehicles will


significantly impact the packet transmission performance of the vehicular ad hoc
network (VANET). This is because the line-of-sight link among transmitter and
receiver is frequently being obstructed. Very limited work is carried out addressing
the impact of the presence of multiple obstacles in the line of sight (LOS) of
communicating vehicle on different environment condition such as urban, rural, and
highway. Further, very limited work is carried out in incorporating obstructing effect
in designing medium access control (MAC) under multichannel VANET communi-
cation environment, resulting in packet collision and affecting system throughput
performance. Thus, an efficient MAC design that maximizes throughput with a
minimal collision is required. For overcoming research challenges, an obstacle-aware
radio propagation (OARP) model for the different environment such as urban, rural,
and highway is presented. Along with, a distributed MAC (DMAC) that maximize
system throughput is presented. Experiment outcome shows proposed OARP-DMAC
model attain significant performance over existing radio propagation and MAC model
in terms of throughput and packet collision.

Keywords Environment model · Hybrid network · Medium access control · Radio


propagation · VANET · Vehicle of obstructing effects

1 Introduction

With the significant growth and requirement for provisioning smart transportation
systems, vehicles in current days are embedded with various hardware and smart
devices such as sensors and camera. Building smart intelligent transport system aids
in providing seamless mobility, safe journey, more enjoyable, and improved user

S. Shalini (B) · A. P. Patil


RIT Bangalore, Bangalore, India
e-mail: shalini.siddamallappa@gmail.com
A. P. Patil
e-mail: annapurnap2@msrit.edu


experience on the go [1]. Moreover, the vehicle is expected to be connected every-


where with prototyping of fifth-generation (5G) communication network. Commu-
nication of vehicles in a 5G network is described as vehicle-to-everything (V2X) [2].
The V2X communication allows the vehicle to communicate with pedestrian (i.e.,
vehicle-to-pedestrian (V2P)), with other vehicle (i.e., vehicle–vehicle (V2V)), and
with roadside unit (RSU) among each other (i.e., vehicle-to-infrastructure). Thus,
makes VANET safe, smart, and efficient [3].
The IEEE 802.11-based cellular network is one of the widely used communi-
cation protocols for the vehicle to communicate among different entity such as
RSU, vehicle, and pedestrian. This allows a VANET device to access the Internet
through LTE. Nonetheless, these communication networks cannot cope with ever-
increasing vehicle density and packet load. Further, the data size is expected to
grow exponentially and make the data transmission even more challenging [4].
Recently, various alternative communication protocols such as dedicated short-range
communication (DSRC) and Wi-Fi have been used in VANET communication.
The heterogeneity nature of VANET brings new research problems and challenges.
Thus, various researchers are exploring a hybrid framework for supporting effi-
cient and reliable vehicular ad hoc network correspondences [5, 6] provided an
outline in building hybrid vehicular ad hoc network combining both cellular and
DSRC network. [7] evaluated the performance achieved by heterogeneous vehicular
communication comprising LTE, DSRC, and Wi-Fi network. However, the major
drawback is that it can access only one channel in a given session instance, which leads to improper system resource utilization. To utilize resources more efficiently, [8] presented a software-defined network (SDN)-based vehicular ad hoc network framework. In [9], the benefit of using SDN for providing flexible communication was shown, and different features and services were introduced into the vehicular ad hoc network. Nonetheless, neither [8] nor [9] addressed the issues of modeling a realistic and practical radio propagation model. This paper focuses on addressing the radio propagation issues in the VANET environment.
In general, the two major features of VANET correspondence are projected to cater
to future smart vehicular communication environment [10–12]. From one perspec-
tive, it can provide collision avoidance and vehicle location localization for enhancing
vehicle maneuvering safety by sharing channel characteristic dynamic information
in real time. On other perspectives, it offers ideal communication between devices
for efficiently transmitting packets. Both the services are dependent on the quality
of radio signal and channel conditions, which differs with different radio prop-
agating environment [10]. Further, in VANET, the vehicle moves at high speed
with dynamic mobility pattern, thus requires efficient and reliable wireless link
for ensuring precision and stability of real-time information [13]. Additionally, the
remote correspondence interface exceptionally depends upon the radio channel char-
acteristics, which are influenced by the kind of radio propagating environment [10].
Consequently, understanding VANET radio propagation channel characteristics is
significant, particularly under real-time traffic conditions.

The state-of-the-art radio propagation models are divided into two classes. Few
approaches have focused on addressing the delay constraint by increasing propa-
gating speed. Rest of the approaches has focused on establishing a reliable prop-
agating route in the vehicular ad hoc network. However, the majority of these
approaches have considered that if vehicles are within association range can commu-
nicate among each other and if not they cannot communicate with each other. Further,
the presence of a bigger vehicle in a line of sight (LOS) of transmitter and receiver
will induce significant overhead of effective coverage for transmitting information,
because receiver will experience decreased received signal powers. Along these lines,
the receiver cannot decode the information successfully [14] as they do not consider
vehicles obstructing effect (VOE) into consideration. Thus, state-of-the-art conven-
tions will experience the ill effects of the broadcasting hole (BH) issue: few vehicles
within association range cannot receive the broadcasted message from the nearest
device (i.e., both source and hop device) with enough signal power. A communi-
cating device inside the zone of BH will fail in decoding information and will not
have any knowledge of the current dynamic traffic condition. Thus, these devices
consequently become potential casualties of vehicle collisions.
For addressing the above-discussed problems, this work describes transmission
efficiency (TE) (i.e., the additional attenuation power required) for estimating the
influence of vehicles obstructing effect on a different channel and environment
condition. The TE can be established in a heuristic manner by taking the ratio of
total vehicle density that obtain information with no error to the overall vehicle
density within correspondence range of source vehicle considering moving at a
certain speed and density. This paper presents the obstacle-aware radio propaga-
tion model considering VOE for different environmental conditions. Further, the
work presented in distributed MAC design maximizes vehicular ad hoc network
throughput with minimal collision under a multichannel environment.
The highlight of the work is discussed below:
• Presents obstacle-aware radio propagation model for different environment
condition such as urban, rural, and highway.
• Modeled distributed MAC that maximizes system throughput of vehicular ad hoc
network.
• Experiment outcome shows the proposed distributed MAC design archives better
outcome when compared with existing MAC design with better throughput and
less collision.
The organization of the research work is as follows. In Sect. 2, the literature survey
of various existing radio propagation and environmental is discussed. In Sect. 3, the
obstacle-aware radio propagation and environmental model are proposed. The result
and discussion are discussed in Sect. 4. Lastly, the research is concluded and the
future direction of research work is discussed.

2 Literature Survey

This section discusses the various existing radio propagation model presented for
improving communication performance of vehicular ad hoc network under different
environment and network conditions. In [15], the author focused on highlighting the
physical obstacle and further it is observed that vehicles have a large impact of safety
information on optimized propagations in VANET through the continuous obstruc-
tion of links among two communicating devices. Moreover, obstructing effect incur
various impact on road safety and it diminishes effective coverage regarding safety-
related information; however, so far it is not addressed in efficient manner. Here, at
the first broadcast definition as a metric is highlighted then optimization of mitigation
problem on safety-related information is extensively investigated and to overcome
such issue graph theory optimization technique is adopted. Maximum broadcast-
efficient relaying (ER) aka MBER algorithm is developed for distributable optimiza-
tion in VANET, and MBER helps in maximizing operative information coverage, also
it tries to meet certain requirement such as reliability constraint through incorporating
propagation distance and broadcast efficiency in the relay contention. Furthermore,
algorithm is evaluated and it is observed that the MBER tries to promote the effec-
tive coverage of information (safety-related) through varying vehicular distribution
in vehicular ad hoc networks. In [16], the author focused on properties of V2V radio
channel characteristics of ramp conditions following the different structure; more-
over, ramps are divided into the various construction structures. The first structure
is defined bridge ramp along with soundproof walls in the urban area, second is
general ramp without any sound-proof walls is given sub-urban region. Moreover,
the whole propagation process of the radio signal is parted into different propagation
zones while considering the line of sight (LOS); further, the propagation charac-
teristics include shadow fading, propagation path loss, RMS delay spread, average
fade duration, level crossing rate, fading distribution, and fading depth, and these
characteristics are estimated. Furthermore, in accordance with different characteris-
tics, various ramp conditions are compared and the following observation is carried
out. (1) In urban bridge ramp condition, an abrupt fluctuation indicates the signifi-
cance of soundproof walls in radio channel of vehicle-to-vehicle communication. (2)
Frequent change in received signal strength parameter and various fading parameter
in different propagation environment are observed in ramp scenario of sub-urban
environment. Moreover, statistical features are optimized and fixed through certain
generalization error parameter; hence, propagation path loss is exhibited through
demonstration of path loss parameter differences in a given operating environment.
In [17], the author tries to achieve reliable communication; hence, it is observed
that features of the wireless channel need to be analyzed properly. In here author
mainly focuses on radio channel characteristics of V2V which is primarily based
on 5.9 GHz under overtaking scenario; further, they are analyzed through empir-
ical results under four environment and network conditions. However, the primary
concern is channel characteristics difference among non-overtaking and overtaking
scenarios; hence, here they divided the non-overtaking and overtaking points based

on small-scale fading distribution and further it is observed that the value of average
fade distribution and root-mean-square delay spread are significantly high when
compared with non-overtaking scenarios; however, level crossing rate and root-mean-
square Doppler spreads are lesser than non-overtaking conditions. Moreover, [18]
considered variation in the velocity of communicating vehicles; the further generic
model was designed considering the various parameters such as path powers, path
delays, arrival angle, departure angle, and Doppler frequencies; these parameters are
analyzed and simplified through Taylor’s series. They aimed modeling mechanism
which can be applied for real-time vehicle-to-vehicle communication and further
explicitly reveals velocity variation impact on channels. In [19], the author designed
3D model of a stochastic model which was irregular in shape and based on the
geometry for vehicle-to-vehicle communication scenarios; here, multiple inputs and
multiple output mechanisms were used at transmitting device. Further, geometric
path length which is time variant is developed for capturing non-stationary; this non-
stationary is caused by transmitting and receiving device. Moreover, it is observed
that the author focuses on investigating the impact of relative moving time and direc-
tions of respective channel state information. Similarly, [20] observed that multipath
component in dynamic clusters is not modeled ideally in the existing model; hence,
in here multipath component clusters distributions for both horizontal and a vertical
dimension are considered. Here, expectation maximization algorithm is introduced
for extracting multipath component and further for identification and tracking is
carried through developing clustering and tracking methodologies. Moreover, MPC
clusters are parted into two distinctive categories, i.e., scatter cluster and global
cluster; the cluster distribution is further categorized through various inter- and intr-
acluster parameters. However, it is observed that elevation spread and azimuth spread
both follow a lognormal distribution.
From the survey, it is seen that a number of radio propagation models have been presented considering different scenarios with obstacles and environmental conditions. The 3D geometric models are very effective in modeling the VOE but induce higher computation overhead. Further, a number of 2-way and 3-way knife-edge models have been presented addressing large-scale fading issues, but they do not address small-scale fading under varied environment scenarios. In addition, very limited work has been carried out on designing a distributed MAC employing the VOE. To overcome these research issues in modeling the VOE under different environmental conditions, the next section presents the radio propagation and distributed MAC models for different environment conditions such as urban, rural, and highway.

3 Obstacle-Aware Radio Propagation and Environmental


Model for Hybrid Vehicular Ad hoc Network

This section presents the obstacle-aware radio propagation (OARP) model for dynamic environment conditions such as urban, rural, and highway. Let us consider a set of vehicles of different sizes moving in different regions as shown in Fig. 1.

Let us assume that each vehicle has a homogeneous communication radius described by the notation S_y, and that these vehicles can communicate with one RSU or vehicle at a given instant of time. Each vehicle transmits H packets of size N and passes through the radio propagation environment with a set of vehicles A = {1, . . . , A}. Let M describe the average number of vehicles (i.e., the average vehicle arrival rate within the coverage area) passing through the radio propagation environment under a Poisson distribution. The vehicle speed and density are described by u and l, respectively. The vehicle speed for a certain vehicle density l can be obtained using the following equation
 
u = u_k (1 − l/l_↑),    (1)

where u_k depicts the speed of vehicles under the Poisson distribution and l_↑ is the maximum feasible vehicle density in the radio propagation environment. Therefore, M can be estimated using the following equation

M = l · u.    (2)

The maximum number of vehicles P that can be admitted by a certain vehicle or RSU y can be obtained utilizing the floor function in the following equation
 
P_↑,y = ⌊2 S_y l_↑⌋,   ∀y ∈ A.    (3)
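As a brief illustration of Eqs. (1)-(3), the following sketch evaluates the speed, arrival rate, and admission limit; all numeric values are illustrative assumptions, not parameters from the paper.

import math

u_k = 25.0      # free-flow speed under the Poisson model (m/s), illustrative
l_up = 0.15     # maximum feasible density l_up (vehicles/m), illustrative
l = 0.06        # current density (vehicles/m)
S_y = 150.0     # communication radius of device y (m)

u = u_k * (1 - l / l_up)              # Eq. (1): speed at density l
M = l * u                             # Eq. (2): average arrival rate
P_max_y = math.floor(2 * S_y * l_up)  # Eq. (3): max vehicles admitted by device y
print(u, M, P_max_y)                  # approx. 15.0, 0.9, 45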

Fig. 1 Obstacle-aware radio propagation model



For improving packet transmission under dynamically changing environmental


condition, a distributed medium access control (DMAC) scheduling algorithm is
presented. In DMAC scheduling algorithm, the time is segmented into equal slot
size δn. The overall time that a vehicle will be within a coverage area of yth RSU or
vehicle is described using the following equation
 
N_y = 2S_y/(u δn)    (4)

The Nth slot time, during which the vehicle will be in the communication region of the neighboring device, can be obtained using the following equation


V(y, N) = Σ_{x=0}^{y−1} N_x + N,   ∀N ∈ {1, . . . , N_y},    (5)

where N_0 = 0. The timeline representation of the time slots of the yth device is described using the following equation

N_y = {V(y, 1), . . . , V(y, N_y)}    (6)
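A small sketch of this slot bookkeeping (Eqs. (4)-(6)) is shown below; the coverage radii, speed, and slot duration are illustrative only.

# Slot bookkeeping sketch for Eqs. (4)-(6); all values are illustrative.
S = [150.0, 200.0, 120.0]     # coverage radii S_y (m) of three RSUs/vehicles
u, dt = 15.0, 0.05            # vehicle speed (m/s) and slot duration delta_n (s)

N = [int(2 * s / (u * dt)) for s in S]      # Eq. (4): slots spent within device y
def V(y, n):                                # Eq. (5): global index of slot n at device y
    return sum(N[:y]) + n
slots = {y: [V(y, n) for n in range(1, N[y] + 1)] for y in range(len(S))}   # Eq. (6)
print(N, slots[1][:3])                      # [400, 533, 320] and [401, 402, 403]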

Further, for maximizing resource utilization of the VANET, the slots are selected so as to maximize the system throughput. Let the slot assignment decision be e_XN and the throughput attainable by each vehicle X in the vehicular ad hoc network be S_X. Here, e_XN is set to 1 provided slot N is assigned to vehicle X; similarly, if no slot is assigned to the vehicle, e_XN is set to 0. Therefore, the throughput maximization problem is described using the following equation


max_E Σ_{X=1}^{R} S_X.    (7)

where R depicts the total number of vehicles in the VANET. Further, the slot assignment constraint is described using the following equation

R
e X N = 1 ∀y. (8)
X

Thus, this paper computes the attainable throughput of vehicle X on slot assign-
ment below. Let VX describes the slots allocated to vehicle x and let l X N describes
the probability that slot N is reachable by vehicle X . For simplicity, this paper
assumes that l X N does not rely on each other. As a result, the S X is estimated using
the following equation

S_X = 1 − Π_{N∈V_X} l̄_{XN} = 1 − Π_{N=1}^{T} (l̄_{XN})^{e_{XN}}    (9)


where 1 − Π_{N∈V_X} l̄_{XN} depicts the probability that at least one slot is available for vehicle X. The parameter l̄_{XN}, the probability that slot N is not reachable for vehicle X, is computed using the following equation

l̄_{XN} = 1 − l_{XN}    (10)

This is because every vehicle can at least use only one assigned slots, its maximum
throughput achievable will be 1 under different radio propagation environment
considering certain data rate.
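The attainable throughput of Eq. (9) can be sketched as follows; the slot-reachability probabilities l_XN are randomly generated for illustration, and the greedy assignment is only a simple stand-in for the maximization of Eq. (7) subject to Eq. (8), not the scheduler proposed here.

import numpy as np

rng = np.random.default_rng(0)
R, T = 4, 12                            # vehicles and total slots (illustrative)
l = rng.uniform(0.3, 0.9, size=(R, T))  # l_XN: prob. that slot N is reachable by vehicle X
l_bar = 1.0 - l                         # Eq. (10): complement probability

def throughput(assign):
    # Eq. (9): S_X = 1 - product over assigned slots of l_bar_XN
    return np.array([1.0 - np.prod(l_bar[X, assign == X]) for X in range(R)])

# greedy assignment: each slot goes to the vehicle for which it is most reachable
assign = np.argmax(l, axis=0)
print(throughput(assign), throughput(assign).sum())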
The different environment has different shading, path loss, and shadowing compo-
nent. Thus, it is important to model such an effect for improving packet transmission
performance. This work addresses the issues of path loss component on channel
attenuation. Let us consider for a given slot time n the bandwidth can be estimated
using the following equation

c_n = C log₂(G/(P₀ C r_n^α) + 1),    (11)

where C depicts the bandwidth of vehicular ad hoc network, G depicts the commu-
nication power, P0 power spectral density with zero Gaussian noise, rn depicts the
distance between communicating vehicles at time slot n and α depict the path loss
component. For evaluating α in Eq. (11), as described in [21] log normal shadowing
model and path loss component considering signal-to-noise ratio (SNR) (r )dB with
distance r apart from the sender, the receiver can be obtained using the following
equation

α(r )d B = Pt − PL(r0 ) − 10n log10 (r/r0 ) − Xσ − Pn (12)

where Pt depicts the power required for processing packet PL(r0 ) depicts the path
loss component at a certain distance apart r0 , Xσ depicts the zero mean Gaussian
random parameters with standard deviation σ , Pn depict the noise level in decibel
watt. Further, this work considers VOE for modeling channel [22, 23] for improving
log normal shadowing models. This paper considers neighboring device as VOE
for modeling obstacle-aware radio propagation model. In the OARP model first, the
vehicle that would get affected by VOE between transmitting vehicle x and receiving
vehicle y is described as obtProbAff(x, y). If the distance between VOE of vehicle
x and vehicle y is higher than that of those in the middle of VOE vehicle, in that case,
the vehicle is said to be probably obstructing. Second, the vehicle that obstructs the
VOE between vehicle x and vehicle y are chosen from a selected probable candidate
of obstructing vehicle established in previous round are described using following
notation obtLOSaff([ProbableAff]). Further, it must be seen that the transmitted
signal may get obstructed because of obstructing effects of Fresnel zone ellipsoid
Obstacle-Aware Radio Propagation and Environmental Model … 521

(FZE) by vehicles which are estimated using the following equation.



z = (r_aff/r)(z_y − z_x) + z_x − 0.6 s_k + z_t    (13)
where r_aff depicts the distance between the obstructing vehicle and the transmitting vehicle, z_x and z_y depict the heights of transmitter vehicle x and receiver vehicle y, z_t depicts the vehicle antenna height, r depicts the distance between the transmitter and receiver vehicles, and s_k depicts the radius of the main FZE obtained using the following equation.

s_k = √(W r_aff (r − r_aff)/r)    (14)

where W depicts the wavelength. Finding the heights of all possible obstructing vehicles plays a significant part before transmitting packets. Further, it is noted that a vehicle will obstruct vehicles x and y provided z is greater than its height. Thus, the probability of VOE between vehicle x and vehicle y is estimated using the following equation.

L(LOS | z_x, z_y) = 1 − Q((z − ϕ_z)/ω_z)    (15)

where L depicts the probability of VOE by a vehicle (i.e., an obstacle) between the transmitter vehicle and the receiver vehicle, ϕ_z depicts the mean and ω_z the standard deviation of the height distribution of the obstructing vehicles, and Q(·) depicts the Q-function.
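To make the obstruction test concrete, the following sketch evaluates the first Fresnel-zone radius and clearance of Eqs. (13)-(15) together with the per-slot rate of Eq. (11) for one transmitter and receiver pair; every numeric value (distances, heights, powers, path-loss exponent) is an illustrative assumption, not a measurement from the paper.

import math
from statistics import NormalDist

# illustrative geometry and radio parameters (not from the paper)
r, r_aff = 120.0, 50.0          # Tx-Rx distance and Tx-obstacle distance (m)
z_x, z_y, z_t = 1.5, 1.5, 0.3   # Tx/Rx mounting heights and antenna height (m)
W = 3e8 / 5.9e9                 # wavelength at 5.9 GHz (m)
phi_z, omega_z = 1.8, 0.4       # mean / std of obstructing-vehicle heights (m)

s_k = math.sqrt(W * r_aff * (r - r_aff) / r)              # Eq. (14): first FZE radius
z = (r_aff / r) * (z_y - z_x) + z_x - 0.6 * s_k + z_t     # Eq. (13): clearance threshold
L = NormalDist(phi_z, omega_z).cdf(z)                     # Eq. (15): 1 - Q((z - phi_z)/omega_z)

# Eq. (11): achievable rate on one slot, with assumed power, noise PSD and exponent
C, G, P0, alpha = 27e6, 0.1, 1e-13, 2.7
c_n = C * math.log2(1 + G / (P0 * C * r ** alpha))
print(round(s_k, 2), round(z, 2), round(L, 3), round(c_n / 1e6, 2))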
Finally, the additional attenuation of the received signal power caused by the VOE of the obstructing vehicles identified in the previous step is estimated using the notation obtAttenuation([AffDevices]). This work uses the multiple knife edge (MKE) model: using MKE, the candidate VOE vehicles are obtained, and based on their distances and heights the attenuation is computed. The OARP computation of the additional attenuation between transmitter vehicle x and receiving device y, considering the presence of multiple obstacles due to neighboring vehicles, is described in the flow diagram in Fig. 2.
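The three-step OARP procedure above (obtProbAff, obtLOSaff, obtAttenuation) can be summarized in the following sketch; the Vehicle type, the clearance and knife-edge loss callbacks, and all numbers are hypothetical stand-ins, since the paper specifies the procedure only at the level of the flow diagram in Fig. 2.

from dataclasses import dataclass

@dataclass
class Vehicle:
    pos: float      # position along the road (m)
    height: float   # vehicle height (m)

def obt_prob_aff(tx, rx, neighbours):
    # Step 1 (obtProbAff): vehicles lying between tx and rx are probable obstructors.
    lo, hi = sorted((tx.pos, rx.pos))
    return [v for v in neighbours if lo < v.pos < hi]

def obt_los_aff(probable, clearance):
    # Step 2 (obtLOSaff): keep vehicles tall enough to cut the first Fresnel zone,
    # where clearance(v) stands in for the threshold height z of Eq. (13) at v's position.
    return [v for v in probable if v.height > clearance(v)]

def obt_attenuation(affecting, mke_loss):
    # Step 3 (obtAttenuation): total extra attenuation (dB) from the obstructors;
    # mke_loss stands in for the per-vehicle multiple-knife-edge diffraction loss.
    return sum(mke_loss(v) for v in affecting)

# toy usage with stubbed clearance and knife-edge loss values
tx, rx = Vehicle(0.0, 1.5), Vehicle(120.0, 1.5)
neighbours = [Vehicle(40.0, 3.2), Vehicle(70.0, 1.4), Vehicle(200.0, 4.0)]
probable = obt_prob_aff(tx, rx, neighbours)
affecting = obt_los_aff(probable, clearance=lambda v: 1.8)
print(obt_attenuation(affecting, mke_loss=lambda v: 6.0))   # one tall vehicle -> 6.0 dB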
The proposed obstacle-aware radio propagation and distributed MAC models attain significant performance compared with the existing models under different environment conditions, as is experimentally demonstrated in the next section.

4 Results and Analysis

This section presents a performance evaluation of the proposed obstacle-aware radio


propagation model under different environmental conditions such as urban, rural, and
highway. The SIMITS simulator [24], which is an extension of [25], has been used for evaluating the performance of the proposed obstacle-aware radio propagation model

Fig. 2 Flow diagram of obstacle-aware radio propagation modeling

under different environmental conditions. Further, the performance of obstacle-aware


distributed MAC is evaluated over the existing MAC model [24, 26] considering
packet collision and network throughput.
1. Performance evaluation of obstacle-aware radio propagation model under a
varied environmental condition such as urban, rural, and highway:
This section evaluates the collision and throughput performance achieved by the obstacle-aware radio propagation model under different environmental conditions. For the experimental analysis, each vehicle moves at a speed of 3 m/s and a total of 40 vehicles is considered. Figure 3 shows the throughput performance achieved by the obstacle-aware radio propagation model for the urban, rural, and highway environments. From the results, it is seen that the highest throughput is achieved in the urban environment, followed by the highway environment, while the rural environment achieves the least throughput. Figure 4 shows the corresponding collision performance. From the results, it is seen that the fewest collisions occur in the urban environment, followed by the highway environment, while the rural environment incurs the most collisions. No prior work such as [24, 26] considered such a performance evaluation under different environmental conditions. Thus, the proposed obstacle-aware radio propagation environment can be used to simulate more realistic environmental conditions as described in [16] (Table 1).

Fig. 3 Throughput performance attained under varied environmental conditions (throughput per channel in Mbps for the urban, highway, and rural environments, plotted against simulation time in seconds)

Fig. 4 Collision performance attained under varied environmental conditions (number of packets collided for the urban, highway, and rural environments, plotted against simulation time in seconds)

Table 1 Simulation parameters used for experiment analysis

Simulation parameter used | Configured value
Vehicular ad hoc network size | 50 m × 50 m
Number of vehicles | 20–60
Modulation scheme | Quadrature amplitude modulation-64
Mobility of devices | 3 m/s
Coding rate | 0.75
Bandwidth | 27 Mb/s
Number of channels | 7
Time slot duration | 8 µs
Message information size | 20 bytes
Medium access control type used | TECA, DMAC

Fig. 5 Throughput performance attained by the proposed distributed MAC under denser environmental conditions considering a varied number of vehicles (throughput per channel in Mbps for the proposed and existing models with 20, 40, and 80 vehicles)

2. Performance evaluation of proposed distributed MAC over existing MAC under


denser environment condition considering varied vehicles:
This section evaluates the collision and throughput performance achieved by the proposed distributed MAC model over the existing MAC models [24, 26] under dynamic obstacle-aware radio propagation and environmental conditions such as urban, rural, and highway. Here, the vehicles move at a constant speed of 3 m/s over an urban segment, followed by a rural segment, and then through a highway segment, and the throughput and collisions achieved are noted for 20, 40, and 80 vehicles. Figure 5 shows the throughput achieved by the proposed DMAC and the existing MAC. From the results, it is seen that as the number of vehicles increases, the throughput improves for both the proposed and existing MAC. That is, when the number of vehicles is 20, 40, and 80, the throughput obtained by the existing model is 5.2075 Mbps, 10.34 Mbps, and 15.877 Mbps, respectively; similarly, the throughput obtained by the proposed DMAC model is 8.598 Mbps, 13.6 Mbps, and 19.49 Mbps, respectively. Thus, the proposed DMAC model improves throughput by 39.43%, 23.96%, and 18.53% over the existing model for 20, 40, and 80 vehicles, respectively. Therefore, an average throughput improvement of 27.31% is achieved by DMAC over the existing MAC model under dynamic radio propagation and environment conditions. Further, it is seen that the DMAC model achieved a much better throughput outcome than the existing MAC irrespective of the number of vehicles.
with existing MAC irrespective of vehicle size. Figure 6 shows the collision achieved
by proposed DMAC over existing MAC. From the result it is seen as vehicle size
has increased the collision is increased for both existing and proposed MAC. That
is when vehicle size is 20, 40, and 80; the collision incurred by the existing model
is 11 packets, 45 packets, and 141 packets, respectively. Similarly, when vehicle
size is 20, 40, and 80, the throughput obtained by the proposed DMAC model is 6
packets, 33 packets, and 125 packets, respectively. Thus, the proposed DMAC model
improves throughput by 45.45%, 26.66%, and 11.35% over the existing model when
vehicle size is 20, 40, and 80, respectively. Therefore, an average collision reduc-
tion of 27.82% is achieved by DMAC over existing MAC model under dynamic
radio propagation and environment condition. Further, it is seen the DMAC model

Fig. 6 Collision performance attained by the proposed distributed MAC under denser environmental conditions considering a varied number of vehicles (number of packets collided for the proposed and existing models with 20, 40, and 80 vehicles)

The significant throughput gain and collision reduction achieved using the proposed DMAC model under dynamic environmental conditions arise because slots are assigned to vehicles so as to maximize throughput using Eq. (7), and the bandwidth is optimized in Eq. (11) based on the signal-to-noise ratio considering the obstructing effect among communicating devices. On the other side, in the existing model, a slot is assigned to a vehicle based only on resource availability, so a vehicle cannot maximize system throughput. Further, the existing model considers a simple attenuation model without accounting for multiple obstructing devices in the LOS among statically associated devices; however, the obstructing effects in real time vary significantly and thus require a dynamic measurement model, which is why the existing model incurs high packet loss. Thus, from the results achieved, the proposed DMAC can be concluded to be robust under varied vehicle density and radio propagation environments, as it provides a good tradeoff between reducing collisions and improving throughput.
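The percentage figures quoted above can be reproduced, up to small rounding differences, from the listed throughput and collision values; in this reading the throughput gain is taken relative to the proposed model's throughput and the collision reduction relative to the existing model's collision count.

# Reproducing the percentage gains quoted above (values taken from the text).
existing_tp = [5.2075, 10.34, 15.877]   # Mbps for 20, 40, 80 vehicles
proposed_tp = [8.598, 13.6, 19.49]
existing_col = [11, 45, 141]            # collided packets
proposed_col = [6, 33, 125]

tp_gain = [100 * (p - e) / p for e, p in zip(existing_tp, proposed_tp)]
col_red = [100 * (e - p) / e for e, p in zip(existing_col, proposed_col)]

print([round(g, 2) for g in tp_gain])          # close to the quoted 39.43, 23.96, 18.53
print(round(sum(tp_gain) / len(tp_gain), 2))   # 27.31 (average throughput gain)
print([round(r, 2) for r in col_red])          # close to the quoted 45.45, 26.66, 11.35
print(round(sum(col_red) / len(col_red), 2))   # 27.82 (average collision reduction)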
3. Results and discussions
This section discusses the results and their significance for the proposed radio propagation, environmental, and distributed MAC models over the existing models [24, 26]. Table 2 shows the comparison of the proposed approach with the state-of-the-art models. [15] presented MBER considering the presence of multiple obstacles in the LOS of communicating vehicles; however, performance was evaluated under the highway environment only. Further, a number of radio propagation models have been presented to address the obstacle effect in the LOS among communicating devices [17-20]. However, these models aim at reducing propagation delay by adopting 3D geometry, as a result inducing high computational overhead; moreover, they are not simulated under different environment conditions and are thus less realistic. On the other side, this paper presents an efficient radio propagation model that considers the obstructing effect between communicating vehicles. Further, the experiments conducted in [24] show high packet collision under a multichannel environment; to address this, [26] presented a throughput-efficient channel access model. However, these models still induce slightly higher collision and fail to maximize system throughput. To address this, this paper presented a distributed MAC design that maximizes throughput with minimal collision overhead.

Table 2 Performance comparison of the proposed approach over state-of-the-art techniques

Criterion | OARP-DMAC | [15] | [24] | [26]
Type of environment used | Urban, rural, and highway | Highway environment | Cognitive VANET | Cognitive environment
Type of MAC used | DMAC | MERA | ENCCMA | TECA
Radio propagation model used | Custom (OARP) | ITU-R [27] | Log normal | –
Simulator used | 802.11p SIMITS | Omnet++ | SIMITS | –
MAC layer | 802.11p | 802.11p | 802.11p | 802.11p and 802.11ad
Communication support | V2I and V2V | V2V | V2V | V2I
Multichannel support | Yes | No | Yes | Yes
Performance metric considered | Throughput and collision | Broadcast efficiency and relay successful ratio | Throughput and collision | Throughput

From the overall results attained, it is seen that the proposed model achieves superior throughput with fewer collisions when compared with the existing models. Thus, the proposed MAC model provides a good tradeoff between maximizing throughput and minimizing collisions.

5 Conclusion

First, this work analyzed various existing works presented recently for addressing the VOE among communicating vehicles. Various radio propagation methodologies considering obstacles in the line of sight of communicating vehicles have been presented with good results; however, these models are not applicable for simulation under practical or real-time environments. Further, the adoption of 3D geometric statistical models induces high computational complexity under a dynamically changing vehicle environment.
first presented obstacle-aware radio propagation model. Further, the impact of the
obstructing effect on communication is tested under different environment condition
such as urban, rural, and highway. No prior work has considered such an evalu-
ation. Further, this paper presented a distributed MAC model that overcomes the
problems of maximizing throughput with minimal contention overhead. The OARP
model is incorporated into DMAC thus able to optimize the slot time dynamically
aiding system performance. Experiments are conducted by varying vehicle size. An

average collision reduction of 27.81% is achieved by OARP-DMAC over OARP-


ENCCMA. Further, an average throughput enhancement of 27.31% is achieved
by OARP-DMAC over OARP-ENCCMA. From result attained, it can be stated
the proposed OARP-DMAC achieves much better throughput and collision reduc-
tion when compared with existing radio propagation and MAC model. Thus, the
OARP-DMAC is robust irrespective of vehicle density considering dynamic envi-
ronment such as urban, rural, and highway. Future work would further improve
MAC considering heterogeneous/hybrid network design (i.e., combining SDN and
cloud computing environment). The MAC will be designed considering multiuser
multichannel that maximizes system throughput with minimal resource allocation
overhead.

References

1. Fangchun Y, Shangguang W, Jinglin L, Zhihan L, Qibo S (2014) An overview of Internet of


Vehicles. China Commun 11(10):1–15
2. Study on LTE-based V2X services (Release 14), technical specification group services and
system aspects (TSG SA), document 3GPP TR 36.885, 2016
3. Kaiwartya O et al (2016) Internet of Vehicles: motivation, layered architecture, network model,
challenges, and future aspects. IEEE Access 4:5356–5373
4. Cisco visual networking index: forecast and methodology, 2016–2021. In: White Paper, Cisco,
San Jose, CA, USA, 2016
5. Naik G, Choudhury B, Park J (2019) IEEE 802.11bd & 5G NR V2X: evolution of radio access
technologies for V2X communications. IEEE Access 7:70169–70184
6. Abboud K, Omar HA, Zhuang W (2016) Interworking of DSRC and cellular network
technologies for V2X communications: a survey. IEEE Trans Veh Technol 65(12):9457–9470
7. Dey KC, Rayamajhi A, Chowdhury M, Bhavsar P, Martin J (2016) Vehicle-to-vehicle (V2V)
and vehicle-to-infrastructure (V2I) communication in a heterogeneous wireless network-
Performance evaluation. Transp Res C Emerg Technol 68:168–184
8. Liu Y-C, Chen C, Chakraborty S (2015) A software defined network architecture for geobroad-
cast in VANETs. In: Proceedings of IEEE international conference communications (ICC), June
2015, pp 6559–6564
9. Alexander P, Haley D, Grant A (2011) Cooperative intelligent transport systems: 5.9 GHz field
trials. Proc IEEE 99(7):1213–1235
10. Sepulcre M, Gozalvez J, Carmen Lucas-Estañ M (2019) Power and packet rate control for
vehicular networks in multi-application scenarios. IEEE Trans Veh Technol 1–1. https://doi.
org/10.1109/TVT.2019.2922539.
11. Akhtar N, Ergen SC, Ozkasap O (2015) Vehicle mobility and communication channel models
for realistic and efficient highway VANET simulation. IEEE Trans Veh Technol 64(1):248–262
12. Huang R, Wu J, Long C, Zhu Y, Lin Y (2018) Mitigate the obstructing effect of vehicles
on the propagation of VANETs safety-related information. In: 2017 IEEE ıntelligent vehicles
symposium (IV), Los Angeles, CA, pp 1893–1898
13. Li C et al (2018) V2V Radio channel performance based on measurements in ramp scenarios
at 5.9 GHz. IEEE Access 6:7503–7514
14. Chang F, Chen W, Yu J, Li C, Li F, Yang K (2019) Vehicle-to-vehicle propagation channel
performance for overtaking cases based on measurements. IEEE Access 7:150327–150338
15. Li W, Chen X, Zhu Q, Zhong W, Xu D, Bai F (2019) A novel segment-based model for non-
stationary vehicle-to-vehicle channels with velocity variations. IEEE Access 7:133442–133451

16. Jiang H, Zhang Z, Wu L, Dang J (2018) Novel 3-D irregular-shaped geometry-based channel
modeling for semi-ellipsoid vehicle-to-vehicle scattering environments. IEEE Wirel Commun
Lett 7(5):836–839
17. Yang M et al (2019) A Cluster-based three-dimensional channel model for vehicle-to-vehicle
communications. IEEE Trans Veh Technol 68(6):5208–5220
18. Manzano M, Espinosa F, Lu N, Xuemin Shen, Mark JW, Liu F (2015) Cognitive self-scheduled
mechanism for access control in noisy vehicular ad hoc networks, Hindawi Publishing
Corporation. Math Probl Eng 2015, Article ID 354292
19. Hrizi F, Filali F (2010) simITS: an integrated and realistic simulation platform for vehicular
networks. In: 6th international wireless communications and mobile computing conference,
Caen, France, pp 32–36. https://doi.org/10.1145/1815396.1815404
20. Han Y, Ekici E, Kremo H, Altintas O (Feb. 2017) Throughput-efficient channel allocation
algorithms in multi-channel cognitive vehicular networks. In: IEEE transactions on wireless
communications, vol 16. no 2, pp 757–770
21. Huang R, Wu J, Long C, Zhu Y, Lin Y (2018) Mitigate the obstructing effect of vehicles
on the propagation of VANETs safety-related information. In: 2017 IEEE intelligent vehicles
symposium (IV), Los Angeles, CA, pp 1893–1898
22. Li C et al (2018) V2V radio channel performance based on measurements in ramp scenarios
at 5.9 GHz. In: IEEE Access, vol 6. pp 7503–7514
23. Chang F, Chen W, Yu J, Li C, Li F, Yang K (2019) Vehicle-to-vehicle propagation channel
performance for overtaking cases based on measurements. In: IEEE access, vol 7. pp 150327–
150338
24. Li W, Chen X, Zhu Q, Zhong W, Xu D, Bai F (2019) A novel segment-based model for
non-stationary vehicle-to-vehicle channels with velocity variations. In: IEEE access, vol 7. pp
133442–133451
25. Jiang H, Zhang Z, Wu L, Dang J (Oct. 2018) Novel 3-D irregular-shaped geometry-based
channel modeling for semi-ellipsoid vehicle-to-vehicle scattering environments. In: IEEE
wireless communications letters, vol 7. no. 5, pp 836–839
26. Yang M et al (2019) A cluster-based three-dimensional channel model for vehicle-to-vehicle communications. IEEE Trans Veh Technol 68(6):5208–5220
27. ITU-R (2007) Propagation by diffraction. International Telecommunication Union Radiocommunication Sector
Decision Making Among Online Product
in E-Commerce Websites

E. Rajesh Kumar, A. Aravind, E. Jotheeswar Raghava, and K. Abhinay

Abstract In the present era, customers are mainly engrossed in product-based systems. To make their effort easier, most people trust Internet marketing. Exploiting this public interest, product-based systems engage in numerous activities, which may be legal or illegal. For this reason, decision making among products on e-commerce websites involves considerable ambiguity. Considering this perspective, this paper provides an analysis of how to evaluate customer reviews. It deals with deciding how to manage the customer experience in marketing and presents how to analyze online product reviews. The framework aims to distill large volumes of qualitative data into quantitative insights on product features so that designers can make more informed decisions. This paper sets out to identify customers' likes and dislikes found in reviews to guide product development.

Keywords Online products · Customer reviews · Naïve Bayes · Visualization ·


Classification · Support vector machine (SVM) · Decision making · E-commerce

1 Introduction

Traditionally, data is gathered from different sources such as surveys and interviews. The importance of the customer and their needs plays the key role in designing a product, which must satisfy those needs. Nowadays, customers can review all aspects of products on e-commerce websites. Big data is needed for product designers

E. Rajesh Kumar (B) · A. Aravind · E. Jotheeswar Raghava · K. Abhinay


Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, Guntur 522502, Andhra Pradesh, India
e-mail: rajthalopo@gmail.com
A. Aravind
e-mail: attadaa91@gmail.com
E. Jotheeswar Raghava
e-mail: jotheeswarreddy@gmail.com
K. Abhinay
e-mail: abhinaykotha@gmail.com


to exploit product capability. A large amount of information is available on the Internet. Product reviews taken from e-commerce websites provide valuable information for customers buying a new product and also for manufacturers developing their products.
The summarization model built here covers multiple aspects of user preferences such as product affordances, emotions, and usage conditions. It is also introduced to overcome limitations such as conveying user requirements to designers, ambiguity in the meaning of summarized data, and the complexity of linguistic patterns [1]. A sentiment analysis design framework was built as a set of categorized customer opinions, characterized by the integration of natural language processing techniques and machine learning algorithms, where the machine models are compared with a general analysis [2]. The problem of classifying documents is considered not by topic but by overall sentiment, for example, determining whether a review is positive or negative; the usage of machine learning techniques clearly outperforms human-produced baselines [3]. There are many e-commerce websites, and any website can present reviews in terms of ratings or comments specific to that website. Generally, if a customer wants to buy a product online, the customer needs to go through all the raw reviews, which have no common specification or pattern across individual websites. The customer checks an individual website's ratings and comments and cannot compare the product's ratings or comments with other websites.
In this paper, the dataset is based on the reviews given by end-users on products bought from different websites and on the specifications of each product. The specifications are mentioned along with ratings. Using this approach, customers can easily analyze or select products based on the specifications in reviews from end-users, and can also compare the products across websites to choose the best website.

2 Initial Consideration of Project

Figure 1 shows a person's perception analysis of an online product; the picture indicates the process by which a customer examines an online product. Generally, a customer will list the relevant product websites and then consider the few interesting websites that are trusted by users. The user then enters the specific online website URL and looks at its interface; if a login is required as a precondition, the customer fills it in and gets into the site. Through this login, the user can understand whether the online website gives the user an individual account or not. After logging into the website, the user looks at all the items displayed on the first screen and searches for the desired product in the search engine. After that, the user selects one product and looks into its specifications or features. If the user finds all the features good, the user decides whether to buy the product; if the user does not find any item good, the user browses to the next website. In this manner, the user travels from one online website to another.

Fig. 1 General assumption of customers on online product websites

2.1 Naive Bayes

Naive Bayes is a probabilistic classification technique based on Bayes theorem. In


machine learning, it has proven to not only be simple but also fast, accurate, and
reliable. It is successfully used for many purposes. It assumes that the presence of
a particular feature in a class is unrelated to the presence of any other feature. An
object can be classified based on its features like the state and behavior of an object,
for example, consider an object desk, where the material is wood it is considered
as the state of the object, and it is used for work surface it is considered as the
behavior of the object. If these features depend on each other or depend on the
existence of different features, a naive Bayes classifier considers all these properties
to contribute independently to the probability that the object is a desk. The input
variables are categorical, but the algorithm accepts continuous variables. There are
ways to convert continuous variables into categorical variables this process is referred
to as the discretization of continuous variables [4].
Naive Bayes is probabilistic, which means it calculates the probability of each class and then outputs the class with the highest probability. Naive Bayes obtains these probabilities by utilizing Bayes' theorem, which describes the likelihood of an event. It does surprisingly well, and it is widely used because it often outperforms more sophisticated classification methods [5]. As a probabilistic model, it can be implemented easily and executes efficiently on very large datasets; without any prior knowledge of the data, it gives quick responses to user requests in real-world applications, and it is one of the most popular algorithms for classifying text documents. It is
used in spam filtering to distinguish spam emails from legitimate email, and it can also
be used in fraud detection, for example, to claim insurance based on attributes such as

vehicle age, vehicle price, police report status naive Bayes can provide probability-
based classification whether the claim is genuine [4]. Bayes theorem provides a way
to calculate the posterior probability p(a/b) from p(a), p(b) and p(b/a).

Probability Rule
The conditional probability of event a occurring, given that event b has already
occurred, is considered as p(a/b).

P(a/b) = P(b/a) P(a) / P(b)    (1)

• P(a/b) is posterior probability


• P(b/a) is the probability of predictor given class
• P(a) is the prior probability
• P(b) is the prior probability of predictor.

Naïve Bayes classifier finds prior probability and likelihood for each feature of
the given class labels and posterior probability from the above formula. The class
label with the highest posterior probability is the result of the prediction [5]. Naive
Bayesian includes all predictors using Bayes’ rule and the independent assumptions
between predictors [6].
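As an illustration of how the posterior of Eq. (1) is used for prediction, the following is a minimal from-scratch sketch with Laplace smoothing; the feature names and toy data are hypothetical and not taken from the paper's dataset.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    # Estimate class counts and per-feature value counts from categorical data.
    prior = Counter(labels)
    likelihood = defaultdict(Counter)       # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            likelihood[(i, y)][v] += 1
    return prior, likelihood, len(labels)

def predict(row, prior, likelihood, n, alpha=1.0):
    # Pick the class with the highest posterior; the Laplace smoothing here
    # assumes two possible values per feature in the denominator.
    best, best_p = None, -1.0
    for y, cy in prior.items():
        p = cy / n                          # prior P(y)
        for i, v in enumerate(row):         # naive independence assumption
            p *= (likelihood[(i, y)][v] + alpha) / (cy + 2 * alpha)
        if p > best_p:
            best, best_p = y, p
    return best

# toy example: features = (rating band, delivery speed), label = buy / no
rows = [("high", "fast"), ("high", "slow"), ("low", "slow"), ("low", "fast")]
labels = ["buy", "buy", "no", "no"]
model = train_naive_bayes(rows, labels)
print(predict(("high", "fast"), *model))    # -> "buy"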

2.2 Support Vector Machine (SVM)

A support vector machine is a supervised machine learning algorithm that can be used for both classification and regression problems [7]. In this algorithm, all the data items are plotted as points in a dimensional space [8]. The algorithm classifies the data items with a hyperplane that differentiates the classes based on the features of each element in the dimensional space. Support vector machines excel at segregating data into different classes using a hyperplane or line, and are based on the concept of decision planes that define boundaries [9-12]. A hyperplane is one that separates a set of objects with different class memberships. A schematic example is shown below; in this example, the items belonging to either the circle or the square class can be differentiated based on their features. The separating plane, called the hyperplane, defines that the items on the left side are squares and the items on the right side are circles. Any new object classified as a square will fall on the left and any classified as a circle will fall on the right.
Figure 2 is an example of a linear classifier, i.e., a classifier that separates the objects into two different sets with a line [8].

Fig. 2 Accurate hyperplane of the dataset
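A minimal sketch of such a linear separation, using scikit-learn's SVC (an assumed tool choice; the paper does not state its implementation), is shown below with two toy classes.

import numpy as np
from sklearn.svm import SVC

# two toy classes in a 2-D feature space (e.g., two rating-based features)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],    # "square" class
              [5.0, 6.0], [5.5, 5.8], [6.0, 6.5]])   # "circle" class
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")        # linear hyperplane, as in Fig. 2
clf.fit(X, y)

# w.x + b = 0 is the separating hyperplane; new points fall on one side of it
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)
print(clf.predict([[2.0, 2.0], [5.2, 6.1]]))   # -> [0 1]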

3 Visualization

The customer can analyze the occurrences of the product websites preferred by end-users. From Fig. 3, customers can identify the most preferred website for the product, such as Amazon, followed by Flipkart, and so on (Fig. 4).

Fig. 3 Representation of customer observation on online product companies



Fig. 4 Construction model

4 Implementation

1. The dataset is collected through a Google form in which the different attributes of the product are rated with different values. This form mainly captures the analysis of customers who have already bought the product from some other online website.
2. Data validation is a process of splitting the dataset with reference to customer-
specified products and removing all the unnecessary variables upon the specified
product from the dataset and ordering the dataset according to the websites.
3. In the planning stage, the dataset is divided into two data frames: a training dataset and a testing dataset, where the algorithm is trained on the training dataset and the trained model is evaluated on the testing dataset.
4. In the modeling stage, the classification algorithm is trained on the training dataset with respect to the target variable of the dataset.
5. Prediction is carried out by applying the trained model to the testing dataset.
6. The confusion matrix displays a table of values comparing the model's predictions with the testing dataset. The confusion matrix is a 2 × 2 matrix with labels 0 and 1, where the [0, 0] and [1, 1] entries show the correctly predicted values and the [0, 1] and [1, 0] entries show the incorrectly predicted values.
7. If any new customer wants to buy the product, then the result analysis shows the
result in deployment.
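The following hedged sketch mirrors steps 2 to 6 with scikit-learn; the CSV file name, the column names, and the choice of GaussianNB are placeholders, since the paper does not specify its exact toolchain.

```python
# Hedged sketch of the split / train / predict / confusion-matrix pipeline.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv("product_reviews.csv")      # hypothetical export of the survey responses
X = data.drop(columns=["Buys"])                # predictor ratings for the specified product
y = data["Buys"]                               # 1 = buys the product, 0 = does not buy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

for name, model in [("Naive Bayes", GaussianNB()), ("SVM", SVC(kernel="linear"))]:
    model.fit(X_train, y_train)                          # modeling stage
    predictions = model.predict(X_test)                  # prediction on the testing dataset
    cm = confusion_matrix(y_test, predictions)           # 2 x 2 table, correct counts on the diagonal
    accuracy = accuracy_score(y_test, predictions)       # correct predictions / total test rows
    print(name, "\n", cm, "\naccuracy:", accuracy)
```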

Fig. 5 Naive Bayes result

Fig. 6 Support vector machine result

4.1 Result Analysis

Figure 5 shows the result for a specific product using the naive Bayes algorithm, and Fig. 6 shows the result for the same product using the SVM algorithm.
Applying the naive Bayes and SVM algorithms to the dataset produces an accuracy value for the "Buys product" target. This accuracy is calculated from the confusion matrix as the number of correctly predicted records in the testing dataset divided by the total number of rows in the dataset. The "Buys product" outcome depends on all the variables present in the dataset, and the result indicates the best online product website based on the accuracy obtained for the product on a specific company's site. Of the two algorithms, naive Bayes gives the more accurate result. Figure 5 shows Flipkart with the highest review rating of 38%, and Fig. 6 shows Club Factory with the highest review rating of 35%, so a new consumer can use this analysis when buying products on an e-commerce website. The approach relies on independent product-specification reviews given individually by customers, and the main application of the paper is to predict the online product website from the reviews supplied by end-users.

5 Conclusion

This paper reviewed current online review summarization methods for products, with the aim of understanding how customers buy an item and how they feel while using it. Naive Bayes and SVM models were used to produce the target result: the SVM algorithm finds the best hyperplane in the given dataset but does not consider the frequency of occurrences in the data, whereas the naive Bayes algorithm works with the probability of occurrences over the dataset. Hence, using the naive Bayes algorithm, good accuracy results can be predicted more easily than with SVM.

References

1. Bhongade S, Golhani S (2016) HIF detection using wavelet transform, travelling wave and support vector machine. In: 2016 International conference on electrical power and energy systems (ICEPES)
2. Jin J, Ji P, Liu Y (2014) Prioritising engineering characteristics based on customer online reviews for quality function deployment. J Eng Des 25(7–9):303–324
3. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the conference on empirical methods in natural language processing (EMNLP), pp 79–86
4. EMC Education Services (2015) Discovering, analyzing, visualizing and presenting data. In: Data science & big data analytics
5. Bisht D, Joshi S (2019) Adding improvements to multinomial naive Bayes for increasing the accuracy of aggressive tweets classification
6. https://saedsayad.com/naive_bayesian.htm
7. Wang WM, Wang JW, Li Z, Tian ZG, Tsui E (2019) Multiple affective attribute classification of online customer product reviews: a heuristic deep learning method for supporting Kansei engineering
8. https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
9. Ireland R, Liu A (2018) Application of data analytics for product design: sentiment analysis of online product reviews. CIRP J Manuf Sci Technol
10. Sentiment analysis using feature based support vector machine: a proposed method. Int J Recent Technol Eng (2019)
11. https://www.statsoft.com/textbook/support-vector-machines
12. Mantyla MV, Graziotin D, Kuutila M (2018) The evolution of sentiment analysis
A Descriptive Analysis of Data
Preservation Concern and Objections
in IoT-Enabled E-Health Applications

Anuj Kumar

Abstract The Internet of things (IoT) is an expanding field whose participation in areas such as e-health, retail, and smart transportation grows day by day. Devices communicate with each other and with people to provide different facilities for users and for the human community as a whole, using modern wireless communication technologies. Device-to-device and device-to-human communication is made possible by the sensors and wireless sensor networks that IoT provides. Alongside these capabilities, IoT faces various challenges. This paper gives an overview of IoT and its application scenarios, discusses IoT's contribution to the health sector and the architecture of IoT-enabled e-healthcare, and points out the various security concerns and objections in IoT-enabled e-health.

Keywords Internet of things · IoT application · E-health care · IoT security · Privacy

1 Introduction

IoT now offers a wide area for research and attracts many research scholars, and it has changed the way humans live. In this paradigm, different types of devices and gadgets are connected in such a way that they can communicate and transfer information with each other, with the Internet as the medium for this interaction. The research and innovation cluster on the Internet of things describes IoT as a network infrastructure spread out worldwide with self-configuring capabilities, based on standard rules for exchanging and using information in a large heterogeneous network, in which both physical and virtual objects have identities, physical attributes, and virtual characters, use smart interfaces, and are logically united into the information network. The ability to exchange and use information in a large heterogeneous network is a special feature of IoT.

A. Kumar (B)
Department of Computer Engineering and Applications, GLA University, Mathura, India
e-mail: anujkumar.gla@gla.ac.in


Table 1 Applications of IoT [2, 3]

Fields and some examples of applications:
E-health: Patient observation, ensuring availability of critical hardware, tracking sufferers, employees, and records, increased medicine management
Retail and logistics: Automated checkout, personalized discounts, beacon smart shelves, robot employees, optimizing supply chain control
Smart transportation: Real-time vehicle tracking, personalized travel information, smart parking
Smart environment: Pleasant accommodation, smart workplaces, effective apparatus for industry, smart athletic clubs, modish shops
Energy conservation: Adoption of green energy, energy management systems, smart grid
Smart home: Lighting control, gardening, safety and security, air-quality and water-quality monitoring, voice assistants, switches, locks
Green agriculture: Precision farming, climate conditions, agricultural drones, soil humidity care
Futuristic: Automatic taxi cabs, town information prototype, increased play room

This feature has driven more and more rapid growth in IoT's popularity. IoT can collect data from the connected smart devices or objects and share it with other devices and systems, and through the analysis and processing of these data there is little or no need for human interaction while the devices perform their actions. Nowadays, the Internet of things has completely changed connectivity from "anytime, anywhere" for "anyone" into "anytime, anywhere" for "anything" [1]. Forming smart cities and smart homes, enabling environmental monitoring, providing a new direction for smart healthcare systems, and adding specific features to transportation are among the objectives of IoT.
IoT Applications: Some IoT applications are listed in Table 1. IoT is applied in many areas and creates dynamic changes in our lives. In the next section, the smart health concept enabled by IoT is discussed.
1. IoT in e-health: Before the Internet of things, the traditional healthcare system had some limitations: (1) patients could interact with doctors only by visiting hospitals or sending text messages, and (2) doctors had no option for monitoring a patient's health around the clock and providing treatment accordingly. IoT solves these problems by providing IoT-enabled medical equipment that makes remote monitoring of patients possible and makes meetings with doctors much more systematic. IoT is presently changing the scenario of the medical field by reshaping the space of devices, and many IoT-enabled health applications provide gains for patients, families, doctors, and medical institutions (Fig. 1).
Fig. 1 Sample healthcare scenario

1.1 IoT for patients: IoT provides wearable devices such as fitness bands, CGM, sensors, and coagulation testers that make patients feel as though a doctor is attending to their case personally. IoT has brought a big change to the lives of elderly people, whose health status these devices track continuously. The devices contain techniques that send an alert signal to relatives and to the concerned medical practitioners who follow up on people living alone.
1.2 IoT for physicians: Physicians use wearables and other embedded IoT devices, with whose help they can monitor and track their patients' health and any medical attention the patients need. IoT creates a strong, tightly bound relationship between physicians and their patients. The patient data produced by these devices is very helpful to doctors in identifying the disease and providing the best treatment.
1.3 IoT for hospitals: In hospitals, IoT is used in devices such as defibrillators and tracking devices, and IoT-enabled devices also protect patients against infection. IoT devices additionally act as managers for information such as pharmacy inventory control and environmental monitoring, and they also handle humidity and temperature control.
IoT architecture contains four steps, where the output of each step becomes the input of the next; the steps are combined into one process, and the final values of the process are used according to users' needs and for the prospects of different areas. A small illustrative sketch of the four steps is given after the list.
Step 1 In the initial step, interconnected devices embedded with sensors, actuators, monitors, detectors, etc., are set up, and this equipment is used for the collection of data.
Step 2 In this step, the sensors provide data in analog form, so the data must be collected and converted from analog to digital form for further processing.
Step 3 The digitized and aggregated data from step 2 is stored in a data center or the cloud.
Step 4 In the final step, advanced analytics are applied to these data to manage and structure them so that users can take the right decisions based on them.
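Purely as an illustration of these four steps, the sketch below simulates a single sensing device, digitization, storage, and a simple analytic check; no real sensor, gateway, or cloud API from the surveyed systems is assumed.

```python
# Illustrative sketch of the four-step IoT data flow described above;
# the sensor, store, and alert threshold are all simulated stand-ins.
import random
import statistics

def read_sensor():                              # Step 1: a sensing device collects data
    return 36.5 + random.uniform(-0.8, 0.8)     # e.g. a body-temperature reading (analog)

def digitize(analog_value):                     # Step 2: analog-to-digital conversion
    return round(analog_value, 1)

data_store = []                                 # Step 3: stand-in for a data center / cloud store
for _ in range(100):
    data_store.append(digitize(read_sensor()))

mean_temp = statistics.mean(data_store)         # Step 4: simple analytics supporting a decision
alert = mean_temp > 37.5                        # e.g. notify a caregiver if the average is high
print(f"mean temperature = {mean_temp:.1f} C, alert = {alert}")
```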
Health care with IoT involves various issues and challenges in terms of data security. IoT-enabled connected devices gather a lot of data, much of which is very sensitive indeed, so data security should be a major concern in this field, and several security and privacy issues have been observed [1].

2 Literature Survey

Tao et al. [4] discussed healthcare data acquisition techniques and studied their security and privacy concerns. The authors proposed a technique for securing the collected data in an IoT-based healthcare system. The secure data scheme was composed of four layers, but the authors contributed to the first three. An FPGA platform and a secret cipher algorithm were used for the initial phase, where the KATAN algorithm was implemented: FPGA served as the hardware platform, and the secret cipher was used to achieve privacy and protection of patients' data. A distributed database technique was applied at the cloud computing layer to preserve the privacy of patients' data, and FPGA simulations were used to measure the performance of the secure data scheme over different algorithm parameters. Fan et al. [5] proposed a scheme that can solve the problem of medical security and privacy. RFID was used for this purpose because it supports data exchange and collection, and a back-end server was used for the execution. Encoded text (cipher text) was used for the information exchange, which makes the process more secure. Tang et al. [6] proposed a secure health data collection scheme in which data is collected from various sources; signature techniques were used to guarantee a fair incentive for contributing patients, and a combination of two cryptosystems was used to keep the data oblivious, secure, and fault tolerant. Key properties of the scheme, such as its resistance to attacks and its toleration of healthcare center failures, were also discussed. Puthal [7] proposed a static lattice model for data privacy, used essentially to restrict the flow of information over huge data streams. Two types of lattice were used: a sensor lattice for wearable sensors and a user lattice for users. The static lattices aim to execute the model as fast as possible, and the results show that the model can handle a huge amount of streaming data with minimal latency and storage requirements. Deebak et al. [8] proposed a secure communication scheme for healthcare applications in which a biometric-based user authentication scheme gives better results than existing techniques in terms of packet delivery ratio, end-to-end delay, throughput, and routing overhead when implemented in the NS3 simulator; this makes smart healthcare application systems more secure.

Minoli et al. [9] proposed a novel IoT protocol architecture and inspected security tools and techniques that could be applied as part of IoT deployments; the authors argued that these techniques are especially important in e-health and in special care facilities such as nursing homes. Tamizharasi et al. [10] discussed various architectural models and access control algorithms for IoT-enabled e-health systems, presented a comparative analysis of different architecture segments and of the security measures involved, and finally recommended the most appropriate techniques for IoT-enabled e-medical care systems. Koutli et al. [11] first surveyed the field of e-health IoT and identified the security requirements and challenges in IoT applications, and then proposed an architecture based on VICINITY that also incorporates the General Data Protection Regulation (GDPR) as a compliance feature, in order to provide secure e-medical facilities to old- and middle-aged people; finally, they highlighted the design of this architecture and the security and privacy needs of the system. Rauscher and Bauer [12] proposed a safety and security analysis approach consisting of a standardized meta-model and an IoT safety and security framework that embraces a personalized analysis language. Boussada et al. [13] proposed a new privacy-preserving e-health solution over NDN, named AND_aNA, through which all privacy and security requirements were achieved with a focus on dependency; a security analysis was carried out to prove the strength of the proposal, and a performance evaluation showed its effectiveness. Simulation results disclosed that the technique has an acceptable transmission delay and involves negligible overhead. Almulhim and Zaman [14] proposed a secure authentication scheme in which a lightweight group-based credential scheme is used for IoT-based e-health applications; the proposed model offers specific features such as mutual authentication, energy efficiency, and low computation for healthcare IoT-based applications, achieved using the elliptic curve cryptography (ECC) concept. Savola et al. [15] proposed a new set of rules for security objective decomposition aimed at security metrics definition; systematically defined and managed security metrics provide a higher level of effectiveness in security controls, permitting informed, risk-driven security decision-making. Islam et al. [16] proposed an intelligent collaborative security model to minimize security risk and discussed how new technologies such as big data, ambient intelligence, and wearables are taken up in a healthcare context; various relations between IoT and e-health policies and their control across the world are addressed, and new paths and areas for future research on IoT-based health care are provided based on a set of open issues and challenges. Suo et al. [17] discussed all aspects of data security at each layer of the IoT architecture (the perception, network, support, and application layers) and found gaps in power and storage as well as other issues such as DDoS attacks, authentication, confidentiality, and privacy protection; all these issues and challenges are elaborated briefly. Qiang et al. [18] focused on issues such as network transmission of information security, wireless communication and information security, RFID tag information security, and privacy protection, and found challenges such as RFID identification, the communication channel, and RFID reader security issues. Chenthara et al. [19] discussed a security model for electronic

health record (EHR) systems. They also focused on the points identified from studying the research work published on EHR approaches over the previous two decades, and they further explained the techniques that can maintain the integrity and other basic data security measures for any patient's EHR. Chiuchisan et al. [20] discussed major data security concerns such as confidentiality and integrity in healthcare systems and surveyed information protection in terms of the different security measures and communication techniques used; they further explained some security issues that arise when monitoring and other services are performed for patients with specific diseases such as Parkinson's. Abbas and Khan [21] focused on cloud facilities for health information records, such as data storage centers; the authors described the state of the art in cloud support for health records, explained the classification and taxonomy derived from surveying different privacy-preserving techniques, discussed the strengths and weaknesses of these techniques, and set out some new challenges for new research scholars. Ma et al. [22] proposed a new technique for e-health applications based on compression, combining two methods, the adaptive Fourier decomposition algorithm (AFD) and symbol substitution (SS): in the initial step AFD performs lossy compression of the data, and then SS performs lossless compression. The hybridization of both techniques was very effective in terms of CR and PRD and gave more valuable results. Idoga et al. [23] highlighted the factors that affect healthcare consumers; for identification, the data was applied to various measures to obtain a structural path model, along with the development of a cloud-based healthcare center, and after applying the data to various models it was analyzed with specific measures such as social science tools and LISREL. Pazos et al. [24] proposed a new programmable framework that addresses the fragmentation process; the overall process flows with the help of communicating agents that use different sets of rules for communication between devices, and in this framework the communication agent was developed according to a given specification and was feasible in terms of security and scalability. Maurin et al. [25] discussed the objects that are exchanged and that communicate over the Internet, focused on the security features of these objects and on threats such as cyber risks and vulnerabilities that break their security shield, gave an overview of solutions to these problems, and explained the requirements that need to be adopted from business and market perspectives. Karmakar et al. [26] proposed an SDN-based architecture for the Internet of things built on an authentication scheme, meaning that only authenticated devices are allowed to access the specific network; a lightweight protocol was used for authentication, and the architecture also ensures secure flows in the network, so the combination of both concepts makes the system more secure against malicious attacks and threats. Chung et al. [27] described IoT issues, explored some challenges, and proposed a new on-demand security configuration system tailored to security needs; the authors argued that security features can be added to an old system without regenerating it, with the old features of the system exchanged for newly arriving ones without any type of renewal of the system. Borgia [28] explored new security techniques in M2M communication, routing, end-to-end reliability, device management, data

management, and the security of IoT, which make the overall process more secure; the author also identified challenges that arise in this field when IoT-enabled devices and objects communicate, with privacy and security issues at the time of data transmission. Xiaohui [1] explained the concepts of IoT and then the security and privacy issues and challenges faced in the IoT field; at transmission time, the author found two types of security issue, wireless sensor network security problems and information transmission and processing security, and highlighted other threats such as counterfeit attacks and malicious code attacks. Keoh et al. [29] described all four nodes on which IoT devices are based, described the standard rules of security with a main focus on communication security for IoT, and also explained challenges such as interoperable security and datagram layer security.

3 Issues and Challenges in E-health and the Internet


of Things

See Table 2.

4 Motivation

Health is the subject of greatest concern to human beings, and e-health with IoT is a notable area of the future Internet with a vast effect on human community life and trade. IoT applications belong to the health sector and to other sectors as well, and both provide services. There are security issues in the applications of this field, which are elaborated in Table 2. To secure the IoT environment against those issues, a new architecture or mechanism is needed for these application areas; with the help of such a mechanism, the small holes that arise in security terms such as authentication, confidentiality, and data integrity in IoT-embedded fields can be closed. The main motivation behind this survey is to provide a detailed study of the e-health system with IoT and of other IoT applications related to this field, and to find the security issues and challenges in the field of e-health with IoT.

5 Conclusion

IoT has brought big changes to the usage of the Internet and also opens new opportunities for research scholars in the real world. Although a lot of research exists on IoT, its IoT-based application areas are still open. Data security issues in the e-health system with IoT have been examined here. Many researchers have already given data security

Table 2 Issues and challenges in the Internet of things


Author Description Issues and challenges
Tao et al. [4] Discussed security data scheme Collusion attacks,
For three layers eavesdropping
(1) IoT network Impersonation patients’ data
sensors/devices; leakage and
(2) Fog layers; Destruction
(3) Cloud computing layer
Fan et al. [5] Discussed RFID-based health Tag anonymity: replay attack
system, RFID system resistance
architecture with tags, fixed and Forward secrecy: mutual
mobile users authentication
Anti-DoS attack
Tang et al. [6] Explained secure data Healthcare centers fault
aggregation tolerance
(1) system setup; (2) Healthcare center and cloud
aggregation request server obliviousness security
allocation; (3) data Differential privacy
collection;
(4) data aggregation
Puthal [7] Focused to integrate the Information flow control
information flow control model problem
to a stream manager
Deebak et al. [8] Secure and anonymous The privacy preservation issues
biometric-based user in the IoM
authentication scheme
Minoli et al. [9] Explained Data availability
Secure data transmission from Eavesdropping, denial of
one end to another service attack
Unauthorized users cannot
access data
Tamizharasi et al. [10] Reviewed traditional methods of RBAC
access control techniques in ABE
following terms for IoT-based CP-ABE
e-health systems. security Novel approach required
privacy
Fine-grained access control
Scalability
Koutli et al. [11] Discussed two e-health Integrity
applications 1. ambient-assisted Availability
living (AAL) 2. M Health and Data minimization
Explain VICINITY architecture Anonymization
Rauscher and Bauer [12] Presented an approach for the Health-endangering
IoT-S2A2F for IoT-MD vulnerabilities safe and secured
architecture security and safety architectures identify
optimization architectural weak points
Boussada et al. [13] Discussed named data Privacy issues over NDN
networking (NDN) nodes Comparison with IP solutions
exchanging Simulation Conduction
Identity-based cryptography
(IBC)
E-health solutions
Almulhim and Zaman [14] Lightweight authentication Middle attack
scheme Unknown key sharing attacks
ECC principles comparable Increase number of users’
level of security group based access points
authentication scheme\model Security issues
Savola et al. [15] Explored security risk, discussed Hierarchy of security metrics
heuristics for security objective More detailed security
decomposition, systematically objectives for the target system
defined, and managed security
Metrics
Boussada et al. [30] A novel cryptographic scheme Contextual privacy requirements
PKE-IBE. Based on Sensibility of exchanged data
identity-based cryptography Secure session keys
(IBC) tackles the key escrow transmission
issue and ensures the blind
partial private key generation
Islam et al. [16] Surveys advances in IoT-based Standardization IoT healthcare
healthcare technologies platforms
Analyzes distinct IoT security Cost analysis the app
and privacy features development process data
Discussed security protection network type
requirements, threat models, and Scalability
attack taxonomies
Suo et al. [17] Explained the security issues Storage issues, attacks like
which comes in all four types of DDoS attack
layers of IoT architecture Basic security needs like
authentication confidentiality,
access control, etc.
Qiang et al. [18] Discussed RFID tag information RFID identification,
security wireless communication channel, RFID
communication and information reader security issues
security network transmission of Radio signals attack
ınformation security Internet information security
Privacy protection Private information security
Chenthara et al. [19] Discussed EHR security and Integrity
privacy Confidentiality
security and privacy Availability
requirements of e-health data in Privacy
the cloud (3) EHR cloud
architecture, (4) diverse EHR
cryptographic and
non-cryptographic approaches
Chiuchisan et al. [20] Explored data security, Security issues in
communication techniques, communication techniques
strategic management,
rehabilitation and monitoring
with a specific disease
Abbas and Khan [21] Discussed facilities of the cloud Secure transfer of the data
for health information record, .Attacks like DoS
classification, and taxonomy, Authentication issues
reviewed more research work
Ma et al. [22] Explained combination of two Physical alteration can be
techniques Fourier possible
decomposition algorithm (AFD) There is no access control in the
and symbol substitution (SS), transmission of data
evaluated in terms of CR and
PRD
Wang [31] Worked for outsourced data and Due to IoT-enabled devices
user’s data secured in data unique working features this
sharing will create an issue of data
To ensure the privacy of data security
owner Mobility, scalability, the
multiplicity of devices
Idoga et al. [23] Identification, structural path Integration of different
model techniques creates a challenge
Data statistics like effort for security
expectancy performance Secure transfer of the data
expectancy information sharing
Pazos et al. [24] Discussed program-enabled Fragmentation in terms of
framework for fragmentation communication
flexible communication agents Protocols and data formats
Security and scalability aspects
Maurin et al. [25] Discussed the objects Communication between IoT
Threats like cyber risk, objects/ machines. Compromise
vulnerabilities basic security aspects of data.
The solution in terms of business Device tampering, information
and market perspectives disclosure, privacy breach
Karmakar et al. [26] Explained SDN-based Malicious attacks and threats
architecture using an
authentication scheme
lightweight protocol
Security challenges
Chung et al. [27] Discussed on-demand security No proper pre-preparation for
configuration system handling security threats
Worked for unexperienced No techniques for authentication
challenges on security issues Compromise data security and
privacy
Borgia [28] Explored Security in terms of Authentication
IoT devices management and Privacy
security of data, network and Data security
applications
Xiaohui [1] Discussed wireless sensor Counterfeit attacks, malicious
network security problems and code attacks
information transmission and
processing
Security
Keoh et al. [29] Discussed Security issue at the time of
Standardization exchange and use information
Communication security by devices
Transport layer security

solutions for IoT, but there is still a need for more security solutions in IoT application fields such as the smart home, e-health, and retail. As an output of this survey, many issues and challenges have been found that leave small holes in data security in the field of IoT-based e-health, such as denial of service, man-in-the-middle, identity and data theft, social engineering, advanced persistent threats, ransomware, and remote recording. Many researchers have given solutions for these, but they are not sufficient; day by day, new issues and challenges confront researchers, so more research should now be done in this field.

References

1. Xiaohui X (2013) Study on security problems and key technologies of the Internet of Things. In: International conference on computational and information sciences, 2013, pp 407–410
2. Mathuru GS, Upadhyay P, Chaudhary L (2014) The Internet of Things: challenges & security issues. In: IEEE international conference on emerging technologies (ICET), 2014, pp 54–59
3. Atzori L, Iera A, Morabito G (2010) The Internet of Things: a survey. Elsevier Comput Netw 2787–2805
4. Tao H, Bhuiyan MZA, Abdalla AN, Hassan MM, Zain JM, Hayajneh T (2019) Secured data
collection with hardware-based ciphers for IoT-based healthcare. IEEE Internet of Things J
6(1):410–420. https://doi.org/10.1109/JIOT.2018.2854714

5. Fan K, Jiang W, Li H, Yang Y (2018) Lightweight RFID protocol for medical privacy protection
in IoT. IEEE Trans Ind Inf 14(4):1656–1665. https://doi.org/10.1109/TII.2018.2794996
6. Tang W, Ren J, Deng K, Zhang Y (2019) Secure data aggregation of lightweight E-healthcare
IoT devices with fair incentives. IEEE Internet of Things J 6(5):8714–8726. https://doi.org/10.
1109/JIOT.2019.2923261
7. Puthal D (2019) Lattice-modeled information flow control of big sensing data streams for smart
health application. IEEE Internet of Things J 6(2):1312–1320. https://doi.org/10.1109/JIOT.
2018.2805896
8. Deebak BD, Al-Turjman F, Aloqaily M, Alfandi O (2019) An authentic-based privacy preser-
vation protocol for smart e-healthcare systems in IoT. IEEE Access 7:135632–135649. https://
doi.org/10.1109/ACCESS.2019.2941575
9. Minoli D, Sohraby K, Occhiogrosso B (2017) IoT security (IoTSec) mechanisms for e-health and ambient assisted living applications. In: 2017 IEEE/ACM international conference on connected health: applications, systems and engineering technologies (CHASE), Philadelphia, PA, pp 13–18. https://doi.org/10.1109/CHASE.2017.53
10. Tamizharasi GS, Sultanah HP, Balamurugan B (2017) IoT-based E-health system security:
a vision archictecture elements and future directions. In: 2017 International conference of
electronics, communication and aerospace technology (ICECA), Coimbatore, 2017, pp 655–
661. https://doi.org/10.1109/ICECA.2017.8212747
11. Koutli M et al (2019) Secure IoT e-health applications using VICINITY framework and GDPR
guidelines. In: 2019 15th International conference on distributed computing in sensor systems
(DCOSS), Santorini Island, Greece, 2019, pp 263–270. https://doi.org/10.1109/DCOSS.2019.
00064
12. Rauscher J, Bauer B (2018) Safety and security architecture analyses framework for the
Internet of Things of medical devices. In: 2018 IEEE 20th international conference on e-
health networking, applications and services (Healthcom), Ostrava, 2018, pp 1–3. https://doi.
org/10.1109/HealthCom.2018.853112
13. Boussada R, Hamdaney B, Elhdhili ME, Argoubi S, Saidane LA (2018) A secure and privacy-
preserving solution for IoT over NDN applied to e-health. In: 2018 14th International wireless
communications & mobile computing conference (IWCMC), Limassol, 2018, pp 817–822.
https://doi.org/10.1109/IWCMC.2018.8450374
14. Almulhim M, Zaman N (2018) Proposing secure and lightweight authentication scheme for
IoT based E-health applications. In: 2018 20th International conference on advanced commu-
nication technology (ICACT), Chuncheon-si Gangwon-do, Korea (South), 2018, pp 481–487.
https://doi.org/10.23919/ICACT.2018.8323802
15. Savola RM, Savolainen P, Evesti A, Abie H, Sihvonen M (2015) Risk-driven security metrics
development for an e-health IoT application. In: 2015 Information security for South Africa
(ISSA) Johannesburg, 2015, pp 1–6 https://doi.org/10.1109/ISSA.2015.7335061
16. Islam SMR, Kwak D, Kabir MH, Hossain M, Kwak K (2015) The Internet of Things for
health care: a comprehensive survey. IEEE Access 3:678–708. https://doi.org/10.1109/ACC
ESS.2015.2437951
17. Suo H, Wan J, Zou C, Liu J (2012) Security in the Internet of Things: a review. In: International conference on computer science and electronics engineering, 2012, pp 649–651
18. Qiang C, Quan G, Yu B, Yang L (2013) Research on security issues on the Internet of Things.
Int J Future Gener Commun Netw 1–9
19. Chenthara S, Ahmed K, Wang H, Whittaker F (2019) Security and privacy-preserving
challenges of e-health solutions in cloud computing. IEEE Access 7:74361–74382
20. Chiuchisan D, Balan O, Geman IC, Gordin I (2017) A security approach for health care infor-
mation systems. In: 2017 E-health and bioengineering conference (EHB), Sinaia, 2017, pp
721–724
21. Abbas A, Khan SU (2014) A review on the state-of-the-art privacy-preserving approaches in the
e-health clouds. IEEE J Biomed Health Inf 18(4):1431–1441
22. Ma J, Zhang T, Dong M (2015) A novel ECG data compression method using adaptive Fourier
decomposition with security guarantee in e-health applications. IEEE J Biomed Health Inf
19(3):986–994

23. Idoga PE, Toycan M, Nadiri H, Çelebi E (2018) Factors Affecting the successful adoption
of e-health cloud based health system from healthcare consumers’ perspective. IEEE Access
6:71216–71228
24. Pazos N, Müller M, Aeberli M, Ouerhani N (2015) ConnectOpen—Automatic integration of
IoT devices. In: 2015 IEEE 2nd world forum on Internet of Things (WF-IoT), Milan, 2015,pp
640–644
25. Maurin T, Ducreux L, Caraiman G, Sissoko P (2018) IoT security assessment through the inter-
faces P-SCAN test bench platform. In: 2018 Design, automation & test in Europe conference
& exhibition (DATE), Dresden, 2018, pp 1007–1008
26. Karmakar KK, Varadharajan V, Nepal S, Tupakula U (2019) SDN enabled secure IoT archi-
tecture. In: 2019 IFIP/IEEE symposium on integrated network and service management (IM),
Arlington, VA, USA, 2019, pp 581–585
27. Chung B, Kim J, Jeon Y (2016) On-demand security configuration for IoT devices. In: 2016
International conference on information and communication technology convergence (ICTC),
Jeju, 2016, pp 1082–1084
28. Borgia E (2014) The Internet of Things vision: key features, applications and open issues.
Elsevier Comput Commun 1–31
29. Keoh SL, Kumar SS, Tschofenig H (2014) Securing the Internet of Things: a standardization
perspective. IEEE Internet of Things J 265–275
30. Boussada R, Elhdhili ME, Saidane LA (2018) A lightweight privacy-preserving solution for
IoT: the case of e-health. In: 2018 IEEE 20th international conference on high performance
computing and communications; IEEE 16th international conference on smart city; IEEE 4th
international conference on data science and systems (HPCC/SmartCity/DSS), Exeter, United
Kingdom, 2018, pp 555–562. https://doi.org/10.1109/HPCC/SmartCity/DSS.2018.00104
31. Wang H (2018) Anonymous data sharing scheme in public cloud and its application in e-health
record. IEEE Access 6:27818–27826
32. Sudarto F, Kristiadi DP, Warnars HLHS, Ricky MY, Hashimoto K (2018) Developing of Indone-
sian intelligent e-health model. In: 2018 Indonesian association for pattern recognition inter-
national conference (INAPR), Jakarta, Indonesia, 2018, pp 307–314. https://doi.org/10.1109/
INAPR.2018.8627038
33. Abomhara M, Koien GM (2014) Security and privacy in the internet of things: current status
and open issues. In: IEEE International conference on privacy and security in mobile systems
(PRISMS), 2014, pp1–8.
34. Gubbi J, Buyya R, Marusic S, Palaniswami M (2013) Internet of Things (IoT): a vision,
architectural elements, and future directions. Elsevier Future Gener Comput Syst 1645–1660
35. Al-Fuqaha A, Guizani MM, Aledhari M, Ayyash M (2015) Internet of Things: a survey on
enabling technologies, protocols and applications. IEEE Commun Surv Tutor 17(4):2347–2376
36. Said O, Masud M (2013) Towards Internet of Things: survey and future vision. Int J Comput
Netw (IJCN) 1(1):1–17
37. Matharu GS Upadhyay P, Chaudhary L (2014) The Internet of Things: challenges & security
issues. In: IEEE, international conference on emerging technologies (ICET), 2014,pp 54–59
38. Granjal J, Monteiro E, Sa Silva J (2015) Security for the Internet of Things: a survey of existing
protocols and open research issues. IEEE Commun Surveys Tutor 17(3):1294–1312
39. Atamli AW, Martin A (2014) Threat-based security analysis for the Internet of Things. In:
International workshop on secure Internet of Things, 2014, pp 35–43
40. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2015) Internet of Things (IoT) security:
current status, challenges and prospective measures. In: International conference for internet
technology and secured transactions (ICITST), 2015, pp336–341
41. Vasilomanolakis E, Daubert J, Luthra M, Gazis V, Wiesmaier A, Kikiras P (2015) On the
security and privacy of Internet of Things architectures and systems. In: International workshop
on secure internet of things, 2015, pp 49–57
42. Zhang Z-K, Cho MCY, Wang C-W, Hsu C-W, Chen C-K, Shieh S (2014) IoT security: ongoing
challenges and research opportunities. In: IEEE international conference on service-oriented
computing and applications, 2014,pp 230–234

43. Jiang DU, Shi Wei CHAO (2010) A study of information security for M2M of IoT. In: IEEE international conference on advanced computer theory and engineering (ICACTE), 2010, pp 576–579
44. Basu SS, Tripathy S, Chowdhury AR (2015) Design challenges and security issues in the
Internet of Things. In: IEEE region 10 symposium, 2015, pp90–93
45. Miorandi D, Sicari S, De Pellegrini F, Chlamtac I (2012) Internet of Things: vision, applications
and research challenges. Elsevier Ad Hoc Netw 1497–1516
46. Asghar MH, Mohammadzadeh N, Negi A (2015) Principle Application and vision in Internet
of Things (IoT). In:International conference on computing, communication and automation,
2015, pp427–431
47. Chen X-Y, Zhi-Gang, Jin (2012) Research on Key technology and applications for Internet of
Things. In: Elsevier international conference on medical physics and biomedical engineering,
2012, pp561–566
48. Vermesan O, Friess P (eds) (2014) In: Internet of Things—From research and innovations to
market deployment. River Publishers Series in Communication
Applying Deep Learning Approach
for Wheat Rust Disease Detection Using
MosNet Classification Technique

Mosisa Dessalegn Olana, R. Rajesh Sharma, Akey Sungheetha,


and Yun Koo Chung

Abstract Nowadays, technology gives humankind the ability to produce enough food for billions of people over the world. Yet even though the world produces a huge amount of crops to ensure people's food security, many factors threaten that process. The threats to crops can come from climate change, pollinators, and plant diseases. Plant diseases are not only a threat to global food security; they also have devastating consequences for smallholding families in Ethiopia, who are responsible for supporting many members of one family. Crop loss is the major problem the world currently faces, and solving it with artificial intelligence detection methods has been a major challenge for experts in terms of the efficiency of the algorithms, because of the nature of the diseases to be identified on the crops. Convolutional neural networks are showing promising results, especially in computer vision. This paper elaborates an implementation of deep learning (convolutional neural networks) that uses the RGB values of the color of the disease found on the crop, which increases the efficiency of the model.

Keywords Wheat rust · Convolutional neural networks · RGB value segmentation · Computer vision · MosNet

1 Introduction

The economy of Ethiopia depends mainly on agriculture, which constitutes 40% of GDP, 80% of exports, and 75% of the country's overall workforce [1]. However, the outbreak of different diseases on different crops is the most difficult challenge the country

M. D. Olana (B) · R. Rajesh Sharma · A. Sungheetha · Y. K. Chung


Department of Computer Science and Engineering, School of Electrical Engineering and
Computing, Adama Science and Technology University, Adama, Ethiopia
e-mail: mscsmd2018@gmail.com
R. Rajesh Sharma
e-mail: sharmaphd10@gmail.com
Y. K. Chung
e-mail: ykchung99@gmail.com


is facing, and it brings different socioeconomic problems such as food insecurity, market inflation, and hard currency shortage because of the need to cover the gap with imports from foreign countries. In Ethiopia, wheat occupies the second position in crop yield [2] and covers approximately 17% of the entire farmland according to the country's statistics. The crop is critical not only to smallholder incomes but also to the food and nutrition of 10 million Ethiopians. Ethiopia expected production of 4.6 million tons in 2019/20, an increment of only 0.1 million tons, which is still not enough to cover the country's wheat consumption [9]. Even though yellow rust is the most common wheat rust disease, stem and leaf rust [1, 2] have also been identified in some places.
The biggest challenge in Ethiopia in dealing with this problem is that it cannot easily be addressed with the help of technology at an early stage, before the disease spreads through all the fields, because early detection is currently done by deploying a large number of experts who inspect the fields manually.
Many studies have worked on different kinds of plant disease detection, but for all their performance they also have drawbacks, as no research has a perfect ending. This study tries to address some of the drawbacks of previous research. The basic problems addressed in this study are:
• Using laboratory images, so that the accuracy of the models degrades when they are tested in a real field or cultivation area.
• Being constrained to single-leaf images with a homogeneous background.
• Using manually cropped images that focus only on the diseased area, which decreases efficiency when the model is taken to a real-world test. These drawbacks are discussed in detail, with references, in the Related Works section.
• Being dependent on previously trained models, which brings future insecurity.
2 Literature Review

Wheat is one of the common cereal grains all over the world; it comes from a grass (Triticum) and is grown in different varieties worldwide. Wheat contains gluten, which can trigger a harmful immune response in sensitive individuals; even so, people all over the world consume wheat as a main food because it is rich in antioxidants, minerals, vitamins, and fiber. Wheat is a prominent food security crop of Ethiopia, and it brings millions of dollars to the country [3].
Despite the importance of the crop in Ethiopia, wheat is the grain most commonly infected by rust, with three rusts being most common: leaf rust, yellow (stripe) rust, and stem rust (Table 1).
This paper works in depth on detecting the three types of wheat rust disease from the RGB values of 2,113 images, using RGB value segmentation and convolutional neural network approaches together.

Table 1 Wheat import and local growth rate in Ethiopia

Year | Import amount (MT) | Growth rate from the previous year (%)
2010 | 700  | −34.58
2011 | 1300 | 85.71
2012 | 1100 | −15.38
2013 | 1000 | 9.095
2014 | 1075 | 7.50
2015 | 2600 | 141.86
2016 | 1100 | −57.69
2017 | 1500 | 36.36
2018 | 1500 | 0.00
2019 | 1700 | 13.33

Three types of wheat rust disease are severely harming wheat crops in Ethiopia.

2.1 Leaf Rust

Leaf rust (Puccinia triticina) is one of the common diseases that occur in plants due to fungus [4, 5]. It is also called brown rust and occurs mainly in wheat, barley, and other cultivated crops. By attacking the foliage, leaf rust turns the leaf surface dusty and reddish orange to brown.

2.2 Yellow Rust

Yellow rust (Puccinia striiformis) [4] is also known as wheat stripe rust. It is one of the rust diseases that occur in wheat grown in cool environments, mostly in northern-latitude regions with a temperature range of 2 to 15 °C. Even though leaf rust and yellow rust are categorized under the same leaf rust family, they are different races; they are sometimes difficult to distinguish by normal visual inspection and need to be tested in the laboratory.

2.3 Stem Rust

Stem rust, like the other rust types, is caused by a fungus (Puccinia graminis) [4] and is a significant disease that affects bread wheat, barley, durum wheat, and triticale.

Deep learning is a subfield of machine learning [6] that studies statistical models called deep neural networks, which can learn hierarchical representations from raw data. The type of neural network best suited to the detection and recognition of images is the convolutional neural network (ConvNet) [7, 8], which is widely used to extract features from the input image through its processing layers.

2.4 Related Works and Limitations

First, the research in [9] used 15,000 RGB images manually cropped to a single leaf so as to capture only the infected area of the crop. These images were used to classify three types of cassava leaf disease by applying different train/validation/test splits: 10% of the data was used for validating the model, and the remainder was split between training and testing as 10/80, 20/70, 40/50, and 50/40%, respectively. They used Google InceptionV3 and achieved 98% accuracy, but the study does not achieve good performance on random images captured under random conditions, which prevents the model from being applied in real-world conditions. The research in [10] used GoogLeNet and AlexNet models to train on 54,306 images from the PlantVillage Web site, with GoogLeNet performing better and more consistently at a training accuracy of 99.35%; however, the accuracy degrades to 31.4% when the model is tested on images taken under different conditions. That study used three train–test split distributions of 75/25, 60/40, and 70/30% with three image types: RGB color images, grayscale images, and segmented images. In the third work [11], automated pattern recognition with CNNs was used to detect three types of plants and their diseases from simple leaf images, using five basic pre-trained CNN models. The study used 70,300 images for training and another 17,458 images for testing, at a standard size of 256 × 256 pixels. The models fine-tuned [10, 12, 13] in these studies were AlexNet, AlexNetOWTBn, GoogLeNet, OverFeat, and VGG, with the highest accuracy achieved by the VGG model: 100% on the training set and 99.48% on the testing set.
Table 2 shows existing research with significantly improved accuracy, together with the limitations of these works compared with the current scenario; the four major techniques of [9–12] listed there are taken for comparison.

3 Proposed MosNet Method

In this work, a model named 'MosNet' (after the author's name) has been proposed from scratch, without using any transfer learning method. The model classifies the images into two categories, infected and not infected.

Table 2 Comparison of existing work and limitations

Authors | Techniques | Accuracy (%) | Limitations
S. P. Mohanty et al. | Applied fine-tuning using pre-trained deep learning models | 99.35 | The study is constrained to single-leaf images with a homogeneous background
K. P. Ferentinos et al. | Applied different pre-trained networks trained on laboratory images | 99.48 | Accuracy degrades when the model is tested on images from a real cultivation field
A. Picon et al. | Fine-tuned ResNet50 model | 87 | Images are segmented manually by expert technicians
A. Ramcharan et al. | Transfer learning using InceptionV3 | 96 | Images are manually cropped to a single leaflet

Binary classification is used because recognizing which specific type of wheat rust has infected the crop would require hundreds of thousands or millions of images.
The first step before starting to train the model is image acquisition, using a digital camera and smartphones in different rural parts of Ethiopia. The images cover wheat infected with the three types of wheat rust as well as healthy wheat, in different positions and under different humidity and weather conditions, so that the model sees a variety of images and can achieve better accuracy.
The algorithm described below illustrates the detailed approach of the proposed image acquisition and RGB value segmentation.

3.1 MosNet Model

MosNet is a convolutional neural network [14–17] architecture named after the first author and implemented from scratch, trained on a total of 2,113 images of the infected and healthy classes with three different train–test split sets. These train–test split distributions are 50%–50%, 75%–25%, and 70%–30% for training and testing, respectively. A sigmoid activation function is used for the output layer with binary cross-entropy as the loss function, and the Adam gradient descent algorithm is used; the weights are learned at learning rates of 0.001 and 0.0001.
• Conv2D is the first layer, a convolution layer with 32 feature maps of size 3 × 3 and a rectifier activation function. This is the input layer, expecting an image shape of (pixels, width, height); it is fed with 150 × 150 images.
• The next layer is a pooling layer that takes the maximum, called MaxPooling2D, configured with a pool size of 2 × 2.
• The third layer is a convolution layer with 32 feature maps of size 3 × 3 and a rectifier activation function (ReLU), followed by a pooling layer with a size of 2 × 2.
• The next layer is a convolutional layer with 64 feature maps of size 3 × 3 and a rectifier activation function, followed by a max pooling layer with a pool size of 2 × 2.
• In the next layer, the 2D matrix data is converted into a flattened vector, which allows the output to be processed by the fully connected layers and activation functions.
• The next layer is a regularization layer that uses dropout; to reduce overfitting, this layer is configured to randomly drop 30% of the neurons.
• The next layer is a fully connected layer with 64 neurons and a rectifier activation function.
• The output layer has 2 neurons for the 2 classes and a sigmoid activation function to output probability-like predictions for each class. The model is trained using binary cross-entropy as the loss function and the Adam gradient descent algorithm with different learning rates (Fig. 1). A minimal code sketch of this stack is given after the list.
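The sketch below expresses this layer stack in Keras as one plausible reading of the description above; any setting not stated in the text (such as padding or further Adam options) is left at the library defaults rather than taken from the paper.

```python
# Keras sketch of the MosNet layer stack as described in the bullets above.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dropout(0.3),                        # randomly drop 30% of neurons to reduce overfitting
    Dense(64, activation="relu"),
    Dense(2, activation="sigmoid"),      # two classes: infected / not infected
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```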

3.2 Algorithm

Step 1 The first step is image collection from different parts of Ethiopia, especially areas known for their vast production of wheat, using a digital camera and smartphones.
Step 2 The second step is increasing the number of images using a 'data augmentation' technique, because there is no established practice of collecting and storing image datasets in Ethiopia. In this step, the 192 original images taken from real cultivation fields were expanded to a total of 2,113 images used to train the model, using the ten augmentation features listed below (a sketch of this augmentation, together with the resizing of step 3, is given after these steps).
• Rotation, height shift, width shift, rescaling, shear, zoom, horizontal flip, fill mode, data format, and brightness
Step 3 The third step is resizing the images to a common standard format so that the model has a uniform image reading system; images of different sizes are resized to a height and width of 150 × 150.
Step 4 The final step is segmenting the images using the RGB color values of the infected images and feeding them to the model. Each of the three types of wheat rust disease has its own color, and every color has its own unique RGB value representation.
Note: R = red, G = green, B = blue.
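A hedged Keras sketch of steps 2 and 3 is given below; the ten augmentation features come from the list above, while the specific ranges, the directory name, and the use of ImageDataGenerator itself are assumptions made for illustration.

```python
# Sketch of data augmentation (step 2) and resizing to 150 x 150 (step 3).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=40,            # rotation
    height_shift_range=0.2,       # height shift
    width_shift_range=0.2,        # width shift
    rescale=1.0 / 255,            # rescaling
    shear_range=0.2,              # shear
    zoom_range=0.2,               # zoom
    horizontal_flip=True,         # horizontal flip
    fill_mode="nearest",          # fill mode
    data_format="channels_last",  # data format
    brightness_range=(0.7, 1.3),  # brightness
)

# Stream augmented batches while resizing every image to the common 150 x 150 format;
# "wheat_images/" is a hypothetical folder with one subfolder per class.
train_batches = augmenter.flow_from_directory(
    "wheat_images/", target_size=(150, 150), class_mode="binary")
```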

Fig. 1 MosNet model architecture (input image, followed by three Conv2D + ReLU + MaxPooling2D blocks, a Flatten layer, Dense layers, and a sigmoid output)

These representations are:

Yellow rust has a color value with B > 0 and B < 45.
Stem rust has a color value with R > 130 and R < 200.
Leaf rust has a color value with G > 50 and G < 150.

Fig. 2 Original and RGB segmented images

Segmenting the images using these color values gives a clear classification by creating an identified zone only on the infected areas of the crop; for the healthy image, segmentation yields a solid dark image, so it is easily distinguished from infected crops.
As shown in Fig. 2, there are four different crop images: a healthy wheat image (a), a wheat crop infected with yellow rust (b), a wheat crop infected with stem rust (c), and a wheat crop infected with leaf rust (d). This segmentation makes the model classify far better than with unsegmented images.

4 Experimental Results

All the experiments were performed using the Python programming language on the Anaconda distribution platform with the JupyterLab editor. The MosNet model has been evaluated on three different datasets (a grayscale image dataset, an RGB image dataset, and an RGB value segmented image dataset) using different parameters that can affect the efficiency of any model. These parameters are the learning rate, dropout ratio, and train–test split ratio. They have been used in different combinations, because each parameter has its own effect every time its value is changed. The parameter values selected as most effective, and their impact, are presented below.

Epochs Epochs are the number of training iterations the model runs over the given dataset. Three different epoch counts have been used: 100, 200, and 300 training iterations.
Learning Rate Adam (0.001), Adam (0.0001), and Adam (0.00001) have been used.
Test Ratio The test ratio is the proportion of data used to test the accuracy of the model after training is finished; test data is data that the model has not seen before. Test ratios of 25, 20, and 30% of the total dataset have been used.
Dropout Dropout is explained in the dropout section (Sect. 2.5); two dropout rates, 50 and 30%, have been used. Each parameter affects the model depending on the values used in combination with the other parameters, and the effects they produce are discussed below.
In this study, more than two hundred experiments were conducted by varying the combinations of the parameters explained in this section (an illustrative sketch of such a parameter sweep is given below). MosNet was evaluated on three kinds of image datasets: grayscale, RGB, and RGB value segmented image datasets.
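The sketch below illustrates such a parameter sweep over the values listed in this section; the `images` and `labels` arrays are assumed to come from Steps 1–4 of the algorithm, and `build_mosnet` refers to the earlier architecture sketch.

from itertools import product
from sklearn.model_selection import train_test_split

epoch_counts = [100, 200, 300]
learning_rates = [1e-3, 1e-4, 1e-5]
test_ratios = [0.20, 0.25, 0.30]
dropout_rates = [0.30, 0.50]

for epochs, lr, test_ratio, dropout in product(epoch_counts, learning_rates,
                                               test_ratios, dropout_rates):
    # Hold out the chosen fraction of the data for testing.
    X_train, X_test, y_train, y_test = train_test_split(
        images, labels, test_size=test_ratio, stratify=labels)
    model = build_mosnet(learning_rate=lr, dropout_rate=dropout)
    model.fit(X_train, y_train, epochs=epochs,
              validation_data=(X_test, y_test))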
Grayscale images have only one color channel and therefore cannot hold enough information to be processed or extracted from the image data. Table 3 evaluates the same model on the same dataset while changing the parameters that affect its efficiency; the effect of the learning rate can be seen by comparing results 1 and 2 in the table. The only parameter that differs between the two results is the learning rate, with 0.001 used for the first result and 0.00001 used for

Table 3 Cumulative summaries for the MosNet model

No. | Dataset type | Epochs | Learning rate | Test ratio (%) | Dropout (%) | Training time | Accuracy (%) | Error (%)
1 | Grayscale | 100 | Adam (0.001) | 25 | 50 | 60.49 | 85.89 | 14.11
  | Grayscale | 100 | Adam (0.00001) | 25 | 50 | 60.95 | 81.63 | 18.37
  | Grayscale | 200 | Adam (0.001) | 25 | 30 | 120.77 | 80.78 | 19.22
  | Grayscale | 200 | Adam (0.001) | 25 | 50 | 120.43 | 86.62 | 13.38
  | Grayscale | 300 | Adam (0.001) | 25 | 50 | 179.25 | 79.08 | 20.92
2 | RGB | 200 | Adam (0.001) | 25 | 50 | 110.34 | 98.78 | 1.22
  | RGB | 200 | Adam (0.001) | 20 | 50 | 115.65 | 98.48 | 1.52
  | RGB | 300 | Adam (0.001) | 25 | 30 | 170.90 | 99.51 | 0.49
  | RGB | 300 | Adam (0.001) | 25 | 50 | 196.24 | 99.27 | 0.73
  | RGB | 300 | Adam (0.0001) | 25 | 50 | 176.19 | 97.57 | 2.43
  | RGB | 300 | Adam (0.00001) | 25 | 50 | 169.17 | 96.84 | 3.16
3 | RGB segmented | 300 | Adam (0.001) | 25 | 30 | 165.61 | 99.76 | 0.24
  | RGB segmented | 300 | Adam (0.001) | 25 | 50 | 168.14 | 98.05 | 1.95

Table 4 Improved accuracy of the proposed model


No. Techniques Accuracy (%) Error (%)
1 Proposed MosNet 99.76 0.24
2 Applied fine-tuning using pre-trained deep learning models 99.35 0.65
3 Applied different pre-trained networks to train on laboratory images 99.48 0.52
4 Fine-tuned ResNet50 model 87.00 13.0
5 Transfer learning using InceptionV3 96.00 4.00

the second result; these yield accuracies of 85.89% and 81.63%, respectively. This shows the effect of decreasing the learning rate from 0.001 in result 1 to 0.00001 in result 2.
Table 4 shows the improved accuracy achieved by the proposed MosNet model.
In other words, as the learning rate decreases, the accuracy of the model decreases as well: lowering the learning rate slows the speed at which the model learns. With the same number of epochs, dropout rate, and test ratio, the results show that the model takes too much time to learn features from the data. As the table also shows, the model starts to degrade at the 300th epoch for grayscale images; with only one channel, there is nothing more to extract and learn, more features are lost from the data, and the efficiency of the model degrades after the 200th epoch. This result motivates the use of RGB images, which have three channels and can represent more than 16 million colors. This helps the model extract and learn from the color characteristics of the images and prevents losing color information, which is essential in this study because wheat rusts are distinguished from healthy wheat by their colors. The graphical results of each evaluation can be seen in Figs. 3 and 4.

Fig. 3 Confusion matrix of validation data for Table 3, row 1

Fig. 4 Confusion matrix of validation data for Table 4, row 1

As shown in Figs. 5 and 6, there is a big difference between testing the model on the training data and on the test data. The training data has already been learned by the model during training, so the model does not take long to generalize over those samples. As the confusion matrix, loss graph, and accuracy graph indicate, this is not yet a detection model that can work properly in real-time conditions, since it still produces many errors even though it detected more than 80% of the testing data; hence, the quality of the data and the amount of information extracted from it needed to be improved as much as possible. Therefore, the next step was to train the model using RGB images without segmentation, which brought much higher accuracy than the model trained with the grayscale image dataset, and the results of the two datasets can be compared under the same parameters.
Comparing the grayscale dataset result in row 5 of Table 3 with the RGB dataset result in row 4 of Table 3 shows a big difference between them.

Fig. 5 Loss graph of train versus validation data for Table 3, row 1

Fig. 6 Accuracy graph of train versus validation data for Table 3, row 1

Fig. 7 Accuracy graph for grayscale dataset

The two diagrams above show the results of evaluating the model on two different datasets: the grayscale dataset in Fig. 7 and the RGB dataset in Fig. 10. Looking at the accuracy graphs of both evaluations, the one with the grayscale dataset has the lowest value on the validation (test) data, scoring only 79.08% accuracy with 20.92% error, which makes this model unsuitable for real-time applications. It also shows that as the number of epochs increases toward the maximum with the grayscale image dataset, the accuracy of the model starts to degrade, because when the model repeatedly extracts features from grayscale images it generalizes the samples into false classes and starts to overfit. The next graph (Fig. 8) is the evaluation result of the RGB image dataset with the same parameters as the previous one; the result is far better than with the grayscale image dataset, scoring 99.27% accuracy and 0.73% error. Beyond this comparison, the model also reached an accuracy of 99.51%, but a better extraction technique was still needed, because the model sometimes classified wheat images containing rain droplets, soil, or leaves with fire burn as infected crop, which is an error.

Fig. 8 Accuracy graph for RGB dataset

Fig. 9 Classification report of RGB segmented dataset value with the highest accuracy

Therefore, another way to fix this problem was needed: segmenting the images using their unique RGB values. This brought a big improvement, with 99.76% accuracy (Table 3, row 1 of the RGB segmented dataset part), and also fixed the problems encountered in the previous evaluations on the grayscale and RGB image datasets. This result also gave precision and recall values of 1, which indicates a perfect evaluation of the model (Fig. 9).
As shown in Fig. 10, segmenting images using their RGB values gave excellent results, with a validation accuracy of 99.76% and the lowest error rate of 0.24%, which is a great achievement in this study. The loss of the model starts at around 0.6 and gradually tends toward almost zero, and the accuracy starts at around 88% and reaches 99.76% at the 300th epoch.

5 Conclusion and Future Work

This study discussed different CNN models by applying different important factors
that can affect the model designed in the study. Three dataset types are used in the
study to conduct the experiments for the MosNet model, and these dataset types

Fig. 10 Loss and accuracy graph of RGB segmented dataset with the best result

are: grayscale image dataset, a dataset which only contains one channel images,
RGB dataset, a dataset with 3 channel images and RGB color segmented dataset, a
dataset which is RGB and segmented with the disease color code. MosNet model
has achieved an accuracy of 86.62% with 200 epochs, 0.001 learning rate, and a
50% dropout rate. This result is improved when the model is trained on the RGB
image dataset, which climbed to an accuracy of 99.51%. Finally, after segmenting the
images using the color of the infected images, the model extracted better information
than the previous model and achieved an accuracy of 99.76% with 300 training
epochs, the learning rate of 0.001, and the dropout rate of 30%.
This study delivered a CNN model that can effectively monitor wheat crop health, which is quite helpful for protecting a wheat farm early, before the disease spreads and causes total damage to the crop. This is valuable for early prevention of total crop loss, but it is not sufficient for statistical purposes: detection should also identify which type of disease has occurred and to what extent it has spread on the farm. Collecting a sufficient and well-defined dataset on different agricultural land would help this study progress toward a variety of crops and their disease types, so that CNN models can be applied to the real world in Ethiopia within a short period of time; at present, the lack of sufficient data limits the study from progressing beyond the results found so far.

A Decision Support Tool to Select
Candidate Business Processes in Robotic
Process Automation (RPA):
An Empirical Study

K. V. Jeeva Padmini, G. I. U. S. Perera, H. M. N. Dilum Bandara,


and R. K. Omega H. Silva

Abstract Robotic process automation (RPA) is the automation of business processes


(BPs) using software robots. RPA robots automate repetitive, non-value-adding
human work. The extent that a BP can be transformed into a software robot and
its utility depends on several factors such as task type, complexity, repeated use,
and regulatory compliance. The estimated RPA project failure rates are relatively
high, and transforming the wrong BP is attributed as one of the critical reasons for
this. Therefore, given a candidate set of BPs, it is imperative to identify only the
suitable ones for RPA transformation. In this paper, a decision support tool (DST)
is presented to select candidate BPs for RPA. First, 25 factors are identified from
the literature that captures the characteristics of a BP. The list is then reduced to 16
factors based on a set of interviews with RPA subject matter experts (SMEs). Then
an online survey with snowball sampling was conducted to measure the relevance
of those factors in predicting the outcome of RPA transformation of a candidate BP.
Finally, the two-class decision forest classification model was used to develop the
DST. The utility of the proposed DST was validated by applying it to three RPA
projects of a global IT consulting services company.

Keywords Business process automation · Decision forest algorithm · Decision


support tool · Robotic process automation

K. V. Jeeva Padmini (B) · G. I. U. S. Perera · H. M. N. Dilum Bandara · R. K. O. H. Silva


Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka
e-mail: jeeva.12@cse.mrt.ac.lk
G. I. U. S. Perera
e-mail: indika@cse.mrt.ac.lk
H. M. N. Dilum Bandara
e-mail: dilumb@cse.mrt.ac.lk
R. K. O. H. Silva
e-mail: omega.03@cse.mrt.ac.lk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 567
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_42

1 Introduction

Businesses continually strive to improve customer satisfaction while increasing the


quality of products or services, productivity, and reducing cost [1]. Robotic process
automation (RPA) plays a pivotal role in this attempt by automating various fractions
of business processes, especially the ones that involve repetitive, non-value-adding
human tasks and procedures [2–5]. For example, a customer service representative
could spend more time interacting with customers as opposed to entering data from
forms filed by customers. RPA could improve accuracy, quality, operational speed,
efficiency, customer satisfaction, employee productivity, and decrease operational
costs due to reduced human intervention and human errors [6, 7]. Thus, the use of
RPA in digitalization and automating business processes has dramatically increased
recently [4, 8]. It is estimated that 100 million knowledge workers will be replaced
by software robots by 2026 and RPA could cut cost up to 75% in financial services
firms [5].
In the journey of business process automation (BPA), RPA is the process of
using software robots to reduce the burden on human workers [1]. RPA mimics
users performing tasks on existing user interfaces and tools than the traditional BPA
approach of interacting with the backend of applications using application program-
ming interfaces (APIs) or specific scripting languages. However, software robots
are far more than a screen scraping technology and could encompass cognitive and
artificial intelligence features. Given the benefits and seaming potential to automate
almost anything, there is a great interest in transforming any business process to RPA.
However, not every business process is a suitable candidate for RPA transformation.
For example, a study by Lamberton [9] showed that 30–50% of RPA projects fail
in the initial attempt. Transforming the wrong business process is attributed to one
of the key reasons for such failure [8–10]. Therefore, deciding on suitable candidate
business processes for bot implementations is critical for the success of RPA trans-
formation. However, this is not trivial as RPA is an emerging technology, and only a
few process analysts have the required expertise in RPA [4, 9]. In this paper, a decision
support tool (DST) is proposed to assist the process analysts in selecting candidate
business processes for RPA.
The first step in developing a DST is to identify the factors/parameters that affect
the decision. Therefore, the following research question is first formulated:
What are the factors that affect candidate business process selection for RPA transformation?

A literature survey was conducted to answer the research question, and 25 factors were identified that capture the type, complexity, and workload of a business process. A
set of interviews with RPA subject matter experts (SMEs) in the industry was then
conducted to identify more relevant factors. Based on the interview results, the list
of factors was reduced to 16. An online survey with snowball sampling was further
conducted to measure the relevance of those 16 factors in predicting the success and
failure of the RPA transformation of a candidate business process. Finally, using the
survey data, a two-class decision forest with fourfold classification was trained as

the predictive model of the proposed DST. To test the utility of the proposed DST, feedback was obtained from five RPA SMEs of a global IT consulting services company, who applied the tool to three different projects of a multinational financial services company's RPA transformation. Out of the three projects considered, one was a failure, and the proposed DST predicted all three outcomes correctly.
The rest of the paper is organized as follows. Section 2 presents the litera-
ture review. Section 3 describes the research methodology. The proposed DST for
analyzing business processes and identifying candidate business processes for RPA
is presented in Sect. 4. Section 5 presents the survey results and the empirical study conducted to validate the utility of the proposed DST, and Sect. 6 concludes the paper.

2 Literature Review

RPA is a novel technology in BPA [1], which has the potential to automate business
processes quickly and efficiently without altering existing infrastructure and systems.
RPA software robots (aka., virtual bots or bots) work 24 × 7 and are programmed
to follow a specific set of rules to complete tasks with higher accuracy and speed,
which is not achievable from a typical human worker [12]. A bot can integrate with
almost any software without accessing third-party tools or API [1] and behaves like
a proxy human worker to operate a business application [7]. It relieves workers from
mundane tasks and enables them to focus on work requiring more human intervention,
cognitive skills, and creativity [11]. RPA could replace the human work pertaining
to criteria such as high-volume transactions, need to access multiple systems, stable
environment, low cognitive requirements, easy decomposition into unambiguous
rules, proneness to human error, limited need for exemption handling, and clear
understanding of the current manual costs [12]. Artificial intelligence (AI) is applied
to non-repetitive tasks, whereas RPA is typically applied to repetitive tasks. Both
RPA and AI together help to generate compelling applications and decision making
in business processes [5].
As an emerging technology, RPA has its own set of challenges. Based on the
experience in implementing RPA in 20 companies, Lamberton [9] identified that
30–50% of the initial projects failed. It has been identified that applying traditional
software development methodologies and transforming the wrong business process
are critical contributors to such failure [8–10]. While both factors require consid-
erable attention from academics and practitioners, this work focused only on the
problem of selecting the right business process for RPA transformation. Failing to
select a suitable process early in the business analysis leads to user and customer
frustration, in addition to direct losses such as time and money. In [13], Lacity and
Willcocks presented a widely used set of indicators such as rule-driven, repetitive in
nature, data-intensive, high compliance, and validations to select suitable candidate
business processes for RPA. Further, in [14], a set of recommendations to assess
and analyze each business process against human factor, complexity, and stability
are presented. In [7, 15, 16], it has been identified that selecting candidate business

processes with a high volume of transactions, process maturity, and business rules
lead to better success. Moreover, in [16], it is identified that business processes with
high workload and low complexity are better candidates for RPA bot implementa-
tion. While these related works give useful insights on a few factors to consider in
determining the RPA suitability of a candidate business process, no systemic mecha-
nism/process exists to do so. Therefore, it is imperative to identify a suitable process
to determine the candidacy early in the business analysis.

3 Research Methodology

To decide on the RPA suitability of a candidate business process, the characteristics of that business process and the relative significance of each characteristic in determining a successful outcome must first be understood. Therefore, a two-step process was adopted to develop the DST: a set of factors/parameters that characterize a business process was identified first, and a predictive model was then developed to derive a yes/no decision on its RPA suitability.
Figure 1 shows the adopted research methodology. First, a literature review was
conducted to identify factors that characterize a given business process and determine
its RPA suitability. While formal literature on the topic was limited, several whitepapers and articles were identified that discuss the difficulties in selecting a business process for
RPA transformation and a set of best practices. Based on these resources, 25 factors
are identified that could impact the selection of a candidate business process for RPA
transformation. They are categorized into three groups as process factors, external
factors, and financial impact factors.
As it would be difficult to derive a meaningful RPA candidacy decision with 25
factors, it is decided to identify more significant ones among them. For this, the
opinion of RPA SMEs is consulted using a set of face-to-face interviews. Five RPA
SMEs in the industry were identified using snowball sampling. Three of these SMEs
had one to three years of experience in cognitive RPA transformation and the other
two had one to three years of experience in static RPA transformation. In addition

Fig. 1 Research methodology



to determining the relevance of the 25 pre-identified factors, identification of any missing factors was also attempted. Based on the interview findings, the 16 factors listed in Table 1 were finalized as significant in determining the RPA suitability of a candidate

Table 1 Finalized factors impacting candidate business process selection in RPA

Process factors
• Volume of transactions (VOT): No. of processing requests that the computing system will receive and respond to within a specified period. Measurement criteria: high, medium, or low.
• Business process complexity (BPC): Levels of ability to analyze, variety, non-routines, difficulty, uncertainty, and inter-dependence with other business processes. Measurement criteria: high, medium, or low.
• Rate of change (ROC): No. of changes made to the business process within a month. Measurement criteria: ≤2, 3–5, 6–8, ≥8.
• Rule-based business process (RBBP): Logic-based business process. Measurement criteria: yes with few exceptions, yes, or no.
• Workload variation (WV): Irregularity of workload where additional staff support is required during peak times. Measurement criteria: yes, sometimes, or no.
• Regulatory compliance (RC): Need to satisfy organizational, government, or any other regulations, e.g., ISO or PCI-DSS. Measurement criteria: yes or no.

External factors
• Cognitive features (CF): Task requires creativity, subjective judgment, or complex interpretation skills (e.g., natural language processing and speech recognition). Measurement criteria: yes or no.
• No. of systems to access (NOSA): Automation task involves accessing multiple systems. Measurement criteria: ≤2, 3–5, 6–8, ≥8.
• Multi-modal inputs (MMI): Images, audio clips, or videos are given as inputs. Measurement criteria: yes or no.
• Data-driven (DD): Decisions during an activity are made based on the analysis of data. Measurement criteria: yes or no.
• Stability of environment (SOE): Availability of UAT or any other near-production environment of the target application for automation. Measurement criteria: yes or no.

Financial impact factors
• Operational cost (OC): Cost of day-to-day maintenance and administration. Measurement criteria: high, medium, or low.
• Impact of failure (IOF): Financial impact to the client if the bot goes wrong. Measurement criteria: high, medium, or low.
• Human error (HE): Task is error-prone due to human worker negligence rather than inability to code as a computer program. Measurement criteria: high, medium, or low.
• End of life (EOL): How soon the bot will become obsolete, e.g., a new system is on the client's road map. Measurement criteria: ≤2 years, ≤4 years, ≥5 years.
• Service-level agreement (SLA): Time taken to complete end-to-end functionalities of a task. Measurement criteria: seconds, minutes, hours, or days.

business process. Five factors were combined with other factors, as they were determined to be alternative terminologies/definitions for the same set of factors. Four factors were removed because they were hard to measure, e.g., the client's expected outcome of the process delivery. A requirement for irregular labor was identified from the interview findings as a new factor that was not in the initial list of 25.
Based on the interviews, it was further realized that not all 16 factors are of equal significance. However, ranking the relative importance of the factors with feedback from the RPA SMEs proved not to be straightforward. Hence, it was decided to conduct an industry survey to measure the 16 factors for RPA projects/bots developed by the surveyed SMEs, together with their outcomes, and a questionnaire was consequently developed based on the 16 factors. It was first shared with ten RPA SMEs of the global IT consulting services company as a pilot survey. Profiles of the SMEs included project managers, architects, developers, and testers. In the questionnaire, survey participants were also asked to prioritize the factors. However, while analyzing the pilot survey results, it became clear that the prioritization of factors depended on the role the survey participants played within the RPA project. Hence, the factor prioritization option was removed and the questionnaire was updated based on the feedback received.

The online questionnaire was then shared with industry experts across the world. These experts were identified using LinkedIn, public forums, webinars, etc., and 56 responses were collected. Three of them were discarded after verifying their validity, because those respondents commented that they could not assess the process complexity. Even after attempting to collect data for more than three months, it was difficult to gather many responses, as it was hard to find professionals who had completed at least one RPA project. Moreover, some declined to participate, citing confidentiality agreements with their clients or conflicts of interest. Among the 53 responses, only 22 survey participants were involved in a bot implementation that was successful, and another eight were involved in projects with failed bots. The other 23 participants had ongoing projects at different phases of the project lifecycle.
As the dataset collected from the online survey contained both categorical and ordinal data, the Spearman correlation was calculated for the 30 (22 + 8) completed responses. Factors such as workload variance (WV), no. of systems to access (NOSA), and service-level agreement (SLA) had a positive correlation with the RPA outcome of the business process (represented as the Status of Candidate Business Process (SCBP) in Table 1), whereas volume of transactions (VOT), regulatory compliance (RC), and cognitive features (CF) had a negative correlation with SCBP. However, because the number of responses was limited, and the feedback from SMEs during the interviews indicated that the other ten factors are still useful in capturing the characteristics of a business process, it was decided to develop the prediction model using all 16 factors. This was also motivated by the fact that the dataset was small. To derive the RPA suitability decision, the two-class decision forest classification model was used. The overall accuracy was verified using the fourfold cross-validation process, and the prediction model was evaluated to have an overall accuracy of 90%. The choice of Spearman correlation, the two-class decision forest classification model, and the resulting data analysis are presented in Sect. 4.
Finally, the predictive model was further validated with the help of five RPA SMEs who applied it to three different projects of a global IT consulting services company that develops RPA bots. The three projects were chosen from three business processes of a consumer division of a multinational financial services company. At the time of evaluation, the three projects had already been implemented at the customer site and had been in operation long enough to determine their success or failure in achieving the business outcomes. The proposed DST determined two of the projects as suitable and the other as unsuitable for RPA transformation. Indeed, the project predicted as unsuitable for RPA transformation had failed due to the wrong selection of the business process.

4 Predictive Model

While the correlation analysis indicated that all 16 factors are relevant in determining
the RPA suitability of a candidate business process, it is difficult to determine the

Fig. 2 Process of data analyzing and predictive model development

suitability of a candidate business process based on the 16 values assigned to those factors. An initial attempt was made to derive a linear model with a threshold score, or a set of rules, to determine RPA suitability, but it was nontrivial to identify a suitable set of weights, thresholds, and rules, especially with a mix of categorical and ordinal data. Hence, a predictive data analysis method based on the two-class decision forest algorithm was adopted. A two-class classification model was sufficient, as the DST is intended to provide a yes or no answer on the RPA suitability of a candidate business process.
Figure 2 shows the process followed in developing the decision model to determine the RPA suitability of a business process. SPSS was used for the survey data analysis, which consisted of data preprocessing and factor selection steps. First, the collected data was validated, and redundant and incomplete responses were removed. After that, the Spearman correlation was calculated to verify the interdependency of the factors and the correlation of the factors with the dependent variable (i.e., SCBP). Spearman correlation was used because it helps to identify the significant factors in the dataset, showing the impact of each factor in determining the suitability of a candidate business process.
Then the predictive model was developed using Microsoft Azure Machine Learning
Studio due to its ease of use. For the predictive model, a two-class decision forest with
fourfold classification is trained using the decision forest algorithm [17]. Figure 3
shows overall model development and testing workflow. While the above tools were
used for convenience, the proposed model can be implemented in any tool that
supports two-class decision forest and fourfold classification.

4.1 Data Preprocessing

56 responses were collected from the online survey shared across the industry. Three responses were discarded during data cleaning to correct data inconsistencies and to remove noise. Two of those respondents had commented that they could not assess the process complexity of the selected business process.

Fig. 3 Model development and testing workflow

4.2 Factor Selection

Spearman's correlation was used to identify the significant factors from the survey responses. Pearson's correlation is appropriate when the data contains interval or ratio variables, whereas some of the factors here are binary or categorical. Because the dataset is categorical and ordinal, Spearman's correlation was used to measure the strength of the monotonic relationship between variables. Table 2 lists the resulting correlation values.
SCBP is the dependent variable and the 16 factors are independent variables. Lower
Spearman’s correlation among factors indicates that there is no inter-relationship
among them. Therefore, the chosen 16 factors capture the different properties of the
candidate business process.

Table 2 Spearman’s correlation among 16 factors and the status of the business process
SCBP VOT BPC ROC RBBP WLV RC CF NOSA MMI DD OC IOF HE EOL SOE SLA
SCBP 1.00 −0.62 −0.16 0.00 −0.01 0.41 −0.32 −0.37 −0.31 −0.19 −0.06 −0.18 −0.02 −0.15 −0.02 0.05 0.40
VOT −0.62 1.00 0.16 0.06 0.14 −0.53 0.06 0.05 −0.19 0.00 0.00 0.05 0.10 0.19 0.14 0.00 −0.29
BPC −0.16 0.16 1.00 0.19 0.16 −0.07 −0.25 −0.04 0.13 −0.23 −0.53 0.35 0.32 0.12 −0.37 0.16 0.14
ROC 0.00 0.06 0.19 1.00 −0.19 0.22 −0.27 0.13 0.03 −0.27 −0.34 0.09 0.30 0.00 0.25 0.24 0.25
RBBP −0.01 0.14 0.16 −0.19 1.00 −0.08 −0.12 −0.15 0.06 0.15 −0.07 0.30 0.19 0.00 0.01 0.47 −0.11
WV 0.41 −0.53 −0.71 0.22 −0.77 1.00 −0.23 −0.30 0.11 0.07 −0.14 −0.29 0.22 0.00 −0.07 −0.16 0.38
RC −0.32 0.06 −0.25 −0.27 −0.12 −0.23 1.00 0.24 −0.15 0.11 0.33 −0.43 −0.19 −0.16 0.25 −0.25 0.05
CF −0.39 0.05 −0.04 −0.13 −0.15 −0.30 0.24 1.00 −0.07 0.61 0.21 −0.29 −0.17 0.00 −0.19 0.11 −0.42
NOSA −0.31 −0.19 0.13 0.03 0.06 0.11 −0.15 −0.07 1.00 −0.22 −0.35 0.24 0.07 0.26 −0.20 0.12 −0.18
MMI −0.19 0.00 −0.23 −0.27 0.15 0.07 0.11 0.61 −0.22 1.00 0.28 −0.19 0.02 0.00 −0.19 0.11 −0.24
DD −0.06 0.00 −0.53 −0.34 −0.07 −0.14 0.33 0.21 −0.35 0.28 1.00 −0.38 −0.52 0.00 0.17 −0.15 −0.08
OC −0.18 0.05 0.35 0.09 0.30 −0.29 −0.43 −0.29 0.24 −0.19 −0.39 1.00 0.51 0.21 0.01 0.14 −0.10
IOF −0.02 0.10 0.32 0.30 0.19 0.22 −0.19 −0.17 0.07 0.02 −0.52 0.51 1.00 0.00 −0.02 0.01 −0.04
HE −0.14 0.19 0.12 0.00 0.00 0.00 −0.16 0.00 0.26 0.00 0.00 0.21 0.00 1.00 0.37 −0.19 −0.02
EOL −0.02 0.14 −0.37 0.25 0.01 −0.07 0.25 −0.19 −0.20 −0.19 0.17 0.00 −0.02 0.37 1.00 0.02 −0.02
SOE 0.05 0.00 0.16 0.24 0.47 −0.16 −0.25 0.11 0.12 0.11 −0.15 0.14 0.01 −0.19 0.02 1.00 −0.24
SLA 0.40 −0.29 0.14 0.25 −0.11 0.38 0.05 −0.42 −0.18 −0.24 −0.08 −0.09 −0.04 −0.02 −0.03 −0.24 1.00

From Table 2, it can be seen that volume of transaction (VOT), regulatory


compliance (RC), and cognitive features (CF) variables have a negative correla-
tion with the dependent variable SCBP. Workload variance (WV), no. of systems
to access (NOSA), and service-level agreement (SLA) variables have a positive
correlation with the dependent variable, SCBP, whereas business process complexity
(BPC), rate of change (ROC), rule-based business process (RBBP), multimodel input
(MMI), data-driven (DD), operational cost (OC), impact of failure (IOF), human
error (HE), end of life (EOL), and stability of environment (SOE) have an insignif-
icant correlation with SCBP. VOT (−0.625), WV (0.413), and SLA (0.40) had moderate correlations with SCBP and were statistically significant, with 2-tailed values of 0.00, 0.002, and 0.003, respectively, whereas RC (−0.317), CF (−0.396), and NOSA (0.307) had weak correlations with SCBP and were statistically significant, with 2-tailed values of 0.021, 0.003, and 0.025, respectively. However, BPC (−0.159), ROC (0.00), RBBP (−0.007), MMI (−0.191), DD (−0.062), OC (0.175), IOF (−0.018), HE (0.147), EOL (−0.018), and SOE (0.051) had insignificant correlations with SCBP; the corresponding 2-tailed values were 0.256, 0.998, 0.961, 0.170, 0.660, 0.210, 0.898, 0.292, 0.898, and 0.716, respectively. This result may be due to the low number of samples, and the interviews with RPA SMEs revealed that the ten factors with insignificant correlations are still useful in capturing the characteristics of a business process. Therefore, it was decided to build the model with all 16 variables, including the ten variables with insignificant correlation.
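As a rough illustration of this factor-selection step (the authors used SPSS), the correlations in Table 2 could be reproduced with pandas on an ordinally encoded response table; the file name, column labels, and encoding below are assumptions.

import pandas as pd

factors = ["VOT", "BPC", "ROC", "RBBP", "WV", "RC", "CF", "NOSA",
           "MMI", "DD", "SOE", "OC", "IOF", "HE", "EOL", "SLA"]

responses = pd.read_csv("rpa_survey_responses.csv")    # hypothetical file name
encoded = responses[factors + ["SCBP"]].apply(
    lambda col: pd.factorize(col, sort=True)[0])        # ordinal codes per column
spearman = encoded.corr(method="spearman")
print(spearman["SCBP"].sort_values())                   # correlation of each factor with SCBP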

4.3 Classification Model Development

The two-class classification model was selected over multiclass classification to keep the model simple and because the aim is to derive a yes/no decision on the suitability of a candidate business process. The decision forests algorithm [17] was used because it is an ensemble learning method that can be used for classification tasks. These models provide better coverage and accuracy compared to single decision trees. The
decision forests framework is used in Microsoft Azure Machine Learning Studio
which extends several forest-based algorithms and unifies classification, regression,
density estimation, manifold learning, semi-supervised learning, and active learning
under the same decision forest framework.
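As a minimal sketch of this step, scikit-learn's random forest can stand in for the two-class decision forest trained in Azure Machine Learning Studio; the one-hot encoding and forest settings are illustrative assumptions, and `responses` and `factors` are reused from the sketch in Sect. 4.2.

from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

X = responses[factors]   # the 16 factors characterizing each business process
y = responses["SCBP"]    # status of the candidate business process (success/failure)

model = make_pipeline(
    make_column_transformer((OneHotEncoder(handle_unknown="ignore"), factors)),
    RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(X, y)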

4.4 Model Evaluation

Next, the accuracy of the trained two-class classification model is evaluated based
on the decision forests algorithm. Accuracy, precision, recall, and F score are some
of the commonly used metrics to evaluate the accuracy of a classification model.
Accuracy measures the quality of the classification model as the proportion of the

Table 3 Accuracy of the predictive model

Accuracy  Precision  Recall  F score
0.900     1.00       0.870   0.930

true results to all the cases within the model. Precision describes the proportion of true results over all positive results. Recall is the fraction of all correct results returned by the model. The F score is the harmonic mean of precision and recall and lies between 0 and 1 (the higher the better). Table 3 presents the results based on fourfold cross-validation. It can be seen that the overall accuracy of the model is 90%, and the model has good precision and recall as well. The high F1 score further indicates good accuracy of the test.
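For completeness, the fourfold cross-validated metrics reported in Table 3 could be computed along the following lines; this is a sketch only, reusing `model`, `X`, and `y` from the previous sketch, and the resulting scores depend on the survey data.

from sklearn.model_selection import cross_validate

scores = cross_validate(model, X, y, cv=4,
                        scoring=["accuracy", "precision", "recall", "f1"])
for metric in ["accuracy", "precision", "recall", "f1"]:
    print(metric, scores["test_" + metric].mean())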
The proposed DST primarily comprises the trained two-class classification model and the values of the 16 factors fed into it. The prediction model was published as a Web service on the Microsoft Azure platform so that the SMEs could access it to validate the model (see Fig. 3). Finally, the proposed DST was verified by applying it to evaluate the RPA transformation of three business processes of a multinational financial services company.

5 Results and Discussion

The proposed DST was validated with the help of five RPA SMEs who applied it to
three different projects of the global IT consulting services company. Figure 4 shows
the output of the DST for the failed RPA transformation project which illustrates
the results as 0 for No (1 for Yes) with the status of the list of factors. The second
sentence explains the output. The DST predicted the other two projects as suitable
for RPA transformation, and they were indeed successful in actual operation at the
client site.

<- ‘DSS in RPA 3 [Predictive Exp.]’ test returned [“0”, “Medium”, “High”, “X<=2”, “With few
exceptions”, “No”,”Yes”,”Yes”,”X<=2”, “No”,
“Yes”,”Medium”,”High”,”Medium”,”2<X<5=5”,”Yes”,”Days”, “-…

Results {“Results” :{“Output”{“type”:”table”:”value”:{“Column Names”:{“What is the state of


candidate business process that you have selected?”, “Volume of Transactions”, “Business Process
Complexity”, “Rate of Change”, “Rules-Based Business Process”, “Workload Variation”, “Regulatory
Compliance”, “Cognitive Features”, “No of Systems to Access”, “Multi-Model Inputs”, “Data Driven”,
“Operational Cost”, “Impact of Failure”, “Human Error”, “End of Life” , “Stability of Environment” ,
“Service Level Agreement” , “ Scored Label Mean” , “ Scored Label Standard Deviation” } ,“ ColumnType” :
{“String”, “String”, “String”, “String”, “String”, “String”, “String”, “String”, “String”, “String”, “String”,
“String”, “String”, “String”, “String”, “String”, “String”, “Double” , “Double”} , “Value”: { {“0” , “Medium” ,
:”High” , “X<=2” , “With few exemptions” , “No” , “Yes” , “Yes” , “X<=2” , “No” , “Yes” , “Medium” , “High”
, “Medium” , “2<X<=5” , “Yes” , “Days” , “-0.127917257182435” , “0.676987962011746”}}}}}}}

Fig. 4 Results from the prediction model



5.1 Analysis of Survey Results

As per the interviews and data analysis, it was identified that RPA projects fail when the selected business processes have the following characteristics:
1. The complexity of the business process is high.
2. Workload tends to experience high variation.
3. Regulatory compliance is needed.
4. The business process needs to access more than three to five different systems.
5. Multi-model inputs need to be handled.
6. The operational cost is high.
7. The system environment is not stable.
All the failed RPA projects had these seven characteristics. Therefore, to be suit-
able for RPA transformation, a business process should have no more than two or
three of these characteristics.
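As a simple illustration of this screening rule (not part of the DST itself), the seven characteristics can be counted for a candidate process; the flag names below are invented for the example.

RISK_CHARACTERISTICS = [
    "high_complexity", "high_workload_variation", "regulatory_compliance",
    "accesses_more_than_3_to_5_systems", "multi_modal_inputs",
    "high_operational_cost", "unstable_environment",
]

def rpa_candidate_ok(process_flags):
    """process_flags: dict mapping each characteristic name to True/False."""
    hits = sum(bool(process_flags.get(name)) for name in RISK_CHARACTERISTICS)
    # More than two or three of these characteristics suggests a poor candidate.
    return hits <= 3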

5.2 Demographic Analysis

The demographic analysis of the survey participants is presented to show the status of the industry and how the responses may have affected the findings and conclusions. As seen in Fig. 5, survey participants held different roles within an RPA project, such as business analyst (30%), developer (26%), project manager (21%), tester (14%), head of transformation (3%), technical lead (3%), and architect (3%), and did not hold any agile-specific roles such as scrum master. This could be due to

Fig. 5 Role played by survey respondents in RPA projects (business analyst 30%, developer 26%, project manager 21%, tester 14%, head of transformation 3%, technical lead 3%, architect 3%)



relatively less use of agile principles and values within an RPA project, which has
been identified as another reason for the failure of RPA projects.
As seen in Fig. 6, 96% of respondents had less than five years of experience in RPA projects and only 4% had over five years of experience. This is understandable given that RPA technology is new to the industry.
As per Fig. 7, 11% of the respondents were planning to develop bots in the future, and 32% of bot developments were still in progress. Only 15% of the bots had failed, in the sense that the client/business was not satisfied with the resulting bots. The industry is moving toward bot development, and 42% of the respondents confirmed that the business was satisfied with the bot development. These successful projects had only one to three of the seven characteristics mentioned in the section above.

Fig. 6 Respondents' experience in the RPA industry (less than 1 year 11%, 1+ years 32%, 3+ years 53%, 5+ years 4%)

Fig. 7 Status of the bots developed by the survey respondents (planning to develop bots in the future 11%, bot development still in progress 32%, bots developed but business needs not satisfied 15%, bots developed and business satisfied 42%)



6 Summary

A prediction model was developed to select candidate business processes for RPA projects. The model is based on 16 factors that characterize a business process, identified through a combination of a literature review and interviews with RPA SMEs. An online survey was then used to identify the impact of those factors in determining the successful outcome of an RPA project. Workload variance, number of systems to access, and service-level agreement had a positive correlation with the RPA outcome of the business process, while the volume of transactions, regulatory compliance, and cognitive features had a negative correlation. Although the other ten factors showed insignificant correlation, all 16 factors were used to develop the model, as the RPA SMEs considered them still useful and the dataset was small. The two-class decision forest classification model was then used to predict the outcome of a candidate business process. The overall accuracy of the model was verified using the fourfold cross-validation process, and the trained model showed 90% accuracy. The resulting DST was further validated by five RPA SMEs who successfully applied it to three completed RPA projects. In future work, more responses from completed RPA projects will be collected to further enhance and evaluate the proposed DST, the tool's accuracy in predicting RPA projects in domains beyond finance will be evaluated, and the development of a software process specifically for RPA projects will be targeted.

References

1. Asatiani A, Penttinen E (2016) Turning robotic process automation into commercial success—
Case OpusCapita. J Inf Technol Teach Cases 6:67–74
2. Auro Inc. Use cases—RPA in telecom industry. https://www.aurorpa.com/rpa-telecom-industry
3. Cosourcing Partner (2016) Exploring robotic process automation as part of process improvement
4. Accenture (2016) Getting robots right—How to avoid the six most damaging mistakes in
scaling up robotic process automation. Accenture Technology Vision
5. Cline B, Henry M, Justice C (2016) Rise of the robots. KPMG
6. DeMent B, Robinson T, Harb J (2016) Robotic process automation: innovative transformation
tool for shared services. ScottMadden, Inc
7. Institute for Robotic Process Automation (2015) Introduction to robotic process automation—A
primer
8. Shared Services and Outsourcing Network (2017) The global intelligent automation market
report
9. Lamberton C (2016) Get ready for robots: why planning makes the difference between success
and disappointment. Ernst & Young
10. Sigurðardóttir GL (2018) Robotic process automation: dynamic roadmap for successful
implementation. Msc. thesis
11. Casey K (2019) Why robotic process automation (RPA) projects fail: 4 factors. https://enterp
risersproject.com/article/2019/6/rpa-robotic-process-automation-why-projects-fail/
12. Lacity M, Willcocks L (2015) Robotic process automation: the next transformation lever for
shared services. London School of Economics Outsourcing Unit Working Papers, vol 7, pp 1–35

13. NICE (2016) Selecting the right process candidates for robotic automation. NICE Systems Ltd.
14. Haliva F (2015) 3 criteria to choosing the right process to automate. https://blog.kryonsystems.
com/rpa/3-criteria-to-choosing-the-right-process-to-automate
15. Kroll C, Bujak A, Darius V, Enders W, Esser M (2016) Robotic process automation—Robots
conquer business processes in back offices. Capgemini Consulting
16. Schatsky D, Muraskin C, Iyengar K (2016) Robotic process automation: a path to the cognitive
enterprise. University Press, Deloitte
17. Bernes J (2015) Azure machine learning. Microsoft azure essentials. Microsoft Press
Feature-Wise Opinion Summarization
of Consumer Reviews Using Domain
Ontology

Dushyanthi Vidanagama, Thushari Silva, and Asoka S. Karunananda

Abstract There is a rapid increase of contents generated by users such as reviews


and comments over the Internet. Analyzing a review is critical for data-driven deci-
sion making and corporate intelligence of individuals and organizations. This paper
focuses on feature-wise sentiment analysis of reviews which results in a summary
based on the important features of a specific domain. The proposed approach consists
of two major processes: data acquisition and preprocessing to aggregate reviews from
different sources. Ontology is generated based on the domain-specific knowledge,
and it is updated with new features, opinions, and sentiment orientations. The sentiment determination process uses the SentiWordNet lexicon to discover the sentiment orientation of each opinion word and the PMI value to find the sentiment expectation of each new feature. The review summarization step presents the feature-level
summary of reviews to the user. Evaluation results show high precision and recall
values which measures the exactness and completeness of the approach.

Keywords Features · Domain ontology · Sentiment orientation · Opinions

1 Introduction

The rapid advancements of the Web enhance the consumers’ experience of


purchasing products and services through e-commerce Web sites. Also, most of the
Web sites allow their customers to write comments on products or services which
they had purchased before. Customers can express their opinions on e-commerce
sites, blogs, forums, social media, etc. These reviews may influence others who are
going to make purchasing decisions as they cannot have the real purchasing experi-
ence. Also, the reviews are useful for organizations to make quality improvement of
a specific product based on reviews.
According to the key findings of the Local Consumer Review Survey [1], in 2017, 97% of consumers searched online when buying products and 12% of
D. Vidanagama (B) · T. Silva · A. S. Karunananda


Department of Computational Mathematics, Faculty of Information Technology, University of
Moratuwa, Moratuwa, Sri Lanka
e-mail: udeshika@kdu.ac.lk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 583
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_43

customers searched for a product online every day. Most consumers trust online reviews; positive reviews increased the trust in a product for 73% of consumers, and 49% of consumers required a higher rating before deciding on a product [1].
When making decisions from a vast collection of reviews, it is difficult to read all the reviews from Web sites, blogs, forums, and social media at once, and decision-makers may acquire a biased view of the product by reading only a few reviews, with the risk of missing reviews that carry positive or negative opinions. For this reason, consumer-generated reviews can be classified as positive, negative, or neutral using the available review mining techniques from the field of information extraction and text processing. In these classical classification methods, a negatively classified review does not necessarily mean that the customer disliked every feature of the product, and a positively classified review does not imply that the reviewer liked every feature. Such methods do not indicate in advance what people like or dislike most about a product. A feature that is unimportant to one person may be important to another, and some reviews contain opinions without referring to any feature of the product itself. A preferable choice is to evaluate products based on their feature ratings rather than the overall rating alone, so it is necessary to provide a feature-wise opinion summarization.
On the other hand, when considering the semantic orientation of opinion words associated with each feature, a variety of sentiment analysis techniques have used domain-oriented semantic lexicons, which may be inaccurate for determining the polarity of ambiguous opinions. This paper therefore suggests an approach to feature-wise opinion classification that selects the important opinion targets using a domain ontology and determines the semantic orientation of unambiguous and ambiguous opinion words using a combination of a semantic lexicon method and the PMI algorithm (a rough illustration is given after this paragraph). The final feature-wise summarization of reviews is generated by aggregating all the feature-wise sentiments.
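As a rough, hypothetical illustration of the two lexicon ingredients mentioned above (not the authors' implementation, which is detailed later in the paper), a SentiWordNet lookup and a pointwise mutual information (PMI) score computed from corpus counts might look as follows.

import math
from nltk.corpus import sentiwordnet as swn   # requires the NLTK SentiWordNet corpus

def sentiwordnet_orientation(word, pos="a"):
    """Average positive-minus-negative score over the word's senses."""
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2(P(x, y) / (P(x) P(y))), computed from corpus counts."""
    return math.log2((count_xy * total) / (count_x * count_y))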
The structure of the paper is as follows. The second section focuses on related work in the areas of sentiment analysis, feature-level sentiment analysis, and feature extraction methods. The third section presents the proposed methodology. The fourth section describes the experiment details and evaluation results. The final section concludes the paper.

2 Related Work

Sentiment analysis is the procedure of categorizing opinions stated in a piece of


comment as positive, negative or neutral. They are used to determine the reviewer’s
opinion toward a specific topic, product or service [2]. It obtains the writer’s emotion
about numerous products or services which are written as various posts or comments.
Sentiment analysis applications provide a basis for policy-makers to detect early warnings in a timely manner by capturing feedback from consumers. Collecting the

feedback and analyzing it manually is very expensive and error-prone. Computational sentiment analysis therefore helps to recognize problems by reading rather than by asking, thereby guaranteeing a more accurate picture of reality. By analyzing such consumer opinions, organizations can generate customer insight, improve marketing effectiveness, increase customer satisfaction, and protect brand reputation for better market research effectiveness.
Analysis of consumer reviews can be performed in three different ways, namely
sentence-level, document-level and feature-level analysis [3]. In the document-level
analysis, the whole document with opinions is considered to be having opinions on
a single object and finally, the whole document is classified as positive or negative
[4].
The sentence-level sentiment analysis method calculates the polarity of each sentence by taking the sentence as a single unit [4]. The feature-based sentiment analysis method identifies the features of a certain object and classifies the sentence/document based on the opinion words of those features.
Feature-level sentiment analysis first discovers the targeted objects, their compo-
nents, attributes and features of the opinionated sentence and then decides whether the
opinions are negative, positive or neutral [5]. This kind of analysis is required to make
product enhancements and to express what features of the product are mostly liked
or not [5]. A positive or negative opinionated document at whole does not mean that
the opinion holder has positive or negative opinions on all the features of that partic-
ular object. Such details cannot be identified in document-level or sentence-level
sentiment analysis. When performing feature-level sentiment analysis, it is required
to identify some information related to the reviews. Those are the synonyms and
feature indicators of a feature, target object, sentiment orientation of the opinion on
the feature, reviewer and the time of the review [5].

2.1 Methods in Sentiment Analysis

The sentiment analysis can be achieved by using either machine learning, lexicon-
based or hybrid approach. The supervised machine learning approach uses the Naïve
Bayes, support vector machine, K-nearest neighbors and maximum entropy algo-
rithms for classification [4]. The sentiment orientation (SO) of opinion words is
determined in the unsupervised approach [4]. Also, lexicon-based method [6] and
unsupervised dictionary-based technique [4] have been used to perform sentiment
classification.
Esuli and Sebastiani [7] proposed the LSA method for feature extraction and used MapReduce and Hadoop together with SVM to improve the accuracy
and efficiency of sentiment classification of movie reviews. Liu et al. [8] proposed
a sentiment mining approach using a Naïve Bayes classifier combined with the
Hadoop framework. A mixture of lexicon-based and machine learning method is
used by Zhang et al. [9] in their research. A lexicon-based method was applied to
Twitter data to perform sentiment analysis. Then, chi-square test was applied on the
output, and additional tweets with opinions were recognized. Then, a binary senti-
ment classifier was trained to allocate sentiment polarities of the newly recognized
opinionated tweets. The classifier used the training data which is provided by the
previous lexicon-based method. Nandhini et al. [10] discussed a feature-based sentiment analysis method that performs feature extraction using a feature dictionary and opinion extraction using an opinion dictionary, and combines the results.
Many types of research use ensemble classifiers to increase robustness, accuracy
and better overall generalization [11]. Apart from using lexicon-based or machine
learning techniques, there was a hybrid approach of the rule-based classifier, lexicon-
based classifier and SVM as the machine learning classifier for Twitter messages
[12] and a scalable and useful lexicon-based approach for mining sentiments using
emoticons and hashtags with Hadoop [13]. Govindarajan [11] proposed an ensemble
classification with Naïve Bayes and genetic algorithm for movie review data. Shinde-
Pawar [14] proposed a sentiment analysis technique by applying artificial neural
network (ANN) and fuzzy logic.

2.2 Feature-Based Sentiment Analysis

When expressing the opinion on products or services, the customers mainly focus
on different features of the particular object. Any object can have different features which the customers may like or dislike; this, in turn, may help the manufacturers to improve the quality of the product by focusing on the features that need further attention. Also, the customers who wish to buy a particular object can analyze the
feature-wise sentiment and get to know about the most significant features. Feature-
wise sentiment analysis considers the overall sentiment of the review as well as the
sentiment on the particular features of the products [15].
The basic steps of feature-level sentiment analysis are data preprocessing, extrac-
tion of features and extraction of opinions, determine the polarity of opinion words
and identify the polarity of opinionated sentence and generation of summary [4].
The data preprocessing step uses techniques such as part-of-speech (POS) tagging, lemmatization, stemming and removal of stop words, which help to remove noise and enable feature extraction from the dataset. Feature mining identifies the product features on which customers express opinions. Opinion word extraction
classifies the text which contains sentiment or opinion. Opinion word polarity iden-
tification decides the sentiment polarity of the opinion word as positive, negative
or neutral; finally, the opinion sentence polarity identification step aggregates the
polarity by sentence-wise and summary generation aggregates the results obtained
from each sentence.
The features describing the products can be categorized into morphological type
features, frequent features, explicit or implicit features [2]. Morphological features
can be classified as semantic, syntactic and lexical structural [2]. Semantic features
are types of contextual information and semantic orientation [2]. Syntactic features
use chunk labels, POS tagging, dependency depth feature and N-gram words [2].
Lexical structural features consist of special symbol occurrences, word distribu-
tions and word-level lexical features which are infrequently used in opinion mining
[2]. Frequent (common) features are the features in which customers show the most interest; Apriori association rule mining or frequent pattern mining is broadly used to identify such frequent features. Most features appear explicitly within reviews [16],
e.g., ‘picture quality is very good’ where the picture is an explicit feature which
can be taken from the review. But some features cannot be directly derived from the
review itself, e.g., ‘it fits in the pocket nicely’ which derives the feature size implicitly.
Appropriate feature extraction techniques used in sentiment analysis play an important role in identifying relevant attributes and opinions, which may lead to increased classification accuracy.

2.3 Feature Extraction Methods

Much research has focused on how to extract features from reviews effectively and efficiently. Feature extraction methods can be categorized as NLP- or heuristic-based, frequency-based, statistical, syntax-based, clustering-based, supervised or unsupervised machine learning methods and hybrid approaches [2, 17]. The
frequency-based feature detection method only considers nouns or noun phrases as
possible features. The syntax-related methods discover features by considering the syntactical relations. NLP-based methods usually identify the noun, adjec-
tives, noun phrases, adverbs which express product features. They have achieved
high accuracy, but low recall by using POS of tagging [2]. The key weakness of
clustering-based feature extraction is that it can only extract main features and it is
difficult to extract unimportant features [2].
Statistical techniques are computationally efficient, but they ignore feature interactions. There are not many supervised learning methods for feature extraction. The power of supervised approaches depends on the features, which are often constructed using other methods [17].
The OPINE system used the unsupervised information extraction approach to
identify product features [18]. Also, an unsupervised feature extraction method by
using a taxonomy is developed with user-defined knowledge [19]. A domain ontology
model to extract features from tabular data from the Web was introduced by Holzinger
et al. [20]. Also, structural correspondence learning (SCL) algorithm [21] and a
pattern-based feature extraction which adapted the log-likelihood ratio test (LRT)
[22] were proposed. Also, [16] proposed a technique for product feature extraction
using association rule mining based on the assumption that people frequently use
similar kind of words. A semi-supervised feature grouping technique was used where
the features are grouped based on the synonyms, words or phrases [23]. The paper
of [24] used different feature extraction or selection techniques, namely single word,
document-level, multiword, Tf-Idf single word, phrase-level and Tf-Idf multiword
sentiment analysis. Also, some approaches like dependency parsing [25] and joint
sentiment topic model using LDA [26] have been used to extract features of opinion.
Hybrid techniques use combinational approaches [2]: POS tagging with the WordNet dictionary, the combination of lexical and syntactic features with a maximum entropy model, and a combination of association rules and point-wise mutual information. A hybrid approach was also used to identify product features by means of a bootstrapping iterative learning strategy with additional linguistic rules for mining less frequently occurring features and opinion words.
On the other hand, many studies have used domain ontologies [27–33]. The ontology-based approach does not require a training dataset for feature extraction. In this approach, an ontology is used to represent knowledge about a domain and shows the details of the product that is rated in an opinion.

2.4 Polarity Determination of Opinion Words

The polarity of an opinion word within a sentence may change according to the domain context. There are no effective methods that can exactly define the written pattern of sentences [29]. To determine the semantic orientation of words, authors have measured the co-occurrence of new words with words from a known seed set of semantically oriented words [30]. However, only word pairs with equal polarity can be determined from their high co-occurrence rates [30].
However, in some literature, the associated opinion polarity is determined using publicly available opinion lexicons such as SentiWordNet, General Inquirer and SenticNet. Also, there was a method for defining the polarity of sentiment words
only from the feature of the textual context by converting the textual context of
sentences as semantic pattern vectors [29]. Those semantic pattern vectors were used
to compare the similarity of two sentences while exploring the polarity of sentiment
words included within the sentences.
The sentiment dictionary named ‘SentiMI’ extracted sentiment terms with POS
information from SentiWordNet. Then, the mutual information for both positive and
negative terms was calculated. The final class label is determined by the related
positive and negative scores [32].
Some opinion words have the same orientation in any context, e.g., 'amazing,' 'excellent,' 'bad,' etc. [33]. But some words are domain-dependent, and it is very
difficult to find the actual polarity for ambiguous words such as ‘unpredictable,’
‘high,’ ‘good’ and ‘long’ [33]. For example, ‘The phone has long battery life’ and
‘This program takes a long time to run,’ the opinion word ‘long’ has both positive and
negative polarities in the first and second sentences, respectively [33]. These kinds of
opinion words change their polarities according to the context. Therefore, it is neces-
sary to acquire the polarity of these kinds of words by comparing with the contextual
information. So considering only opinion words is not sufficient, and it is wise to
consider its associated feature also. Because of that, some literature discussed the
methods of using point-wise mutual information (PMI) for determining the polarity
of opinion words [34]. This method only considers opinion words for classification.
There is a statistical method named PMI-IR that studies the polarity of words by calculating the statistical dependence between two words that are often used together in text, according to the following equation [31] (Eq. 1).

\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)} \qquad (1)
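To make Eq. (1) concrete, the following is a minimal sketch (not the authors' implementation) that estimates PMI from sentence-level co-occurrence counts in Python; the toy corpus and the use of a base-2 logarithm are assumptions for illustration.

import math

def pmi(word_x, word_y, sentences):
    # Point-wise mutual information of two words, estimated from
    # sentence-level co-occurrence counts over a tokenized corpus.
    n = len(sentences)
    count_x = sum(1 for s in sentences if word_x in s)
    count_y = sum(1 for s in sentences if word_y in s)
    count_xy = sum(1 for s in sentences if word_x in s and word_y in s)
    if count_x == 0 or count_y == 0 or count_xy == 0:
        return float("-inf")  # undefined when the pair never co-occurs
    p_x, p_y, p_xy = count_x / n, count_y / n, count_xy / n
    return math.log2(p_xy / (p_x * p_y))

sentences = [["battery", "life", "is", "long"],
             ["long", "battery", "life"],
             ["screen", "is", "bright"]]
print(pmi("battery", "long", sentences))  # positive value: the words co-occur often

Word pairs that frequently appear together, such as a feature and an opinion word, receive a high PMI; this is the intuition exploited later when assigning new features to ontology categories.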

3 Proposed Methodology

The proposed framework consists of five main processes. The first process is to collect the reviews, preprocess them and generate dependencies. The second process is to create the ontology and update it based on new features, whose sentiment expectation is determined using the PMI algorithm, and on opinions coming from the third process. The
third process is to determine the important features and extract the opinion words
associated with those features. The fourth process is to determine the sentiment of
opinions using semantic lexicons. The fifth process is to visualize a feature-based
summary of the reviews. The overall architecture is shown in Fig. 1.

3.1 Ontology Development

Ontologies offer a properly arranged knowledge representation with the advantage of reusability. A domain ontology captures relations between concepts and provides a common vocabulary for a domain. Use of an ontology in the feature extraction process improves the performance of feature identification by providing an arrangement of the features and supporting the mining of features.

Fig. 1 Proposed feature summarization model


Also, it improves the way the feature-based summary is generated. Knowledge of the relevant domain is required to construct the ontology of a specific domain.
The objective of the proposed framework is to provide feature-based sentiment
summarization for a specific domain. The proposed framework is applicable for any
domain by replacing the ontology. Ontologies can be constructed by reusing existing ontologies in the specific domain or by building them from scratch. Since this research focused on the mobile phone domain, the initial ontology was constructed using ConceptNet, a large semantic network consisting of a large number of concepts. The concepts from ConceptNet were extracted only up to level 4, as unrelated concepts started to appear when the level increased further. But as ConceptNet lacks some related concepts, specifications from official mobile phone Web sites were collected to expand the ontology. Also, each node of the ontology was expanded by merging synonyms from the WordNet database.
The ontology consists of four main classes: review, feature, feature property and
sentiment. The review class contains review id and line id subclasses. The feature
class contains terms associated on mobile phone domain. Its subclasses are applica-
tion, battery, camera, display, general, price, services and speed. The final summary
is generated according to these feature sets. These subclasses further contain their
subclasses with more specific aspects. Each feature has an object property of semantic
expectation. The feature property class contains the extracted opinion words from
reviews and has the object property of sentiment. The sentiment class contains only three instance values, 1, −1 and 0, which represent the sentiment polarities positive, negative and neutral, respectively. The initial ontology is depicted in Fig. 2.
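As an illustration of how such an ontology could be encoded, the following is a hedged sketch using the rdflib library; the paper does not name a specific toolkit, and the namespace IRI and the polarityValue property are hypothetical.

from rdflib import Graph, Namespace, RDF, RDFS, Literal
from rdflib.namespace import OWL

MP = Namespace("http://example.org/mobile-phone#")  # hypothetical ontology IRI
g = Graph()
g.bind("mp", MP)

# The four main classes described above.
for cls in ("Review", "Feature", "FeatureProperty", "Sentiment"):
    g.add((MP[cls], RDF.type, OWL.Class))

# Feature subclasses that drive the final summary.
for sub in ("Application", "Battery", "Camera", "Display",
            "General", "Price", "Services", "Speed"):
    g.add((MP[sub], RDF.type, OWL.Class))
    g.add((MP[sub], RDFS.subClassOf, MP["Feature"]))

# Sentiment individuals holding the values 1, -1 and 0.
for name, value in (("Positive", 1), ("Negative", -1), ("Neutral", 0)):
    g.add((MP[name], RDF.type, MP["Sentiment"]))
    g.add((MP[name], MP["polarityValue"], Literal(value)))

print(g.serialize(format="turtle"))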

3.2 Review Acquisition and Preprocessing

This process is responsible for collecting mobile phone-related reviews from Amazon. The collected reviews are stored in the MongoDB database in JSON format. Since the consumer reviews are generated by users, many irrelevant, incorrect and duplicate reviews may be acquired.
Removing irrelevant data, sentence splitting, tokenization, POS tagging and
lemmatization are the major preprocessing techniques handled by this process. The
sentence splitting is the task of splitting the sentences of the reviews using delimiters.
Tokenization is the process to break up the review sentences into words by removing
white spaces and other symbols. POS tagging is the process of assigning a lexical
category to each word such as verb, noun and adjective. Lemmatization is the process
of mapping words into their base form or to a single meaningful item. The Stanford
CoreNLP from Stanford NLP group is used for these tasks (Fig. 3).
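The paper uses Stanford CoreNLP for these preprocessing tasks; the following is a rough equivalent using stanza, the Stanford NLP group's Python package, shown only as an assumed stand-in to illustrate sentence splitting, tokenization, POS tagging and lemmatization.

import stanza

# One-time model download: stanza.download("en")
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,lemma")

review = "The picture quality is awesome. It fits in the pocket nicely!"
doc = nlp(review)

for sentence in doc.sentences:      # sentence splitting
    for word in sentence.words:     # tokenization
        # POS tag and lemma (base form) of each token
        print(word.text, word.upos, word.lemma)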
Fig. 2 Mobile phone ontology

Fig. 3 Review preprocessing process


3.3 Feature and Opinion Extraction

The opinion about the product is expressed through a feature of the product. The features on which the opinions are expressed are extracted during this process. For example, considering the sentence 'This picture quality is awesome,' 'picture quality' is a feature of the product on which the sentiment is expressed. The feature is usually a noun or a noun phrase. The feature and opinion extraction process adopts a rule-based strategy. The rules are derived based on the type of dependency relation and the POS tag pattern. The dependency relation is identified with the Stanford Dependency Parser. Each relation contains three components, the type of relation (R), the parent word (P) and the dependent word (D), denoted as R(P, D). The feature-opinion pairs and new features of each sentence are extracted using the algorithms in Figs. 4 and 5. Ri is the ith dependency relation between the two words Pi and Di of a single sentence. Let X = 'JJ/JJS/JJR/VB/VBG/VBD/VBP/VBN/VBZ/RB/RBR/RBS' and N = 'NN/NNS/NNP'.
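The exact rules of the extraction algorithms are given in Figs. 4 and 5; as a hedged illustration of the general idea, the sketch below uses spaCy (not the Stanford parser used in the paper) with two example rules, an adjectival modifier of a noun (amod) and an adjectival complement linked to a nominal subject (nsubj + acomp). The rules and model name are assumptions, not the authors' full rule set.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def feature_opinion_pairs(text):
    pairs = []
    for token in nlp(text):
        # Rule 1: amod(noun, adjective), e.g. "great camera" -> (camera, great)
        if token.dep_ == "amod" and token.head.pos_ == "NOUN":
            pairs.append((token.head.text, token.text))
        # Rule 2: "The battery is great" -> acomp(great) with nsubj(battery)
        if token.dep_ == "acomp":
            for child in token.head.children:
                if child.dep_ == "nsubj" and child.pos_ == "NOUN":
                    pairs.append((child.text, token.text))
    return pairs

print(feature_opinion_pairs("The picture quality is awesome and it has a great camera."))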
The relevancy of a candidate opinion is checked with SentiWordNet, and the ontology is updated accordingly. The extracted features are matched with the constructed domain ontology to remove all the insignificant and unrelated features. All the features are nouns/noun phrases, but all nouns/noun phrases may not be features relevant to the domain. So, the algorithm in Fig. 6 uses a probabilistic model (Eq. 2) to identify relevant features which are not currently included in the ontology. The semantic expectation of new features can be determined using the algorithm in Fig. 7, and the ontology can then be updated.
The seed positive word list consists of some strong positive words such as ‘good,’
‘nice,’ ‘positive,’ ‘fortunate,’ ‘correct,’ ‘excellent’ and ‘superior,’ and the seed nega-
tive word list consists of some strong negative words such as ‘bad,’ ‘nasty,’ ‘nega-
tive,' 'unfortunate,' 'wrong,' 'poor' and 'inferior.' These lists are used to calculate the sentiment expectation of features (Fig. 7).
The Feature-Score uses a corpus of already collected mobile review sentences to determine the feasibility of a candidate feature as a relevant feature. The Feature-Score value is based on mutual information between a candidate feature and the list of existing features (Eq. 2). If the candidate feature is relevant, then the category of the new feature within the ontology is decided by the PMI value (Eq. 3). The new feature is listed under the feature category which shows the highest PMI value among all the feature categories. Later on, such features are included under the relevant category as an ontology update.
\text{Feature-Score} = \sum_i \log_2 \frac{f(a, b_i) \times N}{f(a)\, f(b_i)} \qquad (2)

where
a          candidate feature
b_i        existing features
f(a, b_i)  frequency of co-occurrence of features a and b_i in each sentence
f(a)       number of sentences in the corpus where a appears
N          number of sentences.

Fig. 4 Feature-opinion generation algorithm

\mathrm{PMI}(a, C_i) = \log_2 \frac{\mathrm{GoogleHitsCount}(a, C_i)}{\mathrm{GoogleHitsCount}(a) \times \mathrm{GoogleHitsCount}(C_i)} \qquad (3)

where a is the candidate feature and C_i is the ith feature category under the ontology.
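The following sketch shows how Eqs. (2) and (3) might be computed; it is an illustration under stated assumptions, not the authors' code. Since Google hit counts cannot be reproduced here, the category-selection step simply takes pre-collected hit counts as input, and the placement of the factor N inside the logarithm follows the reconstruction of Eq. (2) above.

import math

def feature_score(candidate, existing_features, sentences):
    # Eq. (2): mutual-information-style score of a candidate feature
    # against the existing features, over a corpus of tokenized sentences.
    n = len(sentences)
    f_a = sum(1 for s in sentences if candidate in s)
    score = 0.0
    for b in existing_features:
        f_b = sum(1 for s in sentences if b in s)
        f_ab = sum(1 for s in sentences if candidate in s and b in s)
        if f_a and f_b and f_ab:
            score += math.log2((f_ab * n) / (f_a * f_b))
    return score

def best_category(hits_candidate, category_hits):
    # Eq. (3): category_hits maps category -> (hits(candidate, category), hits(category)).
    def pmi(cat):
        joint, hits_c = category_hits[cat]
        return math.log2(joint / (hits_candidate * hits_c))
    return max(category_hits, key=pmi)

sentences = [["battery", "charger", "slow"], ["charger", "broken"], ["screen", "bright"]]
print(feature_score("charger", ["battery", "screen"], sentences))
print(best_category(120, {"battery": (80, 900), "camera": (5, 1200)}))  # -> "battery"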

Fig. 5 New feature extraction algorithm

3.4 Sentiment Determination

After identifying the features and opinions, the subsequent stage is to define the
sentiment of each opinion. The sentiment of unambiguous opinions can be retrieved
through the ontology using the sentiment expectation of the feature. The sentiment
of the new opinions is generated through SentiWordNet. For the opinions with
ambiguous sentiment orientations, the actual sentiment orientation (+1, −1, 0) of a feature-opinion pair can be calculated by multiplying the sentiment expectation of the feature by the sentiment orientation of the opinion, since the sentiment orientation of such an opinion changes according to the feature associated with it.
If any of the opinion words is associated with negation words such as 'not,' 'no,' 'nor' and 'none,' then the sentiment of the feature-opinion pair changes. If a certain feature-opinion pair depends on a negation word, the sentiment is inverted.

Fig. 6 Relevant feature identification

Fig. 7 Sentiment expectation algorithm for features

All the feature-opinion sentiment details of each sentence are recorded in the ontology as individuals. The final summarized sentiment of each feature category can be generated by issuing a SPARQL query to the ontology.
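A sketch of the kind of SPARQL aggregation that could produce this per-category summary is given below, run through rdflib; the ontology file name and the property names mp:hasFeature and mp:hasPolarity are hypothetical, following the earlier ontology sketch rather than the authors' exact schema.

from rdflib import Graph

g = Graph().parse("mobile_phone_ontology.ttl")  # hypothetical serialized ontology

query = """
PREFIX mp: <http://example.org/mobile-phone#>
SELECT ?category ?polarity (COUNT(?pair) AS ?n)
WHERE {
    ?pair mp:hasFeature ?feature ;
          mp:hasPolarity ?polarity .
    ?feature a ?category .
}
GROUP BY ?category ?polarity
"""
for row in g.query(query):
    print(row.category, row.polarity, row.n)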

4 Experiment and Evaluation

Experimental evaluation was carried out on the dataset derived from [35] which was
originally collected from Amazon.com. The dataset contains 69 reviews and 554
sentences in the mobile phone domain. The precision and recall are selected as the
evaluation metrics, which are commonly used in information retrieval and document
classification research. Precision is the proportion of correctly classified items to the total number of items that were classified. Recall is the proportion of correctly classified items to the total number of items classified in the same category in the annotated dataset.
It can be justified that the proposed approach has good recall and precision in
predicting features with both positive and negative opinions. The high recall and
high precision value show the ability to extract all the relevant feature sentiment
pairs. The high F-score value shows the accuracy of the approach (Table 1).
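As a quick arithmetic check, the F-measure is the harmonic mean of precision and recall, and a one-line computation reproduces the F-scores reported in Table 1 from the stated precision and recall values.

def f_measure(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.915, 0.952))  # ~0.933, positive opinions in Table 1
print(f_measure(0.944, 0.845))  # ~0.892, negative opinions in Table 1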

5 Conclusion

Customer reviews are a rich source of information for other consumers and sellers.
Reading all the reviews is time-consuming and not easy to determine which infor-
mation can help to purchase a product. A preferable choice is to evaluate prod-
ucts based on domain-related features. Sentiment analysis is the process of analyzing user-generated content and classifying it as positive, negative or neutral. This paper presents an approach for summarizing reviews by considering the feature and sentiment word pairs extracted from the reviews. This framework allows collecting consumer
reviews from various sources on a specific domain and collectively analyzing the
sentiment of products under different features. The extracted features are compared
with the existing features of the domain ontology. The ontology is updated with new
features, feature sentiment orientation, opinion words and feature-opinion pairs. The
summary is generated for all the available reviews under each feature category.
Further research suggests generating overall sentiment summary for each review
and considering the polarity of opinions instead of sentiment orientation.
Table 1 Evaluation results

                                      Positive   Negative
# of annotated feature sentiments     310        148
# of feature sentiments extracted     295        125
# of correct feature sentiments       270        118
Recall                                95.2%      84.5%
Precision                             91.5%      94.4%
F-measure                             93.3%      89.2%

References

1. Local consumer review survey. The impact of online reviews, BrightLocal. https://www.bright
local.com/learn/local-consumer-review-survey/
2. Asghar, MZ, Khan A, Ahmad S, Kundi FM, Khairnar J et al (2014) A review of feature
extraction in sentiment analysis. Int J Comput Sci Inf Technol (IJCSIT) 5(3):4081–4085
3. Joshi NS, Itkat SA (2014) A survey on feature level sentiment analysis. Int J Comput Sci Inf
Technol (IJCSIT) 5(4):5422–5425
4. Kolkur S, Dantal G, Mahe R (2015) Study of different levels for sentiment analysis. Int J Curr
Eng Technol 5(2)
5. Liu B (2010) Sentiment analysis and subjectivity. In: Indurkhya N, Damerau FJ (eds) Handbook
of natural language processing, 2nd edn.
6. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for
sentiment analysis, vol 37. Association for Computational Linguistics
7. Esuli A, Sebastiani F (2016) SENTIWORDNET: a publicly available lexical resource for
opinion mining. In: Proceedings of the 5th conference on language resources and evaluation
(LREC’06), pp 417–422
8. Liu B, Blasch E, Chen Y, Shen D, Chen G (2013) Scalable sentiment classification for big data
analysis using Naive Bayes classifier. In: IEEE international conference on big data
9. Zhang L, Ghosh R, Dekhil M, Hsu M, Liu B (2015) Combining Lexicon-based and learning-
based methods for twitter sentiment analysis. In: National conference on advanced technologies
in computing and networking, pp 89–91
10. Nandhini A, Vaitheeswaran G, Arockiam L (2015) A hybrid approach for aspect based
sentiment analysis on big data. Int Res J Eng Technol (IRJET) 2:815–819
11. Govindarajan M (2013) Sentiment analysis of movie reviews using hybrid method of Naive
Bayes and genetic algorithm. In: Int J Adv Comput Res 3(4)
12. Pedro P, Filho B, Pardo TAS (2013) A hybrid system for sentiment analysis in twitter messages.
In: Second joint conference on lexical and computational semantics, vol 2: Seventh international
workshop on semantic evaluation (SemEval 2013), pp 568–572
13. Kaushik C, Mishra A (2014) A scalable, lexicon based technique for sentiment analysis. Int J
Found Comput Sci Technol (IJFCST) 4(5)
14. Shinde-Pawar M (2014) Formation of smart sentiment analysis technique for big data. Int J
Innov Res Comput Commun Eng 2:7481–7488
15. Rotovei D (2016) Multi-agent aspect level sentiment analysis in CRM systems. In: 18th
International symposium on symbolic and numeric algorithms for scientific computing
(SYNASC)
16. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceeding of 10th ACM
SIGKDD international conference on knowledge discovery and data mining. ACM, Seattle,
WA, USA, pp 168–177 (2004)
17. Schouten K, Frasincar F (2016) Survey on aspect-level sentiment analysis. IEEE Trans Knowl
Data Eng 28(3):813–830
18. Popescu AM, Nguyenand B, Etzion O (2005) Extracting product features and opinions
from reviews. In: Proceedings of the conference on human language technology and empir-
ical methods in natural language processing. Association for Computational Linguistics,
Vancouver, British Columbia, Canada, pp 339–346
19. Carenini, G, Ng RT, Zwart E (2005) Extracting knowledge from evaluative text. In: Proceedings
of the 3rd international conference on knowledge capture. ACM Banff, Alberta, Canada, pp
11–18
20. Holzinger W, Krupl B, Herzog M (2006) Using ontologies for extracting product features from
web pages. In: Proceedings of the 5th international semantic web conference. Athens, Georgia,
USA, pp 286–299
21. Ben-David S, Blitzer J, Crammer K, Pereira F (2007) Analysis of representations for domain
adaptation. Adv Neural Inform Process Syst 19
Feature-Wise Opinion Summarization of Consumer Reviews Using Domain Ontology 599

22. Ferreira L, Jakob N, Gurevych I (2008) A comparative study of feature extraction algorithms in
customer reviews. In: Proceedings of the IEEE international conference on semantic computing.
Santa Clara, CA, pp 144–151
23. Zhai Z, Liu B, Xu H, Jia P (2011) Clustering product features for opinion mining. In: Proceed-
ings of the fourth ACM international conference on web search and data mining. Hong Kong,
China, pp 347–35
24. The importance of sentiment analysis in social media—Results 2Day, Results 2Day. https://
results2day.com.au/social-media-sentiment-analysis-2
25. Mosha C (2010) Combining dependency parsing with shallow semantic analysis for chinese
opinion-element relation identification. IEEE, pp 299–305
26. Lin C, He Y (2009) Joint sentiment/topic model for sentiment analysis. In: Proceedings of the
18th ACM conference on information and knowledge management (CIKM)
27. Alkadri AM, ElKorany AM (2016) Semantic feature based arabic opinion mining using
ontology. Int J Adv Comput Sci Appl vol. 7(5):577–583
28. Lin S, Han J, Kumar K, Wang J (2018) Generating domain ontology from Chinese customer
reviews to analysis fine-gained product quality risk. In: Proceedings of the 2018 international
conference on computing and data engineering, pp 73–78
29. Wang J, Ren, H. Feature-based customer review mining. https://www.researchgate.net/public
ation/242227984_Feature-based_Customer_Review_Mining
30. Mukherjee S, Joshi S (2013) Sentiment Aggregation using concept net ontology. In:
Proceedings of the sixth international joint conference on natural language processing, pp
570–578
31. Sureka A, Goyal V, Correa D, Mondal A (2010) Generating Domain specific ontology from
common-sense semantic network for target specific sentiment analysis
32. Wang BB, McKay RIB, Abbass HA, Barlow M (2003) A Comparative study for domain
ontology guided feature extraction. In: Proceedings of the 26th Australasian computer science
conference, vol 16, pp 69–78
33. Vicient, C, Sánchez, D, Moreno A (2011) Ontology-based feature extraction. In: Proceedings
of the 2011 IEEE/WIC/ACM international joint conference on web intelligence and intelligent
agent technology—Workshops, WI-IAT
34. Turney PD (2001) Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In:
Proceedings of the twelfth European conference on machine learning. Springer, Berlin, pp
491–502
35. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the ACM
SIGKDD international conference on knowledge discovery and data mining (KDD-2004)
Machine Learning-Based Approach
for Opinion Mining and Sentiment
Polarity Estimation

H. K. S. K. Hettikankanama, Shanmuganathan Vasanthapriyan,


and Kapila T. Rathnayake

Abstract Opinions of others may be essential in making decisions or selecting from a variety of alternatives. Review of customer feedback helps to improve sales
and eventually benefits the company. Most online businesses use recommendation
systems which use data mining and machine learning algorithms to find the right
product for the right customer at the right time to increase customer satisfaction.
This study illustrates how to increase the quality of the product selection process
for customers by reducing information overloading and complexity. The goal of
this study is to propose a novel product ranking model considering user reviews
which enable multiuser recommendation. Dataset was taken through some different
supervised learning methods and the best accurate algorithm was proposed. Values
are predicted considering positivity and negativity of reviews for a particular product
using the proposed algorithm. Products are ranked according to the given value. New
recommendation model and its workflow are illustrated here.

Keywords Machine learning · Opinion mining · Recommendation system · Sentiment analysis

1 Introduction

With the evolution of the Internet and people’s enthusiasm to perform tasks with
few clicks, a thing that got affected is how vendors and customers perform their
buying and selling tasks. The physical stores and businesses were moved to the
web by extending their services and capabilities worldwide, occupying a part of cyberspace rather than being limited to a single physical location, thereby creating e-markets and e-commerce. These systems can promote businesses and organizations online and

H. K. S. K. Hettikankanama (B) · S. Vasanthapriyan


Department of Computing and Information Systems, Faculty of Applied Sciences, Sabaragamuwa
University of Sri Lanka, Balangoda, Sri Lanka
e-mail: mihirini@is.ruh.ac.lk
K. T. Rathnayake
Department of Physical Sciences and Technology, Faculty of Applied Sciences, Sabaragamuwa
University of Sri Lanka, Balangoda, Sri Lanka

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 601
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_44
enable customers to perform their tasks with minimal human intervention [1]. This allowed vendors to expand their business beyond geographical limitations and acquire a huge customer base worldwide more efficiently.
As many physical stores began to move to the web, merely being a part of the web did not fulfill the expectations of businesses. Huge competition emerged between online businesses. Therefore, they strive to achieve competitive advantage to attract more customers and increase their sales. By increasing customer satisfaction, businesses can attract more customers, which shows that e-retailers should provide e-satisfaction to create a good customer base [2].
The number one goal of many leading businesses worldwide is to make sure their customers are satisfied [3]. To satisfy customers, it is vital to provide the service they expect by giving value for their money and facilitating them. It should be made easier for customers to find what they want, with good quality and in less time, and the process should be less exhausting. For that, most online businesses use recommender systems to find the right product for the right customer at the right time [4]. These systems can increase the quality of the decisions that customers make while searching for and selecting a product by reducing the information overload and complexity caused by them [5].
Product recommendation systems can be identified as filtering tools which use
data, pattern recognition and filtering to suggest the most relevant items to a partic-
ular user [6]. Those systems work on specific parameters like ratings or product
properties. These systems face the challenge of giving high-quality and relevant recommendations without errors in reduced time. This will increase the traffic to the website [7]. Recommendation systems use various data mining and machine learning algorithms [7]. As there is huge competition between online businesses, the quality and accuracy of the product recommendation system often decides the winner in online sales nowadays.
It is found that people read reviews given by other customers before purchasing.
The survey done by Myles Anderson shows that 88% consumers trust online reviews
like personal recommendations [8].
This paper proposes a new model for product recommendation by rating the products according to the reviews given by users. In this method, the textual reviews given by previous customers of a particular product are considered, the products are ranked and sorted according to the cumulative score given to each product, and the recommendation is made as the customer searches by keywords. With this method, good quality products will be suggested, since reviews come from many users and provide a practical, wide-ranging opinion, enabling multiuser recommendation in line with the upcoming trend of product recommendation. The architecture of the proposed model, the method followed to conduct the research and the results are discussed in this paper.

2 Related Work

K. Yogeswara Rao, G. S. N. Murthy and S. Adinarayana have done a study titled 'Product recommendation system from user reviews using sentiment analysis' [9]. There, they propose a recommendation system based on sentiment analysis within the frame of matrix factorization. Their model has four steps: data preprocessing, extracting measures, sentiment measures and sentiment analysis. They give scores to each word and calculate sentiment polarity to recommend products.
Alia Hassan and Ahmed Abdulwahhab, in their study titled 'Reviews sentiment analysis for collaborative filtering' [10], present some sentiment classification techniques. The techniques are divided into two parts, machine learning approaches and lexicon-based approaches, and the techniques categorized under these two are shown in Table 1.
Considering recent years, in a publication from 2017, Elmurngi and Gherbi proposed SVM and sentiment analysis to identify reviews. The performance of SVM, decision tree, Naive Bayes and K-neighbors classifiers was compared on a corpus which contains stop words and on a corpus with no stop words. For both, SVM had the best performance, recording accuracies of 81.75 and 81.35% [11]. Ramadhan et al., in their study, performed sentiment analysis using TF-IDF feature extraction and logistic regression on a Twitter dataset, which reported an accuracy of 83% [12].
In studies from 2018, Chakraborty and Das experimented with a TF-IDF model together with SVM on an Amazon product review dataset, which reported an accuracy of 85.86% [13]. Bhavitha, Rodrigues and Chiplunkar, in their comparative study which used many machine learning, sentiment analysis and lexicon-based methods for reviews, reported an accuracy of 74% for sentiment-based methods and 86.40% for SVM-based methods [14]. Vasileios Athanasiou and Manolis Maragoudakis, in their study titled 'A novel, gradient boosting framework for sentiment analysis in languages where NLP resources are not plentiful: a case study for Modern Greek', used a boosting machine learning algorithm for sentiment analysis and found a precision of 86.20% [15].

3 Methodology

This research was conducted according to a specific method, as shown in Fig. 1. First, a questionnaire-based online survey was carried out to collect requirements, and the results of the questionnaire were analyzed to understand the drawbacks of the current system, how much people trust the basis of the current system and how much they would trust the basis of the proposed system, and to gather suggestions to improve the ranking for recommendation. After that, a relevant dataset which has enough reviews on the products was taken.

Table 1 Techniques that can be used in sentiment analysis

Machine learning approaches:
• Probabilistic classifiers (based on hypothesizing a model which comes from the original feature space):
  – Naive Bayes classifier: computes the positive probability of a class based on word distribution
  – Maximum entropy: a conditional exponential classifier; uses the labeled feature as a vector and calculates the occurrence of every vector
• Linear classifiers (train a model to classify objects into classes with a line):
  – Support vector machine (SVM): a supervised classifier which can be used for classification and regression analysis; works well with text data
  – Neural network: based on the functionality of neurons; multilayer NNs can be used for nonlinear boundaries
• Decision tree classifier: uses a division of data; here, division can be done based on the presence or absence of a word in the text
• Rule-based classifier: depends on modeling data with a rule set; here, if a word carries positive emotion, the review is taken as positive, and if words hold negative emotion, the review is negative
• Unsupervised learning: mostly uses clustering; K-means, Gaussian mixture model, hierarchical clustering and hidden Markov are some commonly used clustering methods

Lexicon-based approaches:
• Dictionary-based approach: uses a collected opinion word list with defined orientation; this can be expanded with WordNet synonyms and antonyms
• Corpus-based approach: measures the orientation of sentiment

Then, as the next step, data preprocessing is done to bring the data into an applicable condition. Then, sentiment values for the review text are calculated and the sentiment polarity is estimated. Then, some insights into the dataset and the behavior of some variables in relation to others are measured. The supervised learning methods are applied to the dataset, and the learning algorithm with the highest accuracy is identified. After this, a product ranking method considers the sentiment value of user reviews. The next step is model validation for high accuracy, as this is a vital characteristic of a recommender system. Python is used as the programming language for this research, and Jupyter Microsoft Azure Notebooks is used as the development environment.

Fig. 1 Methodology

3.1 Online Survey

To validate that the outcome of this research gives value to society, and to gather information and perspectives from a wider audience, a survey was carried out in the form of an online questionnaire. The design of the questions is shown in Table 2.

3.2 Gather Relevant Data

Here, the customer product reviews given by users are extracted in their native format. The extracted dataset contains more than 400,000 reviews from Amazon.com on a number of different brands of unlocked mobile phones. The dataset carries basic product information, price, ratings and review vote ratings, and was extracted using the web scraper PromptCloud. The Python programming language is used for analyzing and implementing the model. A dataset which contains 400 thousand user reviews on 4410 products from 385 brands is taken.

Table 2 Construct and question of online questionnaire

Construct: Learn the awareness of online product recommendation systems among online shopping sites
Questions: 1. Have you heard about product recommendation systems? 2. What do you think a product recommendation system should do?

Construct: Identify whether the person is a user of online shopping
Question: 3. Have you done online shopping?

Construct: Identify whether the person has been subjected to receiving poorer quality products than expected in online shopping and whether they care about that
Questions: 4. Have you ever been a victim of receiving poorer products than expected when purchased online? 5. How does it feel for you?

Construct: Get an idea of the person's view of the current mechanism used to recommend products
Question: 6. How much do you trust ratings given for products? (Star ratings)

Construct: Get an idea of how much people consider user-given textual reviews before making an online purchase
Question: 7. Do you prefer asking real users of the product before purchasing online, even after seeing the product has acceptable ratings?

Construct: Identify the needfulness of considering user reviews in product recommendation
Questions: 8. Do you read reviews given by real users of the product before purchasing online? 9. How many reviews do you normally read before making a purchase? 10. How much time would you normally prefer to allocate for reading reviews? 11. What do you think? Analysis of reviews by reading by yourself is,

Construct: Get the user's acceptance and agreement on a review-based ranking system for recommendation
Questions: 12. Do you think it will help if a product recommendation system analyzes real user reviews and recommends products for you? 13. How much will you trust a system which analyzes textual real user reviews and recommends products?

Construct: Get suggestions from people to improve the system
Question: 14. Do you have any suggestions to improve current product recommendation systems?

3.3 Data Preprocessing

Data preprocessing was done as a vital part of this study, as this is the part where the words most relevant to sentiment analysis are extracted from the review text. Here, it was done in six major steps. This preprocessing mechanism improved the prediction accuracy from nearly 40%, as shown in Fig. 2. These steps were carried out using the Python programming language.
HTML Element Removal As the data was scraped directly from the web, there
could be some HTML tags included in the reviews; they will make no sense when it
comes to sentiment analysis.
Special Characters Removal Here, stars, dots, commas and all other special
characters are removed from review text.

Fig. 2 Results comparison before and after data preprocessing

Non-word Entry Removal Here, the numerical entries in reviews are removed,
such as model numbers, price information.

Stop Word Removal Stop words like prepositions are removed from the reviews,
e.g., a, the, on etc.

Null Value Removal This is done in a later part of the preprocessing activities, as the above activities can make some reviews null by removing several words from the review text.

Lemmatization Here, the words used in review texts are reduced to their common root word for analysis purposes, e.g., 'running' and 'ran' come from the root word 'run.'
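The following is a combined sketch of these preprocessing steps using regular expressions and NLTK; the library choice and the exact order of operations are assumptions for illustration, since the paper only names Python and the six steps.

import re
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time downloads: nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(review):
    review = re.sub(r"<[^>]+>", " ", review)        # 1. HTML element removal
    review = re.sub(r"[^A-Za-z\s]", " ", review)    # 2-3. special characters and non-word entries
    tokens = review.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]          # 4. stop word removal
    tokens = [LEMMATIZER.lemmatize(t, pos="v") for t in tokens]  # 6. lemmatization
    return " ".join(tokens)

reviews = ["<p>Running great, battery lasts 2 days!!!</p>", "   12345   "]
cleaned = [preprocess(r) for r in reviews]
cleaned = [c for c in cleaned if c]                 # 5. drop reviews that became empty (null)
print(cleaned)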

3.4 Sentiment Analysis

Review texts are analyzed and their positivity or negativity is identified so that supervised learning methods can be applied to the dataset.

3.5 Insights of Dataset

In this phase of the methodology, analysis of the dataset's behavior and of correlations is done to understand the dataset.

3.6 Applying Supervised Learning Methods on Dataset

Here, the most accurate algorithm to predict scores for products considering sentiment polarity has to be proposed. In this research, several classification algorithms are used and their results were collected. By comparing the accuracy results of each algorithm, the most appropriate algorithm was proposed. In this study, the random forest classifier, decision tree classifier, K-neighbors classifier, AdaBoost classifier and an ensemble algorithm are used.
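As a hedged sketch of this comparison step, the snippet below trains the four classifiers named above on TF-IDF features of a toy review set and compares them with cross-validation using scikit-learn; the feature representation, toy data and cross-validation settings are assumptions for illustration, not the study's actual configuration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

texts = ["battery great", "love the camera", "excellent screen quality",
         "screen cracked fast", "slow awful phone", "terrible battery drain"]
labels = [1, 1, 1, 0, 0, 0]          # 1 = positive review, 0 = negative (toy data)

X = TfidfVectorizer().fit_transform(texts)

classifiers = {
    "Random forest": RandomForestClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "K-neighbors": KNeighborsClassifier(n_neighbors=3),
    "AdaBoost": AdaBoostClassifier(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, labels, cv=2)   # small cv for the toy data
    print(name, scores.mean())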

3.7 Proposing Best Accurate Algorithm

By applying all the above-mentioned algorithms to the dataset, their performance measurement results are recorded. Here, cross-validation accuracy, F1 score, precision, recall, root mean square error and mean absolute error are used as performance measurements. As these measurements depict the accuracy of the algorithms, the algorithm with the best results is proposed as the most accurate algorithm to predict results. The above-mentioned performance measurements are recorded, and the one with the highest values for cross-validation accuracy, F1 score, precision and recall, and the lowest values for mean absolute error and root mean square error, is taken as the most accurate algorithm to perform this prediction.

3.8 Predicting Values Using Proposed Algorithm

The algorithm which was proposed is trained and sentiment polarity of each review
is predicted using the proposed algorithm.

3.9 Ranking Products for Recommendation

An average value is given to each product by considering the sentiment polarity of each review of that product.
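A minimal sketch of this ranking step with pandas is shown below; the column names and toy data are hypothetical, and the score is simply the mean predicted polarity (1 = positive, 0 = negative) over all reviews of a product.

import pandas as pd

reviews = pd.DataFrame({
    "product":   ["Phone A", "Phone A", "Phone B", "Phone B", "Phone B"],
    "sentiment": [1, 1, 1, 0, 0],    # predicted polarity per review
})

ranking = (reviews.groupby("product")["sentiment"]
                  .mean()                       # average review polarity per product
                  .sort_values(ascending=False)
                  .rename("score"))
print(ranking)   # Phone A 1.00, Phone B 0.33 -> Phone A is recommended first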

4 Results and Discussion

4.1 Results of Online Survey

The questionnaire was completed by 164 participants, all of whom are students of foreign and local universities following computer science-related courses. Results show that 69.5% of respondents have heard of product recommendation systems, while 93.9% have done online shopping. 89.9% of respondents had been victims of receiving poorer products than expected while shopping online. The questionnaire shows that 68.9% do not trust the star ratings given for products, as an answer to the 6th question. 98.8% stated that they refer to real user comments prior to buying online even when the product has high star ratings. The majority (75.6%) stated that they read 5 to 10 reviews prior to purchase, 78% stated that reading and analyzing reviews by themselves is hard and could be misleading, and 92.1% accepted that it would be helpful if a ranking system analyzed user reviews and recommended products. 85.9% stated that they would trust products ranked considering real user comments.

4.2 Results of Applying Supervised Learning Method and Proposing Algorithm

In this study, as described earlier, the random forest classifier, decision tree classifier, K-neighbors classifier and AdaBoost classifier are used. Their results were recorded as shown in Table 3. According to the results, the K-neighbors classifier is proposed as the most appropriate algorithm, as it has the highest precision, recall, cross-validation accuracy and F1-measure and also the least mean error. The results comparison is shown graphically in Fig. 3.

Table 3 Performance measurement results of algorithms

Algorithm name             Cross-validation accuracy   F1-score   Mean absolute error   Precision   Recall   Root mean square error
Random forest classifier   86                          46         13.95                 43          50       37.35
Decision tree classifier   86                          50         13.63                 79          52       36.91
K-neighbors classifier     87                          54         12.83                 92          54       35.81
AdaBoost classifier        86                          46         13.95                 43          50       37.35

Fig. 3 Comparison of performance measurements of algorithms

An ensemble approach, which usually helps to increase accuracy, is also used to predict the sentiment values. The comparison of its results with those of K-neighbors, as shown in Fig. 4, confirms that K-neighbors still has the highest accuracy, and that algorithm was proposed as the best algorithm to predict the sentiment polarity of reviews.
For the evaluation of the model, evaluation mechanisms which consider different types of measurements, such as accuracy and coverage, are used. Accuracy measures the number of correct recommendations divided by all possible recommendations. Coverage is the number of objects considered divided by the number of objects in the search space. Accuracy
Fig. 4 Comparison of ensemble and K-neighbors algorithms



measurements are categorized into two parts, statistical and decision support accuracy metrics. Statistical accuracy metrics evaluate by comparing predicted recommendations with actual user ratings. Correlation, mean absolute error (MAE) and root mean square error (RMSE) are the statistical accuracy metrics used here.
\mathrm{MAE} = \frac{1}{N} \sum_{u,i} |P_{u,i} - R_{u,i}| \qquad (1)

Here, P_{u,i} is the predicted rating for user u on item i, R_{u,i} is the actual rating, and N is the total number of ratings in the item set. The lower the MAE, the better, and the same holds for the RMSE.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{u,i} (p_{u,i} - r_{u,i})^2} \qquad (2)

Precision-recall curve (PRC), reversal rate, receiver operating characteristics (ROC) and weighted errors are some of the most commonly used decision support accuracy metrics. Precision, recall and F-measure are calculated as follows.

\mathrm{Precision} = \frac{\text{Correctly recommended items}}{\text{Total recommended items}} \qquad (3)

\mathrm{Recall} = \frac{\text{Correctly recommended items}}{\text{Total useful recommended items}} \qquad (4)

The F-measure is used to combine precision and recall into a single metric.

F\text{-measure} = \frac{2PR}{P + R} \qquad (5)
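The following sketch computes Eqs. (1)-(5) directly from predicted/actual ratings and recommendation lists; the toy inputs are assumptions for illustration.

import math

def mae(predicted, actual):                      # Eq. (1)
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):                     # Eq. (2)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

def precision_recall_f(recommended, relevant):   # Eqs. (3)-(5)
    correct = len(set(recommended) & set(relevant))
    precision = correct / len(recommended)
    recall = correct / len(relevant)
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f

print(mae([4, 3, 5], [5, 3, 4]))                              # 0.67
print(rmse([4, 3, 5], [5, 3, 4]))                             # 0.82
print(precision_recall_f(["A", "B", "C"], ["A", "C", "D"]))   # (0.67, 0.67, 0.67)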

Coverage is measured by considering the number of users that the recommender system helps relative to the number of users of the product. According to the results, the K-neighbors algorithm is used to predict sentiment values, and products were ranked according to those scores. The product ranking of the final system is shown in Fig. 5.

Fig. 5 Product ranking according to review polarity score


Figure 5 shows the final results of the system, which ranks products for recommendation by considering all textual reviews given by customers for a particular product, analyzing the reviews using the KNN classifier, identifying the positivity or negativity of each review, and finally calculating the product score over all reviews.

5 Conclusion and Recommendation

In this research, the following objectives were defined at the initial phase, and at the conclusion it can be emphasized that all three objectives of the research are fulfilled.
Objective one, which was to identify different types of methods used for product recommendation, was fulfilled through the literature review. The models being used, the accuracy given by those models and the purposes for which those models are used are identified in the related work section.
Objective two, which was to propose a more effective algorithm for estimating the sentiment polarity of a review text, was facilitated here, as this study compares several algorithms and proposes the most accurate one considering the performance measurement results of all algorithms. Here, the K-neighbors algorithm was proposed as the algorithm with the best accuracy, fulfilling the second objective of the research work.
Objective three, which was to facilitate customers by providing recommendations based on textual reviews given by product users, was fulfilled here, as product ranking was done considering review sentiment polarity and products are presented to the user according to the highest average product score. Thus, the third objective was also fulfilled.
As limitations of this research, the following can be highlighted.
• Achieving 100% accuracy by automating text analysis is not possible, as humans also feel conflicted when trying to understand natural language statements
• Limited to the English language
• Deep learning was barred due to limited resources.
As future research directions, the following are recommended.
• Train the model to identify Sinhala and Singlish (or any other language) for use in local online businesses. To achieve this, the same methodology can be used.
• Combine the review sentiment-based recommendation system with already used recommendation algorithms.
• Consider more variables which affect product credibility along with review sentiment intensity to create a better recommendation algorithm. Such variables could be the age of the review and the credibility of the user who gave the review.

Acknowledgements The authors of this study thank all the lecturers of the Faculty of Applied Sciences who helped in making this work successful, and are much thankful to Sabaragamuwa University of Sri Lanka for encouraging the research.

References

1. Mahmood SMF (2016) E-commerce, online shopping and customer satisfaction: an empirical
study on e-commerce system in Dhaka
2. Szymanski D, Hise R (2000) E-satisfaction: an initial examination. J Retail 76(3):309–322
3. Fieldboom.com (n.d.) Customer satisfaction survey. https://www.fieldboom.com/customer-sat
isfaction-surveys
4. Wang J, Zhang Y (2013) Opportunity model For E-commerce recommendation. In: Proceedings
of the 36th international ACM SIGIR conference on research and development in information
retrieval—SIGIR’13, pp 303–312
5. Spiekermann S (2001) Online information search with electronicagents: drivers, impediments,
and privacy issues. Unpublished doctoral dissertation. Humboldt University, Berlin
6. MacKenzie I, Meyer C, Noble S (2013) How retailers can keep up with consumers. McKinsey
& Company. https://www.mckinsey.com/industries/retail/our-insights/how-retailers-can-keep-
up-with-consumers
7. Vaidya N, Khachane AR (2017) Recommender systems-the need of the ecommerce ERA. In:
2017 International conference on computing methodologies and communication (ICCMC),
Erode, 2017, pp 100–104. https://doi.org/10.1109/ICCMC.2017.8282616
8. Anderson M (2014) 88% of Consumers trust online reviews as much as personal recommenda-
tions. Search Engine Land. https://searchengineland.com/88-consumers-trust-online-reviews-
much-personal-recommendations-195803
9. Rao KY, Murthy GSN, Adinarayana S (2017) Product recommendation system from users
reviews using sentiment analysis
10. Abdul Hassan A, Abdulwahhab A (2017) Reviews sentiment analysis for collaborative
recommender system. Kurdistan J Appl Res 2(3):87–91 (2017)
11. Elmurngi E, Gherbi A (2017) An empirical study on detecting fake reviews using machine
learning techniques. 107–114. https://doi.org/10.1109/INTECH.2017.8102442.
12. Ramadhan WP, Novianty STMTA, Setianingsih STMTC (2017)Sentiment analysis using multi-
nomial logistic regression. In: 2017 International conference on control, electronics, renewable
energy and communications (ICCREC), Yogyakarta, 2017, pp 46–49. https://doi.org/10.1109/
ICCEREC.2017.8226700
13. Das B, Chakraborty S (2018) An improved text sentiment classification model using TFIDF
and next word negation. Cornell University Library, Computation and Language
14. Bhavitha BK, Rodrigues AP, Chiplunkar NN (2017) Comparative study of machine learning
techniques in sentimental analysis. In: 2017 International conference on inventive communi-
cation and computational technologies (ICICCT). IEEE, pp 216–221
15. Athanasiou V, Maragoudakis M (2017) A novel, gradient boosting framework for sentiment
analysis in languages where NLP resources are not plentiful: a case study for Modern Greek.
Algorithms 10(1):34
Early Detection of Diabetes by Iris Image
Analysis

P. H. A. H. K. Yashodhara and D. D. M. Ranasinghe

Abstract Diabetes has become a global problem due to changing lifestyles, daily
eating habits, level of stress encountered by people, etc. According to statistics of
the World Health Organization (WHO) in 2016, 8.5% of the adult population of the
world is suffering from diabetes. Therefore, early detection of diabetes has become a
global challenge. The iris of the human eye depicts a picture of the health condition
of the bearer. Iridology is a method conceived decades ago that focuses on the study
of iris patterns such as texture, structure and color for diagnosis of various diseases.
By analyzing images of the human iris, a medical imaging method was explored with computer vision for the identification of diabetes. The iris analysis of the human eye is conducted based on the pancreas, kidney and spleen of the human body, and the local dataset was collected using a Digital Single Lens Reflex (DSLR) camera. A low-cost diabetes detection system was created, focusing on localization, segmentation and normalization, and the system predicts the severity of diabetes with 85% accuracy.

Keywords Convolutional neural network · Diabetic · Feature extraction · Irido-diagnosis · Iris · Localization · Region of interest · Segmentation

1 Introduction

Due to the changes in the present lifestyle, eating habits, stress levels and lack of exercise, the entire humankind is facing various health issues. In particular, diabetes
is spreading at an alarming rate among people according to statistics of the World
Health Organization (WHO). The global report on diabetes and 2018 statistics of
WHO [1] reported that 8.5% of the adult population is suffering from diabetes and the

percentages are on the rise. Sri Lanka is not an exemption from the global picture, and
according to reports of International Diabetes Federation (IDF) 2015, the prevalence
of diabetes among adults in Sri Lanka is also 8.5% and 1 in 12 adults suffer from
diabetes. According to WHO 2016 report, 7% of total deaths in all age categories
in Sri Lanka are due to diabetes and related complications. In addition, long-term
diabetes leads to damages in blood vessels in the heart, brain, legs, eyes, kidneys,
nerves system, etc.
Diabetes can be identified using various medical tests such as blood pressure tests,
fasting blood sugar tests, random blood sugar tests and oral glucose tolerance test.
[2]. However, with all these methods, diabetes is found accurately only when it reaches a certain level of maturity, by which time it would be hard to cure. In addition, diabetes is
the leading cause of blindness in humans in various ways such as cataract, glaucoma
and damage of blood vessels inside the eye. At that stage, a simple method of curing
is not possible; hence, a lifelong tedious, costly and life-risking medical process has
to be followed. Therefore, early detection of diabetes is one of the most essential
health requirements at present. This research introduces an alternative method of
pre-identification of diabetes by analyzing the human eye iris.
The iris is a part of the expansion of the nervous system and brain and consists
of thousands of nerve endings, blood vessels, connective tissues and nerve impulses
[3]. The nerve fibers in the iris of the eye receive impulses from the rest of the body
via the spinal cord, optic nerve and optic thalami. It is identified that iris of the human
eye represents the health conditions of various body organs; hence, by analyzing the
human eye iris, the health condition of various body organs can be detected easily.
Iridology [4] is an ancient medicine technique, which provides an alternative to
conventional medical diagnosis methods. Iridology makes use of color, the structure
of tissues, shape, patterns, pigments and several features of human eye iris to predict
abnormalities of body organs of an individual. It aims to detect a "pre-disease state" through the diagnosis of pigment abnormalities in the iris. The location of the abnor-
mality in the iris is related to the medical condition of the relevant body organ. Hence,
by analyzing these abnormalities of pigmentations, various health conditions can be
identified.
Iridology motivates healthy behavior and caution for the prevention of diseases throughout all stages of life, and paves the way for a non-invasive, automated, accurate
and preventive healthcare model. In this research, computer vision and artificial
intelligence were linked with iridology to predict diabetes at an early stage by examining the diseased iris for tissue changes, spots and pigmentations in the pancreas, kidney and
spleen. An automated and non-invasive system based on image acquisition of the
iris, image pre-processing, localization, segmentation, normalization, the region of
interest extraction according to iridology chart and feature extraction was developed.
A convolutional neural network was used for classification. Section 2 of this paper
contains the literature survey on similar research, Sect. 3 is on methodology, Sect. 4
contains results and discussion and Sect. 5 is on conclusion and future work.

1.1 Background

This section explains the background of the related work regarding iris recognition
and diagnosing of diabetes.
Eyes have long been known as the "windows of the soul," but very few people are cognizant of how accurate this observation is. People usually know that diabetes can cause temporary blurred vision, that it may lead to severe, permanent vision loss, and that it increases the risk of developing glaucoma and cataracts. Scientifically, it is identified that patients with diabetes have a variety of symptoms, including diabetic retinopathy, cataracts, diabetic macular edema (DME) and glaucoma. It is a lesser-known fact that diabetes can also be identified at early stages by observing the eyes.
Iridology is an alternative medical technology that examines colors, patterns
and other features of the iris of the eye that can be used to determine
the patient’s systemic health condition. Iridologists compare their observations
with iridology charts. These iridology charts contain different zones that correspond
to specific parts of the human body. Jensen [5], who drew the iridology chart given
in Fig. 1 describes that any specific changes in body tissues due to various reasons
are indicated by corresponding nerve fibers in the iris. This states that the features
of iris vary considerably according to the physical changes of the body organs.

Fig. 1 Iris chart for the right iris and left iris in eyes

The iris can be defined as an extension of the brain which consists of microscopic
blood vessels, connective tissues and thousands of nerve endings that are attached to
each tissue of the body through the nervous system and the brain.
The nerve fibers receive their impulses from the optic nerve, spinal cord and optic
thalami. Therefore, what is revealed in the iris is a reflex condition of the body by
showing up as lesions or marks and color variations in the iris. This is why the eyes
have been called “the windows of the soul.”

1.2 Problem, Aim and Objectives

The research problem addressed in this research is how diabetes can be identi-
fied at an early stage. The aim of the research is to devise a mechanism to use
non-invasive, automated and accurate alternative medicine technique to early detect
diabetes. Objectives of the research are to learn the concepts, methods and techniques
of the alternative medicine technique iridology, study the changes in the features of
the iris with respect to diabetes, design and implement an algorithm for iris recogni-
tion and develop a system, evaluate the developed system against benchmark dataset
and apply the evaluated system for local dataset to predict diabetes.

2 Related Work

In this research, related literature is being reviewed in order to find out the work
carried out in iris analysis with iridology. This section summarizes various works
done in this direction along with tools used with the organ related.
The pancreas has been used as the organ for diagnosing diabetes in Ref. [6]. The research aimed to evaluate iridology as a diagnosis method for diabetes. A database of 200 subjects was used in this research. Thereafter, localization, segmentation, rubber-sheet normalization and ROI extraction were done as pre-image processing. Statistical, textural and 2D-DWT features were extracted from both eyes. For classification, six classification algorithms were assessed. A maximum accuracy of 89.66% was achieved by the random forest (RF) classifier.
The gallbladder has been used as the organ to diagnose type II diabetes in Ref. [7]. The research aimed to apply iris image analysis for clinical diagnosis in an efficient manner and determine the health status of organs. After iris image acquisition, noise removal and enhancement were done before analysis. The iris is obtained by subtracting the pupil from the sclera. Normalization was carried out to convert the circular part to a rectangular shape. The ROI was identified by visual inspection as per the iridology chart. Various features were extracted from both irises. For classification, a support vector machine (SVM) was used.
In Kale et al. [8], various methods and techniques that have already been implemented were put together for the detection of diabetes. It can be seen from the survey that

diabetic retinopathy detection is more accurate in iridology applications. The survey has found that an artificial neural network (ANN) classifier provides greater accuracy in classification than other classifiers.
The main purpose of the research given in Ref. [9] is to apply image processing using MATLAB code to the images and find out whether the patient is diabetic or not. In this research, image acquisition was done using a 5 MP iridoscope. Pre-processing was done to remove reflections and background noise and to normalize the intensity of individual pixels. Thereafter, segmentation was carried out to partition the image into its objects. The pancreas was considered as the organ; hence, the positions of this body organ were used according to the iridology chart. Various features such as gray level and discrete cosine transform (DCT) have been extracted. This research was able to detect the presence of diabetes with an accuracy of 83%.

3 Methodology

Initially, a dataset of iris images of 100 persons (50 diabetic and 50 non-diabetic) was acquired using a Canon Digital Single Lens Reflex (DSLR) camera with a 50 mm macrolens. Color images of size 5184 × 3456 of the irises of both eyes were captured simultaneously. Thereafter, pre-image processing, post-image processing and various machine learning methods and techniques were applied to the system. After noise removal, localization and normalization were carried out, and then each iris was converted into a 2D array. Several regions of interest (ROIs) were extracted from the left eye and the right eye as per the iridology chart of Dr. Bernard Jensen. ROIs were extracted corresponding to the positions of the pancreas, kidney and spleen organs in both irises. For classification, a convolutional neural network was used. The system was tested using the local dataset as well as the UBIRIS [10] standard dataset.

3.1 Top-Level Design

The system is divided into several stages. The process described in the previous
section is captured into the top-level diagram given in Fig. 2.

3.2 Data Collection

The eye images were captured with the help of a Canon EOS 700D DSLR camera with a 50 mm macrolens and stored in the database, which contained normal as well as abnormal iris samples (Figs. 3 and 4).

Fig. 2 Top-level design of the system

Fig. 3 Canon DSLR camera



Fig. 4 50 mm macrolens

3.3 Pre-processing Stage

Pre-processing refers to transforming the eye image in such a way that the desired features can be extracted. It can be divided into three steps: noise reduction, iris localization and iris normalization.

3.3.1 Noise Removing

The dataset contained noise such as salt-and-pepper noise, Poisson noise, Gaussian noise and various reflections. Removal of noise is essential to analyze an image with a more refined dataset than the raw dataset. Hence, noise removal was done using a Wiener filter.
The Wiener filter in the Fourier domain is

$G(x, y) = \dfrac{D^{*}(x, y)\, R_s(x, y)}{|D(x, y)|^{2} R_s(x, y) + R_n(x, y)}$    (1)

where
D(x, y) is the degradation function,
D*(x, y) is the complex conjugate of the degradation function,
R_n is the power spectral density of the noise, and
R_s is the power spectral density of the un-degraded image.

The term R_n/R_s is the reciprocal of the signal-to-noise ratio (Figs. 5 and 6).
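A minimal denoising sketch is given below. It assumes that scipy's spatial Wiener filter is an acceptable stand-in for the Fourier-domain formulation in Eq. (1); the file name and window size are illustrative placeholders, not values from the study.

import cv2
import numpy as np
from scipy.signal import wiener

def denoise_iris(path: str, window: int = 5) -> np.ndarray:
    """Load an eye image in grayscale and suppress noise with a Wiener filter."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float64)
    filtered = wiener(img, mysize=window)          # local-statistics Wiener filter
    return np.clip(filtered, 0, 255).astype(np.uint8)

clean = denoise_iris("eye.jpg")                    # "eye.jpg" is a placeholder path
cv2.imwrite("eye_denoised.jpg", clean)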

Fig. 5 Original eye image

Fig. 6 Wiener filtered image

3.3.2 Iris Localization and Segmentation

The first step in iris segmentation is finding the inner and outer boundaries of the iris. Over the past few decades, various algorithms were proposed by researchers for segmentation of the iris from the eye. Daugman presented an integro-differential operator for iris segmentation [11]. The operator searches for the circular boundary to distinguish the iris and separate it from the sclera and pupil. Wildes [11] proposed a segmentation algorithm to find the outer center and radius of the iris region using Hough transform theory.
In the method proposed by Wildes, the magnitude of the image intensity gradient is thresholded to obtain the edge map of the image:

$|\nabla G(x, y) * I(x, y)|, \quad \nabla \equiv \left( \dfrac{\partial}{\partial x}, \dfrac{\partial}{\partial y} \right)$

Fig. 7 Hough transform for circle detection of the iris, separating the iris from the sclera

$G(x, y) = \dfrac{1}{2\pi\sigma^{2}} \exp\left(-\dfrac{(x - x_0)^{2} + (y - y_0)^{2}}{2\sigma^{2}}\right)$
G(x, y) is a Gaussian smoothing function with scaling parameter σ selecting
the appropriate scale of iris edge analysis. The Hough transform is used for the iris
contour to maximize the voting process of the edge map. The maximum point in the
Hough space corresponds to the radius r and center coordinates $x_c$ and $y_c$ of the
circle. This can be defined according to the equation:

$x_c^{2} + y_c^{2} - r_c^{2} = 0$.    (2)

The parabolic Hough transform of Wildes’ [12] was used to detect the eyelids and
eyelashes by approximating the upper and lower eyelids with parabolic arcs (Fig. 7).
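For illustration, a sketch of iris localization using OpenCV's circular Hough transform is shown below. The parameter values (blur size, dp, minDist, radii and the Canny/accumulator thresholds) are assumptions chosen for demonstration, not the tuned values used in this study.

import cv2
import numpy as np

def localize_iris(gray: np.ndarray):
    """Return (x_c, y_c, r) of the strongest circular boundary, or None."""
    blurred = cv2.GaussianBlur(gray, (9, 9), 2)    # Gaussian smoothing before edge voting
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=200,
                               param1=100, param2=40, minRadius=80, maxRadius=200)
    if circles is None:
        return None
    x, y, r = np.round(circles[0, 0]).astype(int)  # strongest candidate circle
    return x, y, r

Eyelid boundaries can then be approximated separately, for example with a parabolic fit along the detected upper and lower eyelid edges.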

3.3.3 Iris Normalization

Iris normalization was done to convert the iris eye image into fixed dimensions
(Fig. 8).

Normalized_image = x/mean(image) − std(image), (3)

where x is every pixel of the image. For both the standard University of Beira Interior (UBIRIS) benchmark dataset and the local dataset, the iris detection system with features extracted on normalized iris images has much better accuracy than the one with features extracted on iris images without normalization. Figure 9 shows the comparison of equal error rate (EER) for feature extraction with and without normalization. Dataset A consists of the free UBIRIS dataset and dataset B consists of the local dataset.

Fig. 8 Sample scheme of normalization

Fig. 9 EER for feature extraction with and without normalization

3.4 Post-processing Stage

In this stage, processes like a selection of organs, ROI extraction, feature extraction
and classification tasks were carried out.

3.4.1 Selection of Organs

When people develop diabetic conditions, the systems of the body change, and some organs are affected more than others. Since all these

organs are directly connected to the iris, any changes on these organs are reflected
in the iris. Therefore, this research explores the analysis of the iris eye image with
respect to kidney, pancreas and spleen to early detect diabetes.

Pancreas
The pancreas was selected as a diabetic organ because it is responsible for producing
the hormone called insulin, which regulates the level of glucose in the blood. People suffering from diabetes experience a build-up of glucose in their blood. If the pancreas is not functioning properly due to insulin deficiency, the level of glucose in the blood rises, which requires medication. When the pancreas is affected, the nerves
bring that signal to iris and fibers re-structure according to that metabolism and
symptoms are shown via the iris.

Kidney
Kidney was selected as a diabetic organ because high blood glucose damages the
blood vessels in the kidney. When the blood vessels are injured, they do not function
well; hence, abnormality of a kidney can be detected through the analysis of iris.

Spleen
The spleen is a fist-sized organ of the lymphatic system that behaves similarly to the pancreas. It normally operates as a filter for blood. Researchers have found that spleen enlargement can be associated with diabetes; hence, the spleen was also selected as a diabetic organ.

3.4.2 Region of Interest (ROI) Extraction

ROI extraction was done to crop the particular portions of regions in the pancreas,
spleen and kidney from normalized iris eye image according to the iris chart as shown
in Fig. 10.

3.4.3 Feature Extraction

There are many features such as tissue changes, pigmentations and orange color dots,
which can be considered for diagnosing diabetes, but only some of them give more
accurate results compared to the other features. Color moments and texture features
were known to give a better prediction of diabetics. Iris contains information on rich
texture and breaking tissues of the iris is concomitant with changes in texture features
directly.

Fig. 10 Iridology chart for a left eye and right eye

3.4.4 Classification

For classification, a convolutional neural network has been used. The CNN algorithm automatically learns feature extraction; hence, the time and effort needed for implementation can be minimized. The CNN algorithm works well on image data and is flexible.
The CNN basically has three types of layers: an input layer, an output layer and hidden layers (Conv2D, MaxPool2D, AvgPool2D, batch normalization, reshape, concatenate, dense, DepthwiseConv2D and ReLU).

Algorithm 1 Pseudocode for the convolutional layer

1. for i from 1 to m do
2.   for j from 1 to n do
3.     for l from 1 to L0 do
4.       for c from 1 to L0 do
5.         tmp = 0
6.         for ii from 1 to k do
7.           for jj from 1 to k do
8.             tmp = tmp + K[ii][jj] × X[j][s × (l−1) + ii][s × (c−1) + jj]
9.           end for
10.         end for
11.         Y[i][l][c] = Y[i][l][c] + tmp
12.         if j == n
13.           Y[i][l][c] = f(Y[i][l][c]) + bias
14.         end if
15.       end for
16.     end for
17.   end for
18. end for

Custom layers have been prepared as given below.



Fig. 11 Code snippet for CNN

In the convolution block, there is a 3 × 3 kernel convolution without activation, with l2 regularization, followed by batch normalization and ReLU activation. In the eff block, there is a depth-wise 3 × 3 kernel convolution without activation. The network part consists of several layers: input image (output: 1 layer), conv_block (output: 8 layers), average pooling (output: 8 layers), eff_block (output: 16 layers), max pooling (16 layers), eff_block (32 layers), max pooling (32 layers), eff_block (64 layers) and average pooling (64 layers).
The output of the first part is a vector of 64 elements (1 × 64). The left and right images are fed in to obtain two (1 × 64) vectors, which are combined to create a 1 × 128 vector. The second part takes this 1 × 128 vector as input and produces a single output value (1 × 1). If that value is less than 0.5, the person does not have diabetes; otherwise, the person has diabetes (Fig. 11). A minimal sketch of this architecture is given below.
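The following Keras sketch illustrates the architecture described above under stated assumptions: 128 × 128 grayscale ROI inputs, the filter counts 8/16/32/64 from the text, and a global average pooling step to obtain the 1 × 64 branch vector; kernel sizes, strides and the regularization strength are illustrative choices rather than the study's exact settings.

from tensorflow.keras import layers, regularizers, Model

def conv_block(x, filters):
    # 3x3 convolution without activation, l2 regularization, then BN + ReLU
    x = layers.Conv2D(filters, 3, padding="same",
                      kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def eff_block(x, filters):
    # depth-wise 3x3 convolution followed by a 1x1 projection, then BN + ReLU
    x = layers.DepthwiseConv2D(3, padding="same")(x)
    x = layers.Conv2D(filters, 1, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def branch(inp):
    x = conv_block(inp, 8)
    x = layers.AveragePooling2D()(x)
    x = eff_block(x, 16)
    x = layers.MaxPooling2D()(x)
    x = eff_block(x, 32)
    x = layers.MaxPooling2D()(x)
    x = eff_block(x, 64)
    x = layers.AveragePooling2D()(x)
    return layers.GlobalAveragePooling2D()(x)      # 1 x 64 branch vector

left = layers.Input((128, 128, 1))
right = layers.Input((128, 128, 1))
merged = layers.Concatenate()([branch(left), branch(right)])   # 1 x 128
out = layers.Dense(1, activation="sigmoid")(merged)            # > 0.5 means diabetic
model = Model([left, right], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])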

4 Results and Discussion

This section presents the results of this research followed by a discussion on it.
Discussion is based on information gathered through a questionnaire with human
beings. Various charts have been drawn to explain the results.

4.1 Diabetes Prediction

A database of iris images of healthy people and people with diabetes was created. Iris images of 100 subjects, 50 diabetic and 50 non-diabetic, have been used. Written consent of each individual was obtained for this research. Cases of type I, type II and

Fig. 12 Number of subjects with various diseases which occur because of diabetes (kidney disease, problems with eyes, heart attack, foot sores, high cholesterol)

gestational diabetes are considered for the study. The subjects vary from one year of
diabetes to 23 years of diabetes (Fig. 12).
The above information was collected by distributing a questionnaire among diabetic and healthy persons in several areas of Sri Lanka.

4.2 Implementation

This work has been implemented using OpenCV, Keras and TensorFlow. For data acquisition, a Canon DSLR camera with a 50 mm macrolens was used. The noise of the collected iris images has been removed using a Wiener filter. After that, the Hough transform is applied for localization and segmentation to isolate the iris from the pupil and the sclera. To transform the iris into a fixed size, normalization was done. After that, ROI extraction was done to identify the organs required to detect diabetes. For all these phases, OpenCV has been used. The classification has been done using a convolutional neural network implemented in TensorFlow. The user interface was created using Visual Studio 2015.

4.3 Comparison of Madhumeha Detection Model with Existing Techniques

Table 1 demonstrates the comparison of the proposed iris recognition-based model for predicting diabetes with existing models. The accuracy obtained with a small sample size confirms the effectiveness of the system.
Existing systems used only one body organ, but the implemented system uses three body organs to detect diabetes, in order to identify which organ gives better results and how to improve the accuracy of the system. The pancreas gives 80% accuracy, the spleen gives 60% accuracy and the kidney gives 78% accuracy in the early identification of diabetes. The overall maximum accuracy of the proposed model for predicting diabetes is 85%, which is obtained by combining the three diabetic organs.

Table 1 Comparison of accuracy of madhumeha diabetes prediction model with existing techniques

No. | Disease | Classifier | Number of samples | Accuracy (%)
1 | Detecting broken tissues in pancreas [13] | Visual inspection | 34 | 94.0
2 | Nerve system | SVM | 44 | 86.4
3 | Pancreas disorder (Lesmana et al. 2011) | Neighborhood-based modified back propagation using adaptive learning parameters | 50 | 83.3
4 | Madhumeha | CNN | 100 | 85

Table 2 Number of samples used to measure the accuracy

Dataset | Training | Testing | Verification
Local data | 30 | 10 | 10
UBIRIS | 30 | 10 | 10

The proposed model does not work properly when the patient is wearing lenses in the eyes, as this causes confusion when training the data. Another important observation of this study is that a person whose diabetes is controlled with medicine, proper diet and exercise has also been identified as healthy.
Data training and testing were done using the free UBIRIS dataset and the local dataset. Accuracy was measured as k-folds of the dataset (Table 2).

4.4 Result

This application gives a maximum accuracy of 85%, which is an encouraging improvement over existing models (Fig. 13).

4.5 Limitations

Obtaining noise-free images under various lighting conditions using the Canon DSLR camera was not a trivial task, and many images were captured with reflections as given in Fig. 14.
Implementing an algorithm for iris localization and normalization was hard due to the time consumed in writing the code. In circle detection, the circle was detected including the eyelashes, and a very small amount of the sclera zone was captured, as given in Fig. 15. Inaccurate results of limbus detection because of the low contrast

Fig. 13 Simulation of result

Fig. 14 Reflection of iris

Fig. 15 Inaccurate results of iris detection

of the limbus and the presence of eyelids and eyelashes are illustrated in Figs. 14 and 15.
Therefore, capturing of noise-free iris images is the crucial task in any kind of
analysis or predictions.

5 Conclusion and Future Work

The implemented model is a non-invasive and non-contact model which is compact and portable. Using the developed web application, prediction of diabetes can be done within seconds with 85% accuracy, given the condition of possessing an iris scanner, iriscope or substitute camera to obtain iris images. The system created in this
study can be used for the benefit of humankind to assure their health status and
prevent them from many diseases by an early stage identification of diabetes. Since
the system is portable and easy to use, this system can be implemented in many free
medical clinics conducted by the government.
The work carried out in this research has a great potential to extend for a large
scale. The points mentioned below can be considered to extend this work as future
work.
• Improve accuracy by considering more efficient features and classifiers, and perform measurements to identify the most efficient and effective way to detect diabetes.
• Use efficient techniques to remove various noises so it can further improve system
performance.
• This research predicts a human being as healthy or having diabetes. System proficiencies can be further enriched to predict the level and stage (Type I, Type II and gestational) of diabetes.

References

1. Diabetes (2016) Retrieved from https://www.who.int/news-room/fact-sheets/detail/diabetes


2. Diabetes diagnosis: tests used to detect diabetes (0AD). Retrieved from https://www.webmd.
com/diabetes/guide/diagnosis-diabetes#1
3. Diagnosis of diabetes using computer methods: soft computing methods for diabetes detection
using iris (2017). World Acad Sci Eng Technol Int J Biomed Biol Eng 11(2), 63–68. Retrieved
from https://publications.waset.org/10006736/pdf
4. Samanth P, Agarwal R (2017) Diagnosis of diabetes using computer methods: soft computing
methods for diabetes detection using iris. Int J Biomed Biol Eng 11(2) Iridology (n.d.). Retrieved
from https://sciencebasedmedicine.org/iridology/8
5. Jensen B (n.d.) Updated iridology desk chart w/nutrition chart on Back11 × 17 laminated.
Retrieved from https://www.bernardjensen.com/UPDATED-IRIDOLOGY-DESK-CHART-
wNutrition-Chart-on-Back11-x-17-Laminated_p_117.html
6. Risk V, Hearing-Impaired A, Poultry S, Town T, Boards M, Blogs W, Center N (2019) Early
signs and symptoms of diabetes. WebMD. https://www.webmd.com/diabetes/guide/understan
ding-diabetes-symptoms#1. Accessed 15 May 2018
7. More SB, Prof. Pergad ND (2015) On a methodology for detecting diabetic presence from iris
image analysis. Int J Adv Res Electr Electron Instrum Eng 04(06), 5234–5238. https://doi.org/
10.15662/ijareeie.2015.0406044
8. Kale AP, Gumaste PP, Mane VM (2017) Diabetes prediction using iris—A survey. Int J Electron
Electr Comput Syst 6(11), 304–309. Retrieved from https://iridology-research.com/wp-con
tent/uploads/2018/07/f201711261511710321.pdf

9. B. Ragavendrasamy, Mithun BS, Sneha R, Vinay Raj K, Hiremath B (2017). Iris diag-
nosis—A quantitative non-invasive tool for diabetes detection. Ragavendrasamy B (Special).
Retrieved from https://www.researchgate.net/publication/326201202_Iris_Diagnosis-A_Quan
titative_Non-Invasive_Tool_for_Diabetes_Detection
10. UBIRIS (2018). Retrieved from https://iris.di.ubi.pt/
11. Wildes R (1997) Iris recognition: an emerging biometric technology. Proc IEEE 85(9):1348–
1363. https://doi.org/10.1109/5.628669
12. Hough Circle Transform (n.d.) Retrieved from https://opencv-python-tutroals.readthedocs.io/
en/latest/py_tutorials/py_imgproc/py_houghcircles/py_houghcircles.html
13. Wibawa AD, Purnomo MH (2006) Early detection on the condition of pancreas organ as the
cause of diabetes mellitus by real time iris image processing. In: APCCAS 2006 - 2006 IEEE
Asia Pacific Conference on Circuits and Systems. https://doi.org/10.1109/apccas.2006.342258
A Novel Palmprint Cancelable Scheme
Based on Orthogonal IOM

Xiyu Wang, Hengjian Li, and Baohua Zhao

Abstract To extract more palmprint features and achieve better recognition results,
an Orthogonal Index of Maximum and Minimum (OIOMM) revocable palmprint
recognition method is proposed in this paper. Firstly, the competitive code features
of region of interest (ROI) are extracted. Then, the statistical histogram of palmprint
competition code features is obtained by partitioning the features. The Gaussian
random projection (GRP)-based IOM mapping is used to generate a GRP matrix.
Orthogonal GRP matrix is obtained by Schmidt orthogonalization. OIOMM hash
converts real-valued biological eigenvectors into discrete index hash codes. Finally,
the palmprint image is matched with Jaccard distance. The experiment is carried out
in the palmprint database of Hong Kong Polytechnic University. When the random
projection size is 200 and the revocable palmprint feature length is 500, the equal
error rate is 0.90. This shows that the algorithm not only improves security but also
maintains the classification effect.

Keywords Palmprint recognition · Orthogonal Index of Maximum and


Minimum · Competitive code · Revocable palmprint template

1 Introduction

In the information society, people have more and more demand for identity authenti-
cation [1]. Palmprint recognition has become one of the most promising methods in
the field of biometric recognition because of its stability, reliability and easy acqui-
sition [2]. Palmprint recognition plays an important role in public security, access
control, forensic identification, banking and finance [3]. With the increasing use of
biometric recognition, the necessity of securing the biometric data is also increasing.
However, unlike revocable and redistributable credit cards or passwords, each unique
biometric template cannot be revoked. Once biometric data is stolen, it cannot be


replaced. Therefore, the palmprint recognition process should be implemented on


the premise of biometric template encryption and privacy protection.
For these security and privacy issues, there have been studies showing that
biometric systems have weaknesses for potential attacks [4]. Traditional palm-
print recognition methods such as PalmCode [5] and its subsequent improved algo-
rithms improve the recognition rate, but they cannot resist attacks. Therefore, while
improving the recognition rate, it is necessary to enhance the ability to resist attacks.
The most direct solution is to apply data encryption algorithms (such as DES or
AES) to biometric templates. However, this simple method of adding encryption and
decryption links is inefficient and unsafe. To overcome these problems, the method
of revocable template protection no longer stores the original biometric template, but
stores or transfers the encrypted and converted template, which is called revocable
biometric template. The revocable palmprint biometric protection scheme [6] can be
revoked and redistributed when the generated template is lost or stolen. Tang et al. [7] proposed an irreversible two-dimensional polar grid mapping method to realize revocable template protection for fingerprints. For palmprint
recognition, Leng et al. improved Palm Hash method, which combines palmprint
and palm vein features to realize 2D Palm Hash conjugate recognition [8]. However,
there are still some aspects to be improved, including the ability of template redis-
tribution and the improvement of recognition efficiency. The security of Palm Hash
method is based on the premise that the key is not lost. If the password is shared
or cracked, the recognition performance of the system will decline to the recogni-
tion level of the original template itself. The improved Palm Hash method proposed
by Lumini and Nanni [9] improves the recognition performance by increasing the
projection space, but the computational complexity is also greatly increased. In order
to increase security, Zhe et al. proposed an IOM method to obtain revocable palmprint feature templates based on a random Gaussian hash transform [10]. In this method, the design of the Gaussian random projection matrix is very important; its structure affects the content of the palmprint features extracted. Because the random Gaussian matrix is non-orthogonal, the extracted palmprint features are highly correlated and there is much redundant information. An OIOM scheme was proposed [11]. This scheme mainly carries out Schmidt orthogonalization of the random Gaussian projection matrix in IOM, which can extract more palmprint feature information and improve the recognition rate.
In order to overcome the shortcomings of the above techniques, this paper mainly describes an OIOMM-based revocable palmprint recognition method using competitive code features. Firstly, the ROI of the original palmprint image [12] is filtered by a Gabor filter to extract the palmprint competitive code features [13, 14]. Then, the statistical histogram of the palmprint competitive code features is obtained by partitioning the competitive code map into blocks [15]. Multiple Gaussian random projection matrices are generated, and Schmidt orthogonalization [16] is used to obtain orthogonal Gaussian random projection matrices. These matrices multiply the original palmprint feature, and the position indices corresponding to the maximum and minimum values are found [17]. Finally, a revocable palmprint feature template is generated. The palmprint to be tested is input and its features are extracted through the

same steps. Then it is matched with the revocable palmprint template by Jaccard distance [18]. If the Gaussian projection matrix is correlated, the extracted palmprint features are also correlated, leaving a lot of redundant information and an inadequate expression of the palmprint. Therefore, Schmidt orthogonalization of the Gaussian random projection matrix is used. Furthermore, the location information of both the maximum and the minimum values of the orthogonal random projection of the palmprint features is used, which not only extracts more sufficient palmprint information but also improves the security of palmprint recognition. The transformation of the palmprint recognition template is irreversible, ensuring the security and privacy of the template. If the revocable palmprint template is canceled, a new template can be regenerated from the same palmprint database.
The article is organized as follows: the competitive code is discussed in Sect. 2, the palmprint competitive code feature based on OIOMM in Sect. 3, experimental analysis and discussion in Sect. 4, and security analysis in Sect. 5; Sect. 6 provides the conclusion.

2 Palmprint Competitive Code Feature Extraction

2.1 Competitive Code

The original palm print image is preprocessed and the features are extracted. The
extracted ROI is binarized to obtain a binary image of the palm print. The visual
cortex cells are simulated through the Gabor transform, and illumination changes and image contrast are used to improve feature recognition in palmprint analysis. The filtered image is obtained from the Gabor filter bank and is given as

$G_R(u, v) = \left(4u^{2} - 2\right) \exp\left(-\left(u^{2} + v^{2}\right)\right)$    (1)

where u is the abscissa and v is the ordinate; the coordinate (u, v) is obtained as

$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1/\alpha & 0 \\ 0 & 1/\beta \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix}$    (2)

where $(x_0, y_0)$ is the center of the filter, θ is the rotation angle that can locally orient the filter along the palm line, and α and β adapt the filter to the line orientation.
After that, the filtered image is normalized and the palmprint competitive code
features are obtained. The formula is as follows:
  
$\mathrm{CompCode}(x, y) = \arg\min_{j} \left\{ I(x, y) * G_R\left(x, y; \theta_j\right) \right\}$    (3)

where I(x, y) is the point with abscissa x and ordinate y in the palmprint image; * represents the convolution operation; $\theta_j = j\pi/J$, $j = \{0, \ldots, J - 1\}$, where J represents the number of filter orientations applied to the palmprint image, J = 8; and $G_R$ represents the Gabor filter.
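An illustrative competitive-code extraction is sketched below. It uses OpenCV's standard Gabor kernels as a stand-in for the filter of Eq. (1), keeping the per-pixel arg-min over J = 8 orientations as in Eq. (3); the kernel size and Gabor parameters are illustrative tuning assumptions.

import cv2
import numpy as np

def competitive_code(roi: np.ndarray, J: int = 8) -> np.ndarray:
    """Return an H x W map of orientation indices in [0, J-1]."""
    roi = roi.astype(np.float32)
    responses = []
    for j in range(J):
        theta = j * np.pi / J
        kern = cv2.getGaborKernel((17, 17), sigma=4.0, theta=theta,
                                  lambd=8.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(roi, cv2.CV_32F, kern))
    # winner-take-all over orientations: index of the minimum filter response
    return np.argmin(np.stack(responses, axis=0), axis=0).astype(np.uint8)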

2.2 Palmprint Feature Competitive Code Blocks

The obtained palmprint competitive code features are mapped, and each palmprint competitive code feature map is divided into p × p sub-blocks. The statistical histogram h_i of palmprint competitive codes is calculated for each block. Because there are eight possible competitive codes in total, the dimension of h_i is 8. The histograms h_i of all blocks are concatenated to form a large histogram h, which is used as the eigenvector. The dictionary A is defined as the series of all eigenvectors over the whole gallery set.
The formula is as follows:
 
$A = \left[ v_{1,1}, v_{1,2}, \ldots, v_{k,n_k} \right] \in \mathbb{R}^{m \times n}$    (4)

where k is the number of classes in the library set; nk is the number of samples of
class k; m is the dimension of features; and n is the number of samples of each class.
Figure 1 is a process diagram for extracting the feature vectors of palmprint competitive codes; a minimal block-histogram sketch is given below.
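The sketch below illustrates the block-wise histogram feature: the competitive-code map is split into p × p sub-blocks and an 8-bin histogram per block is concatenated into one feature vector h. The block count p = 8 is an assumption for demonstration, not necessarily the value used by the authors.

import numpy as np

def block_histogram(code_map: np.ndarray, p: int = 8, bins: int = 8) -> np.ndarray:
    h, w = code_map.shape
    bh, bw = h // p, w // p                     # sub-block size
    feats = []
    for r in range(p):
        for c in range(p):
            block = code_map[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            hist, _ = np.histogram(block, bins=bins, range=(0, bins))
            feats.append(hist)
    return np.concatenate(feats).astype(np.float64)   # length p * p * 8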
Compcode mapping is highly discriminative, but its major issue is sensitivity: even a small registration error between the probe and training images will affect performance. However, global statistics such as histograms are robust against this sensitivity. Combining these two models provides more advantages in the feature extraction process. This paper uses the block statistics of the compcode as the feature. However, this feature alone does not have high security, so GRP-based OIOMM mapping of the palmprint feature vectors is used to obtain revocable palmprint feature templates and enhance security.

Fig. 1 Palmprint feature vector extraction



3 Generation of Palmprint Revocable Feature Template Based on OIOMM

3.1 Orthogonal Rule

Schmidt orthogonalization is used to orthogonalize the Gaussian random projection matrix. Schmidt orthogonalization is a method for finding an orthogonal basis of a Euclidean space. The specific process of Schmidt orthogonalization is shown below.
Let us take a group of three vectors as an example. Three linearly independent vectors are selected from the matrix as the vector group α1, α2, α3. We construct an orthogonal vector group β1, β2, β3 that is equivalent to α1, α2, α3. The specific solution process is as follows:

$\beta_1 = \alpha_1, \quad \beta_2 = \alpha_2 - \dfrac{\langle \alpha_2, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1,$
$\beta_3 = \alpha_3 - \dfrac{\langle \alpha_3, \beta_1 \rangle}{\langle \beta_1, \beta_1 \rangle}\beta_1 - \dfrac{\langle \alpha_3, \beta_2 \rangle}{\langle \beta_2, \beta_2 \rangle}\beta_2$    (5)

After the above transformation, the orthogonalized vector group β1, β2, β3 is obtained; α1, α2, α3 and β1, β2, β3 are equivalent. By applying Schmidt orthogonalization to the whole matrix, the orthogonal matrix can be obtained.
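A numpy sketch of this row-wise Gram-Schmidt step is shown below; the matrix size in the example is arbitrary, and no normalization of the basis vectors is performed, matching Eq. (5).

import numpy as np

def gram_schmidt(W: np.ndarray) -> np.ndarray:
    """Orthogonalize the rows of W (shape q x d) in order, as in Eq. (5)."""
    basis = []
    for alpha in W:
        beta = alpha.astype(np.float64).copy()
        for b in basis:
            beta -= (alpha @ b) / (b @ b) * b      # subtract projection onto b
        basis.append(beta)
    return np.vstack(basis)

# Example: orthogonalize a q x d Gaussian random projection matrix
rng = np.random.default_rng(0)
W_orth = gram_schmidt(rng.standard_normal((16, 256)))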

3.2 Jaccard Distance

After the revocable palmprint template is obtained, the palmprint image is recognized. The palmprint image to be tested is input, and the Jaccard distance is used to match and classify it against the obtained revocable palmprint template. Jaccard uses the proportion of differing elements in two palmprint samples to measure the similarity between the two samples and finally obtains the classification result.
Jaccard distance is used to describe the similarity between palmprint sets. The
larger the Jaccard distance, the lower the sample similarity. Given two palmprint sets
A and B, A represents the palmprint set to be tested and B represents the revocable
palmprint template set. The ratio of the size of the intersection of A and B to the
size of the union of A and B is defined as the Jaccard coefficient, which is represented as

$J(A, B) = \dfrac{|A \cap B|}{|A \cup B|} = \dfrac{|A \cap B|}{|A| + |B| - |A \cap B|}$    (6)

where J(A, B) is defined as 1 if the palm print sets are empty. The Jaccard distance
is obtained based on the Jaccard coefficient and relative index and it is given as

$d_J(A, B) = 1 - J(A, B) = \dfrac{|A \cup B| - |A \cap B|}{|A \cup B|} = \dfrac{|A \triangle B|}{|A \cup B|}$    (7)

where $|A \triangle B| = |A \cup B| - |A \cap B|$ and $J(A, B) \in [0, 1]$.


The Jaccard distance is calculated and compared with a preset decision threshold. When the distance is smaller than the threshold, the two samples are judged to belong to the same class, and when it is larger than the threshold, they are judged to be dissimilar. This decision threshold can be adjusted: when the decision threshold is too low, it will increase the false acceptance rate (FAR), and when it is too high, it will increase the false rejection rate (FRR). Based on the specific conditions, an optimal decision threshold is chosen to obtain low FAR and FRR values; the intersection of FAR and FRR determines this threshold value.
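For illustration, a minimal Jaccard-distance matcher over hashed index codes is sketched below, following Eqs. (6) and (7); treating the codes as (position, index) pairs and the 0.5 threshold are assumptions made for demonstration, not values from the paper.

def jaccard_distance(a: set, b: set) -> float:
    if not a and not b:
        return 0.0                     # distance 0 when both sets are empty (J defined as 1)
    inter = len(a & b)
    union = len(a | b)
    return 1.0 - inter / union

def same_palm(codes_a, codes_b, threshold: float = 0.5) -> bool:
    """Compare two hash-code sequences as sets of (position, index) pairs."""
    a, b = set(enumerate(codes_a)), set(enumerate(codes_b))
    return jaccard_distance(a, b) < threshold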

3.3 The Main Process of Algorithm

The OIOMM hashing is a means of cancelable biometrics [19]. The purely discrete index representation of OIOMM hash codes has the following advantages. OIOMM hashing makes biometric information more invisible, since the biometric information is represented only as position (index) information. OIOMM hashing is essentially a ranking-based hashing method, so it is independent of feature magnitude; this makes the hash codes robust to noise and to changes that do not affect the implicit ordering. The magnitude independence of the OIOMM hash makes the generated hash code scale invariant. An orthogonal matrix is introduced into the GRP-based IOM by orthogonalizing the Gaussian projection matrix to obtain an orthogonal Gaussian projection matrix. Moreover, both the maximum and the minimum location information of the palmprint features are extracted during the process. By improving the original IOM in these ways simultaneously, more palmprint features can be extracted to improve the palmprint recognition rate.
Figure 2 depicts the process flow of the proposed algorithm. Firstly, the palmprint competitive code feature is obtained by filtering with a Gabor filter. The competitive code features of the palmprint are divided into blocks and the palmprint feature histograms are extracted, respectively. The generated Gaussian random

Fig. 2 Main process of the algorithm



projection matrices are Schmidt orthogonalized to obtain the orthogonal Gaussian random projection matrices. These multiply the original palmprint features, and the locations of the maximum and minimum values are recorded. Finally, the revocable palmprint eigenvector template based on OIOMM is obtained.
The revocable palmprint feature template obtained by the OIOMM transformation improves the security of palmprint recognition and ensures the classification effect of the palmprint because of its revocability and irreversibility.
The GRP-based OIOMM hashing algorithm can be compressed into three steps,
as follows:

Algorithm 1 Orthogonal Index of Maximum and Minimum Hash

Input: Feature vector $A \in \mathbb{R}^{d}$, number of Gaussian random matrices m, and number of Gaussian random projection vectors q
Output: Hashed code $t_{GRP} = \{ t_i \in [1, q] \mid i = 1, \ldots, m \}$
Step 1: Given the palmprint eigenvector $A \in \mathbb{R}^{d}$, generate q Gaussian random projection vectors m times, $w_j^{i} \in \mathbb{R}^{d}$, $i = 1, \ldots, m$, $j = 1, \ldots, q$, drawn from $N(0, I_d)$, so that the Gaussian random projection matrices $W^{i} = \left[ w_1^{i}, \ldots, w_q^{i} \right]$ can be formed
Step 2: Orthogonal Gaussian random projection matrices are obtained by Schmidt orthogonalization of the generated Gaussian random projection matrices; they are then multiplied with the original palmprint features
Step 3: The m indices of the maximum and minimum values calculated from the orthogonal Gaussian random projection matrices and the eigenvector A are recorded as $t_i$; therefore, the OIOMM hash codes based on Gaussian random projection are denoted as $t_{GRP} = \{ t_i \in [1, q] \mid i = 1, \ldots, m \}$

GRP embeds the palmprint feature vector into a q-dimensional Gaussian random subspace and uses the indices of the maximum and minimum projected features. The process is repeated with m independent Gaussian random matrices to generate a set of m OIOMM indices.
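A compact numpy sketch of this GRP-based OIOMM hashing is given below; the values of m, q and the seed (playing the role of the user-specific token) are placeholders, and np.linalg.qr is used as a numerically convenient equivalent of Schmidt orthogonalization.

import numpy as np

def oiomm_hash(feature: np.ndarray, m: int = 200, q: int = 16, seed: int = 42):
    """Return m (argmax, argmin) index pairs for one palmprint feature vector."""
    d = feature.shape[0]
    rng = np.random.default_rng(seed)              # seed acts as the user token
    codes = []
    for _ in range(m):
        W = rng.standard_normal((q, d))            # q Gaussian projection vectors
        Q = np.linalg.qr(W.T)[0].T                 # orthogonalized projections
        proj = Q @ feature                         # q projection values
        codes.append((int(np.argmax(proj)), int(np.argmin(proj))))
    return codes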

4 Experimental Results

In the OIOMM method proposed in this paper, the Gaussian random projection matrix is Schmidt orthogonalized. Because the resulting orthogonal Gaussian random projection matrix is uncorrelated, and both the maximum and minimum position information of the palmprint feature is extracted, more palmprint features can be captured and redundant palmprint feature information can be avoided. Therefore, this method can improve security while better maintaining the classification effect.
The database used for the experimentation was obtained from Hong Kong Polytechnic University [20]; it includes 600 grayscale images from 100 unique palms. An average interval of two months was maintained between the two samples collected from each person. About 10 palmprint images of 384 × 284 pixels at 5 DPI are collected at

Fig. 3 a Original palmprint image, b palmprint competitive code feature

Table 1 Equal error rate of IOM, OIOM and OIOMM


N dimension 200 210 220 230 240 250 260 270 280 290 300
N projection 500 510 520 530 540 550 560 570 580 590 600
IOM EER 1.54 1.41 1.43 1.28 1.35 1.29 1.32 1.27 1.28 1.45 1.39
OIOM EER 1.37 0.93 1.28 1.03 1.08 1.08 1.10 1.13 1.15 1.34 1.26
OIOMM EER 0.90 0.95 1.03 1.00 0.95 0.96 0.96 1.14 0.97 1.10 1.08

each session. For experimentation purposes, each image is clipped to obtain the desired
ROI with a size of 128 × 128. Competitive code features are obtained by filtering
the image using filter bank and the ROI was obtained. As shown in Fig. 3, (a) depicts
the ROI of the original palmprint image and (b) is the feature image of the palmprint
competitive code.
Table 1 provides a comparison of equal error rate (EER) for IOM, OIOM and
OIOMM models. The first line in the table represents palmprint features of different
lengths and the second line represents different sizes of random projections.
Based on the first set of data in Table 1, the ROC curves of IOM, OIOM and OIOMM are plotted in Fig. 4, and the intra-class and inter-class matching score distribution of OIOMM is plotted in Fig. 5.
It can be seen from Table 1 that when the random projection size is 200 and the revocable palmprint feature length is 500, the EER of IOM is 1.54, the EER of OIOM is 1.37 and the EER of OIOMM is 0.90. According to the data in Table 1, the recognition rate of the proposed OIOMM scheme is higher than that of OIOM and IOM. Compared with OIOM, it extracts not only the maximum position information of the palmprint feature but also the minimum position information. Compared with IOM, Schmidt orthogonalization of the GRP matrix can extract more uncorrelated palmprint information and allows the palmprint features to be expressed better.
From the ROC curves in Fig. 4, it can be seen that when FAR is 1 × 10−1, the genuine acceptance rate (GAR) of OIOMM is 97.11%, that of OIOM is 96.02% and that of IOM is 94.16%. When FAR is 1 × 10, the GAR of OIOMM is 99.26%, that of OIOM is 98.81% and that of IOM is 97.75%. This shows that extracting the maximum and minimum position information of the palmprint features obtains better recognition results.


Fig. 4 ROC curves of IOM, OIOM and OIOMM


Fig. 5 Intra-class and inter-class matching fractional distribution of OIOMM



The intra-class and inter-class matching score distribution of OIOMM, obtained based on the first set of OIOMM data, is depicted in Fig. 5. It can be concluded from the graph that the true matching score of OIOMM is around 0.90 and the false matching score is around 0.70. Although there are a few overlapping areas in the intra-class and inter-class distributions, true matches and false matches can still be well separated. From these data, it is observed that the proposed model differentiates true matches and false matches more efficiently than the other models.
The above results show that this method improves security while maintaining the classification effect. The conversion of the palmprint recognition template is irreversible, which ensures the security and privacy of the template. If the revocable palmprint template is canceled, a new template can be regenerated from the same palmprint features.

5 Security Analysis

The revocable palmprint recognition system achieves the standard of template protec-
tion scheme while maintaining the recognition accuracy. In this part, the irre-
versibility, non-linkability, revocability and diversity of the algorithm are analyzed.

5.1 Irreversibility

Irreversibility refers to the computational hardness of recovering the palmprint vector from the GRP-based OIOMM hash codes. Consider that the hash codes and tokens are successfully retrieved and the hash codes are known to the opponent. In the GRP-based OIOMM model, the values are discrete indices, so there is no clue for the opponent to guess the feature vector from the compromised hash code. Also, it is very difficult to retrieve the palmprint vector from the tokens, since there is no relationship between the tokens and the vectors.

5.2 Non-linkability

In this section, the pseudo-genuine score is introduced to verify the non-linkability requirement. The pseudo-genuine score describes the matching value between the hash codes for different palmprints of the same individual using different projection matrices. Almost 3600 pseudo-matching scores were generated. In the proposed work, the pseudo-imposter distribution overlaps the genuine distribution, so the hash codes generated for a user are ambiguous. If the distributions were far apart, the opponent could easily determine that the codes were generated by the same person. The pseudo-imposter and pseudo-genuine distributions are plotted in Fig. 6.

Fig. 6 Pseudo-genuine and pseudo-imposter distributions for non-linkability analysis of OIOMM

It is observed from the plot that the proposed OIOMM satisfies the non-linkability condition.

5.3 Revocability

This section validates the requirement of revocability. Figure 7 depicts the genuine, pseudo-imposter and imposter distributions. It is observed that there is a great degree of overlap between the imposter and pseudo-imposter distributions. This means that a hash code newly generated with a given random projection matrix is different, even if it is generated from the same palmprint vector source. This shows that the revocability requirement is satisfied by the proposed approach. If tokens (such as random matrices) are stolen, precision performance will not be significantly reduced. Therefore, tokens in the OIOMM hash are only used for revocability and need not be kept secret from the public.

5.4 Diversity

For palmprint protection mechanisms, to avoid cross-matching, the derived palmprint templates should not match across different applications. Experiments show
that it is possible to generate multiple palmprint templates for a single subject.

Fig. 7 Genuine, imposter and pseudo-imposter distributions for revocability analysis of OIOMM

These palmprint templates can still be significantly different from the original palmprint templates. This means that individuals can register different templates for the same subject in different physical applications without cross-matching. Therefore, the experiments verify the diversity of the revocable palmprint template.

6 Conclusion

In this paper, a revocable palmprint competitive code feature recognition method based on OIOMM is used to obtain a revocable palmprint feature template to improve the security of palmprint recognition. Firstly, the ROI of the palmprint image is obtained. Then the feature vectors of the palmprint competitive codes are extracted. After mapping, a revocable palmprint feature template based on OIOMM is obtained. Finally, palmprint recognition is carried out based on the Jaccard distance. This method can not only maintain the original recognition rate but also improve security. The conversion of the palmprint recognition template is irreversible, which ensures the security and privacy of the template. If the revocable palmprint template is canceled, a new template can be regenerated from the same palmprint feature. The OIOMM-based revocable palmprint feature template can extract more effective palmprint features and improve the recognition rate, because it extracts the maximum and minimum position information of the palmprint features and the orthogonal GRP matrix is strictly uncorrelated. It provides more meaningful applications for palmprint recognition.

Shape-Adaptive RBF Neural Network
for Model-Based Nonlinear Controlling
Applications

Kudabadu J. C. Kumara and M. H. M. R. S. Dilhani

Abstract Radial basis function neural networks (RBF-NNs) are simple in structure and popular among other NNs. RBF-NNs are capable of fast learning, proving their applicability in developing deep learning applications. In its basic form, with center states (means), standard deviations, and weight adaptation only, the network offers limited variability and is complex to tune when embedded in a model. Dynamic systems are nonlinear, their behavior is often uncertain and unpredictable, and complete mathematical modeling or model-based control has limited applicability for stability and accurate control. The shape-adaptive RBF-NN presented in this paper is theoretically proved for stability using Lyapunov analysis. Autonomous surface vessel control, selected for the numerical simulation, consists of a mathematical model developed using marine hydrodynamics for a prototype vessel and a classical proportional-derivative (PD) controller. The results indicate that shape-adaptive RBF-NN blended control is more accurate and has a fast learning ability for intelligent transportation vessel development.

Keywords Autonomous surface vessel · Nonlinear control · Radial basis function NN

1 Introduction

Radial basis function [1] neural networks (RBF-NNs) are a special type of feedforward neural network [2]. As shown in Fig. 1, the three layers of the RBF-NN are the input layer, the hidden layer, and the output layer. The hidden layer activation function is a radial basis function, and ψ_i represents the activation of the ith hidden neuron as defined by (1).
K. J. C. Kumara (B)
Department of Mechanical and Manufacturing Engineering, Faculty of Engineering, University of
Ruhuna, Galle, Sri Lanka
e-mail: kumara@mme.ruh.ac.lk
M. H. M. R. S. Dilhani
Department of Interdisciplinary Studies, Faculty of Engineering, University of Ruhuna, Galle, Sri
Lanka



Fig. 1 Topology of the RBF-NN (multi-inputs single-output) showing three layers: input, hidden,
and output

RBF-NNs have universal approximation properties [3], and their structural simplicity compared to other types of NNs makes them compact and fast to learn and train [4]; programming an embedded computer therefore becomes easier using rule extraction methods [5]. Common applications reported in the literature include classification [6], pattern recognition [7], regression, and time series analysis [8] in general. Many related works are also found in the biomedical [9], chemical [10], material and mechanical engineering [11], robotics [12], and financial [13] sectors. In controller design for nonlinear dynamics, RBF-NNs have also seen wide use [14], and such controllers have proved to have fast convergence compared to classical control, especially for model-based control [10]. When developing a controller for autonomous systems in a dynamic environment that always contains perturbations and unforeseen, uncertain variations, classical control is not robust enough when varying disturbances are present [15]. Even NNs with fixed shapes or with previously tuned weights show deviations in real-time control. Furthermore, the various activation function optimization algorithms found in the literature rely on offline training with big data sets [16], need high computing power, and their practical implementation also demands costly hardware.
$$\psi_i(z) = \exp\left[-\sum_{j=1}^{m}\frac{(z_j - c_{ij})^2}{2\sigma_i^2}\right], \quad i = 1, 2, \ldots, N \qquad (1)$$

where $\psi_i$ is the activation function of the ith hidden neuron,

$z = [z_1, z_2, \ldots, z_m]$ is the input vector for m input variables,
$c_{ij} = [c_{i1}, c_{i2}, \ldots, c_{im}]$ is the array of center states,
$\sigma_{ij} = [\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{im}]$ is the array of standard deviations of the Gaussians, and
N is the number of hidden layer neurons.
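As a quick illustration of (1), the following minimal Python sketch (NumPy assumed; array names, shapes, and example values are illustrative, not from the paper) computes the Gaussian hidden-layer activations for one input vector:

```python
import numpy as np

def rbf_activations(z, centers, sigmas):
    """Gaussian RBF hidden-layer activations of Eq. (1).

    z       : input vector of length m
    centers : (N, m) array of center states c_ij
    sigmas  : length-N array of standard deviations sigma_i
    Returns psi with psi[i] = exp(-sum_j (z_j - c_ij)^2 / (2 * sigma_i^2)).
    """
    z = np.asarray(z, dtype=float)
    sq_dist = np.sum((centers - z) ** 2, axis=1)   # broadcast over the N centers
    return np.exp(-sq_dist / (2.0 * sigmas ** 2))

# Example: three hidden neurons, two inputs
psi = rbf_activations([0.2, -0.1],
                      centers=np.array([[0.0, 0.0], [0.5, 0.5], [-0.5, 0.5]]),
                      sigmas=np.array([1.0, 0.8, 1.2]))
```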
The main objective of the work presented here is to extend the authors' previously published work on RBF-NN-based control with weight adaptation laws for autonomous surface vessel (SV) control [17, 18] to shape-adaptive RBF-NNs, to simulate the controller on the mathematical model developed for the prototype SV, and to compare its performance with classical PD-like control. The Lyapunov method [19], vastly used for stability analysis in control engineering, is also applied to the newly developed controller. Section 2 of the paper explains the conventional PD [20] controller combined with the novel RBF-NN controller design, followed by the Lyapunov stability analysis. A summary of the SV mathematical model, developed based on the CAD of the SV, is given in Sect. 3, followed by the final controller design and numerical simulations in Sect. 4. The remaining sections present the discussion and the conclusion.

2 Shape Adaptive RBF-NN Controller Design

In this section, the basic RBF-NN controller is presented first. The RBF function for the shape-adaptive RBF-NN (SA-RBF-NN) is then introduced, and the derivation of the controller signal is described in the following subsections, together with the weight and SA-RBF-NN shape-tuning laws.

2.1 RBF-NN Controller Design

The output signal of the RBF-NN shown in Fig. 1 can be written as

$$\hat{y} = \sum_{i=1}^{N} \hat{y}_i \qquad (2)$$

where $\hat{y}_i$ is the calculated output of the ith hidden neuron, determined as follows.

$$\hat{y}_i = \hat{w}_i\,\psi_i(z) \qquad (3)$$

where $\hat{w}_i \in R^N$ is the vector of current values of the connecting weights between the hidden layer and the single output node. Taking the actual output as $y_i$ and the ideal weights as $w_i$, and expanding Eq. (3) into a Taylor series, yields
650 K. J. C. Kumara and M. H. M. R. S. Dilhani

$$y_i - \hat{y}_i = (w_i - \hat{w}_i)^T \psi_i(z) + \xi \qquad (4)$$

where $\xi$ ($|\xi| < \xi_N$) represents the neglected higher-order terms. From now onward, subscripts are omitted for brevity. With the signal from the NN, $\hat{y}$, counteracting the modeled and unmodeled uncertainties, and taking the NN approximation error as $\varepsilon$ ($|\varepsilon| < \varepsilon_N$), the final control signal error can be written as

$$\tilde{y} = y - \hat{y} + \varepsilon \qquad (5)$$

Using (4) and (5), we have

$$\tilde{y} = \tilde{w}^T \psi_i(z) + \xi + \varepsilon \qquad (6)$$

The analytical expression of the activation function of the SA-RBF-NN, taking its altitude ($a_{ij}$), width ($z_j - c_{ij}$), and standard deviation ($\sigma_i$), is given as

$$\psi_i(z, a, c, \sigma) = \exp\left[\sum_{j=1}^{m}\left(\ln a_{ij} - \frac{(z_j - c_{ij})^2}{2\sigma_i^2}\right)\right], \quad i = 1, 2, \ldots, N \qquad (7)$$
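A minimal Python sketch of (7) follows (NumPy assumed; variable names are illustrative). Setting every altitude a_ij = 1 recovers the basic Gaussian activation of (1):

```python
import numpy as np

def sa_rbf_activations(z, altitudes, centers, sigmas):
    """Shape-adaptive RBF activations of Eq. (7).

    altitudes : (N, m) array of altitudes a_ij (all entries assumed positive)
    centers   : (N, m) array of center states c_ij
    sigmas    : length-N array of standard deviations sigma_i
    """
    z = np.asarray(z, dtype=float)
    sq_dist = np.sum((centers - z) ** 2, axis=1)       # sum_j (z_j - c_ij)^2
    log_altitude = np.sum(np.log(altitudes), axis=1)   # sum_j ln(a_ij)
    return np.exp(log_altitude - sq_dist / (2.0 * sigmas ** 2))
```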

By substituting (7) into (3) and taking the Taylor series expansion, it yields

$$y_i - \hat{y}_i = \frac{\partial}{\partial a}\left(\hat{w}\cdot\psi\right)^T(a - \hat{a}) + \frac{\partial}{\partial c}\left(\hat{w}\cdot\psi\right)^T(c - \hat{c}) + \frac{\partial}{\partial \sigma}\left(\hat{w}\cdot\psi\right)^T(\sigma - \hat{\sigma}) + \xi \qquad (8)$$

Taking $\hat{\psi}_k = \left.\frac{\partial(\hat{w}\cdot\psi)}{\partial k}\right|_{k=\hat{k}}$ for $k = a$, $c$, or $\sigma$, the above relation can be simplified as

$$y_i - \hat{y}_i = 3\,\tilde{w}^T\hat{\psi} + \hat{w}^T\left(\hat{\psi}_a\tilde{a} + \hat{\psi}_c\tilde{c} + \hat{\psi}_\sigma\tilde{\sigma}\right) + \xi \qquad (9)$$

where $\tilde{w} = w - \hat{w}$, $\tilde{a} = a - \hat{a}$, $\tilde{c} = c - \hat{c}$, and $\tilde{\sigma} = \sigma - \hat{\sigma}$ are estimation errors, with $w$ ($\|w\| \le w_{\max}$), $a$ ($\|a\| \le a_{\max}$), $c$ ($\|c\| \le c_{\max}$), and $\sigma$ ($\|\sigma\| \le \sigma_{\max}$) being the ideal values. The notation $\|\cdot\|$ denotes the Euclidean norm of a matrix. Then, by combining (5) and (9), the control signal estimation error can be written as

$$\tilde{y} = 3\,\tilde{w}^T\hat{\psi} + \hat{w}^T\left(\hat{\psi}_a\tilde{a} + \hat{\psi}_c\tilde{c} + \hat{\psi}_\sigma\tilde{\sigma}\right) + \xi + \varepsilon \qquad (10)$$

where $\hat{\psi}_k \in R^{N \times N}$, in which N is the number of hidden neurons.
$$\hat{\psi}_k = \begin{bmatrix} \frac{d\psi_1}{dk_1} & \frac{d\psi_1}{dk_2} & \cdots & \frac{d\psi_1}{dk_N} \\ \frac{d\psi_2}{dk_1} & \frac{d\psi_2}{dk_2} & \cdots & \frac{d\psi_2}{dk_N} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{d\psi_N}{dk_1} & \frac{d\psi_N}{dk_2} & \cdots & \frac{d\psi_N}{dk_N} \end{bmatrix}; \quad k = a, c, \sigma$$

The entries of $\hat{\psi}_a$, $\hat{\psi}_c$, and $\hat{\psi}_\sigma$ can be calculated as follows,
$$\hat{\psi}_a = \frac{d\psi_i}{da_j} = \begin{cases} \exp\left[\sum_{j=1}^{m} a_{ij}\right], & i = j \\ 0, & i \ne j \end{cases} \qquad (11)$$

$$\hat{\psi}_c = \frac{d\psi_i}{dc_j} = \begin{cases} \exp\left[-\sum_{j=1}^{m}\frac{(z_j - c_{ij})^2}{2\sigma_i^2}\right], & i = j \\ 0, & i \ne j \end{cases} \qquad (12)$$

and

$$\hat{\psi}_\sigma = \frac{d\psi_i}{d\sigma_j} = \begin{cases} \exp\left[-\sum_{j=1}^{m}\frac{(z_j - c_{ij})^2}{2\sigma_i^2}\right], & i = j \\ 0, & i \ne j \end{cases} \qquad (13)$$

2.2 Tuning of Weights and Shape of RBF-NN

The weights of the NN and the shapes of the RBFs are updated using the tuning laws given by the set of relationships in (14) to achieve guaranteed tracking performance of the autonomous controller. In Sect. 2.3, convergence of the system with the SA-RBF-NN blended PD controller is proved using Lyapunov analysis for these tuning laws; the form of the tuning laws is selected such that the proposed SA-RBF-NN achieves accurate path tracking with these parameter-updating relationships. A similar kind of approach is described in [21].

$$\dot{\hat{w}} = F\left(3I\,r\,\hat{\psi}^T - \kappa\|E\|\hat{w}\right)$$
$$\dot{\hat{a}} = G\left(r^T\hat{w}\,\hat{\psi}_a - \kappa\|E\|\hat{a}\right)$$
$$\dot{\hat{c}} = H\left(r^T\hat{w}\,\hat{\psi}_c - \kappa\|E\|\hat{c}\right)$$
$$\dot{\hat{\sigma}} = J\left(r^T\hat{w}\,\hat{\psi}_\sigma - \kappa\|E\|\hat{\sigma}\right) \qquad (14)$$

where $I$ represents the identity matrix; $F = F^T > 0$, $G = G^T > 0$, $H = H^T > 0$, and $J = J^T > 0$ are any constant matrices; and $\kappa > 0$ is a design parameter. $E$ is the error vector defined by the difference between the target output and the current output of the SA-RBF-NN PD controller. For the application of this work, it is a 3 × 2 matrix defined as

$$E = \begin{bmatrix} e & \dot{e} \end{bmatrix}^T \qquad (15)$$

where $e$ is the tracking error of the path given by the surge ($x$), the sway ($y$), and the orientation ($\phi$), and $\dot{e}$ contains the related velocity errors.
Further, $r$ is defined as the sum of the error matrices, so that the state space model of the error dynamics is as follows.

$$\dot{E} = AE + B\tilde{y} \qquad (16)$$

$$r = E + \dot{E} \qquad (17)$$

$A$ and $B$ are constant matrices related to the PD controller gains, and the details are discussed in Sect. 4. Finally, the position and velocity errors are bounded, and the convergence of the SA-RBF-NN parameters is guaranteed with practical bounds, as
proved in the next subsection.
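A minimal discrete-time sketch of the tuning laws in (14) is given below (Python/NumPy; a single NN output is assumed so that all parameter estimates reduce to length-N vectors, and the gains F, G, H, J and κ are illustrative scalars rather than the matrices of the paper):

```python
import numpy as np

def tuning_step(w, a, c, sigma, psi, psi_a, psi_c, psi_s, r, E, dt,
                F=1.0, G=1.0, H=1.0, J=1.0, kappa=0.1):
    """One explicit-Euler step of the tuning laws in Eq. (14), single-output case.

    w, a, c, sigma      : current parameter estimates, each a length-N vector
    psi                 : activation vector of Eq. (7), length N
    psi_a, psi_c, psi_s : diagonal derivative matrices of Eqs. (11)-(13), each (N, N)
    r                   : filtered error r = E + dE/dt (treated as a scalar here)
    E                   : error vector of Eq. (15)
    """
    nE = np.linalg.norm(E)
    w_new = w + dt * F * (3.0 * r * psi - kappa * nE * w)
    a_new = a + dt * G * (r * (w @ psi_a) - kappa * nE * a)
    c_new = c + dt * H * (r * (w @ psi_c) - kappa * nE * c)
    s_new = sigma + dt * J * (r * (w @ psi_s) - kappa * nE * sigma)
    return w_new, a_new, c_new, s_new
```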

2.3 The Lyapunov Analysis

The Lyapunov theory has been used in many controller designs in the literature [19, 22], as it can ensure the global asymptotic stability of nonlinear control systems. In order to prove the stability of the controller developed above, the Lyapunov function candidate is defined as

$$L(E, \tilde{w}, \tilde{a}, \tilde{c}, \tilde{\sigma}) = E^T P E + tr\left(\tilde{w}^T F^{-1}\tilde{w}\right) + tr\left(\tilde{a}G^{-1}\tilde{a}^T\right) + tr\left(\tilde{c}H^{-1}\tilde{c}^T\right) + tr\left(\tilde{\sigma}J^{-1}\tilde{\sigma}^T\right) \qquad (18)$$

where $P$ is a positive definite solution of the Lyapunov equation $A^TP + PA + Q = 0$ for any positive definite matrix $Q$. Differentiating (18) yields

$$\dot{L} = \dot{E}^T P E + E^T P \dot{E} + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right)$$

Substituting for $\dot{E}$ from (16) and rearranging the result yields

$$\dot{L} = E^T\left(A^TP + PA\right)E + 2E^T(PB)\tilde{y} + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right)$$

The Kalman–Yakubovich–Popov (KYP) lemma [23] can be applied to prove that


there exist two positive, definite symmetric matrices P and Q when the transfer
function is made strictly positive real. By applying KYP for the system described by
(16) and (17),

$$A^TP + PA + Q = 0, \qquad PB = C^T$$

Then, $\dot{L}$ further simplifies to

$$\dot{L} = -E^TQE + 2E^TC^T\tilde{y} + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right)$$


Substituting for $\tilde{y}$ from (10) gives

$$\dot{L} = -E^TQE + 2(CE)^T\left[3I\,\tilde{w}^T\hat{\psi} + \hat{w}^T\left(\hat{\psi}_a\tilde{a} + \hat{\psi}_c\tilde{c} + \hat{\psi}_\sigma\tilde{\sigma}\right) + \xi + \varepsilon\right] + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right)$$

$$\dot{L} = -E^TQE + 2r^T\left[3I\,\tilde{w}^T\hat{\psi} + \hat{w}^T\left(\hat{\psi}_a\tilde{a} + \hat{\psi}_c\tilde{c} + \hat{\psi}_\sigma\tilde{\sigma}\right)\right] + 2(CE)^T(\xi + \varepsilon) + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right)$$

   
Using the property $B^TA = tr\left(AB^T\right) = tr\left(B^TA\right)$ for any $A, B \in R^{N\times 1}$, it can be written that

$$r^T(3I)\,\tilde{w}\,\hat{\psi} = tr\left(\hat{\psi}\,r^T(3I)\,\tilde{w}\right)$$
$$r^T\hat{w}\,\hat{\psi}_a\tilde{a} = tr\left(r^T\hat{w}\,\hat{\psi}_a\tilde{a}\right)$$
$$r^T\hat{w}\,\hat{\psi}_c\tilde{c} = tr\left(r^T\hat{w}\,\hat{\psi}_c\tilde{c}\right)$$
$$r^T\hat{w}\,\hat{\psi}_\sigma\tilde{\sigma} = tr\left(r^T\hat{w}\,\hat{\psi}_\sigma\tilde{\sigma}\right)$$

Then,

$$\dot{L} = -E^TQE + 2(CE)^T(\xi + \varepsilon) + 2tr\left(\dot{\tilde{w}}^T F^{-1}\tilde{w}\right) + 2tr\left(\hat{\psi}\,r^T(3I)\,\tilde{w}\right) + tr\left(\dot{\tilde{a}}G^{-1}\tilde{a}^T\right) + tr\left(\dot{\tilde{c}}H^{-1}\tilde{c}^T\right) + tr\left(\dot{\tilde{\sigma}}J^{-1}\tilde{\sigma}^T\right) + 2tr\left(r^T\hat{w}\,\hat{\psi}_a\tilde{a}\right) + 2tr\left(r^T\hat{w}\,\hat{\psi}_c\tilde{c}\right) + 2tr\left(r^T\hat{w}\,\hat{\psi}_\sigma\tilde{\sigma}\right)$$
Since $\dot{\tilde{w}} = -\dot{\hat{w}}$, $\dot{\tilde{a}} = -\dot{\hat{a}}$, $\dot{\tilde{c}} = -\dot{\hat{c}}$, and $\dot{\tilde{\sigma}} = -\dot{\hat{\sigma}}$, with the tuning laws in (14) the above $\dot{L}$ is further derived to the form

$$\dot{L} = -E^TQE + 2(CE)^T(\xi + \varepsilon) + 2\kappa\|E\|\,tr\left[\tilde{w}^T\left(w - \tilde{w}\right)\right] + 2\kappa\|E\|\,tr\left[\tilde{a}^T\left(a - \tilde{a}\right)\right] + 2\kappa\|E\|\,tr\left[\tilde{c}^T\left(c - \tilde{c}\right)\right] + 2\kappa\|E\|\,tr\left[\tilde{\sigma}^T\left(\sigma - \tilde{\sigma}\right)\right]$$

$$\dot{L} \le -\|E\|\left[Q_{\min}\|E\| + 2\kappa\|\hat{w}\|\left(\|\hat{w}\| - w_{\max}\right) + 2\kappa\|\hat{a}\|\left(\|\hat{a}\| - a_{\max}\right) + 2\kappa\|\hat{c}\|\left(\|\hat{c}\| - c_{\max}\right) + 2\kappa\|\hat{\sigma}\|\left(\|\hat{\sigma}\| - \sigma_{\max}\right) - 2\sqrt{2}\left(\xi_N + \varepsilon_N\right)\right]$$

which is negative if the term within the bracket (TWB) is positive.

$$\mathrm{TWB} = 2\kappa\left(\|\hat{w}\| - \frac{w_{\max}}{2}\right)^2 - \frac{\kappa w_{\max}^2}{2} + 2\kappa\left(\|\hat{a}\| - \frac{a_{\max}}{2}\right)^2 - \frac{\kappa a_{\max}^2}{2} + 2\kappa\left(\|\hat{c}\| - \frac{c_{\max}}{2}\right)^2 - \frac{\kappa c_{\max}^2}{2} + 2\kappa\left(\|\hat{\sigma}\| - \frac{\sigma_{\max}}{2}\right)^2 - \frac{\kappa\sigma_{\max}^2}{2} - 2\sqrt{2}\left(\xi_N + \varepsilon_N\right)$$

which is guaranteed to be positive as long as either

$$\|E\| > \frac{\frac{\kappa}{2}\left(w_{\max}^2 + a_{\max}^2 + c_{\max}^2 + \sigma_{\max}^2\right) + 2\sqrt{2}\left(\xi_N + \varepsilon_N\right)}{Q_{\max}}$$

or

$$\|\hat{w}\| > \frac{w_{\max}}{2} + \sqrt{\frac{1}{4}\left(w_{\max}^2 + a_{\max}^2 + c_{\max}^2 + \sigma_{\max}^2\right) + \frac{2\sqrt{2}\left(\xi_N + \varepsilon_N\right)}{\kappa}}$$

or

$$\|\hat{a}\| > \frac{a_{\max}}{2} + \sqrt{\frac{1}{4}\left(w_{\max}^2 + a_{\max}^2 + c_{\max}^2 + \sigma_{\max}^2\right) + \frac{2\sqrt{2}\left(\xi_N + \varepsilon_N\right)}{\kappa}}$$

or

$$\|\hat{c}\| > \frac{c_{\max}}{2} + \sqrt{\frac{1}{4}\left(w_{\max}^2 + a_{\max}^2 + c_{\max}^2 + \sigma_{\max}^2\right) + \frac{2\sqrt{2}\left(\xi_N + \varepsilon_N\right)}{\kappa}}$$

or

$$\|\hat{\sigma}\| > \frac{\sigma_{\max}}{2} + \sqrt{\frac{1}{4}\left(w_{\max}^2 + a_{\max}^2 + c_{\max}^2 + \sigma_{\max}^2\right) + \frac{2\sqrt{2}\left(\xi_N + \varepsilon_N\right)}{\kappa}}$$

Therefore, convergence is guaranteed, and the system of error dynamics [Eqs. (16)–(17)] is stable in the sense that the size of the state vector is bounded.

3 Prototype and Mathematical Model of Surface Vessel

The CAD model (Fig. 2) of the surface vessel was designed in CATIA V6 following the standards of naval architecture described for SV designs in [24–26] for autonomous applications. The SV design consists of two similar hulls (identical in both shape and mass) as the main floating bodies of the SV. Two electric propellers are connected to the back ends of the hulls as shown. To maintain vertical stability, a submerged aerofoil-shaped Gertler body is attached to a thin vertical strut, which is fixed symmetrically to the SV structure. Physical dimensions of the SV are given in Table 1. The mathematical model of the SV was developed from first principles using standard notation rather than physical parameter values, which are substituted later at the numerical simulation stage.
The frame definitions are as follows.
EF: origin coincides with the center of gravity (CG) of the vessel at the initial
position. XE axis is directed toward the North. YE axis is directed toward the East
of the Earth. ZE axis points downward.

Table 1 Design parameters of the prototype surface vessel

SV part | Physical dimension | CAD measurement
Hulls (2 Nos.) | Length | 2.540 m
 | Beam | 0.152 m
 | Draft | 0.127 m
 | Platform area | 0.774 m2
Vertical strut | Length | 1.580 m
 | Beam | 0.025 m
 | Chord | 0.075 m
 | Projected area (projected through surge) | 0.035 m2
 | Projected area (projected through sway) | 0.119 m2
Gertler body | Length | 1.870 m
 | Greatest diameter | 0.443 m
Propellers (2 Nos.) | Blade diameter | 0.100 m
 | Distance from the CG (surge direction) | 0.850 m
Surface vessel | Total mass (m) | 505.12 kg
 | Yaw inertia (I_z) | 559.23 kg m2

BF: origin is fixed at the CG of the vessel. XB is directed toward the sway, YB is
directed toward the surge.
Vertical (heave), pitch, and roll motions of the SV are neglected, as the resulting loss in accuracy under typical and moderately severe maneuvers is very small. Therefore, the SV mathematical representation is limited to a three-degrees-of-freedom (3DoF) system. The configuration vector of the SV in BF with respect to (w.r.t.) EF can be written as

$$\eta(t) = \begin{bmatrix} x & y & \phi \end{bmatrix}^T; \quad t \ge 0 \qquad (19)$$

where $x = x(t)$ and $y = y(t)$ represent the linear displacements in the surge ($X_E$) and sway ($Y_E$) directions, and $\phi = \phi(t)$ is the yaw about $Z_E$. Defining the SV velocities $u = u(t)$, $v = v(t)$, and $\omega = \omega(t)$ in the directions of $X_B$, $Y_B$, and rotation about $Z_B$, respectively, the velocity vector of the SV is given by (20).

$$V(t) = \begin{bmatrix} u & v & \omega \end{bmatrix}^T; \quad t \ge 0 \qquad (20)$$

The rotation matrix (from BF to EF) can be obtained as,


$$J(\eta) = \begin{bmatrix} \cos(\phi) & -\sin(\phi) & 0 \\ \sin(\phi) & \cos(\phi) & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (21)$$

In (21), the angle of trim and angle of the roll are considered negligible and have
a minimum effect on the dynamics of the SV. The relationship between (19) and (20)
can be further derived using (21) to describe the kinematics of the SV as,

η̇(t) = J (η)V (t) (22)

By applying the Newton–Euler equations to the motion of the SV (Fig. 2), the
general equations of motion along the directions of BF that describe the dynamics
of the SV are obtained as,

$$X = m\left(\dot{u} - v\omega - y_G\dot{\omega} - x_G\omega^2\right)$$
$$Y = m\left(\dot{v} - u\omega + x_G\dot{\omega} - y_G\omega^2\right)$$
$$N = I_Z\dot{\omega} + m\left[x_G\left(\dot{v} + u\omega\right) - y_G\left(\dot{u} - v\omega\right)\right] \qquad (23)$$

Fig. 2 3D design of the prototype SV, its main components, and the coordinate frames defined for the analysis of kinematics and dynamics (<1>: two hulls, <2>: vertical strut, <3>: Gertler body, <4>: two propellers, CG: center of gravity, EF: Earth-fixed frame, B: body frame)

where $X$, $Y$, and $N$ are the external forces and moment acting on the vehicle, and $x_G$ and $y_G$ are the distances from the origin of the BF to the CG of the SV. Here, by placing the origin of the BF at the CG, $x_G \to 0$ and $y_G \to 0$. Hence, the above set of relations (23) is further simplified and concisely derived as

$$T_R(t) = M\dot{V}(t) + \left[C(V) + D(V)\right]V(t) + g_0\left[\eta(t)\right]; \quad t \ge 0 \qquad (24)$$

where $M$ is the positive definite mass-inertia matrix, $C[V(t)] \in R^{3\times 3}$ is the total matrix of Coriolis and centripetal terms, $D[V(t)] \in R^{3\times 3}$ is the damping force matrix, $g_0[\eta(t)] \in R^3$ represents the vector of gravitational forces and moments, and finally, $T_R(t) \in R^3$ is the input vector that represents the external forces and moments on the SV. The detailed version of the mathematical model in (24) is developed by considering the SV parameters and the marine hydrodynamics theories in [24, 26, 27]. Time-differentiating (22) and substituting into (24) yields

$$M\ddot{\eta}(t) = f\left[\eta, V\right] + \tau\left[\eta, U\right] \qquad (25)$$

where $f[\eta, V] = -J(\eta)\left[C(V) + D(V)\right]V(t) - \dot{J}(\eta)V(t)$ and $\tau[\eta, U] = J(\eta(t))\cdot g(U)$.
The controlled terms $g(U)$ given by (26) are determined by the control method described in Sects. 2 and 4 under numerical simulations. Furthermore, the propeller thrust ($T$) and angle ($\delta$) provide the actual output that moves the SV. Once the entries of $g(U)$ are known, (26) is solved for the control vector $U$.

$$g(U) = \begin{bmatrix} T\cos(\delta) & T\sin(\delta) & T\sin(\delta) \end{bmatrix}^T \qquad (26)$$

One may refer to [18] for a detailed kinematics and dynamics analysis of the SV with all the physical properties and the marine hydrodynamic modeling.
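For illustration only, the following Python sketch integrates the kinematics (22) and dynamics (24) with a simple Euler step; M, C, D, and g0 below are placeholders for the model quantities of this section, not values from the paper:

```python
import numpy as np

def rotation(phi):
    """Rotation matrix J(eta) of Eq. (21)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def sv_step(eta, V, TR, M, C, D, g0, dt):
    """One Euler integration step of Eqs. (22) and (24).

    eta : [x, y, phi] in EF, V : [u, v, omega] in BF, TR : input force/moment vector
    M   : 3x3 mass-inertia matrix; C(V), D(V) : 3x3 matrix-valued functions;
    g0(eta) : restoring force vector; dt : time step.
    """
    V_dot = np.linalg.solve(M, TR - (C(V) + D(V)) @ V - g0(eta))   # from Eq. (24)
    eta_dot = rotation(eta[2]) @ V                                  # Eq. (22)
    return eta + dt * eta_dot, V + dt * V_dot
```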

4 Final Controller Design and Numerical Simulations

Referring back to the error dynamics described by the state space model in (16) and (17), matrix $A$ is selected so that the first part of (16) contributes to the PD control, while the nonlinear dynamics are handled by the second part with the SA-RBF-NN control signal. Taking $K_p$ and $K_d$ as the proportional and derivative gains, respectively, the PD controller gain matrix is given as follows.

$$A = \begin{bmatrix} 0 & 1 \\ -K_p & -K_d \end{bmatrix}$$

Further, defining $B = \begin{bmatrix} 0 & 1 \end{bmatrix}^T$, $C = \begin{bmatrix} 1 & 1 \end{bmatrix}^T$, and $E = \begin{bmatrix} e & \dot{e} \end{bmatrix}^T$ completes the definition of the state space model of the SV control system.
It can also be proved that the transfer function $T(s)$ of the state space model of the SV is stable for $K_p \ge 0$ and $K_d > 1$ by converting the transfer function into controller canonical form [20] and using the pole placement approach. This proof can be found elsewhere [17] for the weight tuning law, and one can follow the same procedure to obtain it for all the other tuning laws.
The control method presented requires full state feedback and acceleration measurements in both the surge and sway directions. The yaw rate can be measured using a gyroscope placed near the CG, and accelerations are measured by accelerometers. Nowadays, inertial sensor-based low-cost hardware–software systems are available with all the above measuring capabilities [28]. Further, a Kalman filter-based algorithm is used to estimate the velocities of the SV [29]. With a geographical positioning system (GPS) signal available to correct dead-reckoning localization, absolute position coordinates are obtained. Moreover, as the SA-RBF-NN controller developed here is compact, an embedded computer is capable of processing the data and calculating the control signal to deliver the required thrust in real time.
The completed mathematical model and the controller are converted to MATLAB [30] code and simulated for the eight-shape trajectory defined by (27) as the desired path (with positions $x_d$, $y_d$). The application proposed here is the loading and unloading of cargo from ships temporarily anchored near the harbor.

$$x_d = 2R\sin\left(\frac{\alpha t}{2}\right), \qquad y_d = R\sin(\alpha t) \qquad (27)$$
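The desired path of (27) can be generated in a few lines of Python; R and alpha below are illustrative values only, not the simulation settings of the paper:

```python
import numpy as np

# Eight-shape desired trajectory of Eq. (27); R and alpha are illustrative values.
R, alpha = 10.0, 0.05
t = np.linspace(0.0, 4.0 * np.pi / alpha, 1000)   # two cycles of the slow component
x_d = 2.0 * R * np.sin(alpha * t / 2.0)
y_d = R * np.sin(alpha * t)
```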

Two SA-RBF-NN subnets are derived, based on the controller described above, to handle the surge and sway dynamics. The design parameters of the controller were tuned from randomly selected initial values to achieve the best performance in terms of stability and tracking accuracy. The initial simulation results shown in Figs. 3, 4, and 5 indicate that the SA-RBF-NN has the highest position accuracy compared to the conventional PD controller for the controller gains $K_p = 1$ and $K_d = 7$. Further, these results show only online training, where the dynamics change according to the desired path and desired speeds governed by the desired trajectory and hence the dynamics of the SV.

5 Conclusion and Future Works

The SA-RBF-NN is designed and applied to a real-time path tracking application in combination with the classical PD control found in control engineering. In the reported literature, RBF-NNs are found to be very efficient and compact along with their fast learning and accurate approximation properties.

Fig. 3 Tracking of eight-shape trajectory by PD controller only and with SA-RBF-NN controller

Fig. 4 Surge-direction tracking error for the eight-shape trajectory with the PD controller only and with the SA-RBF-NN controller

Fig. 5 Sway-direction tracking error for the eight-shape trajectory with the PD controller only and with the SA-RBF-NN controller

The conventional RBF-NN has been presented for various applications with weight updates; here it is modified by introducing shape change, integrating the RBF center states, standard deviations, and altitudes, so that the activation function itself is updated and adapts to the situation. Starting with the initial RBF function in (1), the authors' previous controller design approach is extended by modifying the activation function as given by (7). The proposed tuning laws are developed by considering the overall feedforward transfer function of the state space model of the error dynamics such that the control signal is convergent, and the controller design parameters are selected accordingly. A short tracking path, the eight-shape curve, is selected, and numerical simulations are carried out for the dynamic model of the prototype SV designed in 3D with all the necessary components. The two propellers and the propeller angles (both propellers together) are controlled by the control signal delivered by the two neural subnets developed to handle the longitudinal (surge) and lateral (sway) dynamics. The results of the numerical simulations show that the desired trajectory is tracked more accurately by the newly developed SA-RBF-NN combined with the PD controller than by the PD controller alone, both with full state feedback sensors. Therefore, the SA-RBF-NN controller can be proposed for controlling such nonlinear systems, especially when run on low-cost embedded hardware, as the controller is compact and needs less computing power compared to most other NN-based controllers today.
In an actual situation, this type of SV comes across many other challenges, such as obstacle avoidance, navigation and mapping, and various application-based problems. Some of them can easily be solved by integrating LiDAR and vision-based sensing systems; however, additional computing power and energy are then required. Future works of this research include the deployment of the SA-RBF-NN + PD controller developed here on low-cost hardware such as the Jetson Nano by NVIDIA [31], which can handle image processing with sufficient speed for such applications and run the robot operating system (ROS) [32], in the laboratory environment. Further, the control of autonomous ground vehicles will also be a good candidate for this approach, especially where pitching and rolling are minimized, so that the two subnets can optimally handle the desired path tracking.

References

1. Buhmann M (2009) Radial basis functions: theory and implementations. Cambridge University
Press: Cambridge, pp 1–4
2. Graupe D (2007) Principles of artificial neural networks. World Scientific, Singapore
3. Park J, Sandberg I (1991) Universal approximation using radial-basis function networks. J.
Neural Comput 3(2):246–257
4. Moody J, Darken C (1989) Fast learning in networks of locally tuned processing units. J. Neural
Computing. 1(2):281–294
5. Wang L, Fu X (2005) A simple rule extraction method using a compact RBF neural network.
In: Advances in neural network. Springer, Heidelberg pp 682–687
6. Baughman D, Liu Y (1995) Classification: fault diagnosis and feature categorization. In: Neural
networks in bioprocessing and chemical engineering. Academic Press Ltd., California. pp
110–171
7. David VK, Rajasekaran S (2009) Pattern recognition using neural networks and functional
networks. Springer, Heidelberg
8. Wu J (2012) Prediction of rainfall time series using modular RBF neural network model coupled
with SSA and PLS. In: Asian conference on intelligent information and database systems.
Kaohsiung, Taiwan (2012)
9. Saastamoninen A, Lehtokangas M, Varri A, Saarinen J (2001) Biomedical applications of radial
basis function networks. In: radial basis function networks. vol 67. Springer, pp 215–268
10. Halali M, Azari M, Arabloo M, Mohammadi A, Bahadori A (2016) Application of a radial
basis function neural network to estimate pressure gradient in water–oil pipelines. J. Taiwan
Inst Chem Eng 58:189–202 (Elsevier2016)
11. Wang P (2017) The application of radial basis function neural network for mechanical fault
diagnosis of gear box. In: IOP conference series: materials science and engineering. Tianjin,
China
12. Liu J (2010) Adaptive RBF neural network control of robot with actuator nonlinearities. J.
Control Theory Appl 8(2):249–256
13. Chaudhuri A (2012) Forecasting financial time series using multiple regression, multi-
layer perception, radial basis function and adaptive neuro fuzzy inference system models:
a comparative analysis. J Comput Inf Sci 5:13–24
14. Sisil K, Tsu-Tian L (2006) Neuroadaptive combined lateral and longitudinal control of highway
vehicles using RBF networks. IEEE Trans Intell Transp Syst 17(4):500–512
15. Marino R (1997) Adaptive control of nonlinear systems: basic results and application. J Annu
Rev Control 21:55–66
16. Howlet R, Jain L (2010) Radial basis function networks 1: recent developments in theory and
applications. Springer, Heidelberg
17. Kumara KJC, Sisil K Intelligent control of vehicles for “ITS for the sea” applications. In: IEEE
third international conference on information and automation for sustainability. IEEE Press,
Melbourne, pp 141–145

18. Kumara KJC (2007) Modelling and controlling of a surface vessel for “ITS for the Sea”
applications. Master thesis. University of Moratuwa
19. Fadali A, Visioli A (2013) Elements of nonlinear digital control systems. In: Digital control
engineering. Academic Press, Amsterdam, pp 439–489
20. Ogata K (2010) Modern control engineering. Prentice Hall, Boston, pp 649–651
21. Giesl P (2007) Construction of global lyapunov functions using radial basis functions. Springer,
Heidelberg, pp 109–110
22. Zhang J, Xu S, Rachid A (2001) Automatic path tracking control of vehicle based on Lyapunov
approach. In: IEEE international conference on intelligent transportation systems. IEEE Press,
Oakland
23. Vidyasagar M (1993) Nonlinear systems analysis. Prentice-Hall, Englewood Cliffs
24. Bishop B (2004) Design and control of platoons of cooperating autonomous surface vessels. In:
7th Annual maritime transportation system research and technology coordination conference
25. Caccia M (2006) Autonomous surface craft: prototypes and basic research issue. In: 14th
Mediterranean conference on control and automation
26. Vanzweieten T (2003) Dynamic simulation and control of an autonomous vessel. Master thesis.
Florida Atlantic University, Florida
27. Newman J (1977) Marine hydrodynamics. MIT Press, London
28. Sukkarieh S (2000) Low cost, high integrity, aided inertial navigation system. Ph.D. thesis.
University of Sydney, Sydney
29. An intro to Kalman filters for autonomous vehicle. https://towardsdatascience.com/an-intro-
to-kalman-filters-for-autonomous-vehicles
30. MATLAB. https://www.mathworks.com/products/matlab.html
31. NVIDIA Jetson nano developer kit. https://developer.nvidia.com/embedded/jetson-nano-dev
eloper-kit
32. ROS: robot operating system. www.ros.org
Electricity Load Forecasting Using
Optimized Artificial Neural Network

M. H. M. R. Shyamali Dilhani, N. M. Wagarachchi,


and Kudabadu J. C. Kumara

Abstract Electric load forecasting has become one of the most critical factors for the economic operation of power systems due to the rapid increase of daily energy demand in the world. According to the Generation Expansion Plan—2016 of the Ceylon Electricity Board, Sri Lanka, electricity accounts for a higher share of energy usage than other energy sources in the country. Moreover, forecasting is a hard challenge due to the complex nature of consumption. In this research, long-term electric load forecasting based on optimized artificial neural networks (OANNs) is implemented using particle swarm optimization (PSO), and the results are compared with a regression model. Results are validated using data collected from Central Bank annual reports for thirteen years, from 2004 to 2016. The choice of inputs for the ANN, OANN, and regression models depends on the values obtained through the correlation matrix. The training data sets used in the proposed work are scaled between 0 and 1, obtained by dividing the entire data set by its largest value. The experimental results show that the OANN has better forecasting accuracy compared to the ANN and the regression model. The forecasting accuracy of each model is measured by the mean absolute percentage error (MAPE).

Keywords Back propagation · Electricity load forecasting · Neural network · Particle swarm optimization · System-type architecture

M. H. M. R. Shyamali Dilhani (B) · N. M. Wagarachchi


Department of Interdisciplinary Studies, University of Ruhuna, Hapugala, Galle, Sri Lanka
e-mail: rasidilhani@gmail.com
N. M. Wagarachchi
e-mail: mihirini@is.ruh.ac.lk
K. J. C. Kumara
Department of Mechanical and Manufacturing Engineering, Faculty of Engineering, University of
Ruhuna, Hapugala, Galle, Sri Lanka
e-mail: kumara@mme.ruh.ac.lk


1 Introduction

Predicting future electricity consumption is vital for utility planning. The process is
difficult due to the complex load patterns. Electricity forecasting is the basic planning
process which is followed in the electric power systems industry for a particular
area over different time horizons [1, 2]. Accurate electricity forecasting leads to
reduce operation and maintenance costs. It increases the power supply reliability and
delivery system which helps to obtain a reliable decision for future development. At
present, utilities have been growing interest in smart grid implementation. Therefore,
electricity forecasting has a greater impact on storage maintenance, demand-side
management, scheduling, and renewable energy integration. Forecasting helps the
user to obtain the relationship between the consumption and its price variations in
detail [3].
Forecasting is broadly classified into long-term, medium-term, and short-term forecasting. Long-term forecasting is used to predict plant capacity and its planning, medium-term forecasting is used to plan the plant maintenance schedule, and short-term forecasting is used to plan daily operations. The proposed research work focuses on long-term forecasting, which is common in the planning and operation of electric utilities [4]. Electricity consumption varies according to the economic and social circumstances of a society. The major advantage of long-term electricity load forecasting is that it indicates economic growth for the system. Moreover, it provides relevant data in terms of transmission, distribution, and generation. However, electricity forecasting accuracy differs with the nature of the situation. For example, one can forecast daily load demand in a particular region within 1–3% accuracy, whereas accurate prediction of an annual load is a complex process due to the unavailability of long-term load factor information [2, 3, 5].
Annual electricity forecasting in Iran is reported in the work of Ghanbari et al. [6], where artificial neural networks and linear and log-linear regression models are used in the experiments. Real GDP and population are the two economic parameters considered as the experimental lags in that approach. Abd-El-Monem [7] provides an in-depth analysis of forecasting in the Egyptian region, where ANNs and other forecasting parameters are tested; the study analyzes load demand, sales, population, GDP, and average price in an accurate manner, including econometric variables.
In addition, meta-heuristic models with probabilistic and heuristic global search procedures are used to optimize the forecasting model, giving reliable and robust results [8–11].
Since the back propagation (BP) algorithm can stop at local minima, many researchers are interested in training the ANN model using PSO. The ability of PSO to solve complex and nonlinear functions is a major motivation for using it in electricity demand forecasting [12–15]. PSO is a fast-converging, swarm-based process in which the particles are adjusted to obtain the desired performance output.
Electricity Load Forecasting Using Optimized Artificial Neural Network 667

The main purpose of this research is to investigate how an optimized neural network influences the performance of long-term electricity forecasting in Sri Lanka, compared with regression and BP-ANN models. Thirteen years of historical electricity data, taken from the Central Bank annual reports of Sri Lanka, are considered for this experiment. Typical monthly and annual load demand patterns from 2004 to 2016, depicted in Fig. 1, clearly show that the demand has increased continuously over the past 12 years. The rest of the paper is organized as follows. The next section describes numerous electricity forecasting models reported in the literature. The PSO-optimized ANN model proposed in this paper is discussed in Sect. 3. Section 4 presents the forecasting performance measures, followed by the results and discussion and the concluding remarks in Sects. 5 and 6, respectively.

2 Electricity Forecasting Models

2.1 Artificial Neural Networks

The neural network model is developed based on the structure of the human brain, where nerve cells are considered as neurons that process the given input. Figure 2 depicts an illustrative representation of a neural network and its neurons. Generally, it has three layers: the input layer, the hidden layer, and the output layer. These layers are interconnected through weights. The hidden layer connects the input and output layers, and the weights are adjusted to reduce the error between the layers. Based on the learning rules, the weights are modified in the neural network architecture. Typically, the initial weights are chosen randomly and are then adjusted by comparing the output error.
The mathematical representation of the neuron input is $X$ (where $X = [x_1, x_2, \ldots, x_n]$) and the output is $y$. $W = [w_1, w_2, \ldots, w_n]$ represents the synaptic weights, and $b$ is the bias. The neuron output is given by (1).

$$y = \sum_{n} w_n x_n + b \qquad (1)$$

The estimation of these parameters under a minimum error criterion is called the training of the network. Back propagation [16] is a widely used model in neural networks, and various research works and applications have evolved using this back propagation algorithm [17–19]. It updates the weights and biases until it obtains zero training error or reaches a predetermined number of epochs. The weights are repeatedly changed based on the error function obtained between the actual and desired output values. The weight correction at each iteration $k$ of the algorithm is given as

$$w_i^{k+1} = w_i^k + \alpha_i^k g_i^k \qquad (2)$$



Fig. 1 Monthly load demands in Sri Lanka

Fig. 2 Neuron structure of artificial neural networks

where $w_i^k$ is the current set of weights and biases, $g_i^k$ is the current gradient-based update direction computed from the error, and $\alpha_i^k$ is the learning rate.
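As a minimal illustration of (2) (not the authors' implementation), the update can be written as a one-line Python function; the sign convention assumes the supplied direction already points toward decreasing error:

```python
import numpy as np

def bp_step(w, g, lr):
    """One iteration of the weight update of Eq. (2): w_(k+1) = w_k + alpha_k * g_k.

    w  : current weights and biases (flattened NumPy vector)
    g  : current update direction (e.g., the negative error gradient)
    lr : learning rate alpha_k
    """
    return w + lr * np.asarray(g)
```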

2.2 Particle Swarm Optimization (PSO)

Parameter optimization of ANNs is a major problem when they are used in electricity demand forecasting. Various methods have evolved for parameter optimization, among which particle swarm optimization [20, 21] is the most familiar. The first algorithm was developed based on observations of fish swarms and bird flocks. PSO operates on a multi-dimensional search space in which a swarm of particles explores the space, and based on the particle movements, PSO provides a globally optimum solution.
PSO uses local optima to handle optimization issues and it is useful for model
implementation. Also, PSO has successfully been applied to least squares estima-
tion [22], multiple regression models [15], neural network training [14, 23–26], and
support vector machine (SVM) [27]. The generic steps of the PSO are mathematically
interpreted as follows. Each particle is initialized in the search space with a random position and velocity. The position and the velocity of each particle at generation $i$ are given by the vectors $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ and $v_i = (v_{i1}, v_{i2}, \ldots, v_{id})$, respectively, where $d$ is the dimension of the search space.
Based on the fitness function of particles, two kinds of memories are obtained in
the particle swarm optimization process. The fitness value is the mean squared error
(MSE) between the target and actual data series. After calculating the fitness value
of all the particles, PSO updates the personal best (pbest) and global best (gbest),
where pbest is the best personal value for each particle updated so far and the gbest is
the best global value for the entire set. The pbest and gbest represent the population,
velocities, and positions are updated according to the (3) and (4).

Vk (i + 1) = w.Vk (i) + n 1 .r1 .(pbest − xk (i)) + n 2 .r2 .(gbest − xk (i)) (3)

xk (i + 1) = xk (i) + Vk (i + 1) (4)

where $V_k(i)$ and $x_k(i)$ are the velocity and position of particle $k$ at the $i$th iteration, $w$ is the inertia weight, $r_1$ and $r_2$ are random values between 0 and 1, and $n_1$ and $n_2$ are the predetermined learning factors.
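For illustration, the updates (3) and (4) for a single particle can be sketched in Python as follows (the default parameter values mirror the settings listed later in Sect. 3.2 but are otherwise arbitrary):

```python
import numpy as np

def pso_update(x, v, pbest, gbest, w=0.5, n1=2.0, n2=2.0, rng=np.random):
    """Velocity and position updates of Eqs. (3)-(4) for one particle.

    x, v   : current position and velocity vectors of particle k
    pbest  : best position found so far by this particle
    gbest  : best position found so far by the whole swarm
    """
    r1, r2 = rng.rand(), rng.rand()
    v_new = w * v + n1 * r1 * (pbest - x) + n2 * r2 * (gbest - x)   # Eq. (3)
    x_new = x + v_new                                               # Eq. (4)
    return x_new, v_new
```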

3 Proposed Techniques

In this proposed work, three models are employed to solve the problem of long-term
electricity demand forecasting.
1. Forecasting model using back propagation ANN.
2. Forecasting model using ANN optimized by PSO.
3. Forecasting model using linear regression.
The first model uses BP to train the weights of the ANN, while the second model
uses the PSO to optimize the weights of ANN. The third model discusses a statistical
model called linear regression to forecast long-term electricity demand. The results
are obtained using real historical monthly data from Central Bank reports, Sri Lanka.
These methods are explained in the following subsections in detail.

3.1 Forecasting Model Using Back Propagation ANN

In this model, forecasts are prepared for each month of the year 2016. Five inputs are created: population ('000), GDP (per capita in US$), energy sales (GWh), exchange rate (US$), and historical loads (MW). The monthly GDP data are collected for thirteen years, from 2004 to 2016. The correlation matrix is used to guide the choice of inputs. Figure 3 depicts that the selected five factors are highly correlated with each other, and Table 1 summarizes the results from the correlation matrix. For example, the correlation coefficients between the historical annual load and the population, GDP, energy sales, and exchange rate are 0.917, 0.972, 0.999, and 0.953, respectively.
All the training data in this process are scaled to be between 0 and 1. To achieve this, each data set is divided by its largest value. The proposed model is described by the following equation, which explains the inputs and outputs of the ANN.

$$F(m) = a_1 L(m-1) + a_2 \mathrm{Pop}(m-1) + a_3 \mathrm{GDP}(m-1) + a_4 \mathrm{ES}(m-1) + a_5 \mathrm{ER}(m-1) \qquad (5)$$

Fig. 3 Correlation matrix of all the input factors



Table 1 Correlation coefficients between the input factors

 | Annual load values | Midyear population | GDP/Per capita | Energy sales | Exchange rate (Avg.)
Annual load values (MW) | 1 | | | |
Population ('000) | 0.917 | 1 | | |
GDP/Per capita (US$) | 0.972 | 0.900 | 1 | |
Energy sales (GWh) | 0.999 | 0.916 | 0.969 | 1 |
Exchange rate (Avg US$) | 0.953 | 0.816 | 0.928 | 0.956 | 1
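The input-selection step described above can be reproduced with a few lines of pandas; the file name and column names below are placeholders for the monthly series, not artifacts from the paper:

```python
import pandas as pd

# Correlation-based input selection (cf. Fig. 3 and Table 1).
# "demand_data.csv" and the column names are illustrative placeholders.
df = pd.read_csv("demand_data.csv",
                 usecols=["load_MW", "population", "gdp_per_capita",
                          "energy_sales", "exchange_rate"])
corr = df.corr()            # pairwise Pearson correlation coefficients
print(corr.round(3))
```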

The load of the forecasting month, $F(m)$, is calculated using the load of the previous month, $L(m-1)$, the population in the previous month, $\mathrm{Pop}(m-1)$, the per capita GDP in the previous month, $\mathrm{GDP}(m-1)$, the energy sales in the previous month, $\mathrm{ES}(m-1)$, and the exchange rate in the previous month, $\mathrm{ER}(m-1)$. In this model, the ANNs are trained on 12 years of historical data (from 2004 to 2015) and designed to predict the total load for each month of the year 2016. In this process, the target set is loaded with data from 2004 to 2015, which has 144 values, and the input set is a 144 × 5 matrix whose columns are the historical load values, the monthly population, the per capita GDP, the historical energy sales, and the exchange rate in Sri Lanka for the specified period. The monthly population of Sri Lanka is obtained by dividing the annual population increase by 12, assuming uniform population growth.
A prior analysis and simulations were carried out for different ANN structures with different training functions by varying the hidden neurons. By varying the biases and weight functions in each series, various results are obtained for the same structure. The conjugate gradient BP training function (traincgb) performs better than the other training functions. In addition, the minimum forecasting error was obtained with a three-layer topology with five hidden neurons in the first and second layers and only one neuron in the last layer. Therefore, the later experiments are performed with the structure identified through this secondary experiment. The BP training algorithm is used with 1000 epochs to find the optimum weight values of the network.

3.2 Forecasting Model Using ANN Optimized by PSO

In particle swarm optimization, each particle is represented by a set of weights that defines the relationships between the neurons. The mean squared error between the network output and the target series is the fitness function of every particle. The error function is reduced by updating the weights repeatedly. Once the fitness values are calculated, the pbest and gbest values are updated in the process, which identifies the most effective weights of the particles in the entire set.
The process of ANN optimized by PSO is summarized as follows.
Step 1 Sample data are scaled to be between 0 and 1.
Step 2 All the variables are randomly initialized, and the velocity and position of each particle are updated. In the process, random values between 0 and 1 are assigned to r1 and r2, and the inertia weight and the learning factors are fixed at 0.5 and 2, respectively. The maximum number of iterations is 100, and the fitness value is calculated using the MSE. This step also places the weights and biases of each particle. The total number of weights and biases for the proposed model is 36: 30 weights and 6 biases.
Step 3 Calculate the MSE using the following equation

$$\mathrm{MSE} = \frac{1}{n}\sum_{m=1}^{n}\left(L_m - F_m\right)^2 \qquad (6)$$

where $n$ is the number of training samples, $L_m$ is the actual load value of the $m$th sample, and $F_m$ is the output load of the $m$th sample. The fitness of each particle is defined by (6). If the new position is better than pbest, pbest is replaced by the new position; otherwise, it does not change. The same concept updates gbest.
Step 4 According to Eqs. (3) and (4), the position and velocity of each particle are
updated.
Step 5 If the stopping conditions are met, the algorithm terminates. Otherwise, the
process repeats from step 3.
Step 6 Take the optimum set of parameters from PSO and put them into the ANN to forecast the monthly electricity demand (a minimal code sketch of this procedure is given after this list).
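The following Python sketch illustrates Steps 2–4 under the stated 36-parameter assumption (5 inputs, one hidden layer of 5 neurons, 1 output: 30 weights + 6 biases); the tanh hidden activation is an assumption, as the paper does not specify one:

```python
import numpy as np

def unpack(p):
    """Split a 36-element particle into the assumed ANN parameters
    (5 inputs -> 5 hidden neurons -> 1 output)."""
    W1 = p[:25].reshape(5, 5)   # input-to-hidden weights
    b1 = p[25:30]               # hidden biases
    W2 = p[30:35]               # hidden-to-output weights
    b2 = p[35]                  # output bias
    return W1, b1, W2, b2

def mse_fitness(p, X, L):
    """Fitness of Eq. (6): mean squared error between actual loads L and ANN outputs."""
    W1, b1, W2, b2 = unpack(p)
    H = np.tanh(X @ W1 + b1)    # hidden-layer activations (tanh assumed)
    F = H @ W2 + b2             # forecasted load for every training sample
    return np.mean((L - F) ** 2)

# pbest/gbest bookkeeping (Steps 3-4), for a swarm `pos` of shape (d, 36):
#   fit = np.array([mse_fitness(p, X, L) for p in pos])
#   better = fit < pbest_fit
#   pbest[better], pbest_fit[better] = pos[better], fit[better]
#   gbest = pbest[np.argmin(pbest_fit)]
```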

3.3 Forecasting Model Using Linear Regression

The linear regression model is also used to forecast the monthly load of the year 2016. The regression model is a statistical technique, and most researchers use it because of the ease of model implementation. The same factors as in the ANN model are used. The forecasted load is the dependent variable, and the other factors are the independent variables. The mathematical representation of this model can be summarized as follows.

$$F(m) = C_0 + C_1 L(m-1) + C_2 \mathrm{Pop}(m-1) + C_3 \mathrm{GDP}(m-1) + C_4 \mathrm{ES}(m-1) + C_5 \mathrm{ER}(m-1) \qquad (7)$$

where $F(m)$ is the forecasted load for month $m$, and $C_i$, $i = 0, \ldots, 5$, are the estimated regression coefficients. This model is applied to the data sets to obtain the monthly load values.
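A least-squares fit of the coefficients C0–C5 in (7) can be sketched as follows (NumPy assumed; argument names are illustrative):

```python
import numpy as np

def fit_regression(L_prev, Pop, GDP, ES, ER, F_actual):
    """Least-squares estimates of C0..C5 in Eq. (7).

    Each argument is a length-n array of lagged factor values; F_actual holds the
    observed monthly loads used for fitting.
    """
    X = np.column_stack([np.ones_like(F_actual), L_prev, Pop, GDP, ES, ER])
    coeffs, *_ = np.linalg.lstsq(X, F_actual, rcond=None)
    return coeffs   # [C0, C1, C2, C3, C4, C5]
```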

4 Forecasting Performance

The absolute percentage error (APE) and the mean absolute percentage error (MAPE) are used to calculate the accuracy of the forecasting models, and they are given as

$$\mathrm{APE} = \sum_{m=1}^{n}\left|\frac{L_m - F_m}{L_m}\right| \times 100 \qquad (8)$$

$$\mathrm{MAPE} = \frac{\mathrm{APE}}{n} \qquad (9)$$
where n is the total number of months, L m represents the actual load demand at
month m, and Fm represents the forecasted load demand at month m.
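Equations (8) and (9) amount to a few lines of Python (array names are illustrative):

```python
import numpy as np

def ape_mape(actual, forecast):
    """APE and MAPE of Eqs. (8)-(9) over n forecast months."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ape = np.sum(np.abs((actual - forecast) / actual)) * 100.0   # Eq. (8)
    return ape, ape / actual.size                                 # Eq. (9)
```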

5 Experimental Results and Discussion

In this section, the MAPE results for the ANN, OANN, and regression model simulations are summarized. According to the MAPEs in Table 2, the OANN attains the best performance in annual electricity load forecasting compared to the ANN with back propagation and the linear regression model.
All the forecasting models use the five input variables together with the historical load demand to forecast the monthly load of the year 2016. The optimized neural network, whose weights are optimized using particle swarm optimization, performs better than the other two models: its average monthly forecasting error is 1.836. The second lowest average forecasting error is given by the neural network, with an average monthly forecasting error of 2.502, while the regression model shows an average forecasting error of 2.949.
All three models have their highest forecasting error in April (3.611, 4.665, and 4.976, respectively), whereas the minimum forecasting errors occur in December (0.556, 1.452, and 1.753, respectively). Moreover, the optimized neural network shows a forecasting error above 2 percent only in February (2.613), May (2.304), August (2.240), and November (2.142), whereas the ANN model and the regression model have more than 2 percent forecasting error for all months except December.
Figure 4 shows that the best forecasting results are given by the OANN model. Moreover, a paired t-test is carried out to check the model accuracy. Tables 3 and 4 show the correlations between the actual load demand and the loads forecasted by the OANN, ANN, and regression models.

Table 2 Monthly and annual APE and MAPEs for the year 2016

Forecast month | OANN | ANN | Regression
January | 1.007 | 2.395 | 3.227
February | 2.613 | 2.924 | 3.807
March | 1.354 | 2.005 | 2.108
April | 3.611 | 4.665 | 4.976
May | 2.304 | 2.559 | 2.678
June | 1.366 | 2.287 | 2.641
July | 1.929 | 2.074 | 2.322
August | 2.24 | 2.323 | 2.562
September | 1.704 | 2.31 | 2.809
October | 1.212 | 2.568 | 3.725
November | 2.142 | 2.463 | 2.785
December | 0.556 | 1.452 | 1.753
MAPE | 1.836 | 2.502 | 2.949

The monthly absolute percentage errors (APE) for the three models (optimized artificial neural network, artificial neural network, and regression) appear in the respective columns of the table. In addition, the last row gives the annual mean absolute percentage error (MAPE) for the three models.

Fig. 4 Actual and forecasted loads for time series model

Table 3 Paired samples correlations

Pair | N | Correlation | Sig.
Pair 1: Actual and OANN | 12 | 0.982 | 0.000
Pair 2: Actual and ANN | 12 | 0.986 | 0.000
Pair 3: Actual and LR | 12 | 0.983 | 0.000

Table 4 Paired samples test

Pair | Mean | Std. deviation | Std. error mean | 95% CI of the difference (lower) | 95% CI of the difference (upper) | T | Sig. (2-tailed)
Pair 1: Actual - OANN | 21.67 | 9.34 | 2.69 | 15.74 | 27.61 | 8.03 | 0.000
Pair 2: Actual - ANN | 30.05 | 8.65 | 2.49 | 24.55 | 35.55 | 12.03 | 0.000
Pair 3: Actual - LR | 34.84 | 9.68 | 2.79 | 28.68 | 40.99 | 12.46 | 0.000

The actual and forecasted loads are highly correlated, and all pairs are significant with a probability value of 0.000.

6 Conclusion

A technique based on PSO and ANN was proposed in this research to forecast the monthly load demand of the Sri Lankan network. The results of the numerical simulations show that the OANN model together with five input factors reduces the forecasting error significantly. The correlation matrix is used to obtain the choice of inputs, and the selected factors have high correlations with each other. In the data preparation process, all the training data are uniformly scaled to be between 0 and 1. The weights and biases of the ANN model are optimized using the PSO and BP training algorithms, and the regression model is used to check the model adequacy. Though the ANN and regression models provide relatively good results, they are still not as accurate as the OANN model. The OANN performs well as it has a unique ability to deal with the nonlinearity of the model. As such, it overcomes many drawbacks of time series models for the case presented and tested in this work. It can also be concluded that all the techniques are quite promising and relevant for long-term forecasting according to the paired t-test results.

References

1. Beaty HW (2001) Handbook of electric power calculations. McGraw-Hill


2. Singh AK, Ibraheem SK, Muazzam M, Chaturvedi DK (2013) An overview of electricity
demand forecasting techniques. In: National conference on emerging trends in electrical,
instrumentation & communication engineering 2013, Network and complex systems, pp 38–48.
3. Chakhchoukh Y, Panciatici P, Mili L (2011) Electric load forecasting based on statistical robust
methods. IEEE Trans Power Syst 26(3):982–991

4. Chow JH, Wu FF, Momoh JA (2005) Applied mathematics for restructured electric power
systems. In: Applied mathematics for restructured electric power systems, Springer, pp 1–9
5. Starke M, Alkadi N, Ma O (2013) Assessment of industrial load for demand response across
US regions of the western interconnect. Oak Ridge National Lab. (ORNL), Oak Ridge, TN,
US
6. Ghanbari A et al (2009) Artificial neural networks and regression approaches comparison for
forecasting Iran’s annual electricity load. In: International conference on power engineering,
energy and electrical drives, 2009, POWERENG’09. IEEE
7. Abd-El-Monem H (2008) Artificial intelligence applications for load forecasting
8. Zhang F, Cao J, Xu Z (2013) An improved particle swarm optimization particle filtering algo-
rithm. In: 2013 International conference on communications, circuits and systems (ICCCAS).
IEEE
9. Jiang Y et al (2007) An improved particle swarm optimization algorithm. Appl Math Comput
193(1):231–239
10. Samuel GG, Rajan CCA (2015) Hybrid: particle swarm optimization genetic algorithm
and particle swarm optimization shuffled frog leaping algorithm for long-term generator
maintenance scheduling. Int J Electr Power Energy Syst 65:432–442
11. Chunxia F, Youhong W (2008) An adaptive simple particle swarm optimization algorithm. In:
Control and decision conference, 2008. CCDC 2008. Chinese. IEEE
12. Subbaraj P, Rajasekaran V (2008) Evolutionary techniques based combined artificial neural
networks for peak load forecasting. World Acad Sci Eng Technol 45:680–686
13. Daş GLS (2017) Forecasting the energy demand of Turkey with a NN based on an improved
particle swarm optimization. Neural Comput Appl 28(1): 539–549
14. Jeenanunta C, Abeyrathn KD (2017) Combine particle swarm optimization with artificial neural
networks for short-term load forecasting. ISJET 8:25
15. Hafez AA, Elsherbiny MK (2016) Particle swarm optimization for long-term demand fore-
casting. In: Power systems conference (MEPCON), 2016 eighteenth international middle east.
IEEE
16. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating
errors. Nature 323(6088):533
17. Mazzoni P, Andersen RA, Jordan MI (1991) A more biologically plausible learning rule for
neural networks. Proc Natl Acad Sci 88(10):4433–4437
18. Dilhani MS, Jeenanunta C (2017) Effect of neural network structure for daily electricity load
forecasting. In: Engineering research conference (MERCon), 2017 Moratuwa. IEEE
19. Samarasinghe S (2016) Neural networks for applied sciences and engineering: from funda-
mentals to complex pattern recognition. Auerbach Publications
20. Kennedy J, Eberhart RC (1997) A discrete binary version of the particle swarm algorithm.
In: 1997 IEEE international conference on systems, man, and cybernetics. Computational
cybernetics and simulation. IEEE
21. Eberhart R, Kennedy J (1995) A new optimizer using particle swarm theory. In: Proceedings
of the sixth international symposium on micro machine and human science, 1995. MHS’95.
IEEE
22. AlRashidi M, El-Naggar K (2010) Long term electric load forecasting based on particle swarm
optimization. Appl Energy 87(1):320–326
23. Meissner M, Schmuker M, Schneider G (2006) Optimized particle swarm optimization (OPSO)
and its application to artificial neural network training. BMC Bioinf 7(1):125
24. Freitag S, Muhanna RL, Graf W (2012) A particle swarm optimization approach for training
artificial neural networks with uncertain data. In: Proceedings of the 5th international
conference on reliable engineering computing, Litera, Brno
25. Tripathy AK et al (2011) Weather forecasting using ANN and PSO. Int J. Sci Eng Res 2:1–5
26. Shayeghi H, Shayanfar H, Azimi G (2009) STLF based on optimized neural network using
PSO. Int J Electr Comput Eng 4(10):1190–1199
27. Sarhani M, El Afia A (2015) Electric load forecasting using hybrid machine learning approach
incorporating feature selection. In: BDCA
Object Detection in Surveillance Using
Deep Learning Methods: A Comparative
Analysis

Dharmender Saini, Narina Thakur, Rachna Jain, Preeti Nagrath,


Hemanth Jude, and Nitika Sharma

Abstract Unmanned aerial vehicle (UAV) technology has revolutionized the field
globally in today’s scenario. The UAV technologies enabled the activities to be
efficiently monitored, identified, and analyzed. The principal constraints of the
present surveillance system, along with closed-circuit television (CCTV) cameras,
are limited surveillance coverage area and high latency in object detection. Deep
learning embedded with UAVs has been found to be effective in the tracking and moni-
toring of objects, thus overcoming the constraints mentioned above. Dynamic surveil-
lance systems in the current scenario seek high-speed streaming, and object detection
in real-time visual data has become a challenge over a reasonable time delay. The
paper draws a comprehensive analysis of object detection deep learning architec-
tures by classifying the research based on architecture, techniques, applications, and
datasets. It has been found that RetinaNet is highly accurate while YOLOv3 is fast.

Keywords Object detection · Convolutional neural network · Surveillance

1 Introduction

A smart city is incomplete without the incorporation of an effective surveillance system


(such as UAVs) based on Internet of things (IoT) that provides an impeccable anal-
ysis of captured videos and images [1]. A strenuous examination is required to
deploy smart autonomous surveillance systems which become challenging with
long-established object detection methods, built on slender trainable architectures.
Thus, IoT-based monitoring can be accomplished through object detection.

D. Saini (B) · N. Thakur · R. Jain · P. Nagrath · N. Sharma


Department of Computer Science Engineering, Bharati Vidyapeeth’s College of Engineering,
New Delhi, India
e-mail: mihirini@is.ruh.ac.lk
H. Jude
Department of ECE, Karunya University, Coimbatore, Tamil Nadu, India


For detection and monitoring of objects in real time, high-level, large-scale data
is retrieved, localized, and classified. Object detection [2], hence, supplies important
facts about the acknowledged visual data for the logical understanding.
The detection and tracking of objects are ubiquitous and find their place in many
prevalent applications that include surveillance through human behavior analysis
[3], driverless cars, medical diagnosis [4], pose estimation, handwriting estimation,
visual object detection, large-scale object detection, and traffic surveillance [5–7].
The field of object detection holds a huge potential for research to make the most effi-
cient learning algorithms. The learning models are trained and tested on the labeled
dataset. The applied algorithm must perform in real-time situations proficiently and
efficiently, particularly in safety-critical fields.
The issues experienced in the identification of objects, such as diverse lighting
conditions, occlusion [8], varying viewpoints, and poses, have opened a fresh
research window for the building of systems that could effectively perform object
identification and localization tasks. Therefore, the task for state-of-the-art research
is not just restricted to detection and tracking but also to meet the abovementioned
challenges.
The full paper is organized in the following manner. Section 2 in this paper
discusses the related research work on object detection. Section 3 elaborates on
the existing architectures of object detection and draws a performance comparison.
Section 4 discusses challenges to object detection. Section 5 concludes the work
along with the future scope of the research.

2 A Brief Overview in Object Detection on Deep Learning

One of the most representative deep learning models is the CNN [9], also termed
ConvNet. It is a non-recurrent, feed-forward type of artificial neural network
employed to recognize visual data. Some of the traditional CNN architectures include
LeNet-5, ZFNet [10], VGG16 [11], and AlexNet [12], while modern approaches
include Inception, ResNet, and DenseNet. Convolution is a mathematical operation
defined as a sum of products; successive convolutional layers operate on the outputs of
previous layers and are interleaved with average-pooling or max-pooling
layers, as depicted in Fig. 1. Each 3D matrix in the network is called a
feature map. Filtering and pooling transformations are applied to these feature
maps to extract robust features. LeNet 5 [13] was developed for postal services to
recognize handwritten digits of zip codes. AlexNet [12] is another architecture that
uses large kernel sized filters 11 × 11 in the first convolutional layer and 5 × 5 in
the second layer. The architecture is trained on the ImageNet dataset.
These large-sized non-uniform filters are replaced in VGG16 architecture that
has 3 × 3 uniform filters through 16 convolutional layers [11]. VGG16 architecture
has also been trained on ImageNet, and the model performance is very high with an
accuracy value of 92.7%. Modern technique based on CNN which includes Inception
architecture developed by Google, also referred to as GoogLeNet, contains Inception

Fig. 1 Convolutional neural network architecture

Table 1 Description of traditional convolutional neural network architectures


Paper Architectures Description
Yann LeCunn (1998) [13] LeNet Deep learning-based CNN architecture is
presented for handwriting recognition
Krizhevsky (2012) [12] AlexNet Large kernel 11 × 11 sized filter
convolution layers are used
Zeiler (2013) [10] ZFNet Filter layer’s size is reduced to 7 × 7 to
observe features in the pixel domain and
reduced error rates
Simonyan (2014) [11] VGG16 16 uniform convolution layers used with 3 × 3 uniform filters
Kaiming (2015) [14] ResNet Vanishing gradient problem addressed
through layer skipping method
Szegedy(2015) GoogLeNet (Inception) Convolution of feature maps is
performed on three different scales

cells that perform convolution in series at different scales. It uses global pooling at
the end of the network.
Deep neural networks face the concern of vanishing gradient due to the stacking
of several layers and backpropagation to the previous layers. This concern is
addressed and resolved in Residual Network (ResNet) by skipping one or more
layers, thereby creating shortcut connections in the convolutional network, while
DenseNet presented another approach by connecting each layer with the input layer
through shortcut connections. Table 1 draws a comparison among the discussed
architectures.
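To make the layer stacking and the ResNet-style shortcut described above concrete, the following minimal sketch (an illustrative assumption using PyTorch, not code from the surveyed papers) builds a small convolutional block with 3 × 3 filters, max pooling, and one residual skip connection.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut connection (ResNet style)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # shortcut path
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + identity)  # skip connection eases gradient flow

# A minimal CNN: convolution + pooling followed by one residual block
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # feature maps are downsampled
    ResidualBlock(16),
)
features = model(torch.randn(1, 3, 224, 224))   # -> torch.Size([1, 16, 112, 112])

The skip connection is what lets gradients propagate past stacked layers, which is the vanishing-gradient remedy mentioned above.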

3 Object Detection

Object detection follows a series of procedures, starting with region selection, where the
object is located by creating bounding boxes around the detected objects. This is also
referred to as region-based object detection. Following that, certain visual features
from selected objects recognized by SIFT [15] or HOG [16] algorithms are extracted

Fig. 2 Types of object detection: region-based object detection and regression-based object detection

and classified to make the data hierarchical, followed by predicting the data to extract
logical information using SVM [17] or K-means classifiers. The object detection
architectures have been segregated based on localization techniques and classification
techniques. However, deep learning techniques are based on simultaneous detection
and classification referred to as regression techniques. Therefore, object detection is
broadly categorized as region-based and regression-based object detection, as shown in Fig. 2.
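As a point of contrast with the deep learning detectors discussed next, the snippet below sketches the classical pipeline just described: hand-crafted HOG features extracted from candidate windows and classified with an SVM. It is an illustrative sketch using scikit-image and scikit-learn (library choices assumed here, not taken from the paper), with randomly generated windows standing in for real region proposals.

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window):
    # Hand-crafted gradient-orientation histogram descriptor [16]
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# Toy training data: 64x64 grayscale windows labelled object (1) / background (0)
rng = np.random.default_rng(0)
windows = rng.random((40, 64, 64))
labels = rng.integers(0, 2, size=40)

X = np.array([hog_features(w) for w in windows])
clf = LinearSVC().fit(X, labels)          # classification stage of the pipeline

# At detection time, every candidate region is scored by the same classifier
candidate = rng.random((64, 64))
print("object" if clf.predict([hog_features(candidate)])[0] == 1 else "background")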

3.1 Region-Based Object Detection

This section discusses various region-based object detection methods.

3.1.1 Region-Based CNN (RCNN)

RCNN uses selective search to generate exactly 2000 region proposals, followed
by classification via CNN. These region proposals are refined using regression
techniques. But the assortment of 2000 regions makes the computation slow. The
architecture is as shown in Fig. 3.

Fig. 3 Region-based CNN architecture [18]



3.1.2 Fast RCNN

Fast region CNN [19] is open source and implemented in Python and C++ [20]. It
is more accurate with a higher mean average precision (MAP) and nine times faster
than RCNN since it uses single-stage training rather than the three-stage training used
in RCNN. The training in fast region CNN updates all network layers concurrently.

3.1.3 Faster RCNN

This architecture employs a combination of region proposal network (RPN) and fast
RCNN detector [21]. RPN is a fully convolutional network that predicts the
bounding region and uses these proposals to detect objects and predict scores. RPN
is based on the ‘attention’ mechanism and shares it with fast RCNN to locate the
objects. Fast RCNN detector uses the proposed regions for classification. Both the
accuracy and quality of region proposals are improved in this method. Figure 4 depicts
the faster RCNN architecture. The comparison of region-based object detection has
been given in Table 2.

Fig. 4 Faster RCNN architecture [21]

Table 2 Comparison of region-based object detection methods

Parameters             RCNN    Fast RCNN    Faster RCNN
Test time/image (s)    50      2            0.2
Speedup                1x      25x          250x
mAP (VOC 2007)         66.0    66.9         66.9
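For readers who want to try a region-based detector directly, the sketch below loads a pre-trained Faster R-CNN from torchvision and runs it on one image. The specific constructor and pretrained weights are an assumption about the torchvision package, not part of the original papers; the returned dictionary of boxes, labels, and scores is the library's standard output.

import torch
import torchvision

# Pre-trained Faster R-CNN with ResNet-50 + FPN backbone (torchvision model zoo).
# Older torchvision versions use pretrained=True instead of weights="DEFAULT".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)            # placeholder for a real RGB image in [0, 1]
with torch.no_grad():
    prediction = model([image])[0]         # RPN proposals + classification in one call

keep = prediction["scores"] > 0.5          # keep confident detections only
print(prediction["boxes"][keep])           # bounding boxes in (x1, y1, x2, y2) format
print(prediction["labels"][keep])          # COCO class indices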

Fig. 5 YOLO architecture [22]

3.2 Regression-Based Object Detection

This section elaborates regression-based object detection methods.

3.2.1 Yolo

You Only Look Once (YOLO), shown in Fig. 5, is a faster object detection method. This regression approach predicts region proposals and class probabilities in a single evaluation by one neural network. The basic YOLO processes images in real time at 45 frames per second (fps) and streams video with a latency of less than 25 ms. The mean average precision obtained by this method is more than twice that of other real-time detectors [22]. It performs better than prior detection methods but lacks in the detection of small objects.

3.2.2 Single Shot Multibox Detector (SSD)

SSD is also based on VGG-16 architecture [23], as shown in Fig. 6. It is a regression-


based technique that generates the object score at the prediction stage. It is faster than
YOLO, and its accuracy is comparable to faster RCNN. It has a region-based fixed
number of default bounding boxes for different aspect ratios and object categories
followed by a prediction of shape offsets and confidence scores.

Fig. 6 Single shot multibox detector (SSD) [23]

3.2.3 YOLOv2

YOLOv2 is an improved architecture of the previous YOLO version built on the


DarkNet-19 CNN model. It is further improved to an additional version YOLO9000,
proposed to detect over 9000 categories of objects. YOLOv2 achieves 76.8% mAP
on PASCAL VOC 2007 at 67 fps and 78.6% mAP at 40 fps [24].

3.2.4 RetinaNET

The extreme foreground–background class imbalance problem found in a one-stage


detector [25] is regarded as one of the major concerns in performance degradation. It is
resolved by RetinaNet, developed by Facebook AI Research (FAIR). The prediction
is improved by using focal loss, which down-weights the loss contributed by “easy” negative samples
so that training focuses on the “hard” samples. RetinaNet is proficient in feature extraction formed with the
combination of ResNet and FPN and outperforms faster RCNN. In this architecture,
feature maps are pyramidically stacked as shown in Fig. 7.

Fig. 7 RetinaNet architecture with pyramid stacked feature map layers [25]
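The focal loss idea is easy to state in code. The snippet below is a minimal PyTorch sketch of the binary focal loss FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) used by RetinaNet [25]; the defaults gamma = 2 and alpha = 0.25 follow the cited paper, but the function itself is an illustrative re-implementation, not the authors' code.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so training
    concentrates on hard ones (RetinaNet-style, per anchor)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])        # raw anchor scores
targets = torch.tensor([1.0, 0.0, 1.0])        # 1 = object, 0 = background
print(focal_loss(logits, targets))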

Fig. 8 YOLOv3 architecture [26]

3.2.5 YOLOv3

YOLOv3 [26] is built on the DarkNet-53 CNN model. This method uses logistic
regression for each bounding box to predict an objectness score. The score is “1” if
the bounding box overlaps the ground-truth object boundary more than any other
bounding box does. The prediction is ignored if a bounding box that is not the best overlaps the
object boundary by more than a threshold value. YOLOv3 uses independent logistic
classifiers for multi-label classification. This method is three times faster than SSD
and equally accurate. Unlike previous versions of YOLO, this method is also capable
of detecting small objects. YOLOv3 architecture is illustrated in Fig. 8.
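A common way to deploy YOLOv3 for surveillance streams is through the OpenCV DNN module, which reads the original DarkNet configuration and weight files. The sketch below assumes those files (yolov3.cfg, yolov3.weights) and a sample frame are available locally; the file names and thresholds are assumptions for illustration, while the cv2.dnn calls are standard OpenCV API.

import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")   # assumed local files
frame = cv2.imread("frame.jpg")                                     # one surveillance frame
h, w = frame.shape[:2]

blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())           # the three YOLO scales

boxes, confidences = [], []
for output in outputs:
    for det in output:                 # det = [cx, cy, bw, bh, objectness, class scores...]
        scores = det[5:]
        confidence = float(det[4]) * float(scores.max())
        if confidence > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)

keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)    # non-maximum suppression
print("detections kept:", len(keep))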
Table 3 embodies the models capable of region proposal and classification of
detected objects. It also highlights the constraint in terms of involved high computa-
tion time. From the table, a clear observation is drawn that the methods comprising
a selection of object regions and simultaneous classification are faster.
Table 4 is a detailed analysis of object detection methods describing backbone
CNN architecture, trained on MS COCO, PASCAL VOC 2007, and PASCAL VOC
2012 datasets. The mean average precision (mAP) values are also compared; the maximum, about 57.9% on the MS COCO dataset, is reported for YOLOv3, which is faster but less accurate than RetinaNet [26].

Table 3 Object detection method with constraint review


Object detection technique    High computation time    Region proposal    Regression/classification detection
CNN ✓ ✓ ✗
RCNN ✓ ✓ ✗
Fast RCNN ✓ ✓ ✗
Faster RCNN ✗ ✓ ✗
SSD ✗ ✓ ✓
YOLO ✗ ✓ ✓
YOLOv2 ✗ ✓ ✓
RetinaNet ✗ ✓ ✓
YOLOv3 ✗ ✓ ✓

4 Challenges in Object Detection

Several difficulties emerge in identifying the objects that exist in an image. This
section discusses the impediments experienced while dealing with the standard
datasets, which limit the achievable performance.
Occlusion: The major problem in object detection is occlusion [8, 27, 28]. Occlusion
is the effect of one object blocking another object from the view in 3D images. CNN
framework is not capable of handling occlusions. To deal with complex occlusions in
pedestrian images, the deep learning framework DeepParts is proposed [29].
Viewpoint variation: Severe distortions occur due to degree variation in viewpoints
of the image. The classification of objects becomes difficult on varied angles and has
a direct impact on accuracy in predicting the object [6].
Variation in poses: Variation in facial expression and poses makes it difficult for
the algorithm to detect the faces. To address occlusions and pose variations, a novel
framework based on deep learning is proposed in [28] which collects the responses
from local facial features and predicts faces.
Lighting conditions: Another big challenge in detecting objects is the lighting
conditions that may vary throughout the day. Different approaches are followed by
researchers to tackle varying lighting conditions in the daytime and nighttime traffic
conditions [7].
Table 4 Dataset and model-based object detection method review with mean average precision values

Region-based methods:
RCNN (model: VGG16; dataset: PASCAL VOC 2007; mAP 66.0%). The main drawback of the CNN [18] method is the exhaustive running time involved in identifying the number of regions; this is resolved in the RCNN method by restricting the number of region proposals to 2000.
Fast RCNN (model: VGG16; datasets: MS COCO, mAP 20.5%; PASCAL VOC 2007 + PASCAL VOC 2012, mAP 70.0%). Fast RCNN [19] reduces running time further by performing convolution once to generate a feature map.
Faster RCNN (model: VGG16; datasets: MS COCO, mAP 18.9%; PASCAL VOC 2007, mAP 73.2%). A region proposal network (RPN) is used by faster RCNN [21] to reduce running time.

Regression-based methods:
SSD (model: ResNet-101; dataset: PASCAL VOC 2007 + PASCAL VOC 2012 + MS COCO; mAP 81.6%). The single shot multibox detector (SSD) [23] is relatively simple and encapsulates object localization and classification in a single network.
YOLO (model: network inspired by GoogLeNet; dataset: PASCAL VOC 2007 + PASCAL VOC 2012; mAP 63.4%). YOLO [22] takes advantage of predicting class probabilities along with region proposals. Its main limitation is in detecting small objects.
YOLOv2 (model: DarkNet-19; datasets: MS COCO, mAP 21.6%; PASCAL VOC 2007 + PASCAL VOC 2012, mAP 78.6%). YOLOv2 [24] is an improved version of YOLO in which performance is enhanced by incorporating batch normalization and a high-resolution classifier. YOLO9000 is an extended version of YOLOv2 capable of classifying 9000 classes.
RetinaNet (model: ResNet + FPN; dataset: MS COCO; mAP 37.8%). RetinaNet [25] uses a feature pyramid architecture in a one-stage detector.
YOLOv3 (model: DarkNet-53; dataset: MS COCO; mAP 57.9%). YOLOv3 [26] includes residual skip connections and upsampling features, which are missing in YOLOv2, and is capable of detecting small objects.

5 Conclusion and Future Scope

Various object detection architectures are compared based on the training datasets,
and the performance measures are analyzed in this research. The comparison
focused on recognizing the most appropriate methods that could be used for surveil-
lance, requiring real-time data extraction with the least latency and maximum
accuracy. The analysis shows that object detection methods performing region proposal detection and classification simultaneously reduce computation time and are therefore faster than traditional methods. The study highlights the fact that there is
always a trade-off between speed and accuracy. SSD provides maximum preci-
sion; however, with minimal latency, YOLOv3 outperforms all other object detec-
tion techniques. Using an unmanned aerial vehicle (UAV), YOLOv3 can be used to
produce highly responsive smart systems for live streaming videos and images in a
surveillance system over the Internet.

Acknowledgements This work is supported by the grant from Department of Science and Tech-
nology, Government of India, against CFP launched under Interdisciplinary Cyber-Physical Systems
(ICPS) Programme, DST/ICPS/CPS-Individual/ 2018/181(G).

References

1. Minoli D, Sohraby K, Occhiogrosso B (2017) IoT Considerations, requirements, and archi-


tectures for smart buildings—energy optimization and next generation building management
systems. IEEE Internet Things J 4(1):1–1
2. Schmid C, Jurie F, Fevrier L, Ferrari V (2008) Groups of Adjacent Contour Segments for
Object Detection. IEEE Trans Pattern Anal Mach Intell 30(1):36–51
3. Singh A, Patil D, Omkar SN (2018) Eye in the sky: real-time drone surveillance system
(DSS) for violent individuals identification using scatternet hybrid deep learning network.
IEEE Comput Soc Conf Comput Vis Pattern Recognit Work 2018:1710–1718
4. Jain R, Jain N, Aggarwal A, Hemanth DJ (2019) Convolutional neural network based
Alzheimer’s disease classification from magnetic resonance brain images. Cogn Syst Res
57:147–159
5. Hu Q, Paisitkriangkrai S, Shen C, van den Hengel A, Porikli F (2016) Fast detection of multiple
objects in traffic scenes With a Common detection framework. IEEE Trans Intell Transp Syst
17(4):1002–1014
6. Hayat S, Yanmaz E, Muzaffar R (2016) Survey on unmanned aerial vehicle networks for civil
applications: a communications viewpoint. IEEE Commun Surv Tutorials 18(4):2624–2661
7. Tian B, Li Y, Li B, Wen D (2014) Rear-view vehicle detection and tracking by combining
multiple parts for complex Urban surveillance. IEEE Trans Intell Transp Syst 15(2):597–606
8. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware R-CNN: detecting pedestrians in
a Crowd. In lecture notes in Computer science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics), pp 657–674
9. Westlake N, Cai H, Hall P (2016) Detecting people in artwork with CNNs. Lecture notes
in Computer Science (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol
9913 LNCS, pp 825–841
10. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In Lecture
notes in Computer science (including subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol 8689 LNCS, no. PART 1, pp 818–833

11. Simonyan K, Zisserman A (2014) Very deep Convolutional networks for large-scale image
recognition, pp 1–14
12. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet Classification with Deep Convolutional
Neural Networks. In: Proceedings of the 25th international Conference on neural information
processing systems, Vol 1, pp 1097--1105
13. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. proc IEEE
14. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proc IEEE
Comput Soc Conf Comput Vis Pattern Recognit 2016: 770–778
15. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
16. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: Proceed-
ings—2005 IEEE Computer society Conference on Computer Vision and Pattern Recognition,
CVPR 2005
17. Kyrkou C, Theocharides T (2009) SCoPE: Towards a systolic array for SVM object detection.
IEEE Embed Syst Lett 1(2):46–49
18. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature Hierarchies for accurate object
detection and semantic segmentation. IEEE Conf Comput Vis Pattern Recogn 2014:580–587
19. Girshick R (2015) Fast R-CNN. In 2015 IEEE International Conference on Computer Vision
(ICCV), vol 2015 Inter, pp 1440–1448
20. Jia Y et al (2014) Caffe: Convolutional architecture for fast feature embedding,” MM 2014.
Proc 2014 ACM Conf Multimed 675–678
21. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with
region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
22. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time
object detection. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2016:779–788
23. Liu W, et al (2016) SSD: Single Shot MultiBox Detector,” in Lecture notes in computer
science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol 9905 LNCS, pp 21–37
24. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: 2017 IEEE conference on
Computer Vision and Pattern Recognition (CVPR), 2017, vol 2017, pp 6517–6525
25. Lin TY, Goyal P, Girshick R, He K, Dollar P (2017) Focal loss for dense object detection. Proc
IEEE Int Conf Comput Vis 2017: 2999–3007
26. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement, Apr 2018
27. Yao L, Wang B (2019) Pedestrian detection framework based on magnetic regional regression.
IET Image Process
28. Yang S, Luo P, Loy CC, Tang X (2015) From facial parts responses to face detection: a deep
learning approach. Proc IEEE Int Conf Comput Vis vol 2015 Inter, no 3, pp 3676–3684
29. Mathias M, Benenson R, Timofte R, Van Gool L (2013) Handling occlusions with franken-
classifiers. Proc IEEE Int Conf Comput Vis pp 1505–1512
MaSMT4: The AGR Organizational
Model-Based Multi-Agent System
Development Framework for Machine
Translation

Budditha Hettige, Asoka S. Karunananda, and George Rzevski

Abstract A framework is an essential tool for agent-based programming that saves


a programmer’s time and provides development standards. There are few multi-
agent frameworks and models available for agent-based programming. The AGR
organizational model is a successful agent development model that builds artificial
societies through agents. MaDKit is one of the successful frameworks that use the
AGR organizational model. However, the English to Sinhala agent-based machine
translation system needs a lightweight framework and the fastest message-passing
capabilities. These features are currently not available on the existing frameworks at a
sufficient level. Thus, the Java-based multi-agent system development framework has
been developed using the AGR organizational model with the required modifications.
This paper presents a multi-agent system development framework, MaSMT, which is
specially designed to handle English to Sinhala agent-based machine translation. The
MaSMT has been developed using the AGR organizational model that provides an
infrastructure of the agents, communication methods for agents, agent status control-
ling, and a tool for agent monitoring. Different types of multi-agent systems have
already been developed through the MaSMT framework, including Octopus, AgriCom,
and RiceMart. The framework is freely available and can be downloaded from the
source forge.

Keywords MaSMT · Multi-agent systems · AGR organizational model

B. Hettige (B) · A. S. Karunananda · G. Rzevski


Department of Computational Mathematics, Faculty of Information Technology, University of
Moratuwa, Moratuwa, Sri Lanka
e-mail: budditha@kdu.ac.lk
A. S. Karunananda
e-mail: asokakaru@uom.lk
G. Rzevski
e-mail: rzevski@gmail.com


1 Introduction

Multi-agent systems (MASs) are computerized systems composed of multiple inter-


acting intelligent agents [1]. A multi-agent system also consists of two or more agents
capable of communicating with each other in a shared environment. MAS consists of
four major components, namely agents, environment, ontology, and the virtual world
[2]. An agent in a multi-agent system may be a computer application or indepen-
dently running process (a thread) capable of doing actions. Theoretically, agents are
capable of acting independently (autonomously) and controlling their internal states
according to the requirements. Agents communicate with other agents through the
messages. These messages consist of information for agent activities (agents do
their tasks according to the messages they receive from others). The environment
is the “outside world” with which the agent interacts. In most cases,
environments are implemented within a computer. The ontology can be defined as
an explicit specification of conceptualization. Ontologies capture the structure of the
domain. Thus, the capabilities of the agents in a multi-agent system are based on the ontology.
According to the activities, behavior, and existing features, agents can be cate-
gorized in different types including simple reflex agent, model-based reflex agent,
goal-based agent, utility-based agent, and learning agents [3]. However, based on
agents’ behaviors, various classifications are also available including, collaborative
agent, interface agent, mobile agent, information or Internet agent, reactive agent,
hybrid agent, and smart agent. These agents show different capabilities and behaviors
to work together and that makes the artificial society.
Note that communication among agents is necessary to allow collaboration,
negotiation, and cooperation between independent entities [4]. Thus, it should
require a well-defined, agreed, and commonly understood semantics. Further, agents
can communicate with each other with the peer-to-peer, broadcast, or noticeboard
method.
In peer-to-peer agent communication, agents send messages only for known
peer(s). It means that there is an agreement between the sender and the receiver.
The broadcasting method sends the message(s) for all. According to this method,
everyone in the group (sometimes all agents in the system) takes the message.
Further, the noticeboard method works differently from the above. In the noticeboard method, agents post messages to a shared location (the noticeboard); if an agent requires some information, it takes that information from the noticeboard.
Further, multi-agent systems have some advantages [5]. A multi-agent system
consists of interconnected agents that distribute computation instead of relying on
a central resource. MAS provides interconnection and interoperation of multiple
systems and is capable of globally coordinating distributed information
from many sources efficiently.
However, there are some limitations; a MAS needs some time to arrive at solutions
and is required to pass messages among agents. The parallelism of the multi-agent

system and avoiding unnecessary message passing can help to deal with the above
limitations.
Note that multi-agent systems are also used to handle the complexity of a software
system and to provide intelligent solutions through the power of agent communication.
Thus, the development of multi-agent systems is itself a complicated
process. Given such complexity, selecting a suitable framework is far more effective
than ad hoc development. In general, a multi-agent framework provides
agent infrastructure, communication, and monitoring methods for agents. In addition, common standards are available for agent development, especially for communication among agents, including FIPA-ACL [6, 7] and KQML [8]. Among others, FIPA is one of the most widely used standards for agent development.
The development of a multi-agent system from scratch is not an easy task; it requires modeling agents, communication, and control according to some standards.
Therefore, most of the multi-agent system developers give attention to an existing
framework to build their MAS solutions easily. A number of well-developed multi-agent frameworks are available for multi-agent system development, including JADE [9], MaDKit [10], PADE [11], SPADE [12], AgentBuilder [13], and ADK [14].
Further, most of the existing multi-agent development frameworks are uniquely
designed to develop general-purpose, fully distributed multi-agent applications.
However, these existing frameworks do not directly support the distinct requirements of agent-based English to Sinhala machine translation. The English to Sinhala machine translation system requires several natural language processing activities, including morphological processing, syntax processing, and semantic processing. A new multi-agent development framework has therefore been designed and developed, incorporating the required features: the capability to work with a large number of agents efficiently, the ability to send and receive several messages quickly, and the ability to customize agents easily for local language (Sinhala) processing requirements. Multi-agent system for machine translation (MaSMT) was
released in 2016 to provide the capability for AGR-based agent development. This
paper reports the latest version, MaSMT 4.0, which consists of new features,
including email-based message passing and the ability to work with customized
behaviors.
The rest of the paper is organized as follows. Section 2 presents a
summary on existing multi-agent frameworks. Section 3 comments on the AGR
(Agent/Group/Role) model and infrastructure of the agents. Section 4 presents the
design of the MaSMT, including agents, communications, and monitoring features
of the framework. Section 5 gives some details of the developed multi-agent systems
developed through the MaSMT. Finally, Sect. 6 concludes the paper along with the
future scope.

2 Related Works

There are several multi-agent system development frameworks available for different
requirements. This section briefly describes some existing multi-agent system devel-
opment frameworks with their features. Java Agent Development Framework (JADE)
[9] is a Java-based open-source software framework for MAS development. JADE
provides middle-ware software support with GUI tools for debugging and deploy-
ment. Further, JADE provides task execution and composition model for agent
modeling, and peer-to-peer agent communication has been done with asynchronous
message passing. In addition to the above, JADE consists of the following key
features: JADE provides a FIPA-compliant distributed agent platform, multiple direc-
tory facilitators (change agents active at run time), and messages are transferred
encoded as Java objects.
MaDKit [10] is a Java-based generic multi-agent platform based on the Aalaadin
conceptual model [16, 17]. This organizational model consists of groups and roles for
agents to manage different agent activities. MaDKit also provides a lightweight Java
library for MAS design. The architecture of MaDKit is based on three design prin-
ciples, such as micro-kernel architecture, agentification of services, and the graphic
component model. Also, MaDKit provides asynchronous message passing. Further, it
can be used for designing any multi-agent applications, from distributed applications
to multi-agent simulations.
PADE is a free, entirely Python-based multi-agent development framework to
develop, execute, and manage multi-agent systems in distributed computing environ-
ments [11]. PADE uses libraries from twisted project to allow communication among
the network nodes. This framework supports multiplatform, including embedded
hardware that runs Linux. Besides, PADE consists of some essential functionalities.
PADE agents and their behaviors have been built using object-orientation concepts.
PADE is capable of handling messages in the FIPA-ACL standard and support cyclic
and timed behaviors.
A smart Python multi-agent development environment (SPADE) is another frame-
work for multi-agent system development [12]. This framework provides a new
platform aimed at solving the drawbacks of the communication models from other
platforms. SPADE includes several features: the SPADE agent platform is based on XMPP, its agent model is based on behaviors, it supports FIPA metadata using XMPP data forms, and it provides a web-based interface for agent control.
In addition to the above popular frameworks, AgentBuilder [13], Agent Develop-
ment Kit (ADK) [14], Jason [15] and Shell for Simulated Agent Systems (SeSAm)
[18] are other widely used multi-agent development frameworks.
Table 1 gives a brief description of the selected multi-agent system develop-
ment frameworks and their main features. With this theoretical and application
base, MaSMT was developed through the AGR organizational model. The next
section briefly reports the AGR model and its architecture for multi-agent system
development.

Table 1 Summary of the existing frameworks


System Type Platform Features
JADE Open source Java Asynchronous message passing
MaDKit Open source Java Asynchronous message passing
PADE Free Python Supports FIPA
SPADE Free Python Supports FIPA metadata using XMPP
data
AgentBuilder Open source Java Quickly and easily build intelligent
agent-based applications with MaS
knowledge
Jason Open source Java Speech-act-based inter-agent
communication
SeSAm Open source Programming shell GUI-based agent modeling
ADK Open source Mobile-based Large-scale distributed solutions

3 AGR Organizational Model and MaSMT Architecture

This AGR organizational model was designed initially under the Aalaadin model,
which consists of agents, groups, and roles. Figure 1 shows the UML-based Aalaadin
model [15] for multi-agent system development. According to the model, each agent
is a member of one or more groups and a group contains one or more roles. The agent
should be capable of handling those roles according to the agents’ requirements. This
model is used by the MaDKit system [16] by allowing free overlapping agents among
groups.
The MaSMT model is almost the same as the above model but removes a freely
overlapping feature of the group and role at the same time. It means the agent may belong to one or more groups and hold one or more roles; however, only one group and role are active at a time. Thus, the agent acts according to this active group and
role. Note that, agents are only active communicating entities capable of playing roles
within groups. Therefore, MaDKit provides the freedom for agent designers to design
appropriate internal models for agents. With this idea, the MaSMT agent is designed
considering the three-level architecture that consists of a root agent, controlling
agents, and ordinary agents. The root represents the top-level in the hierarchy and

Fig. 1 UML-based Aalaadin


model for multi-agent
system development

Fig. 2 Agents’ architecture on MaSMT

contains several controlling agents (managers). Each controlling agent consists of


any number of ordinary agents. In addition to that, agents can be clustered according
to their group and role. This layered model allows building agents’ swarms quickly.
Figure 2 shows the agent diagram of the three-level architecture on MaSMT.
With this model, an ordinary agent can communicate with its swarm, as well as
its controller agent. The controller agent should be capable of communicating with and fully
controlling its ordinary agents. Controllers can communicate with other controllers,
and the root can handle all agents in the system. With this model, MaSMT allows for
passing messages through the noticeboard method, peer-to-peer communication, or
broadcast method.

4 MaSMT

MaSMT is an open-source multi-agent development framework, developed using


Java that enables cross-platform capabilities. The framework provides agents and
their communication facilities to develop multi-agent systems easily. This section briefly describes the architecture of MaSMT.

4.1 Agent’s Model

This model comprises three types of agents, namely the ordinary agent, the controller
agent (previously called the manager), and the root agent. The MaSMT ordinary agents
perform actions, while the other two types are used to control them. A controller agent consists of
several MaSMT agents. Hierarchically, the root is capable of handling a set of controller

agents. Using this three-layer agent model, agents can easily be clustered and modeled to build swarms of agents.

4.2 Abstract Model of the MaSMT Agent

The AbstractAgent model is used to identify agents through their group-role-id. The identifier for a particular agent can be generated using its group, role, and relevant id. Also, the form role(dot)id@group can be used to locate agents quickly. For example, in read_words.101@ensimas.com, read_words is the role, 101 is the id, and ensimas.com is the group.
This AbstractAgent model is used by the MaSMT to handle all the agent-based
activities that are available in the MaSMT agents, MaSMT controllers, and MaSMT
root agent.
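MaSMT itself is a Java framework, but the role(dot)id@group identifier scheme can be illustrated with a short, language-neutral sketch. The Python snippet below is only an illustration of the naming convention described above and is not part of the MaSMT API.

def make_agent_id(role: str, agent_id: int, group: str) -> str:
    """Compose an identifier of the form role.id@group, e.g. read_words.101@ensimas.com."""
    return f"{role}.{agent_id}@{group}"

def parse_agent_id(identifier: str):
    """Split an identifier back into its (role, id, group) parts."""
    local, group = identifier.split("@", 1)
    role, agent_id = local.rsplit(".", 1)
    return role, int(agent_id), group

print(make_agent_id("read_words", 101, "ensimas.com"))    # read_words.101@ensimas.com
print(parse_agent_id("read_words.101@ensimas.com"))       # ('read_words', 101, 'ensimas.com')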

4.3 MaSMT Agent

MaSMT agents are the active agents in the framework, which provides the agents' infrastructure for agent development. The modular architecture of the MaSMT agent consists of several built-in features, including a noticeboard reader and an environment controller, and is shown in Fig. 3. The noticeboard

Fig. 3 A modular view of the MaSMT agent



Fig. 4 Life cycle of the MaSMT agents

reader provides access to the noticeboard, and the environment controller can directly access the environment. Also, through the status monitor, the agent's controller can see the status of the MaSMT agent.

4.4 MaSMT Agent’s Life Cycle

Agents in the MaSMT system follow a life cycle (the status of the agent) with three stages, namely active, live, and end. When a new agent is initiated, it directly starts the “active” section (which usually works as a one-step behavior for the agent). Then the agent moves to its live section (the working section of the agent, which usually works as a cyclic behavior). The MaSMT agent leaves the live section when the live property of the agent becomes false. According to the requirements, an agent may wait until a specific time or until a new message arrives in its in-queue. Figure 4 shows the life cycle of the MaSMT agent.
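The active/live/end life cycle can be summarised with a small sketch. The Python class below is an illustrative model of the behaviour described above (one-step activation, cyclic live section, exit when the live flag becomes false); it is not MaSMT code, which is written in Java.

class SketchAgent:
    """Illustrative agent life cycle: active once, live cyclically, then end."""
    def __init__(self):
        self.live_flag = True
        self.in_queue = ["msg-1", "msg-2"]    # pretend messages from other agents

    def active(self):
        print("active: one-step initialisation behaviour")

    def live(self):
        # Cyclic behaviour: process one message per cycle, stop when the queue is empty
        if self.in_queue:
            print("live: handling", self.in_queue.pop(0))
        else:
            self.live_flag = False            # leaving the live section

    def end(self):
        print("end: releasing resources")

    def run(self):
        self.active()
        while self.live_flag:
            self.live()
        self.end()

SketchAgent().run()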

4.5 MaSMT Controller Agent

The MaSMT controller agent is the middle-level controller agent of the MaSMT framework, capable of controlling its clients as required. Figure 5 shows the life cycle of the MaSMT controller agent. The MaSMT controller agent also provides all the features available in the MaSMT agents. In addition to that, the MaSMT controller should be capable of providing message passing, network access, noticeboard access, and environment handling capabilities.

4.6 MaSMT Root Agent

The root agent is the top-level controller agent (MaSMT Manager), which is capable
of handling other MaSMT controller agents. According to its architecture, there is

Fig. 5 Life cycle of the MaSMT controller agents

only one root for the system. Further, the MaSMT root agent is also capable of
communicating with other root agents through the “Net access agent”.

4.7 MaSMT Messages

The MaSMT framework uses messages named MaSMT Messages to provide agent
communication. These MaSMT Messages have been designed using FIPA-ACL
message standards. MaSMT Messages can be used to communicate between
MaSMT agents as well as other agents that support the “FIPA-ACL” message standard.
Table 2 gives the structure of the “MaSMT Message” including data fields
and types. More information on “MaSMT Messages” is provided under the MaSMT
development guide [19].

4.8 MaSMT Message Parsing

MaSMT supports noticeboard concepts as well as peer-to-peer and broadcast


methods for message passing. In addition to that, it allows email-based direct communication among root agents. Further, MaSMT Messages are classified and forwarded according to their message headers. Table 3 describes each message header.

Table 2 Fields, type, and information on MaSMT Messages


Data field Type Description
Sender MaSMT AbstractAgent The sender of the message
ReplyTo MaSMT AbstractAgent The conversation is to be directed to the agent
Receiver MaSMT AbstractAgent The receiver of the message
Message String The subject of the message
Content String The original content of the message
Ontology String Relevant information
Type String Type of the message
Data String Information
Header String Header (use to redirect)
Language String Language of the message
Conversation int Conversation ID

Table 3 Message directives (headers for messages)


Message Header Description
Agents Sends a message to a particular group and role
AgentGroup Sends a message to a particular group
AgentRole Sends a message to a particular role
Local The agent or controller sends a message to members of its swarm that have the given role and group
LocalRole The agent or controller sends a message to members of its swarm that have the given role
RoleOrGroup The agent or controller sends a message to members of its swarm that have the given role or group
Broadcast The agent or controller sends a message to its entire swarm
Root The controller can send a message to the root agent
Controller The agent can send a message to its controller
NoticeBoard The agent or controller sends a message to the noticeboard
MailMessage The agent or controller sends a message as an email message

5 Usage of the MaSMT

MaSMT has been developed as an open-source product and has been available on SourceForge since March 2016. Figure 6 shows the download statistics of the
MaSMT framework.
Using this MaSMT framework, a number of multi-agent systems have been successfully developed. Among them, the English to Sinhala agent-based machine translation system (EnSiMaS) [20] has been completely developed through MaSMT [21]. EnSiMaS uses a hybrid approach to translation [22]. All the natural language processing activities in EnSiMaS are done through agents. Thus, the system

Fig. 6 Usage of the MaSMT framework

consists of eight natural language processing swarms to translate English sentence(s)


into Sinhala through the agents’ support.
Furthermore, various kinds of multi-agent systems have already been developed, including Octopus [23], AgriCom [24], and RiceMart [25]. Octopus provides a multi-agent-based solution for Sinhala chatting [23], while AgriCom and RiceMart provide communication platforms for the agricultural domain. Besides, a multi-agent-based file-sharing application has already been developed for the distributed environment [26]. MaSMT is developed as a Java application; thus, web-based application development capabilities have already been tested through the web-based event planning system [27].

6 Conclusion and Future Work

The multi-agent system development framework can be used as a tool for multi-
agent system development. The MaSMT framework has been designed considering the AGR organizational model that was introduced in the Aalaadin project. According to MaSMT's AGR model, each agent has one group and one role at a time. However, agents can change their role and group at run time. Further, MaSMT also uses a three-layer agent model (root, controller, and agent) to build agents' swarms quickly. Also, MaSMT is capable of communicating among agents using the noticeboard method and peer-to-peer or broadcast message passing. In particular, the MaSMT framework allows email-based message-passing capabilities. A number of multi-agent applications have been successfully developed with MaSMT, including EnSiMaS, Octopus, AgriCom, and RiceMart. The framework is freely available and can be downloaded from SourceForge. Implementing MaSMT for other languages such as Python is one of the further directions of this research.

References

1. Wooldridge M (2009) An introduction to multi agent systems, 2nd edn Wiley


2. Multi Agent Systems—an overview, ScienceDirect Topics https://www.sciencedirect.com/top
ics/chemical-engineering/multi-agent-systems
3. Coelho H (2014) Autonomous agents and multi-agent systems
4. Rzevski R, Skobelev P (2014) Managing complexity. Wit Pr/computational mechanics,
Southampton, Boston
5. Bradshaw JM (1997) An introduction to software agents
6. Labrou Y, Finin T, Peng Y (1999) Agent communication languages: the current landscape
7. FIPA Agent Communication Language Specifications. https://www.fipa.org/repository/acl
specs.html
8. Finin T, Fritzson R, McKay DP, McEntire R (1993) KQML-a language and protocol for
knowledge and information exchange
9. Java Agent DEvelopment Framework. https://jade.tilab.com
10. MaDKit, https://www.madkit.net/madkit
11. Python Agent DEvelopment framework Pade 1.0 documentation. https://pade.readthedocs.io/
en/latest
12. Palanca P Spade: Smart Python agent development environment
13. AgentBuilder https://www.agentbuilder.com
14. Mitrovic D, Ivanovic M, Bordini RH, Badica C (2016) Jason Interpreter, Enterprise Edition,
Informatica (Slovenia), vol 40
15. Xu H, Shatz SM (2003) ADK: an agent development kit based on a formal design model for
multi-agent systems. Autom Softw Eng 10(4):337–365
16. Ferber J, Gutknecht O (1998) A meta-model for the analysis and design of organizations in
multi-agent systems. In: Proceedings international conference on multi agent systems (Cat.
No.98EX160)
17. Gutknecht O, Ferber J (1997) MadKit and organizing heterogeneity with groups in a platform
for multiple multi-agent systems
18. Klügl K, Puppe F (1998) The multi-agent simulation environment SeSAm, in University
Paderborn
19. MaSMT 3.0 Development Guide, ResearchGate. https://www.researchgate.net/publication/
319101813_MaSMT_30_Development_Guide
20. Hettige B, Karunananda AS, Rzevski G (2016) A multi-agent solution for managing complexity
in English to Sinhala machine translation. Int J Des Nat Ecodyn 11(2):88–96
21. Hettige B, Karunananda AS, Rzevski G (2017) Phrase-level English to Sinhala machine trans-
lation with multi-agent approach. In 2017 IEEE international conference on industrial and
information systems (ICIIS), pp 1–6
22. Hettige B, Karunananda AS, Rzevski G (2018) Thinking like humans: a new approach to
machine translation, in Artificial Intelligence, pp 256–268
23. Hettige B, Karunananda AS (2015) Octopus: a multi agent Chatbot
24. Goonatilleke MAST, Jayampath MWG, Hettige B (2019) Rice express: a communication
platform for rice production industry. In Artificial Intelligence, pp 269–277
25. Jayarathna H, Hettige B (2013) AgriCom: a communication platform for agriculture sector. In:
2013 IEEE 8th international Conference on industrial and information systems, pp 439–444
26. Weerasinghe L, Hettige B, Kathriarachchi RPS, Karunananda AS (2017) Resource sharing in
distributed environment using multi-agent technology. Int J Comput Appl 167(5):28–32
27. Samaranayake TD, Pemarathane WPJ, Hettige B (2017) Solution for event-planning using
multi-agent technology. In: 2017 seventeenth international Conference on advances in ICT for
emerging regions (ICTer), pp 1–6
Comparative Study of Optimized
and Robust Fuzzy Controllers for Real
Time Process Control

Ajay B. Patil and R. H. Chile

Abstract In this research, a μ-synthesis D-K iteration based controller and a μ-synthesis based fuzzy controller are implemented, and a comparative analysis is carried out to measure their performance. It is difficult to design the weighting function used
in μ-synthesis based robust control methods. The weighting functions are chosen
mostly on a trial and error basis. The responses of the T-S fuzzy and μ-synthesis D-K iteration based controllers are compared. A real-time process control platform is used to examine the performance. Stability analysis is conducted with Nyquist and Bode plots. Frequency analysis shows the importance of the weighting function.

Keywords Robust stability and robust performance · μ synthesis · Weighting


function · T-S fuzzy · D-K iteration

1 Introduction

Robustness is of great importance in control system design; the reason is that real engineering systems are exposed to external disturbance and noise. Generally, a control engineer is required to design a controller that stabilizes the plant, if the plant is not stable to begin with, and that meets certain performance levels in the presence of disturbance, noise, and plant parameter variations. Such robust control problems are widely addressed by the H-∞ approach and the MU (μ) approach [1, 2].
It is possible to achieve nominal performance and robust stability against unstruc-
tured perturbations with H-∞ optimal approach but the issue of robust perfor-
mance requirements is neglected. In a real-time implementation, the higher-order
controller may not be viable because of computational and hardware limitations.
Design methods based on the structured singular value MU(μ) can be used to achieve
Robust Stability and Robust Performance (RSRP). One of the strong robust design
approaches is the MU(μ) synthesis problem [11, 12, 14]. The stabilizing controller can

A. B. Patil (B) · R. H. Chile


Department of Electrical Engineering, SGGS Institute of Engineering and Technology,
Vishnupuri, Nanded, M.S, India
e-mail: abp2510@gmail.com


be designed by μ synthesis approach since the system at equilibrium can be repre-


sented by a non-minimum phase system [2].Performance of H-2 and H-∞ control
is not satisfactory in the presence of parametric uncertainty, whereas in μ synthesis
approach structured/parametric uncertainties are considered therefore it provides
better performance than H-∞ approach [3, 15, 16]. Time-domain specifications are
used in μ approach which reduces tedious and usually inaccurate design iterations
[6]. Maximization of system performance is possible with structured uncertainty [7].
The nonlinear model can be successfully reduced to a linear model with uncer-
tainties based on its geometric structure. Hence the first step is to derive a suit-
able linear converter model, where parameter variations are described in terms of
Linear Fractional Transformation (LFT). μ synthesis is applied to this reduced
model, which reduces complexity in design procedure [3–5, 17]. D-K iteration is
a commonly used method in μ synthesis, which is successful in many practical
applications [8–10].Weighting functions are considered for structured uncertainties,
which can be optimized with optimization techniques such as particle swarm opti-
mization (PSO), Genetic algorithm (GA), JAYA [13]. T-S fuzzy models present a
stabilization approach for time-delay systems via state feedback and Fuzzy observer
Controller [18].
The flow of the paper is as follows, Sect. 2 gives the details about the μ analysis
and synthesis used for RSRP and Sect. 3 has the concept of weighting function
and contains μ synthesis controller with D-K iteration. Section 4 contains the μ-
synthesis controller with Takagi–Sugeno (TS) fuzzy. Section 5 gives a detailed study of
the real-time process control model under study which contains a continuous stirrer
tank reactor (CSTR) system. Section 6 includes simulation outcomes and Sect. 7
provides hardware outcomes. Finally, Sect. 8 covers the conclusion and the future
scope of the research.

2 RSRP Based on µ-Synthesis

The μ-synthesis controller is in state-space form and can be defined with the
following models by incorporating suitable state variables and easy manipulations

ẋ = Ax + Bu

y = C x + Du

Here, x is the state vector, u and y are input and output vector respectively and
A, B, C, and D are state-space parameters. The structure uncertainties are taken into
consideration when the μ-synthesis problem is stated. The structure uncertainties are
arranged in a particular manner as follows [1],

$$\boldsymbol{\Delta} = \left\{ \operatorname{diag}\left[\delta_1 I_{r_1}, \ldots, \delta_S I_{r_S}, \Delta_1, \ldots, \Delta_F\right] : \delta_i \in \mathbb{C},\ \Delta_j \in \mathbb{C}^{m_j \times m_j} \right\} \tag{1}$$



where

$$\sum_{i=1}^{S} r_i + \sum_{j=1}^{F} m_j = n \tag{2}$$

Here n is the dimension of the block Δ. From Eq. (1) there are two types of uncertainty blocks in Δ, namely
S repeated scalar blocks and
F full blocks,
where the parameters δi of the repeated scalar blocks may be restricted to real numbers only. The value of μ is given as

$$\frac{1}{\mu_{\Delta}(M)} := \min\left\{\bar{\sigma}(\Delta) : \Delta \in \boldsymbol{\Delta},\ \det(I - M\Delta) = 0\right\} \tag{3}$$

If no structured Δ makes det(I − MΔ) = 0, then μΔ(M) := 0.


Consider Fig. 1, which shows the standard M-Δ configuration, where M is enclosed with the uncertainty block Δ. Here M(s) is the interconnected transfer function matrix and v, d are vector signals; v is the input to the uncertainty block Δ and d is the output of the uncertainty block Δ.

$$\mu_{\Delta}(M(s)) := \sup_{\omega \in \mathbb{R}} \mu_{\Delta}(M(j\omega)) \tag{4}$$

The normalized set of structured uncertainty BΔ is given by

$$B_{\Delta} := \left\{\Delta : \bar{\sigma}(\Delta) \le 1,\ \Delta \in \boldsymbol{\Delta}\right\} \tag{5}$$

Equation (4) shows the structured singular value of the interconnected transfer function matrix. If M(s) is stable and μΔ(M(s)) < 1 (or ||M||μ < 1), then and only then is the standard M-Δ configuration in Fig. 1 robustly stable. Figure 1 also shows ‘w’ as the input, generally including disturbances, noises, and command signals, and ‘z’ as the error output, which

Fig. 1 Standard M-Δ configuration and configuration for RP analysis



normally consists of tracking errors, regulator output, and filtered actuator signals.
Let M(s) be partitioned appropriately as below:

$$M(s) = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix} \tag{6}$$

It can easily be obtained that

$$z = \left[M_{22} + M_{21}\,\Delta\,(I - M_{11}\Delta)^{-1} M_{12}\right] w \tag{7}$$

$$= F_u(M, \Delta)\, w \tag{8}$$

From Eq. (8), F_u(M, Δ) is the upper linear fractional transformation (ULFT); the upper loop of the interconnected transfer function matrix is enclosed by the structured uncertainty, hence the name. If the condition ||F_u(M, Δ)||∞ < 1 is satisfied, then F_u(M, Δ) is robustly stable with respect to Δ. Now the fictitious uncertainty block Δ_P is added in such a way that it does not affect the robust stability of the system, as shown in Fig. 2. Robust performance is attained only if the condition Δ ∈ BΔ is satisfied. Figure 1 describes the robust stability problem; replacing Δ by Δ̃, the robust performance problem becomes a robust stability problem with respect to

$$\tilde{\Delta} := \left\{\operatorname{diag}\{\Delta, \Delta_P\} : \Delta \in B_{\Delta},\ \|\Delta_P\|_{\infty} \le 1\right\} \tag{9}$$

If the infinity norm of M22 is less than one then the performance of the plant is
nominal and also if M(s) is internally stable then the system gives nominal stability.
The general method used to solve the μ synthesis problem is the D-K iteration
method. This method is shown in Fig. 2 considering the controller K; the feedback signals y and u are shown, and M is formed from P and K. The relation of M

Fig. 2 Standard M-Δ configuration with Δ_P analysis and configuration with K



and P can be given by,

$$M(P, K) = F_l(P, K) \tag{10}$$

$$P(s) = \begin{bmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \\ P_{31} & P_{32} & P_{33} \end{bmatrix} \tag{11}$$

Now Eq. (10) will be rewritten as Eq. (12):

$$M(P, K) = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix} + \begin{bmatrix} P_{13} \\ P_{23} \end{bmatrix} K (I - P_{33} K)^{-1} \begin{bmatrix} P_{31} & P_{32} \end{bmatrix} \tag{12}$$

$$\inf_{K(s)}\ \sup_{\omega \in \mathbb{R}}\ \inf_{D \in \mathcal{D}} \bar{\sigma}\left[D\, M(P, K)\, D^{-1}(j\omega)\right] \tag{13}$$

where D is a scaling matrix drawn from the set

$$\mathcal{D} = \left\{ D = \operatorname{diag}\left[D_1, \ldots, D_S, d_1 I_{m_1}, \ldots, d_F I_{m_F}\right] : D_i \in \mathbb{C}^{r_i \times r_i},\ D_i = D_i^{*} > 0,\ d_j > 0 \right\} \tag{14}$$

$$\sup_{\omega \in \mathbb{R}}\ \inf_{D \in \mathcal{D}} \bar{\sigma}\left[D\, M(P, K)\, D^{-1}(j\omega)\right] < 1 \tag{15}$$

The D-K iteration technique minimizes the expression in Eq. (13).
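For reference, the D-K iteration alternates between two sub-problems (a standard description consistent with Eqs. (13) to (15); the stopping tolerances used in practice are not stated here). In the K-step, with the scaling D fixed, the H∞ synthesis problem is solved:

$$K := \arg\min_{K(s)} \left\| D\, M(P, K)\, D^{-1} \right\|_{\infty}$$

In the D-step, with K fixed, the minimizing scaling is found at each frequency and a stable, minimum-phase transfer function D(s) is fitted to the pointwise solution:

$$D(j\omega) := \arg\min_{D \in \mathbf{D}} \bar{\sigma}\left[ D\, M(P, K)(j\omega)\, D^{-1} \right]$$

The iteration stops when the achieved peak value (γ or μ) no longer decreases significantly.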

3 Concept of the Weighting Function (WF)

Weighting functions are used to suppress the noise and disturbances that occur in the system. The controller's performance and the characteristics of the system are assessed from the transient response. The weighting functions are selected under the following assumptions: they are specified in the frequency domain, stable and diagonal weights are chosen, and the diagonal elements are restricted to minimum-phase, real-rational functions.
The closed-loop system block diagram, taking the weighting functions into account, is shown in Fig. 3. Wu and Wp are, respectively, the performance weights linked to the control law and to tracking, K is the controller, G is the plant's transfer function, d is an external disturbance and e is the error.
Table 1 shows the ISE calculated for different weighting functions. These weighting functions were chosen by trial and error. As seen from Table 1 and Fig. 4, the weighting function at Sr. No 6 gives the best result, but this is a very time-consuming process. To overcome this problem, in this

Fig. 3 Block diagram of the closed-loop system

Table 1 Different weighting function results

Sr. No   wp(s)                                        wu(s)   ISE
1        (s + 1)/(s + 0.001)                          1       7.323
2        (s + 1)/(s + 0.0005)                         1       6.738
3        (s + 2)/(s + 0.01)                           1       6.654
4        (s + 2)/(s + 0.0005)                         1       6.239
5        0.95 (s^2 + 1.8s + 10)/(s^2 + 8s + 0.01)     10^-2   4.156
6        0.95 (s^2 + 1.8s + 11)/(s^2 + 8s + 0.01)     10^-2   4.064

Fig. 4 Comparative results of different weighting function



research, the nature-inspired Particle Swarm Optimization (PSO) algorithm is proposed to optimize the weighting function.

3.1 μ-Synthesis Controller Using Particle Swarm


Optimization (PSO)

After performing the random analysis in Table 1, the second-order weighting function equation is used for optimization, in which x(1) and x(2) have to be optimized using the PSO algorithm. The value of μ depends on the weight values, and the weight values depend on x(1) and x(2). Hence, the PSO algorithm is used to optimize this fitness function. The weighting functions selected for optimization are

$$W_p = 0.95\,\frac{s^2 + 1.8s + X(1)}{s^2 + 8s + X(2)}, \qquad W_u = 10^{-2}, \qquad \mu = X(3)$$

where the ranges of the optimization parameters are x(1) = [10, 11], x(2) = [0, 1] and x(3) = [0, 2]. The parameters required for the PSO algorithm are set in Table 2. After PSO optimization, the optimized fitness function obtained is given in Eq. (16):

$$W_p = 0.95\,\frac{s^2 + 1.8s + 10.14}{s^2 + 8s + 0.011} \quad \text{and} \quad \mu = 0.35 \qquad (16)$$

The μ-synthesis is then executed by applying the optimized weighting function with the Robust Control Toolbox.
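To make the tuning loop concrete, the following is a minimal, hedged C++ sketch of the PSO search over x(1), x(2) and the μ bound x(3) with the parameter values of Table 2. The fitness function evaluate_mu is a hypothetical placeholder: in the actual study the candidate Wp would be passed to the D-K iteration and the achieved peak μ returned.

```cpp
// Hedged PSO sketch for tuning the weighting-function parameters.
// evaluate_mu() stands in for "run D-K iteration with W_p built from x and
// return the achieved peak mu"; a simple surrogate keeps the program runnable.
#include <algorithm>
#include <array>
#include <iostream>
#include <random>
#include <vector>

struct Particle {
    std::array<double, 3> x{}, v{}, best_x{};
    double best_f = 1e30;
};

double evaluate_mu(const std::array<double, 3>& x) {
    // placeholder objective (assumption): smallest near the reported optimum
    return (x[0] - 10.14) * (x[0] - 10.14) + (x[1] - 0.011) * (x[1] - 0.011);
}

int main() {
    const int n_particles = 25, n_iters = 50;                  // Table 2
    const double w_init = 1.2, w_final = 0.4, c1 = 0.5, c2 = 1.5;
    const std::array<double, 3> lo{10.0, 0.0, 0.0}, hi{11.0, 1.0, 2.0};

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0);

    std::vector<Particle> swarm(n_particles);
    std::array<double, 3> gbest_x{};
    double gbest_f = 1e30;

    for (auto& p : swarm) {                                    // initialise swarm
        for (int d = 0; d < 3; ++d) p.x[d] = lo[d] + u01(rng) * (hi[d] - lo[d]);
        p.best_x = p.x;
        p.best_f = evaluate_mu(p.x);
        if (p.best_f < gbest_f) { gbest_f = p.best_f; gbest_x = p.x; }
    }

    for (int it = 0; it < n_iters; ++it) {
        // inertia weight decreasing linearly from w_initial to w_final
        double w = w_init - (w_init - w_final) * it / (n_iters - 1);
        for (auto& p : swarm) {
            for (int d = 0; d < 3; ++d) {
                p.v[d] = w * p.v[d]
                       + c1 * u01(rng) * (p.best_x[d] - p.x[d])
                       + c2 * u01(rng) * (gbest_x[d] - p.x[d]);
                p.x[d] = std::clamp(p.x[d] + p.v[d], lo[d], hi[d]);
            }
            double f = evaluate_mu(p.x);
            if (f < p.best_f) { p.best_f = f; p.best_x = p.x; }
            if (f < gbest_f)  { gbest_f = f; gbest_x = p.x; }
        }
    }
    std::cout << "best x(1) = " << gbest_x[0] << ", x(2) = " << gbest_x[1] << "\n";
}
```

The linearly decreasing inertia weight is one common choice consistent with the w_initial/w_final values in Table 2; other PSO variants would serve equally well.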
It is seen from Table 3 that, at iteration 3, the value of γ is reduced to 0.961 and the peak value of μ is 0.951, which implies that RP has been attained. The controller obtained is given in Eq. (17) and in state-space form in Eq. (18).

$$T.F. = \frac{105.2s^3 + 7824s^2 + 6162s + 1977}{s^4 + 27.96s^3 + 341.9s^2 + 1281s + 1.607} \qquad (17)$$

The state-space model of the system is,

Table 2 PSO algorithm parameters

Parameter of PSO        Value    Parameter of PSO    Value
Number of particles     25       w_final             0.4
Number of iterations    50       c1                  0.5
w_initial               1.2      c2                  1.5

Table 3 Iteration summary of μ-synthesis controller using PSO

Iteration no           1       2       3
Order of controller    4       12      16
Total D-scale order    0       8       12
γ achieved             1.693   0.988   0.961
Peak value of μ        1.461   0.950   0.951

$$A = \begin{bmatrix} -0.0013 & 0.0292 & 0.0058 & -0.0209 \\ -0.0292 & -6.6647 & 8.7775 & -10.9527 \\ -0.0058 & -6.7775 & -0.3426 & 2.0498 \\ 0.0209 & 10.9527 & 2.0498 & -20.9505 \end{bmatrix}, \qquad B = \begin{bmatrix} -1.2492 \\ -14.6961 \\ -2.8073 \\ 10.2185 \end{bmatrix},$$

$$C = \begin{bmatrix} -1.2492 & -14.6961 & 2.8073 & -10.2185 \end{bmatrix}, \qquad D = 0 \qquad (18)$$

4 µ-Synthesıs Controller Wıth T-S Fuzzy

Fuzzy logic control is inherently robust in the sense that it tolerates imprecise parameter information and variation within some bound. Hence, fuzzy controllers are used for systems where the data are complex and subject to variations. In this paper, the fuzzy controller is developed using the Takagi-Sugeno based compensation technique.
represent a nonlinear system’s local input–output relationships. The main feature of
a Takagi—Sugeno fuzzy model is that a linear system model expresses the local
dynamics of each fuzzy implication (rule). The fuzzy dynamic model or T–S fuzzy
model consists of a family of local linear dynamic models smoothly connected
through fuzzy membership functions. The fuzzy rules of the fuzzy dynamic model
have the form

R^l: IF z_1 is F_1^l and ... z_v is F_v^l
THEN x(t + 1) = A_l x(t) + B_l u(t) + a_l,
     y(t) = C_l x(t),   l ∈ {1, 2, ..., m}

Fig. 5 Membership functions of CSTR fuzzy model

where R^l denotes the lth fuzzy inference rule, m the number of inference rules, F_j^l (j = 1, 2, ..., v) the fuzzy sets, x(t) ∈ R^n the state vector, u(t) ∈ R^g the input vector, y(t) ∈ R^p the output vector, and (A_l, B_l, a_l, C_l) the matrices of the lth local model; z(t) = [z_1, z_2, ..., z_v] are the premise variables, which are some measurable variables of the system, for example the state variables or the output variables. Fuzzy rules are designed based on the local state-space model of the dynamic system. The control gains are designed using the linear quadratic regulation technique. The sample rules are given in Eq. (19), with different equilibrium points obtained with the phase plane method. For the fuzzy design, triangular membership functions are used, as shown in Fig. 5.
Fuzzy rules:
Rule 1: If x2(t) is Low (i.e., x2(t) is about 0.8862).
THEN

δẋ(t) = A11 δx(t) + A12 δx(t − τ) + B1 δu(t),   δu(t) = −F1 δx(t)

Rule 2: If x2 (t) is Middle (i.e.,x2 (t) is about 2.7520).


THEN

δ ẋ(t) = A21 δx(t) + A22 δx(t − τ ) + B 2 δu(t)

δu(t) = −F2 δx(t)

Rule 3: If x2(t) is High (i.e., x2(t) is about 4.7052).


THEN

δ ẋ(t) = A31 δx(t) + A32 δx(t − τ ) + B 3 δu(t)



δu(t) = −F3 δx(t)

where

δx(t) = x(t) − xd,   δx(t − τ) = x(t − τ) − xd,   δu(t) = u(t) − ud,   and F1, F2, F3   (19)

are to be designed.
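For completeness, the three local rules are typically combined through the normalized membership grades h_l(x2(t)) of the fuzzy sets in Fig. 5, giving the blended closed-loop model (a standard T-S construction, assumed here rather than quoted from the paper):

$$\delta\dot{x}(t) = \sum_{l=1}^{3} h_l\!\left(x_2(t)\right)\left[ A_{l1}\,\delta x(t) + A_{l2}\,\delta x(t-\tau) + B_l\,\delta u(t) \right], \qquad \delta u(t) = -\sum_{l=1}^{3} h_l\!\left(x_2(t)\right) F_l\,\delta x(t),$$

with h_l ≥ 0 and h_1 + h_2 + h_3 = 1.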
In this proposed method, μ-synthesis is combined with T-S fuzzy: the reduced transfer function obtained by the μ-synthesis method is converted into the form of a control input and an error signal. These are used in the T-S fuzzy system and for creating membership functions. The error is taken as the input and the control input as the output of the FIS file.

5 Contınuous Stırred Tank Reactor (CSTR)

The CSTR plays an important role in chemical processes where an exothermic reaction takes place and the heat of reaction needs to be removed by the use of a coolant. The control objective of the CSTR is to maintain the temperature inside the tank at the desired value. The system has a cooling process, as shown in Fig. 6, and the block diagram is shown in Fig. 7.
The controller is used to minimize the error signal. The output of the controller is given to a DAC, since the controller output is digital and the CSTR system only responds to an analog physical quantity; it is then given to the E/P converter (electro-pneumatic converter), which converts the current input signal into a proportional pneumatic signal. A compressor is used to drive the E/P converter, and the output of the E/P converter is given to the control valve. The control valve action takes place to minimize the error. The procedure is repeated until the desired output is obtained. Using the process reaction curve method, the second-order transfer function obtained for the CSTR is shown in Eq. (20). This system equation is imprecise and contains different nonlinearities. The proposed methods are applied to study the responses of the CSTR with the transfer function given in Eq. (20).

$$G(s) = \frac{-0.12S + 12S}{3S^2 + 4S + 1} \qquad (20)$$

Fig. 6 CSTR hardware experimental set up

Fig. 7 Block diagram of continuously stirred tank reactor



6 Simulation Results

Experimental results are taken on the CSTR system explained in Sect. 5. Table 4 shows the iteration summary of the D-K iteration method, which proves that the system performance is robust since the value of μ is less than one.
The time response with the D-K iteration controller and μ T-S fuzzy controller is
given in Figs. 8 and 9 respectively. Figure 10 gives the comparative time response of
PID with gains as KP = 12.24, KI = 2.0980, KD = 14.2, μ-controller D-K iteration

Table 4 Summary of iteration for D-K method

Iteration no          1       2       3
Order of controller   4       12      16
Peak value of μ       1.461   0.950   0.946

Fig. 8 Time response of D-K iteration controller

Fig. 9 Time response of μ-controller with T-S fuzzy



Fig. 10 Time response of PID, μ-controller D-K iteration, μ-controller with T-S fuzzy

Table 5 Comparative results of response parameters


Parameter             PID       D-K iteration controller   μ-synthesis controller with T-S fuzzy
Rise time (sec)       2         5.295                      10.044
Settling time (sec)   110       85                         85
Overshoot             14.368%   33.088%                    0%
Undershoot            19.886%   11.193%                    1.977%
ISE                   6.197     4.064                      3.046

method and μ-synthesis controller with T-S fuzzy. It shows that the μ-synthesis
controller with the T-S fuzzy controller has less overshoot (zero) than the other two
controllers. The μ-Synthesis controller with T-S fuzzy controller is more robust as
compared to the other two controllers.
Table 5 gives the analysis of the comparative time response of the PID, the μ-controller with the D-K iteration method, and the μ-synthesis controller with T-S fuzzy. The integral square error (ISE) is 3.046 in the case of the fuzzy controller. The settling time and overshoot are improved in T-S fuzzy control as compared to the conventional methods. The analysis of the responses shows better performance for the optimized μ-synthesis controller with T-S fuzzy as compared to the others.
Figures 11 and 12 give the robust stability and robust performance (RSRP) plots for the D-K iteration controller and the presented μ-PSO fuzzy-based controller, showing the relationship between frequency and the upper bound of μ, which is less than one. The reduced 4th-order controller (transfer function) obtained using the proposed topology is shown in Eq. (17). This demonstrates the robust stability of the proposed methods.

Fig. 11 RS for D-K iteration controller

Fig. 12 RS for presented μ-PSO controller

7 Real-Time Process Control Results

The result of the μ-synthesis controller using D-K iteration is shown in Fig. 13
which describes that the setpoint is correctly tracked by the controller. Initially, the

Fig. 13 Hardware result of μ-synthesis controller using D-K iteration method



Fig. 14 Hardware result of μ-synthesis controller using T-S fuzzy

temperature of the CSTR is set at 60° and the setpoint is 55°, so the controller gradually decreases the temperature from 60 to 55°, as shown in Fig. 13.
The result of the T-S fuzzy controller is shown in Fig. 14, where the initial temperature is 45° and the setpoint is 40°. In both cases the controller tracks the setpoint, but in T-S fuzzy control the required time is less compared to the D-K iteration method.

8 Conclusion and Future Scope

The μ-synthesis controller using the D-K iteration method and the μ-synthesis controller using T-S fuzzy are proposed in this research. In order to improve the
precision and stability of the real-time system, fuzzy control is used. The μ value
obtained is less than one which defines the stability of the proposed method. In the
proposed method overshoot problem is nullified and settling time is also improved
as compared to the PID controller. Similarly, the ISE value for the proposed method
is less than the PID controller and DK iteration method. Simulation results clearly
show that the μ-synthesis controller with T-S fuzzy is more robust than the D-K
iteration method. PSO based tuning used to optimize the weighting function added
to the system gives more accurate results. Also, from the hardware study, it is clearly
visible that the CSTR system works properly and gives accurate results when it uses
the μ-synthesis controller with a T-S fuzzy controller.
In future work, the system can be modified further by using optimization algorithms like GA, TLBO, and JAYA, by which more accurate results can be achieved while satisfying the RSRP criteria.

References

1. Zhou K, Doyle JC (1998) Essentials of robust control. Vol. 104. Upper Saddle River, Prentice
hall, NJ
2. Pannu S, Kazerooni H, Becker G, Packard A (1996) μ-synthesis control for a walking robot.
IEEE Control Syst Mag 16(1):20–25
3. Bendotti P, Beck CL (1999) On the role of LFT model reduction methods in robust controller
synthesis for a pressurized water reactor. IEEE Trans Control Syst Technol 7(2):248–257
4. Buso S (1999) Design of a robust voltage controller for a buck-boost converter using μ-synthesis. IEEE Trans Control Syst Technol 7(2):222–229
5. Stein G, Doyle JC (1991) Beyond singular values and loop shapes. J Guidance Control Dyn 14(1)
6. Tchernychev A, Sideris A (1998) μ/k_m-design with time-domain constraints. IEEE Trans Autom Control 43(11):1622–1627
7. Wallis GF, Tymerski R (2000) Generalized approach for μ synthesis of robust switching regulators. IEEE Trans Aerosp Electron Syst 36(2):422–431
8. Tsai KY, Hindi HA (2004) DQIT: μ-synthesis without D-scale fitting. IEEE Trans Autom Control 49(11):2028–2032
9. Lee TS, Tzeng KS, Chong MS (2004) Robust controller design for a single-phase UPS inverter using μ-synthesis. IEE Proc Electric Power Appl 151(3):334–340
10. Lanzon A, Tsiotras P (2005) A combined application of H∞ loop shaping and μ-synthesis to control high-speed flywheels. IEEE Trans Control Syst Technol 13(5):766–777
11. Qian X, Wang Y, Ni ML (2005) Robust position control of linear brushless DC motor drive system based on μ-synthesis. IEE Proc Electric Power Appl 152(2):341–351
12. Shahroudi KE (2006) Robust servo control of a high friction industrial turbine gas valve by indirectly using the standard μ-synthesis tools. IEEE Trans Control Syst Technol 14(6):1097–1104
13. Franken N, Engelbrecht AP (2005) Particle swarm optimization approaches to coevolve strategies for the iterated prisoner's dilemma. IEEE Trans Evol Comput 9(6):562–579
14. Kahrobaeian A, Mohamed YAI (2013) Direct single-loop μ-synthesis voltage control for suppression of multiple resonances in microgrids with power-factor correction capacitors. IEEE Trans Smart Grid 4(2):1151–1161
15. Bevrani H, Feizi MR, Ataee S (2016) Robust frequency control in an islanded microgrid: H∞ and μ-synthesis approaches. IEEE Trans Smart Grid 7(2):706–717
16. Cai R, Zheng R, Liu M, Li M (2018) Robust control of PMSM using geometric model reduction and μ-synthesis. IEEE Trans Indus Electron 65(1):498–509
17. Gu DW, Petkov P, Konstantinov MM (2005) Robust control design with MATLAB®. Springer
Science & Business Media
18. Cao YY, Frank PM (2000) Analysis and synthesis of nonlinear time-delay systems via fuzzy
control approach. IEEE Trans Fuzzy Syst 8(2):200–211
Ant Colony Optimization-Based Solution
for Finding Trustworthy Nodes
in a Mobile Ad Hoc Network

G. M. Jinarajadasa and S. R. Liyanage

Abstract Mobile ad hoc networks (MANETs) are among the most popular wireless networks; they have dynamic topologies due to their self-organizing nature. Because the nodes are mobile, these networks are largely infrastructure-less, and hence routing becomes an important issue. With the ubiquitous growth of mobile and Internet of things (IoT) technologies, mobile ad hoc networks play a vital role in the process of creating social interactions. However, they face plenty of problems and challenges, including security, power management, location management, and passing multimedia over the network, owing to the routing issue. MANETs consist of many dynamic connections between nodes, so finding a trustworthy route for communication is a challenge. Therefore, based on swarm intelligence methodologies, an ant colony optimization (ACO) algorithm for finding the most trusted path is proposed here, using a probabilistic transition rule and pheromone trails.

Keywords Mobile Ad hoc Networks (MANETs) · Swarm intelligence · Ant


colony optimization (ACO) · Probabilistic transition rule · Pheromone trails

1 Introduction

With the invention of IoT and social networking concepts, the use of wireless devices
and mobile technologies has increased over the recent past decades. Hence, wireless
networks including mobile ad hoc networks provide the major contribution in estab-
lishing the interactions among the network nodes in service-oriented networks. But
the major issue in MANETs is the lack of security, because the network topology changes easily: the network nodes organize themselves in a decentralized manner and there is no fixed infrastructure. Therefore, there is a problem with the

G. M. Jinarajadasa (B) · S. R. Liyanage


University of Kelaniya, Kelaniya, Sri Lanka
e-mail: madhushikagihani@gmail.com
S. R. Liyanage
e-mail: sidath@kln.ac.lk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 719
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_52

reliability of the data transferred within a mobile ad hoc network. For example, in an application of a mobile ad hoc network to a service-oriented network, the trustworthiness of the requested service or of the data transferred to the service provider should be guaranteed. When transferring data, the message or packet must pass through several nodes. Those intermediate nodes can be either trustworthy nodes or compromised nodes. Hence, to guarantee the safety and reliability of the data passed, the transferred data must avoid traversing compromised nodes. A set of possible paths may be available for data transfer, and finding the optimal path for sending the data ensures the trustworthiness of the transferred data. Hence, finding the shortest possible path that contains only trustworthy nodes is needed. The trust value of the network nodes is computed considering the node properties and the recommendations of neighbor nodes, and with the help of the probabilistic transition rule and the heuristic values obtained by applying the ACO algorithm, an optimal trust path can be identified [1, 2].

1.1 Swarm Intelligence

Swarm intelligence is a branch of computational intelligence that describes the collective behavior of decentralized, self-organized systems, whether natural or artificial. Colonies of ants and termites, schools of fish, honeybee swarms, flocks of birds, and herds of other animals are instances of the systems studied in swarm intelligence [3].

1.2 Artificial Ants and the Shortest Path Problem

Artificial ants live in a discrete world and deposit pheromone on a path in a problem-dependent way. They have additional capabilities such as local search, look-ahead, and backtracking. By exploiting an internal memory and depositing an amount of pheromone that is a function of the solution quality, they can utilize local heuristics [4].
As shown in Fig. 1, ants are given a memory of visited nodes; ants construct solutions probabilistically without updating pheromone trails. Ants then deterministically backtrack the forward path in reverse to update the pheromone, and they deposit an amount of pheromone that depends on the quality of the solution they created (Fig. 2).

1.3 Ant’s Probabilistic Transition Rule

At each node, the ant decides where to move next depending on the pheromone trail. Based on the pheromone deposited on each edge, the ant makes the choice

Fig. 1 Finding the shortest way to arrive at the goal

Fig. 2 Ant utilizing pheromone, heuristics, and memory to pick the next node

by applying a probabilistic rule, which can be characterized as follows:

$$P_{ij}^{k} = \frac{\left[\tau_{ij}\right]^{\alpha}}{\sum_{l \in N_i^k} \left[\tau_{il}\right]^{\alpha}} \qquad (1)$$

where τij is the amount of pheromone trail on the edge (i, j), N_i^k is the set of probable neighbor nodes to which ant k positioned on node i can move, and α is the relative influence of the pheromone function [5, 6].

1.4 Ant Colony Optimization Metaheuristics

This is a population-based strategy in which artificial ants iteratively construct candidate solutions. In each cycle, every ant builds one candidate solution using a constructive search technique. The construction of the solutions is probabilistically influenced by pheromone trail data, heuristic data, and the partial candidate solutions of every ant. Pheromone trails are adjusted during the search procedure to reflect the collective experience [7, 8].

2 Related Work

Swarm intelligence and the discipline of mobile ad hoc networks have been
researched in many types of aspects in the field including shortest path search,
routing protocol optimization, energy balancing, and improving the quality of
service. Among the swarm intelligence mechanism, there are plenty of utilizations
of the ant colony optimization in the improvement of different aspects of MANET
environments.
A hybrid routing algorithm for MANETs based on ant colony optimization called
HOPNET is proposed by Wang, J. et al., where the calculation has been contrasted
with the random waypoint model and random drunken model with the ZRP and DSR
routing protocols [9].
A routing protocol called ant colony-based routing algorithm (ARA) is proposed
with the main goal of reducing the routing overhead. The presented routing protocol
is exceptionally versatile, productive, and scalable [10].
The “AntNet” and the “AntHocNet” are applications of swarm intelligence in
MANETs where they utilize the notion of ant colony optimization (ACO) by finding
near-best solutions to graph optimization problems [11].
AntNet and AntHocNet find near-optimal routes in a drawn graph of interactions without global information. But the disadvantage of this approach is that it creates additional communication overhead through the regular transfer of both "forward ants" and "backward ants" [12, 13].
Schoonderwoerd et al. have addressed the above-mentioned issue by proposing a solution called ant-based control (ABC), which is very similar to AntNet but makes the communication overhead relatively smaller by using only forward ants [14].
A trust calculation for online social networks based on ant colony optimization
is suggested by Sanadhya et al., where it creates a trust cycle for trust pathfinding in
the means of achieving the trustworthiness and satisfaction of the service provided
to the service requester [8].
An improved ACO-based secure routing protocol for wireless sensor networks is proposed by Luo et al. for optimal path finding, combining a probability value with fuzzy logic for trust value calculation. The proposed secure routing protocol shows that the calculation can ensure the discovery of the forwarding path at low cost while guaranteeing security [15].
When it comes to mobile ad hoc networks, the biggest challenge is to discover a
way among the communication endpoints, satisfying the client’s quality of service
(QoS) prerequisites. Several approaches are suggested to improve the quality of
service requirement in MANETs while finding the multiple stable paths among the
source and the goal [16, 17, 18].

A modified ant colony algorithm is proposed by Asghari et al. to locate a reliable path while improving load balancing among the target users, where the updated pheromone inversely affects the path chosen by the ants [19].

3 Proposed Work

The proposed work can be divided into three major parts: creating the artificial intimacy pheromone for the defined mobile ad hoc network, applying the trust concept with the ant's probabilistic transition rule, and the algorithm for calculating the trust value.


3.1 Network Intimacy Pheromone (i_s)

When considering a MANET environment, a node can connect to another network node with a certain network communication intimacy. The network intimacy value can be defined as follows: if node "A" can connect to its neighbor node "B", which is within one-hop distance, the network intimacy pheromone is equal to one (i_s = 1); if node "A" cannot connect to other nodes directly without the help of a one-hop neighbor node, then the network intimacy pheromone is less than one (i_s < 1).
As shown in Fig. 3, when node A connects to nodes B, C, and D, since node A can directly connect with them, the network pheromone value i_s is equal to 1. But if node A needs to connect to nodes E, F, and G, node A must connect indirectly by going through several hops. Then the network pheromone value i_s becomes less than 1, decreasing by 0.1 for each hop.
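A minimal C++ sketch of this hop-based intimacy pheromone is given below; since the decrement rule is only stated informally, the clamping to a non-negative value is an assumption.

```cpp
// Hedged sketch: i_s = 1 for a one-hop neighbour, 0.1 less for every extra hop.
#include <algorithm>
#include <iostream>

double intimacy_pheromone(int hops) {
    if (hops <= 1) return 1.0;                      // direct neighbour
    return std::max(0.0, 1.0 - 0.1 * (hops - 1));   // assumption: floor at 0
}

int main() {
    for (int h = 1; h <= 4; ++h)
        std::cout << "hops = " << h << ", i_s = " << intimacy_pheromone(h) << "\n";
}
```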
Based on the above Table 1, an aggregated weighted heuristic function can be
generated to calculate the heuristic value for the data transferring through the created

Fig. 3 Pheromone value variation with the neighbor nodes



Table 1 Filtered network parameters and weights

Parameter for trust calculation            Weight
Control packets received for forwarding    1
Control packets received forwarded         1
Routing cost                               2
No. of packet collisions                   1
Data packets received for forwarding       1
Data packets received forwarded            1
Packets dropped                            2
No. of packets transmitted                 1
No. of packets received                    1
Packet signal strength                     0.1
Available energy                           0.1

network.

$$\eta_r = \frac{\text{sum of weights between two adjacent nodes } (W_s)}{\text{upper bound of the weighted sum } (W_s)} \qquad (2)$$
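As a small worked example (assuming the upper bound is simply the total of the weights listed in Table 1, i.e. 1 + 1 + 2 + 1 + 1 + 1 + 2 + 1 + 1 + 0.1 + 0.1 = 12.2): if the weights accumulated between two adjacent nodes sum to W_s = 6.1, then

$$\eta_r = \frac{6.1}{12.2} = 0.5$$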

3.2 Trust Decision by Ant's Probabilistic Transition Rule (P^k_ij)

With the aggregation of the network heuristic values and the network intimacy pheromone values, the ant's probabilistic transition rule can be modified as follows to give a trust-defining rule by which ants decide the next trustworthy node among the neighbor nodes:

$$P_{ij}^{k} = \frac{\left[i_s(i,j)\right]^{\alpha} \left[\eta_r(i,j)\right]^{\beta}}{\sum_{l=1}^{n} \left[i_s\right]^{\alpha} \left[\eta_r\right]^{\beta}} \qquad (3)$$

The trust decision value given by the ant's probabilistic transition rule is declared as above, where i_s(i,j) and η_r(i,j) are, respectively, the pheromone value and the heuristic value associated with nodes i and j. Further, α and β are positive real parameters whose values decide the relative significance of pheromone versus heuristic data.

3.3 Trust Calculation ACO Algorithm

Based on the network parameters filtered from the network simulation, and with the help of the probabilistic and heuristic values in ACO, the following algorithm is proposed to calculate trust in the created MANET environment.

1   Get filtered network data
2   For (1 to Iteration) Begin          [where Iteration = total no. of iterations]
3     Calculate the upper bound of the weighted sum of network parameters
4     Determine the node actions matrix (d) from the filtered data and calculate the heuristic value η_r
5     Initialize data (d, α, β, a, N, E, φ)   [where α = 1, β = 2, a = total no. of ants,
        N = total no. of nodes, E = requesting node, φ = pheromone evaporation]
6     Loop
7       Randomly position a ants on node E
8       For (1 to N)   [calculate the trust path; construct a solution from source to destination]
9       Begin
10        For (1 to a)
11        Begin
12          For (1 to N − 1 neighbor nodes)
13          Begin
14            P^k_ij = [i_s(i,j)]^α [η_r(i,j)]^β / Σ_{l=1}^{n} [i_s]^α [η_r]^β
15            Update the network intimacy pheromone:  i_s(i,j) ← (1 − φ) i_s(i,j)
16          End
17        End
18      End
19    End

Algorithm 1: ACO algorithm for trust calculation.
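As a complement to Algorithm 1, the following minimal C++ sketch illustrates steps 14 and 15: computing the trust transition probabilities from the intimacy pheromone i_s and heuristic η_r, picking the next node, and evaporating pheromone on the chosen edge. The pheromone and heuristic values are small made-up examples; in the proposed work they would come from the filtered network parameters of Table 1.

```cpp
// Hedged sketch of the transition and pheromone-update steps of Algorithm 1.
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
    const double alpha = 1.0, beta = 2.0, phi = 0.1;  // alpha, beta as in step 5
    // example intimacy pheromone i_s and heuristic eta_r for neighbours of node i
    std::vector<double> tau{1.0, 0.9, 0.8};
    std::vector<double> eta{0.7, 0.5, 0.9};

    // transition probabilities, Eq. (3): P ~ tau^alpha * eta^beta (normalised)
    std::vector<double> p(tau.size());
    double denom = 0.0;
    for (size_t j = 0; j < tau.size(); ++j)
        denom += std::pow(tau[j], alpha) * std::pow(eta[j], beta);
    for (size_t j = 0; j < tau.size(); ++j)
        p[j] = std::pow(tau[j], alpha) * std::pow(eta[j], beta) / denom;

    // roulette-wheel choice of the next node for this ant
    std::mt19937 rng(7);
    std::discrete_distribution<size_t> pick(p.begin(), p.end());
    size_t next = pick(rng);

    // pheromone evaporation on the chosen edge: i_s <- (1 - phi) * i_s
    tau[next] *= (1.0 - phi);

    for (size_t j = 0; j < p.size(); ++j)
        std::cout << "P(i -> neighbour " << j << ") = " << p[j] << "\n";
    std::cout << "ant moves to neighbour " << next << "\n";
}
```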

4 Experimental Results

Figure 4 shows the created MANET environment with 9 nodes, whose routing behavior follows the ad hoc on-demand distance vector (AODV) routing protocol, and Fig. 5 shows the training of ants for minimum trust pathfinding through pheromone updating, with 10 ants and 100 iterations.

Fig. 4 MANET simulation with 9 nodes and AODV protocol

Fig. 5 Minimum trust cycle with the length of 99



5 Conclusion

In this research, data transfer between the nodes of a MANET over a reliable path in the given network is achieved. The integrity of the communication between the nodes is calculated by the ant's probabilistic transition rule and the heuristic values obtained from the modified probabilistic trust value calculation equation. The simulation of the network with the trust calculation algorithm applied finds the shortest, optimal trust path containing only trustworthy nodes.

References

1. Papadimitratos P, Haas Z (2002) Secure routing for mobile ad hoc networks. In Communication
Networks and Distributed Systems Modeling and Simulation Conference (CNDS 2002) (No.
SCS, CONF)
2. Marti S, Giuli TJ, Lai K, Baker M (2000) Mitigating routing misbehavior in mobile ad hoc
networks. In Proceedings of the 6th annual international conference on Mobile computing and
networking (pp 255–265), ACM
3. Bonabeau E, Marco DDRDF, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural
to artificial systems (No. 1). Oxford university press
4. Hsiao YT, Chuang CL, Chien CC (2004) Ant colony optimization for best path planning. In
IEEE International Symposium on Communications and Information Technology, ISCIT 2004.
(vol 1, pp 109–113). IEEE
5. Kanan HR, Faez K (2008) An improved feature selection method based on ant colony
optimization (ACO) evaluated on face recognition system. Appl Math Comput 205(2):716–725
6. Asif M, Baig R (2009) Solving NP-complete problem using ACO algorithm. In 2009
International conference on emerging technologies (pp 13–16). IEEE
7. Dorigo M, Stützle T (2003) The ant colony optimization metaheuristic: algorithms, appli-
cations, and advances. In Handbook of metaheuristics (pp 250–285), Springer, Boston,
MA.
8. Sanadhya S, Singh S (2015) Trust calculation with ant colony optimization in online social
networks. Procedia Comput Sci 54:186–195
9. Wang J, Osagie E, Thulasiraman P, Thulasiram RK (2009) HOPNET: A hybrid ant colony
optimization routing algorithm for mobile ad hoc network. Ad Hoc Netw 7(4):690–705
10. Gunes M, Sorges U, Bouazizi I ARA-the ant-colony based routing algorithm for MANETs. In
Proceedings international conference on parallel processing workshop (pp. 79–85). IEEE
11. Dorigo M, Birattari M, Blum C, Clerc M, Stützle T, Winfield A (eds) (2008) Ant colony
optimization and swarm intelligence. In Proceedings 6th international conference, ANTS 2008,
Brussels, Belgium, 22–24 Sept 2008 (vol 5217) Springer
12. Di Caro G, Dorigo M (1998) AntNet: distributed stigmergetic control for communications
networks. J Artif Intell Res 9:317–365
13. Di Caro G, Ducatelle F, Gambardella LM (2005) AntHocNet: an adaptive nature-inspired
algorithm for routing in mobile ad hoc networks. European Trans Telecomm 16(5):443–455
14. Schoonderwoerd R, Holland OE, Bruten JL, Rothkrantz LJ (1997) Ant-based load balancing
in telecommunications networks. Adaptive Behavior 5(2):169–207
15. Luo Z, Wan R, Si X (2012) An improved ACO-based security routing protocol for wireless
sensor networks. In 2013 International Conference on Computer Sciences and Applications
(pp 90–93). IEEE
16. Roy B, Banik S, Dey P, Sanyal S, Chaki N (2012) Ant colony based routing for mobile ad-hoc
networks towards improved quality of services. J Emerg Trends Comput Inf Sci 3(1):10–14

17. Deepalakshmi P, Radhakrishnan DS (2009) Ant colony based QoS routing algorithm for mobile
ad hoc networks. Int J Rec Trends Eng 1(1)
18. Asokan R, Natarajan AM, Venkatesh C (2008) Ant based dynamic source routing protocol
to support multiple quality of service (QoS) metrics in mobile ad hoc networks. International
Journal of Computer Science and Security 2(3):48–56
19. Asghari S, Azadi K (2017) A reliable path between target users and clients in social networks
using an inverted ant colony optimization algorithm. Karbala Int J Modern Sci 3(3):143–152
Software Development for the Prototype
of the Electrical Impedance Tomography
Module in C++
A. A. Katsupeev, G. K. Aleksanyan, N. I. Gorbatenko, R. K. Litvyak,
and E. O. Kombarova

Abstract The basic principles and features of the implementation of the electrical
impedance tomography (EIT) method in the C++ language are proposed in this
research. This software will significantly reduce the hardware time for performing
computational operations and will expand the capabilities of the technical implemen-
tation of the EIT method in real technical systems of medical imaging. An algorithm
for the operation of the EIT module prototype software in C++ has been devel-
oped. The principles of building the software for the EIT module prototype have
been developed, which provides the possibility of embedding into other medical
equipment. The software interface of the EIT module prototype has been developed.

Keywords Electrical impedance tomography · Medical software ·


Reconstruction · Conduction field · Software principles

1 Introduction

Improving the quality and reliability of medical diagnostic information is one of


the key tasks of the modern healthcare system worldwide. In this regard, there is a
practical need for the development and creation of tools for obtaining and processing
the medical visualizing information as one of the main components of a significant

A. A. Katsupeev (B) · G. K. Aleksanyan · N. I. Gorbatenko · R. K. Litvyak · E. O. Kombarova


Department of Informational and Measurement Systems and Technologies, Platov South-Russian
State Polytechnic University (NPI), Novocherkassk, Russia
e-mail: andreykatsupeev@gmail.com
G. K. Aleksanyan
e-mail: graer@yandex.ru
N. I. Gorbatenko
e-mail: gorbatenko@novoch.ru
R. K. Litvyak
e-mail: litvyak_rk@rambler.ru
E. O. Kombarova
e-mail: ms.ekom@mail.ru

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 729
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_53

examination of the patient. In this regard, it is promising to develop new medical


and technical systems that can increase the efficiency of existing devices and provide
medical personnel with additional information about the human condition to systematize
and formulate the correct diagnosis and develop adequate and correct treatment
tactics. In this regard, the creation of technical means based on the receipt and
analysis of the electrical properties of an object is an urgent area of the modern
medical instrumentation. One of the promising prospects in this area is the electrical
impedance tomography [1–3], which finds new aspects of use in clinical practice
every year. The method is understandable and relatively easy to implement in tech-
nical devices, but it has several disadvantages for use in real (quasi-real) time. This is
due to a number of factors, one of which is multilevel resource-intensive calculations
and operations for calculating, reconstructing, and visualizing the conduction field,
which is difficult to implement in hardware at the microprocessor level.
Existing application packages that offer various algorithms for implementing the
EIT method, such as EIDORS [4] and PyEIT [5], cannot be directly used in informa-
tional and measuring devices of the EIT due to the peculiarities of their implemen-
tation of the algorithms. In this regard, this work proposes a new implementation of
well-known algorithms in C++ [6] that will significantly reduce the hardware time
for performing computational operations and expand the capabilities of the technical
implementation of the EIT method in real technical systems of medical imaging.

2 Existing Technologies of Implementation

2.1 PyEIT Framework

The PyEIT framework [5] is based on the Python framework for modeling and
visualizing of the conduction field using the EIT method. The capabilities of PyEIT
include finite element modeling, 2D and 3D visualization, solving the forward and
inverse EIT problems, meshing, and image formation for external applications. The
mesh module can split the region into triangles (2D) and tetrahedra (3D). PyEIT
implements state-of-the-art EIT algorithms that support both static and dynamic
reconstruction.
PyEIT can use the following algorithms for dynamic reconstruction such as back-
projection method, GREIT [7], and NOSER [8]. PyEIT includes demo examples
of the operation of reconstruction algorithms [5]. To visualize the results of EIT
(including 3D), PyEIT uses the Matplotlib charting library [5].

2.2 Eidors

The specialized application package Electrical Impedance and Diffuse Optical


Reconstruction Software (EIDORS)[4] is intended for the reconstruction of images
based on the results of EIT [1]. The work of the EIDORS software system is based
on the use of the language and development environment MATLAB [9]. EIDORS
features include finite element modeling, solving the forward EIT problem, static
and dynamic reconstruction using various methods, 2D and 3D visualization [10].
EIDORS includes many reconstruction algorithms, such as the Gauss–Newton
algorithm, the back-projection method, the conjugate gradient method, the internal
point method, GREIT, and others.

2.3 Justification of the Need for Implementation in C++


Language

The EIT reconstruction algorithm requires high performance for two reasons. The first is building the finite element mesh and calculating the reconstruction matrix used to compute the conductivity field of the object under study from the measurement data. The second is displaying the change in the conductivity field in quasi-real time during the measurement of the object under study. The first factor is not critical when implementing the EIT algorithm in medical devices, since the reconstruction matrix can be generated in advance and reused for different patients. However, the second factor is important in view of the need to display the change in the conductivity field of the object under study with sufficient speed.
Since C++ is a compiled programming language, it meets the stated performance
requirements and, as a result, can be used to develop software for the EIT module.
The proposed work has also developed a solution for processing EIT results in the
form of a Web portal [11], but this development is not intended for use in medical
equipment. This is due to the fact that the EIT channel is often used in clinical practice
not independently, but is integrated into existing medical and technical devices, for
example, lung ventilators, the software of which is implemented in the C++ language.
Thus, the general scheme of the developed software can be represented in Fig. 1.
The software is divided into two modules: information processing and software
interface. Information processing, in turn, is divided into a measurement process and
statistical processing of measurement data.

Fig. 1 EIT module prototype software diagram (modules: information processing, comprising the measurement process and statistical processing, and the software interface)

3 Principles of Software for EIT Module Prototype

3.1 Software Algorithm

The developed software algorithm is shown in Fig. 2.


In general, the program algorithm is divided into three directions: carrying out
the measurement process, statistical processing of the results, and the generation of
a new model of the object under study for calculating the conductivity field.
In the case of the program running in the measurement mode, after setting the
measurement parameters and connecting to the device, from the measuring device
potential differences at the measuring electrodes are obtained, presented in the form of the sets Ψ = {φ1, ..., φn} and Ψ′ = {φ1′, ..., φn′},
where φi, i = 1, ..., n, are the potential differences across the measuring electrodes during the obtained measurement;
φi′, i = 1, ..., n, are the potential differences across the measuring electrodes during the reference measurement;
and n is the number of measuring electrode pairs.
The conductivity field of the object under study within the tomographic section is calculated by the formula

Ω = H · (Ψ − Ψ′),

where H—pre-generated reconstruction matrix. The reconstruction matrix consists


of pre-generated coefficients that are used to calculate the conductivity field values
based on the measurement vector. Thus, the number of matrix rows is equal to the
number of finite elements, and the number of columns is equal to the size of the
measurement vector.

Fig. 2 Software algorithm of the EIT module prototype (flow chart covering the measurement process, statistical analysis, and generation of a new finite element model of the object under study)



The vector Ω is the set of values of the conductivity field at the finite elements of the reconstructed object model:

Ω = {σ1, ..., σm},

where m is the number of finite elements in the model.
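A minimal sketch of this reconstruction step in C++, assuming the reconstruction matrix H has already been generated, is shown below; plain nested vectors are used for brevity, whereas the real module may rely on an optimized linear-algebra library.

```cpp
// Hedged sketch of the difference reconstruction: sigma = H * (psi - psi_ref).
#include <iostream>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;   // H[k] is the row for the k-th finite element

Vec reconstruct(const Mat& H, const Vec& psi, const Vec& psi_ref) {
    Vec sigma(H.size(), 0.0);
    for (size_t k = 0; k < H.size(); ++k)
        for (size_t j = 0; j < psi.size(); ++j)
            sigma[k] += H[k][j] * (psi[j] - psi_ref[j]);   // row-by-row product
    return sigma;
}

int main() {
    // toy sizes: m = 2 finite elements, n = 3 measurement channels
    Mat H = {{0.1, 0.2, 0.3}, {0.0, 0.5, 0.1}};
    Vec psi = {1.0, 1.1, 0.9}, psi_ref = {1.0, 1.0, 1.0};
    for (double s : reconstruct(H, psi, psi_ref)) std::cout << s << ' ';
    std::cout << '\n';
}
```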


Based on the measurement data, ventilation, perfusion, and ventilation-perfusion
ratio are calculated, as well as archiving of the measurement procedure is carried
out.
Another direction of the program’s work is the processing and formation of statis-
tical results based on the measurement data. Within the framework of this direction,
a measurement protocol is formed according to the DICOM standard.
The third direction is the generation of a new model of the object under study
for tomographic measurements. It is used if the standard chest cavity model is not
suitable and it is necessary to adapt the model for a specific patient.

3.2 The Principles of the EIT Module Prototype Software

The software is divided into the following modules, as shown in Fig. 3.


The software modules are conventionally divided into three groups, namely
measurement process, information processing, and user interface. The “measure-
ment process” group interacts with the measuring device, the “information process-
ing” group processes the measurement data, and the “user interface” group interacts
with the user.
The principles for implementing a solution in C++ language include the following:

(1) The main criteria are uninterrupted measurement process and speed of
processing the information received from the measuring device;
(2) Based on the analysis of state-of-the-art graphics visualization [12], a combina-
tion of GTK + OpenGL [13, 14] is used as the graphics library for displaying the
conductivity field to ensure high speed of information output. A screenshot of
the software is shown in Fig. 4.
The main blocks of the information displayed on the screen are the recon-
structed conductivity field, ventilation graphs, control buttons, and measurement
parameters setting block.
Within the framework of compatibility with the software used in lung ventilators,
it is planned to change the GTK graphics library to MFC [15].
(3) To minimize the computation time in the measurement data processing module,
it is necessary to use a compiled programming language, therefore, the
developed solution is implemented in the C++ language.

Fig. 3 Principles of the EIT module prototype software (module groups: measurement process, information processing, and user interface)



Fig. 4 Screenshot of the software interface of the EIT module prototype

4 Conclusions

The implementation of the electrical impedance tomography algorithm in different environments using the C++ language is proposed. The software developed will significantly reduce the hardware time for performing computational operations
and will expand the capabilities of the technical implementation of the EIT method in
real technical systems of medical imaging. An algorithm for the operation of the EIT
module prototype software in C++ has been developed. The principles of building
the software for the EIT module prototype have been developed, which provides the
possibility of embedding into other medical equipment. The software interface of
the EIT module prototype has been developed.

Acknowledgements The study is carried out as part of the federal target program “Research and
Development in Priority Directions for the Development of the Russian Science and Technology
Complex for 2014-2020”, with financial support from the Ministry of Science and Higher Education
(agreement No. 05.607.21.0305). Unique agreement identifier RFMEFI60719X0305.

References

1. Adler A, Boyle A (2019) Electrical impedance tomography, pp 1–16. https://doi.org/10.1002/


047134608x.w1431.pub2.
2. Pekker JS, Brazovskii KS, Usov VN (2004) Electrical impedance tomography—Tomsk: NTL,
p 192
3. Aleksanyan GK, Denisov PA, Gorbatenko NI, Shcherbakov ID, Al Balushi ISD (2018)
Principles and methods of biological objects internal structures identification in multifre-
quency electrical impedance tomography based on natural-model approach. J Eng Appl Sci
13(23):10028–10036

4. Adler A, Lionheart W (2006) Uses and abuses of EIDORS: an extensible software base for
EIT. Physiol Meas 27(5):S25–S42. CiteSeerX 10.1.1.414.8592. https://doi.org/10.1088/0967-
3334/27/5/S03. PMID 16636416
5. Liu B, Yang B, Xu C, Xia J, Dai M, Ji Z, You F, Dong X, Shi X, Fu F (2018) pyEIT: A python-based framework for electrical impedance tomography. SoftwareX 7:304–308
6. Stroustrup B (1997) “1". The C++ Programming Language (Third ed.). ISBN 0–201–88954–4.
OCLC 59193992
7. Adler A, Arnold J, Bayford R, Borsic A, Brown B, Dixon P, Faes T, Frerichs I, Gagnon H,
Garber Y, Grychtol B, Hahn G, Lionheart W, Malik A, Stocks J, Tizzard A, Weiler N, Wolf G
(2008) GREIT: towards a consensus EIT algorithm for lung images. Manchester Institute for
Mathematical Sciences School of Mathematics. The University of Manchester
8. Cheney MD, Isaacson D, Newell JC (2001) Electrical impedance tomography. IEEE Sign
Process Mag 18(6)
9. MATLAB Documentation (2013) MathWorks. Retrieved 14 Aug 2013
10. Lionheart WRB, Arridge SR, Schweiger M, Vauhkonen M, Kaipio JP (1999) Electrical
impedance and diffuse optical tomography reconstruction software. In Proceedings of the 1st World Congress on Industrial Process Tomography, pp 474–477, Buxton, Derbyshire
11. Aleksanyan G, Katsupeev A, Sulyz A, Pyatnitsin S, Peregorodiev D (2019) Development of
the web portal for research support in the area of electrical impedance tomography. Eastern-
European J Enterprise Technol 6(2):6–15
12. Chotisarn N, Merino L, Zheng X (2020) A systematic literature review of modern software
visualization. J Vis 23:539–558
13. The GTK Project // https://www.gtk.org/
14. OpenGL—The Industry Standard for high performance graphics // https://www.opengl.org/
15. MFC Applications for desktop // https://docs.microsoft.com/ru-ru/cpp/mfc/mfc-desktop-app
lications?view=vs-2019
Information Communication Enabled
Technology for the Welfare
of Agriculture and Farmer’s Livelihoods
Ecosystem in Keonjhar District
of Odisha as a Review

Bibhu Santosh Behera, Rahul Dev Behera, Anama Charan Behera,


Rudra Ashish Behera, K. S. S. Rakesh, and Prarthana Mohanty

Abstract Odisha is an agrarian state, and extension education plays a vital role in agricultural growth and the promotion of farmers' livelihoods. For technology dissemination and knowledge generation, ICT plays a vital role in agrarian society in empowering the farming fraternity. During 2013–2017, pilot-based research was carried out by the group of researchers with the help of OUAT, KVK and the agriculture department, leading to this research review. An ex post facto design and multistage sampling were adopted for the study, along with a structured schedule for the collection of data. Around 170 samples were taken for a comprehensive study in Keonjhar District of Odisha. The objective of this research is to provide support for data collection.

Keywords ICT · Agriculture · Livelihood

B. S. Behera (B) · R. D. Behera · P. Mohanty


OUAT, Bhubaneswar, Odisha, India
e-mail: bibhusantosh143@gmail.com
R. D. Behera
e-mail: faculty.greencollege@gmail.com
B. S. Behera
International Researcher, LIUTEBM University, Lusaka, Zambia
A. C. Behera · R. A. Behera
Faculty Green College, Odisha, India
K. S. S. Rakesh
LIUTEBM University, Lusaka, Zambia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 739
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_54

1 Introduction

1.1 Brief Introduction

Information communication technology (ICT) is a system through which solutions may be found for a wide range of problems and challenges. ICT is also a useful tool that can be applied in every development sector to achieve maximum effectiveness in an assigned task. Agriculture is the fuel of the country; it plays a very dynamic role in society and is the backbone of the country's economy. As per the research findings, approximately 70% of the population in India makes its livelihood from agriculture (Kurukshetra, June 2015).

1.2 Theoretical Orientation

Agriculture needs a constant diffusion of technology to meet worldwide food security, environmental sustainability and poverty reduction goals. In India, as per the SECC data (2011), released in 2015, the total number of households is 24.39 crores, out of which 17.91 crores are rural, many with uncertain and unsecured earnings. Because of globalization, urbanization and the demand for higher-value items, the worldwide situation continues to change from the perspective of cultivation. Agriculture is quite hard for individuals living below the poverty line. To meet the worldwide requirements and to reform agricultural production, the green revolution was introduced in the mid-1960s. The main goals of the green revolution were to increase the area under agriculture and the use of chemical fertilizers, new technologies and pesticides to enhance output.
In India, under the eminent researcher M. S. Swaminathan and his panel of researchers of the Indian Council of Agricultural Research (ICAR), the green revolution enhanced the productivity of cereals, mostly wheat and rice, along with other primary cereals such as maize, to an assured extent, mainly in western Uttar Pradesh, Punjab and Haryana. In the opening phase, the green revolution was started through the spread of new technology to the better irrigated and endowed regions. Afterward, numerous other revolutions came about in India to improve a range of other categories of production, such as the white revolution (milk production), the silver fiber revolution (cotton production), the round revolution (potato production), the red revolution (meat and buffalo production), the blue revolution (fish production), the evergreen revolution (productivity without loss), the yellow revolution (oil seeds), the golden revolution (overall horticulture), the silver revolution (poultry production) and so on. In spite of these revolutions, numerous farmers in India still follow conventional methods of farming. Every year many farmers face enormous losses in farming, and agricultural production has constantly declined over the last several years.
Thus, agricultural advancement requires offering new information and knowledge at farmers' doorsteps. Advanced training includes different methodologies

of training to encourage awareness, consequently which agriculture part requires


advanced training to renew output via farming. Information communication tech-
nology (ICT) assists to give information to the farmers at their doorstep. It offers
knowledge associated with fertilizer consumption, price output in the markets pest
management, weather/climate information, online land registration, etc. Government
offices of each level are linked with a system, to offer knowledge to the farmers.
Krisak Sathi, development officers, agriculture expert, village agriculture workers
(VAW), as well as stakeholders are educating farmers, to settle in techniques that
are new to agriculture. In India, teledensity has quickly improved, according to the
government’s report 2015 (Kurukshetra, February 2016), in rural area and teleden-
sity is improved two times. Non-urban farmers access data concerning agriculture by
voice over call on a mobile phone, short message service (SMS). The Central govern-
ment works together among the state governments have been initiated in several ICT
centers equipped with telephone, broadband connection, internet, PCs, along with
development officer for example IFFCO-ISRO GIS project, VISTANET, Gyandoot
project, cyber dhaba, e-choupal, AMARKET and so on information based Knowl-
edge offerer via mobile based web portal, various web and Kisan Call Centres,farmers
web portal mkisan portal (www.mkisan.gov.in), (www.farmer.gov.in) (mksp.gov.in).
These portals are assistance information-based knowledge as well as advisory by
subject professionals. Department of Agriculture and Cooperation was created over
80 portals, mobile-based applications along with Web sites through the coopera-
tion of the National Informatics Centre. RKVY, DACNET, National Food Secu-
rity Mission (NFSM), SEEDNET, National Horticulture Mission (NHM), Acreage,
Productivity and Yields (APY), ATMA and INTRADAC are the essential portals.
The greatest proportion of citizens create livelihood through farming. This study has
specified significance to know about the different ICTs projects related to farming
growth. Special ICTs projects in Odisha as well as the government along with private
organizations are private designs programmers to attain the non-urban farmers.

ICT and Agricultural Development


ICTs in farming can enable better access to data that supports or drives knowledge distribution. ICTs help in the dissemination, retrieval, storage, management and creation of any relevant information, knowledge and data that may already have been adapted and processed (Bachelor 2002; Chapman and Slaymaker 2002; Rao 2007; Heeks 2002). The analysis concerns the reach of ICTs and the applications they should create for growth, especially in the field of cultivation. In India, ICT in farming is a promising area concentrating on rural and agricultural development. ICT can supply farmers with the exact information they need, which helps enhance agricultural productivity. Thus, public–private partnerships and government as well as private initiatives and programmes have been created for the growth of agriculture. But in India, ICT is still in a growing phase and developing as an emerging field, and its gains have yet to reach every farmer. Despite technological improvement, numerous farmers, particularly sharecroppers and marginal farmers, do not receive appropriate services and information because of poor financial conditions and social constraints.
Other reasons are language barriers, illiteracy and refusal to accept new technology.
The manner in which ICT projects access, apply, assess and deliver content can enhance the likelihood of ICT utilization by farmers and can therefore become an essential element of a project. To meet the information-seeking appetite of farmers, ICT can act as a panacea: a farmer's problem can be analysed using ICT tools with relevance to the farmer's local conditions. Local content has been described as content that is intended for a particular local audience, as defined by language, culture or geographic location, or as content which is economically, politically, socially and culturally relevant to a certain people. The optimum benefit of ICT should reach the doorstep of the farming fraternity and rural artisans.

2 Scope and Significance of the Study

The Government of India emphasizes the "Digital India" programme. The Government of India began following information-led ideas for development during the 1980s under Prime Minister Rajiv Gandhi. This work concentrates on the utilization of ICTs to access farming information in the Patna block. The analysis focuses on how improvement officers and stakeholders use ICTs, with access to and utilization of the equipment being the emphasis of the research. It examines how information suppliers use ICTs and how they distribute cultivation information to the non-urban farmers of Patna, and additionally how Patna's farmers in remote non-urban regions use that information and the assistance of ICTs in agricultural development. Similarly in Odisha, OCAC, Department of IT, Government of Odisha has introduced an e-literacy/digital literacy programme to empower farmers and rural artisans. The Digital Green Project is also supporting Odisha through the Odisha Livelihoods Mission programme for strengthening the livelihoods of farmers in Odisha.

2.1 Review of Literature

Among the several studies on subjects associated with this work, the following relate to ICT-based communication and the role of ICTs in non-urban development in areas such as agriculture, education, health, sanitation and the economy:
Mohanty and Bohra [1] highlight the role of ICT across the world through the appearance of different types of equipment. ICT has not only made access throughout the world easier but has also assisted the integration of ideas, created process synergies in working techniques and places, enabled democratic approaches and participation in learning, and improved organizational transparency and alertness; the application of e-governance has opened new vistas of facilitation strategy and management systems.
Discussing the improvement of communication in non-urban sectors, Dasgupta et al. [2] bring out ideas with reference to the press for farming growth, explaining communication models for development as well as technology transfer, and research on how to create and transfer acceptable technologies, which requires recreating agricultural communication models. They describe how communication technologies benefit non-urban development, how information technology benefits agricultural growth, and its impact on information distribution and change in agriculture.
In their research, Schware and Bhatnagar [3] highlight the successful utilization of information and communication technologies (ICT) in non-urban growth. It starts with an initial section that traces the history of the utilization of ICT in non-urban India. It observes a few of the issues which have influenced the implementation of non-urban development programmes and demonstrates how ICT applications might help overcome them in the coming years.
Narula [4] explains the dynamics of dysfunction and the growth of advancement, and additionally discusses how these two elements are impeded and facilitated by development communication models operating in a particular society at a specific point in time, covering development communication difficulties, technological challenges, and strategies and reach for advancement.
Singhal and Rogers [5] state that new information communication is transforming the speed of human interaction throughout the globe, creating a "Global Village" in which the whole world is linked. India is still quite a distance from attaining an information society, though vast numbers of employees work in information-related sectors to offer data from the ground level. Information plays a significant role in the development process. New technology and its numerous applications, i.e., TV, telecommunications, radio, computers, cable and the Internet, are quickly leading India towards an information-based society. "Informatisation" is the process by which communication-based technologies are utilized as a way of promoting socio-economic growth. The Indian communication revolution includes the proliferation of telephones, software and the Internet, the development of venture capital, entrepreneurship and helpful government policies, as well as "networking between Indian business in Silicon Valley and their Indian-based counterparts."
Hanson and Narula [6] state that a genuine creation of the information society is in itself subject to the common difficulties of international transfer, postal services and telephony through the computer, and examine how numerous nations are responding to one another because of pressure from the information society. They examine the present situation of infrastructure development policy, social systems and developing countries, along with models of information technologies and society, and exactly how societies accept technologies into their social systems and lifestyles in a global perspective.
Behera et al. [7], in their study on how information communication technology endorses retail marketing in the Indian agriculture sector, describe an era of ICT-mediated, market-led agriculture extension; in this information era one cannot manage without information. According to the R.T.I. Act 2005, each individual is entitled to the appropriate knowledge for agriculture. Consequently, by giving due regard to knowledge, one must generate an information revolution through the mantra "Soochana se Samadhan." India is the second largest producer of merchandise such as vegetables and fruits. Among the primary problems that need investigation is how to decrease post-harvest damage, which is currently quite considerable. This will require environment-friendly, cost-effective and efficient storage systems. Additionally, there is a necessity for value addition in farming to capitalize on cultivation outcomes. The paper tries to emphasize the significance of ICT in enhancing the marketing activities of the retail business in the farming aspects of the Indian economy, and the great opportunity of applying the same to Indian agricultural business behaviours, through several success stories and versions, as an explanation of the significance of ICT in agriculture retail marketing.
Behera et al. [7], in their study on e-governance-mediated farming for sustainable living in India, state that the era of ICT includes ICT-mediated farming extension in urban and non-urban areas to distribute information through Expert Systems (ES), Decision Support Systems (DSS) and Management Information Systems (MIS), underpinned by knowledge management systems and user interfaces. E-agriculture is thus an emerging field concentrating on the improvement of rural and agricultural development via enhanced information and communication procedures. The major goal is to give an interface to consumers and farmers and to facilitate the linking up of cultivation, production and marketing cooperatives. In India, the Gyandoot Project, IT-Kiosks, the "Information Village Project of MSSRF (MS Swaminathan Research Foundation)", ITC's e-choupal, EID Parry agriline, the Bhoomi Project, Kisan Call Centres (KCC), the I-Kisan Project of the Nagarjuna group of companies, Village Knowledge Centres and so on are the latest developments of e-governance-mediated agriculture. It touches the life of end-users and farmers in an alternative manner through Common Service Centres at the grass-roots level, e-kiosks and knowledge management portals.

3 Specific Objectives

The specific objectives of the work are to illustrate and analyse the application of ICTs in agriculture in the Patna block, to discover the role of ICTs in agricultural development, and to assess people's awareness of ICT applications in agricultural development.

3.1 Hypothesis Setting

The hypotheses of the work are:


H1—ICTs play an emphatic role in the upliftment of agricultural growth.
H2—In rural areas, ICT applications are still out of reach.
H3—Mobile communication enhances the strength of ICT applications for agricultural growth.

4 Research Methodology (Materials and Methods)

The present work was accomplished by gathering both primary and secondary data.
Secondary Data Collection: The secondary data were gathered with the help of various portals, Web sites, materials and other existing records, including Acts and Policies of the Odisha Government, national and state government agriculture portals, and different projects and schemes on ICT under the Government of Odisha.
Additional related details were collected from publications, journals, the Internet, research papers, official records, magazines, news articles and other existing sources of information.
Sample Design: To study the role of ICT in a location like Patna block in Keonjhar District, the sample was designed according to the ways that could feasibly be investigated within the fixed time.
Population of the Study: The research population of 29,755 is composed of farm labourers and farmers (and private agencies) who are exclusively connected with farming.
Sample Area: In Odisha's agriculture, Patna block plays an important part in maize production. Approximately 80% of the population is exclusively associated with agriculture and agro-based industries that provide the livelihood of the block. Two gram panchayats were chosen out of 24 panchayats for data collection based on agricultural activity, one with the highest cultivation activity and another with the lowest. The chosen sample locations from Patna block include 16 different villages, 8 villages from each of the 2 panchayats, with a sample covering both small and large farmers. ICT infrastructure is thus needed where there is a lack of information. The study is descriptive research.
Sample Size: The sample size is 170, consisting of 4 stakeholders (private agents of seed and fertilizer companies), 4 government officials, 160 farmers and 2 ICT experts.

Sample Selection: Simple random sampling methods were used for the sample. With the help of stratified random sampling techniques, ten farmers were chosen from every village; each panchayat has eight revenue villages.
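As a rough illustration of this selection step, the following Python sketch draws ten farmers per revenue village for the two chosen panchayats; the panchayat and village identifiers and the farmer lists are hypothetical placeholders, not data from the study.

import random

# Hypothetical sampling frame: two panchayats with eight revenue villages each,
# and a list of farmer identifiers per village (placeholder names only).
sampling_frame = {
    f"panchayat_{p}": {
        f"village_{v}": [f"farmer_{p}_{v}_{i}" for i in range(1, 60)]
        for v in range(1, 9)
    }
    for p in (1, 2)
}

def stratified_sample(frame, per_village=10, seed=42):
    """Draw a fixed number of farmers from every village (stratum)."""
    rng = random.Random(seed)
    chosen = {}
    for panchayat, villages in frame.items():
        for village, farmers in villages.items():
            chosen[(panchayat, village)] = rng.sample(farmers, per_village)
    return chosen

selected = stratified_sample(sampling_frame)
print(sum(len(v) for v in selected.values()))  # 2 panchayats x 8 villages x 10 = 160 farmers

Drawing the same number of farmers from every village keeps each stratum equally represented, which is consistent with the 160-farmer sample reported above.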
Primary Data Collection: Primary data were gathered via two techniques, observation and a survey. Data were collected from the farmers of the selected villages through a schedule, with ten farmers taken from every village. The selection of stakeholders was made from the catalogue of stakeholders. The schedule was organized with both open-ended and closed-ended questions. While gathering primary data, a non-participatory observation technique was followed.
Tools and Techniques: A schedule was utilized as the tool of the survey method.
Data Analysis and Interpretation: Data were examined using both quantitative and qualitative procedures. Data collected from each panchayat were averaged, and a comparative analysis was completed to understand the variation. SPSS software was utilized to evaluate the data.
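For readers without SPSS, the same kind of panchayat-wise comparison can be reproduced with a short pandas sketch; the column names and the few illustrative rows below are hypothetical and are not the study's actual responses.

import pandas as pd

# Hypothetical survey responses: one row per farmer, with the panchayat and
# binary indicators for the media categories the farmer reports using.
responses = pd.DataFrame({
    "panchayat": ["panchayat_1", "panchayat_1", "panchayat_2", "panchayat_2"],
    "uses_internet": [1, 0, 0, 1],
    "uses_electronic_media": [1, 1, 1, 0],
})

# Average each indicator per panchayat and express it as a percentage,
# mirroring the comparative analysis described above.
comparison = responses.groupby("panchayat").mean().mul(100).round(1)
print(comparison)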
Farmers exhibit varied media behaviour that is not limited to any particular medium. On average, 21% consume only electronic media (radio and TV), 7% only folk media, 51% both folk and electronic media, 5.6% print and electronic media, 3.15% print, folk and electronic media, 4.4% electronic media and the Internet, 0.65% print only, and 6.25% every category. Over 10% of farmers have used the Internet. Therefore, the influence of electronic media (radio and TV) and folk media is greater than that of other media.
Media consumption is roughly evenly split, since farmers spend their remaining time on leisure. On average, 41.55% primarily watch entertainment, 3.15% only news programmes, 37.75% both entertainment and news, 0.65% entertainment along with other programmes, around 2% agriculture and news-related information, and 15.1% every category of programme. Farmers who use ICT-based information systems deliberately go through agriculture-based data via news channels and public information systems. As a consequence, 15.1 + 1.9 = 17% of farmers notably search for data according to their requirements with the help of media.
DD Kisan is India's only fully agriculture-based channel. The channel broadcasts programmes mainly based on agriculture, but on average only 37.79% of individuals are well aware of the channel, while 62.21% of farmers have no idea of its significance or existence.
Approximately 55.5% of farmers get weather/climate-related information through their own observation and by asking knowledgeable ICT users or friends. 5% of farmers learn it with the help of media, i.e., weather forecasts or news on radio, newspapers or TV. 1.25% use ICT applications, 27.55% use different sources such as relatives, friends and family members who are ICT users, along with tools and observation of media, and 11.25% access it via ICT applications as well as media. Thus, on average 1.25 + 11.25 = 12.50% of farmers utilize ICT applications for weather/climate information.

Data transfer occurs from one to many via interpersonal communication. Usually, 20% of farmers obtain knowledge regarding programmes, announcements, distributions, government policy and so on from friends, 5.65% from agriculture officials (village-level workers, Krisak Sathi and other government officials), 0.65% only from media, 2.55% directly via ICT applications, 53.8% through the assistance of agriculture officials and friends, 3.8% through media and agriculture officials, 5% via media, agriculture officials and friends, 5.65% with the assistance of media and friends, and another 3.15% via ICT applications combined with other sources. Consequently, the proportion of farmers using information about government policy, programmes and services with the assistance of ICT applications is 2.55 + 3.15 = 5.7%.
The analysis also shows how farmers learn the latest farming strategies for enhanced production, fertilizer usage, pest management, treatment of different crop diseases and market price results. Typically, 7.5% learn such information through their own knowledge and observation of friends, 2.5% with the assistance of agriculture officials, 3.15% with the assistance of ICT applications, 6.3% from stakeholders, and 80.65% via different sources such as observation, friends, stakeholders and government officials. Consequently, ICT applications specifically assist farmers in acquiring the latest information and innovative methods, with a total contribution of 3.15 + 6.3 = 9.45%.
Among the total proportion of mobile users, approximately 36% use phones only for communication with relatives and friends, 23.29% use them for the dual purpose of entertainment (playing games, watching videos, listening to songs, etc.) and communication, approximately 15% for communication as well as for learning agriculture-based information (agriculture extension purposes), and 25.85% for all of these categories. Therefore 25.85 + 15 = 40.85, i.e., approximately 41% obtain knowledge about farming with the assistance of mobile communication.
Cultivation extension officers offer data directly to the farmers about new ways and techniques of farming. In the first phase, extension officers obtain information with the assistance of ICT applications, after which the data is distributed to the whole agricultural community. As an outcome, 21.3% of farmers enhanced their profits with the assistance of extension workers and ICT applications.
Figure 1 depicts the relation between information sources and the adoption and non-adoption rate of each source. Most of the respondents belong to the farming community, so their exposure to various sources follows their literacy and knowledge levels. Apart from this, most of the respondents were women who, owing to their shyness, prefer puppetry and folk media as trustworthy sources of information; due to illiteracy, wall painting is also one of the best sources, and NGOs are also among the trusted sources as per the findings.
Fig. 1 Information sources versus adoption and non-adoption rate. Source: own data [bar chart of adoption and non-adoption counts per information source, with linear trendlines]

Here, Fig. 2 depicts the preference of information consumers (rural artisans, youth, women and farmers) towards the various sources of information (print, electronic, Internet, and local and traditional media) along with the adoption rate. Local and traditional media is the most highly adopted, while Internet media is the least. The trendlines reflect the adoption and non-adoption levels of information across both information consumers and information media.

Fig. 2 Adoption rate of information consumers with various information sources along with rate of adoption. Source: own data [bar chart of adoption and non-adoption rates per consumer group and media type, with linear trendlines]

5 Result and Conclusion

Media consumption is quite high among the farmers: almost 99% of them use some medium, whether regular or electronic media, folk media, or new media. According to the data, more or less all farmers engage with several media types. Around 11% of the farmers use the Internet. On average, 17% of farmers use media for agriculture-based information, while the remaining farmers use it for other purposes such as news, entertainment and various other kinds of content. Among the media users, about 38% are familiar with the DD Kisan channel; of those familiar with it, 63.63% watch the channel frequently, and 97.5% declared that its information is appropriate for farming extension.
Weather information plays a critical role in farming. In Patna, on average 12.50% of farmers use ICT applications to find out about weather/climate information; several of them receive data via media, ICTs and their own observation.
In Patna, farmers essentially learn agricultural strategies from their ancestors and friends, and ICTs assist them to some degree in understanding cultivation procedures. Information mostly moves from agriculture experts or extension officers to the farmers; in Patna this portion is 53.8%, while 5.7% receive it via ICT applications.
The latest trends and methods, such as the use of modern gadgets, improved and advanced agricultural techniques, hi-tech concepts, pesticide management and other practices, normally diffuse from one to a few and then from a few to many. In Patna, 9.45% of farmers use every kind of data with the assistance of ICT applications and all other sources. In the initial phase, stakeholders, experts and extension officers pick up trends via ICT equipment; in the later phase, they instruct farmers through field visits, workshops and demonstrations.
The mobile phone functions as an ICT resource in Patna block; on average 87.87% of farmers use a cell phone. Of these, 22.3% use a smartphone while 77.7% own a regular phone, and no one owns a tablet. Cell phones are mostly used for communication with relatives or friends (36%), 23.29% are used for entertainment and communication (listening to music, watching videos, playing games), and approximately 41% are used to collect data concerning farming. Among cell phone users, 65% hardly ever read the SMS received in their inbox, while 35% read SMS attentively. On average, 38% of farmers receive SMS from various portals, registered Web sites or government offices concerning agricultural requirements. In Patna, Internet use is rising gradually, but the absence of proper broadband connectivity and weak mobile network strength create obstacles to using the Internet. In spite of this, 58.35% of farmers use the Internet, of whom 22.3% are smartphone users. Among the 58% who are Internet users, 62.50% of farmers browse farming-related information on the Internet; however, just 3% of them frequently visit and know farmer-related portals and similar Web sites.

About 40% of farmers know about Kisan Call Centres, while the rest have no idea about them. Of that 40%, 23.49% maintain communication with the KCC; of this 23.49%, just 65.8% have registered their mobile numbers with the KCC, and among these registered farmers, 75% receive messages from the KCC regularly. Under the Government of Odisha's free mobile distribution programme for Kisan Credit cardholders, no one in Patna block has availed of a mobile phone.
PCs and laptops are seldom used in this area; just 1.75% of farmers use such devices, mostly the children of farmers who use laptops and PCs and tell their elders about agriculture-related queries. In agriculture extension, 8.67% of farmers are aware of the benefits of ICT applications. Through the suggestions of extension workers and data collected with the help of ICT applications, 21.3% of farmers have enhanced their production.
Mandi facilities are very poor; only paddy is bought through the neighbourhood mandi, and fresh produce is delivered to the neighbouring mandis of other states. Approximately 8.7% of the farmers obtain gains with the assistance of mobile phones or ICT applications. The mobile phone is essentially used to collect information regarding selling prices; approximately 19% of farmers learn selling price results via the mobile phone.
ICT applications such as cell phones assist farmers in transforming their old perceptions. With the help of the mobile phone, farmers can communicate with Kisan Call Centres and market holders, raise queries with extension officers, share information with friends, and browse the web. Above all, it helps them to alter the standard pattern of farming.

Acknowledgement Acknowledging the help and support of OUAT, Green College and LIUTEBM University for publishing this paper successfully. Also thankful to Dr. B. P. Mohapatra and Dr. K. S. S. Rakesh for their encouragement.

References

1. Mohanty L, Bohra N (2006) ICT strategies for schools: a guide for school administration. Sage Publications, New Delhi
2. Dasgupta D, Choudhary S, Mukhopadhyay SD (2007) Development communication in rural sector. Abhijeet Publications, Delhi
3. Bhatnagar S, Schware R (eds) (2006) Information and communication technology in development. Sage Publications, New Delhi
4. Narula U (2011) Development communication: theory and practice. Har-Anand Publications, New Delhi
5. Singhal A, Rogers EM (2011) India's communication revolution: from bullock carts to cyber marts. Sage Publications, New Delhi
6. Hanson J, Narula U (2012) New communication technologies in developing countries. Routledge, New York
7. Behera BS et al (2015) Procedia Computer Science. Elsevier
CHAIN: A Naive Approach of Data
Analysis to Enhance Market Power

Priya Matta, Sparsh Ahuja, Vishisth Basnet, and Bhasker Pant

Abstract Data analytics is one of the most important fields in the computing world. Every emerging paradigm ultimately results in the generation of data. The rapidly growing attention from different industries, markets and even academia results in the requirement for a deep study of big data analytics. The proposed work deals with unstructured and raw data and converts it into structured and consistent data by applying various modern data warehouse methods. These methods include logistics and supply chain management (LSCM), the customer relationship model (CRM), and business intelligence for market analysis to make fruitful decisions for an organization. The data analysis performed in data warehouses follows the ETL process, which describes the steps of gathering and transforming the data and finally placing the data at its destination. The proposed CHAIN method is the core of this research work, a naive approach assisting in improving market power. In this research work, a market analysis of the IT hardware sector is performed, dealing with the sales of peripherals and telecommunication devices in the market. This can be achieved via continuous communication between clients and retailers to generate meaningful and relevant data, and this data can also be analysed to generate various required reports.

Keywords Data analytics · Extraction · Transformation · Logistics and supply chain management (LSCM) · Customer relationship model (CRM)

P. Matta (B) · S. Ahuja · V. Basnet · B. Pant


Computer Science and Engineering, Graphic Era University, Dehradun, India
e-mail: mattapriya21@gmail.com
S. Ahuja
e-mail: sparshahuja261@gmail.com
V. Basnet
e-mail: vishistbasney@gmail.com
B. Pant
e-mail: pantbhaskar2@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 751
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_55

1 Introduction

Big data creates big opportunities with large data sets and helps to realize their benefits. Targeted solutions for data analysis provide new approaches to achieve impressive results. Marketers collect lots of data daily from a variety of customers to paint a complete picture of each customer's behaviour. As with CRM in the past, analytics comprises all the programming that analyses customer data to streamline better business decisions and monetization.
Similarly, this research work introduces another analysis technique; examining present results will lead to better decision making in businesses and will also benefit customers in the coming years. Marketers can feed these new, real-time insights back into the organization to influence product development.
This research work provides a new methodology for data analysis using the CHAIN process. It begins by defining the ETL process and describing its key characteristics, and provides an overview of how the analysis process is to be done. Examining past and present market results shows less interaction between customers and shopkeepers; to enhance the relationship between them, the implications for the future are also discussed. Key application areas for the process are introduced, and sample applications from each area are described. Two case studies are provided to demonstrate past results and how they can yield more powerful results now. The paper concludes with a summary of the current state and of how businesses can become more powerful in the future.
The paper is composed of nine sections. After the introduction, Sect. 2 defines data analytics and its importance. Section 3 describes the extraction, transformation and load (ETL) cycle. Section 4 discusses the motivation behind our research. The proposed technology is elaborated in Sect. 5, covering the definition of CHAIN, why CHAIN is needed, how CHAIN can be achieved, and the proposed model of the methodology. In Sect. 6, market analysis through a case study is discussed. The result analysis is presented in Sect. 7. A comparison of the proposed approach with existing methods is provided in Sect. 8, and the conclusion of the work is presented in Sect. 9.

2 Data Analytics and Its Importance

According to Gandomi [1], "Big data are worthless in a vacuum. Its potential value is unlocked only when leveraged to drive decision making." Decision making can be accomplished by different institutes and organizations by turning vast amounts of data into precise and meaningful information. According to him, different industries and organizations define and express big data in different ways. Data analysis is a term covering the examination and extraction of a data set, either primary or secondary, to organize and mould it into helpful information for healthy decision making. It also helps in reducing the complexity of managerial decisions and in enhancing effectiveness, marketing policies and end-user serviceability to boost business performance. From input to the generation of output, the process can be understood with the help of Fig. 1.

Fig. 1 Data analytics process (input, transformation, concept, computation, analysis, output)
Importance of Data Analytics
According to Kempler [2], "to capitalize on these opportunities, it is essential to develop a data analytics framework in which defines the scope of the scientific, technical, and methodological components that contribute to advancing science research." Its other uses include reducing banking risk by identifying fraudulent customers from historical data, and presenting appropriate advertisements based on historical selling and purchasing data. It is also exploited by various security agencies to enhance security policies by gathering data from the different sensors deployed, and it assists in eliminating replicated information from a data set.
Limitations of Data Analytics
The limitations of data analytics include that, when conducting surveys, respondents do not necessarily provide accurate information. Missing values and the lack of a substantial portion of the data can also limit usability, and data may vary in quality and format when collected from different sources.

3 Extraction, Transform, and Load (ETL)

(A) Extraction
According to Akshay S. [3], "In the extract process, data is extracted from the source system and is made accessible for further processing. The main objective of the extract step is to extract the required data from the source systems utilizing the least possible little resources." Extraction is the act of extracting records from a range of homogeneous and heterogeneous resources to be transferred to the data warehouse. This process is done in such a way that it does not affect the performance or the response time of the source system.

Fig. 2 ETL process

(B) Transformation
According to Akshay [3], "The most complex part of the ETL process is the transformation phase. At this point, all the required data is exported from the possible sources but there is a great chance that data might still look different from the destination schema of the data warehouse." Transformation is the act of transforming the extracted data into fruitful information whose structure is not yet the same as that of the data in the warehouse. In this process, the raw values are sorted, merged and even derived by applying various validation rules.
(C) Loading
Once the data is extracted and transformed, it is finally ready for the last stage, the load process, in which the data collected from one or more sources is loaded into the final system. However, there are many considerations, such as the process of data loading and its influence on the storage of data in the data warehouse. The way the data is loaded may have an impact on the server's processing speed as well as on the analysis process. The other major consideration during data loading is to prevent the database from becoming debilitated. According to Talib and Ramzan, "This step makes sure data is converted into the targeted data structure of data warehouse rather than source data structures. Moreover, various schema and instance joining and aggregation functions are performed at this phase" [4].

The process of ETL can easily be explained with the help of Fig. 2.
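To make the three steps concrete, the following minimal Python sketch extracts records from a CSV file, applies a simple cleaning transformation, and loads the result into an SQLite table. The file name, column names and table name are illustrative assumptions, not part of the case study's actual pipeline.

import csv
import sqlite3

def extract(path):
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalise text fields and derive a total-value column."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "product": row["product"].strip().title(),
            "units_sold": int(row["units_sold"]),
            "unit_price": float(row["unit_price"]),
            "total_value": int(row["units_sold"]) * float(row["unit_price"]),
        })
    return cleaned

def load(rows, db_path="warehouse.db"):
    """Load: place the transformed rows into the destination table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales "
                "(product TEXT, units_sold INTEGER, unit_price REAL, total_value REAL)")
    con.executemany(
        "INSERT INTO sales VALUES (:product, :units_sold, :unit_price, :total_value)", rows)
    con.commit()
    con.close()

# Example run of the full ETL cycle over a hypothetical sales export.
load(transform(extract("it_hardware_sales.csv")))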

4 Motivation Behind

As recorded in a local survey of micro and medium enterprises, the owners of these enterprises said that they have been unable to establish proper communication with their customers since various platforms came into existence. The variety of platforms fragments the customer's choice, and thus the customer is not committed to purchasing goods from a single platform. Sellers said that if customers help them by giving valuable and suitable suggestions regarding products, they will provide the best services for them. Thus, from this survey, the CHAIN process is introduced, which helps both the consumer and the seller to interact and to enhance the relationship between them.

5 Proposed Methodology

The analysis process can be accomplished after the data is gathered and input on the basis of dynamic requirements generated by various end-users or customers. "During early requirement analysis, the requirement engineer identifies the domain stakeholders and models them as social actors, who depend on one another for goals to be fulfilled, tasks to be performed, and resources to be furnished" [5]. After the requirements have been considered, they are forwarded to the next process, that is, data collection. Big data analytics in logistics and supply chain management (LSCM) has received increasing attention because of its complexity and the prominent role of LSCM in improving overall business performance. Thus, while acquiring data from multiple sets and performing analyses, it was found that the CHAIN method can be a rising factor in mining techniques for the customer relationship model (CRM). "CRM requires the firm to know and understand its markets and customers. This involves detailed customer intelligence in order to select the most profitable customers and identify those no longer worth targeting" [6]. "In the emerging markets of Asia, dynamic capability played a crucial role in gaining competitive CRM performance across all three industries" [7].
(A) CHAIN
"Another popular approach to customer preference quantification is the discrete choice analysis (DCA) technique, which includes the probit model and logit models (multinomial, mixed, nested, etc.) to name but a few" [8]. Data requirement is the first and foremost part of data processing. The analysis process can be accomplished after data is gathered and input based on the dynamic requirements generated by various end-users or customers. There are many ways to collect data from various inputs, but given the situation of the market, the introduction of the term "CHAIN" can be a rising factor for local market businesses.
CHAIN stands for "Customer's Help, Advice and Information Networks." It is a process of connecting customers with sellers, through which customers help business persons gain knowledge by providing suitable suggestions to enhance the business and improve the services between customer and seller. It is a process in which shopkeepers raise questions with their customers in the form of sentiments or in any other possible way to interact with them (Fig. 3).
(B) Need of CHAIN
"Market segmentation is one of the most fundamental strategic planning and marketing concepts, wherein the grouping of people is done under different categories such as the keenness, purchasing capability, and the interest to buy" [9]. Market analysis shows that nowadays customers avoid approaching the local market and try to interact less with local vendors. This is because of changes in our lifestyle and the attraction towards platforms which are not social. "OLC establishes an organizational culture where the existing mental models regarding the collection, retention, and utilization of customer knowledge are gradually replaced with new ones to better exploit market opportunities which translate customer needs into value-added offerings" [10].

Fig. 3 Data analytics process
According to the survey, shopkeepers have suffered a lot due to e-commerce, not only because of cheaper product prices but also because of the lack of proper interaction between customers and shopkeepers, which creates a distance between them. Through the "CHAIN" process, it is possible to challenge and target other platforms easily. Healthy communication and an information network between them will also help in enhancing business services and in providing products at an effective cost.
(C) Performance of CHAIN
"Today is the era of loyalty such as customer loyalty, employee loyalty, management loyalty, and loyalty to the principles, ideals, and beliefs. Several studies have shown that satisfaction is not the key to ultimate success and profitability" [11]. With the help of e-services (digital) or non-digital channels, shopkeepers can easily approach the customer by providing weekly suggestion assessments; customers then help the shopkeeper gain fruitful approaches and aid in enhancing the business, which also helps in maintaining relations with customers and thus forms a customer network.

Fig. 4 Proposed model of CHAIN process

Examples of e-services include providing sentiment assessments for customers through contact numbers, creating an effective Web site for the respective store with a chatbot system, providing effective services and a proper review system for each product, and also an ERP system which attracts customers by giving them any kind of information regarding purchased goods.
These factors will lead to the rise of the CHAIN process. "These findings prove the findings given by Brown and Gulycz (2001) and Chen (2008), who recommended that satisfied customers are more inclined toward retaining a relationship with existing companies and positive repurchase intentions in the future" [12].
(D) Proposed Model of Methodology
Figure 4 depicts the relation between customer and seller, in which the seller provides a suitable interface containing reviews and sentiments for customers to communicate with them; later, customers advise sellers about their products and share appropriate reviews with them, and finally the vendor comes up with the best outcome and shares it with them accordingly. Thus, it forms a CHAIN that enhances the business management system and customer relationship management.
The shopkeeper will have a Web site and application software with a well-designed support system through which they can interact with the end-users. Each customer will have to register and create a user id and password through which they can log in and communicate with shopkeepers; customers can choose whether they want their information to be public or private. This feature will keep customers' data safe, confidential and free from duplication. If any comments or information, positive or negative, are false, only the admin or shopkeeper has the right to ignore or block that false information.
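As a rough sketch of how such a feedback interface might store customer suggestions with a visibility choice and an admin moderation flag, the following data model is illustrative only; the class and field names are assumptions and are not the authors' implementation.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ChainFeedback:
    """One customer suggestion/review submitted through the CHAIN interface."""
    customer_id: str
    product: str
    message: str
    is_public: bool = True          # customer's choice: public or private
    blocked_by_admin: bool = False  # shopkeeper/admin can block false information
    created_at: datetime = field(default_factory=datetime.utcnow)

def visible_feedback(items):
    """Return only feedback the shopkeeper may act on and display."""
    return [f for f in items if f.is_public and not f.blocked_by_admin]

inbox = [
    ChainFeedback("cust_01", "router", "Please stock the newer dual-band model."),
    ChainFeedback("cust_02", "keyboard", "Spam text", blocked_by_admin=True),
]
print(len(visible_feedback(inbox)))  # 1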

6 Market Analysis Through Case Study

According to Sgier [13], "Big data analytics in Logistics and Supply Chain Management (LSCM) has received increasing attention because of its complexity and the prominent role of LSCM in improving the overall business performance." According to Shafiei [14], "Reliable field data and well-documented field trials are the foundation of developing statistical screening tools." According to Marjani [15], "Moreover, big data analytics aims to immediately extract knowledgeable information using data mining techniques that help in making predictions, identifying recent trends, finding hidden information, and making decisions." According to Nastic [16], "detecting patterns in large amounts of historic data requires analytics techniques that depend on cloud storage and processing capabilities." Going through these different viewpoints of researchers and practitioners leads to the following study. The study can be made on any kind of computing device, for example desktop PCs, laptops, smartphones and others of the same cadre. The investigation involves a thorough market analysis of IT products of different companies sold in Indian markets in the second half of 2019 (July–December). The data collected from the local market covered all the IT product exclusive stores and their trending sales. The analysis shows a decline in market sales and an increase in e-commerce sales of 60–70%, and as a whole a threefold decrease in overall IT hardware sales in India. The data collected from smartphone sellers depicts a huge downfall in sales of mobile phones and their accessories in small local markets. The analysis indicates that less interaction between customers and shopkeepers is the major factor in the downfall of IT sales in the local market; the lack of services provided by shopkeepers also leads to loss of their market. A survey regarding market strategies was conducted; some of its questions are as follows:
Criteria 1: If you want to buy any digital or IT product, will you purchase it from the local market or the online market (Flipkart, Amazon, or any other online platform)?
Criteria 2: Which of these two platforms do you feel is more secure and safe?
Criteria 3: If the local market gives you the facility not only of item purchase but also of information sharing and accepting customer requests and ideas, then which platform would you like to choose?
The results of the survey are depicted in graphical form.
After analysing Fig. 5, it is observed that half of the customers are attracted to the online platform, which is the basic reason for the downfall of the local market. If shopkeepers start communicating with customers and provide the best services to them, then customers will return to the local market; this may enhance the economy of the local market, especially for IT products, and there will again be a healthy circulation of money in the market.
After analysing Fig. 6, it is observed that a smaller number of customers are attracted to online platforms, as they feel insecure regarding the quality of the product.

Fig. 5 Pie chart on the basis of criteria 1

Fig. 6 Pie chart on the basis of criteria 2

Fig. 7 Pie chart on the basis of criteria 3

Figure 7 shows that 60–65% of customers will come back to the local market if the best services are provided by the sellers; according to the previous results, this would increase sales in the market by 10–15%, which will boost the local market economy.

7 Result Analysis

"Based on the theoretical and the reality of what happened, there is still a gap of research on the influence of customer satisfaction on customer trust and loyalty. The key problem in this research is questioning the variable customer satisfaction and trust influence customer loyalty and the role of customer trust as a mediating variable in the BRI Kendari of Southeast Sulawesi province" [17]. Customers now have a variety of options to purchase products because of rapid globalization and growing competition. They can easily compare products or even switch platforms, which is the reason behind the downfall of retailers.
Thus, to collect information from consumers there will be a "CHAIN" process, which can be digital or non-digital, that will help many shopkeepers and customers maintain a long-term relationship, resulting in safe, secure and good product services in local markets.
The "CHAIN" process applies not only to the IT sector; the CHAIN methodology is also a requirement for the remaining sectors. "Through customer collaboration, organizations learn, meet customer requirements better, and improve performance (Prahalad and Ramaswamy 2004). Customers offer a wide base of skills, sophistication, and interests and represent an often-untapped source of knowledge" [18].

8 Comparison with Existing Techniques

However, this can be resolved through the CHAIN process, which can be an operational process to enhance the growth of the business and make it more profitable.
CHAIN is quite similar to customer service and support (CSS), but CHAIN is an advanced feature in which customers' help and business-related information are taken for a better business outcome, while quick, convenient and consistent services are provided to customers by letting them interact with shopkeepers directly through e-services, without any intermediary.

9 Conclusion

Data analysis and mining are among the most important and most promising aspects of information technology for the integration of businesses and vigorous decision making. In this paper, the process of data analysis is explained by employing various techniques. The outline is significantly related to data analysis and its challenges. Furthermore, an overview of the ETL process is given which can be applied in both the public and private sectors for forecasting and for making crucial decisions in finance, marketing, sales, etc.
The introduction of the CHAIN process will be one of the most effective techniques in the field of analysis and mining. It will boost the economy of the retail market in the coming years. Many people will interact with retailers digitally, which will also help the growth of the digital sector and be more safe and secure; people will also learn to be more social.

The outcome of data analytics is one of the major requirements of industry nowadays. All industries, entrepreneurs and other organizations have recognized the significance of data analytics for improving their throughput, enhancing their profits and increasing their efficiency.
One can easily understand how critical efficient and better data analytics is for the proper growth of any kind of industry, business or organization. It also provides speed and accuracy to business decisions and maximizes conversion rates. Data analytics offers a great career opportunity and has a good future in this thriving era.

References

1. Gandomi A, Haider M (2015) Beyond the hype: big data concepts, methods, and analytics. Int J Inf Manage 35(2):137–144
2. Kempler S, Mathews T (2017) Earth science data analytics: definitions, techniques and skills. Data Sci J 16
3. Lohiya AS et al (2017) Optimize ETL for banking DDS: data refinement using ETL process for banking detail data store (DDS). Imperial J Interdiscip Res (IJIR) 3:1839
4. Ramzan T et al (2016) A multi-agent framework for data extraction, transformation and loading in data warehouse. Int J Adv Comput Sci Appl 7(11):351–354
5. Giorgini P, Rizzi S, Garzetti M (2008) GRAnD: a goal-oriented approach to requirement analysis in data warehouses. Decis Support Syst 45(1):4–21
6. Rygielski C, Wang J-C, Yen DC (2002) Data mining techniques for customer relationship management. Technol Soc 24(4):483–502
7. Darshan D, Sahu S, Sinha PK (2007) Role of dynamic capability and information technology in customer relationship management: a study of Indian companies. Vikalpa 32(4):45–62
8. Conrad T, Kim H (2011) Predicting emerging product design trend by mining publicly available customer review data. In: DS 68-6: proceedings of the 18th international conference on engineering design (ICED 11), impacting society through engineering design, vol 6, design information and knowledge, Lyngby/Copenhagen, Denmark
9. Kashwan KR, Velu CM (2013) Customer segmentation using clustering and data mining techniques. Int J Comput Theory Eng 5(6):856
10. Ali Ekber A et al (2014) Bridging organizational learning capability and firm performance through customer relationship management. Procedia Soc Behav Sci 150:531–540
11. Ali K et al (2013) Impact of brand identity on customer loyalty and word of mouth communications, considering mediating role of customer satisfaction and brand commitment (case study: customers of Mellat Bank in Kermanshah). Int J Acad Res Econ Manage Sci 2(4)
12. Ishfaq A et al (2010) A mediation of customer satisfaction relationship between service quality and repurchase intentions for the telecom sector in Pakistan: a case study of university students. African J Bus Manage 4(16):3457
13. Sgier L (2017) Discourse analysis. Qual Res Psychol 3(2):77–101
14. Shafiei A et al (2018) Data analytics techniques for performance prediction of steamflooding in naturally fractured carbonate reservoirs. Energies 11(2):292
15. Marjani M et al (2017) Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access 5:5247–5261
16. Nastic S et al (2017) A serverless real-time data analytics platform for edge computing. IEEE Internet Comput 21(4):64–71
17. Madjid R (2013) Customer trust as relationship mediation between customer satisfaction and loyalty at Bank Rakyat Indonesia (BRI) Southeast Sulawesi. Int J Eng Sci 2(5):48–60
18. Blazevic V, Lievens A (2008) Managing innovation through customer coproduced knowledge in electronic services: an exploratory study. J Acad Market Sci 36(1):138–151
Behavioural Scoring Based on Social
Activity and Financial Analytics

Anmol Gupta, Sanidhya Pandey, Harsh Krishna, Subham Pramanik,


and P. Gouthaman

Abstract Credit scoring is probably the most seasoned application of analytics. Over the past few years, an enormous number of complex classification strategies have been developed to support the statistical performance of credit scoring models. Rather than concentrating on credit scoring, this project relies on alternative data sources to support and implement other factors to identify the characteristics of an individual. This work identifies unique factors through a person's online presence and financial record to provide them with a unique score that signifies the individual's behaviour. The proposal demonstrates how a person's online activity on social media sites, like Facebook and Twitter, determines the character and behaviour of the person. Some factors included in the social scoring are the types of posts shared, comments added, posts posted, and pages followed and liked. These data are plotted against a graph signifying the time to obtain a social score. There is a financial scoring model that determines the person's financial fitness and likelihood of engaging in criminal activities due to financial deformity. Combining both social scoring and financial scoring at specific weights provides a behavioural score. This score classifies the subjects and helps determine good citizens among the rest. Subjects with higher behavioural scores show the promise and practice of a good citizen. This can be used to engage and provide added incentives to good citizens in order to promote good citizenship.

A. Gupta · S. Pandey · H. Krishna · S. Pramanik · P. Gouthaman (B)


Department of Information Technology, School of Computing, SRM Institute of Science and
Technology, Kattankulathur, Chennai, India
e-mail: gouthamanps@gmail.com
A. Gupta
e-mail: anmol2feb1999@gmail.com
S. Pandey
e-mail: sanidhya099@gmail.com
H. Krishna
e-mail: harshkrish11@gmail.com
S. Pramanik
e-mail: subhampramanik@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 763
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_56

Keywords Social activity · Financial analytics · Credit scoring · Behavioural score · Sentiment analysis

1 Introduction

This work aims to specialize in the development of a scientific and methodological approach to establish a scoring mechanism to assess and score human behaviour based on online social media activity and financial activity. The scores will help reward people with good behaviour and hence promote good citizenship. The proposed work will provide a social score to individuals based on various factors. Personal factors include sincerity, honesty and integrity, which are key determinants of an individual's personality. Financial factors include a person's spending of money and timely payment of loans and EMIs. The financial aspects are mainly based on the banking details of the person's account statements and loan history.
The main innovation of this work is that each citizen is given a score measuring their sincerity, honesty and integrity, and this score then becomes a serious determinant in the life of an individual, for example, whether they are able to get credit, rent a flat, buy a ticket, or have preferential access to hospitals, various government services and universities. The social credit score will consider a variety of factors that go beyond just online activity, behaviour, financial decisions and spending; considering these factors, it will reveal more than just that. The base of this work comes from the credit scoring model, where companies are rated and scored based on their creditworthiness. It evolved over many stages, from individual people having their own credit score, determined by the financial condition of the place they live in with respect to global finance values. It evolved further as the Chinese credit scoring system came to the forefront. In the 1960s, the emphasis was laid on the growth of an 'information infrastructure in finance'. It proposed constant progressive growth towards the use of statistical methods to avoid risk and to calculate creditworthiness.
Moving towards the twentieth century, it grew into considering individual factors like a person's honesty, integrity and sincerity and giving them individual scores. Before this, blacklisting and whitelisting existed, which were means to punish people; the primary aim of this work is to reward, not to punish. Advancing towards the twenty-first century, the proposition of a 'scored society' came to the fore. This work has been inspired by a similar approach, where a scoring system scores individuals on their social media behaviour and their financial propositions. Some incidents related to this occurred in 2018 and 2019: in 2018, some people were denied rail and airline tickets in China as a result of fraud and poor behaviour, and in 2019, restrictions were imposed on people, leading to a change in people's behaviour. Technical integration is needed to implement this on a global level. Systems exist to give either a financial score or a social score; this work has evolved from the thought of merging both and giving people a suitable model that will promote traditional moral values. This work also identifies unique factors through a person's online presence and financial record to provide a score that signifies the behaviour of an individual.
Furthermore, the work demonstrates how a person’s online activity on social
media sites such as Facebook and Twitter determines the nature and behaviour of the
person. Some factors that are included for the social scoring are types of posts shared,
comments added, posts posted, pages followed and liked. These data are plotted against time to obtain a social score. There is a financial scoring model that will determine the person's financial fitness and likelihood to engage in criminal activities due to financial distress. Combining both social scoring and
financial scoring at a specific weight will provide the behavioural score. This score
will classify the subjects and help determine good citizens among the rest. This can
be used to engage and provide added incentives to good citizens in order to promote
good citizenship and behaviour.
The problem statement is to focus on the development of a scientific and method-
ological approach to determine a scoring mechanism to assess and score human
behaviour based on online social media activity and financial activity. Credit scoring
provides the banks with a basis to determine whether a borrower can be trusted before granting loans. It helps filter out trusted from non-trusted sources. Big data sources are utilized to enhance statistical and economic models so as to improve their performance. Using this data provides more information on the consumer, which helps companies make better decisions.
There has always been a system to punish people for their mistakes, and it does not necessarily set the right example for others. Rather, this work builds a system that rewards people for their good behaviour and financial expenditure. It would accomplish two things: firstly, it will inspire people to be well behaved on social media, and secondly, it helps them to spend their money wisely. After the required data is available, the system cleanses and analyses it on the basis of various attributes, and later sentiment analysis is performed to develop the sentiment of the data. Once this is done for all the data attributes, a normalized behavioural score is generated. The system then moves on to financial scoring, and the user is required to provide data like monthly income, debt ratio, number of times defaulted, number of dependents, number of open lines of credit, number of secured and unsecured lines of income, age and so on.

2 Literature Survey

Reference [1] deals with the financial capacity of the state and the statistical approach to assess it. It illustrates the development of specialized approaches to determine the amount and use of the state's economic capacity and to find current patterns in its variation and development. This research is necessary because of the demand to ensure an adequate level of national financial security, the ambiguity of methods to identify its components, and the need to select a direction for the development of the state. A purely specialized method to assess the state's financial capacity is incomplete, hence making it really hard to prepare steps for improving
the management of its components. This approach calls for comparing the values of
creation and use of the statistical data provided by the authorities like ‘the ratio of
the deficit/surplus of the state’s budget to Gross Domestic Production (GDP)’, etc.
[2]. This has proposed a scientific method for an inclusive analysis of the financial
potential of the state.
An individual's online behaviour can be precisely explained by personality
traits [3]. Therefore, understanding the personality of an individual helps in the
forecast of behaviours and choices. Several types of research have shown that the
user’s answers to a personality examination can help in predicting their behaviour. It
can be helpful in many different spheres of life from punctuality, job performance,
drug use, etc. [4]. The study mainly focused on users' Facebook profiles, as people use Facebook to share their life events through Facebook profile features. For the purpose of the study, Facebook data of 32,500 US citizens who had been active on Facebook for the past 24 months or more were acquired. The
data included the friend’s list of the user, events that are posted, regular status,
posts updates, images and groups. For analysis, the features were arranged according
to ‘number of months’ since when the user had joined Facebook. A simple linear
regression method was used for this purpose. Correlating the personality of the user
with that of Facebook profile features, ‘Spearman’s rank correlation’ was used to
calculate the correlations of the features. All these were tested using t-distribution tests with significance at the < 0.01 level. Openness—The people who are eager to have
experience and are free tend to like more Facebook pages and have a higher number
of status updates and are in more chatting groups. Conscientious users either join
few groups or no groups at all and also their use of like is less frequent. The research
reveals that the average number of likes made by the most conscientious users is
higher than 40% of the likes of the most spontaneous people. Extraversion means the individual will interact more with others on Facebook. These people like to share the events they are attending and the events that are happening in their life, and like to come in contact with more and more users.
Neuroticism and Facebook likes have a positive correlation, showing the
emotional users using the ‘like’ feature more. It has been found that 75% of normal users like fewer than 150 pages or posts, but emotional people use the like feature 250 times or more [5]. The study is limited to a small or moderate population who volunteered to report their behaviours, and the research is limited to only US-based Facebook users. Another issue is that Facebook users can be very selective in liking pages, which can differ from their personality.
The credit scoring system has been the main support system for the bankers,
lenders, NBFCs, etc., to perform statistical analysis about the creditworthiness of
the borrower [6]. The lenders thus based on this credit score decide whether to grant
credit to the borrower or not [7]. The article proposes a unique improvement in the
traditional credit scoring model by the inclusion of mobile phone data and social
network analysis results. Also, the article proves that adding call data records (CDR)
adds value to the model in terms of profit. Thus, a unique dataset is made by including
the call data record, account information related to the credit and debit made by the
borrower to make the credit scorecard [8]. Call data networks which are made from
call data records are used to differentiate the positive credit influencers from the
defaulters. Advanced social networks analytics techniques are used to achieve the
desired output. Hence, the results of the research show that the inclusion of a call data
record increases the efficiency of the traditional credit scoring model. The positive
aspect is that the traditional credit scoring models can be improved by adding call
logs. It is easier to forecast what all features are more essential for the prediction,
both in terms of profit and statistical performance. On the other side, the biggest limitation is the dataset. It simply generalizes the whole scorecard. The lender cannot differentiate whether the credit is for micro-loans or mortgages. The credit
scoring model used in the research does not include the behaviour feature which is
also an important parameter for analysing creditworthiness; behavioural data can be
obtained from telecom companies or social media platforms.
The subjective material well-being (S-MWB) covers a wide range of concepts
like financial satisfaction, stress due to financial insecurity, how the government is
handling the economy, etc. It can be a self-reported well-being gathered by a ques-
tionnaire [9]. It may consist of a range of economic issues like how the government is
handling the economy of the country, the rate of essentials, pay and margin benefits
from the job, and livelihood. It can also focus on particular dimensions of the phys-
ical life like contentment with the family financial situation as well as self-financial
situation, if the income is adequate or not, the standard of living is up to the expec-
tation or not, etc. The advantage here is about the study revealing several interesting
facts that may have policy significance, understanding the precursors of subjective
material well-being can help policy-makers to make schemes to improve the sense
of material well-being of the people as well as their quality of life. The drawback is that people compare themselves with others in their own country and not with people of other countries. The subjective well-being of a person is very much dependent
on the economy of the country.
Credit rating is a process to estimate the ability of an individual or organization to fulfil their financial commitments, based on previous dealings [10]. By such a process, financial institutions classify borrowers for lending decisions by evaluating their financial and/or non-financial performances. This paper focuses on using
social media data to determine an organization’s credibility. It is done so because
many times a case may arise where the financial and non-financial assessments done
on the organization might not be accurate or cannot be trusted. Preferably, there may
be a case where their credit analysers provide false data, so that they can easily get a
loan. In these cases, using social media data can be very fruitful. It also undertakes
financial measures which have been implemented in our project as well. Therefore,
using multiple criteria for the credit rating approach can be very useful for accurate
information. A multiple criteria approach will help to identify the loan taker’s cred-
ibility as a large extent to have various factors to distinguish. This approch not only
tracks their financial and non-financial assets but also their social media data which
can reveal important things about the company’s mind set and behaviour. The bene-
fits are the integration of social media data into credit rating. Analysts are provided
with more interpretation opportunities. The negative side is that the credit ratings
tend to decrease when social media data is considered. Gathering this much data
might require a lot of permissions which need to be granted.
The work develops models to identify the scores of consumers with or without
social network data and how the scores change the operation [11]. A consumer might
be able to change the behaviour of social media to attain a higher score. This article
deals with the amount of change from the normal score a consumer can get by doing
accordingly. A credit score is required by consumers, so that people can apply for a
loan, to get their lender’s confidence which can extend its consumers credit based on
the score. Therefore, a high credit score is a must for all consumers. If consumers realize that the score is based on social network data, they will try to be better citizens. Given that consumers use credit for a variety of undertakings that affect social and financial mobility, like purchasing a house, starting a business, or obtaining an education, credit scores have a substantial impact on access to opportunities and hence on social inequality among citizens. Until recently, credit scores were based on debt level, credit history, and on-time payments as summarized by FICO. But now, a large number of firms depend on network-based data to determine consumer creditworthiness.
The positive aspect is the development of models to assess the change in credit
score and usage of network-based measures rather than consumer’s data, whereas
the negative part is developing these models which might be a tedious task, and there
is a possibility of the network provided data being fabricated.
A credit score system is presented in an article for ethical banking, micro-finance
institutions, or certain credit cooperatives [12]. It deals with the evaluation of the
social and financial aspects of the borrower. Social aspects include employment,
health, community impact and education. The financial aspects are mainly based on
banking details of the company, its account statements, loan history and repayment
of debts on time. Based on these financial aspects and social as well, this paper will
help to figure out companies who are actually trustworthy of borrowing or lending
money and hence be fruitful for the bank where the people are associated. On the
one hand, it provides a credit scoring system for ethical banking systems and identifies responsible leaders, whereas on the other side, there are security issues, as consumers' banking details are sensitive information.
The paper tries to show that compulsive buying and ill-being perception act as controlling factors on credit and debit card use and debt [13]. They act as opposing forces to each other. A person prone to compulsive buying will tend to spend more and more money, which can lead to debt, whereas a person with an ill-being perception may think of future financial problems, knows that unnecessary buying may lead to debt, and will tend to spend money judiciously. There are several studies proving
this theory that compulsive buying encourages debt, while ill-buying perception
discourages debt [14]. But today, materialistic happiness is a dominant factor for
people; hence, people tend to spend on materialistic goods. There are multiple hypotheses which the research paper assumed, such as: the urge to attain materialistic goals positively impacts compulsive buying, individuals with compulsive buying will overuse their credit cards, compulsive buying leads to debt, responsible use of credit cards has a negative effect on debt, ill-being perception leads to responsible use of credit cards, and ill-being perception discourages credit card debt. The benefit is that the article is not gender biased, and it clearly shows how a materialistic individual can fall into financial debt easily irrespective of gender.
The drawback is that the research is unable to analyse an individual’s behaviour at
different time periods, and the credit analysis models lack robustness.

3 Proposed Methodology

The work focuses on developing a logical and methodological way of deciding a scoring mechanism to assess and score human behaviour based on online social media activity and financial activity. Its principal contribution, once completely executed, is that every user will be given a certain mark estimating their truthfulness, genuineness and uprightness, and this score will then be a significant determinant for their lives, for example, regardless of whether they have the option to get credit, lease a house, or purchase a ticket, or are given favoured access to medical clinics, colleges and government services. This score will consider a wide scope of individual elements. It additionally resembles, yet goes farther than, a range of frameworks that are proposed to build the prominence of reputation in exchanges, online platforms and the ‘sharing economy’.
Consequently, the focus here is on rating frameworks concerning individual people. The social aspects attempt to evaluate the loan's effect on Millennium Development Goals, for instance, employment, education, environment, well-being or community impact. The social credit rating model combines the bank's expertise and ought likewise to be coherent with its mission. Scoring alone based on financial aspects may lead the institution to
let a socially bad person get loans and other financial benefits. A socially bad person
may tend to be a defaulter or use financial benefits for unethical purposes. Therefore,
keeping this view in mind, a methodology has been proposed which will score a
person based on both social and financial aspects. The proposed system comprises
four major components, namely user, third-party companies/govt., social media data
pool and financial/bank data. The user is required to register with our system and
connect a social media account of the desired choice. Once the user’s social media
account is connected with this system, the user will be required to provide an ‘access
token’ which the system utilizes to access the required data. After the required data
is available, the system will clean it, analyse it based on various attributes, and then perform sentiment analysis to develop the sentiment of the data.
Once this is done for all the data attributes, a normalized behavioural score will be
generated. The system moves on to financial scoring, and the user will be required
to provide data like monthly income, debt ratio, number of times defaulted, number
of wards, amount of open lines of credits, number of secured and unsecured lines
of income, age, etc. The system will make use of machine learning models for both
behavioural and financial scoring. To perform behavioural scoring, the system will call an external API, and it will use a local model for financial scoring. To serve this ML model as a REST service, it adopts Flask. Furthermore, the system uses
a WSGI compliant application server along with NGINX Web server. The trained
model is deployed with Flask. The model can be saved and loaded to make new predictions on data provided by clients/users.
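The sketch below illustrates this deployment pattern: a previously dumped model is loaded and exposed through a single Flask endpoint. The endpoint name, model file name and feature order are illustrative assumptions, not the exact configuration of this work.

```python
# Minimal sketch: serving the trained financial-scoring model over REST with Flask.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("financial_model.joblib")   # assumed file name of the dumped model

# Assumed feature order; it must match the order used when the model was trained.
FEATURES = ["age", "monthly_income", "debt_ratio", "num_dependents",
            "open_credit_lines", "times_defaulted"]

@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    row = [[payload[name] for name in FEATURES]]     # one-row feature matrix
    default_prob = model.predict_proba(row)[0][1]    # probability of the default class
    return jsonify({"default_probability": float(default_prob)})

if __name__ == "__main__":
    # Behind a WSGI application server and NGINX in production, as described above.
    app.run(host="0.0.0.0", port=5000)
```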
Considering behavioural scoring, sentiment analysis is executed on the user’s
social data. Sentiment analysis classifies the data based on its sentiment and provides
a positive, negative, or a neutral score and a confidence value which is used to
generate a score. For financial analysis, the above-mentioned attributes are taken into
consideration. Also, a local model is built using a random forest classifier algorithm
to generate the score accordingly. The input dataset consists of social data obtained
from user’s Facebook and other social media accounts using various external APIs
like Facebook Graph API. It is the leading method for applications to analyse and
compare user data. Financial data would be provided by the user at the time of
generation of the score.
Once the user data is available, the retrieval of important information is carried
out to score a user, and unnecessary information can be removed. Parameters and features are decided by the availability of data and are subject to reliability checks. The MonkeyLearn classifier is utilized to perform sentiment analysis and obtain the sentiment of the user. Score aggregation and normalization involve combining both the behavioural score and financial score. Also, various techniques like the weight of evidence (WOE) and information value (IV) are applied. This will make the system more reliable and efficient. Additional ranking based on time dependency is also performed. The final scores are intimated to the user with the scoring benchmark

Fig. 1 Architecture of the proposed system


and reference. The behavioural and the financial score will allot a final score to the
user.

4 Empirical Analysis

4.1 Behavioural Scoring

The objective of this module is to classify and provide a score to a person based
on their social media activity. Behavioural scoring involves the collection of user
data from various social accounts, analysis of user data and final generation of the
score. Data is obtained from Facebook and Instagram using Graph API. This is done
by generating access tokens with required permissions. Inspection of parameters
like the post, quotes, likes and feed data is done. Behavioural scoring module uses
the user’s social data to perform sentiment analysis. The analysis is performed on
all the above parameters. The system will use an external API, MonkeyLearn and
compute the sentiment of each parameter. This provides the system with sentiment
value and confidence. The system performs weight of evidence (WOE) analysis and allots a rank weighted by time for parameters such as posts and likes, because these are influenced by time and change dynamically.
Graph API provides various functionalities for applications to read and write
the Facebook community-based diagram. The API’s structure is made up of nodes,
edges and fields. Nodes are singular objects like user, picture and group. Edges are
the connections between the nodes. In simple words, the link between a group of
objects and a single object is an edge. Fields provide information regarding an object
like general information about a person. So, nodes are used to fetch metadata about a particular object (individual users in our system), the connections between nodes (edges) are used to fetch groups of entities on an individual object, and fields are used to retrieve an individual user's general information, which will be used as scoring parameters to generate a score for the individual user. The Graph API is HTTP-based, which makes it compatible with any language that has an HTTP library. This allows the Graph API to be integrated directly with the proposed system once the access tokens are provided by the user. Also, field parameters can be included for the nodes (individual users) to describe which fields should be sent back with the response. For instance, an immediate check of the node reference shows that one of the fields that can be fetched when accessing the admins entity is the name field, which is the name of the admin. Since nodes are singular objects, each with a distinct ID, information about a node is obtained by directly querying its ID.
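As a concrete illustration of this node/edge/field structure, the short sketch below queries the Graph API for a user's basic fields over plain HTTP. The API version in the URL and the exact set of requested fields are assumptions for demonstration; the access token is the one generated by the user as described later.

```python
# Sketch: fetching user data from the Graph API with the requests library.
import requests

ACCESS_TOKEN = "<user-access-token>"            # provided by the user
BASE_URL = "https://graph.facebook.com/v12.0"   # assumed API version

def fetch_user_fields(fields="id,name,posts,likes,feed"):
    """Query the 'me' node and ask for the listed fields in a single response."""
    response = requests.get(
        f"{BASE_URL}/me",
        params={"fields": fields, "access_token": ACCESS_TOKEN},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

user_data = fetch_user_fields()
print(user_data.get("name"))
```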
In regards to the MonkeyLearn API for sentiment analysis, an external API is applied to perform sentiment analysis. It assists in categorizing and extracting useful metadata from raw texts like e-mails, online chats, and other media resources like Web pages, online documents, tweets and more. The content can be categorized into formal groups or bins like emotion or subject, and specific data like entities or keywords can be extracted. MonkeyLearn requires that requests are authenticated by sending an API key with each request to grant permission to access the API.
Parameters refer to the attributes that the model will take to get efficient and
concise results. A lot of data about a person can be obtained just by looking at
their social feed like personal information which may include name, date of birth,
places lived, interests, etc., but a lot of this cannot be practically used to generate a
score. There are four parameters that are taken into consideration, namely posts, feed,
likes and quotes which will be used for scoring. Posts refer to the updates which are
posted by the user on their wall. They can be textual or pictorial. The feed is the continually refreshing list of stories on a user's home page. It consists of status updates, photographs, videos, links, application activity and likes from people, pages and groups that they follow on Facebook. Likes are the list of content that is liked by the user. This may include posts, the activity of friends, and people and groups they follow. Quotes are posted by people and can be found on a Facebook profile page when scrolled down to the very bottom. Here, users are able to add their favourite quotes for all viewers, and these are visible (by default) on their public profile. The model chooses the parameters with the most information and ignores the rest. Quotes posted by the person are not very time-dependent, and the feed is dynamic; hence, sentiment analysis is directly applied to them. Likes and posts show the current mood of the person, so both time and senti-value (obtained after sentiment analysis) are considered for the likes and posts parameters.
The Graph API acquires data from the user. The data will include the pre-decided
parameters, quotes, posts, likes and feed. This is done using access tokens. An access
token is an opaque string that identifies a user, application, or page and can be utilized by the application to make API calls. When an individual connects with an application using Facebook login and confirms the request for permissions, the application gets an access token that gives temporary, secure access to Facebook APIs. Access tokens can be obtained by means of a few techniques. The token includes information about when it will expire and which application produced it. Due to privacy checks, most API calls to Facebook need to include an access token. There are various sorts of access tokens to support distinct use cases, like the user access token, client access token, app access token and page access token. The user access token is utilized in this proposed work. This sort of access token is required whenever the application calls an API to read, alter or write a particular individual's Facebook information on their behalf. User access tokens are generally acquired by means of a login dialogue and require an individual to permit the application to obtain one. The users are responsible for generating
the access token. This is very simple and can be easily done using Graph API explorer.
The users are able to connect the Facebook account with the Graph API, approve
necessary permissions to access data and generate the user access token (Fig. 2).
Now, the user data is available, and sentiment analysis is performed on it. But prior to this, the data is cleaned, possible noise is removed, and columns are checked for NaN values, which are removed. Once this is done, the data is ready for sentiment analysis, which is also known as opinion mining and emotion AI. Sentiment analysis combines natural
language processing (NLP), text analysis and biometrics to methodically discern, extract, evaluate and study affective states and subjective information.

Fig. 2 Generation of user access token

An essential task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level, i.e. whether the opinion expressed in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Next, the access token and the model ID are used
to call the MonkeyLearn API. Further, each column is iterated, and then, data is
classified as negative or positive. An attribute called ‘confidence’ is obtained. These
two attributes (‘senti-value’, ‘confidence’) with ‘ID’ are added in the CSV file which
was obtained in the second step as new attribute columns with their values. These
steps involve calling the MonkeyLearn API, which initiates a POST request. The endpoint expects a JSON body: an object with the data property containing the
list of the data (texts) which need to be classified. The response consists of a list of
all the data with their response that is negative, positive, or neutral, and a confidence
value if the API call is successful.
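The sketch below shows what such a request could look like. The endpoint path and response layout follow MonkeyLearn's v3 REST conventions as described above; the model ID, file names and column names are placeholders rather than the exact ones used in this work.

```python
# Sketch: sending cleaned text data to a MonkeyLearn sentiment classifier.
import pandas as pd
import requests

API_KEY = "<monkeylearn-api-key>"
MODEL_ID = "cl_XXXXXXXX"                      # placeholder classifier ID
URL = f"https://api.monkeylearn.com/v3/classifiers/{MODEL_ID}/classify/"

df = pd.read_csv("user_social_data.csv")      # cleaned data from the previous step

response = requests.post(
    URL,
    headers={"Authorization": f"Token {API_KEY}"},
    json={"data": df["text"].tolist()},       # JSON body with the 'data' property
    timeout=30,
)
response.raise_for_status()
results = response.json()

# Each result carries a tag (positive/negative/neutral) and a confidence value.
df["senti_value"] = [r["classifications"][0]["tag_name"] for r in results]
df["confidence"] = [r["classifications"][0]["confidence"] for r in results]
df.to_csv("user_social_data_scored.csv", index=False)
```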
Next is identifying the weight of evidence and assigning it to the parameters. The weight of evidence (WoE) provides the functionality to recode the values of continuous and categorical predictor variables into individual bins and finally assign each bin a distinct weight of evidence value. As different users will have different parameters, it is used to allocate the score. WoE provides a weight based on the priority and usefulness of the parameters. This weight is assigned to the parameters, and the parameters with the highest weights are chosen for sentiment analysis. WoE can be used to compute
the ‘strength’ of a grouping in order to separate good and bad outcomes. It can also be written as the ratio of the distribution of positives to the distribution of negatives, where distribution refers to the proportion of positives or negatives in a distinct bin out of the total number of positives and negatives. Mathematically, the ‘weight of evidence (WOE)’ value for a bin of observations can be computed as:

WOE = [ln (Distr Goods/Distr Bads)] ∗ 100 (1)

The value of WoE will be zero if the ratio of the distribution of positives to the distribution of negatives is equal to one. If the distribution of negatives (bads) in a bin is greater than the distribution of positives (goods), the ratio will be less than one, and the WoE will be a negative number; if the number of positives is greater than the negatives in a bin, the WoE value will be a positive (>0) number. From all the above-extracted features, the best features are identified as the ones most useful for differentiation.
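To make Eq. (1) concrete, the short sketch below computes a WoE value per bin with pandas. The column names and the toy data are purely illustrative assumptions.

```python
# Sketch: weight of evidence per bin, following Eq. (1).
import numpy as np
import pandas as pd

def weight_of_evidence(df, bin_col, target_col):
    """Return ln(distribution of goods / distribution of bads) * 100 for each bin."""
    grouped = df.groupby(bin_col)[target_col].agg(goods="sum", total="count")
    grouped["bads"] = grouped["total"] - grouped["goods"]
    distr_goods = grouped["goods"] / grouped["goods"].sum()
    distr_bads = grouped["bads"] / grouped["bads"].sum()
    return np.log(distr_goods / distr_bads) * 100

# Toy example: target 1 = positive ("good") sentiment, 0 = negative ("bad")
data = pd.DataFrame({
    "parameter": ["likes", "likes", "likes", "posts", "posts", "posts", "quotes", "quotes"],
    "positive":  [1, 1, 0, 1, 0, 0, 1, 0],
})
print(weight_of_evidence(data, "parameter", "positive"))
```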

4.2 Financial Scoring

The objective of this module is to run an analysis on a person’s financial history


and generate a score based on their financial activity. To do this, a model is created
and financial data from various online sources are used to train it. There are various
attributes and values which are taken into consideration, like user’s income, number
of loans, debt ratio, number of family members, etc. The financial details provided by
the users and the values that are passed in the model are to generate a score. Random
forest classifier is utilized to develop the model. Credit plays a very important role
in any economy and is always required by companies and individuals, so that the
markets can effectively work. Hence, it is very important to devise sophisticated
methods to understand whether an individual can be provided credit by forecasting
the probability of default. A financial score or a credit score which is also popularly
known as CIBIL is already used by many companies and financial institutions. It
is used to control whether a loan should be accorded or not. Different institutions
have different attributes and factors which they take into consideration to generate a
score. The proposed model will focus on empirical data provided by the user mainly
focusing on factors that assist third-party actors and the government to understand
if a person preferably experiences financial problems in the coming two years. In
addition, a dataset from kaggle is used to train and test the model. This dataset
contains information of around 300,000 users. This leads the parties to make an
informed decision regarding the reliability of a user hence making sound financial
decisions. A random forest classifier is used to classify the given data into bins or classes by using a large number of decision trees. It uses bagging and feature randomness when building every tree in order to create an uncorrelated forest of trees whose prediction by committee is more precise than that of any individual tree. To make correct decisions using a random forest, features are necessary that can be used to
get at least some insights and forecasting power. It also becomes really important that the decision trees, as well as the forecasts made by them, are uncorrelated or at least have a very low degree of similarity. While the algorithm itself, via feature randomness, tries to enforce a low degree of correlation, the features selected and the final parameters decided will ultimately impact the correlations as well. The two main reasons for utilizing random forest are as follows. First, the predict_proba function of the random forest classifier can be used directly on the output to get a probability value in a range from zero to one. Second, it is extremely effortless to reduce the output of a random forest classifier to a simpler binary classification problem, which further eases computation.
In regards to dataset description, the attributes which are taken into consideration
are as follows:
• Age: The age in years of the borrower.
• Debt ratio: The debt ratio is defined as the total costs incurred by the borrower in a month, like living costs, monthly EMI payments or any other debt payments, divided by their gross monthly income.
• Monthly income: The gross monthly income of the borrower.
• Number of dependents: Total number of dependents in the family including
parents, children, wife, etc.
• The total number of unsecured lines of income: This may include personal lines
of loans, borrowed credit from friends or family, credit card balances, etc.
• The total number of secured lines of income: Secured lines of income refers to
real estate income, business or service income.
• Defaulted in first 30 days: Amount of times the debtor failed to pay in the first
thirty days.
• Defaulted between 30–59 days: Amount of times the debtor failed to pay between
30 to 59 days.
• Defaulted between 60–89 days: Amount of times the debtor failed to pay between
60 to 89 days.
• Defaulted after 90 days: Amount of times the debtor failed to pay after 90 days.
• Total number of loans: Total number of loans taken by the borrower.
For model development, in detecting outliers, outliers in statistics can be thought of as data points that differ greatly from the remaining part of the data. Outliers are abnormalities in the data, but it is imperative to understand their nature. It is essential to drop them only if it is clear that they were incorrectly entered or not properly measured, so that removing them would not distort the result. To detect the outliers, the interquartile range (IQR) method is applied. This is a set of mathematical formulae applied to retrieve the set of outlier data. The interquartile range (IQR) is the middle half of the data, a measure of statistical dispersion equal to the difference between the seventy-fifth and twenty-fifth percentiles.

IQR = Q3−Q1 (2)


In other words, the IQR is the result of subtracting the lower quartile from the third quartile, which is also known as the upper quartile. These quartiles can be observed by plotting them on a box plot. The IQR is a measure of dispersion, like the standard deviation or variance, yet it is considerably more robust against outliers. The indexes of the outliers are collected, and the corresponding entries are removed from the dataset. Dataset cleansing is the process of removing data which is unfit for the training process.
This may include NaN values present in the dataset. A series of python functions
are utilized to perform the same. The essential one being functions such as qcut
and get_dummies. Qcut is defined in the python documentation as ‘quantile-based
discretization function’. This means that this function will divide up the original or underlying data into equal-sized bins. The function defines the bins using percentiles based on the dispersion of the available data rather than the true numeric boundaries of the bins. Values greater than six are binned together, as the standard deviation of the chosen data is extremely high. All the NaN values present in the chosen dataset are replaced with the median of the column. Get_dummies is a python function which is used to convert categorical variables into dummy/indicator variables. When this function is applied to a column of categories where there is one category per observation, it produces a new column for each unique categorical value. The value one is placed in the column corresponding to the categorical value present for that observation. When the number of values is
increased, the accuracy and the efficiency of the model are also improved when the
random forest is used in further processes. After this, a final check is performed on
the dataset to check for any NaN values present in the dataset. For model creation,
the dataset is divided as testing and training data, and the target value is separated
from the trained features.
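A compact sketch of these preprocessing steps is given below: IQR-based outlier removal, median imputation, bucketing of large values and a train/test split. The file name and column names assume the layout of the Kaggle "Give Me Some Credit" data and are not necessarily the exact ones used in this work.

```python
# Sketch: outlier removal (Eq. 2), simple imputation and train/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("credit_data.csv")            # assumed file name for the Kaggle dataset

def drop_iqr_outliers(frame, column):
    """Keep only rows whose value lies within Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = frame[column].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return frame[mask]

df["MonthlyIncome"] = df["MonthlyIncome"].fillna(df["MonthlyIncome"].median())
df["NumberOfDependents"] = df["NumberOfDependents"].fillna(0).clip(upper=6)  # bucket > 6
df = drop_iqr_outliers(df, "MonthlyIncome")

X = df.drop(columns=["SeriousDlqin2yrs"])      # target: serious delinquency within two years
y = df["SeriousDlqin2yrs"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```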
Moving on with the model fitting and accuracy aspects, a random forest classifier
is used for creating and fitting the model. A confusion matrix will be used to generate
the accuracy of the model. The confusion matrix is used to get an understanding of the behaviour and performance of a classification model on test data for which the label values are known. It allows the working of an algorithm to be visualized. The specifics of the quality of the model can be determined using a confusion matrix, such as overall accuracy, sensitivity, specificity and so on. These measures
assist to determine whether to accept the model or not. Taking into account the cost
of the errors is an imperative part of the decision whether to accept or reject the
model. After this, accuracy can be calculated. The accuracy of the proposed model
came out to be 80.78%.

Accuracy = Elements classified correctly / Total Elements (3)
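Continuing the preprocessing sketch above, the following illustrates model fitting and evaluation with a confusion matrix and the accuracy of Eq. (3). The hyperparameters are illustrative assumptions, not the exact values used to reach the reported 80.78%.

```python
# Sketch: fit a random forest and evaluate it on the held-out test split.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Accuracy = elements classified correctly / total elements (Eq. 3)
print("Accuracy: %.2f%%" % (100 * accuracy_score(y_test, y_pred)))
```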

The next step is the generation of scores. The model is loaded, and the data values provided by the user are passed into the model to generate scores. But first, the model is dumped to a file. This is done using the Joblib library. Joblib provides a quick and robust persistence mechanism, especially for objects carrying large amounts of data such as NumPy ND arrays. After loading the model, the predict_proba
function is used to generate the scores. This is an extremely significant function.

Fig. 3 Weighted graph

The predict_proba function gives the probabilities for the target in array form. The number of probabilities for every row is equal to the number of target classes. It gives the predicted probability for each class of the model, where the classes are ordered as in the classifier's classes_ attribute, so the returned values for each row are ordered by the class index.
Mathematical logic is applied to normalize and combine the scores. There are two aspects of the system, namely financial and behavioural, and equal weights are given to both the behavioural and financial parts. Another approach is to do a weighted average (Fig. 3).
To normalize the score accurately there are certain cases that need to be handled.
For instance, there can be cases like unavailability of financial or behavioural data,
categorization, ranking based on score and so on. In that case, there may be a necessity
to convert the scoring parameters. To address these issues, below are a set of problems
and solutions:
• Unavailability of behavioural data, hence no generation of the behavioural score.
• Similarly, in case of unavailability of financial data, the financial score cannot
be generated. So there can be a requirement to perform scoring on financial or
behavioural data alone.
• As the financial model generates the default probability, is it possible to transform
it into a financial scoring metric?
• How to categorize users based on score and to justify the lack of data, if any of
the above cases are encountered?
To handle these cases and maintain the effectiveness of the system, the following possibilities exist:
• As the model calculates the chances of a user defaulting for a certain number of days (the default probability), the probability that a user does not default can be simply calculated as 1 − probability (user defaults), which therefore represents the probability of good behaviour.
• There is not enough usable data for behavioural scoring, and yet it is significant
to score them.
In order to understand this clearly, hypothetically, when neither behavioural nor financial data is available, both the behavioural score and the financial default probability will be zero, and the final score will be calculated as,
• Score = (1 − 0 + 0) / 2 = 0.5
• The user will be put into a neutral category—neither good nor bad.
Suppose when behavioural data is unavailable, then the user has a behavioural
score of 0 and financial default probability of 0.5 (which will come under a bad
borrower), for this the score will be calculated as,
• Score = (1 − 0.5 + 0) / 2 = 0.25
• This is a bad score and at the lower end of the spectrum, considerably not an ideal
position to be in.
When there is both financial as well as behavioural data and both the scores are
0.5 each, then the score will be given as,
• Score = (1 − 0.5 + 0.5) / 2 = 0.5.
• This will again come under good category and hence acceptable as the user has a
good behavioural score but a bad financial score.

If there is an extremely bad case, −0.5 as the behavioural score and a financial
default probability of 1 which means the user will certainly default. In this case, the
score will be calculated as,
• Score = (1 − 1 + (−0.5)) / 2 = −0.25.
• Now, this is extremely bad, and hence, it is extremely concerning.

On this basis, it is decided to categorize the scores as, 0.75–1.0: Excellent, 0.5–
0.75: Good, 0–0.5: Okay and −1.0 to 0: Concerning. Looking at all those cases,
it is understood that with the data obtained and using the different behavioural or
financial score computed, the score can be justified based on this logic. Hence, the
score stays consistent, and all the scores can be justified in every case. One interesting
thing to note here is that the financial score ranges from 0 to 1 (never negative), and the behavioural score ranges from −1 to 1. Therefore, if an equal-weighted average is taken for both, the score will range from −0.5 to 1.0. In this case, it can be concluded
with ease that a negative aggregate score can be an extremely concerning score.
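A minimal sketch of this combination and categorization logic, assuming the equal-weight formula score = (1 − default probability + behavioural score) / 2 and the bands listed above:

```python
# Sketch: combine the two scores and map the result to a category.
def combine_scores(default_probability, behavioural_score):
    """Aggregate score in the range -0.5 to 1.0."""
    return (1 - default_probability + behavioural_score) / 2

def categorize(score):
    if score >= 0.75:
        return "Excellent"
    if score >= 0.5:
        return "Good"
    if score >= 0:
        return "Okay"
    return "Concerning"

# Reproducing the worked cases from the text:
for fin_default, behavioural in [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (1.0, -0.5)]:
    s = combine_scores(fin_default, behavioural)
    print(fin_default, behavioural, round(s, 2), categorize(s))
```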

5 Results and Conclusion

The work demonstrates how behavioural scoring can be used to promote good
behaviour and identify good citizenship among the actors. This can be used to engage
and provide added incentives to good citizens to encourage good citizenship. The
research work portrays how a person’s online activity on online media sites like Face-
book and Twitter determines the nature and behaviour of the person. Some factors
that are included for the social scoring are types of posts shared, comments added,
posts posted, pages followed and liked. These data are plotted against time to obtain a social score. There is a financial scoring model that will determine the person's financial fitness and likelihood to engage in criminal activities due to financial distress. Combining both social scoring and financial scoring at
a specific weight will provide us with a behavioural score. This score will classify
the subjects and help determine good citizens among the rest. This can be used to
engage and provide added incentives to good citizens to enhance good citizenship.
Many compelling avenues are open for future enhancement and exploration. Only certain specific features have been used to predict the personality of the user, and there is a wide variety of features that were not explored, like the specific type of group a user is a member of. The user can be selective in liking any page, group,
or public figure. Thus, more sophisticated approaches can be used to overcome this
drawback. The analysis is only done based on the online behaviour of the user. A user
can have different behaviour in the virtual environment and the real environment.
Hence, work can be done in the future to outperform this negative aspect. Another
scope of further improvement can be the study of privacy-safeguard mechanisms to
further enhance and secure online data.

References

1. Vyhovska F, Polchanov N, Aldiwani A, Shukairi K (2019) The methodological approaches development to assess the creation and use of the financial capacity of the state. Public Munic Financ 8(1)
2. Agytaevna Adilova M, Meldahanovich Akayev K, Erzhanovna Zhatkanbayeva A, Halelkyzy
Zhumanova A (2015) Problems of financial security and financial stability of the Republic of
Kazakhstan. Mediterr J Soc Sci 6(6)
3. Kosinski T, Bachrach M, Kohli Y, Stillwell P, Graepel D (2014) Manifestations of user person-
ality in website choice and behaviour on online social networks. Mach Learn 95(3):357–380
4. Tupes RC, Christal EE (1992) Recurrent personality factors based on trait ratings. J Pers
60(2):225–251
5. Vazire S, Gosling SD (2004) E-Perceptions: personality impressions based on personal
websites. J Pers Soc Psychol 87(1):123–132
6. Zhou J, Sun H, Fu G, Liu S, Zhou J, Zhou X (2019) A Big Data mining approach of PSO-based
BP neural network for financial risk management with IoT. IEEE Access 7:154035–154043
7. Agarwal V, Dhar R (2014) Editorial—Big Data, Data Science, and Analytics: the opportunity
and challenge for IS research. Inf Syst Res 25(3):443–448
8. Thomas LC (2000) A survey of credit and behavioural scoring: forecasting financial risk of
lending to consumers. Int J Forecast 16(2):149–172
9. Sirgy M (2019) What determines subjective material well-being?
10. Gul I, Kabak S, Topcu O (2018) A multiple criteria credit rating approach utilizing social media data. Data Knowl Eng 116:80–99
11. Wei C, Yanhao Y, Van den Bulte P, Dellarocas C (2014) Credit scoring with social network
data. SSRN Electron J 35(2)
12. Gutierrez-Nieto J, Begona SC, Carlos CC (2016) A credit score system for socially responsible
lending. J Bus Ethics 133:691–701
13. Bertran D, Echeverry MP (2019) The role of small debt in a large crisis: credit cards and the
Brazilian recession of 2014
14. Lee L, Qiu GM (2016) A friend like me: modeling network formation in a location-based social
network. SSRN Electron J 33(4):1008–33
An Optimized Method for Segmentation
and Classification of Apple Leaf Diseases
Based on Machine Learning

Shaurya Singh Slathia, Akshat Chhajer, and P. Gouthaman

Abstract Agriculture is a significant portion of the world economy as it provides food security. However, it has been noticed that plants are broadly infected by various diseases. This causes tremendous monetary losses in agribusiness throughout the world. The manual assessment of fruit diseases is a troublesome procedure that can be limited by utilizing automated strategies to identify plant diseases at an earlier stage. In this research, a new strategy is implemented for apple disease identification and recognition. Three pipeline steps are followed: initial processing, lesion segmentation with feature extraction, and classification. In the initial step, the plant leaf lesion is enhanced by a hybrid technique that combines three-dimensional box filtering, de-correlation, a three-dimensional Gaussian filter, and a three-dimensional median filter. After this, the lesion areas are segmented by a strong correlation-based process. Finally, the colour, colour histogram, and texture features are extracted and combined. The extracted features are optimized by a genetic algorithm and classified by a KNN classifier. The research is carried out on the Plant Village dataset. The planned system is tried on four types of disease classes. The classification results show the improvement of our strategy on the chosen diseases. Also, the good initial-processing method consistently created distinctive features which later accomplished significant classification accuracy.

Keywords Plant disease · Machine learning · Image processing · Picture segmentation · Feature extraction

S. S. Slathia · A. Chhajer · P. Gouthaman (B)
Department of Information Technology, School of Computing, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
e-mail: gouthamanps@gmail.com
S. S. Slathia
e-mail: slathiashauryasingh@gmail.com
A. Chhajer
e-mail: chhajeraryan@gmail.com


1 Introduction

The goal of this work is to discover leaf malady recognition and characterization.
Leaf illness identification and characterization are the significant qualities required
for agrarian businesses. As cultivation is a significant piece of the global economy, it provides food security. It has been noticed that plants are widely contaminated by various ailments. This causes tremendous monetary misfortunes in the horticulture industry around the globe, and proficient identification and recognition of fruit leaf maladies are open and current challenges in machine vision because of their significant application in farming.
In agribusiness, different sorts of natural product sicknesses exist which influence
the creation and nature of organic products. The vast majority of these maladies
are decided by according to a specialist here based on their indications. It may
be costly because of the inaccessibility of specialists and greater expense. In this
respect, the registering analysts in a joint effort with agribusiness specialists have
proposed numerous calculations for robotized identification of infections in plants
and natural products. The manual examination of natural product infections is a
troublesome procedure which can be limited by utilizing computerized strategies
for recognition of plant maladies at the previous stage. Hence, it is fundamental to
build up a mechanized electronic framework for recognition and arrangement of leaf
side effects at the beginning period. To satisfy the above necessity, a machine-based
picture handling method is suitable for recognizing and grouping the leaf illnesses.
The past framework portrays mechanized leaf infection imperfection discovery
from the pictures. A robotized vision-based framework which comprises of a picture
snatching system and an investigation strategy is utilized for identifying and grouping
the leaf illnesses. The proposed investigation strategy treats malady like unusual
areas and different effects that are identified with the order of leaf infections. In the
imperfection recognition process, a few pictures preparing calculations are utilized
to extricate pictures includes and find deformity’s situations on leaf pictures.
The current framework proposed a colour-based segmentation image processing calculation for leaf ailment identification. The drawback of the existing system is that those strategies are not fast and adaptive. The leaf ribs may cause undesirable mistakes in the characterization of leaf diseases. Superior preprocessing and segmentation are absent. The yielded result is lower when contrasted with our proposed system. The accuracy of the system is low.

2 Literature Survey

The article portrays computerized leaf sickness imperfection recognition from the
pictures. In this paper, a robotized vision-based framework includes a picture
capturing system and an investigation strategy for distinguishing and characterizing


the leaf illnesses. The proposed examination strategy treats infection like anoma-
lous locales and different effects that are identified with the characterization of leaf
maladies. In the deformity discovery process, a few pictures handling calculations are
utilized to remove pictures includes and find imperfection situations on leaf pictures
[1]. This technique proposed a shading-based division picture handling calculation
for leaf malady recognizable proof.
In regards to apple leaf disease identification with a genetic algorithm, this article
depicts computerized leaf infection imperfection discovery from the pictures. The
proposed examination technique treats illnesses like strange districts and different
effects that are identified with the grouping of leaf sickness. This enquires about the
proposed relationship-based highlights determination picture handling calculation
for leaf ailment distinguishing proof [2].
For the detection and classification of citrus diseases, a proposed investigation
technique treats illness like anomalous areas and different effects that are identified
with the grouping of leaf maladies [3]. In the imperfection recognition process,
a few picture handling calculations are utilized to separate pictures includes and
find deformity’s situations on leaf images. The technique that was proposed was an
upgraded weighted division for leaf infection recognizable proof.
To understand a comparative study of leaf disease diagnosis system, this article
elucidates computerized leaf illness imperfection recognition from the pictures. In
this paper, a computerized vision-based framework which comprises of a picture
snatching system and an investigation strategy for recognizing and arranging deserts
on the outside of cowhide texture [4]. This framework proposed the choice of a
surface-based feature for leaf illness ID.
Toward the development of an automated system for crop recognition and classifi-
cation, this paper provides a computerized vision-based framework which comprises
of a picture snatching system and an investigation technique for distinguishing and
ordering the leaf infections [5]. In the deformity recognition process, a few pictures
preparing calculations are utilized to separate pictures include and find imperfec-
tion’s situations on leaf images. The investigated technique proposed a shape-based
element determination for leaf infection recognizable proof.
In regard to recommending, it tentatively assesses a product solution for the
planned location and grouping of plant leaf infections. Investigations of plant
attribute/illness allude to the investigations of outwardly discernible examples of
a specific plant [6]. These days crops face numerous qualities/illnesses. Harmfulness
of the creepy-crawly is one of the significant characteristics/ailments. Bug sprays
are not generally demonstrated productive because bug sprays might be poisonous
to feathered creatures. It additionally harms normal creature natural ways of life. The
going with two phases is incorporated continuously after the division stage. In the
underlying advances, the recognition is based on the green-toned pixels. Next, these
pixels are hidden subject as far as possible regards that are enrolled using Otsu’s tech-
nique, by then those large green pixels are hidden. The other additional development
is that the pixels with zeros red, green, and blue characteristics and the pixels on the
restrictions of the debased pack (object) were completely emptied. The exploratory
results show that the proposed technique is a generous procedure for the disclosure
of plant leaves diseases.
The features differ in the kind of nonlinear post-processing applied to the local power spectrum. The features considered are the Gabor energy, complex moments, and grating cell operator features [7]. The ability of the corresponding operators to produce distinct feature vector clusters for different textures is studied using two methods; the Fisher criterion is used for comparing the clustering results. The two methods give consistent results. The grating cell operator gives the best discrimination and segmentation results. The texture detection capabilities of the operators and their robustness to non-texture features are also examined. The grating cell operator is the only one that responds specifically to texture and suppresses false responses to non-texture features such as object contours.
Concerning leaf disease detection, the discussion explores automated leaf disease detection, which is an essential research topic as it offers benefits in monitoring large fields of crops and thereby detecting symptoms of disease as soon as they appear on plant leaves [8]. The main steps for disease recognition are image acquisition, image preprocessing, image segmentation, feature extraction, and statistical analysis. In the proposed work, the first step applies median filtering and converts the RGB image to the CIELAB color space; the second step segments the image using the k-medoid method; the next stage masks the green pixels and removes the masked green pixels; the following stage computes texture feature statistics, and these features are fed into a neural network. The neural network performs well and could effectively recognize and classify the tested diseases.
Toward plant disease detection, serious biological disasters cause great damage to crop production. The plant disease information of various crops can be analyzed, and a forecasting report generated. This research presents the structure and key developments of a forecasting framework for plant disease recognition based on data mining [9]. Data mining functions can be divided into two types: description and prediction. Data mining is a form of deep-level data analysis; it can extract mass information from standard data, and deeper analysis can further improve the information resource. The limited information in the data management module's input warehouse is available according to the description, and data mining supports the forecasting of disease impact. The framework combines data management and reporting to generate forecasts in a Windows-based system.
With regard to detection, images form significant data and information in the applied sciences. Plant diseases have become a problem as they can cause a significant decline in both the quality and quantity of agricultural products [10]. Automated recognition of plant diseases is an essential research topic as it offers benefits in monitoring large fields of crops and thereby recognizing the signs of disease as soon as they appear on plant leaves. The proposed structure is a software solution for automatic detection and computation of texture statistics for plant leaf diseases. The processing pipeline contains four main steps: first, a color transformation structure for the input RGB image is created; then the green pixels are masked and removed using a specific threshold value; then the image is segmented and the significant parts are extracted; and finally the texture statistics are computed. From the texture statistics, the presence of disease on the plant leaf is evaluated.

3 Proposed Methodology

The research presents a new strategy for disease identification and recognition. Three fundamental steps are used: enhancement, lesion segmentation with feature extraction, and classification. In the initial step, specimen lesions are enhanced by a hybrid method combining three-dimensional box filtering, de-correlation, a three-dimensional Gaussian filter, and a three-dimensional median filter. After this, the lesions are segmented by a strong correlation-triggered method. Then the color, color histogram, and texture features are extracted and fused. The features are selected by a gene-based (genetic) algorithm and classified by a KNN classifier. The experiment is conducted on the PlantVillage dataset, and the research also tests for deficiencies such as sulfur and phosphorus. The benefits of the proposed system are that using various color spaces allows the most suitable color space to be identified; using different feature extraction strategies helps recognize suitable contrasts between bad and good samples; using various filters increases the visibility of the affected portions of diseased leaves; and using genetic-algorithm-based feature selection identifies the most suitable features for distinguishing leaf diseases.
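The final stages of this pipeline can be illustrated with a minimal Python sketch of genetic-algorithm-based feature selection followed by KNN classification. The population size, mutation rate, and the assumption that a feature matrix X (one row per leaf image) and label vector y already exist are illustrative choices, not the authors' settings.

```python
# Sketch: GA feature selection + KNN classification (assumed X, y already extracted).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated KNN accuracy for one binary feature mask."""
    if mask.sum() == 0:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.05):
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # keep the fitter half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                          # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut                      # bit-flip mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()].astype(bool)

# Usage: best_mask = ga_select(X, y)
#        model = KNeighborsClassifier(n_neighbors=5).fit(X[:, best_mask], y)
```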
Efficient identification and recognition of fruit leaf diseases is a current challenge in computer vision (CV) because of its significant applications in agriculture and the agro-economy. In farming, various types of fruit diseases exist that affect the production and quality of fruits. Most of these diseases are diagnosed by an expert in the field based on their symptoms; however, this is costly because of the unavailability of experts and the greater expense involved. In this regard, computing researchers, in collaboration with horticulture specialists, have proposed many algorithms for the automated recognition of diseases in plants and fruits. Leaf symptoms are a significant source of information for recognizing diseases in several different kinds of fruit plants. Apple is an important fruit plant, widely renowned for its nutrient value, but its production and quality are harmed by the attack of various diseases such as black rot, rust, and blight. It is therefore essential to build an automated electronic framework for the detection and classification of leaf symptoms at an early stage.
The early identification of these symptoms is useful to improve the quality and production of fruits. In machine vision, finding the lesion spot is an active research area, and several techniques have been presented for fruit disease identification through image processing and AI algorithms. Many segmentation, feature extraction, and classification techniques have been proposed in the literature for fruit disease segmentation and recognition, for example, combinations of feature-based symptom identification, color-based segmentation, correlation-based feature selection, improved weighted segmentation, texture features, shape features, support vector machines (SVM), and so on.
The RGB model is an additive color model in which red, green, and blue light are combined in different proportions to reproduce a broad range of colors. The name of the model comes from the initials of the three additive primary colors, R (red), G (green), and B (blue). The main purpose of the RGB color model is the sensing, representation, and display of images in electronic systems such as television screens and personal computers, although it has also been used in conventional photography. Before the electronic age, the RGB color model already had a solid theory behind it, based on the human perception of colors (Fig. 2).

Fig. 1 Flow diagram


Fig. 2 RGB

With regard to grayscale, a gray digital image, common in photography and computing, is an image in which the value of each pixel is a single intensity sample. Images of this kind are composed of shades of gray, ranging from black at the lowest intensity to white at the highest. Grayscale images differ from one-bit bi-tonal black-and-white images, which have only two colors, black and white, in computer imagery; grayscale image data has many shades of gray between them (Fig. 3).
A grayscale image often results from measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum, and if only one frequency is captured, it is monochromatic; such bands can, however, also be combined into a full color image. The principal operations on objects are the conventional set operations of union, intersection, and complement, in addition to translation. The translation A + x of a set A by a vector x is defined as

A + x = {α + x | α ∈ A}                                             (1)

Note that since the digital image is composed of pixels at integer coordinate locations (Z²), this imposes restrictions on the permissible translation vectors x. The fundamental Minkowski operations of addition and subtraction can now be defined. Morphological filtering procedures also apply to gray-level images. The structuring elements are defined over a limited number of pixels, and the operations are adapted accordingly; however, the structuring element now has gray values associated with each coordinate location, just as the image does. The details can be found in Dougherty and Giardina.
A consequence is that the maximum filter and the minimum filter are the gray-level dilation and gray-level erosion for the particular structuring element given by the shape of the filter window with gray value "0" inside the window. The morphological smoothing algorithm rests on the observation that a gray-level opening smooths a gray-valued image from above the brightness surface (viewed as a function) and a gray-level closing smooths it from below.
Fig. 3 Grayscale

The morphological gradient is obtained where the gradient filter gives a vector representation; the version given here is an approximation of the gradient magnitude.
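As a short, hedged illustration (not the authors' code), the gray-level morphological operations described above can be realized with OpenCV; the file name and the 5 x 5 flat structuring element are assumptions.

```python
# Gray-level morphology on an assumed grayscale leaf image.
import cv2
import numpy as np

gray = cv2.imread("leaf_gray.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file
kernel = np.ones((5, 5), np.uint8)                          # flat structuring element

dilated  = cv2.dilate(gray, kernel)                         # maximum filter = gray-level dilation
eroded   = cv2.erode(gray, kernel)                          # minimum filter = gray-level erosion
opened   = cv2.morphologyEx(gray, cv2.MORPH_OPEN, kernel)   # smooths the surface from above
closed   = cv2.morphologyEx(gray, cv2.MORPH_CLOSE, kernel)  # smooths the surface from below
gradient = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)  # dilation minus erosion
```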
A border-corrected mask is based on a filter mask. The masking principle is also known as spatial filtering, and masking is likewise called filtering. In this definition, the filtering operation is performed directly on the image. A kernel, convolution matrix, or mask is a small matrix used in image processing for blurring, sharpening, embossing, edge detection, and more, and this is accomplished by means of a convolution between the kernel and the image. The mask is constructed to locate specific objects or features in an image. A border-corrected mask is a mask in which all the detected features of an image are closed at the edges.
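A hedged example of spatial filtering with small kernels follows; the sharpening and edge-detection kernels below are common illustrative choices, not ones specified by the authors, and the input file name is hypothetical.

```python
# Spatial filtering (masking) with 3x3 kernels via convolution.
import cv2
import numpy as np

img = cv2.imread("leaf.png")                                  # hypothetical input image

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=np.float32)
sharpened = cv2.filter2D(img, -1, sharpen)                    # same depth as the input

edge_mask = np.array([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1]], dtype=np.float32)
edges = cv2.filter2D(img, -1, edge_mask)                      # highlights object boundaries
```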
In machine vision, segmentation is the strategy of partitioning digital image data into multiple segments. The segmentation objective is to simplify and/or change an image's representation into something that is more meaningful and easier to analyze. Segmentation of images is commonly used to locate boundaries and defects in images. More precisely, image segmentation is the procedure by which every pixel in an image is assigned a label, so that pixels with the same label share common attributes. The result of image segmentation is a set of segments that together cover the whole image, or a set of contours extracted from the image (see edge detection). All the pixels in a region are similar, while neighboring regions differ significantly with respect to the same criteria. Using interpolation algorithms such as marching cubes, the contours produced after segmentation can be used to create a three-dimensional reconstruction when applied to a stack of images, which is the basis of clinical imaging.
CCA (connected component analysis) is a well-known image processing technique that scans an image and groups pixels into labeled components based on pixel connectivity. An eight-connected CCA stage is performed to find all objects produced by the former stage within the binary image [11]. The output of this stage is an array of N objects that represents the stage's input and output. The proposed framework's fundamental applications mainly target practical uses such as supporting early detection, diagnosis, and appropriate treatment, and segmentation of images plays a significant role in numerous image processing applications. Finally, under low SNR conditions and other difficulties, the remaining issues are managed by automation to achieve effective and accurate segmentation.
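A minimal sketch of eight-connected component analysis is shown below; it assumes a variable `binary` holding the 0/255 lesion mask produced by the previous stage.

```python
# Eight-connected component labelling on an assumed binary lesion mask.
import cv2

num, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
for i in range(1, num):                        # label 0 is the background
    area = stats[i, cv2.CC_STAT_AREA]
    x, y = centroids[i]
    print(f"object {i}: area={area}, centroid=({x:.1f}, {y:.1f})")
```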

4 Empirical Analysis

4.1 Preprocessing

Preprocessing refers to operations at the lowest level of abstraction, where both the input and the output are images. Preprocessing is used to improve the data: it removes distortions and undesirable aspects of the data and also enhances important features that are required for further processing. Since many image processing methods exploit the redundancy of images, pixels of the same image region have similar luminosity values. The input data requires preprocessing techniques so that correct analysis of the data can take place; this also means that corrupted pixels can be identified and handled using their neighboring pixels before data analysis. This stage requires changing the size of the input data and converting it to a grayscale picture by using different filters. Data cleaning is the process of finding, removing, and replacing bad or missing data. Searching for local extrema and abrupt changes in the data can help in identifying notable trends, and the grouping method is used to represent the relationships between the various data points. The preprocessing applies certain methodologies wherein all the input images are resized to the same dimensions; the output image is adjusted in case the input data does not have the specified aspect ratio. Image filtering is a process to enhance the input data: for example, an image can be filtered to highlight some aspects or suppress others. Next, if a single colored pixel needs to be stored, 24 bits are required, whereas a grayscale pixel only requires 8 bits of storage; the resulting drop in the memory requirement (by almost 67%) is extremely useful. Grayscale also reduces ambiguity by mapping a 3D pixel value (RGB) to a 1D value. Most functions on 3D pixels (e.g., edge detection, morphological operations, etc.) are not substantially enhanced by the additional color information.
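A minimal preprocessing sketch following these steps is given below; the file name and the 256 x 256 target size are assumptions rather than the authors' settings.

```python
# Preprocessing: resize to a common dimension, convert to grayscale, median-filter.
import cv2

img = cv2.imread("leaf_sample.jpg")                    # 24-bit color input (hypothetical file)
img = cv2.resize(img, (256, 256))                      # same dimensions for every image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # 8-bit grayscale (~67% memory saving)
clean = cv2.medianBlur(gray, 5)                        # median filter suppresses speckle noise
```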

4.2 Segmentation

Image segmentation is a common technique in digital image processing and analysis, often based on the pixels of the data, used to divide an image into different sectors. In computer vision, segmentation of the image is a method of subdividing the digital picture into several segments; it is a method for grouping pixels with similar characteristics. Separating an image into independent regions, so that each object in the image is clearly defined, each region is homogeneous, and no two adjacent regions are similar to each other, is what defines image segmentation. The accuracy of segmentation determines the potential success or failure of the analysis process. Segmentation is carried out on the basis of a similarity property. The proposed solution implements this similarity property with the k-means clustering algorithm: the method repeatedly computes the center of each cluster and then re-assigns the inputs to the cluster whose center is nearest. This method helps to extract important picture features, which allows the information to be easily perceived.
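The following is a hedged sketch of k-means color clustering for lesion segmentation; the choice of three clusters and the heuristic of keeping the least-green cluster are assumptions for illustration only.

```python
# K-means clustering of pixel colors to separate candidate lesion regions.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.cvtColor(cv2.imread("leaf_sample.jpg"), cv2.COLOR_BGR2RGB)   # hypothetical file
pixels = img.reshape(-1, 3).astype(np.float32)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
labels = km.labels_.reshape(img.shape[:2])             # per-pixel cluster index

# Heuristic: treat the cluster whose mean color is least green as the lesion candidate.
lesion_cluster = np.argmin(km.cluster_centers_[:, 1])
lesion_mask = (labels == lesion_cluster).astype(np.uint8) * 255
```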
Next, a suitable representation of the test image is chosen by applying various color space conversions, including RGB to HSV and RGB to YCbCr. The labels are then calibrated to produce realistic results using a shape detection method based on comprehensive regional context knowledge. Among the different color spaces, the one most relevant to the classification task is selected. First, color space conversion is performed, in which the color representation is translated from one basis to another; this generally aims to render the translated image as similar as possible to the original. Second, when converting the color format, the full color information is not needed by many image processing applications; if there is no necessity to differentiate between colors, one option is to convert the RGB input to black-and-white or grayscale output formats. Finally, morphological operations are applied: the morphological processing of images is a set of non-linear procedures related to the structuring elements applied to the data. Morphology offers a wide array of data processing methods that process images based on shape. A structuring element is applied to an input image to construct an output image of the same size (Fig. 4).
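A small illustration of the color space conversions mentioned above, with a simple Otsu threshold and a binary morphological clean-up, is sketched below; the saturation-channel threshold is an assumed heuristic, not the authors' procedure.

```python
# RGB->HSV and RGB->YCbCr conversion, followed by Otsu thresholding and opening.
import cv2
import numpy as np

bgr   = cv2.imread("leaf_sample.jpg")                  # OpenCV loads images as BGR
hsv   = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)         # OpenCV's YCbCr ordering is Y, Cr, Cb

# Assumed heuristic: threshold the saturation channel to isolate strongly colored tissue.
_, mask = cv2.threshold(hsv[:, :, 1], 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))   # remove small specks
```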

4.3 Feature Extraction

In the field of machine learning, after the sequence of processing steps applied to the input image, feature extraction is initiated by combining the processed information and constructing properties that are informative and non-redundant. The extraction of features is connected to dimensionality reduction. When the input to an algorithm is too extensive to process and is considered redundant, it can be translated into a reduced set of properties. Determining a subset of the initial features is called feature selection [12]. The selected features are expected to contain the relevant information
Fig. 4 Black and white image

from the input data, so that the desired task can be performed using this reduced representation. The features considered are shape features, color features, geometrical features, and texture features.
To begin with, shape features comprise characteristics such as roundness or any other shape descriptor, where the perimeter boundary of an object together with its diameter defines the shape features. Next are color features, where color and texture histograms and the whole-picture color structure form part of the global descriptors; color, texture, and shape features also provide local characteristics for sub-images, regions, and points of interest, and these image descriptors are then used to match and retrieve images. Then come geometrical features, in which the geometric characteristics of objects consist of a sequence of geometric elements such as points, lines, curves, or surfaces; such characteristics may be corner features, edge characteristics, circles, ridges, prominent points of the image texture, etc. Finally, the texture features of an image are a set of computed parameters that quantify the perceived arrangement of the image texture, giving information about the spatial pattern of intensities or sharpness in the data or in a specific region of the data (Fig. 5).
Here, feature extraction methods such as the gray-level co-occurrence matrix (GLCM), local binary pattern (LBP), region properties segmentation, and a genetic algorithm are used. GLCM gives texture features of the input data such as contrast, correlation, energy, etc. The LBP gives various local shape features of the input image. After that, a genetic algorithm is applied to choose the best attributes to distinguish the different diseases that may occur in the leaf. The region properties segmentation is used to obtain the geometrical features of the input image such as density, area, and so on. From all of the above, the extracted features serve as the best features for differentiating the various leaf diseases.
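A hedged sketch of the feature extractors named above (GLCM statistics, an LBP histogram, and region properties) using scikit-image is shown below; the distances, angles, and LBP parameters are illustrative assumptions.

```python
# GLCM, LBP, and region-property features for one grayscale image and its lesion mask.
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern
from skimage.measure import label, regionprops

def extract_features(gray, lesion_mask):
    # GLCM texture statistics: contrast, correlation, energy, homogeneity.
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0]
                  for p in ("contrast", "correlation", "energy", "homogeneity")]

    # Uniform LBP histogram as a local texture descriptor.
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # Geometric properties of the largest connected lesion region.
    regions = regionprops(label(lesion_mask > 0))
    largest = max(regions, key=lambda r: r.area) if regions else None
    geo = [largest.area, largest.eccentricity, largest.solidity] if largest else [0, 0, 0]

    return np.array(glcm_feats + list(lbp_hist) + geo)
```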
Fig. 5 Small object removed image

4.4 Classification

The process of extracting information classes from a set of image bands is called image classification. The raster resulting from image classification can be used to create thematic maps. An image classification toolbar is the preferred way to perform classification and multivariate analysis.
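As a minimal classification sketch, the GA-selected feature vectors can be used to train and evaluate the KNN classifier named in the proposed methodology; X_sel and y are assumed to come from the earlier feature extraction and selection steps.

```python
# Train/evaluate a KNN classifier on GA-selected features (X_sel, y assumed available).
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

X_train, X_test, y_train, y_test = train_test_split(X_sel, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(classification_report(y_test, knn.predict(X_test)))
```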

5 Conclusion and Future Work

In this research, an enhanced automated computer-based strategy is designed and validated for disease recognition. The approach consists of lesion contrast stretching, lesion segmentation, and salient feature selection and recognition steps. The contrast of the infected spot is enhanced, and segmentation is performed by the proposed technique; the performance of the proposed technique is further improved by region segmentation. Then, multiple features are extracted and fused using a parallel strategy. A genetic algorithm is used to choose the best features, and they are then used by KNN for classification. In the future, the proposed methodology can be combined with other methods yet to be developed, such as texture analysis and classification, which would help in determining the stages of a disease; this will be of great help since the system is not dependent on a specific disease. The proposed system can also be greatly enhanced to identify diseases that do not originate at the leaves but rather at different parts of the plant. Sudden death syndrome (SDS) could also be integrated into the module, but due to the lack of a proper dataset at present, it could not be incorporated into this work. Another advancement of the work could be to identify the different ways in which pests affect the plants, as each pest has a different way of attacking them. Finally, one major upgrade could be to identify what kind of nutrient deficiency the plant is facing that is causing the disease, so that proper care can be taken of the plant.

References

1. Rozario LJ, Rahman T, Uddin MS (2016) Segmentation of the region of defects in fruits and
vegetables. Int J Comput Sci Inf Secur 14(5)
2. Chuanlei Z, Shanwen Z, Jucheng Y, Yancui S, Jia C (2017) Apple leaf disease identification
using genetic algorithm and correlation based feature selection method. Int J Agric Biol Eng
10(2):74–83
3. Sharif MY, Khan MA, Iqbal Z, Azam MF, Lali MI, Javed MY (2018) Detection and classifi-
cation of citrus diseases in agriculture based on optimized weighted segmentation and feature
selection. Comput Electron Agric 150:220–234
4. Sapkal AT, Kulkarni UV (2018) Comparative study of leaf disease diagnosis system using
texture features and deep learning features. Int J Appl Eng Res 13(19):14334–14340
5. AlShahrani AM, Al-Abadi MA, Al-Malki AS, Ashour AS, Dey N (2018) Automated system
for crops recognition and classification. Comput Vis Concepts Method Tools 1208–1223
6. Gavhale KR, Gawande U (2014) An overview of the research on plant leaves disease detection
using image processing techniques. J Comput Eng 16(1):10–16
7. Camargo A, Smith JS (2009) An image-processing based algorithm to automatically identify
plant disease visual symptoms. Biosyst Eng 102(1):9–21
8. Zhang S, Wu X, You Z, Zhang L (2017) Leaf image based cucumber disease recognition using
sparse representation classification. Comput Electron Agric 134:135–141
9. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput
Electron Agric 145:311–318
10. Shuaibu M, Lee WS, Hong YK, Kim S (2017) Detection of apple marssonina blotch disease
using particle swarm optimization. Trans ASABE 60(2):303–312
11. Kamilaris A, Prenafeta-Boldu FX (2018) Deep learning in agriculture: a survey. Comput
Electron Agric 147:70–90
12. Gu Y, Cheng S, Jin R (2018) Feature selection for high-dimensional classification using a
competitive swarm optimizer. Soft Comput 811–822
A Thorough Analysis of Machine Learning and Deep Learning Methods for Crime Data Analysis

J. Jeyaboopathiraja and G. Maria Priscilla

Abstract The analysts working for police forces are obliged to expose the complexities found in data, to help the operational staff in apprehending criminals and in guiding crime prevention strategies. However, this task is made extremely complicated by the innumerable crimes that take place and the knowledge levels of present-day offenders. Crime is one of the omnipresent and worrying aspects of society, and preventing it is an important task. Crime analysis is a systematic means of detecting and examining crime patterns and trends. The work on the data includes two important aspects, analysis of crime and prediction of perpetrator identity, and crime analysis has a significant role to play in both steps. Analysis of crime data can be of massive help in the prediction and resolution of crimes from a futuristic perspective. To address this issue in policing, the crime rate must be predicted with the help of AI (machine learning) approaches and deep learning techniques. The objective of this review is to examine the AI approaches and deep learning methods for crime rate prediction that yield superior accuracy, and the review also explores the suitability of data approaches in the attempts made toward crime prediction with specific attention to the dataset. The review evaluates the advantages and drawbacks of crime data analysis and provides extensive guidance on evaluating model parameters for crime rate prediction by carrying out comparisons ranging from deep learning to machine learning algorithms.

Index Terms Big data analytics (BDA) · Support vector machine (SVM) ·
Artificial neural networks (ANNs) · K-means algorithm · Naïve Bayes

J. Jeyaboopathiraja (B)
Research Scholar, Department of Computer Science, Sri Ramakrishna College of Arts and
Science, Coimbatore, India
e-mail: jeyaboopathi@gmail.com
G. Maria Priscilla
Professor and Head, Department of Computer Science, Sri Ramakrishna College of Arts and
Science, Coimbatore, India
e-mail: mariajerryin76@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 795
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_58

1 Introduction

Recently, big data analytics (BDA) has evolved into a prominent technique for data analysis and extraction, with relevance to an extensive array of application fields. Big data involves access to an extreme amount of data that is hard to store, process, and mine with a classical database, fundamentally because the data available is massive, complicated, unorganized, and quickly varying. This is the critical idea behind big data, which was initially promoted by online companies such as Google, eBay, Facebook, LinkedIn, etc. The term "big data" indicates a digital repository of information having an enormous volume, velocity, and diversity. Big data analytics refers to the process of developing software to unravel the trends, patterns, associations, or other meaningful perspectives in those enormous amounts of information [1]. Big data has an inevitable part to play in different domains, such as agriculture, banking, data mining, education, chemistry, finance, cloud computing, marketing, and healthcare stocks [2].
Owing to consistent urbanization and a rising population, society has become city-centric, but an increasing number of violent crimes and accidents have accompanied these developments. To deal with these problems, sociologists, analysts, and protection organizations have dedicated many endeavors toward the mining of important patterns and factors. The development of 'big data,' which necessitates new techniques for the well-organized and precise analysis of the rising volumes of crime-related data, has remained a critical problem for every law enforcement and intelligence-gathering organization. Crime has increased multi-fold over the passage of time, and criminals have begun using the latest trends in technology not just for committing offenses but also for escaping conviction. Crime is no longer confined to the boulevards and back alleys of neighborhoods; the Internet, which acts as a connecting bridge for the whole world, has recently thrived as a field for the more crooked-minded criminals. Following barbaric acts like the 9/11 terrorist attacks and the exploitation of technology for hacking the most protected defense databases, novel and efficient techniques of crime prevention have gained rising significance [3]. Data mining is regarded as a potential tool with remarkable capability to aid criminal examiners in highlighting the most vital information concealed in the crime 'big data.' Data mining as a tool for crime investigation is identified as a relatively novel and popular research domain. In addition to the improving usage of computerized systems for crime tracking, computer data analysts have begun assisting law enforcement officers and detectives not just to accelerate the process of solving crimes [4], but also to predict crimes in advance. The improving accessibility of big data has also increased the demand for applications involving several data mining approaches and for their ease of use by people with no skills in data analysis and no knowledge of statistics.
The capability of analyzing this extremely high amount of data, with its intrinsic drawbacks, imposes manual pressure when there is no computational assistance [5]. This review investigates the modern data mining approaches used for the prediction of crime and criminality. Diverse classification algorithms such as Naïve Bayes, decision tree (DT), back propagation (BP), support vector machine (SVM), and deep learning techniques have been used for predicting the "crime category" for differentiation. This work provides details on the different data mining classifiers employed for crime data prediction and also studies the advantages and drawbacks of different data mining approaches on crime data.

2 Literature Review

Data mining techniques are used for the detection as well as the prevention of crime. Classical classification approaches concentrate on both organized and unorganized data for pattern detection. The evolution of big crime data has made many of the available systems employ an ensemble of data mining approaches to obtain exact and accurate predictions. Crime analysis can range over an extensive array of criminal activities, from simple misuse of public duties to crimes organized at an international level [1]. This section reviews the classical data mining techniques and deep learning techniques in crime data analysis.

2.1 Review of Data Mining Methods for Crime Data

McClendon and Meghanathan [5] used WEKA, a publicly available data mining software, to carry out a comparative analysis between the patterns of violent crime extracted from the Communities and Crime Unnormalized Dataset provided by the University of California-Irvine (UCI) repository and the actual crime statistics for the state of Mississippi provided by the neighborhoodscout.com Web site. Linear regression, additive regression, and decision stump algorithms were applied, employing the identical finite set of features, on the Communities and Crime Dataset. On the whole, the linear regression algorithm exhibited the best performance among the three algorithms chosen. The aim is to show the effectiveness as well as the accuracy of machine learning employed in data mining analysis for the prediction of violent crime activities.
Tyagi and Sharma [6] studied the data mining mechanisms employed in criminal examination. Here, the important concern is the technique utilized in crime data analytics. Algorithms including C4.5, CART, the K-nearest neighbor algorithm (KNN), support vector machine (SVM), and artificial neural networks (ANNs) help detect the particular patterns of criminals in massive datasets, classify criminal activities into various groups, and predict crime hotspots. This research work also describes the problems that come up during the analysis, which need to be eliminated to obtain the required result.
Pramanik et al. [7] studied a big data analytics framework that investigates four functions for criminal networks: network extraction, subgroup detection, interaction pattern discovery, and central member identification. Big data sources, transformations, platforms, and tools are integrated to render these four significant functions, which exhibit a strong correlation with the two dimensions of SNA. Social network analysis (SNA) is a well-known and proactive technique to unravel previously unknown structural patterns from criminal networks, and it was identified as a resourceful and effective mechanism for analyzing criminal organizations and firms. With the constant introduction of modern big data analytics platforms, tools, and approaches, the years to come will witness the broad deployment and usage of big data across defense and law enforcement agencies.
Jha et al. [8] presented how data analytical techniques based on big data can help prevent crimes. Various approaches to data collection have also been studied, comprising volunteered geographic information (VGI) together with geographic information systems (GIS) and Web 2.0. The final stage involves forecasting that depends on the gathered data and its investigation. Big data is regarded as a suitable framework for crime data analysis since it renders better throughput and fault resilience, helps in the analysis of massive datasets, runs on commodity hardware, and produces trustworthy outcomes, while the Naïve Bayes machine learning algorithm can predict better using the available dataset.
Nadathur et al. [9] provided a comprehensive overview of crime incidences and their relevance in the literature through a combination of techniques. This work, as a first step, analyzes and detects the features of crime occurrences, introducing a schema for combinatorial incident description. The newly introduced schema tries to find a method for systematically merging various elements or crime features, and the system provides a database with much better throughput and lower maintenance expenditure by applying Hadoop tools with HDFS and map-reduce programs. Besides, an elaborate list of crime-associated violations is presented, which facilitates a clear interpretation of repetitive and underlying criminal actions. This work tries to help experts and law enforcement officers in finding patterns and trends, rendering forecasts, discovering associations, and identifying probable explanations.
ToppiReddy et al. [10] studied different visualization approaches and AI algorithms for predicting the distribution of crime over a region. First, the raw datasets are processed and then visualized as per the requirement. KNN is employed for classification: the object classification is performed by a majority vote of its neighbors, and the presumed object belongs to the class that is most common among its k-nearest neighbors. Naïve Bayes depends on the Bayes theorem, which defines the likelihood of an occurrence from prior knowledge of the factors relevant to the event. AI was then utilized to extract information from these massive datasets and find the concealed associations in the data, which in turn are further utilized for reporting as well as finding crime patterns; this is helpful for crime analysts in analyzing crime networks employing different interactive visualization techniques for forecasting crime and is therefore of great assistance in preventing crimes.
Yerpude and Gudur [11] applied data mining approaches to crime data for identifying the features that influence a higher crime rate. Supervised learning makes use of datasets for training, testing, and achieving the necessary results, while unsupervised learning partitions discontinuous, unorganized data into classes or groups. The supervised learning approaches of data mining, decision trees, Naïve Bayes, and regression, are applied to previously gathered data and hence used for the prediction of features that influence crime incidences in an area or neighborhood. Depending on the ranking attained by the features, the Crime Records Bureau and Police Department can embark on the required measures to reduce the chance of crime occurrence.
Pradhan et al. [12] demonstrated big data analytics using the San Francisco crime dataset, gathered by the San Francisco Police Department and accessible through the Open Data initiative. Algorithms such as Naïve Bayes, decision tree, random forest, K-nearest neighbor (KNN), and multinomial logistic regression are employed. The work is primarily focused on carrying out a comprehensive analysis of the important kinds of crimes that happened in the municipality, screening the pattern throughout the years, and deciding how different attributes play a role in particular crimes. Besides, the results of the exploratory data analysis are leveraged to inform the data preprocessing process before training different AI models for forecasting the type of crime. Especially, the model helps in predicting the kind of crime that will happen in every district of the city. The dataset is hugely imbalanced, and therefore the metrics utilized in earlier research focus primarily on the majority class irrespective of the performance of the classifier on minority classes; a technique is proposed for resolving this problem.
Yu et al. [13] studied building datasets from real crime records. These datasets contain a collective tally of crime and crime-associated events classified by the police forces, and the information comprises the place as well as the time of these events. A number of spatial and temporal features are extracted from the unprocessed dataset. Second, a group of data mining classification approaches is used for carrying out the crime prediction. Different classification techniques are analyzed to decide which is best for the prediction of crime "hotspots," and the classification of rising or receding hotspots is investigated. Finally, the best prediction technique is proposed for achieving the most consistent results. The research resulted in a model that exploits the inherent and external spatial and temporal data to obtain robust crime predictions.
Jangra and Kalsi [14] presented a predictive analysis process, where future developments and results are forecasted based on presumption. AI (machine learning) and regression methods are the two techniques that have been used for carrying out predictive analytics. During the process of predictive analytics, AI approaches are extensively used and have become common since they manage massive-scale datasets quite efficiently and yield much better performance; they render results even with standard features and noisy data. KNN is a popular approach for predictive analysis. To boost the accuracy of crime prediction, the Naïve Bayes method is used, and it is found that Naïve Bayes yields much better accuracy in comparison with KNN for crime prediction.
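A comparison of this kind can be illustrated with a short, hedged sketch; the file name "crime.csv", the column names, and the assumption that the features are already numeric or encoded are all hypothetical, not from the reviewed study.

```python
# Illustrative cross-validated comparison of Naive Bayes and KNN on tabular crime data.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("crime.csv")                                   # hypothetical dataset
X = df[["hour", "day_of_week", "district", "latitude", "longitude"]]  # assumed numeric features
y = df["crime_category"]

for name, model in [("Naive Bayes", GaussianNB()),
                    ("KNN", KNeighborsClassifier(n_neighbors=7))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {acc:.3f}")
```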
Deepika and SmithaVinod [15] designed a technique for crime detection in India that employs data mining approaches. The mechanism comprises steps such as data preprocessing, clustering, classification, and visualization. The field of criminology studies different crime features. Clustering through K-means helps in the detection of crime, and the groups are created depending on the resemblance found in the crime characteristics. The random forest algorithm and neural networks are used for data classification. Visualization is performed employing Google marker clustering, and the crime hotspots are plotted on the map of India. The WEKA tool helps in validating the accuracy.
Dhaktode et al. [16] presented a data mining approach employed to analyze, examine, and verify patterns in crimes. A clustering technique is applied for the analysis of crime data, and the stored data is clustered employing the K-means algorithm. Once the classification and clustering are performed, a crime can be predicted from its past data. This newly introduced system can point out areas having a greater probability of crime and distinguish regions with a higher crime rate.
Jain et al. [17] designed a systematic approach for crime analysis and prevention to spot and investigate the patterns and trends found in crime. This system can help to predict the areas having a greater probability of crime incidences and can help to visualize crime-vulnerable hotspots. The growing usage of computerized systems greatly aids crime data analysts in helping law enforcement officials solve crimes faster. The K-means algorithm is performed by dividing the data into groups according to means; the algorithm also admits a modification known as the expectation–maximization algorithm, where the data is partitioned based on its parameters. This data mining framework is easy to implement, operates jointly with the geospatial plotting of crime, and increases the efficiency of detectives and other law enforcement officials.
Sukanya et al. [18] worked on the analysis of criminals' data using grouping and classification approaches. These data are accumulated in the criminals' repository. Spatial clustering algorithms and structured crime classification are utilized for the classification of crimes; these algorithms are useful in identifying the spots of crime occurrences. The identification of criminals is done based on the spectators or clues present at the location of the crime. Identifying the hotspots of criminal occurrences will be valuable to the police forces for improving the security of the specific region, and this will reduce crimes to a much better extent in the future. After the application of this concept to every area, criminal activities can be reduced to the maximum extent possible, although crimes cannot be controlled entirely.
Ladeira et al. [19] presented data preprocessing, transformation, and mining approaches to find the crime details hidden in the dataset by associating similar records. Subsequently, the criminal records are categorized into three groups considering the complexity of the criminal action: A (low sophistication), B (medium sophistication), or C (high sophistication). To know the effect of applying and not applying the preprocessing approaches, and to identify the data mining approaches that attain the best outcomes, two experiments were carried out and their mean accuracies were compared. The application of preprocessing together with the random forest algorithm produced superior results as well as the potential for handling high-dimensional and dynamic data. As a result, an ensemble of these approaches can yield better information to the police department. The inference of data mining methods for crime data is shown in Table 1.

2.2 Review of Deep Learning Methods for Crime Data

Keyvanpour et al. [20] designed data mining approaches supported by a multi-use framework for investigating crimes intelligently. The framework used a systematic technique employing a self-organizing map (SOM) and multilayer perceptron (MLP) neural networks for the clustering and classification of crime data. Design aspects and problems in employing hierarchical/partitional clustering approaches for clustering the crime data are also discussed.
Lin et al. [21] studied the idea of a criminal environment in grid-based crime prediction modeling and characterize a set of spatial-temporal features that rely on 84 types of geographic data, obtained using the Google Places API, applied to theft data for Taoyuan City, Taiwan. The deep neural network was the best model, and it performed better than the well-known random decision forest, support vector machine, and K-nearest neighbor algorithms. Experiments show the significance of the geographic feature design in increasing performance and descriptive capability. Also, testing for crime displacement reveals that this feature design outperforms the baseline format.
Feng et al. [22] applied data analysis to criminal data from San Francisco, Chicago, and Philadelphia. First, the time series of the data is explored, and crime trends in the coming years are forecast. After this, with the crime category predicted and the time and location given, compound classes are combined into bigger classes to overcome the problem of class imbalance, and feature selection is carried out for accuracy improvement. Multiple state-of-the-art data mining approaches, applied specifically to crime forecasting, are presented. The experimental results reveal that the tree-based classification models outperformed KNN and Naïve Bayes techniques on this classification task, while Holt-Winters with multiplicative seasonality yields superior results in forecasting the crime trends. The results will be advantageous for police forces and law enforcement in solving crimes faster and will render cues that can help them in curbing crimes, forecasting the probability of occurrences, exploiting assets efficiently, and formulating quicker decisions.
Chauhan and Aluvalu [23] noted that in this emerging technological field, cyber-crimes are increasing at an exponential rate and pose quite a challenge to the skills of investigators. The data on crime is also rising enormously, and it is generally in digital format, so the data generated cannot be managed efficiently employing classical analysis approaches. Rather than applying conventional data analysis mechanisms, it would be advantageous to employ big data analytics for this massive amount of information.
Table 1 Inference of data mining techniques for crime data

S. no. | Author | Technique | Benefits | Drawbacks
1 | McClendon [5] | Linear regression, additive regression, and decision stump algorithms | Shows how efficient and precise the machine learning algorithms used in data mining analysis can be at predicting violent crime patterns | Data mining remains a long and difficult process for law enforcement officers who need to go through massive volumes of data
2 | Tyagi [6] | Data mining technologies | The algorithms can manage a massive amount of data and render superior accuracy | Intelligence agencies otherwise sift through the database manually, which is a difficult and time-consuming task
3 | Pramanik [7] | Network extraction, subgroup detection, interaction pattern discovery, and central member identification | Functions as a modern big data analytics platform for security and law enforcement agencies | Convergence is slow, performance is reduced by an increasing number of classes, and over-fitting occurs often
4 | Jha [8] | Machine learning algorithm | The system is used for the analysis of crime data and renders higher throughput | Machine learning is used only for predicting and averting future crime
5 | ToppiReddy [10] | Visualizing techniques and machine learning algorithms | The system is used for the prediction of crimes and aids the law agencies; prediction accuracy is increased | Consumes more time for classification
6 | Yerpude [11] | Decision trees, Naïve Bayes, and regression | Improves security and crime protection; safety measures can be undertaken in accordance with the relevant features | Restricted to the features that influence the greater crime rate
7 | Yu [13] | Data mining classification techniques | Best prediction technique to attain the most consistent outcomes | Does not provide good support for real-time applications; future work has to incorporate motor vehicle theft-based crime
8 | Jangra [14] | Naïve Bayes | Enhances the accuracy of the crime prediction approach | Computation time is excessive for a few classifiers; concurrent techniques are required to reduce classification time
9 | Deepika [15] | K-means clustering, random forest algorithm, and neural networks | The technique will be advantageous for the crime department of India in analyzing criminal activities with superior forecasting | Cannot predict the time of occurrence of the crime
10 | Jain [17] | K-means clustering | Helps to increase the efficiency of detectives and other law enforcement officials | (1) Hard to choose the K-value; (2) performance is not good with global clusters
11 | Sukanya [18] | Clustering and classification techniques | Crime hotspots and criminals are identified employing clustering and classification algorithms | Real-time prediction is slow, hard to implement, and complicated
12 | Ladeira [19] | Data preprocessing, transformation, and mining techniques | Application of preprocessing techniques and data mining approaches yields superior results | The prediction process employing random forests consumes more time compared to other algorithms
In this approach, the data collected is essentially distributed over multiple geographic locations, and based on that, clusters are generated. Secondly, the analysis of the created clusters is done employing big data analytics. At last, these analyzed clusters are provided to an artificial neural network, which leads to the generation of prediction patterns.
Pramanik et al. [24] explored the strengths of big data analytics for achieving security intelligence within a criminal analytics framework. Five important technologies, including link analysis, intelligent agents, text mining, artificial neural networks (ANNs), and machine learning (ML), have been identified and have found extensive application in different fields for evolving the technical basis of automated security and criminal investigation systems. A few popular data sources, analytics techniques, and applications associated with the two significant aspects of social network analysis, structural and positional analysis, which form the basis of criminal analytics, are examined. The advantages and drawbacks of big data analytics as applied to the field of criminal analytics are also discussed.
Stalidis et al. [25] presented a comprehensive analysis of crime classification and prediction employing deep learning architectures. The efficiency of deep learning algorithms in this domain is analyzed, and recommendations are given for the design and training of deep learning systems for the prediction of crime hotspots, employing public data acquired from police reports. The experiments carried out with five openly available datasets show that the deep learning-based techniques consistently outperform the existing best-performing techniques. Also, the effect of various parameters in the deep learning frameworks is evaluated, and various perspectives are provided for their configuration to achieve increased performance in crime classification and, ultimately, crime prediction.
Kang and Kang [26] suggested a feature-level data fusion technique with environmental context based on a deep neural network (DNN). The dataset comprises data gathered from various online repositories of crime statistics, geographic and meteorological data, and images from Chicago, Illinois. Before the generation of training data, crime-related data is selected by carrying out statistical analyses. At last, the DNN training is carried out, and the network comprises four types of layers: spatial, temporal, environmental context, and joint feature representation layers. Integrated with key facts obtained from different domains, the fusion DNN is the product of an efficient decision-making process that relies on the statistical analysis of data redundancy. The experimental results demonstrate that the DNN model exhibits more accuracy in crime incidence prediction compared to other prediction models.
Shermila et al. [27] designed a model that identifies crime patterns from observations gathered at the crime location and then predictively describes the offender who is probably the crime suspect. This work covers two important aspects, analysis of crime and forecasting the offender's identity. The crime analysis step classifies various unsolved crimes and evaluates the impact of different factors such as the year, month, and weapon used in uncertain crimes. The system predictively describes the offender employing algorithms such as multilinear regression, the K-neighbors classifier, and neural networks.
Lin et al. [28] designed a deep learning approach, a class of algorithms that has found extensive application in various fields such as image identification and natural language processing. The deep learning algorithm yields superior prediction results for probable crime locations compared to other methodologies like random forest and Naïve Bayes. Also, the model performance is improved by collecting data with diverse time scales. For validating the experimental results, the probable crime spots are visualized on a map, and it is inferred whether the models can find the real hotspots.
Krishnan et al. [29] formulated an artificial neural network model that replaces the traditional data mining approaches in a better manner. In this analysis, the prediction of crime is done with the help of recurrent long short-term memory (LSTM) networks. An available organized dataset is helpful in the prediction of the crimes. The data is divided into training and testing sets, and both go through the training process. The resulting training and testing predictions are then compared with the real crime count and visualized.
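In the spirit of this recurrent LSTM approach, a hedged Keras sketch for predicting the next crime count from a sliding window of historical counts is shown below; the window length, layer sizes, and the variable `counts` (a 1-D array of historical crime counts) are assumptions, not details from the reviewed work.

```python
# Sketch: LSTM regressor over windows of historical crime counts (assumed `counts`).
import numpy as np
from tensorflow import keras

def make_windows(counts, window=12):
    X, y = [], []
    for i in range(len(counts) - window):
        X.append(counts[i:i + window])
        y.append(counts[i + window])
    return np.array(X)[..., None], np.array(y)      # shape (samples, window, 1)

X, y = make_windows(np.asarray(counts, dtype="float32"))
model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(X.shape[1], 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=50, batch_size=16, verbose=0)  # predictions compared against real counts
```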
Gosavi and Kavathekar [30] examined data mining approaches, which will use in
the detection and prediction of crimes employing association rule mining, k-means
clustering, decision trees, Naive Bayes, and machine learning approaches like deep
neural network and artificial neural network. Inferences from this survey were that if
the dataset instances contain more number of missing values, then preprocessing is
an important task, and crimes do not happen consistently across urban locations but
is concentrated in particular regions. Therefore, the prediction of crime hotspots is an
essential task, and the usage of post-processing will be of massive help in reducing
the crime occurrence rate.
Wang et al. [31] designed the benchmarked deep learning spatio-temporal
predictor, ST-ResNet, for aggregated prediction of the distribution of crime. These
models consist of two steps. The first one performs the preprocessing of the crude
data of crime. This comprises of regularization in both space and time to improve
the guessable signals. Secondly, hierarchical architectures of residual convolutional
units are adapted for training multifactor crime prediction models.
Mowafy et al. [32] showed that criminology is a critical field in which text mining approaches have a significant part to play, helping law enforcement officers and crime analysts in their investigations and speeding up the resolution of crimes. They propose a general architecture for a crime mining process that combines text extraction with the investigation of criminal procedure to forecast the type of crime, by applying text classification to the unstructured data in police incident reports, which is regarded as a segment of criminal behavior analysis.
Ivan et al. [33] recommended a business intelligence approach that relies on supervised learning (classification) techniques, provided labeled training data is available. A comparison of four different classification algorithms, namely decision tree (J48), Naïve Bayes, multilayer perceptron, and support vector machine, was carried out to find the most efficient algorithm for forecasting crimes. The study employed classification models created with the help of the Waikato Environment for Knowledge Analysis (WEKA). Decision tree (J48) outperformed the Naïve Bayes, multilayer perceptron, and support vector machine (SVM) algorithms, exhibiting much better performance in terms of both execution time and accuracy. The inference of deep learning methods for crime data is shown in Table 2.

3 Issues from Existing Methods

In the criminology literature, the association between crime and different features has been rigorously analyzed; common examples are historical crime records, unemployment rates, and spatial similarity. This literature review outlines the concept of predictive policing together with its anticipated advantages and disadvantages. The research reveals a gap between the substantial attention given in the literature to the potential benefits and disadvantages of predictive policing and the available empirical evidence. The empirical evidence provides very limited support for the claimed advantages of predictive policing. While a few empirical studies show that predictive policing mechanisms result in a reduction in crime, others show no influence. Concurrently, no empirical evidence exists at all for the stated disadvantages. With the rising adoption of computerized systems, crime data analysts can be of massive help to law enforcement executives in accelerating the practice of solving crime. Employing data extraction and statistical approaches, novel algorithms and systems have been designed alongside new sources of information. The impact of AI and statistical methodologies on crime and other large-scale data applications, such as auto collisions or time-series data, will facilitate the investigation, extraction, and interpretation of significant patterns and trends, subsequently helping in the prevention and control of criminal activities. In comparison with deep learning algorithms, machine learning algorithms are bogged down by a few challenges. Despite all of its benefits, potential, and ubiquity, machine learning is not exact. Its limitations are described below.

3.1 Data Acquisition

Machine learning needs huge datasets for training, and these must be comprehensive, unbiased, and of good quality. Systems may also encounter circumstances where they have to wait for new data to be generated.

3.2 Time and Resources

ML requires sufficient time for the algorithms to be trained and to mature enough to satisfy their objective with a reasonable degree of accuracy and relevance. It also requires enormous resources to function, which can mean additional computing resources.

Table 2 Inference of deep learning techniques for crime data

1. Keyvanpour [20]. Technique: SOM and MLP neural networks. Benefits: SOM clustering technique within the scope of crime analysis, with better results. Drawbacks: MLP is now considered inadequate for recently advanced computer vision processes and does not consider spatial information.

2. Lin [21]. Technique: Deep neural networks. Benefits: Feature design for increased performance and descriptive capability. Drawbacks: It needs a massive amount of data for carrying out the classification or crime analysis.

3. Feng [22]. Technique: Data mining techniques. Benefits: Data mining approaches are particularly employed for crime prediction; Holt-Winters combined with multiplicative seasonality yields superior results. Drawbacks: Privacy, security, and information misuse are huge challenges if they are not dealt with and resolved properly.

4. Stalidis [25]. Technique: State-of-the-art methods and deep learning. Benefits: The efficiency of various parameters in the deep learning frameworks provides perspective for their configuration to attain better performance. Drawbacks: It needs a massive amount of data to perform better than other approaches.

5. Kang [26]. Technique: Deep neural network (DNN). Benefits: DNN results from an effective decision-making process, which helps in the statistical analysis of data redundancy; the DNN model offers more accuracy in the prediction of crime incidence compared to other prediction models. Drawbacks: The kind and time of crime incidences require another type of data for their prediction.

6. Shermila [27]. Technique: Multilinear regression, K-neighbors classifier, and neural networks. Benefits: The system predictively describes the offender employing multilinear regression, the K-neighbors classifier, and neural networks. Drawbacks: The KNN algorithm is not an active learner; it does not use the training data for learning anything and uses it only for classification.

7. Lin [28]. Technique: Deep learning algorithm. Benefits: Machine learning technique developed to yield improved prediction of future crime hotspot locations, with results verified against real crime data. Drawbacks: Crime incidence prediction requires finding highly nonlinear correlations, redundancies, and dependencies between numerous datasets.

8. Krishnan [29]. Technique: Neural network. Benefits: Crime prediction is done with recurrent LSTM networks. Drawbacks: Undefined behavior of the network, hardship in presenting the problem to the network, and the training duration of the network is unpredictable.

9. Shruti S. Gosavi [30]. Technique: Association rule mining, k-means clustering, decision trees, Naive Bayes, and machine learning techniques. Benefits: Prediction of crime hotspots is a highly significant task, and the use of post-processing will aid in reducing the rate of crimes. Drawbacks: Crime does not happen consistently across urban regions but is concentrated in particular regions.

10. Wang [31]. Technique: CNNs and ST-ResNet. Benefits: ST-ResNet framework, in which past dependencies have to be fixed in an explicit manner; much longer explicit dependencies make the network highly complicated and hard to train. Drawbacks: Every training step takes a much longer time.

3.3 Interpretation of Results

Another important challenge is the ability to interpret accurately the results that the algorithms generate. The algorithms themselves must also be chosen carefully.

3.4 High Error-Susceptibility

Machine learning is autonomous but highly susceptible to errors. Suppose an algorithm is trained on a dataset that is too small to be unbiased; the resulting forecasts are then skewed by the biased training set, for instance causing irrelevant advertisements to be shown to end users. In ML, such errors can propagate as an error chain that goes unnoticed for long durations of time. Even once they are discovered, a considerable amount of time is wasted in identifying the source of the problem, and even longer in getting it fixed.

4 Solution

Using deep learning and neural networks, a novel representation has been designed for the prediction of crime incidence [34]. Since deep learning [35] and artificial intelligence [36] have achieved many successes in computer vision, they have found application in big data analytics (BDA) for trend prediction and categorization. Big data analytics offers the means to transform how law enforcement departments and intelligence and security agencies extract essential information (e.g., criminal networks) from various data sources in real time to corroborate their investigations. Deep learning can also be described as a cascade of layers, where every succeeding layer takes the output signal of the earlier layer as its input. This feature, among several others, yields a few benefits when it is used to resolve different problems.
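To make the cascade-of-layers idea concrete, the following minimal Python sketch (an illustration only, not the representation proposed in [34]; the layer widths, the ten input features, and the hotspot score are assumptions) passes a feature vector through stacked fully connected layers so that each layer consumes the previous layer's output.

import numpy as np

# Minimal cascade of fully connected layers: each layer feeds the next one.
# Layer widths and the ten illustrative input features are assumptions for this sketch.
rng = np.random.default_rng(0)
layer_sizes = [10, 32, 16, 1]                      # input -> hidden -> hidden -> hotspot score
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    # ReLU between hidden layers, sigmoid at the output layer.
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ w + b)             # each layer takes the previous output as input
    return 1.0 / (1.0 + np.exp(-(h @ weights[-1] + biases[-1])))

sample = rng.normal(size=(1, 10))                  # one synthetic feature vector
print(forward(sample))                             # probability-like crime-hotspot score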

4.1 High-Level Performance

Presently, in several domains such as computer vision, speech recognition, and natural language processing, neural networks relying on deep learning technologies perform exceedingly well compared with the techniques employed in conventional machine learning. Accuracy levels are maximized while, at the same time, the error count is reduced.

4.2 Capability to Design New Functions

Traditional machine learning assumes that features are designed by humans, a strategy that consumes quite a lot of time. Deep learning has the capability of creating new features even when only an inadequate number of them are present in the learning dataset. This implies that deep learning algorithms can generate novel features to attain current goals.

4.3 Advanced Analytical Capabilities

For an AI algorithm to function as intended, labeled data has to be prepared. A system based on deep learning algorithms has the capability of becoming "smarter" by itself during the problem-solving process and can deal with unlabeled information.

4.4 Adaptability and Scalability

Deep learning methods are much easier to adapt to different fields than conventional ML algorithms, and they facilitate transfer learning, in which a complete pre-trained model is reused, in many cases helping to attain much greater efficiency in a shorter span of time. Scalability is another significant advantage: neural networks can deal with growth in data better than conventional machine learning algorithms.

5 Conclusion and Future Work

This paper has presented a comprehensive study on crime analysis employing deep learning and machine learning algorithms. Investigations based on deep learning and machine learning techniques in crime analysis will probably intensify in the coming years. This work examines the efficiency of deep learning algorithms in the crime data analysis domain and yields recommendations for the design and training of deep learning systems for the prediction of crime hotspots, employing open data from police reports. The benchmarked techniques are compared against deep learning frameworks, and the advantages and drawbacks of the two families of techniques are explained clearly in the review section. Overall, the deep learning-based techniques show a consistent performance that is much better than the available best-performing techniques. As future work, the efficiency of various parameters in deep learning is to be evaluated, and insights can be provided for their configuration so as to attain superior performance in crime classification and, ultimately, crime prediction.

References

1. Hassani H, Huang X, Silva ES, Ghodsi M (2016) A review of data mining applications in
crime. Statist Anal Data Mining ASA Data Sci J 9(3):139–154
2. Memon MA, Soomro S, Jumani AK, Kartio MA (2017) Big Data analytics and its applications.
Ann Emerg Technol Comput (AETiC) 1(1):46–54
3. Dhyani B, Barthwal A (2014) Big Data analytics using Hadoop. Int J Comput Appl 108(12)
4. Hassani H, Saporta G, Silva ES (2014) Data Mining and official statistics: the past, the present
and the future. Big Data 2(1):34–43
5. McClendon L, Meghanathan N (2015) Using machine learning algorithms to analyze crime
data. Machine Learning Appl Int J (MLAIJ) 2(1):1–12
6. Tyagi D, Sharma S (2018) An approach to crime data analysis: a systematic review.
Communication, Integrated Networks Signal Processing-CINSP 5(2):67–74
7. Pramanik MI, Zhang W, Lau RY, Li C (2016) A framework for criminal network analysis using
big data. In IEEE 13th international conference on e-business engineering (ICEBE), pp 17–23
8. Jha P, Jha R, Sharma A (2019) Behavior analysis and crime prediction using Big Data and
Machine Learning. Int J Rec Technol Eng (IJRTE) 8(1)
9. Nadathur AS, Narayanan G, Ravichandran I, Srividhya S, Kayalvizhi J (2018) Crime analysis
and prediction using Big Data. Int J Pure Appl Math 119(12):207–211
10. ToppiReddy HKR, Saini B, Mahajan G (2018) Crime prediction and monitoring framework
based on spatial analysis. Proc Comput Sci 132:696–705
11. Yerpude P, Gudur V (2017) Predictive modelling of crime dataset using data mining. Int J Data
Mining Knowl Manage Process 7(4)
12. Pradhan I, Potika K, Eirinaki M, Potikas P (2019) Exploratory data analysis and crime prediction
for smart cities. In Proceedings of the 23rd international database applications and engineering
symposium
13. Yu CH, Ward MW, Morabito M, Ding W (2011) Crime forecasting using data mining
techniques. In IEEE 11th international conference on data mining workshops, pp 779–786
14. Jangra M, Kalsi S (2019) Naïve Bayes approach for the crime prediction in Data Mining. Int
J Comput Appl 178(4):33–37
15. Deepika KK, SmithaVinod (2018) Crime analysis in India using data mining techniques. Int J
Eng Technol 7:253–258
16. Dhaktode S, Doshi M, Vernekar N, Vyas D (2019) Crime rate prediction using K-Means. IOSR
J Eng (IOSR JEN) 25–29
17. Jain V, Sharma Y, Bhatia A, Arora V (2017) Crime prediction using K-means algorithm. Global
Res Dev J Eng 2(5):206–209
18. Sukanya M, Kalaikumaran T, Karthik S (2012) Criminals and crime hotspot detection using
data mining algorithms: clustering and classification. Int J Adv Res Comput Eng Technol
1(10):225–227
19. Ladeira LZ, Sanches MF, Viana C, Botega LC (2018) Assessing the impact of mining techniques
on criminal data quality, Anais do II Workshop de Computação Urbana (COURB), vol 2(1)
20. Keyvanpour MR, Javideh M, Ebrahimi MR (2011) Detecting and investigating crime by means
of data mining: a general crime matching framework. Proced Comput Sci 872–880
21. Lin YL, Yen MF, Yu LC (2018) Grid-based crime prediction using geographical features.
ISPRS Int J Geo-Inf 7(8)

22. Feng M, Zheng J, Han Y, Ren J, Liu Q (2018) Big Data Analytics and Mining for crime data
analysis, visualization and prediction. in International conference on brain inspired cognitive
systems, pp 605–614
23. Chauhan T, Aluvalu R (2016) Using Big Data analytics for developing crime predictive model.
In RK University’s first international conference on research and entrepreneurship, pp 1–6
24. Pramanik MI, Lau RY, Yue WT, Ye Y, Li C (2017) Big Data analytics for security and criminal
investigations, Wiley interdisciplinary reviews: data mining and knowledge discovery, vol 7,
No 4
25. Stalidis P, Semertzidis T, Daras P (2018) Examining Deep Learning architectures for crime
classification and prediction, arXiv preprint arXiv:1812.00602
26. Kang HW, Kang HB (2017) Prediction of crime occurrence from multi-modal data using deep
learning. PloS one, vol 12, No. 4
27. Shermila AM, Bellarmine AB, Santiago N (2018) Crime data analysis and prediction of perpe-
trator identity using Machine Learning approach. In 2nd international conference on trends in
electronics and informatics (ICOEI), pp 107–114
28. Lin YL, Chen TL, Yu LC (2017) Using machine learning to assist crime prevention. In 6th
IIAI international congress on advanced applied informatics (IIAI-AAI), pp 1029–1030
29. Krishnan A, Sarguru A, Sheela AS (2018) Predictive analysis of crime data using Deep
Learning. Int J Pure Appl Math 118(20):4023–4031
30. Gosavi SS, Kavathekar SS (2018) A survey on crime occurrence detection and prediction
techniques. Int J Manage Technol Eng 8(XII):1405–1409
31. Wang B, Yin P, Bertozzi AL, Brantingham PJ, Osher SJ, Xin J (2017) Deep learning for real-
time crime forecasting and its ternarization. In International symposium on nonlinear theory
and its applications, pp 330–333
32. Mowafy M, Rezk A, El-bakry HM (2018) General crime mining framework for unstructured
crime data prediction. Int J Comput Appl 4(8):08–17
33. Ivan N, Ahishakiye E, Omulo EO, Wario R (2017) A performance analysis of business
intelligence techniques on crime prediction. Int J Comput Inf Technol 06(02):84–90
34. Wang M, Zhang F, Guan H, Li X, Chen G, Li T, Xi X (2016) Hybrid neural network mixed
with random forests and perlin noise. In 2nd IEEE international conference on computer and
communications, pp 1937–1941
35. Wang Z, Ren J, Zhang D, Sun M, Jiang J (2018) A deep-learning based feature hybrid framework
for spatiotemporal saliency detection inside videos. Neurocomputing 287:68–83
36. Yan Y, Ren J, Sun G, Zhao H, Han J, Li X, Marshall S, Zhan J (2018) Unsupervised image
saliency detection with Gestalt-laws guided optimization and visual attention based refinement.
Pattern Recogn 79:65–78
Improved Density-Based Learning to
Cluster for User Web Log in Data Mining

N. V. Kousik, M. Sivaram, N. Yuvaraj, and R. Mahaveerakannan

Abstract Improvements in tuning a website and increasing visitor retention are achieved by deploying an efficient weblog mining and navigational pattern prediction model. This crucial application initially performs data cleaning and initialization procedures until the hidden knowledge is extracted as output. To obtain good results, the quality of the input data has to be promisingly good, and hence more focus should be given to pre-processing and data cleaning operations. Beyond this, the major challenge faced is poor scalability during navigational pattern prediction. In this paper, the scalability of weblog mining is improved by using suitable pre-processing and data cleaning operations. The method uses a tree-based clustering algorithm to mine the relevant items from the datasets and to predict the navigational behavior of the users. The algorithm focuses mainly on density-based learning to cluster and predict future requests. The proposed method is evaluated on BUS log data, which is of particular significance since it contains the log data of all the students in the university. The conducted experiments prove the effectiveness and applicability of weblog mining using the proposed algorithm.

Keywords Weblog mining · Navigational tree learning · Clustering · Density-based learning

N. V. Kousik (B)
Galgotias University, Greater Noida, Uttarpradesh 203201, India
e-mail: nvkousik@gmail.com
M. Sivaram
Research Center, Lebanese French University, Erbil 44001, Iraq
e-mail: sivaram.murugan@lfu.edu.krd
N. Yuvaraj
ICT Academy, Chennai, Tamilnadu 600096, India
e-mail: nyuvaraj89@gmail.com
R. Mahaveerakannan
Hindusthan College of Engineering and Technology, Coimbatore 110070, India
e-mail: mahaveerakannan10@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_59

1 Introduction

In the present scenario, the entire world relies on websites to interact with the other end. Institutions, organizations, and industries retain their clients by using many ways to make their websites more efficient and reliable. This is achieved through auditing, which can be performed in two ways. The first way is to evaluate the browsing history of a specific user; the collected information, together with feedback content received from the user, is used for enhancing the website structure and improving the website experience. The second way is to record the navigational history of the client, which is likewise used to improve the user experience. The second option is widely used since it does not rely on voluntary inputs from the client and it also automates the analysis of the user's navigational history. This is referred to as web usage mining (WUM) or weblog mining (WLM). WLM finds application in many fields, including web content personalization, recommender systems [15], prefetching, and caching [14]. The benefits of weblog mining are most useful in e-commerce browsing applications, where clients are targeted with relevant advertisements and products.
Such a web access file is created automatically by the web server, and it logs each view of an object, image, or HTML document requested by the user. Each entry in the weblog file of a website is a single line of text produced by each view, and there are two log file formats, namely the common log file and the extended log file. The data in the file contains the navigation patterns of single or multiple users across single or multiple websites, i.e., the browsing behavior of the entire web traffic. Apart from its source of collection, the general characteristics of a weblog file are that it is a text file with an identical format, contains a single HTTP request per line, and carries supporting information such as IP address, file name, HTTP response status and size, request date and time, URL, and browser data.
Weblog mining consists of three processes: pre-processing or data cleaning, mining the pre-processed data to extract hidden knowledge, and analyzing the results obtained after extraction. Weblog mining mostly deals with huge datasets, and hence issues occur due to the available space and run time. Apart from such issues, other challenges arise due to the nature of the log file [1]. The web server logs track the user navigational history only poorly, since the server retains full control over capacity and bandwidth. This poses a problem for the data mining algorithm, because the web access log accumulated on the server, derived from the user navigational history, is what is used to extract the user's navigational patterns.
In this paper, the main aim is to analyze the WLM process and the pattern prediction of the user's online navigational history. The present work considers the access log file for processing and extracting the hidden knowledge. Such usage data can also be collected from the user through browser cookies; however, this is not of prime concern here, since it raises privacy concerns for the user. The major contributions of the paper include pre-processing and cleaning operations in three stages; the second is the use of a tree-based clustering algorithm for mining the user navigational patterns; and the final contribution is an effective way to predict the navigational behavior of online users and to test the effectiveness of the methods.

The outline of the paper is organized as follows. Section 2 discusses related works. Section 3 presents the proposed framework by integrating the contributions. Section 4 evaluates the proposed methods with various experiments. Section 5 concludes the paper.

2 Related Works

There are many similar approaches for improving WLM, which enhances pattern
sequence identification in the data stream. In [1], a task-oriented weblog mining
behavior is used for identifying online browsing behavior in PC and mobile plat-
forms. This method uses footstep graph visualization to navigate the patterns using
sequential rule mining in clickstream data. The purchase decision is predicted using
sequence rules in exploration-oriented browsing behavior.
In [2], pre-processing, knowledge discovery, and analyzing the pattern are used to
extract weblog data. The extraction process is carried out using a neuro-fuzzy hybrid
model. This method uncovers the hidden patterns from WUM on a college website.
In the case of [3], the extraction process is carried out using both supervised and unsupervised descriptive knowledge mining. Clustering using association rules and subgroup knowledge discovery is carried out on an extra virgin olive oil commercial website.
Taxonomy is used as a constraint in WLM [4], in which the transaction data or user information is extracted using an intelligent weblog mining algorithm. This method helps to enable third-party direct access to website functionalities. A rational, action-based recommendation method is used for WLM in [5]; this method uses lexical patterns for itemset generation and better recovery of hidden knowledge.
WLM is carried out using a tool [6], which evaluates the pedagogical process to
identify instructor’s and student’s attitudes in a web-based system. This web-based
tool provides support to measure various parameters both at the micro- and macro-
level, i.e., for instructors and policymakers, respectively [6]. The study to evaluate the
self-care behavior of the participants, who are elderly, is carried out using a self-care
service system. This system provides service and analysis of elder people on daily
basis using WLM activity. Here, various self-care services are analyzed statistically.
Then, an interest-based representation constructs the assembly of the elders using
the ART2-enhance K-mean algorithm, which clusters the patterns. Finally, sequence-
based representation with Markov models and ART2 K-mean clustering scheme is
used for mining cluster patterns [7].
From the webpages, web users' ocular movement data is captured using an eye-tracking tool in [8]. This eye-tracking technology is used to classify the key objects of those websites, eliminating conventional surveying techniques. In this technique, a web user's eye position data on the monitor screen is identified and combined with the total page visits of the weblog sequence. It also extracts significant behavioral insights on user activities. Temporal properties are used in [9] for obtaining connection knowledge, where the temporal property is attained using fuzzy association rules. A GA with 2-tuple linguistic representation is used for resolving fuzzy sets in association rule mining, and discovery rules based on fuzzy set intersection boundaries are used for extracting knowledge. To fit real-world log data, a graph representation with an enhanced fitness function is used in the genetic algorithm.
Graph theory is combined with sequential pattern mining in [21] for weblog mining.
The relationship between data is defined using the request dependency graph in [22].
In [10], EPLogCleaner is used to discover knowledge by filtering out irrelevant
items from common prefix URLs. This method is tested under real network traffic
trace from one enterprise proxy. In [11], a taxonomy-based intentional browsing data
is used for improving WLM. This method clarifies associations with other browsing
data. Besides, an online data collection method is used to build intentional browsing
data available for weblog data. In [12], an effective relationship between global and
local data items is studied by extracts information using user web sessions. The
data is segmented using similarity distance measures and the associations between
the data are met using the registry federation mechanism. Further, for performing
sequential mining, a distributed algorithm is used inside registry federation. In [13], a
navigational pattern tree is used for modeling weblog data behavior and a navigational
pattern mining algorithm is used for finding sequential patterns. This method scans
the navigational pattern tree’s sub-trees for generating candidate recommendations.
In [14], the content mining technique is combined with a weblog for generating
user navigation profiles to link prediction and profile enrichment using diversified
semantic information and user interest profiles based on language and global users.
The semantic information is obtained using destination marketing organizations,
which matches and provides a user-dependent profile for prospect web designs. This
method is tested over the bidasoaturismo website using this non-invasive web mining
with a minimum information web server. In the case of [15], an automated WLM and
recommendation system using existing user behavior on a clickstream is developed.
At this point, really simple syndication provides relevant information, and K-nearest
neighbor classification identifies the user clickstream data. It further matches the
user group and then browsing meets user needs. This is attained using extraction,
cleansing, formatting, and grouping of sessions from the RSS address file of the users
and then a data mart is developed.
In [16], a unified intention weblog mining algorithm is used for processing datasets
with multiple data types and it transforms well the browsing data into linguistic items
using the concept of fuzzy set. In [17], a fuzzy object-oriented web mining algorithm
is used with two phases, which are used for knowledge from weblogs or class or
instances. Fuzzy has intra- and inter-page mining phase, where former one linguistic
itemsets with the same classes and different attributes are derived, while in the latter,
a large sequence to represent the webpage relationship is derived. In [18], a website
structure optimization problem is resolved using an enhanced tabu search algorithm
with advanced search features from its multiple neighborhoods, dynamic tenure of
tabu, adaptive lists of tabu, and multi-level criteria.
In the case of [19], recommendation-dependent WUM enhances current recom-
mender systems quality using product taxonomy. This method tracks the customers
using a rating database on purchasing behaviors and considerably enhances quality

recommendations. Product taxonomy improves the nearest neighbor search using


dimensionality reduction. The remaining recommender system uses a Kohonen
neural network or self-organizing map (SOM) [20] for improving search pattern
in both online and offline.
However, these methods suffer from poor scalability due to the rapidly growing usage of the systems that mine weblog data. Solutions for improving the mining process are presented in the proposed system.

3 Proposed System

The weblog data accumulates successful hits from the Internet. Hits are defined as requests made by the user to view a document or an image in HTML format. Such weblog data is created automatically and stored either on a client-side server or on a proxy server in the organization's database. Weblog data has details such as the IP address of the computer making the query request, the request time, the user ID, a status field defining whether the request was successful or not, the transferred file size, the URL, and the browser name and version.
Data cleaning and pre-processing steps involve page view creation and session generation. Session identification is based entirely on time-dependent heuristics. This kind of time-dependent approach decides the session timeout using a time-duration threshold, which helps to attain better quality output from the proposed approach.
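A minimal Python sketch of the time-dependent sessionization heuristic described above; the record layout (user_id, timestamp, URL) and the 30-min threshold value are assumptions made for illustration.

from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)        # time-duration threshold of the heuristic

def sessionize(records):
    # Group (user_id, timestamp, url) records into per-user sessions.
    sessions, last_seen = {}, {}
    for user_id, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        if user_id not in sessions or ts - last_seen[user_id] > SESSION_TIMEOUT:
            sessions.setdefault(user_id, []).append([])    # gap too large: start a new session
        sessions[user_id][-1].append(url)
        last_seen[user_id] = ts
    return sessions

log = [("u1", datetime(2020, 1, 6, 9, 0), "/home"),
       ("u1", datetime(2020, 1, 6, 9, 10), "/courses"),
       ("u1", datetime(2020, 1, 6, 10, 0), "/library")]    # 50-min gap starts a new session
print(sessionize(log))    # {'u1': [['/home', '/courses'], ['/library']]}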

3.1 Pre-processing

This is the initial step to clean the weblog content; it converts the unformatted log data into a form accepted as input by the cluster mining process. The cleaning and pre-processing operation mainly involves three major steps: data cleaning, user identification, and session identification. Data cleaning and user identification involve data integration, data anonymization, data cleaning, feature selection, scenario building, feature generation, and target data extraction. The cleaning operation helps to remove the unwanted entries, which is quite an important step with regard to analysis or mining. The weblog data pre-processing has nine steps, which are shown in Fig. 1.
Data Integration
The weblog source is acquired from a weblog server over a duration of 6 to 12 months. The dataset has various fields that are integrated from multiple sources and used in the evaluation to prove the proposed method's effectiveness. Here, the BUS dataset containing student records with multiple fields is considered, as shown in Table 1.

Fig. 1 Steps in pre-processing

Table 1 Depiction of basic characteristics of the BUS dataset

Features | Depiction of features
User_id | Unique ID
student_username | Student number
Login_time | Student's login time
Logout_time | Student's logout time
Credit | Charge of Internet credit
Conn_duration | Connection duration
Sum_updown | Download/upload sum
Conn_state | Connection state
Ras_description | Hotspot name
Ras_ip | IP of network connection
Kill_reason | Disconnection warning
Reason | Reason for network disconnection
Remote_IP | Remote IP's network connection
Station_IP | System's information code
Station_IP_Value | System's information value

Data Anonymization
It consists of three steps: generalization, suppression, and randomization. The first step replaces an attribute value with a more general value, the second step stops the release of the attribute's real value and displays its occurrence with some notation, and the final step replaces the real value with a random value. The proposed system uses a suppression strategy, since the privacy of the data is a major concern. Private information in the BUS dataset comes from student_username, and the students' privacy is preserved by replacing the original values with pseudo values. This prevents an anonymous party from identifying a student, while the proposed model is still capable of handling the data.
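A minimal Python sketch of the suppression strategy applied to student_username: the real value is never released, and each distinct username is replaced by a stable pseudo value. The hashing scheme and field names are assumptions, not the exact mechanism used in the paper.

import hashlib

def pseudonymize(username, salt="bus-dataset"):
    # Replace the real student_username with a stable pseudo value (suppression).
    digest = hashlib.sha256((salt + username).encode()).hexdigest()[:8]
    return "student_" + digest

records = [{"student_username": "971234567", "credit": 12.5},
           {"student_username": "981112223", "credit": 3.0}]
for rec in records:
    rec["student_username"] = pseudonymize(rec["student_username"])
print(records)    # usernames are now pseudo values; the rest of the record is unchanged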

Table 2 Feature generation

Features | Feature generation | Depiction of feature
Student_username | Code_Field | Studying field
 | User_Type | User type
 | Entrance_Year | Entrance year
 | Code_Level | Studying level
Login_time | Login_date | Student's login date
 | Login_hour | Student's login hour
 | Login_min | Student's login minutes
Logout_time | Logout_date | Student's logout date
 | Logout_hour | Student's logout hour
 | Logout_min | Student's logout minutes

Data Cleaning
The cleaning operation involves three steps: detection of missing and noisy data, automated filling with a global constant value when a value is missing, and removal of duplicated records. The presence of negative values in the dataset degrades the model's performance. Hence, negative values in the credit and duration features are replaced with suitable positive values using binning and smoothing operations.
Feature Selection
This process eliminates redundant and irrelevant features using Spearman correlation analysis, which identifies features that are correlated with one another. In the present dataset, features such as duration and kill_reason are eliminated, since the duration feature can also be obtained from login_time and logout_time, and the feature reason has an elevated correlation with kill_reason. Another feature, static_ip, is eliminated, as it is established to be irrelevant to the current target.
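A minimal Python sketch of Spearman-based feature elimination, assuming scipy is available; the 0.9 correlation threshold and the synthetic features are illustrative assumptions.

import numpy as np
from scipy.stats import spearmanr

def drop_correlated(features, threshold=0.9):
    # Drop one feature from every pair whose absolute Spearman correlation exceeds the threshold.
    names, keep = list(features), set(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in keep and b in keep:
                rho, _ = spearmanr(features[a], features[b])
                if abs(rho) > threshold:
                    keep.discard(b)        # e.g. duration is derivable from login/logout times
    return sorted(keep)

rng = np.random.default_rng(1)
login = rng.uniform(0, 24, 200)
features = {"login_hour": login,
            "duration": 24 - login + rng.normal(0, 0.01, 200),   # almost determined by login_hour
            "credit": rng.uniform(0, 50, 200)}
print(drop_correlated(features))    # 'duration' is expected to be dropped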
Scenario Building
The proposed system defines two scenarios to analyze user behavior based on university regulations. They include identifying the students connected to the network and learning the students' behavior when connecting from any hotspot within the college campus on a holiday.
Feature Generation
The required feature sets are generated according to the considered scenarios. Thus, student_username is divided into four features, as shown in Table 2.

Table 3 Intended characteristics for data extraction

Target features: Validity, Reason, Logout-min, Logout-date, Logout-hour, Login-min, Login-date, Login-hour, Ras-IP, Remote-IP, Successfully-state, Ras-description, Sum-in-out-mb, YearOfEntrance, Duration, LevelCode, TypeOfuser, FieldCode

As per scenario one, a new validity feature with two values is created: it is set when a student's login or logout time is missing; otherwise, no string is stored in the value. As per scenario two, a day feature with seven values is created, with a value allocated to each day of the week.
Target Data Extraction
The target data is extracted from the above pre-processing operations and its schema is shown in Table 3. For scenario two, the number of network connections derived from Ras_description is computed and stored. When the network connection count for a student is more than the threshold, the activity is considered unusual behavior. Likewise, if a medical student connects to the engineering faculty hotspot, it is regarded as unusual behavior.

3.2 Potential User Identification

This step is used for separating potential users from the BUS dataset, and interested users are identified using the C4.5 decision tree classification algorithm. Decision rules are set to extract potential users from the dataset, and the algorithm avoids the entries updated via the network manager. The network manager normally collects and updates information by crawling around webpages. Such crawling collects huge log files and creates a negative impact while extracting knowledge from user navigational patterns. The proposed method resolves this issue by identifying the entries made by the network manager prior to segmentation of the potential users.
The weblog entries updated by the network managers are identified using their IP addresses; however, this knowledge makes it difficult to discover search engines and agents. Alternatively, the root directory of the website is studied, since the network managers read the root files prior to website access. The weblog files containing the website access details are given to each manager before crawling so that it knows its rights. However, the access of network managers cannot be relied on, since the exclusion standard for network managers is considered voluntary; therefore, the method tries to detect and eliminate all the entries of network managers that have accessed the weblog file. It also detects and eliminates all network manager accesses made around midnight. This leads to the elimination of the network manager entries in head mode; the browsing speed is computed, and a network manager is excluded when its speed falls below a threshold value T or when its total visited pages exceed a threshold value.

Browsing speed is estimated based on the total count of pages browsed and the total session time. To handle the total count of entries made by network managers, a set of decision rules is applied. This helps to group the users into potential and non-potential users. Using valid weblog attributes, the classification algorithm classifies users based on training data. Attribute selection is carried out within 30 s, and the session time for referring to the total pages is 30 min. Further, the decision rule for identifying a potential user is a session of less than 30 min, with the total page accesses predetermined to be less than 5. The access code post is used for classifying users, and it reduces the weblog file size, which helps to improve clustering prediction and accuracy.
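A minimal Python sketch of the rule-based separation described above; the numeric thresholds, the direction of the speed comparison, and the page-count limit are assumptions for illustration rather than the tuned values used in the paper.

def browsing_speed(total_pages, session_minutes):
    # Pages viewed per minute of session time.
    return total_pages / session_minutes if session_minutes > 0 else float("inf")

def classify_session(total_pages, session_minutes, speed_threshold=2.0, max_pages=100):
    # Illustrative decision rules for potential-user identification.
    speed = browsing_speed(total_pages, session_minutes)
    if speed > speed_threshold or total_pages > max_pages:
        return "network manager / crawler"     # too fast or too many pages for a human visitor
    if session_minutes < 30 and total_pages < 5:
        return "potential user"
    return "non-potential user"

print(classify_session(total_pages=4, session_minutes=12))      # potential user
print(classify_session(total_pages=500, session_minutes=10))    # network manager / crawler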

3.3 Clustering Process

The proposed method uses an evolving tree-based clustering algorithm, which groups the potential users by their navigational patterns. The evolving tree graph sets the connectivity between the webpages, where the graph edges are assigned weights based on session time, connectivity time (a measure of the total visits to two webpages in a particular session), and frequency.

C_{x,y} = \frac{\sum_{i=1}^{N} \frac{T_i}{T_{xy}} f_x(k)\, f_y(k)}{\sum_{i=1}^{N} \frac{T_i}{T_{xy}}} \quad (1)

where T_i is the time duration of session i on both the x and y webpages, and T_{xy} is the requested time difference between the x and y webpages in a specific session. If the webpage appears at the kth position, it is denoted as f(k) = k, and the frequency measure between the x and y webpages is given as

F_{x,y} = \frac{N_{xy}}{\max(N_x, N_y)} \quad (2)

where N_{xy} is the total number of sessions containing both the x and y webpages, N_x is the session count at page x, and N_y is the session count at page y.
The values C_{x,y} and F_{x,y} normalize the time and frequency values between 0 and 1. Hence, the degree of connectivity between the two webpages is calculated using

w_{x,y} = \frac{2\, C_{x,y} F_{x,y}}{C_{x,y} + F_{x,y}} \quad (3)

The weights are stored in an adjacency matrix m, and each entry of m holds a w_{x,y} value as per Eq. (3). An excessive number of edges in the graph is avoided by discarding the weakly correlated edges below a threshold, i.e., those with minimum frequency contribution.
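A minimal Python sketch of how the edge weights of Eq. (3) can be assembled into the adjacency matrix m and pruned; the session lists, the precomputed connectivity values C_{x,y}, and the 0.3 pruning threshold are assumptions for illustration.

from collections import defaultdict
from itertools import combinations

def frequency(sessions, x, y):
    # F_{x,y} of Eq. (2): co-occurrence count over the larger single-page session count.
    n_x = sum(x in s for s in sessions)
    n_y = sum(y in s for s in sessions)
    n_xy = sum(x in s and y in s for s in sessions)
    return n_xy / max(n_x, n_y) if max(n_x, n_y) else 0.0

def edge_weight(c_xy, f_xy):
    # w_{x,y} of Eq. (3): harmonic-style combination of connectivity and frequency.
    return 2 * c_xy * f_xy / (c_xy + f_xy) if (c_xy + f_xy) else 0.0

def build_adjacency(sessions, connectivity, min_weight=0.3):
    # Adjacency matrix m with w_{x,y} entries; weakly correlated edges are discarded.
    pages = sorted({p for s in sessions for p in s})
    m = defaultdict(float)
    for x, y in combinations(pages, 2):
        w = edge_weight(connectivity.get((x, y), 0.0), frequency(sessions, x, y))
        if w >= min_weight:
            m[(x, y)] = w
    return m

sessions = [["P1", "P2", "P3"], ["P1", "P2"], ["P2", "P3"]]
connectivity = {("P1", "P2"): 0.8, ("P1", "P3"): 0.2, ("P2", "P3"): 0.6}   # assumed C values
print(dict(build_adjacency(sessions, connectivity)))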
Evolving Tree Fundamentals
Figure 2 shows the structure of the tree with N_node nodes in the network, where each node is N_{l,j}, with l the node identity and j the parent node, l = 1, 2, ..., N_node, j = 0, 1, ..., and l ≠ j. For example, the node N_{2,1} has N_{1,0} as its parent node. The weight vector of each node is given as w_l = {w_{l,1}, w_{l,2}, ..., w_{l,n}}, with n the number of features and b_l the hit counter. The hit counter records the total number of times a node becomes the best matching unit (match) for the webpages. The size and depth of the tree are determined by the N_node nodes and the maximum number of layers; e.g., the size of the tree is 9 and its depth is 4, as shown in Fig. 2.
The evolving tree has three types of nodes, namely the root node (the first node, N_{1,0}), the trunk nodes (the blue circles in Fig. 2 other than N_{1,0}), and the leaf nodes (the green circles, N_{lf}, where lf ∈ l). The root node forms the first layer of the evolving tree and does not have a parent node (j = 0). A trunk node is found between the root and the leaf nodes; it is a static node and acts as an interconnection node between the leaf nodes. A leaf node does not have any child nodes, and the minimum number of trunk nodes is used to determine the distance between two leaf nodes. For example, the total number of trunk nodes between N_{7,3} and N_{4,2} is 3, and hence the tree distance between N_{7,3} and N_{4,2} is also 3.

Fig. 2 An example of the ETree structure (layers 1-4, with nodes N_{1,0} through N_{9,5})

Fig. 3 Pre-processing and cleaning operation to test the transactions (number of transactions of users and potential users over Day-1 to Day-3)
Evolving Tree-Based Clustering Algorithm
The evolving tree has two parameters, namely the splitting threshold θ_s and the number of child nodes created during the split process, θ_c. Consider training sample data with n features, where X(t) = [x_1, x_2, ..., x_n]; the entire algorithm takes seven steps to learn the objects from the weblog data, which forms the training sample data.
Initially, the training data is fetched; if a trained model is available, the data is loaded into it, otherwise a new training model is created. This operation takes place at the root node, and the process then moves toward the leaf nodes. The best matching node in the training dataset is found: the distance to the best matching pair is computed using the Euclidean similarity value, the child node at layer 2 is matched using E{X(t)}, and the shortest distance between the child node and the best matching pair is found using the minimum distance equation, Eq. (4). It is then checked whether the leaf node and the match2 value are the same, since this leads to the calculation of the N_match value from X(t). When the leaf node does not match the match2 value, the corresponding child node is examined. The overall process is repeated until all the leaf nodes (N_lf) are found. If the value of N_lf is greater than one and the scores d(X(t), W_l) are the same, a matching pair is chosen randomly; on the other hand, if the N_lf value is less than or equal to one and the scores d(X(t), W_l) are dissimilar, the process goes to step 13. Once the matching pair is chosen using the weighted values, the weight vector of the best matching pair is found and updated, and the process is repeated until the best weighted pair is updated. After updating the matching pair (w_{x,y} or w_match), the weight vector is updated using the Kohonen learning rule given in Eq. (5). The neighborhood function of the tree is calculated using Eq. (6) to obtain the expansion of the tree structure. Finally, the updated N_match value is chosen as a parent node, and the weights of the parent node are copied to its child nodes. Once all the values of the tree and child nodes are known, the tree is updated, and hence the training model is updated further; this serves as the learning model for new training data.
Algorithm 1: Evolving tree-based clustering algorithm
1: Fetch a training data, X(t) = [x 1 , x 2 ,…, x n ]
2: If trained model is available
3: Load the trained model (θ s ,θ c )
4: Else
5: Create a new model (θ s ,θ c )
6: End
7: While process move from root to leaf node

8: Find N match for X(t)


9: Find Euclidean similarity measure between the pairs X(t) in the webpage =
E{X(t)}
10: Match the child node at layer 2 with E{X(t)}, then
11: Calculate the minimum distance between the child node and best matching pair
at layer 2,
12: d(X(t), W_l) = ||X(t) − w_l(t)||   (4)

13: If match2 = leaf node, Then //best matching pair at layer 2


14: Calculate N match for X(t)
15: Else
16: Child node related to match at layer 2 is matched in a similar fashion
17: End
18: Repeat the process until the N lf are found
19: If N lf > 1 && score(d(X(t),W l )) = same
20: rand (match) //matching pair is chosen in random manner
21: Else
22: Go to step 13
23: End
24: Update the wx,y or wmatch // weight vector of best matching pair is updated
25: bmatch = bmatch + 1
26: Weight vector is updated using Kohonen learning rule

w_{lf}(t + 1) = w_{lf}(t) + h_{N_{match},lf}(t)\,[X(t) − w_{lf}(t)]   (5)
27: Calculate h_{N_{match},lf}(t) // neighborhood function

h_{N_{match},lf}(t) = \alpha(t)\,\exp\!\left(\frac{-d_T(N_{match}, N_{lf})^2}{2\sigma^2(t)}\right)   (6)

where
d_T(N_{match}, N_{lf}) - tree distance between N_{match} and N_{lf},
α(t) - learning rate,
σ(t) - Gaussian kernel width, which is monotonically reduced with t
28: The tree is considered growing
29: If bmatch = θ s
30: Update N match as a parent node with θ c
31: Initialize w(θ c ), such that weight of parent node is same as child nodes
32: Else
33: GoTo step 26

34: End
35: Training model is updated
36: Learn new training data
37: End
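As a complement to Algorithm 1, the following minimal Python sketch (with illustrative values for θ_s, θ_c, and the learning rate, and with the neighborhood function of Eq. (6) collapsed to a constant for the matched leaf) shows the descent to the best matching leaf, a simplified Kohonen update in the spirit of Eq. (5), and the split of a leaf once its hit counter reaches the splitting threshold.

import numpy as np

class Node:
    # Evolving-tree node: weight vector w, hit counter b, and child nodes.
    def __init__(self, weight):
        self.w = np.asarray(weight, dtype=float)
        self.b = 0
        self.children = []

def best_matching_leaf(node, x):
    # Descend from the root, at each level following the child closest to x (Eq. (4) distance).
    while node.children:
        node = min(node.children, key=lambda c: np.linalg.norm(x - c.w))
    return node

def train_step(root, x, alpha=0.1, split_threshold=5, n_children=2):
    # One learning step: update the best matching leaf, then split it once it has been hit often enough.
    leaf = best_matching_leaf(root, x)
    leaf.w += alpha * (x - leaf.w)            # simplified Eq. (5) with h = alpha for the match only
    leaf.b += 1
    if leaf.b >= split_threshold:             # grow the tree: the leaf becomes a parent node
        leaf.children = [Node(leaf.w + np.random.normal(0.0, 0.01, leaf.w.size))
                         for _ in range(n_children)]
        leaf.b = 0

rng = np.random.default_rng(2)
root = Node(rng.normal(size=4))
for _ in range(50):
    train_step(root, rng.normal(size=4))
print("children of the root node:", len(root.children))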

3.4 Prediction Engine

The prediction engine classifies the user navigation patterns, and the future request of the user is predicted based on this engine's classifier. The longest common subsequence algorithm is utilized for this prediction process; it finds the common longest subsequence over the entire sequence set, and the algorithm relies on two properties:
• When two sequences x and y of webpage visits end with the same element, the common longest subsequence (cls) is found by eliminating the end element and then computing the cls of the shortened sequences.
• When two sequences x and y do not end with the same element, the longer of the subsequences cls(x_n, y_{m-1}) and cls(x_{n-1}, y_m) is taken.
The cls is thus calculated using Eq. (7), which is given by

cls(x_i, y_j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ \langle cls(x_{i-1}, y_{j-1}), x_i \rangle & \text{if } \ddot{x}_i = \ddot{y}_j \\ \mathrm{long}(cls(x_{i-1}, y_j),\ cls(x_i, y_{j-1})) & \text{if } \ddot{x}_i \neq \ddot{y}_j \end{cases} \quad (7)

The cls common to x_i and y_j is found by comparing their elements \ddot{x}_i and \ddot{y}_j. The sequence cls(x_{i-1}, y_{j-1}) can only be extended by \ddot{x}_i when x_i and y_j are equal. If x_i and y_j are not equal, the longer of cls(x_{i-1}, y_j) and cls(x_i, y_{j-1}) is taken; if cls(x_{i-1}, y_j) and cls(x_i, y_{j-1}) have the same length, both values are retained, provided the two sequences are not identical.
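A minimal Python implementation of the cls recurrence of Eq. (7) over two page sequences using dynamic programming; the page identifiers are illustrative.

def cls(x, y):
    # Common longest subsequence of two page sequences, per the recurrence of Eq. (7).
    n, m = len(x), len(y)
    table = [[[] for _ in range(m + 1)] for _ in range(n + 1)]   # table[i][j] = cls(x[:i], y[:j])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + [x[i - 1]]   # extend by the common element
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1], key=len)
    return table[n][m]

print(cls(["P1", "P3", "P4", "P7"], ["P1", "P4", "P5", "P7"]))   # ['P1', 'P4', 'P7']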
Algorithm 2 consists of the following steps. The webpages are assigned URLs, and for each webpage pair the weight is calculated. The edge weights are computed over all node pairs in the graph, and edges with minimum frequency are removed. The remaining high-frequency nodes are then used to form clusters using a depth-first search. Finally, clusters below the minimum size are removed from the graph.
Algorithm 2: Prediction Engine Algorithm
1: URL is assigned over the list of webpages, L[p] = P
2: For each pair (Pi , Pj ) ∈ L[p] do // webpage pair
3: Then M ij = w(Pi , Pj ); // weight is computed using Eq. (3)
Edgeij = M ij
4: End for
5: For Edgeu, v ∈ GraphE,V do //Edge weight with minimum frequency is removed

6: If Edgeu,v < min(frequency)


7: Remove Edgeu,v
8: Else
9: Keep Edgeu,v
10: Endif
11: End for
12: For all vertices u do
13: C[i] = DFS (u); // perform Depth first search to form the cluster
14: If C[i] < min(size(C))
15: Remove C[i] //remove cluster with length lesser than minimum size of cluster
16: Else
17: End
18: i=i+1
19: End for
20: Return the Cluster
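A minimal Python sketch of the pruning and clustering stages of Algorithm 2: edges below the minimum-frequency threshold are removed, and the remaining connected components are found by depth-first search. The example edge weights are hypothetical; the 0.5 threshold and minimum cluster size of one follow the values used later in the evaluation.

from collections import defaultdict

def cluster_pages(edge_weights, min_weight=0.5, min_cluster_size=1):
    # Prune weak edges, then group the remaining pages into clusters by depth-first search.
    graph = defaultdict(set)
    for (u, v), w in edge_weights.items():
        if w >= min_weight:                        # keep only sufficiently correlated edges
            graph[u].add(v)
            graph[v].add(u)
    clusters, visited = [], set()
    for start in graph:
        if start in visited:
            continue
        stack, component = [start], []
        while stack:                               # iterative DFS over one connected component
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            component.append(node)
            stack.extend(graph[node] - visited)
        if len(component) >= min_cluster_size:
            clusters.append(sorted(component))
    return clusters

weights = {("P1", "P2"): 0.73, ("P2", "P3"): 0.63, ("P3", "P4"): 0.2, ("P5", "P6"): 0.9}
print(cluster_pages(weights))    # [['P1', 'P2', 'P3'], ['P5', 'P6']]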

4 Evaluation and Discussions

This section evaluates the proposed method through a series of experiments, and the BUS dataset (collected from Bharathiyar University, India) is used in the testing environment. The dataset consists of 1,893,725 entries and 533,156 webpages.
The algorithms are implemented in Java. The results related to pattern discovery are obtained using the proposed evolutionary tree-based clustering. The paper also reports the performance gains concerning the accurate prediction of user navigational patterns and the run time. Initially, the process starts by discarding or filtering the noisy weblog data, implicit requests, error entries, and network manager entries. Then the clustering process is carried out to group the potential users, and the longest common subsequence algorithm is used to produce the best predictive response for future requests.

4.1 Session Identification

The improvement proposed here is based on the observation that returning users without repeated requests do not help in identifying knowledge related to the navigational pattern. Once the sessions are detected using a threshold time, say 30 min, a check is made to detect whether the user pattern is shared by the same user or not. When a shared user navigation pattern exists, the identified sessions are approved; otherwise, the sequences split into sessions are skipped. Investigations are carried out to find the effects on the quantity and quality of the identified sessions.

The present investigation uses two time thresholds, 10 and 30 min, with an equal set of experiments. Each timing threshold is tested with three different minimum lengths of patterns (lsp), 1 to 3. The test is run with a variable that ranges from 10 to 100%; as the variable value increases, the sessions associated with the patterns are shared more. Using these values, the ratio for the different times and lsp values is evaluated, as reported in Table 5.

The inference from Table 4 can be summarized as follows: as the threshold time reduces, the ratio of correctly classified instances also reduces, because minimum-frequency values are mostly eliminated at the lower threshold level. Hence, the lowest ratio value corresponds to very few correctly classified instances from the weblog data. Also, as the minimum length of the patterns is reduced, the classification ratio increases greatly, being tenfold greater than for lsp 2 and 3. The same holds for the threshold time of 30 min; however, the classification ratio improves to a greater extent than for the threshold time of 10 min.

4.2 Pre-processing

The proposed algorithm is tested over the dataset to establish its benefits when using the proposed navigational patterns. The raw data is sent through the process of pre-processing, cleaning, and session identification prior to clustering. Table 3 provides the results of the total transactions and of the memory used for storing the cleaned weblog data. Figure 3 shows the evaluated results of the potential users identified after the pre-processing operation; it is found that many irrelevant items are removed, resulting in high-quality identification of potential users (Table 5).

Table 4 Results of various sessions for user navigational patterns

Threshold | Sessions identified correctly | False positives rate | Total sessions identified | Ratio = False positives / Total
10 min | 16,173 | 7129 | 23,302 | 30.59
30 min | 15,345 | 5793 | 21,138 | 27.41

Table 5 Results of sessions for varying length and threshold time

R in % | time = 10 min, lsp ≥ 3 | time = 10 min, lsp ≥ 2 | time = 10 min, lsp ≥ 1 | time = 30 min, lsp ≥ 3 | time = 30 min, lsp ≥ 2 | time = 30 min, lsp ≥ 1
100 | 1.6 | 1.7 | 19.8 | 4.2 | 4.5 | 21.1
90 | 1.6 | 1.7 | 20.8 | 4.2 | 4.5 | 21.8
80 | 1.6 | 1.9 | 22.4 | 4.2 | 5 | 23.2
70 | 1.6 | 2.4 | 23.5 | 4.2 | 5.5 | 24
60 | 1.8 | 3.6 | 25.3 | 4.4 | 7 | 25.1
50 | 1.9 | 5.1 | 26.5 | 4.6 | 8.8 | 25.7
40 | 2.1 | 6.5 | 26.9 | 5 | 10.4 | 26
30 | 2.6 | 8 | 27.3 | 4.9 | 12.1 | 26.2
20 | 4.2 | 10.4 | 27.4 | 7.3 | 14.6 | 26.3
10 | 5.6 | 12.1 | 27.4 | 9.5 | 15.5 | 26.3

4.3 Results of Clustering

The clustering result is used to derive the different forms of information extracted from the weblog data. It contains the total number of visits made to the website, the traffic of each webpage, the frequency with which a page is viewed, and the behavior of the user navigational patterns. The weblog data is considered with 100 unique webpages, where clarity is improved by assigning them codes. The total visits made over 24 h to these 100 pages are tested, and this is used to test the proposed system's performance. The minimum frequency, or threshold value, is used to remove the weakly correlated edges, and the minimum cluster size is set to one. It is clear from Table 4 that the threshold value of 0.5 shows the optimal results on the associated dataset. The test is repeated against different weblog data sizes and the results are recorded. Thus, the clustering threshold of 0.5 is used for the prediction process.

4.4 Prediction Results

The prediction algorithm's performance is evaluated using three performance parameters, as shown in Eqs. (8)-(10). The navigational patterns obtained from the previous step are divided into two sets: one for the generation of predictions and one for the evaluation of predictions. The parameters are thus defined in Eqs. (8)-(10):

accuracy = \frac{|P(an, T) \cap Eval_n|}{|P(an, T)|} \quad (8)

coverage = \frac{|P(an, T) \cap Eval_n|}{|Eval_n|} \quad (9)

F1\text{-}measure = \frac{2 \times accuracy(P(an, T)) \times coverage(P(an, T))}{accuracy(P(an, T)) + coverage(P(an, T))} \quad (10)

where an is the navigation pattern for the active session, T is the threshold value, P(an, T) is the prediction set, and Eval_n is the evaluation set.
The prediction rate increases, since the accuracy of the prediction increases with the threshold value, and the best accuracy obtained is 92%.
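A minimal Python sketch of the evaluation measures of Eqs. (8)-(10), computed for a hypothetical prediction set and evaluation set.

def accuracy(prediction, evaluation):
    # Eq. (8): fraction of predicted pages that occur in the evaluation set.
    return len(prediction & evaluation) / len(prediction) if prediction else 0.0

def coverage(prediction, evaluation):
    # Eq. (9): fraction of the evaluation set covered by the prediction.
    return len(prediction & evaluation) / len(evaluation) if evaluation else 0.0

def f1_measure(prediction, evaluation):
    # Eq. (10): harmonic mean of accuracy and coverage.
    acc, cov = accuracy(prediction, evaluation), coverage(prediction, evaluation)
    return 2 * acc * cov / (acc + cov) if (acc + cov) else 0.0

predicted = {"P2", "P4", "P7"}         # P(an, T), hypothetical prediction set
observed = {"P2", "P7", "P9", "P11"}   # Eval_n, hypothetical evaluation set
print(accuracy(predicted, observed), coverage(predicted, observed), f1_measure(predicted, observed))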

5 Conclusions and Future Work

An effective weblog mining framework is proposed to address the prediction of online navigational behavior. The proposed heuristic method is tested under two experimental settings, and the results show that the patterns obtained have only a small number of false positive instances. With an increasing number of instances, the false positives become smaller still, and good quality output is obtained owing to the proper cleaning and pre-processing carried out before the mining algorithm is applied. The mining algorithm uses a clustering process to select the relevant patterns from the appropriate input. This clustering algorithm helps to detect all the patterns relevant to the user navigational pattern; it does not focus only on the most frequent patterns. The framework for improving the prediction of online user navigational behavior fits well with the prediction of online patterns; it reduces the running time and produces online patterns in a shorter time.

References

1. Raphaeli O, Goldstein A, Fink L (2017) Analyzing online consumer behavior in mobile and
PC devices: a novel web usage mining approach. Electron Commer Res Appl 26:1–12
2. Shivaprasad G, Reddy NS, Acharya UD, Aithal PK (2015) Neuro-fuzzy based hybrid model
for web usage mining. Procedia Computer Science 54:327–334
3. Carmona CJ, Ramírez-Gallego S, Torres F, Bernal E, delJesús MJ, García S (2012) Web usage
mining to improve the design of an e-commerce website: OrOliveSur. com. Expert System
Appl 39(12):11243–11249
4. Devi BN, Devi YR, Rani BP, Rao RR (2012) Design and implementation of web usage mining
intelligent system in the field of e-commerce. Procedia Engineering 30:20–27

5. Lopes P, Roy B (2015) Dynamic recommendation system using Web usage mining for e-
commerce users. Procedia Comput Sci 45:60–69
6. Cohen A, Nachmias R (2011) What can instructors and policy makers learn about Web-
supported learning through Web-usage mining. Int Higher Educ 14(2):67–76
7. Hung YS, Chen KLB, Yang CT, Deng GF (2013) Web usage mining for analysing elder self-care
behavior patterns. Expert Syst Appl 40(2):775–783
8. Velásquez JD (2013) Combining eye-tracking technologies with web usage mining for
identifying Website Keyobjects. Eng Appl Artif Intell 26(5):1469–1478
9. Matthews SG, Gongora MA, Hopgood AA, Ahmadi S (2013) Web usage mining with
evolutionary extraction of temporal fuzzy association rules. Knowl-Based Syst 54:66–72
10. Sha H, Liu T, Qin P, Sun Y, Liu Q (2013) EPLogCleaner: improving data quality of enterprise
proxy logs for efficient web usage mining. Procedia Computer Science 17:812–818
11. Tao YH, Hong TP, Su YM (2008) Web usage mining with intentional browsing data. Expert
Syst Appl 34(3):1893–1904
12. John JM, Mini GV, Arun E (2012) User profile tracking by Web usage mining in cloud
computing. Procedia Engineering 38:3270–3277
13. Huang YM, Kuo YH, Chen JN, Jeng YL (2006) NP-miner: A real-time recommendation
algorithm by using web usage mining. Knowl-Based Syst 19(4):272–286
14. Adeniyi DA, Wei Z, Yongquan Y (2016) Automated web usage data mining and recommenda-
tion system using K-Nearest Neighbor (KNN) classification method. Applied Computing and
Informatics 12(1):90–108
15. Tao YH, Hong TP, Lin WY, Chiu WY (2009) A practical extension of web usage mining with
intentional browsing data toward usage. Expert Syst Appl 36(2):3937–3945
16. Hong TP, Huang CM, Horng SJ (2008) Linguistic object-oriented web-usage mining. Int J
Approximate Reasoning 48(1):47–61
17. Yin PY, Guo YM (2013) Optimization of multi-criteria website structure based on enhanced
tabu search and web usage mining. Appl Math Comput 219(24):11082–11095
18. Cho YH, Kim JK (2004) Application of Web usage mining and product taxonomy to
collaborative recommendations in e-commerce. Expert Syst Appl 26(2):233–246
19. Zhang X, Edwards J, Harding J (2007) Personalised online sales using web usage data mining.
Comput Ind 58(8):772–782
20. Musale V, Chaudhari D (2017) Web usage mining tool by integrating sequential pattern
mining with graph theory, 1st International Conference on Intelligent Systems and Information
Management (ICISIM), Aurangabad, India. https://doi.org/10.1109/ICISIM.2017.8122167
21. Liu J, Fang C, Ansari N (2016) Request dependency graph: A model for web
Spatiotemporal Particle Swarm
Optimization with Incremental Deep
Learning-Based Salient Multiple Object
Detection

M. Indirani and S. Shankar

Abstract Recent developments in computer vision make it possible to detect salient objects in videos, which plays a vital role in many day-to-day applications. The difficulty of integrating spatial cues with motion cues makes salient object detection hard. A spatiotemporal constrained optimization model (SCOM) was provided in a previous system. Although it performs well in detecting a single salient object, it does not consider the variation of salient features between different viewers, and only some objects reach a more general agreement about their significance. To solve this problem, the proposed system designs spatiotemporal particle swarm optimization with incremental deep learning-based salient multiple object detection. In this work, an incremental deep convolutional neural network (IDCNN) classifier is introduced as a suitable measurement of success in a relative object saliency landscape. The spatiotemporal particle swarm optimization model (SPSOM) is used to perform the ranking method and the detection of multiple salient objects. To achieve global saliency optimization, local constraints and temporal as well as spatial cues are exploited. The saliency map of the prior video frame and the motion history of change detection are handled by SPSOM, and moving salient objects are distinguished from diverse changing background regions. Compared with existing methods, the proposed method exhibits better performance, as shown in the experimental results, in terms of recall, precision, average run time, accuracy and mean absolute error (MAE).

Keywords Salient object · Spatiotemporal particle swarm optimization model


(SPSOM) · Incremental deep convolutional neural network (IDCNN) classifier ·
Global saliency optimization

M. Indirani (B)
Assistant Professor, Department of IT, Hindusthan College of Engineering and Technology,
Coimbatore 641032, India
e-mail: mindirani2008@gmail.com
S. Shankar
Professor, Department of CSE, Hindusthan College of Engineering and Technology, Coimbatore
641032, India
e-mail: shanx80@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 831
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_60

1 Introduction

In recent days, video salient object detection (VSOD) has gained more interest. VSOD is essential for understanding the underlying mechanism of the human visual system (HVS) during free viewing, and it is also used in various real-time applications such as weakly supervised attention, robotic interaction, autonomous driving, video compression, video captioning and video segmentation [1–5].
Due to challenges in video data such as large object deformations, blur, occlusions and diverse motion patterns, together with the inherent complexity of human visual attention behavior such as attention shift and selective attention allocation, great difficulties are presented in VSOD in addition to its practical and academic significance. So, in the past few years, research interest has increased noticeably. Salient object detection is a task based on the mechanism of visual attention, where the algorithm aims to find the objects that attract more attention than the surrounding area of the scene or image.
Salient object detection in video and images identifies foreground objects against the background. The technique is based on the assumption that objects are distinctive in pattern, motion or texture when compared with the background [6]. The output for a frame is a saliency map, where every value represents the probability of a pixel belonging to a salient object, and potential objects are identified using the pixels with high probability.
The difficulty of integrating spatial cues with motion cues makes salient object detection harder, and it is also difficult to deal with static adjacent frames and the unavailability of motion features. Complicating factors such as cluttered backgrounds, large background motion, shadowing due to illumination changes and intensity variation influence the quality of the acquired video. Until now, various methods have been proposed for spatiotemporal salient object detection in video [7, 8].
The ability of deep convolutional neural networks (CNNs) to represent high-level semantic features has made them popular in recent days. Various CNN-based methods have been proposed to detect salient objects, and they have produced better results [9–11]. However, CNN output is coarse and has non-sharp boundaries due to the presence of pooling layers and convolutional layers with large receptive fields [12]. An effective DCNN with an incremental growing and training method allows new classes to be learned while sharing part of the base network [12]. According to a hierarchical representation of relative saliency, an IDCNN classifier is proposed.
The paper is organized as follows: different salient object detection methods are discussed in Sect. 2, a model for multiple salient object detection is proposed in Sect. 3, experimentation and analysis are presented in Sect. 4, and Sect. 5 concludes the research work.

2 Related Works

Le and Sugimoto (2018) present a technique for detecting salient objects in videos, where temporal information in addition to spatial information is fully considered. The system introduced a new set of spatiotemporal deep (STD) features that exploit local and global contexts over frames. Furthermore, a spatiotemporal conditional random field (STCRF) is proposed to compute saliency from STD features. STCRF is the extension of CRF to the temporal domain and describes the relationships among neighboring regions both within a frame and over frames.
STCRF leads to temporally consistent saliency maps over frames, contributing to the accurate detection of salient object boundaries and to noise reduction during detection. The designed method first segments an input video into multiple scales and then computes a saliency map at each scale level using STD features with STCRF. The final saliency map is computed by fusing the saliency maps at the different scale levels [13].
Chen et al. (2018) present a model for video salient object detection called the spatiotemporal constrained optimization model (SCOM). It exploits spatial and temporal cues, as well as a local constraint, to achieve global saliency optimization. For a robust motion computation of salient objects, a scheme is presented to model the motion cues from the optical flow field, the saliency map of the prior video frame and the motion history of change detection. It is able to differentiate the moving salient objects from diverse changing background regions.
Moreover, an effective objectness measure is designed with an intuitive geometrical interpretation to extract some reliable object and background regions, which provide the basis to define the foreground potential, the background potential and the constraint that supports saliency propagation. These potentials and the constraint are formulated into the designed SCOM framework to generate an optimal saliency map for each frame in a video [14].
Qi et al. (2019) designed a fast video salient object detection method running at 0.5 s per frame (including an average of 0.32 s for optical flow computation). It mainly consists of two modules, an initial spatiotemporal saliency module and a correlation filter-based salient temporal propagation module. The former integrates spatial saliency, obtained from a robust minimum barrier distance and a boundary contrast cue, with temporal saliency information from the motion field. The latter uses correlation filters to keep the saliency consistent between neighboring frames. The two modules are finally fused in an adaptive manner [15].
Wu et al. (2018) designed a spatiotemporal salient object detection method that integrates saliency and objectness, for videos with complicated motion and complex scenes. The initial salient object detection result is first built upon both the saliency map and the objectness map. Afterwards, the region size of a salient object is adjusted to obtain the frame-wise salient object detection result by iteratively updating the object probability map, which is the combination of the saliency map and the objectness map.
Finally, to improve temporal coherence, a sequence-level refinement is performed to generate the final salient object detection result. Experimental results on public benchmark datasets demonstrate that the proposed method consistently outperforms the state-of-the-art salient object detection methods [16].
Dakhia et al. (2019) focus on accurately capturing the fine details of salient objects by proposing a hybrid backward refinement network (HBRNet), which combines the high-level and low-level features extracted from two different CNNs. Taking advantage of the access to the visual cues and semantic information of CNNs, the hybrid deep network helps in modelling the object's context and preserving its boundaries as well.
In particular, the framework integrates effective hybrid refinement modules by combining feature maps of two consecutive layers from the two deep networks. Additionally, the designed refinement model uses a residual convolutional unit to provide effective end-to-end training. Moreover, a feature fusion strategy is applied to enable full exploitation of multi-scale features and to progressively recover the resolution of the coarse prediction map.
Through the experimental results, it is shown that the designed system achieves state-of-the-art performance. In the hybrid refinement module, the residual convolutional unit as well as the fusion strategy improve the quality of the prediction maps [17].

3 Proposed Methodology

A spatiotemporal particle swarm optimization model (SPSOM) with incremental deep learning (IDL)-based salient multiple object detection is proposed. Here, the separation of foreground and background is performed using the incremental deep learning (IDL) algorithm.
A new model is proposed for detecting multiple salient objects in video, together with a ranking method, termed the spatiotemporal particle swarm optimization model (SPSOM), in which local constraints and temporal as well as spatial cues are exploited for the global saliency optimization of multiple objects. Figure 1 shows the flow diagram of the proposed system.

3.1 Input Video

For a given video sequence, the major goal is the visual and temporal detection of salient objects in every frame F_t of the video, where t is the frame index. The proposed saliency model uses the assumption that, by analyzing spatial and temporal cues, some reliable background or salient object regions can be found in a given video sequence, and from these detected reliable regions, saliency

[Flow diagram: input video → consecutive frames F → incremental deep learning → for each object: motion distribution, motion edge, motion history image and saliency maps → motion energy → object-like regions → spatiotemporal particle swarm optimization model → saliency maps for multiple objects → salient multiple objects]

Fig. 1 Flow diagram of the proposed methodology



seeds can be derived to achieve global optimization of salient object detection.
In a video sequence, superpixels are generated for every frame to model saliency, using SLIC segmentation, and every superpixel contains approximately 300 pixels. Salient object detection then corresponds to a superpixel labeling problem. In a frame, every superpixel r_i (i = 1, 2, ..., N) is assigned a saliency value s_i ∈ [0, 1] in this technique.
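As an illustration of this pre-processing step, the sketch below generates superpixels of roughly 300 pixels each with the SLIC implementation in scikit-image; the library choice, the compactness value and the input file name are assumptions made for the example, not prescribed by the paper.

import numpy as np
from skimage import io
from skimage.segmentation import slic

frame = io.imread("frame_t.png")                 # one video frame F_t (hypothetical file name)
h, w = frame.shape[:2]
n_segments = max(1, (h * w) // 300)              # aim for roughly 300 pixels per superpixel

# labels[y, x] = index of the superpixel r_i that pixel (x, y) belongs to
labels = slic(frame, n_segments=n_segments, compactness=10, start_label=0)

num_superpixels = labels.max() + 1
saliency = np.zeros(num_superpixels)             # one saliency value s_i in [0, 1] per superpixel
print(num_superpixels, "superpixels generated")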
Superpixel labeling is formulated in the system model as the minimization of a constrained energy function E(S), where S = {s_1, s_2, ..., s_N} is the configuration of saliency labels. Initially, reliable labels are assigned to some superpixels, and the energy has three potentials, namely the foreground potential Φ, the background potential Ψ and the smoothness potential Θ.
\min E(S) = \sum_{i=1}^{N} \Phi(s_i) + \sum_{i=1}^{N} \Psi(s_i) + \sum_{(i, j) \in \mathcal{N}} \Theta(s_i, s_j)   (1)

s.t. \ \Gamma(S) = k

where, in a frame F_t, \mathcal{N} is the neighborhood set of spatially connected superpixel pairs, and k is the constraint vector containing a few convincing saliency values. A superpixel may be classified as background or salient object using the background potential Ψ and the foreground potential Φ. The smoothness potential Θ promotes overall saliency smoothing by penalizing neighboring superpixels that are assigned different labels.

3.2 Incremental Deep Convolutional Neural Network


(IDCNN)

Four classes (C1–C4) are used for training the base network, and after training, the training data of those classes is discarded. Then, two classes (C5, C6) are given as new input sample data, and this data has to be accommodated in the network while maintaining the knowledge of the initial four classes [18].
So, the capacity of the network needs to grow, and the network is retrained only with the new data (of C5 and C6) in an effective way so that the updated network can classify all of the task's classes (C1–C6). The classification process is termed task-specific classification if the tasks are classified separately, and combined classification if they are classified together. An overview of the incremental learning model is shown in Fig. 2.
Design Approach
The superiority of DCNNs comes from the fact that the same network contains both a feature extractor and a classifier spread over several layers. In the proposed training method, the fixed feature extractor corresponds to the shared convolutional layers, while the classifier corresponds to the fully connected layers, which are never shared.

Fig. 2 Incremental learning model: the network needs to grow its capacity with the arrival of data of new classes

The process of reusing learned network parameters for learning a new set of classes is termed sharing. In every case, only the newly available data is used for learning the new classes, under the assumption that the old and new classes have similar features. In the designed system, a single dataset is split into various sets so that they can be used as old-task and new-task data with multiple classes during the network update process.
Figure 3 shows that around 60% of the learning parameters in the convolutional layers, together with the respective ReLU and batch normalization layers, can be shared while remaining within about 1% of the baseline accuracy. Classification accuracy is drastically degraded if sharing is pushed much beyond this point. Based on this observation, an incremental training method with maximum benefit is developed in the designed system.

Fig. 3 Updated incremental network architecture for training methodology

Fig. 4 Overview of the DCNN incremental training methodology with partial network sharing
Incremental training with optimum network sharing is proposed in this system, as shown in Fig. 3. The set of available classes is initially split into two sets. The base network is trained using the larger, core set, and cloned branch networks with various sharing configurations are trained using a small set called the demo set. From the training results, a sharing-versus-accuracy curve is generated, which is used for selecting the network architecture and the optimum sharing configuration for an application.
This curve indicates how many initial layers of the base network can be shared without degrading the accuracy of the new task. An optimum sharing configuration that satisfies the quality requirements is selected, and the new set of classes can be learned using this configuration. Computing the optimum sharing configuration from the accuracy-sharing trade-off curve is difficult; the curve provides a tuning knob for trading accuracy against energy benefits. The entire search space does not need to be explored: heuristics based on the network architecture can be applied, and the number of training samples, the dataset complexity and the number of trainable parameters allow the optimum sharing configuration to be computed in a few retraining iterations of the cloned network. The procedure for computing the optimum sharing point is described in the following passage.
In the designed system, the available classes are initially separated into two sets for training the base network, as shown in Fig. 4: the core set and the demo set. The flow diagram of the spatiotemporal particle swarm optimization model (SPSOM) is shown in Fig. 5. The network is first trained using the core set and then trained separately with the demo set; the accuracy of this separately trained network is used as a reference for computing the optimum sharing configuration. A branch network is then created, which shares a few initial layers of the base network and is trained using the demo set. This branch network corresponds to a cloned version of the trained base network.
The branch network is trained in this system, and its performance is compared with the reference accuracy.

Fig. 5 Spatiotemporal particle swarm optimization model (SPSOM)
[Flow chart: initialize the number of superpixels → initialize the position and velocity of each superpixel → compute the distance for each superpixel → update pbest → update gbest → update position and velocity → repeat until the stopping condition is met → multiple object detection]

Sharing is increased if the new accuracy is close to the reference value, and the branch is trained again for comparison. Sharing is decreased if the new accuracy falls below the reference value, and the branch is trained again for comparison. Based on the required quality values, the optimum sharing configuration is finalized after a few iterations.
The optimum sharing point is the fraction of sharing beyond which the accuracy degradation caused by increased sharing exceeds the quality threshold. With this method, maximum benefits can be achieved with minimum loss of quality. Finally, since both the core and demo sets are available, the base network is retrained with the core and demo sets to enhance the base network features.
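A minimal PyTorch sketch of the partial-sharing idea is given below: a few early convolutional layers of the base network are reused and frozen, while a cloned branch with its own fully connected classifier is trained on the demo/new-class data. The layer sizes, the sharing point and all names are illustrative assumptions; the paper's actual architecture and sharing fraction are chosen via the accuracy-sharing curve described above.

import copy
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy stand-in for the base DCNN (layer sizes are illustrative)."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)   # fully connected layer, never shared

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Base network trained on the core set (classes C1-C4); the training loop is omitted.
base = SmallCNN(num_classes=4)

# Branch network: clone the base and give it a new head for the new classes (C5-C6).
branch = copy.deepcopy(base)
branch.classifier = nn.Linear(32, 2)

# Share the first convolutional block: reuse the base layers and freeze them.
for idx in range(4):                                   # the chosen sharing point is an assumption
    branch.features[idx] = base.features[idx]
    for p in branch.features[idx].parameters():
        p.requires_grad = False

optimizer = torch.optim.SGD([p for p in branch.parameters() if p.requires_grad], lr=0.01)
# Train `branch` on the demo / new-class data only, then compare its accuracy with the
# reference network to increase or decrease the sharing fraction.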
(A) Foreground potential
Using spatial–temporal visual analysis, a few reliable object regions O can be obtained. The major assumption used for defining the foreground potential is that these regions belong to the salient object. In a frame F_t, for every superpixel r_i, the foreground potential is defined in the system as,

\Phi(s_i) = F(r_i)(1 - s_i)^2   (2)

where F(r_i) is the foreground term, which evaluates the foreground probability of superpixel r_i. A superpixel with a high foreground term has a high chance of being salient and therefore a high saliency value s_i, normalized to the range [0, 1]. Multiplying the foreground term by (1 − s_i)^2 promotes the energy minimization.
To model the foreground term F(r_i), the average appearance similarity A(r_i) between superpixel r_i and every superpixel r_o in the reliable object regions O is computed, where K denotes the number of masked object regions O and B denotes the masked background regions; clustering is used for computing the regions B and O. The movement of object regions is estimated through a proposed motion energy term M(r_i). The foreground term F(r_i) is expressed as,

F(r_i) = A(r_i) \, M(r_i)   (3)

where

A(r_i) = \frac{1}{K} \sum_{o=1}^{K} \exp\left(-\frac{dist_g^2(r_i, r_o)}{2\sigma^2}\right)   (4)

where dist_g(r_i, r_o) is the geodesic distance between superpixel r_i and superpixel r_o ∈ O, computed by accumulating edge weights along the shortest path from r_i to r_o in an undirected graph.
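Assuming the pairwise geodesic distances between superpixels have already been computed on the superpixel graph, the appearance term of Eq. (4) can be evaluated as in the sketch below; the 1/K averaging follows the description above, and the variable names and sigma value are illustrative.

import numpy as np

def appearance_similarity(dist_g, object_idx, sigma=0.2):
    """A(r_i) of Eq. (4): average appearance similarity of each superpixel
    to the reliable object regions O.

    dist_g     : (N, N) matrix of geodesic distances between superpixels
    object_idx : indices of the K superpixels in the reliable object regions O
    """
    d = dist_g[:, object_idx]                        # distances to each r_o in O
    sim = np.exp(-(d ** 2) / (2.0 * sigma ** 2))     # per-region similarity
    return sim.mean(axis=1)                          # average over the K regions

# toy example with 5 superpixels, two of which are reliable object regions
rng = np.random.default_rng(0)
D = rng.random((5, 5))
print(appearance_similarity(D, object_idx=[0, 3]))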
Motion Energy Term
The motion energy term M is modeled in the designed system by exploiting the optical flow field, and it is formed from M_d, M_e, M_h and S_{t−1}. The system uses a Sobel edge detector on the optical flow field to generate the motion edge M_e, which is used to extract the contours of moving objects.
As indicated by the color spatial distribution of the optical flow field, backgrounds show a uniform color distribution over the entire frame, whereas moving objects are distinctive and very compact. The motion distribution measure M_d is defined in this system as,


M_d(r_i) = \sum_{j=1}^{N} \left\| p_t(r_j) - \mu_i \right\|^2 v_{ij}   (5)

 
where p_t(r_j) is the normalized centroid of superpixel r_j, and μ_i is the color-similarity-weighted centroid of superpixel r_i, expressed as,
\mu_i = \frac{\sum_{j=1}^{N} v_{ij} \, p_t(r_j)}{\sum_{j=1}^{N} v_{ij}}   (6)

The color similarity between superpixels r_i and r_j is expressed as,


  
v_{ij} = \exp\left(-\frac{dist_c^2(r_i, r_j)}{2\sigma^2}\right)   (7)

The motion distribution M_d measures the color discriminativeness and spatial spread of a superpixel with respect to the other pixels of the color optical flow field.
For frame F_t, the motion edge M_e and the motion history image (MHI) M_h are combined with the generated motion distribution map M_d. For a superpixel in F_t, the motion energy term M(r_i) is defined by integrating the prior frame's saliency S_{t−1} as,

M(r_i) = (1 - \gamma) S_{t-1}(r_i) + \frac{\gamma \left( M_h(r_i) + M_e(r_i) + M_d(r_i) \right)}{3}   (8)
where γ is a balance parameter, set to 0.5 in our experiment.
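The fusion in Eq. (8) is straightforward to express per superpixel, as in the sketch below; the inputs are assumed to be already normalized per-superpixel arrays, and the toy values are invented for illustration.

import numpy as np

def motion_energy(S_prev, M_h, M_e, M_d, gamma=0.5):
    """M(r_i) of Eq. (8): blend the previous frame's saliency with the motion
    history, motion edge and motion distribution cues.
    All inputs are arrays of length N (one value per superpixel)."""
    return (1.0 - gamma) * S_prev + gamma * (M_h + M_e + M_d) / 3.0

# toy example with 4 superpixels
S_prev = np.array([0.1, 0.8, 0.4, 0.0])
M_h = np.array([0.2, 0.9, 0.3, 0.1])
M_e = np.array([0.0, 0.7, 0.5, 0.0])
M_d = np.array([0.1, 0.6, 0.2, 0.0])
print(motion_energy(S_prev, M_h, M_e, M_d))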
(B) Background potential
The background potential is defined in the system in opposition to the foreground potential, to measure the likelihood of every superpixel being background. The background potential Ψ(s_i) of every superpixel r_i is defined as,

\Psi(s_i) = \omega_b(r_i) \, s_i^2   (9)

where ω_b(r_i) is the background term, which measures the background probability of superpixel r_i. Clearly, a superpixel with a small value of ω_b(r_i) is visually less salient. Multiplying the background term by s_i^2 promotes the energy minimization of Eq. (1).
The background term is defined by computing the appearance similarity between superpixel r_i and the superpixels in the reliable background regions B as,

\omega_b(r_i) = \frac{1}{|B|} \sum_{r_b \in B} \exp\left(-\frac{dist_g^2(r_i, r_b)}{2\sigma^2}\right)   (10)

where |B| is the number of superpixels in the reliable background region set B, and dist_g(r_i, r_b) is the shortest average appearance distance between superpixel r_i and the superpixels of B.
(C) Smoothness potential
Overall smoothing of the saliency labeling is achieved using the smoothness potential, which penalizes neighboring superpixels that are assigned different saliency labels. It is expressed as,
\Theta(s_i, s_j) = \omega_{ij}(r_i, r_j) (s_i - s_j)^2   (11)

where

\omega_{ij}(r_i, r_j) = \exp\left(-\frac{dist_c^2(r_i, r_j)}{2\sigma^2}\right), \quad (i, j) \in \mathcal{N}   (12)
where the neighborhood set \mathcal{N} contains every pair of spatially adjacent superpixels within a frame, and dist_c(r_i, r_j) is the Euclidean distance between the color features of superpixels r_i and r_j in CIE-Lab color space; thus ω_{ij}(r_i, r_j) measures the appearance similarity between superpixels r_i and r_j.
(D) Reliable regions O and B
The background potential and the foreground potential are built upon the reliable background regions B and the reliable object regions O. The computation of the reliable regions B and O is described in this work, since salient object detection performance is largely determined by these regions.
In this system, superpixels are clustered into object-like regions K, and superpixels near the center of a cluster are more likely to belong to objects. The cluster intensity of r_i is defined as,
    
I(r_i) = \sum_{r_j \in K} \delta\left( \left\| V(r_i) - V(r_j) \right\| - d_c \right)   (13)

where d_c is a cutoff value; in the proposed method, it lies between 0.05 and 0.5 and the results are not sensitive to it. The delta function is expressed as,

\delta(x) = \begin{cases} 1 & \text{if } x < 0 \\ 0 & \text{otherwise} \end{cases}   (14)

The cluster intensity I(r_i) indicates how many neighbors r_j enclose superpixel r_i within the cutoff distance; a cluster center enclosed by many similar neighbors has a high object probability.
The cluster intensity I(r_i) is measured for selecting the background regions B and the reliable object regions O: r_i is selected as an object superpixel if I(r_i) is greater than the threshold h_o, and r_i is selected as a background superpixel if I(r_i) is smaller than the threshold h_b. The thresholds h_o and h_b are defined as,

h_o = t_o \cdot \max(I(r_i)), \quad r_i \in K   (15)

h_b = t_b \cdot \min(I(r_i)), \quad r_i \in K   (16)


where t_o is the spanning extent of the cluster intensities for object regions and t_b is the spanning extent of the cluster intensities for background regions.
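Putting Eqs. (13)–(16) together, a hedged sketch of how the reliable object regions O and background regions B could be selected from the object-like superpixels is shown below; the feature vectors V(r_i), the threshold factors and the interpretation of h_b as the background threshold are assumptions made for illustration.

import numpy as np

def select_reliable_regions(V, d_c=0.2, t_o=0.8, t_b=1.2):
    """Select reliable object (O) and background (B) superpixels from the
    object-like region set K using the cluster intensity of Eqs. (13)-(16).

    V : (K, F) feature vectors of the object-like superpixels
    """
    diff = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
    intensity = (diff < d_c).sum(axis=1)             # I(r_i): neighbors within d_c
    h_o = t_o * intensity.max()                       # Eq. (15)
    h_b = t_b * intensity.min()                       # Eq. (16)
    O = np.where(intensity > h_o)[0]                  # reliable object superpixels
    B = np.where(intensity < h_b)[0]                  # reliable background superpixels
    return O, B, intensity

rng = np.random.default_rng(1)
O, B, I = select_reliable_regions(rng.random((10, 3)))
print("object:", O, "background:", B)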

3.3 Salient Multiple Objects Detection

The relative salience of the detected object regions is considered in the proposed work for predicting the total count of salient objects. For saliency propagation based on the reliable object regions, an affinity matrix W_{oi} ∈ R^{K×N} is defined from the K superpixels r_o ∈ O to every one of the N superpixels r_i ∈ S, so that,
W_{oi} = [\ldots, \omega_{oi}(r_o, r_i), \ldots, \omega_{KN}(r_K, r_N)]   (17)

where

\omega_{oi}(r_o, r_i) = \exp\left(-\frac{dist_c^2(r_o, r_i)}{2\sigma^2}\right), \quad (r_o, r_i) \in \mathcal{N}   (18)

Likewise, an affinity matrix W_{bi} ∈ R^{M×N} is defined from the M superpixels r_b ∈ B of the reliable background regions to every superpixel r_i ∈ S. The degree matrix of W_{oi} is defined as D_{oi}, with diagonal elements d_{oo} = \sum_i W_{oi} (o = 1, 2, ..., K), and the degree matrix of W_{bi} is defined as D_{bi}, with diagonal elements d_{bb} = \sum_i W_{bi} (b = 1, 2, ..., M). In matrix form, the constraint function \Gamma(S) = k of expression (1) is defined using the ranking technique as,

\begin{bmatrix} \ldots, D_{oi} - \alpha W_{oi}, \ldots \\ \ldots, D_{bi} - \beta W_{bi}, \ldots \end{bmatrix}_{(K+M) \times N} \begin{bmatrix} s_1 \\ \vdots \\ s_i \\ \vdots \\ s_N \end{bmatrix}_{N \times 1}   (19)

= b \odot \begin{bmatrix} I(r_o) \\ I(r_b) \end{bmatrix}_{(K+M) \times 1}   (20)

where \odot denotes element-wise multiplication; [s_1, s_2, s_3, \ldots, s_N]^T is the solution vector, in which every element is a saliency label to be predicted; [I(r_o); I(r_b)] is the cluster intensity vector; and b is a weighting vector of dimension (K + M), in which every element is set to 0 for background and 1 for object based on the thresholds defined above. α and β are balance parameters, set to 0.99 in this experiment. Since the affinity matrices W_{oi} and W_{bi} are not square matrices, \Gamma(S) = k cannot be transformed into an additional potential appended to E(S).
The model presented in this work for multiple video salient object detection, the spatiotemporal particle swarm optimization model (SPSOM), is a ranking method in which local constraints and temporal and spatial cues are exploited to achieve global saliency optimization over multiple objects.
PSO is an evolutionary computation method motivated by the social behaviors of fish schooling and bird flocking. The basic principle of PSO is that every solution is represented by a particle in a swarm; in this work, particles correspond to superpixels. In the search space, every particle has its own position, represented by the vector x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), where D is the dimension of the search space.
To search for the optimal salient object in the search space, superpixels move with a velocity, indicated as v_i = (v_{i1}, v_{i2}, \ldots, v_{iD}). Every particle updates its velocity and position based on its own experience and the experience of its neighboring particles. In the proposed work, the objective is assumed to correspond to a distance between superpixels.
The best position a particle has reached so far is recorded and denoted pbest, and gbest denotes the best position achieved by the population so far. PSO searches for the optimum solution using gbest and pbest, where the position and velocity of every particle are updated based on the following expressions,
x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}   (21)

v_{id}^{t+1} = \omega \cdot v_{id}^{t} + c_1 r_1 \left( p_{id} - x_{id}^{t} \right) + c_2 r_2 \left( p_{gd} - x_{id}^{t} \right)   (22)

where t denotes the t-th iteration of the evolutionary process; d ∈ D denotes the d-th dimension of the search space; w is the inertia weight, which controls the impact of the previous velocity on the current velocity; c_1 and c_2 are acceleration constants; r_1 and r_2 are random values drawn from a uniform distribution between 0 and 1; and p_{id} and p_{gd} are the elements of pbest and gbest in the d-th dimension.
The velocity is limited by a predefined maximum velocity v_{max}, with v_{id}^{t+1} ∈ [−v_{max}, v_{max}]. A maximum number of iterations or a sufficiently good fitness value is used as the criterion for terminating the algorithm. Multiple salient objects are produced as the algorithm output.
Algorithm 1: Spatiotemporal particle swarm optimization model (SPSOM)
Input: Number of superpixels in the reliable regions.
Output: Salient object detection.
1. Initialize the superpixels (i = 1, ..., N) as particles
2. Initialize random positions and velocities
3. While the stopping condition is not reached
4. do
5.   For each superpixel i
6.     Evaluate the distance (objective function)
7.     If the best fitness value in history (pBest) is worse than the current fitness value
8.       Set the current value as the new pBest
9.     end if
10.    If gBest is worse than pBest
11.      Set gBest = pBest
12.    end if
13.    Update position and velocity using Eqs. (21) and (22)
14. end do
15. Return the detected multiple objects
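A compact, generic PSO loop corresponding to Eqs. (21)–(22) and Algorithm 1 is sketched below; the fitness function (standing in for the superpixel-distance objective) is supplied by the caller, and all numeric settings are illustrative assumptions rather than values prescribed by the paper.

import numpy as np

def pso(fitness, n_particles=30, dim=2, iters=100,
        w=0.7, c1=1.5, c2=1.5, v_max=0.5, seed=0):
    """Minimize `fitness` with the velocity/position updates of Eqs. (21)-(22)."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, dim))                 # positions
    v = np.zeros((n_particles, dim))                   # velocities
    pbest, pbest_val = x.copy(), np.array([fitness(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()               # gbest

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)    # Eq. (22)
        v = np.clip(v, -v_max, v_max)
        x = x + v                                                 # Eq. (21)
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# toy objective: distance to a fixed point, standing in for the superpixel distance
best, best_val = pso(lambda p: np.sum((p - 0.3) ** 2))
print(best, best_val)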

4 Experimental Results

The dataset used for evaluation is presented in this section, together with the parameters used for evaluating salient object detection performance. Three benchmark datasets are used in the experimentation, including the commonly used Freiburg-Berkeley Motion Segmentation (FBMS) dataset, collected from https://lmb.informatik.uni-freiburg.de/resources/datasets/moseg.en.html. In FBMS, drastic camera movement is involved in various videos, and these movements introduce large motion noise when extracting motion features. The testing and training sets are formed by splitting the FBMS dataset randomly. Figure 6 shows the input images. The mean absolute error (MAE) performance of the proposed SPSOM with IDCNN method is compared with the existing DSS and SCOM approaches, as shown in Fig. 7.

4.1 Evaluation Metrics

To detect multiple salient objects, the system uses several standard metrics for measuring performance, namely precision–recall (PR) curves, accuracy, average run time and mean absolute error (MAE). The deeply supervised salient object detection (DSS), SCOM and SPSOM with IDCNN approaches are compared. The performance of the proposed and existing methods is reported in Table 1.
Mean Absolute Error (MAE)
The mean absolute error is the average of the absolute errors |e_i| = |y_i − x_i|, where y_i is the prediction and x_i the true value. The mean absolute error is given by

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|   (23)

Fig. 6 Input images
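For evaluation, MAE is simply averaged over all pixels of all frames; a minimal sketch, assuming the predicted saliency maps and ground-truth masks are arrays with values in [0, 1], is given below.

import numpy as np

def mae(pred_maps, gt_masks):
    """Mean absolute error over all pixels of all frames.
    pred_maps, gt_masks: arrays of shape (num_frames, H, W) with values in [0, 1]."""
    return np.abs(np.asarray(pred_maps) - np.asarray(gt_masks)).mean()

# toy example with two 2x2 frames
pred = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.0, 0.4], [0.7, 1.0]]])
gt   = np.array([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]])
print(round(float(mae(pred, gt)), 3))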
The performance of the proposed SPSOM with IDCNN method is compared with the existing DSS and SCOM approaches in terms of mean absolute error (MAE); the methods are shown on the x-axis and MAE on the y-axis. In the proposed work, the incremental deep convolutional neural network (IDCNN) classifier is proposed for measuring success in a relative object saliency landscape.


Fig. 7 Mean absolute error (MAE) comparison

Table 1 Performance comparison

Methods              MAE (%)   Accuracy (%)   Average run time (s)
DSS                  0.07      80             0.432
SCOM                 0.069     87             37.5
SPSOM with IDCNN     0.04      91             0.28

This reduces the mean absolute error. From the experimental results, it is concluded that the proposed SPSOM with IDCNN approach achieves 0.04%, while the other methods, DSS and SCOM, attain 0.07% and 0.069%, respectively.
Figure 8 shows the accuracy of the proposed SPSOM with IDCNN approach and of the existing DSS and SCOM approaches; the methods are shown on the x-axis and accuracy on the y-axis. In the proposed work, the spatiotemporal particle swarm optimization model (SPSOM) is introduced to achieve global saliency optimization over multiple objects, with the distance between superpixels as the objective function. Due to this optimization, the accuracy of the proposed system is improved. From the graph, it can be concluded that the proposed system achieves 91% accuracy, while the other methods, DSS and SCOM, attain 80% and 87%, respectively.
Figure 9 shows the PR curves of the proposed SPSOM with IDCNN approach and of the existing DSS and SCOM approaches, with recall on the x-axis and precision on the y-axis. The results show that the proposed SPSOM with IDCNN approach achieves better performance than the existing approaches. The average run time of the proposed SPSOM with IDCNN approach is also compared with that of the existing DSS and SCOM approaches, with the methods on the x-axis and the average run time on the y-axis.


Fig. 8 Accuracy comparison


Fig. 9 PR curves

From Fig. 10, it is concluded that the proposed SPSOM with IDCNN method achieves 0.28 s, while the other methods, DSS and SCOM, attain 0.432 s and 37.5 s, respectively. Overall, the proposed SPSOM with IDCNN approach achieves better performance than the existing approaches.


Fig. 10 Average run time (seconds per frame)

5 Conclusion

The proposed system designed a spatiotemporal particle swarm optimization model (SPSOM) with an incremental deep convolutional neural network (IDCNN) classifier to detect multiple salient objects in a video. In this work, a deep learning algorithm is designed to separate the foreground and background regions. Some reliable regions of salient objects are detected using an objectness measure, which is used for extracting the constraint and modeling the foreground and background energy potentials that support saliency propagation. Then the proposed spatiotemporal particle swarm optimization framework is introduced to generate an optimal saliency map for every frame of a video. Experimental results show that the proposed system produces better performance compared with the existing DSS and SCOM approaches in terms of average run time, MAE, accuracy and PR. In future work, the objectness measure will be used to extract the motions of salient objects from changing and static backgrounds in a frame.

References

1. Wang W, Shen J, Porikli F (2015) Saliency-aware geodesic video object segmentation. In: IEEE CVPR, pp 3395–3402
2. Xu N, Yang L, Fan Y, Yang J, Yue D, Liang Y, Price B, Cohen S, Huang T (2018) Youtube-vos:
Sequence-to-sequence video object segmentation. In: ECCV, pp 585–601
3. Pan Y, Yao T, Li H, Mei T (2017) Video captioning with transferred semantic attributes. In:
CVPR, pp 6504–6512
4. Guo C, Zhang L (2010) A novel multiresolution spatiotemporal saliency detection model and
its applications in image and video compression. IEEE TIP 19(1):185–198
5. Zhang Z, Fidler S, Urtasun R (2016) Instance-level segmentation for autonomous driving with deep densely connected MRFs. In: IEEE CVPR, pp 669–677

6. Srivatsa RS, Babu RV (2015) Salient object detection via objectness measure. In: 2015 IEEE
international conference on image processing (ICIP), pp 4481–4485, IEEE
7. Wang W, Shen J, Porikli F (2015) Saliency-aware geodesic video object segmentation. In:
Proceedings of the conference on computer vision and pattern recognition, pp 3395–3402
8. Yang J, Zhao G, Yuan J, Shen X, Lin Z, Price B, Brandt J (2016) Discovering primary objects
in videos by saliency fusion and iterative appearance estimation. IEEE Trans Cir Syst Video
Technol 26(6):1070–1083
9. Chen T, Lin L, Liu L, Luo X, Li X (2016) DISC: deep image saliency computing via progressive
representation learning. IEEE TNNLS
10. Li X, Zhao L, Wei L, Yang MH, Wu F, Zhuang Y, Ling H, Wang J (2015) DeepSaliency: multi-task deep neural network model for salient object detection. arXiv preprint arXiv:1510.05484
11. Wang L, Lu H, Ruan X, Yang MH (2015) Deep networks for saliency detection via local
estimation and global search. In: Proceedings of the IEEE conference on computer vision and
pattern recognition, pp 3183–3192
12. Zheng S, Jayasumana S, Romera Paredes B, Vineet V, Su Z, Du D, Huang C, Torr P (2015)
Conditional random fields as recurrent neural networks. In: ICCV
13. Le TN, Sugimoto A (2018) Video salient object detection using spatiotemporal deep features.
IEEE Trans Image Process 27(10):5002–5015
14. Chen Y, Zou W, Tang Y, Li X, Xu C, Komodakis N (2018) SCOM: Spatiotemporal constrained
optimization for salient object detection. IEEE Trans Image Process 27(7):3345–3357
15. Qi Q, Zhao S, Zhao W, Lei Z, Shen J, Zhang L, Pang Y (2019) High-speed video salient object
detection with temporal propagation using correlation filter. Neurocomputing 356:107–118
16. Wu T, Liu Z, Zhou X, Li K (2018) Spatiotemporal salient object detection by integrating with
objectness. Multimedia Tools Appl 77(15):19481–19498
17. Dakhia A, Wang T, Lu H (2019) A hybrid-backward refinement model for salient object
detection. Neurocomputing 358:72–80
18. Wang W, Shen J, Shao L (2017) Video salient object detection via fully convolutional networks.
IEEE Trans Image Process 27(1):38–49
Election Tweets Prediction Using
Enhanced Cart and Random Forest

Ambati Jahnavi, B. Dushyanth Reddy, Madhuri Kommineni,


Anandakumar Haldorai, and Bhavani Vasantha

Abstract In this digital era, the framework and working process of elections and other political activities are becoming increasingly complex due to various factors such as the number of parties, their policies and, most notably, mixed public opinion. The advent of social media has made it possible to converse and discuss with a wide audience across the globe, and a single tweet or post can gain a sheer amount of attention. Recent advances in the area of deep learning have contributed to many different verticals. Techniques such as long short-term memory (LSTM) networks perform sentiment analysis of posts, which can be used to determine the overall mixed reviews of the population towards a political party or person. Several experiments have shown how to loosely forecast public sentiment in national elections by examining consumer behaviour on blogging sites and online social networks. This paper proposes a machine learning model to predict the chances of winning the upcoming election based on the views of common people or supporters on social media, where users share opinions or suggestions about the group of their choice or the opposing group. Text posts about the election and political campaigns are collected, and machine learning models are then developed to predict the outcome.

Keywords Sentiment analysis · Decision tree · Random forest and logistic


regression

A. Jahnavi · B. Dushyanth Reddy · M. Kommineni · B. Vasantha


Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Vaddeswaram, AP, India
e-mail: madhuri.cbit@gmail.com
A. Haldorai (B)
Department of Computer Science and Engineering, Sri Eshwar College of Engineering,
Coimbatore, Tamil Nadu, India
e-mail: anandakumar.psgtech@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 851
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_61

1 Introduction

Online platforms have become an enormous channel for individuals to communicate their preferences. Using various assessment techniques, the ultimate intent of people can be found, for example, by analysing whether the content of a post is positive, negative or neutral. Opinion evaluation is always noteworthy for an organization that wants to hear its clients' views on its products, for predicting the outcomes of elections or for drawing conclusions from film reviews. The data obtained from opinion evaluation is helpful for predicting future choices. Rather than considering individual terms alone, the relation between sets of words is also considered. When determining the general sentiment, the polarity of each word is determined and combined using a function. A bag-of-words representation also ignores word order, which means that phrases with negation may be classified erroneously. In the past decades, there has been a massive growth in the use of micro-blogging platforms such as Twitter. Spurred by that growth, companies and media organizations are increasingly searching for ways to analyse what people think about their products and services on social platforms like Twitter [1]. Companies such as Twitratr, tweetfeel and social mention are just a few of those that advertise tweet sentiment analysis as one of their services [2].
Although a significant proportion of work has been performed on how emotions are expressed in forms such as academic studies and news reports, significantly less study has been done on micro-blogs [3]. Features such as automatic part-of-speech tags and resources such as sentiment lexicons have demonstrated their usefulness for sentiment analysis in other domains, but will they also prove useful for sentiment evaluation on Twitter? This paper begins to analyse this question [4].

2 Literature Survey

Notwithstanding the character limits on tweets, determining the sentiment of Twitter messages is essentially similar to sentence-level sentiment evaluation; however, the informal and specialized language used in tweets, as well as the very nature of the micro-blogging domain, allows Twitter sentiment analysis to extend beyond expectation [5]. It is an open question how well the features and procedures used on more well-formed data will transfer to the micro-blogging domain [6].
Ref. [7] It involves measures such as data collection, pre-processing of documents,
sensitivity identification, and classification of emotions, training and model testing.
This research subject has grown over the last decade with the output of models hitting
approximately 85–90% [8].
Ref. [9] Firstly, in this paper, they have presented the method of sentiment analysis
to identify the highly unstructured data on Twitter. Second, they discussed various

techniques in detail for carrying out an examination of the sentiments on Twitter


information [10].
Ref. [11] They suggested a novel approach in this paper: hybrid topic-based
sentiment analysis (HTBSA) for the task of predicting election by using tweets.
Ref. [12] Using two separate versions of SentiWordNet and evaluating regres-
sion and classification models across tasks and datasets, it offers a new state-of-
the-art method for sentiment analysis while computing the prior polarity of terms.
The research investigation is concluded by finding the interesting differences in
the measured prior polarity scores when considering the word part of speech and
annotator gender [13].
Ref. [14] This paper proposed a novel hybrid classification algorithm in this
paper that explains the conventional method of predictive sentiment analysis. They
also integrated the qualitative analysis along with data mining techniques to make
sentiment analysis method more descriptive [15].
Ref. [16] This research work chose to use two automated classification learning
methods in this paper: support vector machines (SVM) and random forest for
incorporating a novel hybrid approach to classify the Amazon’s product reviews.
Ref. [17] Here, the proposed research work aims to build a hybrid senti-
ment classification model that explores the basic features of the tweet and uses
the domain-independent and domain-related lexicons to provide a more domain-
oriented approach for analysing and extracting consumer sentiment towards popular
smartphone brands in recent years.

3 Methodology

The following figure shows the steps followed in the proposed model (Fig. 1).
Decision Tree
As the implementation of machine learning algorithms is mainly intended to solve problems at the industry level, the need for more complex and iterative algorithms is becoming an increasing requirement. The decision tree algorithm is one such algorithm, used to solve both regression and classification problems.
The decision tree is considered one of the most useful algorithms in machine learning because it can be used to solve many challenges. Here are a few reasons why a decision tree should be used:
(1) It is considered the most comprehensible machine learning algorithm and can easily be interpreted.
(2) This can be used for problems with classification and regression.
(3) It deals better with nonlinear data as opposed to most machine learning
algorithms.
(4) Building a decision tree is a very quick process since it uses only one function
per node to divide the data.

[Flow chart: fetching the raw data → pre-processing the retrieved data → implementing the algorithms → retaining the accuracies from the algorithms applied]

Fig. 1 Flow chart of the proposed work

Recursive partitioning is an important instrument in data mining. It lets us explore the structure of a collection of data while producing decision rules that are easy to visualize for predicting a categorical (classification tree) or continuous (regression tree) outcome. This section explains the modelling of CART and conditional inference trees (Fig. 2).
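As a hedged illustration of CART-style recursive partitioning on tweet features, the sketch below trains a scikit-learn decision tree on a tiny bag-of-words matrix; the feature words mirror those shown in Fig. 2, but the tweets and labels are invented for the example and are not part of the paper's dataset.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

tweets = [
    "great win for the party", "what a freak show this election is",
    "i hate this campaign", "wtf is this manifesto", "proud to vote today",
]
labels = ["pos", "neg", "neg", "neg", "pos"]        # invented sentiment labels

vec = CountVectorizer()
X = vec.fit_transform(tweets)                        # bag-of-words features
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, labels)

print(tree.predict(vec.transform(["freak result, i hate it"])))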

Fig. 2 Sample tree that appears after the implementation of the CART algorithm
[Decision tree sketch: the root splits on freak < 0.5; subsequent splits are on hate < 0.5 and wtf < 0.5, with leaves labelled True/False]

Random Forest
The random forest algorithm works by aggregating the predictions from multiple decision trees of different depths. Each decision tree in the forest is trained on a subset of the dataset called the bootstrapped dataset.
The portion of samples left out when constructing each decision tree in the forest is referred to as the out-of-bag (OOB) dataset. As observed later, the model can automatically estimate its own performance by running each of the samples in the OOB dataset through the trees of the forest that did not see them.
Recall that, when deciding on the criterion with which to split a decision tree, an impurity measurement is generated for each feature using the Gini index or entropy. In a random forest, however, only a predefined number of randomly chosen features are considered as candidates at each split. This results in a greater difference between trees that would otherwise have the same characteristics.
If the random forest is used for classification and a new sample is provided, the final prediction is made by taking the majority of the predictions produced by each individual decision tree in the forest. If it is used for regression and a new sample is provided, the final prediction is made by taking the average of the predictions produced by each individual decision tree in the forest.
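The same idea extends directly to a forest; the sketch below uses scikit-learn's RandomForestClassifier with bootstrapping and the out-of-bag estimate described above, again on invented tweet data used only for illustration.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "great win for the party", "what a freak show", "i hate this campaign",
    "wtf is this manifesto", "proud to vote today", "good policies announced",
]
labels = ["pos", "neg", "neg", "neg", "pos", "pos"]   # invented labels

X = CountVectorizer().fit_transform(tweets)
forest = RandomForestClassifier(
    n_estimators=100,      # number of bootstrapped decision trees
    max_features="sqrt",   # random feature candidates at each split
    oob_score=True,        # evaluate on the out-of-bag samples
    random_state=0,
)
forest.fit(X, labels)
print("OOB accuracy:", forest.oob_score_)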
Logistic Regression
Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist. Logistic regression estimates the parameters of a logistic model in regression analysis.
Mathematically, a binary logistic model has a dependent variable with two possible values, for example pass/fail, which is represented by an indicator variable labelled "0" and "1". In the logistic model, the log-odds (the logarithm of the odds) of the value labelled "1" is a linear combination of one or more independent variables ("predictors"); the independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value). The corresponding probability of the value labelled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"); the function that converts log-odds to probability is the logistic function, hence the name.
The unit of measurement for the log-odds scale is called a logit, hence the alternative names. Closely related models with a different sigmoid function instead of the logistic function, such as the probit model, can be used in the same way; the characteristic interpretation of the logistic model is that increasing one of the independent variables multiplies the odds of the outcome at a constant rate, with each independent variable having its own parameter; this generalizes the odds ratio for a binary dependent variable.
The binary logistic regression model has extensions to dependent variables with more than two levels: categorical outputs with multiple values are modelled by multinomial logistic regression, and, if the multiple classes are ordered, by ordinal logistic regression, for example the proportional odds ordinal logistic model.

Fig. 3 Classification of the retrieved data using logistic regression, represented in the form of a plot

The model itself fundamentally models the probability of the output given the input and does not perform statistical classification (it is not a classifier by itself); however, it can be used to make a classifier, for instance by picking a cutoff value and classifying inputs with a probability greater than the cutoff as one class and those below the cutoff as the other; this is a common way to make a binary classifier. Unlike linear least squares, the coefficients are generally not computed by a closed-form expression (Fig. 3).
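For completeness, the following sketch fits a scikit-learn logistic regression on TF-IDF features of invented tweet data and turns the predicted probability into a class with a 0.5 cutoff, matching the description above; the tweets, labels and cutoff are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = [
    "great win for the party", "what a freak show", "i hate this campaign",
    "wtf is this manifesto", "proud to vote today", "good policies announced",
]
labels = [1, 0, 0, 0, 1, 1]                      # invented: 1 = positive, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(tweets)
clf = LogisticRegression(max_iter=1000).fit(X, labels)

proba = clf.predict_proba(vec.transform(["i hate this freak party"]))[:, 1]
print("P(positive) =", proba[0], "-> class", int(proba[0] > 0.5))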

4 Performance and Result Analysis

Considering that election outcomes are very difficult to predict using other methods,
including public opinion polls, and with social media such as Facebook and Twitter
increasingly prevalent, the authors chose to use Twitter’s sentiment analysis to
forecast Indian general election results (Table 1).

Table 1 Accuracies of algorithms

Algorithm              Accuracy of tweets (Neg)   Accuracy of tweets (Pos)
Cart                   0.8789                     0.9437
Random forest          0.9155                     0.9493
Logistic regression    0.8845                     0.9268

From the above table, the overall accuracy obtained using the CART algorithm is 91.13%, the random forest algorithm 93.24% and logistic regression 90.56%. From these results, it is observed that the random forest algorithm works best on the election tweets data.

5 Conclusion

In this research work, there are several prospective directions for expanding the dataset size, including collecting data from other major social networking sites in addition to Twitter. There is also dedicated research space to work with the training dataset by considering a model dataset that already specifies a certain number of algorithmic features. The major downside of this research work is that it fails to recognize the significant parameter of emotion when defining the polarity of a tweet. Since the data was labelled manually, the volume was not high enough for more precise information, so more tweets can be obtained and labelled. As a continuation of this research work, the network size will be increased.

References

1. Malika M, Habiba S, Agarwal P (2018) A novel approach to web-based review analysis using
opinion mining. In: International Conference on Computational Intelligence and Data Science
(ICCIDS 2018) , Department of Computer Science and Engineering, Jamia Hamdard, New
Delhi-110062, India
2. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau RJ (2011) Sentiment analysis of Twitter
data
3. Popularity analysis for Saudi telecom companies based on Twitter Data. Res J Appl Sci Eng
Technol (2013)
4. Liu B (2012) Sentiment analysis and opinion mining, Morgan & Claypool Publishers
5. Joshi S, Deshpande D (2018) Twitter sentiment analysis system. Department of Information
Technology, Vishwakarma Institute of Technology Pune, Maharashtra, India
6. Agarwal A, Xie B, Vovsha I, Rambow O, Passonneau RJ (2011) Sentiment analysis of Twitter
data. Department of Computer Science Columbia University New York, NY 10027 USA
7. Gupta B, Negi M, Vishwakarma K, Rawat G, Badhani P (2017) Study of Twitter sentiment
analysis using machine learning algorithms
8. Umadevi V (2014) Sentiment analysis using weka. IJETT Int J Eng Trends Technol 8(4):181–
183
9. Techniques for sentiment analysis of Twitter data: a comprehensive survey. In: 2016
International Conference on Computing, Communication and Automation (ICCCA)
10. Caetano JA, Lima HS, Santos MF, Marques-Neto HT (2018) Using sentiment analysis to define
twitter political users’ classes and their homophily during the 2016 American Presidential
election
11. Bansala B, Srivastavaa S (2019) On predicting elections with hybrid topic based sentiment
analysis of tweets. Department of Applied Sciences, The NorthCap University, Gurugram,
India

12. Guerini M, Gatti L, Turchi M (2013) Sentiment analysis: how to derive prior polarities from
SentiWordNet
13. Barahate SR, Shelake VM (2012) A survey and future vision of data mining in educational field.
In: Proceedings 2nd International Conference on Advanced Computing and Communication
Technology, pp 96–100
14. Chen X, Vorvoreanu M, Madhavan K (2014) Mining social media data to understand students’
learning experiences. IEEE Trans 7(3):246–259
15. Twitter data sentiment analysis and visualization.Int J Comput Appl 180(20)
16. Al Amrani Y, Lazaar M, Kadiri KE (2018) Random forest and support vector machine based
hybrid approach to sentiment analysis. Author links open overlay panel
17. Venugopalan M, Gupta D (2016) Exploring sentiment analysis on Twitter data, IEEE 2015
Flexible Language-Agnostic Framework to Emit Informative Compile-Time Error Messages

Malathy Nagalakshmi, Tanya Sharma, and N. S. Kumar

Abstract It is observed that programmers encounter cryptic compiler error messages that are difficult to comprehend and hence inconvenient to resolve. Program-
mers spend a lot of their time in syntax error corrections. There is no universal tool
that emits helpful descriptive error messages for all languages. In this paper, we pro-
pose a language-agnostic framework that detects syntax errors and produces error
messages that are easy to comprehend. It allows domain experts to specify error mes-
sages for each type of syntax error. We use a novel approach wherein our framework
provides the functionality of customizing error messages by providing a template for
specifying the same. This makes it easier for novice programmers to understand the
cause of compile-time errors in any programming language. Our results confirm that
programmers prefer the enhanced error messages emitted by our framework over the
standard error messages.

Keywords Language-agnostic framework · PLY · Customization of error messages · Program generating a program · Programming languages

1 Introduction and Related Work

Syntax errors constitute one of the largest classes of programming errors. Program-
mers spend a significant amount of time in correcting them [1]. Dozens of syntax
error-detecting environments like Eclipse [2] and IntelliJ [3] exist in the program-
ming world. But, there is no universal environment available for all languages. As
programmers usually use different languages based on the problem that they are
solving, they would require a language-agnostic framework to detect errors for all
the programming languages that they use.
In addition to this, popular programming languages such as Python and C [4]
produce unhelpful error messages [5–7] that provide little or no help to program-
mers in identifying these errors. For instance, "Syntax Error: invalid syntax", "error:
expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘{’ token" are common syntax
errors seen which provide minimal assistance to programmers [8].
To address these issues, we developed a language-independent framework that
provides a template to specify user-friendly error messages. A domain expert can
customize the default syntax error messages to any brief or elaborate error message
based on the use-case. A standard compiler/interpreter environment does not offer
this feature. For example, the error messages can be customized to be more elaborate
for novice users [9]. For a competitive coding environment, the strictness of syntax
error detection can be lenient. This would ensure the code passes any test case if
the logic is right without considering incorrect syntax. This sort of flexibility is not
available in any other framework for all languages.
Existing work to detect syntax errors uses machine learning techniques or abstract
syntax trees. Sahil et al. [10] use a neural network trained over a dataset that consists of
assignments submitted by students. This method requires high computational power
and an average training period of 1.5 hours. The performance of the neural network is
dataset dependent. On the other hand, using an abstract syntax tree [11] can handle
only one syntax error at a time. The first error detected has to be rectified before
the system can detect other errors. It is not ideal for a large program that can have
multiple syntax errors at once. The programmer needs to rectify one error at a time
which is time-consuming.
Due to the above-mentioned limitations, we consider an alternative approach
based on compiler theory. Our framework identifies the compile-time syntax errors
in a user given program based on the grammar provided for a language.
The grammar consists of error productions for every syntax error. Our framework
provides the flexibility to supply error productions for any programming language. A
PLY [12] program (pure-Python implementation of the popular compiler construction
tools LEX and YACC) is generated using these error productions. We adopt a novel
approach where we have a Python code that automatically generates the PLY code.
The PLY program is a transformation of the error production rules to a suitable format.
The program generated PLY program is then run to give better diagnostics of the
code that is uploaded by the user. The error productions provided and the uploaded
program must be of the same programming language. As a proof of concept, we
have written error productions in Python and C to help identify syntax errors in both
these languages. Any future use of this framework would require the domain expert
to provide the error productions for the language in the required format.
Our framework has the following features:
1. Our framework allows customization of syntax error messages that enable more
instructive error messages to be emitted compared to standard compiler/inter-
preter.

2. The PLY program used to identify and correct the syntax errors is automatically
generated using another program.
3. Our framework converts a language-specific input to a language-agnostic PLY
program making it language independent.
4. All the syntax errors in a program are detected at once making it easy for under-
standing compile-time errors.
The rest of this paper is structured as follows. We describe the architecture of our
framework in Sect. 2. Section 3 is devoted to the results obtained. Lastly, we present
our conclusions in Sect. 4.

2 Language-Agnostic Tool to Detect Syntax Errors

At a high level, our approach involves making use of a lexical analyzer (LEX) and yet another compiler-compiler (YACC) to parse the user-provided input code. The following sections describe how error productions are converted into a suitable form, as required by LEX and YACC, to detect syntax errors.

2.1 Architecture

Our framework can be divided into the following components as shown in Fig. 1.

2.1.1 Error Productions

Each error production consists of a non-terminal called the left-hand side of the pro-
duction, an arrow, and a sequence of tokens and/or terminals called the right-hand
side of the production. This is followed by a customized error message separated
from the error production using a delimiter (‘‘). The customized error messages pro-
vide a clear description of the syntax errors. One of the non-terminals is designated
as the start symbol from where the production begins. One such error production in
Python to identify missing colon after a leader is as follows:

Fig. 1 Framework architecture (the error productions are the input from which the PLY program is generated; the generated PLY program then takes the user program as input and produces the output program)



Program -> DEF SPACE ID ( Parameter_List ) Funcbody ‘‘ Missing Colon

The following is an error production in C to identify missing parentheses following the function name:

Program -> Type SPACE ID { FuncBody } ‘‘ Missing ()

2.1.2 PLY

Once we have the error productions, we generate a PLY [12] program. PLY is a pure-
Python implementation of the compiler construction tools LEX and YACC. PLY uses
LALR(1) parsing. It provides input validation, error reporting, and diagnostics.
The LEX.py module is used to break input text into tokens specified by a collection
of regular expression rules. Some of the tokens defined in LEX are : NUMBER, ID,
WHILE, FOR, DEF, IN, RANGE, IF. The following assigns one or more digits to
the token NUMBER:
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

YACC.py is a LALR parser used to recognize syntax of a language that has been
specified in the form of a context free grammar.
expr : expr '+' expr { $$ = node('+', $1, $3); }

The input to YACC is a grammar with snippets of code (called “actions”) attached
to its rules. YACC is complemented by LEX. An external interface is provided by
LEX.py in the form of a token() function which returns the next valid token on the
input stream. This is repeatedly called by YACC.py to retrieve tokens and invoke
grammar rules such as:
def p_1(p):
    "Program : PRINT '(' NUMBER ')' "

The output of YACC.py is used to implement simple one-pass compilers.
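To make the LEX/YACC workflow concrete, the following minimal, self-contained PLY sketch (illustrative only, not the paper's actual implementation) defines a few tokens and a toy grammar containing one correct production and one error production that emits a customized message; the token names and the example grammar are assumptions introduced only for this sketch.

# Minimal PLY sketch: one correct production plus one error production
# that reports a customized error message. Illustrative only.
import ply.lex as lex
import ply.yacc as yacc

tokens = ('PRINT', 'NUMBER', 'LPAREN', 'RPAREN')

t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'

def t_PRINT(t):
    r'print'
    return t

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

def t_error(t):
    # Skip characters the lexer does not recognize
    t.lexer.skip(1)

def p_statement_correct(p):
    "statement : PRINT LPAREN NUMBER RPAREN"
    p[0] = 'ok'

def p_statement_missing_lparen(p):
    "statement : PRINT NUMBER RPAREN"
    # Customized message supplied by the domain expert
    print("Syntax Error: Missing left parenthesis")

def p_error(p):
    print("Syntax Error: Invalid syntax")

lexer = lex.lex()
parser = yacc.yacc()
parser.parse('print 42)')   # matches the error production and prints the message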

2.1.3 PLY Program Generator

We use a novel approach, wherein a Python program is used to automatically generate the PLY program. The Python program reads the error productions provided in a file.
Each production rule is then converted into a PLY function:
def p_3(p):
    "Program : DEF SPACE ID '(' Parameter_List ')' ':' Funcbody Program"
    print(action[2])

Here, action[2] corresponds to the customized error message mentioned in the file.

The Python program takes grammar of any programming language and generates
the language-specific PLY code. This automatic generation of PLY program makes
the tool language independent.
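A minimal sketch of what such a generator could look like is given below. It assumes the error-production format used in the examples above, with '->' between the left- and right-hand sides and a pair of backquotes before the customized message; the function name, file name, and emitted function naming scheme are illustrative assumptions rather than the paper's actual code.

# Sketch of a PLY-code generator: reads error productions of the form
#   LHS -> RHS `` customized error message
# and emits one PLY grammar function per rule. Illustrative only.
def generate_ply_functions(production_file, delimiter='``'):
    functions = []
    with open(production_file) as f:
        for i, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            rule_part, _, message = line.partition(delimiter)
            lhs, _, rhs = rule_part.partition('->')
            functions.append(
                'def p_{n}(p):\n'
                '    "{lhs} : {rhs}"\n'
                '    print({msg!r})\n'.format(
                    n=i, lhs=lhs.strip(), rhs=rhs.strip(), msg=message.strip()))
    return '\n'.join(functions)

# Example use: write the generated grammar functions into a PLY program file;
# the lexer and parser boilerplate would be emitted around them.
# open('generated_parser.py', 'w').write(generate_ply_functions('errors.txt'))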

2.1.4 User Program

This is the code that is uploaded by the user. The automatically generated PLY
program identifies the compile-time errors in this code.

2.1.5 Program Generated PLY Program

The program generated PLY program takes the code that is uploaded by the user as
input and provides descriptive error messages for the compile-time errors. This PLY
program is automatically generated and no changes are made to this while detecting
syntax errors for different languages.
Some of the syntax errors which were tackled are:
– Missing or extra parenthesis
– Missing colon after a leader
– Indentation errors
– Missing, misspelled, or misused keywords
– Mismatched quotes
– Misuse of the assignment operator (=).
One of the challenges is identifying the various types of errors possible. For
example, consider the following code snippet:

print("Hello World")

Some of the different possible syntax errors for the above code snippet are:
# Missing left parenthesis
print "Hello World")

# Missing right parenthesis
print ("Hello World"

# Extra parenthesis on the left
print (("Hello World")

# Misspelled keyword
prnt ("Hello World")

# Mismatched quote
print ('Hello World")

All possibilities have to be handled during syntax error detection.
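For illustration only, the error productions that a domain expert might supply for the first three of these cases, in the template format of Sect. 2.1.1, could look as follows; the token names PRINT and STRING are assumptions for this sketch, and lexical errors such as misspelled keywords and mismatched quotes would instead be handled at the token level.

Program -> PRINT STRING ) ‘‘ Missing left parenthesis
Program -> PRINT ( STRING ‘‘ Missing right parenthesis
Program -> PRINT (( STRING ) ‘‘ Extra parenthesis on the left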



Fig. 2 User interface

2.2 Web Interface

We have developed a Web interface that has the following features:


1. Write Code In: Allows the user to select the programming language of the code
that is to be uploaded.
2. Upload: Allows the user to upload a program in the programming language selected in Write Code In, as shown in Fig. 2a.
3. Compile: Highlights the errors in the user uploaded program if there are any.
When the user hovers over the highlighted area, the error message will be dis-
played as shown in Fig. 2b, c.
4. Run: Allows the user to run the corrected code on the standard compiler or
interpreter environment as shown in Fig. 2d.

3 Results

We conducted a survey with 100 engineering students who had basic knowledge of
at least one programming language. As shown in Fig. 3a, students preferred a more
elaborate error message (error message 2) emitted by our framework over standard

Fig. 3 Survey

compiler/interpreter error messages. According to the survey, students also preferred working with a language-independent framework over a language-specific compiler/interpreter like Python IDLE (Fig. 3b).
Keeping the existing work in mind, we were able to develop a framework that allows programmers to easily detect and rectify errors in their code.

3.1 Descriptive Error Messages

Unlike the standard compiler/interpreter environment, our framework emits non-cryptic and descriptive error messages to novice programmers, as specified by the
domain expert, making it easier for them to understand the reason for the error and
hence help them correct the same. Table 1 shows the comparison of error messages
produced by our framework and the standard error messages produced by Python.

Table 1 Result analysis

Code snippet   Python IDLE                    Framework output
               Syntax error: Invalid syntax   Syntax error: Missing colon at the end of line 1
               Syntax error: Invalid syntax   Syntax error at ')'. Extra right parenthesis at line 1
               Syntax error: Invalid syntax   Invalid keyword at line 4
               Syntax error: Invalid syntax   Mismatched quotes at line 12
               Syntax error: Invalid syntax   '++' is invalid at line 2
               Syntax error: Invalid syntax   Required indentation at line 2

3.2 Syntax Errors Detection

Unlike the existing tools, our framework is able to detect all compile-time syntax
errors at once even for an interpreter environment. Consider the following code
snippet:
1  # check whether a given string is a palindrome or not
2  def isPalindrome(str)
3      for i in range(0, int(len(str)/2)):
4          if str[i] != str[len(str)-i-1]:
5              return False
6      return True
7  s = "malayalam"
8  ans = isPalindrome(s)
9  else:
10     print("No")

Python IDLE output: Syntax Error: Invalid Syntax

Framework output: Syntax Error: Missing colon at the end of line 2.
                  Syntax Error: No matching 'if' for 'else' block at line 9.

3.3 Novel Approach

Grammars are used to describe the syntax of a programming language, and hence
any syntactically correct program can be written using its production rules. Our
framework identifies the compile-time syntax errors in a user given program using
a novel approach. This involves modifying the grammar provided for a language to
contain error production rules to detect possible syntax errors. Our approach has not
been used before. We use a Python program to generate the PLY program that detects
syntax errors.

3.4 Language-Independent Framework

Our framework is language independent and no code changes are required while
working with different programming languages. This makes our framework flexible.

4 Conclusion and Future Scope

In this paper, we present a language-independent framework that provides a template for customizing error messages. Our framework uses a novel approach, wherein an
automatically generated PLY program is used to detect and describe syntax errors.
We have tested the tool by using error productions for Python and C. The core of

the framework remains the same irrespective of the programming language chosen.
The only requirement is that the error productions provided and the uploaded user
program are of the same language.
The following describes the improvements and further steps that could be taken
with this framework.

4.1 Range of Syntax Errors

This framework could be extended to detect multiple syntax errors on a single line.
For example, consider the following code snippet:
prnt "Hello World")

The above code snippet has two syntax errors. First, misspelled keyword, and second,
missing left parenthesis. However, since our framework scans tokens from left to
right, only the first error is detected. The second error is detected only after correcting
the first error.

4.2 Automatic Generation Of Error Productions

Presently, the rules to identify the errors have to be provided by the domain expert.
These could be auto-generated. Instead of specifying all possible error productions, the
different error productions for a given correct production can be generated automat-
ically. For example, consider the following correct production:
Program -> DEF SPACE ID ( Parameter_List ) : Funcbody

Using the correct production, the following error productions could be generated:
Program -> DEF SPACE ID ( Parameter_List ) Funcbody ‘‘ Missing Colon
Program -> ID ( Parameter_List ) : Funcbody ‘‘ Missing keyword 'def'
Program -> DEF SPACE ID (( Parameter_List ) : Funcbody ‘‘ Extra left parenthesis
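One possible starting point, sketched below under the assumption that deleting a single right-hand-side symbol corresponds to a "missing X" error, is to derive candidate error productions mechanically from the correct production; the function name, message wording, and backquote delimiter in the output are illustrative assumptions, and variants such as extra or misspelled tokens would need additional generation rules.

# Sketch: derive "missing token" error productions from a correct production
# by dropping one right-hand-side symbol at a time. Illustrative only.
def derive_error_productions(correct_production):
    lhs, _, rhs = correct_production.strip().partition('->')
    symbols = rhs.split()
    derived = []
    for i, symbol in enumerate(symbols):
        reduced = symbols[:i] + symbols[i + 1:]
        derived.append('{lhs} -> {rhs} `` Missing {sym}'.format(
            lhs=lhs.strip(), rhs=' '.join(reduced), sym=symbol))
    return derived

for rule in derive_error_productions(
        "Program -> DEF SPACE ID ( Parameter_List ) : Funcbody"):
    print(rule)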

References

1. Kummerfeld SK, Kay J (2003) The neglected battle fields of syntax errors. In: Proceedings
of the fifth Australasian conference on Computing education, vol 20
2. Eclipse IDE (2009) Eclipse IDE. www.eclipse.org. Last visited 2009
3. IntelliJ IDEA (2011) The most intelligent Java IDE. JetBrains. Available from: https://www.jetbrains.com/idea/. Cited 23 Feb 2016
4. https://www.tiobe.com/tiobe-index/
5. Javier Traver V (2010) On compiler error messages: what they say and what they mean. Adv
Hum-Comput Interact Article ID 602570:26. https://doi.org/10.1155/2010/602570

6. Becker BA et al (2019) Compiler error messages considered unhelpful: the landscape of text-
based programming error message research. In: Proceedings of the working group reports on
innovation and technology in computer science education, pp 177–210
7. Becker BA et al (2018) Fix the first, ignore the rest: Dealing with multiple compiler error mes-
sages. In: Proceedings of the 49th ACM technical symposium on computer science education
8. Brown P (1983) Error messages: the neglected area of the man/machine interface. Commun
ACM 26(4):246–249
9. Marceau G, Fisler K, Krishnamurthi S (2011) Mind your language: on novices’ interactions
with error messages. In: Proceedings of the symposium on new ideas, new paradigms, and
reflections on programming and software, pp 3–18
10. Bhatia S, Singh R (2018) Automated correction for syntax errors in programming assignments
using recurrent neural networks
11. Kelley AK (2018) A system for classifying and clarifying python syntax errors for educational
purposes. Dissertation. Massachusetts Institute of Technology
12. Beazley D (2001) PLY (Python lex-yacc). See http://www.dabeaz.com/ply
Enhancing Multi-factor User Authentication for Electronic Payments

Md Arif Hassan, Zarina Shukur, and Mohammad Kamrul Hasan

Abstract Security is becoming more and more important for electronic transactions, and the need for it is greater than ever before. A variety of authentication techniques have been established to ensure the security of electronic transactions. The usage of electronic payment systems has grown significantly in recent years. To secure confidential user details from attacks, the finance sector has begun to implement multi-factor authentication. Multi-factor authentication is a device access management strategy that requires an individual to pass through several authentication phases. In previous works, attempts have been made to secure the electronic payment process using various authentication methods, and despite their advantages, each had a downside. Password-based authentication is one of the most common ways for users to authenticate in numerous online transaction applications. However, an electronic payment authentication mechanism that relies mainly on traditional password-only authentication cannot efficiently resist the latest password guessing and password cracking attacks. In order to handle this problem, this paper proposes an authentication algorithm for electronic payments by adding a multi-factor mechanism to the existing user authentication mechanism. The enhancement concentrates on strengthening the user authentication with multiple factors using a biometric technique. The software is developed using an Android simulator, a platform that helps developers to evaluate an application without the requirement of building it on a real device. The proposed system has two phases, namely a registration stage and an authentication stage. The suggested authentication protocol gives users safe access to authorization through multi-factor authentication using their password and
fingerprint. The proposed enhancement provides better security by ensuring that unauthorized users cannot easily break into the mobile application.

Keywords Electronic payments · Single-factor · Two-factor · Multi-factor authentication

1 Introduction

Multi-factor authentication has been introduced on the Internet in order to improve the efficiency of user authentication and make it much harder for attackers to access and crack systems. It provides information security for companies and prevents them from crashing or losing money. When online transfers take place, consumers still worry about hackers and anti-social activities as they move money from one account to another. It is therefore essential to validate users, and cryptographic techniques are required to encrypt sensitive user information in order to keep it safe in the cloud. The most significant use of multi-factor authentication is to ensure that only authenticated and authorized users are entitled to process their financial transactions in services such as online banking and Internet banking. The rapid growth in electronic transactions has been accompanied by an equal increase in security attacks against electronic payment systems. Some of these attacks are enabled by the weaknesses of user authentication systems that are performed online.
The authentication process requires users to enter a password; if it matches the stored one, the user is authenticated, and otherwise the user is not permitted to sign in to the system. Authentication is always the first step of an online transaction. Password-based authentication is one of the most common ways for users to authenticate in numerous mobile applications and is known as single-factor authentication. However, password-based authentication schemes have many issues, and the risk of using passwords in corporate application authentication is not precisely known. One of the major problems with password-based authentication is that the majority of users do not know how strong a password should be. Two-factor authentication, the extra security measure which requires individuals to enter a code sent to their email or phone, has usually been effective in keeping usernames and passwords protected from attacks. The use of these two factors has reduced fraud but has not stopped it [1]. The drawbacks of two-factor authentication include the need for too many tokens, token forgery, token costs, and lost tokens [2, 3]. Nevertheless, security industry specialists have confirmed an automated phishing attack which may cut through this additional level of protection (2FA), trapping unsuspecting users into sharing their private credentials [4]. The evolution of authentication techniques from SFA to MFA can be seen in Fig. 1.
Authentication strategies that rely on more than one component are more difficult to compromise than single-component methods. A multi-factor authentication feature is necessary to render the solution successful and secure in order to improve

Fig. 1 Evolution of authentication techniques from SFA to MFA [5]: single-factor authentication (knowledge factor: PIN, password, security question); two-factor authentication (ownership factor: smartphone, key-card, one-time password); multi-factor authentication (biometric factor: fingerprint, face, iris, voice, vein, etc.)

the protection of online transactions. This paper intends to design and apply an authentication scheme for a secure online payment system; in this system, the payment process requires multiple authentications for a transaction rather than sending it directly to a customer. The emerging trend is now biometric authentication. Because of its high protection level and user-friendliness, the fingerprint login option is increasingly used. Biometrics is the process by which a person's physiological or behavioral features are authenticated. For each person, these characteristics are unique. They can never be stolen or duplicated, they resist different forms of attack, and they allow us to secure our personal records. Our proposed multi-factor authentication consists of several stages of authentication that the person has to go through. The person must first be authenticated through his or her password and fingerprint biometric to proceed with the validation process. Only after all of these steps is the person authenticated, and only then is the user able to access their account.
The article is divided into five parts. The first section provides a description of the existing systems and the formulation of the problem. The related work on electronic payments and its analysis is discussed in the following section. The third section explains the overall method. Section 4 outlines the implementation of the model, and the conclusion and potential future research form the final section.

2 Literature Review

A variety of authentication strategies have been created to ensure the protection of electronic transactions. Single-factor authentication (SFA) enables device access through a single authentication process. A simple password authentication scheme is the most common example of SFA. The system prompts the user for a name followed by a password. Passwords are saved on the server side in encrypted form or as hashes, and the username and password are transmitted in encrypted form over a secure connection. Therefore, if an intruder gains access to the system, there is little worry about leakage of information, as it will not expose the real password. Although this appears secure, in practice it is significantly less so, as an attacker can recover the original password of a customer using various attacks after a number of combinations [6]. By sharing the password, one may compromise the account right away. An unauthorized user may also try to gain access through the use of different kinds of attack such as brute force attacks [7–9], rainbow table attacks [10, 11], dictionary attacks [12–14], social engineering [15–17], phishing [18, 19], man-in-the-middle attacks [20, 21], password-based attacks [7, 22], session hijacking [23], and malware [19, 24, 25].
An image-based single-factor authentication scheme is the approach taken by Bajwa [26]. The drawback of that system is that authentication takes more time and shoulder surfing is possible. A common technique used in electronic payment authentication requires drawing pattern codes on the display screen, an approach taken by Vengatesan et al. [27]. To mitigate the problems of single-factor authentication, two-factor authentication was considered as a solution for securing online transactions, recognizing the authenticated person, and logging in to a system or application, and many current and new companies are racing to deploy it. Two-factor authentication is a mechanism that implements more than one factor and is considered stronger and more secure than the traditionally implemented single-factor authentication system. Two-factor authentication using hardware devices, such as tokens, cards, or OTPs, has been competitive and difficult to hack, which helps to solve these problems.
There are plenty of benefits to the 2FA technique, but adversaries have been working on breaking this strategy and have discovered a number of ways to hack it and expose the sensitive information of users.
Although 2FA systems are powerful, they still suffer from malicious attacks such as lost/stolen smart card attacks, token costs, token forgery, and lost tokens [3], forging a phony fingerprint from the original fingerprint, as well as insider attacks [28]. Due to the increase in online transactions, two-factor authentication is not enough for performing expensive transactions online [2]. There is therefore a necessity for a stronger and more secure system based on multi-factor authentication to check the validity of users. Principally, MFA involves different elements such as biometrics, a smart mobile device, a token unit, and a smart card. This authentication approach enhances the security level and also provides identification, verification, and then authentication for guaranteeing user authority. In order to mitigate the issues of two-factor authentication,

multi-factor authentication was viewed as a formula for securing online transactions. Multi-factor authentication is a technique of computer access control in which a person gains access only by successfully passing different authentication stages. Here, rather than asking for only an individual piece of information such as a password, users are requested to provide a number of additional pieces of information, a minimum of three valid elementary authentication factors [29], which makes it harder for any intruder to fake the identity of the real user. Multi-factor authentication thus comprises three or more measures of authentication that the person has to go through. This additional information can consist of different aspects such as fingerprints, security tokens, and biometric authentication [6]. Multi-factor authentication may be done in numerous ways; the most widespread of them uses login credentials together with some additional information. Many methods are available for secure user authentication, notably biometric authentication [30], which is used to eradicate the defects of older techniques. Biometrics means the automated recognition of people based on their specific behavioral and natural attributes such as face, fingerprint, iris, speech, etc. [31]. There are two types of biometrics, namely unimodal biometrics and multimodal biometrics [16]. Biometric systems have many uses [12]. A biometric fingerprint authentication scheme using a stream cipher algorithm has been proposed by Alibabaee and Broumandnia for online banking [31]. The stream cipher protocol is based on one-time passwords and fingerprints. This method is highly resistant to all kinds of electronic banking attacks, such as phishing and password theft [32–35]. Their system is developed based on the RSA cryptosystem. A similar authentication method has been proposed by Benli et al. [30]; in their method, users first register their biometric credentials on the device. A related method has been proposed by Harish et al. [2] for a smart wallet. In this project, we intend to implement and apply multi-factor user authentication for a secure electronic payment system using a user phone number, user secret code, password, and biometric fingerprint authentication.

3 Proposed Design System

In each of the previous works, attempts have been made to secure the electronic payment process by using various methods, and despite their advantages, each had a downside. Here, several existing deficiencies of approaches dependent on RSA encryption algorithms are addressed and a protected method is suggested. This paper proposes a multi-factor authentication algorithm for electronic payments.
The proposed system has two phases, namely a registration stage and an authentication stage. A comprehensive explanation of each phase is provided below. Table 1 presents the notations used in our proposed system. Before making use of the app, the person should register their information during a procedure known as the registration phase. Verification of that information can only be achieved by a procedure known as the authentication phase. Each of the suggested materials and

Table 1 Demographic characteristics

Notations   Description
Ui          User
IDi         Unique identifier of user
UiEid       User email id
UiP         User phone number
UiA         User address
UiPass      Password of the user
UiBF        Biometric feature of user
DBs         Server database

strategies are applied in the system during both the registration process and the authentication procedure; their process flow is reviewed in this section.

3.1 Registration Phase

To be able to use the service, the person needs to do a one-time registration. The registration stage is used to gather all of the user's information; using that information, when the person later wants to log in to the system, the server checks whether the person is legitimate. Here, the person has to register their account together with the details; we explain the registration steps as follows (Fig. 2):
Step-1: Start
Step-2: Input user information
Ui = IDi + UiEid + UiP + UiA + UiPass + UiBF
Step-3: Combine all user information and store it in the database
DBs = IDi + UiEid + UiP + UiA + UiPass + UiBF
Step-4: Registration complete
End: Welcome notification for registration.
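The registration steps above can be read as the following minimal sketch in Python (illustrative only, not the authors' Android code): a plain dictionary stands in for the server database DBs, the field names mirror the notations in Table 1, and the example values are assumptions. A real deployment would salt the password hash and store only a protected fingerprint template.

# Sketch of the registration phase: collect the user credentials and
# store one combined record in the server database DBs. Illustrative only.
import hashlib

DBs = {}   # stands in for the server database

def register(user_id, email, phone, address, password, fingerprint_template):
    record = {
        'UiEid': email,
        'UiP': phone,
        'UiA': address,
        # store a hash of the password rather than the raw password
        'UiPass': hashlib.sha256(password.encode()).hexdigest(),
        'UiBF': fingerprint_template,
    }
    DBs[user_id] = record
    return 'Welcome! Registration successful.'

print(register('user01', 'user@example.com', '0123456789',
               'Bangi, Selangor', 's3cret', b'fingerprint-template-bytes'))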

3.2 Authentication Phase

When the customer tries to sign in to the system, the authentication server has to authenticate the user. If both values are identical, then the authentication is successful. In detail, the person must authenticate him/herself using several authentication actions for the login progression as well as the transaction process. The authentication module comprises two primary processes: the login process and the transaction authentication process. In the login procedure, the person has to log in using the authorized password and number, fingerprint, and IMEI for authentication. After the user logs in to the system, the person is only able to see the account details. On the other side, for a transaction, the person needs to authenticate once more

Fig. 2 Registration phase (the user's credential information is input, combined, and checked; on success, a registration successful notification is shown)

using the fingerprint authentication. Only when the person authenticates with the fingerprint details can the transaction be accomplished. The comprehensive process is described in the following steps (Fig. 3):
Step-1: Start
Step-2: Input user password
Ui = UiPass

Fig. 3 Authentication phase (the registered fingerprint is checked first, then the password; authentication succeeds only when both match)

If UiPass = DBs go to step 3
Else, step 2
Step-3: Input user biometric fingerprint
Ui = UiBF
If UiBF = DBs
Ui = UiPass + UiBF = DBs go to step 4

Else, step 3
Step-4: Authentication successful!!
Step-5: End: access granted; proceed to the next module.
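A minimal sketch of these two checks is shown below; it reuses the illustrative DBs dictionary from the registration sketch above and is therefore an assumption rather than the authors' implementation. The fingerprint comparison is written as a direct template match only for illustration, since on a real device this check is delegated to the platform's fingerprint APIs.

# Sketch of the authentication phase: password check first, then the
# biometric check; access is granted only when both factors match.
import hashlib

def authenticate(DBs, user_id, password, fingerprint_template):
    record = DBs.get(user_id)   # DBs as populated at registration
    if record is None:
        return 'Unknown user'
    if hashlib.sha256(password.encode()).hexdigest() != record['UiPass']:
        return 'Incorrect fingerprint or password'
    if fingerprint_template != record['UiBF']:
        return 'Incorrect fingerprint or password'
    return 'Authentication successful: access granted'

# Example use with the DBs dictionary from the registration sketch:
# print(authenticate(DBs, 'user01', 's3cret', b'fingerprint-template-bytes'))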
The registration stage gathers all the user's information, and that information is used when the person later wants to log in to the system. In this stage, once the user enters all of his or her personal identification, the user is able to obtain access to the system. The registration procedure is shown in Fig. 4. A session is produced for the user, after which he or she is able to access the services and can alter personal details in the panel. After successful registration and owner approval, the customer will see the profile display of the payment application. In the profile, there will be many service modules, including the wallet balance, mobile operator top-up, shopping, as well as adding money. The most common and the latest user authentication and account access control of Internet banking systems is based on single-component authentication, namely a username and a password. The protection of password-based authentication, nonetheless, is determined by the strength of the user's selected password.

4 Implementation and Result

The Android software development kit (SDK) was the primary development kit used for this project, based on the scalability of devices the application can be used on and the rich application framework it provides, allowing users to build innovative applications for mobile devices within a Java language environment. The front end is simple and easy to use: as soon as the application is started, the person registers himself or herself and is then able to log in to the system. Hardware to support the back end and front end of the application is essential for all applications to be built. The software used for this development is Android Studio, which Google created specifically for Android programming. It has a comfortable interface that helps a developer to access and check the application, and the built-in SDK tools were used to run the application and fix bugs.
The system is developed to run in particular on the Android Studio virtual device Google Pixel 2, and on all other types of smartphone devices that use this technology. The system is device independent in that it can serve all Android-based smartphone devices. The Android platform also provides a built-in database (SQLite) and Web services. SQLite databases are built into the Android SDK. SQLite is a SQL database engine which stores data in .db files [36]. Two types of Android storage are widely available, namely internal and external storage. Files saved in internal storage are containerized by default, so other apps on the device cannot access them. Such files are removed when the user uninstalls

Fig. 4 Screenshots of the registration and authentication process: (a) home screen, (b) registration process, (c) fingerprint authentication, (d) fingerprint completed, (e) password authentication, (f) authentication completed

your program. On the other side, an Android-compatible device allows external shared storage. Storage can be located either internally or on removable media (such as an SD card). Files stored in external storage can be read by anyone. Once USB mass storage is enabled, the user is able to modify them.
The prototype is evaluated based on the registration and authentication stages. The simulation is run on the Web server side on a DELL laptop computer

with an Intel Core i7 CPU at 3.40 GHz and 6 GB of RAM. The operating system is Windows 10 Professional. Android is an open-source operating system built on Linux, and the Android platform makes for simple, fast, and helpful apps for everyday activities on mobile devices. The Android architecture offers server-to-application compatibility through certain APIs.
The Java cryptography architecture (JCA) underlies the Android cryptography APIs. The JCA is a collection of APIs for digital signatures, message digests, authentication of credentials and certificates, verification, key generation and management, and secure generation of random numbers. These APIs allow developers to easily integrate security into their application code [37]. For our implementation, we used the javax.crypto APIs and developed an API/GUI for using the fingerprint reader through its application programming interface and graphical user interface. Fingerprint authentication is feasible only on smartphones with a touch sensor for user recognition and a connection to the software and program features. The execution of fingerprint authentication is a sizeable multi-step process at first. Fingerprint authentication is primarily a cryptographic process involving a key, an encryption cipher, and a fingerprint manager for the authentication function. For an Android device, there are several common ways to incorporate fingerprint authentication. Fingerprint authentication uses two system services, the KeyguardManager and the FingerprintManager. For using the fingerprint analysis, a fingerprint manager is necessary. Several of the fingerprint-based approaches can be found within the FingerprintManager.
A fingerprint scanner and an API source code collection have been used for developing the API/GUI for this study. A new user signs up to the application by first clicking the registration button on the welcome page and then submitting his/her information on the registration page. The user needs to sign up in the application on first usage. After registration, the client obtains a username and a password. Our proposed work is device based, using the keystore. The system stores user data and compares it with the existing database; authentication is effective only if the current input and the existing database match. The Android Keystore is a facility that makes it easier for developers to build and store cryptographic keys in containers. The Android Keystore is an implementation of the Java KeyStore API, which is a repository for authorization certificates or public key certificates and which is used by Java-based encryption, authentication, and HTTPS-service applications in several situations. The entries in a keystore are encrypted with a password. The most secure and recommended form of keystore is currently a StrongBox-backed Android Keystore [38].
The signup display is shown in Fig. 4b, which presents a screenshot of the signup page. When the registration is complete, the user proceeds to the authentication procedure. If the customer signs up successfully, a confirmation message "successfully registered" is displayed, and Fig. 4c shows the screenshots of the login phase procedure in steps 1 and 2. Various checks are also performed during user registration. The user has to use their fingerprint and password for logging in to the application each time. The authorized user must use the registered fingerprint and password; otherwise, if the user enters a fingerprint

or password that is not registered then the user will get a notification message that
“incorrect fingerprint or password.”

5 Conclusion

The proposed method is used for mobile application security in an electronic payment system, using a biometric verification feature to validate the fingerprint template registered at the time of registration. The customer can perform the transaction with protection: if the fingerprint matches the samples in the database, authentication will succeed. It provides users with secure access to authorization through multi-factor authentication using their password and fingerprint. This approach simply aims to guarantee security and trust in the financial sector. Our algorithm provides an extra protection layer that stops hackers from succeeding with phishing and social engineering. The proposed solution strengthens the existing system of authentication. It greatly increases the protection of mobile banking networks by offering assurances from three separate areas, namely knowledge, inherence, and possession. It also improves the user experience by making verification simpler for consumers. This process can be used by anyone who has a smart device that supports biometric fingerprint authentication.

Acknowledgements The authors would like to thank the anonymous reviewers for their helpful feedback. This research was funded by research grant code EP-2018-012 from Ya-Tas Ismail, Universiti Kebangsaan Malaysia.

References

1. Khattri V, Singh DK (2019) Implementation of an additional factor for secure authentication


in online transactions. J Organ Comput Electron Commer 29(4):258–273
2. Harish M, Karthick R, Rajan RM, Vetriselvi V (2019) A new approach to securing online
transactions—the smart wallet, vol 500. Springer, Singapore
3. Shaju S, Panchami V (2017) BISC authentication algorithm: an efficient new authentication
algorithm using three factor authentication for mobile banking. In: Proceedings of 2016 online
international conference on green engineering and technologies. IC-GET 2016, pp 1–5
4. Newcomb A (2019) Phishing scams can now hack two-factor authentication | fortune,
2019. Available: https://fortune.com/2019/06/04/phishing-scam-hack-two-factor-authentic
ation-2fa/. Accessed: 21 Mar 2020
5. Ometov A, Bezzateev S, Mäkitalo N, Andreev S, Mikkonen T, Koucheryavy Y (2018) Multi-
factor authentication: a survey. Cryptography 2(1):1
6. Kaur N, Devgan M (2015) A comparative analysis of various multistep login authentication
mechanisms. Int J Comput Appl 127(9):20–26
7. Emeka BO, Liu S (2017) Security requirement engineering using structured object-oriented
formal language for m-banking applications. In: Proceedings of 2017 IEEE international
conference on software quality reliability and security. QRS 2017, pp 176–183

8. Ali MA, Arief B, Emms M, Van Moorsel A (2017) Does the online card payment landscape
unwittingly facilitate fraud? IEEE Secur Priv 15(2):78–86
9. ENISA (2016) Security of mobile payments and digital wallets, no. December. European Union
Agency for Network and Information Security (ENISA)
10. Sudar C, Arjun SK, Deepthi LR (2017) Time-based one-time password for Wi-Fi authentication
and security. In: 2017 International conference on computer communication and informatics,
ICACCI 2017, vol 2017, pp 1212–1215
11. Kogan D, Manohar N, Boneh D (2017) T/Key: second-factor authentication from secure hash
chains dmitry, pp 983–999
12. Jesús Téllez Isaac SZ (2014) Secure mobile payment systems. J Enterp Inf Manag 22(3):317–
345
13. Dwivedi A, Dwivedi A, Kumar S, Pandey SK, Dabra P (2013) A cryptographic algorithm
analysis for security threats of semantic e-commerce web (SECW) for electronic payment
transaction system. Adv Comput Inf Technol 367–379
14. Yang W, Li J, Zhang Y, Gu D (2019) Security analysis of third-party in-app payment in mobile
applications. J Inf Secur Appl 48:102358
15. Gualdoni J, Kurtz A, Myzyri I, Wheeler M, Rizvi S (2017) Secure online transaction algorithm:
securing online transaction using two-factor authentication. Proc Comput Sci 114:93–99
16. Venugopal H, Viswanath N (2016) A robust and secure authentication mechanism in online
banking. In: Proceedings of 2016 online international conference on green engineering and
technologies—IC-GET 2016, pp 0–2
17. Roy S, Venkateswaran P (2014) Online payment system using steganography and visual
cryptography. In: 2014 IEEE students’ conference on electrical engineering and computer
sciences—SCEECS 2014, pp 1–5
18. Alsayed AO, Bilgrami AL (2017) E-banking security: internet hacking, analysis and prevention
of fraudulent activities. Int J Emerg Technol Adv Eng 7(1):109–115
19. Ataya MAM, Ali MAM (2019) Acceptance of website security on e-banking—a review. In:
ICSGRC 2019–2019 IEEE 10th control and system graduate research colloquium, Proceeding,
pp 201–206
20. Kaur R, Li Y, Iqbal J, Gonzalez H, Stakhanova N (2018) A security assessment of HCE-NFC
enabled E-wallet banking android apps. In: Proceedings of international conference on software
and computer applications, vol 2, pp 492–497
21. Chaudhry SA, Farash MS, Naqvi H, Sher M (2016) A secure and efficient authenticated encryp-
tion for electronic payment systems using elliptic curve cryptography. Electron Commer Res
16(1):113–139
22. Skračić K, Pale P, Kostanjčar Z (2017) Authentication approach using one-time challenge
generation based on user behavior patterns captured in transactional data sets. Comput Secur
67:107–121
23. Ibrahim RM (2018) A review on online-banking security models, successes, and failures. In:
International conference on electrical, electronics, computers, communication, mechanical and
computing (EECCMC). IEEE EECCMC
24. Elliot M, Talent K (2018) A robust and scalable four factor authentication architecture to
enhance security for mobile online transaction. Int J Sci Technol Res 7(3):139–143
25. Shi K, Kanimozhi G (2017) Security aspects of mobile based E wallet. Int J Recent Innov
Trends Comput Commun
26. Bajwa G, Dantu R, Aldridge R (2015) Pass-pic: a mobile user authentication. In: 2015 IEEE
international conference on intelligence and security informatics: securing the world through
an alignment of technology, intelligence, humans Organ. ISI 2015, p 195
27. Vengatesan K, Kumar A, Parthibhan M (2020) Advanced access control mechanism for cloud
based E-wallet, vol 31, no. August 2016. Springer International Publishing, Berlin
28. Mohammed and Yassin (2019) Efficient and flexible multi-factor authentication protocol based
on fuzzy extractor of administrator’s fingerprint and smart mobile device. Cryptography 3(3):24
29. Nwabueze EE, Obioha I, Onuoha O (2017) Enhancing multi-factor authentication in modern
computing. Commun Netw 09(03):172–178

30. Benli E, Engin I, Giousouf C, Ulak MA, Bahtiyar S (2017) BioWallet: a biometric digital
wallet. In: Twelfth international conference on information systems (Icons 2017), pp 38–41
31. Alibabaee A, Broumandnia A (2018) Biometric authentication of fingerprint for banking users,
using stream cipher algorithm. J Adv Comput Res 9(4):1–17
32. Suma V (2019) Security and privacy mechanism using blockchain. J Ubiquitous Comput
Commun Technol (UCCT) 1(1):45–54
33. Sivaganesan D (2019) Block chain enabled internet of things. J Inform Technol 1(1):1–8
34. Hassan A, Shukur Z, et al (2020) A review on electronic payments security. Symmetry (Basel)
12(8):24
35. Hassan A, Shukur Z, Hasan MK (2020) An efficient secure electronic payment system for
E-commerce. Computers 9(3):13
36. Guide MST (2020) Data storage on android—mobile security testing guide. Avail-
able: https://mobile-security.gitbook.io/mobile-security-testing-guide/android-testing-guide/
0x05d-testing-data-storage#keystore. Accessed: 27 Jul 2020
37. Guide MST (2020) Android cryptographic APIs—mobile security testing guide. Avail-
able: https://mobile-security.gitbook.io/mobile-security-testing-guide/android-testing-guide/
0x05e-testing-cryptography. Accessed: 27 Jul 2020
38. Android D (2020) Android keystore system | android developers. Available: https://developer.
android.com/training/articles/keystore. Accessed: 16 Aug 2020
39. Mridha MF, Nur K, Kumar A, Akhtaruzzaman M (2017) A new approach to enhance internet
banking security. Int J Comput Appl 160(8):35–39
40. Soare CA (2012) Internet banking two-factor authentication using smartphones. J Mobile,
Embed Distrib Syst 4(1):12–18
Comparative Analysis of Machine Learning Algorithms for Phishing Website Detection

Dhiman Sarma, Tanni Mittra, Rose Mary Bawm, Tawsif Sarwar, Farzana Firoz Lima, and Sohrab Hossain

Abstract The Internet has become the most effective medium for leveraging social interactions during the COVID-19 pandemic. Users' immense dependence on digital platforms increases the chance of fraudulence. Phishing attacks are the most common ways of attack in the digital world. Any communication method can be used to target an individual and trick them into leaking confidential data in a fake environment, which can later be used to harm the sole victim or even an entire business, depending on the attacker's intent and the type of leaked data. Researchers have developed numerous anti-phishing tools and techniques like whitelists, blacklists, and antivirus software to detect web phishing. Classification is one of the techniques used to detect website phishing. This paper has proposed a model for detecting phishing attacks using various machine learning (ML) classifiers. K-nearest neighbors, random forest, support vector machines, and logistic regression are used as the machine learning classifiers to train the proposed model. The dataset in this research was obtained from the public online repository Mendeley, in which 48 features are extracted from 5000 phishing websites and 5000 real websites. The model was analyzed using F1 scores, where both precision and recall evaluations are taken into consideration. The proposed work has concluded that the random forest classifier has achieved the most efficient and highest performance, scoring 98% accuracy.

Keywords Machine learning · Phishing · Detection · KNN · K-nearest neighbor · Random forest · Decision tree · Logistic regression · Support vector machine


1 Introduction

Today's digital activities are increasingly carried out on a wide range of platforms, from business to health care. Massive online activity opens the door for cyber criminals. Phishing is the most successful and dangerous cyber-attack observed across the globe. Phishing attacks are dangerous, but they can be avoided simply by creating awareness, developing the habit of staying alert and continuously being on the lookout when surfing the Internet, and clicking links only after verifying the trustworthiness of the source links. There are also tools such as browser extensions
that notify users when they have entered their credentials on a fake site, therefore,
possibly having their credentials transferred to a user with malicious intent. Other
tools can also allow networks to lock down everything and only allow access to
whitelisted sites to provide extra security while compromising some convenience on
the user side [1–4].
A company can take several measures to protect itself from phishing attacks, but the core problem still relies to some extent on the employees being careful and alert at all times. While the reliability of machines can be ensured, humans are not customizable. A mistake from one employee could be enough to lead to a vulnerability that an attacker can skillfully exploit to cause damage to an entire company if not detected and contained in time. Security is a significant concern for any organization [5–9, 22].
This paper decided to employ the concepts of machine learning to train a model
that would learn to detect links that could be attempting to execute a phishing attack
and allow the machine to become an expert at detecting such sites and alerting humans
without having to rely on the human minds too much. By using artificial intelligence,
this research intended to add another layer of security that would tirelessly detect
sites and get better at its performance over time given more datasets to learn from
and allow humans to share their responsibilities, regarding careful Internet surfing
with the machines.

2 Related Research

Different research works pertinent to phishing attacks and essential classification techniques practiced to detect web phishing are highlighted in this section.
With the current boom in technology, phishing has become more popular among
malicious hackers. The first-ever phishing lawsuit was filed in 2004 when a certain
phisher created a duplicate of a popular website known as “America Online”. With
the help of this same website, he was able to get access to personal user info and
bank details of many individuals. Phishers began to focus on websites that had online
transactions with people and made legions of fake websites that began to trick unsus-
pecting people into thinking they were the real ones. Table 1 shows various types of
phishing attacks.

Table 1 Types of phishing attacks

Algorithm-based phishing: Attackers create different algorithms that can detect and steal personal user information from the database of a website
Deceptive phishing: Currently, this is done by using e-mails that link a client to a malicious website where the client unsuspectingly enters their private information
URL phishing: Hackers use hidden links in unsuspecting parts of a website that lead a client to a malicious page [10–15]
Hosts file poisoning: Before making a DNS query, the "hostnames" in the host records are checked. Phishers poison the "host records" and redirect a user to a phishing website
Content injection phishing: Hackers inject some malicious sections into a real website that collect the data of a user [16]
Clone phishing: Phishers clone a previously sent e-mail. The cloned e-mail has a malignant link attached to it and is sent to different unsuspecting users [14]

Table 2 Traditional website phishing detection techniques

Blacklist filter: A blacklist is a primary access control mechanism that can block some aspects on a list from passing through. These filters can be applied in different security measures like DNS servers, firewalls, e-mail servers, etc. A blacklist filter maintains a list of elements like IP addresses, domains, and IP netblocks that are commonly used by phishers
Whitelist filter: A whitelist filter contains a list of different elements such as URLs, schemes, or domains that will be allowed to pass through a system gateway. A whitelist, contrary to a blacklist, maintains a list of all legitimate websites
Pattern matching filter: Pattern matching is a technique that checks whether a specific sequence of data or tokens exists among a list of given data

Table 2 explains traditional website phishing detection techniques like the blacklist filter, whitelist filter, and pattern matching filter.

2.1 Machine Learning-Based Methods

Malicious Domain Detection


Malicious domains are one of the leading causes of phishing. Different machine
learning methods and models were created to better detect malicious domains with
a high success rate [17].

E-mail Spam Filtering


Spam filters use probing methods that detect malicious e-mails and block them.
The e-mail is passed through thousands of predefined rules that score the e-mail on

the probability of being spam. Phishers use spam e-mails to direct a client to their
malicious webpage and steal data.

3 Methodology

As this paper mainly employs machine learning techniques to train models that can
detect phishing websites, the first step is to understand how machine learning works.
In a nutshell, all machine learning techniques involve a dataset and programmed code
that performs computations, allowing the code to analyze a portion of the data and
observe relationships between the features and the classification of the data. The
machine's trained knowledge of these relationships is then tested against the rest of
the data, and its performance is measured and scored. Based on the performance of
the model, the setup of the training procedure and the dataset preprocessing are
readjusted in the hope of better results in the next iteration of training. If a model
fails to provide satisfactory results, other techniques are employed where relevant to
the dataset. If a model performs better than all other trained models, it is stored and
used on new, unknown datasets to verify its performance further.
It is important to note that different datasets can come in different formats, and
therefore new datasets introduced to the model might require preprocessing to
maintain optimal performance.
Figure 1 illustrates this process.

3.1 Dataset

The dataset in this research was obtained from the public online repository Mendeley.
The dataset contained 48 features extracted from 5000 phishing websites and 5000
real websites. An improved feature extraction technique was applied to this dataset
using a browser automation framework. The class label indicated two outcomes,
where 0 was a phishing website and 1 was a real website.

3.2 Data Preprocessing

Any collected dataset usually comes with errors, variable formats, different features,
incomplete sections, etc. If the dataset is used directly to train a model, it could lead
to unexpected behavior, and the results would rarely ever satisfy the expected needs.

Fig. 1 Flowchart of the proposed system

Therefore, it is important to preprocess data, converting raw data into an understandable format that allows the model to train in the best way possible.

Steps in Data Preprocessing


Step 1. Import the libraries: The pandas, matplotlib, yellowbrick, numpy, and sklearn
libraries were imported.
Step 2. Import the dataset: The dataset was imported and named as “data”.
Step 3. Check the missing values: This step was unnecessary for our dataset, as
our dataset did not have any missing values.
Step 4. See the categorical values: The dataset did not contain any categorical
values. So, this step was skipped.
Step 5. Split the dataset into training and test set: The dataset was split into
two parts. 70% of the dataset was used for training purposes and the rest of the 30%
for testing purposes. Training dataset provided features and labels together to learn
the relationship between them so that the model could later on test its knowledge
against the test set, where it was only provided with the features and was set to
generate labels for each set of features and check how many of its predictions were
done correctly.

Step 6. Feature scaling: Feature scaling limits the range of variables so that they
can be compared on common ground. It did not need to be applied to our dataset.
A minimal sketch of these preprocessing steps is given below.
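
The following sketch illustrates Steps 2–5 with pandas and scikit-learn. It is a minimal, hypothetical example: the file name "phishing.csv" and the label column name "CLASS_LABEL" are assumptions for illustration, not details taken from the paper.

import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("phishing.csv")            # Step 2: import the dataset (assumed file name)
print(data.isnull().sum().sum())              # Step 3: count missing values (none for this dataset)

X = data.drop(columns=["CLASS_LABEL"])        # the 48 extracted features
y = data["CLASS_LABEL"]                       # 0 = phishing website, 1 = real website (assumed column name)

# Step 5: split into 70% training data and 30% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)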

3.3 Classifiers

K-nearest neighbors, random forest, support vector machines, and logistic regression
were chosen as the machine learning techniques for training the model. After
training, the models were analyzed using F1 scores, which take into account both the
precision and recall of each model. Each model was also judged on how much bias it
showed when predicting labels for a sample of data, and on how much the fit differed
between the test set results and the training set results, a measurement referred to as
variance.

K-Nearest Neighbors (KNN)


The main idea of KNN in a nutshell is that things that are similar are usually near
each other. It is one of the simplest of all machine learning techniques and works by
simply comparing the features of a set of data, to be labeled, with other sets of data,
that are already labeled. It measures the difference between these features and refers
to these differences as distances. After having measured the distances, the model
selected k number of shortest distances and then output the most frequent labels
among them as its predicted label for the unlabeled dataset [15].
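
As an illustration only, the KNN classifier described above can be trained with scikit-learn as in the following sketch. The value k = 5 is an assumption (the paper does not report its choice of k), and X_train/X_test come from the preprocessing sketch in Sect. 3.2.

from sklearn.neighbors import KNeighborsClassifier

# "Training" KNN simply stores the labelled samples; prediction measures the
# distances to them and votes over the k nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # fraction of test websites labelled correctly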

Random Forest
Random forests are made using decision trees, so it is important to understand
decision trees before understanding random forests.
A decision tree is built from data by analyzing the features in a dataset and
creating a root node from the feature that has the most impact on the label. This
impact can be measured with scoring techniques such as the Gini index.
Once the root has been decided, the remaining features are analyzed and scored in
the same way, and the most significant of them is added as a child of the root.
This procedure is repeated until all of the features have been added to the tree.
When a label is to be decided, evaluation starts at the root feature, whose value
determines which path to take from that node; the next node and its corresponding
feature determine the next path in the same way. The process is repeated until a leaf
node at the end of the tree is reached, where the decision is finalized and a label is
produced by the model.
Although decision trees are good at predicting labels for the dataset used to
create them, they are not as good at predicting labels for entirely new sets of features
and are therefore considered somewhat inaccurate in their predictive capability. This
inaccuracy can be reduced by using random forests.

The first step in generating a random forest was using the dataset to create a
bootstrapped dataset. This new dataset would contain samples from the original
dataset but would be randomly selected and placed into the new table, with the
possibility of some samples existing in the new table more than once.
The second step was to select a random subset of the features and analyze those,
using our chosen scoring technique, to generate the root of the decision tree. To add
children to the root, another random subset from among the rest of the features was
once again selected and was analyzed to pick the next child. The second step was
repeated several times to generate a wide variety of decision trees which increased
the accuracy of our model compared to using one individual decision tree.
To label an unlabeled sample of data, all of the decision trees predict a label, the
labels produced by each tree are tallied, and the label predicted by the largest number
of trees is selected. This most popular label becomes the final prediction and is
usually more accurate than what would have been achieved using a single decision
tree.
While random forests choose optimal split points, another model called the extremely
randomized tree (ERT) can also be used, which introduces more randomness by also
randomizing the splits in its generation of trees. Because the splits in ERTs are not
optimal, they can lead to slightly more bias, while variance is reduced further by the
extra randomness. Random forests and extremely randomized trees perform quite
similarly overall; ERTs are usually somewhat less accurate but also understandably
faster in computation. ERTs should, however, be avoided if the dataset contains many
noisy features, which can degrade their performance even further [18–20].
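
A hedged sketch of the tree-ensemble classifiers that appear in the comparison of Sect. 4 (random forest, bagging, and extra trees), using scikit-learn; the number of trees is an illustrative assumption, not a setting reported by the paper.

from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              ExtraTreesClassifier)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "bagging":       BaggingClassifier(n_estimators=100, random_state=42),
    "extra trees":   ExtraTreesClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)               # each model builds its own set of randomized trees
    print(name, model.score(X_test, y_test))  # accuracy on the 30% test split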

Support Vector Machines


Support vector machines work by analyzing a dataset and placing a separator, called
a support vector classifier, among the features so that samples can be classified
according to which side of the separator they fall on. To separate different kinds of
datasets and establish boundaries that mark a region for each label, the data is moved
from its original, relatively low dimension into a higher dimension. For example, a
one-dimensional dataset can be turned into a two-dimensional curve by plotting the
squared values of the features against the original features. If a support vector
classifier is unable to separate the features into their labeled regions in the original
space, this transformation opens up new possibilities and allows the separator to be
placed in the new, more flexible graph.
The decision of squaring the features, or raising them to a perhaps higher poly-
nomial degree, is taken by the polynomial kernel which increases the dimensions by
setting new degrees and then uses the relationships between each pair of observa-
tions to find a support vector classifier. Radial kernel can also be used which finds
the support vector classifier in infinite dimensions and trains the model such that it
behaves like a weighted nearest neighbor technique.
890 D. Sarma et al.

When a new unlabeled sample is provided, it can simply be plotted in the graph
and its position compared with that of the support vector classifier to observe which
side of the separator it falls on, classifying the sample accordingly.
Support vector classifiers also come in other variants. For example, while a linear
SVC only attempts to fit a hyperplane through the data to best separate its different
categories, a Nu-SVC uses a parameter to control the number of support
vectors [21, 23].
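
The three support vector variants discussed above are available in scikit-learn; a minimal sketch follows, where the kernel choice and iteration limit are assumptions rather than settings reported by the paper.

from sklearn.svm import SVC, NuSVC, LinearSVC

svc        = SVC(kernel="rbf")          # support vector classifier with a radial kernel
nu_svc     = NuSVC(nu=0.5)              # nu controls the number of support vectors
linear_svc = LinearSVC(max_iter=5000)   # fits a separating hyperplane directly
for model in (svc, nu_svc, linear_svc):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))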

Logistic Regression
Logistic regression is based on the concept of linear regression, where a line is
plotted against a given dataset and its axes. This line is drawn such that the squared
differences between this line and the plotted points are at their minimum. The line
and the calculated R² are used to determine whether the features are correlated, and the
p-value is calculated to verify that this correlation is actually statistically significant.
Finally, the line is used to map any sample of data to its corresponding label
value.
Logistic regression uses a similar concept but is different such that it can only
classify two labels and no more. Another difference is that it does not use a straight
line, but rather an S-shaped curve which goes from 0 to 1. It tells the probability of
a given sample to belong in one of these two labels.
Logistic regression CV applies cross-validation on top of logistic regression to further
improve the quality of the model. When cross-validation is applied, sections of the
dataset are resampled in separate sessions to obtain multiple results; the mean
probability across these results can then be used to label the data and achieve more
accurate results.
To reach the right equation for the curve, stochastic gradient descent is used. At
each iteration, the gradient of the loss function indicates how the constants of the
equation should be adjusted, thereby minimizing the loss and deriving the optimal
equation for our dataset.
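
A sketch of the logistic-regression family used in the comparison, again with scikit-learn; the iteration limits, fold count, and SGD loss setting are assumptions.

from sklearn.linear_model import LogisticRegression, LogisticRegressionCV, SGDClassifier

log_reg    = LogisticRegression(max_iter=1000)
log_reg_cv = LogisticRegressionCV(cv=5, max_iter=1000)   # cross-validated regularization strength
sgd        = SGDClassifier(loss="log_loss")              # logistic loss minimized by SGD ("log" on older scikit-learn)
for model in (log_reg, log_reg_cv, sgd):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))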

4 Result

Precision, recall, F1 score, and success rate are widely used to measure the performance
of supervised machine learning algorithms [24–27]. The classification reports of our
models are described below. In all tables, row 1 indicates a real website and row 0 a
phishing website. A short sketch showing how these reports are produced follows.
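
For any fitted model from the sketches above (the random forest is used here as an example), the per-class precision, recall, and F1 figures reported in Tables 3–12 can be reproduced with scikit-learn's classification report; recall that F1 is the harmonic mean 2PR / (P + R) of precision P and recall R.

from sklearn.metrics import classification_report, f1_score

y_pred = models["random forest"].predict(X_test)         # model fitted in the earlier ensemble sketch
print(classification_report(y_test, y_pred, digits=3))   # precision / recall / F1 for classes 0 and 1
print(f1_score(y_test, y_pred))                          # F1 for the positive class (real website)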
Table 3 presents the classification report of the support vector machine. The preci-
sion and recall for predicting a real website are 0.920 and 0.898. These scores were
used to calculate the F1 score for predicting a real website which was 0.909. Simi-
larly, the precision and recall for predicting a phishing website are 0.895 and 0.917.
Using these both scores, the F1 score was measured for predicting a phishing website
and is 0.906. It is to be noted that the precision for predicting a real website is higher

Table 3 SVC classification report

   Precision  Recall  F1
1  0.920      0.898   0.909
0  0.895      0.917   0.906

while the recall for predicting a phishing website is higher. The F1 scores are similar.
F1 score was compared to other algorithms to find the optimal one.
Table 4 represents the classification report of the non-uniform support vector
classifier. The precision and recall for predicting a real website are 0.897 and 0.851.
These scores were used to calculate the F1 score for predicting a real website which is
0.874. Similarly, the precision and recall for predicting a phishing website are 0.851
and 0.896. Using these both, the F1 score was measured for predicting a phishing
website which is 0.873. The scores for these are significantly lower than support
vector machine.
Table 5 presents the classification report of the linear support vector classifier. The
precision and recall for predicting a real website are 0.900 and 0.970. These scores
were used to calculate the F1 score for predicting a real website which is 0.933.
Similarly, the precision and recall for predicting a phishing website are 0.965 and
0.885. Using these both scores, the F1 score was measured for predicting a phishing
website that is 0.923. The F1 scores in here are significantly higher than support
vector classifier.
Table 6 represents the classification report of KNN. The precision and recall for
predicting a real website are 0.854 and 0.905, giving an F1 score of 0.879. Similarly,
the precision and recall for predicting a phishing website are 0.893 and 0.836, giving
an F1 score of 0.864. The F1 scores here are significantly lower than those of the
linear support vector classifier.

Table 4 Nu-SVC classification report

   Precision  Recall  F1
1  0.897      0.851   0.874
0  0.851      0.896   0.873

Table 5 Linear SVC classification report

   Precision  Recall  F1
1  0.900      0.970   0.933
0  0.965      0.885   0.923

Table 6 KNN classifier classification report

   Precision  Recall  F1
1  0.854      0.905   0.879
0  0.893      0.836   0.864

Table 7 Logistic regression classification report

   Precision  Recall  F1
1  0.897      0.898   0.898
0  0.892      0.891   0.892

Table 8 Logistic regression CV classification report

   Precision  Recall  F1
1  0.937      0.948   0.942
0  0.944      0.932   0.938

Table 7 presents the classification report of logistic regression. The precision and
recall for predicting a real website are 0.897 and 0.898, giving an F1 score of 0.898.
Similarly, the precision and recall for predicting a phishing website are 0.892 and
0.891, giving an F1 score of 0.892. The precision, recall, and F1 scores for both 1 and
0 are remarkably close to each other, indicating that this algorithm balances precision
and recall well. However, the F1 scores are still lower than those of linear SVC, so it
cannot be considered the best model.
Table 8 presents the classification report of logistic regression CV, where CV
stands for cross-validation. The precision and recall for predicting a real website are
0.937 and 0.948. These scores were used to calculate the F1 score for predicting a real
website which is 0.942. Similarly, the precision and recall for predicting a phishing
website are 0.944 and 0.932. Using these both scores, the F1 score was measured for
predicting a phishing website which is 0.938. The precision, recall, and F1 scores for
both 1 and 0 are remarkably close to each other, indicating that this algorithm works
well for both precision and recall. The F1 scores in here were better than linear SVC
so this is the best score so far.
Table 9 presents the classification report of stochastic gradient descent (SGD). The
precision and recall for predicting a real website are 0.966 and 0.826. These scores
were used to calculate the F1 score for predicting a real website which is 0.891.
Similarly, the precision and recall for predicting a phishing website are 0.841 and
0.969. Using these both scores, the F1 score was measured for predicting a phishing
website which is 0.900. The F1 scores in here were lower than logistic regression
CV so it was also rejected.
Table 10 represents the classification report of the random forest classifier. The
precision and recall for predicting a real website are 0.977 and 0.984, and these
scores give an F1 score of 0.980 for predicting a real website. Similarly, the precision
and recall for predicting a phishing website are 0.983 and 0.975.

Table 9 SGD classifier classification report

   Precision  Recall  F1
1  0.966      0.826   0.891
0  0.841      0.969   0.900

Table 10 Random forest classifier classification report

   Precision  Recall  F1
1  0.977      0.984   0.980
0  0.983      0.975   0.979

Table 11 Bagging classifier classification report

   Precision  Recall  F1
1  0.972      0.977   0.974
0  0.975      0.971   0.973

Table 12 Extra trees classifier classification report

   Precision  Recall  F1
1  0.984      0.979   0.982
0  0.978      0.984   0.981

Using both of these scores, the F1 score for predicting a phishing website is 0.979.
Here, the precision, recall, and F1 scores are remarkably higher than those of all the
other models; hence, this is considered the best one yet.
Table 11 presents the classification report of the bagging classifier. The precision and
recall for predicting a real website are 0.972 and 0.977, giving an F1 score of 0.974.
Similarly, the precision and recall for predicting a phishing website are 0.975 and
0.971, giving an F1 score of 0.973. Here, the precision, recall, and F1 scores are
remarkably higher than those of all the other models except the random forest
classifier.
Table 12 represents the classification report of the extra trees classifier. The precision
and recall for predicting a real website are 0.984 and 0.979, giving an F1 score of
0.982. Similarly, the precision and recall for predicting a phishing website are 0.978
and 0.984, giving an F1 score of 0.981. Here, the precision, recall, and F1 scores are
the highest, making this the best result of all.
It can be noticed from the classification reports summarized in Table 13 that all of the
tree-ensemble classifiers (random forest, bagging, and extra trees) did remarkably well
at detecting both phishing websites and real websites.

5 Conclusion

This study provides a detailed, in-depth explanation of several machine learning
techniques and their performance when applied to a dataset of website data in order
to detect phishing websites.

Table 13 Comparative classification report

Classifier              Class  Precision  Recall  F1
SVC                     1      0.920      0.898   0.909
                        0      0.895      0.917   0.906
Nu-SVC                  1      0.897      0.851   0.874
                        0      0.851      0.896   0.873
Linear SVC              1      0.900      0.970   0.933
                        0      0.965      0.885   0.923
KNN                     1      0.854      0.905   0.879
                        0      0.893      0.836   0.864
Logistic regression     1      0.897      0.898   0.898
                        0      0.892      0.891   0.892
Logistic regression CV  1      0.937      0.948   0.942
                        0      0.944      0.932   0.938
SGD                     1      0.966      0.826   0.891
                        0      0.841      0.969   0.900
Random forest           1      0.977      0.984   0.980
                        0      0.983      0.975   0.979
Bagging                 1      0.972      0.977   0.974
                        0      0.975      0.971   0.973
Extra trees             1      0.984      0.979   0.982
                        0      0.978      0.984   0.981

The techniques are not only described in detail; the paper also shows how each of the
models performs, using plotted charts to demonstrate and compare the individual
algorithms.
This report aims to give its readers a conclusive analysis of these methods and to
verify our observations regarding the random forest classifier's optimal performance.
The graphs and details added to this paper are intended to help others carry out
further experimentation, progressing from where this work concludes.
We intend to carry on the proposed research with further modifications to the
dataset and to apply other machine learning techniques with modified parameters,
hopefully opening up more possibilities for improving the global defense against
cyber attackers.

References

1. Da Silva JAT, Al-Khatib A, Tsigaris P (2020) Spam e-mails in academia: issues and costs.
Scientometrics 122:1171–1188
2. Mironova SM, Simonova SS (2020) Protection of the rights and freedoms of minors in the
digital space. Russ J Criminol 14:234–241
3. Sethuraman SC, Vijayakumar V, Walczak S (2020) Cyber attacks on healthcare devices using
unmanned aerial vehicles. J Med Syst 44:10
4. Tuan TA, Long HV, Son L, Kumar R, Priyadarshini I, Son NTK (2020) Performance evaluation
of Botnet DDoS attack detection using machine learning. Evol Intell 13:283–294
5. Azeez NA, Salaudeen BB, Misra S, Damasevicius R, Maskeliunas R (2020) Identifying
phishing attacks in communication networks using URL consistency features. Int J Electron
Secur Digit Forensics 12:200–213
6. Iwendi C, Jalil Z, Javed AR, Reddy GT, Kaluri R, Srivastava G, Jo O (2020) KeySplitWater-
mark: zero watermarking algorithm for software protection against cyber-attacks. IEEE Access
8:72650–72660
7. Liu XW, Fu JM (2020) SPWalk: similar property oriented feature learning for phishing
detection. IEEE Access 8:87031–87045
8. Parra GD, Rad P, Choo KKR, Beebe N (2020) Detecting internet of things attacks using
distributed deep learning. J Netw Comput Appl 163:13
9. Tan CL, Chiew KL, Yong KSC, Sze SN, Abdullah J, Sebastian Y (2020) A graph-theoretic
approach for the detection of phishing webpages. Comput Secur 95:14
10. Anwar S, Al-Obeidat F, Tubaishat A, Din S, Ahmad A, Khan FA, Jeon G, Loo J (2020)
Countering malicious URLs in internet of things using a knowledge-based approach and a
simulated expert. IEEE Internet Things J 7:4497–4504
11. Ariyadasa S, Fernando S, Fernando S (2020) Detecting phishing attacks using a combined
model of LSTM and CNN. Int J Adv Appl Sci 7:56–67
12. Bozkir AS, Aydos M (2020) LogoSENSE: a companion HOG based logo detection scheme
for phishing web page and E-mail brand recognition. Comput Secur 95:18
13. Gupta BB, Jain AK (2020) Phishing attack detection using a search engine and heuristics-based
technique. J Inf Technol Res 13:94–109
14. Sonowal G, Kuppusamy KS (2020) PhiDMA—a phishing detection model with multi-filter
approach. J King Saud Univ Comput Inf Sci 32:99–112
15. Zamir A, Khan HU, Iqbal T, Yousaf N, Aslam F, Anjum A, Hamdani M (2020) Phishing web
site detection using diverse machine learning algorithms. Electron Libr 38:65–80
16. Rodriguez GE, Torres JG, Flores P, Benavides DE (2020) Cross-site scripting (XSS) attacks
and mitigation: a survey. Comput Netw 166:23
17. Das A, Baki S, El Aassal A, Verma R, Dunbar A (2020) SoK: a comprehensive reexamination
of phishing research from the security perspective. IEEE Commun Surv Tutor 22:671–708
18. Adewole KS, Hang T, Wu WQ, Songs HB, Sangaiah AK (2020) Twitter spam account detection
based on clustering and classification methods. J Supercomput 76:4802–4837
19. Rao RS, Vaishnavi T, Pais AR (2020) CatchPhish: detection of phishing websites by inspecting
URLs. J Ambient Intell Humaniz Comput 11:813–825
20. Shabudin S, Sani NS, Ariffin KAZ, Aliff M (2020) Feature selection for phishing website
classification. Int J Adv Comput Sci Appl 11:587–595
21. Raja SE, Ravi R (2020) A performance analysis of software defined network based prevention
on phishing attack in cyberspace using a deep machine learning with CANTINA approach
(DMLCA). Comput Commun 153:375–381
22. Sarma D (2012) Security of hard disk encryption. Masters Thesis, Royal Institute of Technology,
Stockholm, Sweden. Identifiers: urn:nbn:se:kth:diva-98673 (URN)
23. Alqahtani H et al (2020) Cyber intrusion detection using machine learning classifica-
tion techniques. In: Computing science, communication and security, pp 121–31. Springer,
Singapore

24. Hossain S, et al (2019) A belief rule based expert system to predict student performance under
uncertainty. In: 2019 22nd international conference on computer and information technology
(ICCIT), pp 1–6. IEEE
25. Ahmed F et al (2020) A combined belief rule based expert system to predict coronary artery
disease. In: 2020 international conference on inventive computation technologies (ICICT), pp
252–257. IEEE
26. Hossain S et al (2020) A rule-based expert system to assess coronary artery disease under uncer-
tainty. In: Computing science, communication and security, Singapore, pp 143–159. Springer,
Singapore
27. Hossain S et al (2020) Crime prediction using spatio-temporal data. In: Computing science,
communication and security. Springer, Singapore, pp 277–289
Toxic Comment Classification
Implementing CNN Combining Word
Embedding Technique

Monirul Islam Pavel, Razia Razzak, Katha Sengupta,


Md. Dilshad Kabir Niloy, Munim Bin Muqith, and Siok Yee Tan

Abstract With the advancement of technology, the virtual world and social media
have become an important part of people’s everyday lives. Social media allows people
to connect, share their emotions and discuss various subjects, yet it has also become
a place for cyberbullying, personal attacks, online harassment, verbal abuse and
other kinds of toxic comments. Even the top social media platforms still lack fast
and accurate classification for removing such toxic comments automatically. In
this paper, a methodology combining a convolutional neural network (CNN) and
natural language processing (NLP) is proposed which first separates toxic from non-toxic
comments and then classifies and labels them into six types, based on the
dataset of Wikipedia's talk page edits collected from Kaggle. The proposed architecture
applies data preprocessing with data cleaning, NLP techniques such as
tokenization and stemming, and conversion of words into vectors by word embedding
techniques. Combining the preprocessed dataset with the best-performing word
embedding method, the CNN model scores an ROC-AUC of 98.46% and 98.05% accuracy
for toxic comment classification, which is higher than the compared existing
works.

M. I. Pavel (B) · S. Y. Tan


Center for Artificial Intelligence Technology, Faculty of Information Science and Technology,
The National University of Malaysia, 43600 Bangi, Selangor, Malaysia
e-mail: P104619@siswa.ukm.edu.my
S. Y. Tan
e-mail: esther@ukm.edu.my
R. Razzak · K. Sengupta · Md. D. K. Niloy · M. B. Muqith
Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh
e-mail: razia.razzak@g.bracu.ac.bd
K. Sengupta
e-mail: katha.sengupta@g.bracu.ac.bd
Md. D. K. Niloy
e-mail: md.dilshad.kabir.niloy@g.bracu.ac.bd
M. B. Muqith
e-mail: munim.bin.muquith@g.bracu.ac.bd


Keywords Toxic comment · Classification · Word embedding · Tokenization ·
Convolution neural networking · fastText · Natural language processing

1 Introduction

Social media today, on account of its availability and accessibility, has become a
main arena for discussion, information sharing and queries.
It has given every person more scope than ever to express themselves and has enhanced
communication and sharing on online platforms. Unfortunately, it is also
turning into a platform for hate speech and verbal attacks, even putting people who
support diversity in race, ethnicity, gender and sexual orientation at risk of violence.
Cyberbullying and harassment have become serious issues that affect a wide range
of people, sometimes inflicting severe psychological problems such as depression
or even suicide. Abusive online content can fall into more than one toxic category,
such as hate, threats, or identity-based insults [1]. According to the
2014 poll [2] of the PEW Research Institute, 73% of people on the Internet have seen
someone being harassed online, 45% of Internet users have themselves been harassed and 45%
were exposed to substantial harassment. At the same time, more than 85% of the comments
in such databases are completely non-toxic, and concentrated toxicity is rarely seen on
Wikipedia. Compared to 2010, teenagers were 12% [3] more likely to be subjected to
cyberbullying, which clearly indicates the negative side of social media. Corporations and
social media platforms are trying to track down abusive comments toward users and
are also looking for ways to automate this process. This paper utilizes deep
learning to examine whether social media comments are abusive or not, and to
classify them further into categories such as toxic, severely toxic, obscene, threatening,
insulting and identity hate. In this paper, we use a convolutional neural network (CNN)
combined with natural language processing (NLP) techniques and word embeddings,
without requiring any syntactic or semantic expertise. We have evaluated our models
using accuracy tests to see how well they perform.
The rest of the paper is arranged as follows: Sect. 2 addresses the relevant work in
this area; Sect. 3 outlines the proposed methodology; Sect. 4 presents the experimental
analysis alongside the implementation process and results; and Sect. 5 concludes the
research work and discusses further scope for development.

2 Related Works

With the massive increment of Internet and social media; toxic comments, cyberbul-
lying and verbal abuse have become a major issue of concern and several studies have
been conducted to resolve this by adopting classification techniques. Georgakopoulos

et al. [4] used a CNN model to solve the toxic comment classification problem on
the same dataset we use in our methodology. In their solution, they worked with a
balanced subset of the data without tackling the imbalanced dataset, and they
performed only binary classification to identify whether comments are toxic or not,
without predicting the toxicity level. To improve on this, Saeed et al. [5] applied deep
neural network architectures with good accuracy; one of the strengths of that work is
that their classification framework does not need any laborious text preprocessing. They
used CNN-1D, CNN-V, CNN-I, BiLSTM and BiGRU models, analyzed them using
F1, ROC-AUC, precision and recall scores, and reported that Bi-GRU showed the
highest F1 score and performed strongly in precision and recall. In another work, the authors
of [6] demonstrated a capsule network-based toxic comment classifier implemented
as a single-model capsule network with focal loss. Their
model scored 98.46% accuracy with an RNN classifier on the TRAC dataset, a
combined Hindi-English dataset of toxic and non-toxic comments. Furthermore,
Kandasamy et al. [7] adopted natural language processing (NLP) integrated
with URL analysis and supervised machine learning techniques on social media
data, scoring 94% accuracy. Anand et al. [8] presented different deep learning
techniques such as convolutional neural networks (CNN), ANN and long short-term
memory cells (LSTM), with and without GloVe word embeddings, where a pre-trained
GloVe model is applied for classification. Most of these research works address the
binary classification of toxic and non-toxic comments, but labeling the classes of toxic
comments after identification is still missing in previous works. To solve this issue,
we propose a methodology that classifies toxic comments and also predicts their
toxicity based on classes.

3 Methodology

The workflow of the proposed methodology is presented in the flowchart of Fig. 1.
The dataset is first loaded and split into training and testing sets. It then passes
through a preprocessing stage that combines data cleaning, tokenization and stemming
before being fed to word embedding. We compared three well-known word embedding
techniques and combined the most accurate one with the CNN classifier. The CNN
architecture first performs binary classification to determine whether comments are
toxic or non-toxic, and finally the subclass toxicity levels are predicted for the
comments that are toxic.

Fig. 1 Proposed workflow

3.1 Data Preprocessing

3.1.1 Data Cleaning

To reduce irregularity in the dataset, data cleaning is needed to achieve better
outcomes and faster processing. We adopt various data cleaning steps such as stop
word removal [9], punctuation removal, case folding (converting all words to lower
case), and removal of duplicate words, URLs, emojis and emoji short codes, numbers,
single-character words and symbols. There are many words that are used only to give
a sentence its grammatical form and carry little meaning on their own, such as I, is,
are, a, the and of. These words need to be removed from the token list because they
add no value for classification. They are mainly pronouns, conjunctions and relational
words, and together they make up a list of almost 500 English stop words. The Natural
Language Toolkit (NLTK) [10], a Python library that supports many different
languages, is used in our model for this purpose to obtain better classification and
more accurate results. A short cleaning sketch follows.
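
A hedged sketch of this cleaning stage with NLTK and regular expressions; the exact cleaning rules below are illustrative and do not claim to reproduce the authors' exact pipeline.

import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
stop_words = set(stopwords.words("english"))

def clean(text):
    text = text.lower()                            # case folding
    text = re.sub(r"http\S+", " ", text)           # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)          # remove punctuation, numbers, symbols, emoji codes
    tokens = [w for w in text.split() if w not in stop_words and len(w) > 1]
    return " ".join(tokens)                        # stop words and one-character words dropped

print(clean("We HATE this toxic comment!!! http://example.com :) 123"))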

3.1.2 Tokenization

Tokenization is one of the most common and important parts of NLP: a sentence is
separated or split into individual words [11], each of which is considered a token.
Figure 2 shows how a sentence is broken down into this segmented form. Here, the
fastText model is later used to map each word to a vector. In the first step, words are
separated out of a larger sentence or piece of content, for example ["We hate toxic
comment"] becomes ["we", "hate", "toxic", "comment"].

Fig. 2 Process of tokenization

In the second step, the words are mapped to numbers to represent them as word
vectors. The embedding mainly compares groups of word vectors in vector space and
finds mathematical similarities, such as man to boy and woman to girl.
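
A minimal tokenization sketch that reproduces the example above; NLTK's word_tokenize is used here purely for illustration, since the paper does not name a specific tokenizer.

import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt")
print(word_tokenize("We hate toxic comment"))   # ['We', 'hate', 'toxic', 'comment']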

3.1.3 Stemming

In English, words take on different senses in their various inflected forms, and similar
words are often used across parts of speech to describe a variety of things. To find the
root form, also called the lemma, stemming is used to prepare these words by removing
or reducing inflections, for example in playing, played and playful. Such forms carry
suffixes, prefixes, tense, gender and other grammatical markers. Moreover, when a
group of words is compared and the resulting root differs in kind, it is treated as a
different category of that word rather than the same lemma. Lemmatization is used in
our model for better output [12].
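
A small sketch contrasting a stemmer with a lemmatizer on the inflected forms mentioned above; the choice of PorterStemmer and WordNetLemmatizer is an assumption, since the paper does not name specific implementations.

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet")
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for word in ["playing", "played", "playful"]:
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))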

3.2 Word Embedding

Word embedding is used to learn vector representations of words, typically
constructed using neural networks. It works by mapping semantic information about
words into a geometric embedding space, so that similar words receive similar vector
representations [13]. Table 1 shows the word embedding methods that were applied
and tested, together with their details; a loading sketch is given after the table.

Table 1 Comparison of word embedding techniques

Method    Token (billion)  Details                                                              References
fastText  16               Created from Wikipedia 2017, the UMBC Web base corpus and the        [14, 15]
                           statmt.org news dataset
Word2Vec  100              A pre-trained dataset created from the Google news dataset           [16]
GloVe     6                Created from Gigaword 5 and Wikipedia 2014                           [17]
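
A hedged sketch of how pre-trained vectors such as those in Table 1 can be loaded into an embedding matrix aligned with a tokenizer's word index. The 300-dimensional vector size and the plain-text .vec file format are assumptions consistent with the publicly released fastText vectors, not details stated in the paper.

import numpy as np

EMBED_DIM = 300

def build_embedding_matrix(vec_path, word_index):
    """Map each word in word_index to its pre-trained vector (zeros if unseen)."""
    vectors = {}
    with open(vec_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == EMBED_DIM + 1:          # skips the header line of .vec files
                vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix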

3.3 CNN Architecture for Classification

Convolutional neural networks (CNN) have been commonly applied to image
classification [18] problems because of their capacity to exploit two statistical properties,
"local stationarity" and "compositional structure". To implement CNN for
toxic comment classification [19–22], each sentence must first be encoded before being fed [23]
to the CNN architecture: a vocabulary index is built over the words in the text corpus,
and each word is mapped to an integer. Afterward, padding is used to fill the document
matrix with zeros up to the maximum length, since the CNN architecture needs constant
input dimensionality. The next stage translates the encoded documents into matrices in
which each row corresponds to a single term. The generated matrices pass through the
embedding layer, where a dense vector transforms each term (row) into a low-dimensional
representation [24]. The operation then proceeds as in a standard CNN.
The word embedding technique is chosen to provide this low-dimensional representation
of each word; the embedding uses fixed dense word vectors generated with models
such as fastText, Word2Vec and GloVe, which are described in the previous section.
Our CNN architecture uses 128 convolution filters with kernel size 5 over the word
embeddings, along with a 50-unit fully connected (dense) layer. Figure 3 shows the
setup of the CNN for our toxic comment classification model.
The proposed CNN architecture is designed with 10 layers, as shown in Fig. 4. It
begins with an input layer into which the dataset is fed, followed by an embedding layer
pre-trained with the chosen word embedding technique, a convolution layer that learns
feature maps capturing relationships between nearby elements, and a max pooling
layer that reduces dimensionality by taking the maximum value in each segment. Two
dropout regularization layers reduce the problem of overfitting; two dense layers
follow, where the first learns the weights of the input to identify outputs and the second
refines them; a flatten (fully connected) layer and, finally, one output layer generate
the predicted class.
To train our model, we adopt the Adam optimizer [25] and binary cross-entropy
loss, evaluating with binary accuracy in the first phase and then proceeding with
multi-class classification for toxicity labeling. Owing to the high computational
demand, we train for four epochs.

Fig. 3 Setup visualization of CNN architecture

Fig. 4 Proposed convolutional neural network architecture

The training data are separated into mini-batches of 64 examples, with 70% of the
data used for training and 30% for testing. A minimal sketch of this setup is given below.
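
A minimal Keras sketch of the architecture described above. The filter count (128), kernel size (5), and 50-unit dense layer follow the text; the sequence length, vocabulary size, dropout rates, and the six sigmoid output units for the multi-label stage are assumptions (a single sigmoid unit would be used for the binary toxic/non-toxic phase).

from tensorflow.keras import layers, models

MAX_LEN, VOCAB_SIZE, EMBED_DIM = 200, 50000, 300   # assumed input dimensions

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),       # pre-trained fastText weights can be loaded here
    layers.Conv1D(128, 5, activation="relu"),      # feature maps over windows of 5 word vectors
    layers.MaxPooling1D(5),                        # keep the strongest response per segment
    layers.Dropout(0.5),                           # regularization against overfitting
    layers.Flatten(),
    layers.Dense(50, activation="relu"),           # fully connected layer
    layers.Dropout(0.5),
    layers.Dense(6, activation="sigmoid"),         # one probability per toxicity class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, batch_size=64, epochs=4)   # mini-batches of 64, four epochs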

4 Experimental Analysis

In this section, first, we describe the used dataset from Kaggle and visualize the cate-
gories with their correlations. After that, the performance analysis of the proposed
system on this dataset for toxic comment classification is shown. Finally, a demon-
stration of the proposed methodology on random toxic comments is presented for
leveling the toxic categories. For this experimental analysis, we used a computer
with an AMD Ryzen 5 processor, 16 GB RAM, a 256 GB SSD, and an Nvidia GTX 1665
GPU, and the code was written in Python 3.6 using the Spyder IDE in Anaconda.

4.1 Dataset

The dataset used in our research is acquired from Kaggle: the popular, publicly
available dataset named "Wikipedia Talk Page Comments annotated with toxicity
reasons" [26], which contains almost 160,000 manually labeled comments. The
dataset contains six classes in total (toxic, severe_toxic, obscene, insult, threat,
identity_hate), which are shown in Fig. 5.
The correlation matrix in Fig. 6 shows that "toxic" comments are most strongly
correlated with the "insult" and "obscene" classes, while "toxic" and "threat" have
only a weak correlation. "Obscene" and "insult" comments are also highly correlated
with each other, which makes intuitive sense. The matrix also shows that the class
"threat" has the weakest correlation with all other classes.
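
The inter-class correlations visualized in Fig. 6 can be reproduced directly from the label columns of the Kaggle training file; the file name "train.csv" is an assumption.

import pandas as pd

labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
train = pd.read_csv("train.csv")        # Wikipedia talk-page comments with manual labels
print(train[labels].corr())             # pairwise correlations between the six classes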

Fig. 5 Data representation in bar chart (class contain wise)



Fig. 6 Visual representation of correlation between classes

4.2 Classification Evaluation

After preprocessing with tokenization and stemming, and after word embedding, we
used the CNN model with the fastText embedding technique for binary classification
in the initial stage. We utilize three separate convolution structures at the same time,
with a dense vector dimension of 300 and 128 filters. For each convolutional layer, the
filter width is equal to the vector dimension, and its height was 3, 4 and 5, respectively.
A pooling operation is applied after each convolutional layer, a fully connected layer
is attached to the output of the pooling layers, and a softmax function forms the final
layer. Finally, we trained for four epochs: in the first epoch, the model loss was 6.08%
and the ROC-AUC was 98.46%; the loss then decreased gradually from the second
epoch (3.63%) to 2.71% and 2.6% in the final epoch, while the validation AUC
reached a maximum of 98.63% for toxic comment classification.
In this experiment, the AUC reached a maximum of 98.63% for toxic comment
classification. Figure 7 presents the training and testing loss for each epoch, showing
that the training loss decreases from 0.0618 and settles at 0.0368. Furthermore,
Table 2 demonstrates toxicity labeling on some random vulgar and toxic comments,
showing the predicted toxicity for each of the six classes.

Fig. 7 Loss function on each epoch for train and test set

Table 2 Toxic comment labeling

Sentence               Toxic (%)  Severe Toxic (%)  Obscene (%)  Threat (%)  Insult (%)  Identity Hate (%)
"Just go and die"      98         4                 40           40          67          1
"I will kill you"      100        10                10           96          13          0
"you are a bloody"     100        4                 100          0           99          0
"Good morning"         0          0                 0            0           0           0
"you are ugly"         93         0                 0            0           48          0
"I will break your"    86         0                 0            68          1           0
"you, son of bitch"    100        1                 100          0           99          0
"Stupid girl!!"        97         0                 40           0           71          0
"You are a jackass"    100        0                 97           0           96          0
"We hate you fool"     96         0                 29           0           81          58
"Nigga, get lost"      99         0                 62           0           68          89

Table 3 Accuracy comparison

References      Method                      Accuracy (%)
[2]             CNN and bidirectional GRU   97.93
[27]            SVM and TF-IDF              93.00
[28]            Logistic regression         89.46
[1]             GloVe and CNN and LSTM      96.95
Proposed model  CNN with fastText           98.05

The system labels each classified toxic comment with subclass predictions; some
sentences fall into multiple classes, while others score high in one specific class, which
makes sense.
Table 3 compares other proposed works and our methodology. The authors of [2]
used a CNN and bidirectional GRU model and achieved 97.93% accuracy, [27]
implemented SVM with TF-IDF and obtained 93.00% test accuracy, [28] scored
89.46% applying logistic regression, and [1] obtained 96.95% accuracy by implementing
GloVe word embedding and combining CNN and LSTM. Our proposed methodology
with fastText word embedding outperforms these works, showing 98.05% accuracy
and a 98.46% ROC-AUC score. We adopted the fastText word embedding technique
because it shows the highest accuracy for our model: with GloVe the accuracy was
96.68% and with Word2Vec 93.45%.

5 Conclusion

In this paper, we present a toxic comment classification system, a vital need as social
media grows, since preventing cyberbullying and vulgar or toxic comments is still
challenging. We successfully achieve higher accuracy compared with other existing
works by implementing a CNN with the fastText word embedding technique after
natural language processing that includes data cleaning (e.g., stop word removal),
tokenization and stemming. The system first classifies comments as toxic or non-toxic
with 98.05% accuracy and a 98.46% ROC-AUC score. Following that, it labels the
toxic comments with the remaining subclasses. Thus, the proposed work not
only identifies toxic comments but also clarifies which subclasses they may belong to,
which is essential for practical deployment.
Although the accuracy of the proposed methodology is high, it can be improved
further by increasing the size of the dataset, addressing the imbalance in class
distributions and quantities, and training on more cases. Further, we plan to deploy
it in online teaching platform chat boxes and on social media, as these two platforms
generate a major amount of toxic comments.

References

1. Anand M, Eswari R (2019) Classification of abusive comments in social media using


deep learning. In: 2019 3rd international conference on computing methodologies and
communication (ICCMC), pp 974–977
2. Ibrahim M, Torki M, El-Makky N (2018) Imbalanced toxic comments classification using data
augmentation and deep learning. In: 2018 17th IEEE international conference on machine
learning and applications (ICMLA), pp 875–878
3. Van Hee C, Jacobs G, Emmery C, Desmet B, Lefever E, Verhoeven B et al (2018) Automatic
detection of cyberbullying in social media text. PLoS ONE 13(10):e0203794
4. Georgakopoulos SV, Tasoulis SK, Vrahatis AG, Plagianakos VP (2018) Convolutional neural
networks for toxic comment classification. In: Proceedings of the 10th hellenic conference on
artificial intelligence, pp 1–6
5. Saeed HH, Shahzad K, Kamiran F (2018, November) Overlapping toxic sentiment classifica-
tion using deep neural architectures. In: 2018 IEEE international conference on data mining
workshops (ICDMW), pp 1361–1366
6. Srivastava S, Khurana P, Tewari V (2018, August) Identifying aggression and toxicity in
comments using capsule network. In: Proceedings of the first workshop on trolling, aggression
and cyberbullying (TRAC-2018), pp 98–105
7. Kandasamy K, Koroth P (2014) An integrated approach to spam classification on Twitter using
URL analysis, natural language processing and machine learning techniques. In: 2014 IEEE
students’ conference on electrical, electronics and computer science, pp 1–5. IEEE
8. Anand M, Eswari R (2019, March) Classification of abusive comments in social media
using deep learning. In: 2019 3rd international conference on computing methodologies and
communication (ICCMC), pp 974–977
9. Uysal AK, Gunal S (2014) The impact of preprocessing on text classification. Inf Process
Manage 50(1):104–112
10. Hardeniya N, Perkins J, Chopra D, Joshi N, Mathur I (2016) Natural language processing:
python and NLTK. Packt Publishing Ltd
11. Orbay A, Akarun L (2020) Neural sign language translation by learning tokenization. arXiv
preprint arXiv:2002.00479
12. Hidayatullah AF, Ratnasari CI, Wisnugroho S (2016) Analysis of stemming influence on
indonesian tweet classification. Telkomnika 14(2):665
13. Yang X, Macdonald C, Ounis I (2018) Using word embeddings in twitter election classification.
Inform Retriev J 21(2–3):183–207
14. Santos I, Nedjah N, de Macedo Mourelle L (2017, November) Sentiment analysis using convo-
lutional neural network with fastText embeddings. In: 2017 IEEE Latin American conference
on computational intelligence (LA-CCI), pp 1–5
15. Wang Y, Wang J, Lin H, Tang X, Zhang S, Li L (2018) Bidirectional long short-term memory
with CRF for detecting biomedical event trigger in FastText semantic space. BMC Bioinform
19(20):507
16. Lilleberg J, Zhu Y, Zhang Y (2015, July) Support vector machines and word2vec for text
classification with semantic features. In: 2015 IEEE 14th international conference on cognitive
informatics & cognitive computing (ICCI* CC), pp 136–140
17. Chowdhury HA, Imon MAH, Islam MS (2018, December) A comparative analysis of word
embedding representations in authorship attribution of bengali literature. In: 2018 21st
international conference of computer and information technology (ICCIT), pp 1–6
18. Pavel MI, Akther A, Chowdhury I, Shuhin SA, Tajrin J (2019) Detection and recognition of
Bangladeshi fishes using surf and convolutional neural network. Int J Adv Res 7: 888–899
19. Risch J, Krestel R (2020) Toxic comment detection in online discussions. In: Deep learning-
based approaches for sentiment analysis, pp 85–109
20. Jacovi A, Shalom OS, Goldberg Y (2018) Understanding convolutional neural networks for
text classification. arXiv preprint arXiv:1809.08037

21. Wang S, Huang M, Deng Z (2018, July) Densely connected CNN with multi-scale feature
attention for text classification. IJCAI 4468–4474
22. Carta S, Corriga A, Mulas R, Recupero DR, Saia R (2019, September) A supervised multi-
class multi-label word embeddings approach for toxic comment classification. In: KDIR, pp
105–112
23. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language
processing (almost) from scratch. J Mach Learn Res 12:2493–2537
24. Gal Y, Ghahramani Z (2016) A theoretically grounded application of dropout in recurrent
neural networks. In: Advances in neural information processing systems, pp 1019–1027
25. Zhang Z (2018, June) Improved adam optimizer for deep neural networks. In: 2018 IEEE/ACM
26th international symposium on quality of service (IWQoS), pp 1–2
26. Toxic Comment Classification Challenge. (n.d.). Retrieved February 9, 2020, from https://
www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
27. Dias C, Jangid M (2020) Vulgarity classification in comments using SVM and LSTM. In:
Smart systems and IoT: Innovations in computing, pp 543–553. Springer, Singapore
28. Kajla H, Hooda J, Saini G (2020, May) Classification of online toxic comments using machine
learning algorithms. In: 2020 4th international conference on intelligent computing and control
systems (ICICCS), pp 1119–1123
A Comprehensive Investigation About
Video Synopsis Methodology
and Research Challenges

Swati Jagtap and Nilkanth B. Chopade

Abstract With enormous growth in video surveillance technology, the challenges in
terms of data retrieval, monitoring, and browsing have increased. A smarter
solution for this is a video synopsis technique that represents prolonged video in a
compact form based on the object trajectories rather than the key frame approach. It
converts long video footage into shorter video form while preserving all the activities
of the original video. The object trajectories are shifted in time domain as well as in
spatial domain to offer the maximum compactness while maintaining the sequence
of original source video. This paper gives a brief overview of the different approaches,
evaluation parameters, and datasets used to assess the quality of synopsis video.
The main objective is to investigate query-based video synopsis, useful for data
retrieval through the activity clustering step of the synopsis framework, which will also
help to solve societal problems.

Keywords Video synopsis · Video surveillance · Object detection and
segmentation · Optimization · Activity clustering

1 Introduction

The exponential increase in technological enrichment demands the need for video
surveillance almost in all areas. Video surveillance plays an important role in terms
of security, for monitoring processes, transport, public security, the education field,
and many more [1]. There are, however, some challenges of video surveillance that
need to be addressed. The enormous amount of data produced is hard to monitor continuously, and
processing of this data within a short period of time is a major challenge [2]. As the
surveillance camera is continuously tracking the events, there is a huge requirement

S. Jagtap (B) · N. B. Chopade


Department of Electronics and Telecommunication, Pimpri Chinchwad College of Engineering,
Pune, India
e-mail: swatialjagtap@gmail.com
N. B. Chopade
e-mail: nilkanth.chopade@pccoepune.org


of memory for storage. Thus, data browsing for certain activities from this data will
take hours or days. Video browsing therefore becomes a tedious and time-consuming
task, and as a result most of the videos are never watched. A probable solution is a
method that summarizes the video and can convert hours of footage into minutes. Such
methods are called video condensation and are further divided into frame-based and
object-based approaches. Video summarization is a frame-based method defined as
the process of building and presenting a detailed abstract view of the whole video
within the shortest time period. Video summarization can be categorized into two
types: a static video summary (storyboard) and a dynamic video summary (video
trailer). Static video summarization selects the key frames of a video sequence and is
mostly used for indexing, browsing, and retrieval [3].
A dynamic video summary consists of video skimming and video fast forward.
The video skimming method selects the smallest dynamic portions of audio and video,
called video skims, to generate the video summary [4]. The movie trailer is one of
the most popular video skims in practice [5]. Fast-forwarding methods process frames
depending on manual control or an automatic speed setting [6].
Video summarization methods condense the video data in the temporal domain
only; spatial redundancy is not considered for condensation, which reduces the
compression efficiency.
Figure 1 illustrates the difference between video summarization and video
synopsis. Video summarization extracts key frames based on features such as texture,
shape, and motion, so compression is achieved in the temporal domain only. Video
synopsis, in contrast, is an object-based compression algorithm that extracts the
activities from the original video and represents them in the form of tubes. Proper
rearrangement of the tubes, preserving the original chronological order, gives
compression in both the temporal and spatial domains, producing more compactness.
The paper is organized as follows: Sect. 2 covers the various video synopsis approaches.
The complete synopsis process flow and methodology are explained in Sect. 3. The
evaluation parameters and datasets are reviewed in Sect. 4. Section 5 covers the research
challenges in the field of video synopsis. Finally, Sect. 6 covers the conclusion and a
discussion of future research.

Fig. 1 a Video summarization extracting only the key frames. b Video synopsis displaying
multiple objects from different time intervals [7]

2 Video Synopsis Approaches

Video synopsis approaches can be categorized based on the optimization method, activity
clustering, and input domain. Optimization is the main step of video synopsis, which
gives a proper arrangement of tubes to achieve better efficiency. The optimization
methods can be categorized into two types, mainly online and offline methods.
In offline synopsis, all the moving objects are extracted and saved in memory along with
the instantaneous background before the optimization process. Therefore, there is a need
for huge memory and supporting hardware. In the optimization process, the main focus is
energy minimization. However, this large memory requirement makes the search for the
optimum solution more time-consuming. Also, manual control is needed to decide the
length of the video synopsis. To address this problem, Feng et al. [8] proposed an online
synopsis which can overcome the problems of offline optimization. In this method, moving
object detection and filling of the tubes are done in real time, which is applicable to
live video streams. There is no need for huge memory to store the objects; thus, the
method also reduces the computation cost. Most of the approaches use online optimization
to reduce the computation cost and the memory requirement. Activity clustering is another
approach that can be used to increase efficiency. It is an additional step added after
extracting the moving objects and representing them as tubes. The tubes can be further
categorized into different clusters, which can be used for smooth and quick video
retrieval and browsing. The input domain depends upon the camera topology and the data
domain used. The type of camera topology plays an important part in deciding the
computational complexity. The topology is classified as single camera or multicamera.
Most of the studies use a single camera to find the optimum solution and reduce the
computational complexity. A multicamera approach covers all cameras to find the optimal
solution, but increases the computational complexity. The input data domain can be
categorized as the pixel domain or the compressed domain. Some of the approaches use the
compressed domain as the input to reduce the computational complexity rather than
transforming the data into the pixel domain. Table 1 categorizes the different approaches
used in studies.
The table gives a brief indication of the research work that still needs to be
contributed in online optimization, activity clustering, the multicamera approach, and
the compressed domain. The online approach gives better performance by reducing the
computational cost. Activity clustering helps to improve the compression ratio.

Table 1 Classification of video synopsis approaches

Studies | Optimization type (online/offline) | Optimization method | Activity clustering | Camera topology (single/multicamera) | Input domain (pixel or compressed)
Sekh et al. [9] | Online | Energy minimization, YOLO9000 | Yes | Single | Pixel
Raut et al. [10] | Online | Dynamic graph coloring approach | No | Single | Pixel
Ghatak et al. [11] | Offline | HSATLBO optimization approach | No | Single | Pixel
Li et al. [12] | Offline | Group partition greedy optimization algorithm | No | Single | Pixel
Ra and Kim [13] | Online | Occupation matrices with FFT | No | Single | Pixel
He et al. [14] | Offline | Potential collision graph (PCG) | No | Single | Pixel
Yumin et al. [15] | Offline | Genetic algorithm | No | Single | Pixel
Ansuman et al. [7] | Offline | Table driven, contradictory binary graph coloring (CBGC) approach, and simulated annealing (SA) | Yes, simple MKL | Multicamera | Pixel
Balasubramanian et al. [16] | Offline | Energy minimization | Yes, SVM classifier | Single | Pixel
Zhong et al. [17] | Offline | Graph cut approach | No | Single | Compressed
Jianqing et al. [18] | Online | Tube filling algorithm, greedy optimization | No | Single | Pixel
Pritch et al. [19] | Offline | Simulated annealing method | Yes, SVM classifier | Single | Pixel
Zheng et al. [20] | Online | Simulated annealing method | No | Single | Compressed

3 Video Synopsis Framework and Methodology

Video synopsis is used to condense the size of the original video, which makes data
retrieval easy. Figure 2 describes the steps involved in the video synopsis process. The
initial step of video synopsis is to detect and track the moving objects. This is a
preprocessing step and is very important for further processing. The next step involves
activity clustering, in which clustering of the trajectories of the same object is done.
The step after that is the main part of the synopsis algorithm, called optimization. It
involves the optimal rearrangement of tubes for collision avoidance and to obtain the
compressed video. The tube rearrangement can be done based on the user's query, which
helps to target the synopsis video depending on the query given. After the optimum tube
rearrangement, the background is generated depending on the surveillance video, and the
rearranged tubes are stitched with the background to get a compact view of the original
video.

Fig. 2 Video synopsis observed process flow

3.1 Object Detection and Tracking

Object detection is the preliminary phase of any video synopsis process. Object detection
is followed by segmentation and tracking of the trajectories of the same object, which is
called an activity and is represented as a tube. There are many challenges in tracking
real-time objects. Occlusions and illumination variations are the primary challenges.
Object tracking for synopsis involves appropriate segmentation of the object in each
frame. The activities present in the original video should be detected and tracked
correctly; otherwise, a blinking effect is produced, that is, the sudden appearance and
disappearance of objects. Some of the approaches with the respective studies are listed
in Table 2.
The evaluation parameters of video synopsis are directly affected by the quality of
object detection and tracking. To address this challenge, Pappalardo et al. [23]
introduced a toolbox to generate an annotated dataset needed for testing.
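As an illustration of this stage, a minimal Python sketch using OpenCV's MOG2 background subtractor (one of the Gaussian-mixture methods listed in Table 2) is given below; the input file name, the blob-area threshold, and the use of OpenCV 4.x are assumptions for the example and are not taken from any particular study.

```python
import cv2

cap = cv2.VideoCapture("surveillance.mp4")           # hypothetical surveillance clip
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

detections = []                                       # per-frame list of foreground bounding boxes
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                    # raw foreground mask (shadows marked as 127)
    mask = cv2.medianBlur(mask, 5)                    # suppress isolated noise pixels
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x returns (contours, hierarchy); keep blobs larger than an area threshold
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
    detections.append(boxes)
cap.release()
# Linking these per-frame boxes over time (e.g., with a Kalman filter) yields the tubes.
```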

Table 2 Object detection and tracking algorithms

Studies | Object detection and tracking algorithms | Merits | Demerits
Ahmed et al. [9] | Mixture of Gaussians (MoG), Gaussian mixture model (GMM) | Self-adaptive to illumination variation within short temporal segments | Does not give good results in dense foreground
Raun et al. [10] | Sticky tracking, visual background extractor (ViBe) algorithm | Can minimize the blinking effect | May produce some visual artifacts
He et al. [14] | Visual background extractor (ViBe) algorithm, linear neighbor object prediction algorithm | Efficient background subtraction and tracking algorithm | May produce some visual artifacts
Zhu et al. [18] | Scale invariant local ternary pattern (SILTP), sticky tracking | Better segmentation due to its illumination-invariant design and reduced blinking effect | Loses objects that stop moving
Li et al. [21] | Aggregated channel features (ACF) detection, Kalman filter | Fast feature extraction and powerful representation capacity | External disturbances may affect the tracking
Huang et al. [22] | Contrast context histogram (CCH), Gaussian mixture model (GMM) | Robust to geometric and photometric transformations | Ghost shadows and occlusion occur
Mahapatra et al. [7] | Fuzzy inference system, DBSCAN | Fast and efficient | Leaves holes in detected objects

Table 3 Overview of activity clustering methodology

Studies | Activity clustering methodology | Findings
Ahmed et al. [9] | Convolutional neural network (CNN, three layers) [24] | To find object labels
Balasubramanian et al. [16] | SVM classifier | Classification based on features of a face
Chou et al. [25] | Hierarchical clustering | To cluster similar trajectories
Mahapatra et al. [7] | Simple MKL | Action-based clustering
Lin et al. [26] | Patch-based training method | Abnormal activity classification

3.2 Activity Clustering

Activity clustering is used to categorize activity trajectories of a similar type
depending upon the object type, motion type, and target as per the user query. The
quality of the video synopsis can be improved by displaying similar activities together,
as it is also easier for the user to understand. It can be used at the application level
rather than as an enhancement of the methodology, and it gives good accuracy for video
browsing applications. Table 3 describes the activity clustering methodologies with their
descriptions.
For clustering, a convolutional neural network gives an accuracy of around 70–80%.
However, advanced deep learning approaches can be used to increase accuracy.
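As a simple illustration of how tubes could be grouped, the following sketch clusters placeholder tube descriptors (for example duration, mean speed, mean area, and mean position) with k-means; the feature choice, the number of clusters, and the synthetic data are assumptions for the example rather than the method of any cited study.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tube_features = rng.random((60, 5))     # 60 tubes x 5 descriptors (synthetic placeholders)

X = StandardScaler().fit_transform(tube_features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Tubes sharing a label can be displayed together in the synopsis or used for
# query-based retrieval, e.g., "show only fast-moving objects".
for cluster_id in range(4):
    print("cluster", cluster_id, "->", np.where(labels == cluster_id)[0])
```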

3.3 Optimization

Optimization is the main process of video synopsis. It is the process of optimally
rearranging the tubes to obtain a collision-free and chronologically arranged compacted
video. The rearrangement of foreground objects is expressed as the reduction of an energy
defined in terms of the activity cost, collision cost, and temporal consistency cost of
the object trajectories. The activity cost assures the maximum number of object
trajectories in the video synopsis. The temporal consistency cost is used to preserve the
temporal order of the activities; therefore, breakage of the temporal order is penalized.
The collision cost helps to avoid spatial collisions between activities, providing better
visual quality. Some of the optimization approaches are listed in Table 4.
The optimization methodology helps to rearrange the tubes optimally; some approaches
focus on collision avoidance, while others try to improve the compression ratio.
Improvement in all the parameters cannot be achieved at the same time.
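To make the cost structure concrete, the following schematic Python sketch evaluates one candidate tube arrangement; each tube is assumed to be a dictionary holding its per-frame bounding boxes and its original start frame, a candidate solution assigns each tube a new start frame, and the weights and the helper function are illustrative rather than taken from any entry in Table 4. An optimizer such as simulated annealing or a greedy search would then look for the start frames that minimize this energy.

```python
def overlap_area(box_a, box_b):
    # Overlap (in pixels) of two axis-aligned boxes given as (x, y, w, h)
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(dx, 0) * max(dy, 0)

def synopsis_energy(tubes, starts, synopsis_len, w_act=1.0, w_col=1.0, w_time=0.5):
    energy = 0.0
    # Activity cost: penalize frames of a tube that fall outside the synopsis length
    for tube, s in zip(tubes, starts):
        lost = sum(1 for t in range(len(tube["boxes"])) if not (0 <= s + t < synopsis_len))
        energy += w_act * lost
    for i in range(len(tubes)):
        for j in range(i + 1, len(tubes)):
            # Collision cost: spatial overlap of tubes mapped to the same synopsis frame
            for ti, box_i in enumerate(tubes[i]["boxes"]):
                tj = starts[i] + ti - starts[j]
                if 0 <= tj < len(tubes[j]["boxes"]):
                    energy += w_col * overlap_area(box_i, tubes[j]["boxes"][tj])
            # Temporal consistency cost: penalize pairs whose original order is reversed
            if (tubes[i]["start"] - tubes[j]["start"]) * (starts[i] - starts[j]) < 0:
                energy += w_time
    return energy
```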

Table 4 Overview of optimization methodology

Studies | Optimization methodology | Findings
Raun et al. [27] | Dynamic graph coloring approach | Tube rearrangement using a graph coloring approach formed from streaming video
He et al. [14] | Potential collision graph (PCG) | Tube rearrangement using collision free (CF) and collision potential (CP)
Pappalardo et al. [28] | Improved PCG coloring approaches | Based on a graph coloring approach, updating the PCG coloring approaches through graph connected components
Zhu et al. [18] | Tube filling algorithm, greedy optimization | Tube rearrangement through finding the tube's optimal location
Nie et al. [29] | Markov chain Monte Carlo (MCMC) algorithm | Tube arrangement and energy minimization
Li et al. [21] | Simulated annealing approach | Energy minimization for tube arrangement
Huang et al. [22] | Markov random field with a simple greedy algorithm | Energy minimization for tube arrangement
Ghatak et al. [11] | HSATLBO optimization approach | Hybridization of teaching learning-based optimization (TLBO) and simulated annealing
Li et al. [12] | Group partition greedy optimization algorithm | Tube rearrangement
Ra et al. [13] | Occupation matrices with FFT | Tube rearrangement using energy minimization
Tian et al. [15] | Genetic algorithm | Tube rearrangement using energy minimization
Li et al. [30] | Seam carving using dynamic programming and greedy optimization algorithm | Tube rearrangement using seam carving

3.4 Background Generation

A time-lapse background image is generated after optimizing the locations of the
activities. The surveillance video has a static background; however, the background image
should include the changes reflecting day and night and illumination, and should
represent the natural background over time to improve the visual quality. Background
generation is not related to the efficiency of the video synopsis, but it adds a benefit
in visual quality. Inconsistency between the tubes and the background may produce visual
artifacts in the video synopsis. Many of the approaches [11, 17] use a temporal median
filter to estimate the background of the surveillance video.
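A minimal sketch of such a temporal median filter is given below; the input file name and the number of sampled frames are assumptions for the example.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")            # hypothetical input video
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
samples = []
for idx in np.linspace(0, max(total - 1, 0), num=50, dtype=int):
    cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))        # jump to the sampled frame
    ok, frame = cap.read()
    if ok:
        samples.append(frame)
cap.release()

# Per-pixel median over time removes transient foreground objects.
background = np.median(np.stack(samples), axis=0).astype(np.uint8)
cv2.imwrite("background.png", background)
```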

3.5 Stitching

This is the final step of the video synopsis flow, where the tubes are stitched with the
generated time-lapse background. The stitching does not have an effect on the efficiency
of the video synopsis, but it improves the visual quality. Many approaches employ Poisson
image editing to stitch a tube into the background by changing the gradients.
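For illustration, OpenCV's seamlessClone implements this kind of Poisson (gradient-domain) blending; in the sketch below the object patch, its binary mask, the paste position, and the file names are assumptions standing in for the output of the optimization step.

```python
import cv2

background = cv2.imread("background.png")                       # generated time-lapse background
patch = cv2.imread("object_patch.png")                          # cropped object appearance
mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)      # binary object mask
mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)[1]

center = (320, 240)                                             # target location chosen by the optimizer
stitched = cv2.seamlessClone(patch, background, mask, center, cv2.NORMAL_CLONE)
cv2.imwrite("synopsis_frame.png", stitched)
```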

4 Parametric Evaluation and Dataset

Parameters are used to assess the quality of video synopsis. Some of the parameters are
listed below (helper functions for the first four ratios are sketched after the list).
1. Frame condensation ratio (FR) is defined as the ratio between the number of frames in
the original video and in the synopsis video [10].
2. Frame compact ratio (CR) is defined as the ratio between the number of object pixels
in the original video and the total pixels in the synopsis video. It provides information
about the spatial compression and measures how much of the spatial space of the synopsis
video the objects occupy [10].
3. Non-overlapping ratio (NOR) is defined as the ratio between the number of pixels that
the objects occupy and the sum of each object's mask pixels in the synopsis video. It
provides information about the amount of collision between tubes in a synopsis video [10].
4. Chronological disorder (CD) is defined as the ratio between the number of tubes in
reverse order and the total number of tubes. It measures the chronological sequence of
the tubes [10].
5. Runtime (s): the time required for generation of the video synopsis.
6. Memory requirement: the memory utilization is measured using the peak memory usage and
the average memory usage.
7. Visual quality: it gives the visual appearance of the synoptic video, which should
include all the activities that occurred in the original video.
8. Objective evaluation: some of the approaches [31] also conduct a survey based on the
results in an objective way to validate the synopsis result by comparing the visual
appearance. The original video, the proposed synopsis video, and the synopsis videos of
existing methods are shown to a fixed set of participants, and certain questions based on
the appearance and compactness are asked. Based on the answers, the efficiency of the
proposed synopsis is calculated.
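For clarity, the ratio-based parameters in items 1-4 can be written as the small helper functions below; the exact definitions vary slightly across studies, so these simply follow the wording given above.

```python
def frame_condensation_ratio(frames_original, frames_synopsis):
    # FR: how many times shorter the synopsis is than the source video
    return frames_original / frames_synopsis

def frame_compact_ratio(object_pixels_original, total_pixels_synopsis):
    # CR: how densely objects occupy the spatial space of the synopsis video
    return object_pixels_original / total_pixels_synopsis

def non_overlapping_ratio(pixels_occupied, sum_of_object_mask_pixels):
    # NOR: values close to 1 mean little collision between tubes in the synopsis
    return pixels_occupied / sum_of_object_mask_pixels

def chronological_disorder(tubes_in_reverse_order, total_tubes):
    # CD: fraction of tubes whose chronological order is broken
    return tubes_in_reverse_order / total_tubes
```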

4.1 Dataset

Datasets are needed to validate the performance of the different methodologies. The
presence of proper datasets helps to check the quality of the results of a proposed

Table 5 Overview of available datasets

Studies | Datasets | Description
Ahmed et al. [9] | 1. VIRAT and Sherbrooke street surveillance video datasets; 2. IIT-1 and IIT-2 datasets | Surveillance video datasets
Raun et al. [10] | Online video from YouTube | Surveillance video
Pappalardo et al. [23] | 1. UA-DETRAC dataset | Real video consisting of traffic scenes
Ghatak et al. [11] | 1. PETS 2001, MIT surveillance dataset; 2. IIIT Bhubaneswar surveillance dataset; 3. PETS 2006 and UMN dataset | Standard surveillance video datasets
Mahapatra et al. [7] | 1. KTH; 2. WEIZMANN; 3. PETS2009; 4. LABV | Evaluating action recognition (first two), synopsis generation (latter two)

methodology. The performance of video synopsis can be evaluated using publicly available
datasets and outdoor videos. Table 5 lists the available datasets.
In some of the studies, the datasets are created by the researchers to check the
evaluation parameters. However, these datasets cannot be used to compare results.
The assessment of the evaluation parameters is a tough task, as a standard dataset is not
available. In some of the studies, the evaluation parameters are taken as reverse ratios.
Therefore, the comparison of different results is a problematic task.

5 Future Research Challenges

Video synopsis technology has overcome many challenges in the area of video investigation
and summarization, but there are still many gaps within its scope of application. Some of
the challenges in the field of video synopsis are given below.
1. Object Interactions
The interactions between object trajectories should be preserved while converting the
original video into a compacted form. For example, consider two people walking side by
side in a video: the tubes are tracked separately in the optimization phase, and for
collision avoidance these tubes may be rearranged in such a way that they never meet in
the final synopsis [32]. The rearrangement of the tubes should be implemented with a
proper optimization algorithm so that the original interaction can be preserved.
2. Dense Videos
Another challenge is crowded public places, where the source video is highly dense, with
objects repeatedly occupying all locations. In this situation, the required object can be
kept alone in the synopsis video, but this may affect the chronological order of the
video, may create misperception for a user browsing for a particular event or object in
the resultant video, and also reduces the visual quality of the video. The selection of
proper segmentation and tracking algorithms will help to overcome this challenge.
3. Camera Topology
The synopsis video quality may be affected by the camera topology used. Object
segmentation and tracking are an important phase of video synopsis, in which the source
videos can be fetched using a still camera or a moving camera. Synopsis generation will
be difficult with a moving camera, as the viewing direction is constantly varying. The
background generation and stitching steps will also be difficult for a moving camera, as
there will be continuous changes in background appearance. The multicamera approach is
another challenge in the generation of video synopsis, as object tracking and
segmentation will be difficult when the number of inputs is larger, and the changing
background shift will be tough to predict.
4. Processing Speed
Faster real-time speed can be achieved by a system using a multi-core CPU implementation.
A GPU further reduces the processing time and enhances the speed of the processor, giving
a reduced runtime value.
5. Activity Clustering
Activity clustering is an optional step in the video synopsis process flow, but it is an
added advantage for quick data retrieval and browsing. It increases the computational
complexity but can be used for many applications depending upon the user's query.
Depending upon the user query, clusters of similar tubes can be generated, and the
synopsis video is produced based on the clustering.

6 Conclusion

Video synopsis has gained more demand with the increase in CCTV and technological
enrichment in the video analysis field. It is an emerging technology used to represent
the source video in a compacted form based on the activities, which can be used in many
applications. There are several approaches to video synopsis, among which the online
approach is used for real-time video streaming. Multicamera and compressed domain
approaches need to be explored for enhancing the efficiency of the related parameters.
The video synopsis process flow starts with object detection and trajectory tracking,
followed by activity clustering, tube rearrangement, background generation, and
stitching. The accuracy of tracking and segmentation of the object trajectories can
affect the quality of the synopsis video. The compression ratio can be improved by the
optimum arrangement of tubes. Proper chronological order and less collision between the
tubes help to enhance the visual quality.

Numerous challenges can be addressed in future research. Video synopsis is used for
efficient data retrieval and browsing, which can be effectively addressed by query-based
video synopsis through the activity clustering step in the synopsis process flow. Deep
learning classifiers can be used for clustering to improve efficiency. Other challenges
are the existence of dense crowds, multiple camera views, and the trade-off between the
parametric evaluation criteria of video synopsis. Thus, there is a need to design a new
framework for efficient optimization through clustering of tubes to enhance the overall
efficiency.

References

1. Markets and Markets (2019) Video surveillance market survey. https://www.marketsandmarkets.com/Market-Reports/video-surveillance-market-645.html
2. Tewari D (2020) Video surveillance market by system type and application (commercial, mili-
tary & defense, infrastructure, residential, and others): global opportunity analysis and industry
forecast, 2017–2025. Video surveillance Market Outlook 2027.
3. Truong BT, Venkatesh S (2007) Video abstraction: a systematic review and classification. ACM
Trans Multimedia Comput Commun Appl 3(1):3–es
4. Money AG, Agius H (2008) Video summarisation: A conceptual framework and survey of the
state of the art. J Vis Commun Image Represent 19(2):121–143
5. Smith MA (1995) Video skimming for quick browsing based on audio and image characteri-
zation
6. Petrovic N, Jojic N, Huang TS (2005) Adaptive video fast forward. Multimedia Tools Appl
26(3):327–344
7. Mahapatra A et al (2016) MVS: a multi-view video synopsis framework. Signal Process: Image Commun 42:31–44
8. Feng S et al (2012) Online content-aware video condensation. 2082–2087
9. Ahmed SA et al (2019) Query-based video synopsis for intelligent traffic monitoring
applications. IEEE Trans Intell Transport Syst 1–12.
10. Ruan T et al (2019) Rearranging online tubes for streaming video synopsis: a dynamic graph
coloring approach. IEEE Trans Image Process 28(8):3873–3884
11. Ghatak S et al (2020) An improved surveillance video synopsis framework: a HSATLBO
optimization approach. Multimedia Tools Appl 79(7):4429–4461
12. Li X, Wang Z, Lu X (2018) Video synopsis in complex situations. IEEE Trans Image Process
27(8):3798–3812
13. Ra M, Kim W (2018) Parallelized tube rearrangement algorithm for online video synopsis.
IEEE Signal Process Lett 25(8):1186–1190
14. He Y et al (2017) Fast online video synopsis based on potential collision graph. IEEE Signal
Process Lett 24(1):22–26
15. Tian Y et al (2016) Surveillance video synopsis generation method via keeping important
relationship among objects. IET Comput Vis 10:868–872
16. Balasubramanian Y, Sivasankaran K, Krishraj SP (2016) Forensic video solution using facial
feature-based synoptic video footage record. IET Comput Vis 10(4):315–320
17. Zhong R et al (2014) Fast synopsis for moving objects using compressed video. IEEE Signal
Process Lett 21:1–1
18. Zhu J et al (2015) High-performance video condensation system. IEEE Trans Circuits Syst
Video Technol 25(7):1113–1124
19. Pritch Y et al (2009) Clustered Synopsis of Surveillance Video. In: 2009 Sixth IEEE
international conference on advanced video and signal based surveillance

20. Wang S, Wang Z-Y, Hu R-M (2013) Surveillance video synopsis in the compressed domain
for fast video browsing. J Vis Commun Image Represent 24:1431–1442
21. Li X, Wang Z, Lu X (2016) Surveillance video synopsis via scaling down objects. IEEE Trans
Image Process 25(2):740–755
22. Huang C-R et al (2014) Maximum a Posteriori probability estimation for online surveillance
video synopsis. IEEE Trans Circ Syst Video Technol 24:1417–1429
23. Pappalardo G et al (2019) A new framework for studying tubes rearrangement strategies in
surveillance video synopsis. In: 2019 IEEE international conference on image processing (ICIP)
24. Redmon J, Farhadi A (2017) YOLO9000: Better, faster, stronger. IEEE Conf Comput Vis
Pattern Recogn (CVPR) 2017:6517–6525
25. Chien-Li C et al (2015) Coherent event-based surveillance video synopsis using trajectory clustering. In: 2015 IEEE international conference on multimedia & expo workshops (ICMEW)
26. Lin W et al (2015) Summarizing surveillance videos with local-patch-learning-based abnor-
mality detection, blob sequence optimization, and type-based synopsis. Neurocomputing
155:84–98
27. Rav-Acha A, Pritch Y, Peleg S (2006) Making a long video short: dynamic video synopsis.
In: 2006 IEEE computer society conference on computer vision and pattern recognition
(CVPR’06), vol 1, pp 435–441
28. He Y et al (2016) Graph coloring based surveillance video synopsis. Neurocomputing 225
29. Nie Y et al (2019) Collision-free video synopsis incorporating object speed and size changes.
IEEE Trans Image Process: Publ IEEE Signal Process Soc
30. Li K et al (2016) An effective video synopsis approach with seam carving. IEEE Signal Process
Lett 23(1):11–14
31. Fu W et al (2014) Online video synopsis of structured motion. Neurocomputing 135:155–162
32. Namitha K, Narayanan A (2018) Video synopsis: state-of-the-art and research challenges
Effective Multimodal Opinion Mining
Framework Using Ensemble Learning
Technique for Disease Risk Prediction

V. J. Aiswaryadevi, S. Kiruthika, G. S. Priyanka, N. Nataraj, and M. S. Sruthi

Abstract Multimodal sentiment analysis frameworks are widely applied in the medical and
healthcare sectors. Identifying depression, speech recognition for differently abled
persons, Alzheimer's disease, low blood pressure or heart problems, and similar
impairments are widely addressed. Video data is processed for mining feature polarity
from the acoustic, linguistic, and visual data extracted from it. The feature data set
extracted from YouTube videos contains comments, likes, views, and shares expressing the
polarity of the information conveyed through the streaming videos. Static information
from a video file is extracted in the form of a linguistic representation. Musical data
extracted and transformed into linguistic form is used for polarity classification using
an ensemble-based random forest algorithm, which yielded an error rate of 4.76%. Short
feature vectors are expressed in the visualizing musical data and trending YouTube videos
data sets for utilizing the transformed and short feature vectors from video and musical
data. The accuracy of the ensemble-based learning is obtained as 91.6%, which is harder
for other algorithms to achieve using the same set of machine learning techniques. Proper
filter wrapping of batch data is used with a split ratio of a 5% split window. When SVM
is used alongside the ensemble random forest algorithm, the predicted results contain an
error rate of 2.64%, which improves the accuracy of the classifier along the soft margin
to 96.53%.


Keywords Sentiment analysis · SVM · Ensemble random forest · Multimodal opinion mining

1 Introduction

Many researchers are working on the construction of multimodal opinion mining frameworks.
In recent days, clinical decision support systems are massively employed with automation
and without human intervention using machine learning and deep learning algorithms. Deep
learning networks are also used along with ensemble-based extreme learning machines due
to the problem of overfitting at several depths, leading to a sparse density of traversal
towards the goal state. A traditional random forest works with simple and quite effective
accuracy in object recognition and goal state prediction. Here, the traditional random
forest has its input data sampled with goal-based constraints. The seeded sample is taken
into the random forest module execution for the opinion mining framework construction. An
SVM machine learning algorithm is also trained on the same set of samples used by the
random forest classifier for random sampling of the data. The prediction results and
parameters are discussed based on the observations noted.
Section 2 reviews related works. Section 3 briefs about the data set under analysis, and
Sect. 4 describes the goal-based ensemble learning algorithm with the set of goal
constraints and its predicates, together with its effectiveness. Section 5 analyses the
performance parameters, and Sect. 6 depicts the results derived by using the
ensemble-based opinion mining framework.

2 Related Works

Multimodal sentiment analysis frameworks are constructed using many data mining, machine
learning, and even deep learning algorithms for health care, ministry and military data,
and especially in the e-learning industries. They widely aid a diverse variety of
industry-deployed researchers and industry innovations. Polarity classification was
attained with 88.60% accuracy using a CNN algorithm with the aid of word2vec on the data
set described by Morency et al. (2011) in [1]. Subjectivity detection on linguistic data
using an ELM paradigm which combines the features of Bayesian networks and fuzzy
recurrent neural networks was done by Chaturvedi et al. [2], showing accuracy results
raised to 89% using Bayesian network-based extreme learning machines in [2]. Ensemble
methods were adopted for textual data by Tran et al. in [3] for YouTube sentic computing
features. Short-time Fourier transform (STFT)-based Mel-frequency cepstral coefficients
[4] were calculated using ensemble-based extreme learning machines for feature-level
attributes in YouTube videos, establishing an accuracy of 77% achieved by Hu et al. [5].
Naive Bayes classifier + support vector machine lyric features were generated using a
Doc2Vec data set of 100 Telugu songs (audio + lyrics). From the experimental results, the
recognition rate is observed to be between 85 and 91.2%. The percentage of lyric
sentiment analysis can be improved by using a rule-based and linguistic approach shown in
[6]. The USC IEMOCAP database [6] was collected to study multimodal expressive dyadic
interactions, revisited in 2017 by [7]. Another experimental study showed that while
using CNN-SVM produced 79.14% accuracy, an accuracy of only 75.50% was achieved using CNN
alone. On the multimodal sentiment analysis data set and the multimodal emotion
recognition data set, the visual module of CRMKL [8] obtained 27% higher accuracy than
the state of the art. When all modalities were used, 96.55% accuracy was obtained,
outperforming the state of the art by more than 20%. The visual classifier trained on the
MOUD, which obtained 93.60% accuracy [9], got 85.30% accuracy on the ICT-MMMO data set
[10] using the trained visual sentiment model on the MOUD data set. Many historical works
failed to reduce the overfitting caused by deep neurons and decision levels.

3 About the Data Set

Chord bigrams of piano-driven and guitar-driven musical strings are extracted and
transformed into linguistic form in the visualizing musical data set. Bigram features
such as B-flat, E-flat, and A-flat chords are frequently occurring musical strings in
piano and guitar music. The YouTube trending data set contains the number of comments,
likes, and shares expressed for each video in the data set. Using prefiltering algorithms
with goal-specific rules, only the needed information is extracted from the videos and
musical strings of the data set. Histogram segmentation is used for video sequences, and
the Mel-frequency spectrum is used for musical sequences for sparse filtering.

4 Goal-Based Ensemble Learning

The detailed multimodal opinion mining framework is expressed in terms of five basic
steps, namely collection of raw data, pre-processing and filtering, classification of
filtered data, sentiment polarity extraction, and analysis of the performance parameters.
Only goal-based features are extracted for analysis. The following flowchart (Fig. 1)
provides the flow of the opinion mining framework for multimodal sentiment analysis.
Fig. 1 Opinion mining framework construction flowchart

4.1 Random Forest

Goal-based data mining algorithms [11] are used for forming the decision trees [12].
Bootstrapped decision trees are constructed using 150 samples under random sampling and
10 features drawn using feature sampling, and are bagged using the majority of the
polarity expressed by the bootstrapped data. A simple model for end-stage liver disease
risk prediction [13] is implemented using the ensemble-based random forest algorithm with
500 bootstrapped decision trees and achieved an accuracy of 97.22% with Gaussian filter
normalization [14], with the random sampling rate of the Gaussian Naïve Bayes classifier
[15] specified below in Eq. 1. An MDR data set is developed, like the MELD data set [16],
using the normalized short feature vectors from YouTube trending videos [17] and
visualizing musical data [18].
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)   (1)

Random sampling is done with seed samples of 2019 among the 11,110 data entries using a
Gaussian normalization distribution (Fig. 2).
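A minimal sketch of this step is given below; the synthetic arrays only stand in for the normalized MDR-style feature vectors and polarity labels, and the parameter values (500 trees, seed 2019, 5% test split) simply mirror figures quoted in the text rather than reproduce the original experiment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2019)
X = rng.random((11110, 10))                       # placeholder feature matrix (11,110 records)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)         # placeholder polarity labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.05, random_state=2019)

# Bagged (bootstrapped) decision trees, as in the ensemble-based random forest above
forest = RandomForestClassifier(n_estimators=500, bootstrap=True, random_state=2019)
forest.fit(X_tr, y_tr)
print("random forest accuracy:", forest.score(X_te, y_te))

# Gaussian Naive Bayes uses the class-conditional likelihood of Eq. (1)
gnb = GaussianNB().fit(X_tr, y_tr)
print("Gaussian NB accuracy:", gnb.score(X_te, y_te))
```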

Fig. 2 Bootstrapping the random samples and feature samples

4.2 SVM for Logistic Regression on Random Sampling

SVMs are used for setting the UB and LB sample rates by soft-margin random sampling
amongst the entire data set handpicked from the multimodal MDR [19] data set. Utterances
are neglected for sampling; only the sentiment score and polarity expressed in the data
are taken into account. A hypervector parameter [20] is used for effective classification
of random samples with a seeding rate of 200 per vector (Figs. 3 and 4).

Fig. 3 SVM random sampling on short feature vector



Fig. 4 SVM feature sampling and scaling vectors expressed by the data set transformation

Feature sampling is done at 80 samples per vector, and the sampled features are
bootstrapped using the decision tree algorithms. The accuracy rate of the random forest
generated using the randomly sampled and feature-sampled chords is discussed in the
results below.
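A hedged sketch of the corresponding soft-margin SVM stage is shown below on synthetic placeholders for the sampled feature vectors; the record count, kernel, and C value are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(200)
X = rng.random((660, 10))                    # placeholder for the 660 sampled records
y = (X[:, 0] > 0.5).astype(int)              # placeholder polarity labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Soft-margin SVM with feature scaling; C controls the width of the soft margin
svm = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
svm.fit(X_tr, y_tr)
print("SVM accuracy:", svm.score(X_te, y_te))
```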

5 Analysis of Performance Parameters

A SoftMax classifier is used with performance measure indices reflected through the
confusion matrix, describing the true positive, false positive, true negative, and false
negative counts. A confusion matrix for the actual and predicted classes is formed,
comprising TP, FP, TN, and FN, to evaluate the parameters. The significance of the terms
is given below: TP = True Positive (correctly identified), TN = True Negative (correctly
rejected), FP = False Positive (incorrectly identified), FN = False Negative (incorrectly
rejected). The performance of the proposed system is measured by the following formulas:

(TP samples + TN samples)


Accuracy (ACC) = (2)
TP + TN + FP + FN
TP samples
Sensitivity (Sens) = (3)
TP + FN
TN samples
Specificity (Sp) = (4)
TN + FP
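A small helper implementing Eqs. (2)-(4) from the confusion-matrix counts is sketched below; the example call reuses, for illustration only, the polarity counts reported later in Sect. 6, interpreting the rows of that matrix as actual classes.

```python
def performance(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)      # Eq. (2)
    sensitivity = tp / (tp + fn)                    # Eq. (3)
    specificity = tn / (tn + fp)                    # Eq. (4)
    return accuracy, sensitivity, specificity

# Illustrative call with the polarity counts of Sect. 6 (TP=250, FN=5, FP=14, TN=130)
print(performance(tp=250, tn=130, fp=14, fn=5))
```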

6 Results and Discussions

The results and the derived performance metric indices are expressed as follows. The
number of samples present before random sampling and feature sampling is 11,110 records,
whereas after random sampling with the seed of 200, the sample frames are created with
660 records. The accuracy rate obtained among the true positives is demonstrated below
with a dot plot and a confusion matrix.

Number of trees: 5700
No. of variables tried at each split: 500
OOB error estimate: 4.76%

Confusion matrix

                    TP    FN    Error (class)
Positive polarity   250   5     0.01960784
Negative polarity   14    130   0.09722222

Fig. 5 Multimodal opinion mining framework for disease risk prediction



In Fig. 5, the disease risk prediction rate of the random forest is well expressed with
the sample rate at each bagging node. The accuracy level increases, and the Gini index
increases, towards maximum accuracy in the classification.

References

1. Poria S, Cambria E, Gelbukh A (2015) Deep convolutional neural network textual features and
multiple kernel learning for utterance-level multimodal sentiment analysis. In: Proceedings of
the 2015 conference on empirical methods in natural language processing, pp 2539–2544
2. Chaturvedi I, Ragusa E, Gastaldo P, Zunino R, Cambria E (2018) Bayesian network based
extreme learning machine for subjectivity detection. J Franklin Inst 355(4):1780–1797
3. Tran HN, Cambria E (2018) Ensemble application of ELM and GPU for real-time multimodal
sentiment analysis. Memetic Computing 10(1):3–13
4. Poria S, Majumder N, Hazarika D, Cambria E, Gelbukh A, Hussain A (2018) Multimodal
sentiment analysis: addressing key issues and setting up the baselines. IEEE Intell Syst
33(6):17–25
5. Hu P, Zhen L, Peng D, Liu P (2019) Scalable deep multimodal learning for cross-modal
retrieval. In: Proceedings of the 42nd international ACM SIGIR conference on research and
development in information retrieval (SIGIR’19). Association for Computing Machinery, New
York, NY, USA, pp 635–644. https://doi.org/10.1145/3331184.3331213
6. Abburi H, Akkireddy ESA, Gangashetti S, Mamidi R (2016) Multimodal sentiment analysis
of Telugu songs. In: SAAIP@ IJCAI, pp 48–52
7. Poria S, Peng H, Hussain A, Howard N, Cambria E (2017) Ensemble application of convo-
lutional neural networks and multiple kernel learning for multimodal sentiment analysis.
Neurocomputing 261:217–230
8. Busso C, Deng Z, Yildirim S, Bulut M, Lee CM, Kazemzadeh A, Lee S, Neumann U, Narayanan
S (2004) Analysis of emotion recognition using facial expressions, speech and multimodal
information. In: Proceedings of the 6th international conference on multimodal interfaces.
ACM, pp 205–211
9. Poria S, Chaturvedi I, Cambria E, Hussain A (2016) Convolutional MKL based multimodal
emotion recognition and sentiment analysis. In: 2016 IEEE 16th international conference on
data mining (ICDM). IEEE, pp 439–448
10. Calhoun VD, Sui J (2016) Multimodal fusion of brain imaging data: a key to finding the missing
link(s) in complex mental illness. Biological psychiatry. Cogn Neurosci Neuroimaging
1(3):230–244. https://doi.org/10.1016/j.bpsc.2015.12.005
11. Lin WH, Hauptmann A (2002) News video classification using SVM-based multimodal clas-
sifiers and combination strategies. In: Proceedings of the tenth ACM international conference
on multimedia. ACM, pp 323–326
12. Falvo A, Comminiello D, Scardapane S, Scarpiniti M, Uncini A (2020) A multimodal
deep network for the reconstruction of T2W MR Images. In: Smart innovation, systems
and technologies. Springer, Singapore, pp 423–431. https://doi.org/10.1007/978-981-15-5093-
5_38
13. Kim Y, Jiang X, Giancardo L et al (2020) Multimodal phenotyping of alzheimer’s disease with
longitudinal magnetic resonance imaging and cognitive function data. Sci Rep 10:5527. https://
doi.org/10.1038/s41598-020-62263-w
14. Rozgić V, Ananthakrishnan S, Saleem S, Kumar R, Prasad R (2012) Ensemble of SVM trees
for multimodal emotion recognition. In: Proceedings of the 2012 Asia Pacific signal and
information processing association annual summit and conference. IEEE, pp 1–4
15. Xu X, He L, Lu H, Gao L, Ji Y (2019) Deep adversarial metric learning for cross-modal
retrieval. World Wide Web 22(2):657–672. https://doi.org/10.1007/s11280-018-0541-x

16. Kahou SE, Bouthillier X, Lamblin P, Gulcehre C, Michalski V, Konda K, Jean S, Froumenty P,
Dauphin Y, Boulanger-Lewandowski N, Ferrari RC (2016) Emonets: multimodal deep learning
approaches for emotion recognition in video. J Multimodal User Interfaces 10(2):99–111
17. Jin K, Wang Y, Wu C (2021) Multimodal affective computing based on weighted linear fusion.
In: Arai K, Kapoor S, Bhatia R (eds) Intelligent systems and applications. IntelliSys 2020.
Advances in intelligent systems and computing, vol 1252. Springer, Cham. https://doi.org/10.
1007/978-3-030-55190-2_1
18. Ranganathan H, Chakraborty S, Panchanathan S (2016) Multimodal emotion recognition using
deep learning architectures. In: 2016 IEEE winter conference on applications of computer vision
(WACV). IEEE, pp 1–9
19. Majumder N, Hazarika D, Gelbukh A, Cambria E, Poria S (2018) Multimodal sentiment
analysis using hierarchical fusion with context modeling. Knowl-Based Syst 161:124–133
20. Soleymani M, Garcia D, Jou B, Schuller B, Chang SF, Pantic M (2017) A survey of multimodal
sentiment analysis. Image Vis Comput 65:3–14
Vertical Fragmentation
of High-Dimensional Data Using Feature
Selection

Raji Ramachandran, Gopika Ravichandran, and Aswathi Raveendran

Abstract Fragmentation in a distributed database is a design technique that reduces query
processing time by keeping the relation size small. When it comes to storing
high-dimensional data in a distributed manner, the processing time increases. This is due
to the huge attribute size. In this paper, a method is proposed which can reduce the size
of high-dimensional data by using a feature selection technique. This technique reduces
dimensions by removing irrelevant or correlated attributes from the dataset without
removing any relevant data. The algorithms used for feature selection and vertical
fragmentation are the random forest and the Bond Energy Algorithm (BEA), respectively.
Experiments show that our method can produce better fragments.

Keywords Feature selection · Random forest · Vertical fragmentation · Bond energy algorithm

1 Introduction

Owing to the needs of today's business world, many organizations run in a distributed
manner and hence store data in distributed databases. Banking systems, consumer
supermarkets, and manufacturing companies are some examples. These organizations have
branches working in different locations and therefore store their data in a distributed
manner. Fragmentation is a design technique in distributed databases in which, instead of
storing a relation entirely in one location, it is fragmented into different units and
stored at different locations. Fragmentation provides data to the user from the nearest
location as per the user's requirement. Fragmentation increases

efficiency by reducing the size of the table, and hence, the search time and also
provides security and privacy to the data. The fragmentation process has three cate-
gories: Horizontal, vertical, and hybridized fragmentation. Diagrammatic represen-
tation of these is shown in Fig. 1. Fragmentation is partitioning off a relation F into
fragments F 1 , F 2 , …, F i , containing enough information to reconstruct the original
relation F.
In horizontal fragmentation, data is fragmented tuple-wise based on minterm predicates.
This helps all related data to fall in a particular fragment. In this case, most of the
time, user queries need to search a minimum number of fragments [1]. In vertical
fragmentation, data is fragmented attribute-wise. This is based on the assumption that
users access certain related attributes together, and hence, if they are kept in one
fragment, then user queries can be executed faster. Our system considers vertical
fragmentation. The benefit of vertical fragmentation is that only a few related
attributes are stored at each site, compared to the original attribute set. Also,
attributes are stored according to the access frequency of the attributes at different
sites. All these factors reduce the query processing time in distributed databases.
In the case of hybrid fragmentation, data is fragmented vertically as well as
horizontally. This method creates fragments with minimal information, attribute-wise as
well as tuple-wise [2].

Fig. 1 Fragmentation type



Today’s world mainly deals with a large volume of data called big data. Big data
is a collection of data which is a large size and yet increasing day by day. It contains
high dimensions and needs a large amount of space for its storage.
When it comes to storing big data in a distributed manner, even after fragmentation,
each fragment will be large. As the size of fragments increases, time for query
execution also increases [3]. So, if the fragment size can be reduced as much as
possible, then that will speed up the query execution process.
When high-dimensional data is considered, it can be seen that all those dimensions
may not be important or they may be interrelated, and that redundancy occurs in the set
of attributes. Removing this irrelevant or repeated dimensions will reduce the attribute
size of dataset and hence that of fragments produced in vertical fragmentation. This
paper proposes a novel approach for vertical fragmentation of high-dimensional data
using feature selection.
Dimensionality reduction techniques are divided into two categories—feature
selection and feature extraction. Feature selection is used for reducing the attribute
size before vertical fragmentation. Feature selection is the technique which allows
us to select the most relevant features. It is done according to the relative importance
of each feature on the prediction. It eventually increases the accuracy of the model
by removing irrelevant features. Even though there exists different types of feature
selection methods, random forest algorithm (supervised) is focused on because its
efficiency is better compared to other feature selection methods [4].
The rest of the paper is organized as follows. Section 2 discusses the major work
already done in the vertical fragmentation as well as in feature selection. Our proposed
method of vertical fragmentation based on feature selection is explained in Sect. 3.
Experimentation conducted on various datasets, and their result analysis is done in
Sect. 4, and the paper concludes in Sect. 5.

2 Literature Review

Fragmentation in distributed databases is undergoing rapid evolution as new methods are
being developed with an ever-expanding availability of data. This section explains the
major work done in this area. Feature selection techniques became popular with the
emergence of big data, so the section also explains the main techniques and applications
of different feature selection methods.
The performance of a distributed database can be increased only if a proper distribution
design is included. One such technique is fragmentation, and fragmentation of relational
databases has been studied since the 1980s [5].
A distributed relational database containing simple attributes was fragmented vertically
with a simple method by Thendral and Madu Viswanathan [6]. Here, the input is a set of
user queries which are entered at different sites. Partitioning is done using the bond
energy algorithm by considering the frequency of the queries which the user enters. This
method has the advantage that fragmentation, allocation, and updates
are implemented successfully for a stand-alone system. However, only update queries are
handled in the paper; extensions are needed for delete and alter.
Similar work has been done by Rahimi et al. [7]. Here, fragmentation is performed in a
hierarchical manner using the bond energy algorithm with a modified affinity measure; the
cost of allocating fragments to each site is then calculated, and fragments are allocated
to the correct site. The hierarchical method results in more related attributes, which
enables better fragments. However, the cost function considered for fragment allocation
is not an optimized one.
Dorel Savulea and Nicolae Constantinescu in their paper [8] use a combination of a
conventional database containing facts and a knowledge base containing rules for vertical
fragmentation. The paper also presents the implementation of different algorithms related
to fragmentation and allocation methods, namely RCA rules for clustering, OVF for
computing overlapping vertical fragmentation, and CCA for allocating rules and
corresponding fragments [9]. Here, attribute clustering in vertical fragmentation is not
determined by the attribute affinity matrix, as usual, but is done using
rule-to-attribute dependency matrices. The algorithm is efficient, but only a small
number of operations can be performed [10].
A case study of vertical fragmentation was done by Iacob (Ciobanu) Nicoleta-Magdalena
[11]. The paper explains briefly about distributed databases and the importance of
fragmentation and its strategies. A comparison between different types of fragmentation
is also done. The case study implements an e-learning platform for the academic
environment using vertical fragmentation [12]. The paper explains how vertical
fragmentation increases concurrency and thereby causes an increase in throughput for
query processing.
Feature selection helps to reduce overfitting and reduces the data size by removing
irrelevant features. There are mainly three types of feature selection: the wrapper
method, the filter method, and the embedded method [13]. In the wrapper method, subsets
of features are generated; then, features are deleted from or added to the subset. In the
filter method, feature selection is done based on the scores of statistical tests. The
embedded method combines the features of both the wrapper method and the filter method.
The random forest classifier comes under the wrapper method [4].
Jehad Ali, Rehanullah Khan, Nasir Ahmad, and Imran Maqsood, in their paper on random
forests and decision trees, made a comparative study of the classification results of the
random forest and the decision tree using 20 datasets available in the UCI repository.
They compared the correctly classified instances of both the decision tree and the random
forest by taking the number of instances and the number of attributes [14]. From the
comparison, the paper concluded that the percentage of correctly classified instances is
higher in the random forest, and the percentage of incorrectly classified instances is
lower than that of a decision tree. The comparison is also done on recall, precision, and
F-measure. In the comparison, the random forest has increased classification performance,
and the results are also accurate [15].
A study on the random forest was done by Leo Breiman in his paper named Random Forests.
The paper gives deep theoretical knowledge of the random forest, and it includes the
history of the random forest. The complete steps of the random forest are explained by
computation. The random forest for regression is formed in addition to classification
[16]. It is concluded that random features and random inputs produce better results in
classification than in regression. However, only two types of randomness are used here,
namely bagging and random features; other injected randomness may give a better result.
The application of the random forest algorithm in computer fault diagnosis is given by
Yang and Soo-Jong Lee. The paper describes a technique that helps to diagnose rotating
machinery faults. In this, a novel ensemble classifier constructs a significant number of
decision trees. Even though many fault diagnosis techniques exist, the random forest
methodology is considered to be better because of its execution speed. Here, randomness
such as bagging (the bootstrap aggregating meta-algorithm) is used, which enhances
classification [17]. However, a minor change in the training set in a randomized
procedure can trigger a major difference between the component classifier and the
classifier trained on the whole dataset.
One proposal is made by Ramon Casanova, Santiago Saldana, Emily Y. Chew, Ronald P. Danis,
Craig M. Greven, and Walter T. Ambrosiu in their paper on the application of random
forest methods for diabetic retinopathy analysis. Early detection of diabetic retinopathy
can reduce the chances of becoming blind. The approach, applied to 3443 participants in
the ACCORD-Eye analysis, uses random forest and logistic regression classifiers on graded
fundus photographs and systemic results. They concluded that RF-based models provided
higher classification accuracy than logistic regression [18]. The result suggests that
the random forest method can be one of the better tools to diagnose diabetic retinopathy
and also to evaluate its progression. However, different degrees of retinopathy are not
evaluated here.
Even though there exist many applications of feature selection in the big data
area, it has not yet been used in distributed databases for vertical fragmentation, to
the best of our knowledge.

3 Proposed Method

As stated earlier, our proposed method consists of vertical fragmentation of
high-dimensional data after removing irrelevant or correlated attributes using a feature
selection method.

3.1 Architectural Diagram

The proposed system architecture is shown in Fig. 2.


As shown in the architectural diagram, our proposed method of vertical fragmen-
tation is done in two phases.
Phase 1: Feature selection of high-dimensional data.
Phase 2: Vertical fragmentation.
The processing of each phase is explained below.

Fig. 2 System architecture

Phase 1: Feature selection phase:
This phase converts the high-dimensional dataset to a low-dimensional dataset by removing
irrelevant information using the feature selection method, namely the random forest
algorithm (a minimal sketch of this phase follows the steps below).
The steps of the random forest are given below.
Step 1: Select random samples from the dataset.
Step 2: Using the selected samples, create decision trees, and obtain a prediction result
from each decision tree.
Step 3: Voting will be carried out for every predicted outcome.
Step 4: Finally, pick the result with the most votes as the final prediction outcome.
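A minimal sketch of this phase is given below, assuming the relation is held in a pandas DataFrame with a class column named "label"; the column names and the synthetic data are placeholders. The impurity-based feature importances of a fitted random forest are used to keep only the most relevant attributes before fragmentation.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 100)),
                  columns=[f"attr_{i}" for i in range(100)])   # synthetic high-dimensional relation
df["label"] = (df["attr_0"] + df["attr_1"] > 1.0).astype(int)  # placeholder class attribute

X, y = df.drop(columns="label"), df["label"]
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank attributes by their relative importance for the prediction and keep the top ones
importances = pd.Series(forest.feature_importances_, index=X.columns)
selected = importances.sort_values(ascending=False).head(20).index.tolist()
print("attributes kept for vertical fragmentation:", selected)
```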
Phase 2: Vertical fragmentation phase:
Vertical fragmentation of the reduced-dimensional dataset is done in this phase using the
Bond Energy Algorithm (BEA) [19].
The steps of BEA are given below (a small sketch of Steps 1-3 is given at the end of this
subsection).
Step 1: Create the usage matrix for each class from the user queries. Given a set of
queries Q = {q1, q2, ..., qq} that will run on the relation R[A1, A2, ..., An],

\text{use}(q_i, A_j) = \begin{cases} 1, & \text{if } A_j \text{ is used by } q_i \\ 0, & \text{otherwise} \end{cases}   (1)

Step 2: Create the access frequency matrix of the queries for each class at each site.
Step 3: Using the access frequencies and the usage matrix, the attribute affinity matrix
is determined.
Step 4: The clustered matrix is built from the affinity matrix.
Step 5: A partitioning algorithm is used to obtain the partitions.
The partition point is the point that divides the attributes into separate classes so
that they can be allocated to multiple sites. Two-way partitioning was done, i.e., the
attributes are divided into two classes assigned to two locations. Attributes to the left
of the partition point belong to one site, while attributes to the right belong to
another site [20].
The fragments produced using the BEA can be allocated to various nodes of the
distributed database using the allocation algorithm.
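For illustration, the following sketch builds the attribute usage matrix and the attribute affinity matrix of Steps 1-3 for a toy set of queries and per-site access frequencies (all placeholder values); the clustered matrix and the two-way partitioning of Steps 4-5 would then operate on this affinity matrix.

```python
import numpy as np

attributes = ["A1", "A2", "A3", "A4"]
queries = [{"A1", "A3"}, {"A2", "A4"}, {"A1", "A2"}, {"A3"}]   # attributes used by q1..q4
freq = np.array([[15, 20], [5, 0], [25, 25], [3, 0]])          # access frequency of each query at two sites

# Step 1: attribute usage matrix use(qi, Aj) as in Eq. (1)
use = np.array([[1 if a in q else 0 for a in attributes] for q in queries])

# Steps 2-3: total access frequencies and the attribute affinity matrix
total_freq = freq.sum(axis=1)
n = len(attributes)
affinity = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(n):
        # affinity(Ai, Aj) = summed frequency of queries that access both attributes
        affinity[i, j] = int(np.sum(total_freq[(use[:, i] == 1) & (use[:, j] == 1)]))
print(affinity)
```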

4 Experimentation and Result Analysis

Experimentation with our proposed method is done using various parameters like time, the
number of fragments, as well as the average number of dimensions in each fragment. For
experimentation purposes, five datasets have been taken from the UCI repository. Details
of the datasets are given in Table 1. It is seen that the complexity and space
consumption are reduced by using a feature selection method.
The time taken for fragmenting the dataset with and without feature selection is shown in
Fig. 3. As seen from the graph, when the high-dimensional data is reduced to
low-dimensional data using feature selection, it also reduces the fragmentation time. As
the dimensionality of big data increases, a considerable reduction in fragmentation time
will be obtained if irrelevant or dependent features are removed before fragmentation.

Table 1 Datasets used

Dataset                 | Number of attributes | Number of tuples
Madelon train           | 100                  | 4400
Arrhythmia              | 200                  | 452
Anonymous Microsoft Web | 300                  | 3371
Rock art features       | 400                  | 1105
Madelon                 | 500                  | 4400

Fig. 3 Comparison based on fragmentation time



Fig. 4 Comparison based on number of fragments formed

Also, the number of fragments produced after fragmentation plays an important
role in determining the efficiency of the distributed system. As the fragment count
increases, more fragments need to be searched to process a single query, which
will, in turn, increase the query processing time. A comparison based on the number of
fragments produced for different datasets is shown in Fig. 4. When feature selection
is used as a preprocessing step before fragmentation, only relevant features will be consid-
ered for vertical fragmentation. This ensures that the number of fragments produced will
be sufficient to answer user queries.
When a query requires only a single fragment for its processing, the number of
dimensions in that fragment plays a critical role in performance. If related attributes
can be kept together in a fragment, then it can reduce query processing time drasti-
cally. When fragmentation is done after feature selection, it can be made sure that
each fragment formed contains only limited and relevant attributes. Table 2 shows
the average number of dimensions in each fragment after feature selection.
It is evident from the table that feature selection can result in producing fragments
with limited dimensions after fragmentation. In general, the experiments show that our
method can produce better fragments compared with various other methods.

Table 2 Number of dimensions in each fragment

Dataset | Number of fragments | Average no. of dimensions
D1      | 10                  | 5
D2      | 18                  | 6
D3      | 22                  | 8
D4      | 28                  | 9
D5      | 32                  | 10

5 Conclusion

The paper proposes a new method for vertical fragmentation of high-dimensional
data using feature selection. Removing irrelevant or correlated attributes before frag-
mentation can reduce the dimension size as well as produce better fragments.
A random forest is chosen for the feature selection and the bond energy algorithm for
the vertical fragmentation. When high-dimensional data is reduced to a lower dimension
before fragmentation, the query execution time is also considerably reduced. Alloca-
tion of fragments to the various nodes of the distributed database is kept as a future
enhancement.

References

1. Ramachandran R, Nair DP, Jasmi J (2016) A horizontal fragmentation method based on


data semantics. In: 2016 IEEE international conference on computational intelligence and
computing research (ICCIC), Chennai, India
2. Ramachandran R, Harikumar S (2015) Hybridized fragmentation of very large databases
using clustering. In: 2015 IEEE international conference on signal processing, informatics,
communication and energy systems, Chennai, India
3. Jacob JS, Preetha KG (2001) Vertical fragmentation of location information to enable location
privacy in pervasive computing. In: IFIP/ACM international conference on distributed systems
platforms and open distributed processing, India
4. Kursa MB, Rudnicki WR (2018) The all relevant feature selection using random forest. IEEE,
Bhopal, India
5. Vertical fragmentation in relational database systems (update queries included). IEEE
Communications Surveys
6. Matteo G, Maio D, Rizzi S (1999) Vertical fragmentation of views in relational data warehouses.
In: SEBD, pp 19–33
7. Rahimi H, Parand F, Riahilarly D (2018) Hierarchical simultaneous vertical fragmentation and
allocation using modified bond energy algorithm in distributed databases, India (2018)
8. Savulea D, Constantinescu N (2011) Vertical fragmentation security study in distributed
deductive databases
9. Chakravarthy S, Muthuraj J, Varadarajan R, Navathe SB (1993) An objective function for verti-
cally partitioning relations in distributed databases and its analysis. Distrib Parallel Databases
2(1):183–207
10. Bellatreche L, Simonet A, Simonet M (1996) Vertical fragmentation in distributed object
database systems with complex attributes and methods. IEEE
11. Rogers J, Gunn S (2016) Identifying feature relevance using a random forest, India 91
12. Cornell D, Yu PS (1987) A vertical partitioning algorithm for relational databases. In:
Proceedings of the third international conference on data engineering. IEEE
13. Miaoa J, Niu L (2016) A survey on feature selection. In: Information technology and
quantitative management, India
14. Mahsereci Karabulut E, Ayşe Özel S, Turgay İ (2012) A comparative study on the effect of
feature selection on classification accuracy. Turkey
15. Ani R, Augustine A, Akhil NC, Deepa Gopakumar OS (2016) Random forest ensemble classi-
fier to predict the coronary heart disease using risk factors. In: Proceedings of the international
conference on soft computing systems
16. Reif DM, Motsinger AA, McKinney BA, Crowe JE Jr, Moore JH (2015) Feature selection
using a random forests classifier for the integrated analysis of multiple data types

17. Draminski M, Rada-Iglesias A, Enroth S, Wadelius C, Koronacki J, Komorowski J (2008)
Monte Carlo feature selection for supervised classification. Bioinformatics 24(1):110–117
18. Reif DM, Motsinger AA, McKinney BA (2006) Feature selection using random forest classifier
on integrated data type
19. Mehta S, Agarwal P, Shrivastava P, Barlawala J (2018) Differential bond energy algorithm for
optimal vertical fragmentation of distributed databases
20. Puyalnithi T, Viswanatham M (2015) Vertical fragmentation, allocation and re-fragmentation
in distributed object relational database systems with update queries
Extrapolation of Futuristic Application
of Robotics: A Review

D. V. S. Pavan Karthik and S. Pranavanand

Abstract The pace of life has been constantly changing. From the bicycle to the
fastest car, the future is led by the latest breakthroughs in the fields of science,
medicine, space research, marine exploration, and many more. One such break-
through is robotics. People are familiar with robots mostly from watching them on
television and computers, and much less from real life. Robotics is revolutionizing
the very purpose of humans and their needs. Based on the panorama obtained, this
paper reviews notable research and applications made in the field of robotics, which
put forward machine dominance in industry.

Keywords Robotics · Data analytics · Space exploration · Medicine

1 Introduction

Robots can leave constrained industrial environments and reach out to unexplored and
unstructured areas, for extensive applications in the real world with substantial utility.
Throughout history, there has always been a forecast about robotics thriving and being
able to manage tasks and mimic human behavior. Today, as technological advances
continue, researching, designing, and building new robots serve various practical
purposes in fields like medicine, space research, marine exploration, manufacturing
and assembling, data analytics, armory, and so on.
In fields like manufacturing, assembly, and medical and surgical implementations,
robotics essentially minimizes human error and increases accuracy. On the other
hand, in fields like space and marine exploration, robotics makes it possible for us to
reach unbelievable heights in areas that are practically impossible to reach. With the
existing technologies, various applications have already been made. However, the
future of robotics has a lot in store.

D. V. S. Pavan Karthik (B) · S. Pranavanand


Vallurupalli Nageswara Rao Vignana Jyothi Institute of Engineering and Technology,
Secunderabad, Telangana, India
e-mail: pavan.karthik12@gmail.com


1.1 Robotics in the Field of Medicine

The use of robotics in the medical sector has been constantly upgraded to meet the
accuracy and demand of surgeries. A 16-segment biomechanical model [1] of the
human body is made, and its 3D realization is done in SolidWorks to facilitate
movement according to the task. For an arm or any limb to move like a real one,
one should know the geometric and mass-inertial characteristics of the body segments.
To gain an overview of these properties, a mathematical model which predicts the
inertial properties of the human body in any fixed body position (e.g., sitting) is made,
and this model is used to develop the design. The model is also used to study human
behavior in space, ergonomics, criminology, and other areas.
A Brain–Machine Interface (BMI) [2] is interactive software which helps the user
communicate with the robot and the environment. It can be used by a wide range
of patients. The information is taken from the user’s electroencephalographic (EEG)
signals (Fig. 1), and the system adapts to the user’s daily requirements by providing
almost the same inputs as a real limb.
In the European Commission (EC), the Directorate General of Information Society
and Media is promoting the use of such technology, which has proven useful in the
health care sector [3].
A brain tumor is a deadly chronic disease, be it for a child or an adult. The most
efficient way of locating a tumor is with the help of Magnetic Resonance Imaging
(MRI). MRI, in coordination with robotics, offers a better prospect of success [4].
For example, a tumor may be missed and spread to different parts of the
body, which may be difficult for the naked eye and present equipment to detect.
With the help of continuum robots, this probability is greatly reduced.
Microbots [5] are deployed into the affected area, providing a first-person view
and taking decisions and performing the required activity efficiently in situations
beyond human capability and approach. The robot uses

Fig. 1 Actual set up of BMI software



the process of electrocautery (cautery using a needle or other instrument that is
electrically heated) to remove the tumor. This mesoscale approach helps the surgeon
locate both the surgical robot and the tumor so as to eradicate the tumor completely.
On the other hand, surgical knives require considerable experience to use flawlessly.
The introduction of water-jet surgery [6] increases precision and maintains the
right pressure and abrasion for every tissue or organ in the human body. Water is
sprayed at high pressure, and another probe absorbs the sprayed water at the other
end to keep the view of the endoscopic optics clear.
Taking care of elderly people requires both time and patience, and there is a
shortage of caregivers. People with Mild Cognitive Impairment (MCI) have trouble
controlling their actions and movements. With the help of robotics [7], such patients
can be supported with an instrument that gives them directions and helps
control their posture (something like a smart walking stick which can be used for
supporting the elderly in physical activity based on their motion).
Pain is one of the most natural and common feelings ever endured by humans,
but for people affected by Complex Regional Pain Syndrome (CRPS) the pain and
symptoms last for months. A robotic exoskeleton integrated with a Virtual Reality (VR)
[8] dashboard is designed to assess proprioception at rest and during active movement
in people suffering from CRPS, as these people tend to exaggerate their movements
and hurt themselves. This system helps them by giving them information on how their
body can move properly without putting any stress on the muscles.
People who undergo mental therapy feel the need to talk to someone. They might
get offended or feel very uncomfortable. For this very purpose, robots can be intro-
duced, and they can also be used for recreational purposes, helping people cope
with their mental illness by having a good chat with them [9].
Not only humans, but robotics can also be used for animals to help with their
disabilities. For example, rats suffering from neurological disorders or spinal cord
injuries tend to have hindered movement. A robotic exoskeleton [10] is designed to
help rats move, which consists of Soft Pneumatic Actuators (SPAs). It is well
established that all flora and fauna are born with the genes of
their parents. Now, where do all these features get stored? They are stored in what is
called Deoxyribonucleic Acid (DNA), and they determine the features inherited by
the offspring. The structure of the DNA is a double-helix model and consists of tiny
thread-like structures that carry genetic information.
To treat some cases, mostly coronary disease, micro-reeling of microfibers [11] is
done. It mimics the vascular microstructure using an Electromagnetic Needle
(EMN). To achieve smooth reeling, the trajectory of the EMN tip is predetermined.
The EMN reels the microfiber containing magnetic nanoparticles around
a micropillar; to keep the microfiber from being attracted to the EMN tip, a dual-ring
structure is designed. The main advantage of the robotic system is its
high accuracy and stability in fabricating these microstructures.
Performing surgeries from great distances has become possible in today’s
science thanks to advanced robots, the telerobots [12]. This particular kind of
robot was developed for those who need help in space, as people could not travel such
distances. Yet, it is also used very effectively on earth. It should not be forgotten that there
is a slight communication delay between the surgeon and the patient.
A very recent invention in the field of robotics is the Xenobot [based on https://
en.m.wikipedia.org/wiki/Xenobots], a living and self-healing robot made from the
stem cells of a frog (Xenopus laevis). It is a completely different kind of robot,
small enough to travel in the human body. Xenobots can go without food or water for
about a month or two. Generally, tiny robots made of iron and plastic are harmful once
they decay in the body. On the contrary, Xenobots are largely degradable compared
to the other ones. They can be used for targeted drug delivery or the elimination of
disease at a microscopic level.

1.2 Robotics in Space Research

Space is another milestone that mankind has achieved in the previous decade. But
space can be as challenging as a game of shogi: without a proper strategy and
plan, it is very complicated to cruise through. In places where there might be a risk in
human landing, robots can be used as testbeds [13] to test the landing site before
any human interaction. Humanoid robots [14] can be used for unmanned and deep
space missions and to retrieve information. The main objective is that the robot should
be able to perform actions such as the following:
1. Communication dish: Adjust pitch and yaw alignment of a communication dish.
2. Solar array: Retrieve, deploy, and connect a new solar panel to the existing solar
array.
3. Air leak: Climb the stairs to a habitat entrance, enter the habitat, find an air leak
using a leak detector tool, and repair the air leak using a patch.
DRC-Hubo [14] is a humanoid robot that can recognize the color and position of an
LED displayed on a console panel, press the button that opens a door, and walk
through the doorway.
Not all astronauts are lucky enough to have robots at their service on the International
Space Station (ISS). People on the ISS face many problems such as repetitive work and
difficulty performing their experiments in microgravity. To overcome this problem,
NASA developed a robot named Astrobee (Fig. 2) [15], which can perform repetitive
tasks, carry loads, and provide a platform for the crew to conduct their research.
It has a hinge that can cling on to the rails of the ISS, thereby increasing its standby
time and reducing the fuel consumption needed to hold position. It is no longer mere
speculation to have a robot co-worker or co-pilot by your side.
Many people may have watched films where the robot steers the space shuttle
and retrieves information from places where human presence is impossible. Among
many such efforts, ESA’s METERON SUPVIS Justin is an experiment in which astronauts
on board the International Space Station (ISS) commanded the humanoid robot Rollin’
Justin [16] in a simulated Martian environment on earth. This type of operation is
highly recommended for the crew, as they have trouble controlling their motor skills in

Fig. 2 Structure of Astrobee

microgravity, apart from their mental load due to any unexpected problems on board.
Another advantage is that the robot can be freely controlled through a variety of User
Interfaces (UIs), such as a tablet.
Ever feared that one day debris or a spacecraft would come crashing into the atmosphere?
One should not worry, because humans have been smart enough to come up with a
countermeasure: a machine fitted with a kinematically redundant robot [17] whose
main principle is based on target motion prediction. It moves along a reference
trajectory provided by ground control, constantly corrects its trajectory with
the help of a tracking controller, and finally takes the grasp. The duration of grasping
is selected so that the initial contact forces pass through the center of mass
of the chaser (the robot which grabs the target) to minimize the aftereffects caused
by a change in attitude. It either delivers the object down to earth or may set it back
in its trajectory, thereby reducing ballistic impact. This method can also be used to
remove space debris from the earth’s orbit.

1.3 Robotics in Marine Exploration

Space is better known to us than our own waters. Half of our oceans remain unexplored, and
there might be a cure for every disease that has struck mankind, lost knowledge, and
much more right under our noses. The things that strike our mind when we think of water
are vessels, ships, and boats. Of all the ways to lose a vessel, the worst is to let
it sink in deep waters. The hull is the main part of a ship, and its design is what allows
the ship to float and resist friction. Due to humidity and constant cruising in
the water, rust sets in, which in turn corrodes the metal, leading to leakages in the
vessel and eventually to sinking. To counter this problem, researchers have come
up with the idea of a swarm of deep-water robots [18] which detect breaches in
the hull and then notify the crew on board in emergencies. They form a cluster, and
they rearrange themselves in the area of the water infiltration. They are the physical
representation of the quote “United we stand, divided we fall.” They have a high
resistance to sensor noise, and the probability of the robotic population going haywire
is near zero, thereby reducing casualties and economic losses in the marine industry.
Deep marine exploration has been possible only due to the use of robots and
their ability to transfer the information among themselves and take necessary action.
Crabster (CR200) [19] is a six-legged crab-like robot (Fig. 3) made for deep-sea
explorations that can withstand turbidity and underwater waste. It has currently been
tested in a water tank simulating a scenario of wild sea currents.
OceanRINGS [20] is a platform of technologies which can be associated with
almost any Remotely Operated Vehicle (ROV), independent of its size. Tests were
conducted with different support vessels off the north, south, and west coasts of
Ireland, in Donegal, Bantry Bay, Cork Harbor, Galway Bay, the Shannon Estuary, and
La Spezia, Italy. It also provides the ground segment of a prototype real-time communication
system and is based on the principle of remote presence technology.
Marine energy, whether from oil and gas or renewable sources, is as important as oxygen
in our lives, making our way of life possible. Sometimes the oil or gas deposits lie offshore,
and for such purposes OceanRINGS has put forward the idea of building robotic systems
for the inspection of offshore sub-sea oil, gas and marine renewable energy installations [21].
They are capable of resisting extremely harsh weather conditions and send information related to the
Fig. 3 Crabster equipped with different devices



location and amount of reserves to the Virtual Control Cabin (VCC) on the ground,
making renewable energy available to the population. This smart technology could
lead to significant savings in time, maintenance, and operational costs.

1.4 Robotics in Manufacturing and Assembling

In the manufacturing industry, human working hours are not very efficient. Humans prefer
to work in a safe and flexible environment but cannot always be provided with that
luxury [22]. Therefore, replacing them with robots increases efficiency and production
compared to human labor, and robots reduce the cost of manufacturing. They can carry out
work for up to 12 hours straight and can rectify many human errors and mistakes
in quality control, thereby proving themselves fitter for the job than humans. For
example, if a person needs to lift an object of about 25 kg, he/she will experience
back pain.
People tend to forget where they put their things. To improve the stacking and
retrieval of things, researchers have come up with a robot which stacks [23] the required
object at a particular place and at a particular time and makes a note of it; this information
is later used to retrieve the object from where it was placed.
The Statue of Liberty was gifted by the French to the United States on account of
their independence, but it was imported in parts by ship. It would have certainly
taken about four months or so just to assemble it. Imagine if it were to be gifted in
this era, where there are constraints on space and labor. Keeping these limitations in
mind, a new approach is put forward in which the fixtures and the tooling are all taken
care of by coordinated mobile robots [24]. The mobility of the robots results in
rethinking the assembly space and reducing the cost and effort of labor. Sometimes
robots need to undergo complex coordination to get the parts and the tools to the
designated place at the designated time to obtain the desired assembly. The assembly process
comes down to these four basic points: (1) mobile manipulator hardware design,
(2) fixture-free positioning, (3) multi-robot coordination, and (4) real-time dynamic
scheduling.
Additive manufacturing [based on https://en.m.wikipedia.org/wiki/3D_printing],
commonly known as 3D printing, has greatly influenced the mass production
market. It is used to manufacture complex parts which are difficult to manu-
facture and produce otherwise (Fig. 4). 3D printing offers a bundle of advantages,
which include:
1. Rapid prototyping: As the name suggests, it aids faster production. It just takes
hours to produce unlike the usage of other typical methods which may result in
days.
2. A quick analyzing technique: Manufacturing an experimental product to check
its properties and its functions, thereby having an awareness of the pros and cons
of the product when going for large-scale production.

Fig. 4 A finished product using the 3D printer. Image courtesy: www.zdnet.com

3. Waste reduction: Only as much material as the product requires is used, and the
remaining material can be used later.
4. Custom: Every product designed can be customizable in size, shape, color, and
structure.
5. Precision: In some fields of work a millimeter plays an important role in the
machine’s efficiency. For example, springs in watches are of very small size, and
they require great time and precision to craft them by other means. However,
here, it is done with pinpoint accuracy and in a short time.

1.5 Robotics in Data Analytics

One of the most budding and enriching technologies on par with both AI
and robotics is big data, together with cloud, dew, and fog computing
[25]. Robots have advanced to such a state that all the information and data
are stored, verified, and then sent to the user. To store and process such large data and
algorithms, robots need much larger storage space than on-board hard drives. This is
where cloud and fog come into the picture. With their immense storage and computation,
functions are executed at higher speeds to meet the demands of the
growing population and the corporate world. C2RO (Collaborative Cloud Robotics)
[26] is a cloud platform that uses stream processing technology to connect the
city to mobile devices and sensors. This technology boosts the intelligence of robots
on a larger scale by enabling them to perform complicated tasks such as simultaneous
localization and mapping (SLAM), speech recognition, and 3D grasp planning as
they retrieve the information from the cloud or fog.

1.6 Robotics in SOS Missions

During a natural calamity or disaster, loss of human life is most of the time
inevitable. In such situations, drones can carry out search and rescue missions. The
safest way to get in or out of a forest is to follow the existing trail generally made
by hikers and mountaineers. The robot needs to look for the trail and then make
an effort to stay on it. A machine learning approach to the visual perception
of forest trails is taken by training a neural network on various real-world datasets;
when tested on a single image, the system outputs the main direction of the trail
relative to the viewing direction. The probable direction is determined using a Deep
Neural Network (DNN) as an image classifier [27], which operates by reading the
image’s pixels. It is used to determine the actions to take and avoid obstacles in the
wilderness. It is mainly made to navigate in places where humans cannot reach with
their existing approaches. Finding people who have lost their way in dense forests or
rugged terrain might not be completely impossible, but for a robot the size of an arm
it might be a cakewalk.
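To make the idea of a DNN-based trail classifier concrete, the following is our own minimal sketch in Keras, not the network used in [27]; the input resolution, layer sizes, class ordering and training data are assumptions made only for illustration.

```python
# Hedged sketch: a small CNN that maps a trail image to one of three viewing directions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(101, 101, 3)),          # input resolution is an assumption
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),      # classes: turn left, go straight, turn right
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(trail_images, direction_labels, epochs=10)   # hypothetical training data
```

At inference time, the predicted class is interpreted as a steering command that keeps the robot on the trail.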

1.7 Robotics in Armory

War might not be a pleasant topic to turn our views to, but it offers an opportunity to
introduce unmanned, robot-controlled vehicles to reduce casualties on a large
scale. Unmanned Aerial Systems (UAS) [28], or simply drones, have
proven their worth in unmanned missions. They can substitute for manned delivery of
missiles and torpedoes, reducing the number of sacrificial and suicidal missions. Another
alternative is the use of robotic soldiers with higher endurance and strength, capable
of battle for longer durations, along with robotic ammunition and the like.

1.8 Robotics in Teaching

Teaching today’s kids, tomorrow’s future, is more than important. Robotics can turn
ideas into reality. Inculcating it in the curriculum [29], both for schools and univer-
sities, will draw future generations toward this budding field. If the teacher is
a robot, then students will make an effort to listen. By grabbing their attention, which
in turn results in a proper academic and social career, robots can be of great
help. Robots can train athletes and sportsmen toward glory. They can instruct
pupils with the help of speech, the most effective form of communication. This feat
is achieved by using an Artificial Neural Network (ANN) [30]. Such robots can follow
simple instructions as of now, thereby making robotics useful in all walks of life.

2 Conclusion

The above literature review gives us a basic picture of robotics in our daily lives and
its use in the long run. It also gives an overview of the existing and upcoming
technology in the vast field of robotics. Apart from the cited fields of usage, there
might be many other fields in which robotics is the fundamental building block.
Reaching the human level of intelligence and exposure is currently an issue, as
robots can only perform the tasks they are programmed for. Yet, research to
achieve maximum human-like characteristics is still under way.

References

1. Nikolova G, Kotev V, Dantchev D (2017) CAD modelling of human body for robotics
applications. In: 2017 international conference on control, artificial intelligence, robotics &
optimization (ICCAIRO), Prague, pp 45–50. https://doi.org/10.1109/ICCAIRO.2017.18
2. Schiatti L, Tessadori J, Barresi G, Mattos LS, Ajoudani A (2017) Soft brain-machine inter-
faces for assistive robotics: a novel control approach. In: 2017 International conference on
rehabilitation robotics (ICORR), London, pp 863–869. https://doi.org/10.1109/ICORR.2017.
8009357
3. Gelderblom GJ, De Wilt M, Cremers G, Rensma A (2009) Rehabilitation robotics in robotics for
healthcare; a roadmap study for the European Commission. In: 2009 IEEE international confer-
ence on rehabilitation robotics, Kyoto, 2009, pp 834–838. https://doi.org/10.1109/ICORR.
2009.5209498
4. Kim Y, Cheng SS, Diakite M, Gullapalli RP, Simard JM, Desai JP (2017) Toward the devel-
opment of a flexible mesoscale MRI-compatible neurosurgical continuum robot. IEEE Trans.
Rob. 33(6):1386–1397. https://doi.org/10.1109/TRO.2017.2719035
5. Ongaro F, Pane S, Scheggi S, Misra S (2019) Design of an electromagnetic setup for independent
three-dimensional control of pairs of identical and nonidentical microrobots. IEEE Trans Rob
35(1):174–183. https://doi.org/10.1109/TRO.2018.2875393
6. Schlenk C, Schwier A, Heiss M, Bahls T, Albu-Schäffer A (2019) Design of a robotic instrument
for minimally invasive waterjet surgery. In: 2019 International symposium on medical robotics
(ISMR), Atlanta, GA, USA, pp 1–7. https://doi.org/10.1109/ISMR.2019.8710186
7. Stogl D, Armbruster O, Mende M, Hein B, Wang X, Meyer P (2019) Robot-based training for
people with mild cognitive impairment. IEEE Robot Autom Lett 4(2):1916–1923. https://doi.
org/10.1109/LRA.2019.2898470
8. Brun C, Giorgi N, Gagné M, Mercier C, McCabe CS (2017) Combining robotics and virtual
reality to assess proprioception in individuals with chronic pain. In: 2017 International confer-
ence on virtual rehabilitation (ICVR), Montreal, QC, pp 1–2. https://doi.org/10.1109/ICVR.
2017.8007491
9. Meghdari A, Alemi M, Khamooshi M, Amoozandeh A, Shariati A, Mozafari B (2016) Concep-
tual design of a social robot for pediatric hospitals. In: 2016 4th international conference on
robotics and mechatronics (ICROM), Tehran, pp 566–571. https://doi.org/10.1109/ICRoM.
2016.7886804

10. Florez JM et al (2017) Rehabilitative soft exoskeleton for rodents. IEEE Trans Neural Syst
Rehabil Eng 25(2):107–118. https://doi.org/10.1109/TNSRE.2016.2535352
11. Sun T et al (2017) Robotics-based micro-reeling of magnetic microfibers to fabricate helical
structure for smooth muscle cells culture. In: 2017 IEEE international conference on robotics
and automation (ICRA), Singapore, 2017, pp 5983–5988. https://doi.org/10.1109/ICRA.2017.
7989706
12. Takács Á, Jordán S, Nagy DÁ, Tar JK, Rudas IJ, Haidegger T (2015) Surgical robotics—
born in space. In: 2015 IEEE 10th Jubilee international symposium on applied computational
intelligence and informatics, Timisoara, pp 547–551. https://doi.org/10.1109/SACI.2015.720
8264
13. Backes P et al (2018) The intelligent robotics system architecture applied to robotics testbeds
and research platforms. In: 2018 IEEE aerospace conference, Big Sky, MT, 2018, pp 1–8.
https://doi.org/10.1109/AERO.2018.8396770
14. Tanaka Y, Lee H, Wallace D, Jun Y, Oh P, Inaba M (2017) Toward deep space humanoid
robotics inspired by the NASA space robotics challenge. In: 2017 14th international conference
on ubiquitous robots and ambient intelligence (URAI), Jeju, pp 14–19. https://doi.org/10.1109/
URAI.2017.7992877
15. Yoo J, Park I, To V, Lum JQH, Smith T (2015) Avionics and perching systems of free-flying
robots for the International Space Station. In: 2015 IEEE international symposium on systems
engineering (ISSE), Rome, pp 198–201. https://doi.org/10.1109/SysEng.2015.7302756
16. Schmaus P et al (2020) Knowledge driven orbit-to-ground teleoperation of a Robot coworker.
IEEE Robot Autom Lett 5(1):143–150. https://doi.org/10.1109/LRA.2019.2948128
17. Lampariello R, Mishra H, Oumer N, Schmidt P, De Stefano M, Albu-Schäffer A (2018)
Tracking control for the grasping of a tumbling satellite with a free-floating robot. IEEE Robot.
Autom. Lett. 3(4):3638–3645. https://doi.org/10.1109/LRA.2018.2855799
18. Haire M, Xu X, Alboul L, Penders J, Zhang H (2019) Ship hull inspection using a swarm of
autonomous underwater robots: a search algorithm. In: 2019 IEEE international symposium on
safety, security, and rescue robotics (SSRR), Würzburg, Germany, 2019, pp 114–115. https://
doi.org/10.1109/SSRR.2019.8848963
19. Yoo S et al (2015) Preliminary water tank test of a multi-legged underwater robot for seabed
explorations. In: OCEANS 2015—MTS/IEEE Washington, Washington, DC, 2015, pp 1–6.
https://doi.org/10.23919/OCEANS.2015.7404409
20. Omerdic E, Toal D, Dooly G (2015) Remote presence: powerful tool for promotion, education
and research in marine robotics. In: OCEANS 2015—Genova, Genoa, 2015, pp 1–7. https://
doi.org/10.1109/OCEANS-Genova.2015.7271467
21. Omerdic E, Toal D, Dooly G, Kaknjo A (2014) Remote presence: long endurance robotic
systems for routine inspection of offshore subsea oil & gas installations and marine renewable
energy devices. In: 2014 oceans—St. John’s, NL, 2014, pp 1–9. https://doi.org/10.1109/OCE
ANS.2014.7003054
22. Hirukawa H (2015) Robotics for innovation. In: 2015 symposium on VLSI circuits (VLSI
circuits), Kyoto, 2015, pp T2–T5. https://doi.org/10.1109/VLSIC.2015.7231379
23. Chong Z et al (2018) An innovative robotics stowing strategy for inventory replenishment
in automated storage and retrieval system. In: 2018 15th international conference on control,
automation, robotics and vision (ICARCV), Singapore, pp 305–310. https://doi.org/10.1109/
ICARCV.2018.8581338
24. Bourne D et al (2015) Mobile manufacturing of large structures. In: 2015 IEEE international
conference on robotics and automation (ICRA), Seattle, WA, pp 1565–1572. https://doi.org/
10.1109/ICRA.2015.7139397
25. Botta A, Gallo L, Ventre G (2019) Cloud, fog, and dew robotics: architectures for next gener-
ation applications. In: 2019 7th IEEE international conference on mobile cloud computing,
services, and engineering (MobileCloud), Newark, CA, USA, pp 16–23. https://doi.org/10.
1109/MobileCloud.2019.00010
26. Beigi NK, Partov B, Farokhi S (2017) Real-time cloud robotics in practical smart city appli-
cations. In: 2017 IEEE 28th annual international symposium on personal, indoor, and mobile

radio communications (PIMRC), Montreal, QC, 2017, pp 1–5. https://doi.org/10.1109/PIMRC.
2017.8292655
27. Giusti A et al (2016) A machine learning approach to visual perception of forest trails for
mobile robots. IEEE Robot. Autom. Lett. 1(2):661–667. https://doi.org/10.1109/LRA.2015.
2509024
28. Sanchez-Lopez JL et al (2016) AEROSTACK: an architecture and open-source software frame-
work for aerial robotics. In: 2016 international conference on unmanned aircraft systems
(ICUAS), Arlington, VA, pp 332–341. https://doi.org/10.1109/ICUAS.2016.7502591
29. Niehaus F, Kotze B, Marais A (2019) Facilitation by using robotics teaching and learning.
In: 2019 Southern African Universities power engineering conference/robotics and mecha-
tronics/pattern recognition association of South Africa (SAUPEC/RobMech/PRASA), Bloem-
fontein, South Africa, pp 86–90. https://doi.org/10.1109/RoboMech.2019.8704848
30. Joshi N, Kumar A, Chakraborty P, Kala R (2015) Speech controlled robotics using Artificial
Neural Network. In: 2015 Third international conference on image information processing
(ICIIP), Waknaghat, pp 526–530. https://doi.org/10.1109/ICIIP.2015.7414829
AI-Based Digital Marketing
Strategies—A Review

B. R. Arun Kumar

Abstract Artificial Intelligence (AI) techniques are applied to customer data, which
can be analyzed to anticipate customer behaviour. AI, big data and
advanced analytics techniques can handle both structured and unstructured data effi-
ciently, with greater speed and precision than regular computing technology, which
drives Digital Marketing (DM). AI techniques make it possible to construe emotions
and connect like a human, which has made prospective AI-based DM firms regard AI
as a ‘business advantage’. The situation of marketers being data rich but insight poor
no longer holds, thanks to AI tools which optimize marketing operations and
effectiveness. This paper highlights the significance of applying AI strategies to reach
the customer effectively, in terms of understanding their behaviour to find their
expectations about product features, operations, maintenance, delivery, etc. using
machine learning techniques. It highlights that such strategies steer digital marketing
towards customer need-based business.

Keywords Artificial intelligence · Machine learning · Digital humans ·


Chatbot’s · Digital marketing strategies

1 Introduction

Digital Marketing (DM) involves promotional efforts that use an electronic device or
the Internet, utilizing digital channels such as search engines,
social media, email and websites. DM, which uses electronic devices and the Internet to
connect to current and prospective customers, can also be denoted as ‘online marketing’,
‘Internet marketing’ or ‘web marketing’.
Online marketing strategies implemented using the Internet, and its related
communicating hardware/software devices/technologies can be referred to as digital
marketing.

B. R. Arun Kumar (B)


Department of Master of Computer Applications, BMS Institute of Technology and Management
(Affiliated to Vivesvaraya Technological University, Belagavi), Doddaballapura Main Road,
Avalahalli, Yelahanka, Bengaluru, Karnataka 560064, India
e-mail: arunkumarbr@bmsit.in


DM implementation broadly involves the following steps, as presented in Fig. 1 [1].

Fig. 1 Digital marketing methodology [1]

Fig. 2 Planning a digital marketing strategy (Source: Ref. [1])

Redefining the strategy is essential to broaden the reach of the brand whenever a
new product/service gets introduced. Goals may be re-established
to build brand awareness and goodwill among the customers using digital tools.
Changes in goals/strategies require changes in the action plan to be implemented
practically on digital platforms (Fig. 2).
Reaching customers through the Internet, electronic gadgets such as smart-
phones, social media and search engines, and understanding customer behaviour and
preferences by applying analytic tools and analyzing their results make up the
comprehensive, emerging and dynamic domain of Digital Marketing (DM), which is
quite different from traditional marketing. Several studies have projected that 85% of
customer-business relationships will be maintained using AI tools [2], and the AI
market is appraised to be worth $9.88 billion by 2022.
Coviello, Milley and Marcolin define e-Marketing as ’Using the Internet and
other interactive technologies to create and mediate dialogue between the firm and
identified customers’. DM is the broad term that makes use of different marketing
strategies/tools, namely website, email, Internet, content, video, smartphone, PPC
advertising and SMS messaging. Along with these digital strategies/tools, the following
basic guidelines, which form the core of DM, are worth recalling; they are described
below as essential guidelines for starting with DM.

1.1 Essential Guidelines for Starting DM

It is difficult and challenging to get the particular website of a business ranked at the
top of the Search Engine Result Page (SERP) among nearly 14 billion searches per
month across the globe; all DM strategies should therefore be optimized, including
social media marketing, PPC and other DM tasks
(https://www.educba.com/seo-in-digital-marketing/). To be successful in DM, the
following tips depicted in Fig. 3 are essential for beginners.
The Search Engine Optimization (SEO) technique, if adopted, improves the search engine
ranking of the website/business. SEO strategy is the key that positions the website
during the critical activities of the business, such as the buying/selling process, while
keeping the user experience in consideration.
SEO is the process of finding and driving the customer base towards the busi-
ness and the company among the countless e-businesses working with a DM strategy.
Search engine advertising is a form of the Pay Per Click (PPC) model of Internet
marketing which promotes a growing customer base and generates leads at opti-
mized cost. Conversion Rate Optimization (CRO) aims to increase the percentage
of customers who visit the website and complete a specific action, and it improves
customer satisfaction. Higher conversion rates are better as they increase ROI, user
base, user experience and lead generation and reduce acquisition cost. Web analytics
tools also enable us to understand user behaviour and get valuable marketing
intelligence.
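As a simple worked example (the numbers are illustrative and not from the source): if a landing page receives 10,000 visitors in a month and 250 of them complete a purchase, the conversion rate is 250 / 10,000 = 2.5%; lifting it to 3% through CRO would yield 50 additional conversions from the same traffic and advertising spend.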
To ensure a particular brand is well ranked, the following points should be pondered,
apart from the initial tips for beginners: other recently preferred considerations are
domain naming and optimizing results for the experience on desktop, mobile and tablet.
SEO strategies and social media marketing need to go hand in hand. The ultimate

Fig. 3 Essential tips for beginners of DM



aim of content marketing strategies is to get profitable customer action. It shall be
noted that 86% of businesses today use content marketing, where content virtually
plays a significant role in all marketing. It is necessary to do content analytics to know
whether the content is useful or needs changes and optimization from the user and
business perspectives. Mobile Marketing (MM) strategies are comparatively effective,
even though email marketing and digests are still required.
The new MM strategies are interactive, such as Location-based Services (LBS),
augmented reality mobile campaigns, 2D barcodes and GPS messaging. MM continues
to adopt text messaging and display-based campaign techniques. Digital marketing
professionals need to plan for and use DM tools. Any DM planning depends
on ‘the marketing environment such as demography, geography, psychographic and
behavioural analysis. Digital marketing is based upon Internet macro and micro-
environments’. One of the major innovations in DM is to apply Machine Learning
(ML) [3] and deep learning [4] techniques to make strategies/tools more effective,
and these have to be adopted as per the business needs.
The next section analyses the role of Artificial Intelligence (AI) and ML in DM.
The next section analyses the role of Artificial Intelligence (AI)—ML in DM.

1.2 Role of AI, ML/DL in DM

Despite enhanced digital marketing [5] strategies being in place, their efficiency can
be improved using contemporary technologies such as AI to understand emotions and
behaviour and respond to human customers’ queries. AI computing could optimize DM
strategies at all cognitive levels.
Machine learning, teaching the machine to learn, is a subset of AI that can offer
customized inputs for marketing specialists. DL is a subclass of ML composed of
enormously large neural networks and an immense pool of algorithms that can
replicate human intelligence.
Google’s direct answers are driven by ML, and the
‘people also ask’ section is powered by DL. Google is continuously learning and
reflecting human intelligence without the need for humans to feed all the answers
into its enormous database.

1.2.1 Basic Definitions

In a nutshell, the following widely cited definitions illuminate the concepts:
AI, also referred to as machine intelligence, is intelligence demonstrated by
machines, in contrast to the natural intelligence exhibited by humans and animals. AI
[6] is quite often used to describe the ability developed in a machine that clones the
behaviour of the human mind through learning and problem solving [7].
Definition of ML: ‘A computer programme is said to learn from experience E
with respect to some class of tasks T and performance measure P if its performance
at tasks in T, as measured by P, improves with experience E’ [4].
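As an illustrative reading of this definition (our example, not from the source): in email spam filtering, T is the task of classifying incoming mail as spam or not, P is the fraction of mails classified correctly, and E is a corpus of mails already labelled by users; the programme is said to learn if its accuracy improves as it sees more labelled mail.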

Fig. 4 Performance comparison of ML with DL (Source Ref. [4])

Definition of DL: ‘Deep learning is a particular kind of machine learning that
achieves great power and flexibility by learning to represent the world as a nested
hierarchy of concepts, with each concept defined in relation to simpler concepts, and
more abstract representations computed in terms of less abstract ones’ [4].
DL algorithms perform better and are more suitable when the data is large and high-end
machines are available, whereas ML algorithms can work on small data with low-end
machines, as shown in Fig. 4. DL algorithms involve massive matrix multiplications that
need hardware accelerators [8]. Good performance of ML requires accurate identifi-
cation and extraction of features, whereas DL algorithms can learn high-level features
from the data, which makes DL unique and reduces the analysis and development
of a feature extractor for every problem. The given problem is broken into parts and
solved when ML algorithms are used, whereas DL adopts an ‘end-to-end’
approach. The DL method takes a longer time to learn compared to the ML method. Both
ML and DL are applied in various fields, including DM and medical diagnosis. Both
the application of existing techniques and research trends have exploded in industry as
well as in academia. The adoption of ML and DL analytical tools has offered a
competitive edge in DM [9].
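The contrast can be made concrete with a toy sketch (our illustration, not from the paper): on the left, features are engineered explicitly before a simple linear model is fit; on the right, a multi-layer network learns intermediate representations from the same inputs. The review texts and labels are hypothetical, and a real DL system would use far more data, layers and raw inputs.

```python
# Illustrative contrast only: engineered features + linear model vs a multi-layer network.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

reviews = ["loved the product", "terrible delivery", "great value", "poor quality"]
labels = [1, 0, 1, 0]          # hypothetical sentiment labels

# Classical ML: features are engineered up front (TF-IDF), then a linear model is fit.
ml_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
ml_model.fit(reviews, labels)

# Deeper model: hidden layers learn intermediate representations from the same inputs.
dl_model = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
)
dl_model.fit(reviews, labels)

print(ml_model.predict(["amazing value"]), dl_model.predict(["amazing value"]))
```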

1.3 Research Methodology

The paper analyses the role of DM and AI-ML/DL in stimulating business by
identifying and responding to the customer’s taste. It highlights the role of
artificial intelligence, ML and DL tools in digital marketing. This paper is narrative
in nature; the information and examples given are based on references available
from secondary sources. The study motivates business enterprises to adopt AI-
ML/DL techniques to optimize their digital marketing strategies.

1.4 Research Objective

This research is carried out with the primary objective of exploring AI-based DM and
the significance of contemporary technologies such as AI, big data, data analytics
and deep learning for marketing products and services.

2 Impact of AI and ML on DM

DM strategies/tools based on AI-ML can streamline the market, optimizing both
business profit and the satisfaction of the user experience. The future of DM depends on
the ability of DM professionals to apply AI-ML techniques to effectively implement
DM strategies.
AI and ML are separate yet complementary to each other. As mentioned in [10],
‘AI aims to harness certain aspects of the “thinking” mind, Machine Learning (ML)
is helping humans solve problems in a more efficient way. As a subset of AI, ML
uses data to teach itself how to complete a process with the help of AI [11] capabil-
ities’. AI-ML tools [12] can bring out hidden business intelligence from the given
consumer data, which streamlines complex DM problems. It is difficult to make valid
conclusions on all the implications of ML techniques, but it is known that ML has started
creating an impact on DM [3].
This is because of the ability of ML tools to analyze extremely large datasets
and present visualizations as per the requirements of the DM team for taking deci-
sions to streamline strategies. By applying ML tools, analytics outcomes enable them
to understand their customers in depth. It may be noted that 75% of DM strategy
development has as of now adopted AI functionality, and 85% of customer interac-
tions can be effectively managed without human intervention [10]. This implies that
ML tools can streamline DM strategies, and the business can align with AI-ML
[10] future trends. It can be noted that several research works and articles
have upheld artificial intelligence, ML and DL-based approaches for digital
marketing, including [13, 2]. It is found that 90% of sales professionals expected
a substantial impact of AI on sales and marketing [14].

Table 1 LinkedIn table for content marketing

Sl. no. | Particulars                                  | % recommended
1       | Audience relevance                           | 58
2       | Engaging and compelling storytelling         | 57
3       | The ability to trigger an action or response | 54

2.1 ML Tools Enable DM

The implications of AI in general, and of relevant ML/DL techniques in particular, for DM [15]
involve utilizing data, content and online channels, which assures increased produc-
tivity and a better understanding of targeted customers. It is worth noting how
exactly ML tools enable DM to be streamlined.
ML tools can improve the relevance of content marketing: all types of businesses
create content in the form of blogs, testimonial videos and recorded webinars. But
content becomes truly effective only if it follows the criteria in the “LinkedIn”
table shown in Table 1. ML tools can analyze how your content measures up to these
requirements. ML tools can boost PPC campaigns by providing
metrics to drive the business, and SEO by giving insights into the content rather
than just specific keywords. An ML Chatbot [16] is a virtual robot capable of making
conversation with humans either through text mode, voice commands or sometimes
both. Many big brand organizations have already adopted ML Chatbots, for
example, Apple’s Siri feature and Facebook. A Chatbot can speak at the personal level
of the targeted customer and can collect personal information, behaving like a virtual
ambassador. ML techniques can process large datasets and instantly create user-
personalized content drips. Investigation of complex DM problems can be much
faster than ever before by applying ML tools and Chatbots, leading to a meaningful
personalized relationship with targeted customer involvement. AI-ML has created
disruptions and is transforming DM into a different technological landscape. AI-
ML-based marketing models can utilize relevant buying patterns and conduct of the
targeted customers, leading promotion teams to harness the supremacy of AI in their
businesses. Figure 5 indicates models based on ML.
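As a hedged sketch of how an ML tool might score content for audience relevance (our own illustration, not a specific tool or model from the paper; the audience profile and drafts are hypothetical), one simple approach ranks drafts by TF-IDF cosine similarity to a description of the target audience:

```python
# Hedged sketch: ranking content drafts by audience relevance (crude proxy via TF-IDF).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

audience_profile = "cloud analytics pricing for small retail businesses"   # hypothetical
content_drafts = [
    "How small retailers cut costs with cloud analytics",
    "Our company picnic photo gallery",
    "A pricing guide to retail analytics dashboards",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([audience_profile] + content_drafts)
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()

# Higher score = draft vocabulary closer to the audience profile.
for score, draft in sorted(zip(scores, content_drafts), reverse=True):
    print(f"{score:.2f}  {draft}")
```

Production tools would of course use richer signals (engagement data, embeddings, A/B results), but the ranking idea is the same.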

2.2 Digital Humans in DM

AI-based digital humans are already communicating successfully, giving appropriate
responses to customer queries. According to the statement of Jody Boshoff of FaceMe,
‘non-human interactions between the customer and businesses are going to be 85%
by 2025’. This is because, according to reports, at present 70% of
customers transact using digital services, which cut across different sectors
from telecommunications to banking [17, 18]. Since customers enjoy intermingling
with digital humans, AI-based DHs can impact digital marketing, since they can
work efficiently, keep learning from experience and reduce costs
as well. When digital services, especially digital humans powered by AI, are developed
to meet the expectations of customers, customers prefer them too.
A ‘Digital Human’ is the embodiment of an underlying AI-based Chatbot or digital
assistant with additional capabilities such as emotional intelligence. Like a natural
human, it can connect to individual natural humans, understand tone of expression
and body language, and respond with relevance, giving appropriate responses. For
example, patients can take assistance from digital humans to understand their medical
problems and how to follow their prescription and diet, with individual empathy
[19].
A visually realistic AI machine in a human avatar can blink its eyes, wink, move
its lips, smile and treat people with empathy; an intelligent corporate digital human is
highly convincing because of its modes of persuasion in handling customer-centric
services. Compared to Chatbots, DHs can convince with logos, ethos and pathos. Digital
assistants work 24/7 and never get bored or tired. DHs are a combination of multiple
advanced technologies that can understand the user’s emotion, mood and personality
[19] (Fig. 6).

Fig. 6 AI digital humans in customer-centric service. (Source Ref. [19])

2.3 Understanding the Importance/Advantages of AI-ML in DM

DM and storytelling features are strengthened by AI in achieving market
specialization and reaching the customers. AI innovations have enabled customers to
interact with technology, data, brands, products and services. Giant companies like
Google are integrating AI [20] into their existing products using speech recognition
and language understanding (Table 2).

Table 2 AI-ML advantages

Sl. no. | Importance/advantages
1       | Leveraging big data to get a better value
2       | Robust customer relationships
3       | Precise market predictions and sales forecasts
4       | Data-driven, optimized marketing campaigns
5       | Enhanced Marketing Qualified Leads (MQL)
6       | Further Sales Qualified Leads (SQL)
7       | Superior insights to improve positioning
8       | 360-degree view of customer needs
9       | Exploiting more business openings
10      | Fine-tuned propensity models to create focused marketing strategies
11      | Buyer satisfaction with improved user experience
12      | Reducing marketing costs for better ROI

Fig. 7 AI-ML-based future marketing driving forces. (Source Ref. [3])

Table 3 Mistakes to be avoided during DM streamlining [3]

Sl. no. | Mistakes to be avoided
1       | Affecting generic and broad customer characters
2       | Working with inadequate customer data
3       | Neglecting performance of previous marketing campaigns
4       | Not addressing regular and returning customers
5       | Generation and dissemination of irrelevant content
6       | Too much dependence on gut feeling

The global machine learning market is expected to grow from $1.41 billion in
2017 to $8.81 billion by 2022, at a Compound Annual Growth Rate (CAGR) of
44.1% [3].
The future of DM includes AI-ML-based smart automation solutions, as detailed in
Fig. 7.
AI-ML-based marketing strategies must avoid the mistakes shown in
Table 3.
It can be concluded that ML/DL, based on extensive data processing, offers the
information essential for the decision-making process of marketing specialists.
The application of ML-driven tools in digital marketing [21, 22] introduces
various new challenges and opportunities. The implementation of ML-based market
analytical tools has no obvious disadvantages [9].

3 Conclusion

DM strategies continuously need to be innovated in line with AI-ML tech-
niques to keep up with the market and get a high ROI. AI has several tools that can
boost DM. They include AI-assisted professional website development, audience selec-
tion, content crafting services, creating and customizing content, Chatbots, customer
service, email marketing, predictive analysis and marketing, and AI recommendations
for engaging the targeted customers. Future developments in AI, coupled with
ML and DL tools, will address the concerns or limiting factors, if any, of the current
tools.

References

1. https://www.deasra.in/msme-checklist/digital-marketing-checklist/?gclid=EAIaIQobChMI
rLf4mJzM6wIV3sEWBR1f0AXnEAAYASAAEgIENPD_BwE
2. https://www.toprankblog.com/2018/03/artificial-intelligence-marketing-tools/
3. https://www.grazitti.com/blog/the-impact-of-ai-ml-on-marketing/
4. https://www.analyticsvidhya.com/blog/2017/04/comparison-between-deep-learning-mac
hine-learning/
5. https://quanticmind.com/blog/predictive-advertising-future-digital-marketing/
6. Artificial Intelligence in Action: Digital Humans, Monica Collier Scott Manion Richard de
Boyett, May 2019. https://aiforum.org.nz/wp-content/uploads/2019/10/FaceMe-Case-Study.
pdf
7. Artificial intelligence. https://en.wikipedia.org/wiki/Artificial_intelligence
8. Talib MA, Majzoub S, Nasir Q et al (2020) A systematic literature review on hardware imple-
mentation of artificial intelligence algorithms. J Supercomput. https://doi.org/10.1007/s11227-
020-03325-8
9. Miklosik A, Kuchta M, Evans N, Zak S (2019) Towards the adoption of machine learning-based
analytical tools in digital marketing. https://doi.org/10.1109/ACCESS.2019.2924425
10. https://digitalmarketinginstitute.com/blog/how-to-apply-machine-learning-to-your-digital-
marketing-strategy
11. https://www.smartinsights.com/managing-digital-marketing/how-ai-is-transforming-the-fut
ure-of-digital-marketing/
12. https://www.superaitools.com/post/ai-tools-for-digital-marketing
13. https://www.researchgate.net/publication/330661483_Trends_in_Digital_Marketing_2019/
link/5c4d3d6f458515a4c743467e/download
14. Top Sales & Marketing Priorities for 2019: AI and Big Data, Revealed by Survey of 600+ Sales
Professionals Business Wire|https://www.businesswire.com/news/home/20190129005560/en/
Top-Sales-Marketing-Priorities-2019-AI-Big
15. https://www.educba.com/seo-in-digital-marketing/
16. https://www.prnewswire.com/news-releases/machine-learning-market-worth-881-billion-
usd-by-2022-644444253.html
17. Digital Humans; the rise of non-human interactions, Jody shares. https://www.marketing.org.
nz/Digital-Humans-DDO18
18. Customers’ lives are digital-but is your customer care still analog? Jorge Amar and Hyo Yeon,
June 2017. https://www.mckinsey.com/business-functions/operations/our-insights/customers-
lives-are-digital-but-is-your-customer-care-still-analog
19. In 5 years, a very large population of digital humans will have hundreds of millions of conversa-
tions every day by Cyril Fiévet. https://bonus.usbeketrica.com/article/in-5-years-a-very-large-
population-of-digital-humans-will-have-hundreds-of-millions-of-conversations-every-day
20. https://ieeexplore.ieee.org/stamp/stamp.?arnumber=8746184

21. https://cio.economictimes.indiatimes.com/news/strategy-and-management/ai-digital-market
ing-key-skills-to-boost-growth/71682736
22. https://www.singlegrain.com/seo/future-of-seo-how-ai-and-machine-learning-will-impact-
content/
NoRegINT—A Tool for Performing OSINT and Analysis from Social Media

S. Karthika, N. Bhalaji, S. Chithra, N. Sri Harikarthick, and Debadyuti Bhattacharya

Abstract A variety of incidents occur, and the Open-Source Intelligence (OSINT) tools in the market are capable of collecting only specific target data, and even that only to a limited extent. Our tool, NoRegINT, has been developed specifically to collect a theoretically unlimited amount of data based on keyword terms and to draw a variety of inferences from it. This tool is used to gather information in a structured format about the Pulwama attacks and to draw inferences such as the volume of data, the general sentiment of people about it and the impact of a particular hashtag.

Keywords Open-source intelligence · Application programming interface · Spiderfoot · Maltego · Social media · Sentiment analysis

1 Introduction

Open-Source Intelligence (OSINT) is the method of obtaining data and other relevant information about a specific target, usually but not limited to a person, e-mail ID, phone numbers, IP addresses, location, etc. It makes use of openly available information, generally without the direct involvement of said target.

S. Karthika (B) · N. Bhalaji · S. Chithra · N. Sri Harikarthick · D. Bhattacharya


Department of Information Technology, SSN College of Engineering, Kalavakkam, Chennai,
Tamil Nadu, India
e-mail: skarthika@ssn.edu.in
N. Bhalaji
e-mail: bhalajin@ssn.edu.in
S. Chithra
e-mail: chithras@ssn.edu.in
N. Sri Harikarthick
e-mail: sriharikarthickn16102@it.ssn.edu.in
D. Bhattacharya
e-mail: debadyuti4@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 971
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4_71

OSINT is generally achieved through many tools which automate certain processes, although a preliminary analysis can be performed manually. In this paper, a new tool is proposed to collect data from various sources such as Twitter, Reddit and Tumblr and draw inferences from it [1].
It is a well-known fact that social media is now an integral part of everybody’s life
and more often than not, people tend to post a variety of things on their social media
accounts such as details about their personal lives, their opinions about a particular
entity or incident and also pictures of themselves. This makes data collection from
such sources an extremely important task and essentially forms the core of fields
such as Open-Source Intelligence (OSINT) [2, 3].
The motivation behind the proposed work is that a large number of tools have been developed in the field of OSINT; although these tools have been very useful, they have various shortcomings. A large number of them are simple wrappers built around an Application Programming Interface (API) provided by the particular social media website [4, 5]. There is always an upper limit set upon the number of requests and, therefore, by extension, on the amount of data that can be collected. The other web scraping tools also limit the amount of data collected because of the concept of infinite scrolling in these web pages. Furthermore, the data is often unstructured and not followed by any sort of analysis. These sorts of tools leave a lot of work to the end user and therefore need to be followed by extensive cleaning and analysis [6, 7].
In this paper, a tool has been proposed that is built upon web scraping to collect publicly available data from social media websites without facing difficulties such as a cap on the amount of data or dependencies upon any sort of API. This tool will not only overcome the problem set by infinite scrolling web pages, but it will also provide post-collection temporal analysis and sentiment analysis that can be used to study the social media response to incidents such as 9/11 or the Pulwama attack. The mentioned data can be of any type, textual, pictorial, etc. The targeted social media websites in this paper are Twitter, Reddit and Tumblr [8–10].
In the remainder of this paper, Sect. 2 elaborates on related work in this area, and Sect. 3 discusses the methodology. Section 4 analyses the results and compares them with other available APIs, and Sect. 5 draws conclusions and discusses the future scope of the proposed tool.

2 Related Works

This section discusses the various APIs and wrappers existing in the research area of OSINT.

2.1 Twitter API

The Twitter API is an interface provided by the company itself to support the integration of its service in other applications. It is not a data collection tool but rather an alternative way to access one's account and perform actions from there. Although one can perform various actions such as following/unfollowing users and posting content, it is not an effective data collection tool, as the number of requests is limited and its usage requires registration as well as proficient coding knowledge on the end user's part [2, 11].

2.2 Reddit API Wrapper (PRAW)

This is an openly available tool built upon the Reddit developers' API, which can be used to perform various activities such as retrieving posts and their metadata, posting content, upvoting posts and following other users [12]. However, the usage of this wrapper involves the hassle of registering as a developer with Reddit and also requires the end user to be familiar with programming concepts and the usage of OAuth. This prevents the wrapper from being a simple plug-and-play OSINT tool [13].

2.3 Spiderfoot

Spiderfoot, although marketed as an OSINT tool, is commonly used for scanning targets based on IP addresses and is only capable of providing the user with raw data about a specified target IP or domain. Although it uses a large pool of resources to collect the data, it is not capable of drawing any inferences from the data collected.

2.4 Maltego

Maltego is a commonly used data mining tool, mainly used to collect data about specific entities, which may be people, companies or websites. This data is visually represented as a set of connected graphs. Although it is one of the most effective OSINT tools in the market, it is still not the best choice for term-wise data collection and data collection about incidents. It can correlate and connect data, but it cannot draw conclusive inferences from the representations [14].

3 Framework of NoRegINT Tool

The proposed NoRegINT tool is designed to overcome the various existing API-based problems, such as the limited date range (number of days), the amount of fetched content and the cap on the number of requests. Figure 1 presents the five major modules involved in the framework of the proposed tool, namely the CLI module, Twitter scraper, Reddit scraper, Tumblr image collector and inference module.

Fig. 1 A framework of NoRegINT tool



3.1 CLI Module

It is an interface that can be used by the user. It provides a high level of abstraction
and does not require any sort of significant programming knowledge on the end user’s
part. It provides the user with two main functionalities. The collection of data based
upon a specified search term and the inferences that can potentially be drawn from
said collected data.
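A minimal sketch of what such a command-line front end might look like is given below, assuming a standard argparse-based interface; the option names (keyword, --scrolls, --infer) are illustrative assumptions and are not taken from the tool itself.

```python
# Hypothetical sketch of a NoRegINT-style CLI; option names are assumptions.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Collect social media posts for a keyword and run inferences")
    parser.add_argument("keyword", help="search term or hashtag, e.g. 'Pulwama'")
    parser.add_argument("--scrolls", type=int, default=3,
                        help="number of simulated scroll levels (collection depth)")
    parser.add_argument("--infer", action="store_true",
                        help="run the inference module on already collected data")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.infer:
        print(f"Running inferences for '{args.keyword}' ...")
    else:
        print(f"Collecting posts for '{args.keyword}' ({args.scrolls} scroll levels) ...")
```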

3.2 Twitter Scraper

The Twitter scraper is built upon Beautiful Soup, the popular scraping package available in Python. However, using it together with the 'requests' library imposed limitations on the amount of data that could be collected. To overcome this limitation, a browser instance is created in the background, and JavaScript code is executed to simulate scrolling movements, which theoretically provides an unlimited amount of data. Beautiful Soup uses the HTML DOM to access entities present in the page, and this is used to collect the tweet metadata. The result is stored in JSON format and is then accessed by the inference module.
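A minimal sketch of this scroll-then-parse pattern is shown below, assuming Selenium with a headless Chrome driver and Beautiful Soup; the search URL, CSS selector and output file name are assumptions, since Twitter's markup changes frequently and is not specified in the paper.

```python
# Sketch of the scroll-and-scrape pattern; the selector and URLs are assumptions.
import json
import time

from bs4 import BeautifulSoup
from selenium import webdriver


def scrape_tweets(keyword: str, scroll_levels: int = 3) -> list:
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")           # browser instance runs in the background
    driver = webdriver.Chrome(options=options)
    driver.get(f"https://twitter.com/search?q={keyword}")
    for _ in range(scroll_levels):
        # Execute JavaScript to simulate scrolling so more posts are loaded.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)                            # give the page time to render new tweets
    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()
    # Walk the HTML DOM and keep a minimal metadata record per tweet.
    return [{"text": node.get_text(" ", strip=True)}
            for node in soup.select("article")]  # assumed per-tweet container


if __name__ == "__main__":
    tweets = scrape_tweets("Pulwama")
    with open("twitter_results.json", "w", encoding="utf-8") as fh:
        json.dump(tweets, fh, ensure_ascii=False, indent=2)
```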

3.3 Reddit Scraper

Similar to the Twitter scraper module, this module generates the metadata of Reddit posts regarding a particular keyword and stores it in the same format as the Twitter results. This is then accessed by the inference module.
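The shared storage step might look like the sketch below; the record fields are assumptions, since the paper only states that the Reddit metadata is stored in the same JSON format as the Twitter results.

```python
# Sketch of the common JSON storage step; the record fields are assumptions.
import json
from typing import Dict, List


def save_records(records: List[Dict], path: str) -> None:
    """Dump a list of scraped post-metadata records into a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


# Illustrative record shape shared by the Twitter and Reddit scrapers.
sample = [{"source": "reddit", "title": "Example post", "score": 42,
           "url": "https://reddit.com/r/example", "text": "Example body"}]
save_records(sample, "reddit_results.json")
```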

3.4 Tumblr Image Collector

The Tumblr image collector sends requests to download images from URLs collected from the web page using Beautiful Soup. These images are then indexed and stored locally for the user to access. This can be useful in collecting or accumulating a dataset for a given problem.
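A minimal sketch of the download-and-index step is given below, assuming the image URLs have already been extracted from the page with Beautiful Soup; the output directory and file naming scheme are assumptions.

```python
# Sketch of the image download and local indexing step; paths are assumptions.
import os

import requests


def download_images(image_urls, out_dir="tumblr_images"):
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(image_urls):
        response = requests.get(url, timeout=10)
        if response.status_code != 200:
            continue                                     # skip URLs that fail to load
        path = os.path.join(out_dir, f"image_{index}.jpg")
        with open(path, "wb") as fh:
            fh.write(response.content)                   # store the image locally
        saved.append(path)                               # keep an index of stored files
    return saved
```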

3.5 Inference Module

1. Volume of data
(a) Gives insight into the popularity of a topic on different social media
(b) Gives information about the number of Tweets, Reddit posts and images scraped by the system
2. Sentiment analysis
(a) Gives the average sentiment value of all the Tweets and Reddit posts
(b) Gives information about the impact of a term on social media

4 Results and Discussion

The proposed tool NoRegINT was experimented with for the keyword 'Pulwama'. Figure 2 presents the CLI module used to obtain the keyword input from the user.
After obtaining the keywords from the user, the process of scraping begins with a new instance of a Selenium browser object opened on the built URL. Figure 3 illustrates the process of automatic loading of posts for scraping, and Figs. 4 and 5 show the JSON files generated by the Reddit and Twitter scrapers, respectively.
Figure 6 describes the comprehensive results achieved using the tool. About 99 tweets, 100 Reddit posts and two Tumblr photos were collected in the fast search method (three levels of scrolling). The tool uses the VADER sentiment analysis package, in which the interval of −0.05 to +0.05 is considered neutral sentiment, the interval of +0.05 to +1 represents positive sentiment, and the interval of −1 to −0.05 is treated as negative sentiment. The tool obtained an average sentiment of −0.28 on the Twitter-scraped data and a value of −0.16 on the Reddit data. This sentiment analysis, performed on the data collected by the NoRegINT tool, shows that the recent posts about that term and its hashtag have been negative on average.
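The compound-score thresholds described above can be reproduced with the VADER package roughly as follows; the exact averaging scheme used by the tool is an assumption.

```python
# Sketch of VADER-based aggregation; the averaging scheme is an assumption.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


def average_compound(texts):
    """Mean compound score over a list of posts, as reported by the tool."""
    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(text)["compound"] for text in texts]
    return sum(scores) / len(scores) if scores else 0.0


def label(compound):
    # Thresholds as described above: [-0.05, +0.05] is neutral.
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


posts = ["A tragic incident, thoughts with the victims", "Strong response expected"]
avg = average_compound(posts)
print(f"average compound = {avg:.2f} ({label(avg)})")
```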
Figure 7 details the percentage of sentiment-bearing tweets in the scraped repository built by the NoRegINT tool.
The comparison has been made on the basis of standard features to determine the performance of the APIs and the OSINT tools. Posted content cannot be obtained with Maltego and Spiderfoot, whereas it is scraped and stored by NoRegINT in JSON format.

Fig. 2 CLI module



Fig. 3 Reddit scraper

Fig. 4 Reddit scraper results



Fig. 5 Twitter scraper results

Fig. 6 Results from inference module

The Reddit wrapper, the Twitter API and Maltego restrict the amount of data scraped (~3200 tweets, etc.), while NoRegINT does not restrict the amount of content retrieved from social media. The APIs also have a 7-day range limit, while NoRegINT can gather arbitrarily old information from the social media posts. None of the tools/APIs in question performs sentiment analysis on the gathered data, while NoRegINT performs sentiment analysis, giving the average compound value and a graph depicting the percentage of sentiment-bearing tweets. None of the tools has built-in inferencing methods, while NoRegINT can report the number of tweets, posts and photos scraped from Twitter, Reddit and Tumblr, respectively (Fig. 8).

Fig. 7 Sentiment analysis results

Fig. 8 Comparison of NoRegINT tool with the existing APIs

5 Conclusion

The authors of this paper have addressed problems such as the restriction on the volume of data, the 7-day limit for content fetching and the lack of built-in inferencing in the existing APIs and OSINT tools. The proposed tool, NoRegINT, overcomes these problems through features such as automatic scrolling and sentiment analysis. The scrolling facilitates the unrestricted fetching of data, through which the authors were able to build a complete, restriction-free repository. The system is versatile in its keyword input, and the tool can automatically generate a sentiment summary of the keyword input. This tool can be further developed to provide more functionalities, such as analysing streams of posts and photos, and can also be extended to other popular or growing social media websites.

References

1. Lee S, Shon T (2016) Open source intelligence base cyber threat inspection framework for
critical infrastructures. In: 2016 future technologies conference (FTC). IEEE, pp 1030–1033
2. Best C (2012) OSINT, the Internet and Privacy. In: EISIC, p 4
3. Noubours S, Pritzkau A, Schade U (2013) NLP as an essential ingredient of effective OSINT
frameworks. In: 2013 Military communications and ınformation systems conference. IEEE,
pp 1–7
4. Sir David Omand JB (2012) Introducing social media intelligence. Intell Natl Secur 801–823
5. Steele RD (2010) Human intelligence: all humans, all minds, all the time
6. Bacastow TS, Bellafiore D (2009) Redefining geospatial intelligence. Am Intell J 27(1):38–40. Best C (n.d.) Open source intelligence. (T.R.I.O), T. R. (2017). Background/OSINT. Retrieved 23 Apr 2018, from www.trioinvestigations.ca: https://www.trioinvestigations.ca/background-osint
7. Christopher Andrew RJ (2009) Secret intelligence: a reader. Routledge Taylor & Francis Group,
London
8. Garzia F, Cusani R, Borghini F, Saltini B, Lombardi M, Ramalingam S (2018) Perceived
risk assessment through open-source intelligent techniques for opinion mining and sentiment
analysis: the case study of the Papal Basilica and sacred convent of Saint Francis in Assisi,
Italy. In: 2018 International Carnahan conference on security technology (ICCST). IEEE, pp
1–5
9. Michael Glassman MJ (2012) Intelligence in the internet age: the emergence and evolution of
Open Source Intelligence (OSINT). Comput Hum Behav 28(2):673–682
10. Stottlemyre SA (2015) HUMINT, OSINT, or Something New? Defining crowdsourced
intelligence. Int J Intell Counter Intell 578–589
11. Gasper Hribar IP (2014) OSINT: a “Grey Zone”? Int J Intell Counter Intell 529–549
12. Intelligence Community Directive Number 301 (2006) National Open Source Enterprise, 11
July 2006
13. Neri F, Geraci P (2009) Mining textual data to boost information access in OSINT. In: 2009
13th International conference information visualisation, pp 427–432. IEEE
14. Pietro GD, Aliprandi C, De Luca AE, Raffaelli M, Soru T (2014) Semantic crawling: an
approach based on named entity recognition. In: 2014 IEEE/ACM International conference on
advances in social networks analysis and mining (ASONAM 2014). IEEE, pp 695–699
Author Index

A D
Abhinay, K., 529 Dalin, G., 15
Adhikari, Surabhi, 39 De, Debashis, 333
Agrawal, Jitendra, 157 Dilhani, M. H. M. R. S., 647
Ahuja, Sparsh, 751 Dilum Bandara, H. M. N., 567
Aiswaryadevi, V. J., 925 Dushyanth Reddy, B., 851
Aleksanyan, G. K., 729
Aravind, A., 529
Arif Hassan, Md, 869 E
Arun Kumar, B. R., 957 Eybers, Sunet, 379
Atul Shrinath, B., 271
Ayyasamy, A., 127
G
Gaba, Anubhav, 39
B Ganesh Babu, C., 445, 481
Bains, Inderpreet Singh, 113 Gautam, Shivani, 285
BalaSubramanya, K., 235 Ghosh, Atonu, 333
Bansal, Nayan, 39 Gokul Kumar, S., 445
Baranidharan, V., 365 Gorbatenko, N. I., 729
Basnet, Vishisth, 751 Gour, Avinash, 305
Bawm, Rose Mary, 883 Gouthaman, P., 763, 781
Behera, Anama Charan, 739 Graceline Jasmine, S., 189
Behera, Bibhu Santosh, 739 Gupta, Akshay Ramesh Bhai, 157
Behera, Rahul Dev, 739 Gupta, Anil, 203
Behera, Rudra Ashish, 739 Gupta, Anmol, 763
Bhalaji, N., 971 Gupta, Sachin, 315
Bhati, Amit, 53 Guttikonda, Geeta, 103
Bhattacharya, Debadyuti, 971

H
C Haldorai, Anandakumar, 851
Channabasamma, 395 Harish, Ratnala Venkata Siva, 349
Chhajer, Akshat, 781 Hettige, Budditha, 691
Chile, R. H., 703 Hettikankanama, H. K. S. K., 601
Chithra, S., 971 Hiremath, Iresh, 405
Chopade, Nilkanth B., 911 Hoang, Vinh Truong, 299
Chung, Yun Koo, 551 Hossain, Sohrab, 883
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 981
S. Smys et al. (eds.), Inventive Computation and Information Technologies, Lecture Notes
in Networks and Systems 173,
https://doi.org/10.1007/978-981-33-4305-4

I M
Indirani, M., 831 Mahaveerakannan, R., 1, 813
Ishi, Manoj S., 143 Maheshwari, Vikas, 305
Majumder, Koushik, 333
Malipatil, Somashekhar, 305
J Manickavasagam, L., 271
Jagtap, Swati, 911 Manjunathan, A., 481
Jahnavi, Ambati, 851 Manusha Reddy, A., 395
Jain, Rachna, 677 Marathe, Amit, 221
Jani Anbarasi, L., 189 Maria Priscilla, G., 795
Jawahar, Malathy, 189 Maruthi Shankar, B., 445
Jeeva Padmini, K. V., 567 Mathankumar, M., 481
Jeyaboopathiraja, J., 795 Matta, Priya, 751
Jha, Aayush, 39 Mehta, Gaurav, 285
Jha, Avinash Kumar, 39 Mishra, Bharat, 421
Jinarajadasa, G. M., 719 Mittra, Tanni, 883
John Aravindhar, D., 463 Mohanrajan, S. R., 271
John Deva Prasanna, D. S., 463 Mohanty, Prarthana, 739
Joshi, Abhijit R., 221 Muqith, Munim Bin, 897
Joshi, Shashank Karthik D., 235
Jotheeswar Raghava, E., 529
Jude, Hemanth, 677 N
Nagalakshmi, Malathy, 859
Nagrath, Preeti, 677
Nair, Jayashree, 173
K
Narendra, Modigari, 189
Kailasam, Siddharth, 405
Nataraj, N., 925
Kamrul Hasan, Mohammad, 869
Naveen, K. M., 365
Karthika, S., 971
Nayar, Nandini, 285
Karthikeyan, M. M., 15
Niloy, Md. Dilshad Kabir, 897
Karthik, S., 247
Nithish Sriman, K. P., 365
Karthik, V., 189
Karunananda, Asoka S., 583, 691
Katamaneni, Madhavi, 103 O
Katsupeev, A. A., 729 Olana, Mosisa Dessalegn, 551
Kikkuri, Vamsi Krishna, 173
Kiruthika, S., 925
Kombarova, E. O., 729 P
Kommineni, Madhuri, 851 Pandala, Madhavi Latha, 103
Koppar, Anant, 405 Pandey, Sanidhya, 763
Kousik, N. V., 813 Pant, Bhasker, 751
Krishanth, N., 271 Parveen, Suraiya, 259
Krishna, Harsh, 763 Patidar, Sanjay, 113
Kumara, Kudabadu J. C., 647, 665 Patil, Ajay B., 703
Kumar, Anuj, 537 Patil, Annapurna P., 513
Kumar, N. S., 859 Patil, J. B., 143
Pavan Karthik, D. V. S., 945
Pavel, Monirul Islam, 897
L Perera, G. I. U. S., 567
Li, Hengjian, 633 Pramanik, Subham, 763
Lima, Farzana Firoz, 883 Pranavanand, S., 945
Lingaraj, N., 247 Prathap, R., 365
Litvyak, R. K., 729 Praveen Kumar, N., 365
Liyanage, S. R., 719 Premjith, B., 81

Priyanka, G. S., 445, 925 Sivasankar, P., 463


Priyatham, Manoj, 497 Slathia, Shaurya Singh, 781
Punithavalli, M., 27 Soman, K. P., 81
Sophia Reena, G., 27
Sri Harikarthick, N., 971
R Sruthi, M. S., 925
Rajesh Kumar, E., 529 Subash, G., 271
Rajesh Kumar, P., 349 Subba Raju, K. V., 93
Rajesh Sharma, R., 551 Sujin, J. S., 247
Rakesh, K. S. S., 739 Sungheetha, Akey, 551
Ramachandran, Raji, 935 Sureshkumar, C., 127
Ramkumar, M., 445, 481 Suresh, Yeresime, 395
Ram, Shrawan, 203
Ranasinghe, D. D. M., 615
Ranjith, R., 271 T
Rathnayake, Kapila T., 601 Talagani, Srikar, 173
Raveendran, Aswathi, 935 Tan, Siok Yee, 897
Ravichandran, Gopika, 935 Thakur, Narina, 677
Ravi Kiran Varma, P., 93 Thapa, Surendrabikram, 39
Razzak, Razia, 897 Thenmozhi, S., 235
Rizvi, Ali Abbas, 221 Thota, Yashwanth, 173
Ruthala, Suresh, 93 Tran-Trung, Kiet, 299
Rzevski, George, 691 Tripathi, Satyendra, 421

S U
Sabena, S., 127 Udhayanan, S., 481
Sachdeva, Ritu, 315 Uma, J., 1
Sai Aparna, T., 81
Saini, Dharmender, 677
Sai Ramesh, L., 127, 435 V
Saranya, M. D., 247 Varun, M., 405
Sarath Kumar, R., 445, 481 Vasantha, Bhavani, 851
Sarma, Dhiman, 883 Vasanthapriyan, Shanmuganathan, 601
Sarwar, Tawsif, 883 Vasundhara, 259
Satheesh Kumar, S., 247 Vemuri, Pavan, 173
Selvakumar, K., 435 Venba, R., 189
Sengupta, Katha, 897 Vidanagama, Dushyanthi, 583
Setsabi, Naomi, 379 Vidya, G., 63
Shalini, S., 513 Vikas, B., 235
Shankar, S., 831 Vivekanandan, P., 1
Shanthini, M., 63
Sharma, Nitika, 677
Sharma, Tanya, 859 W
Shrinivas, S., 235 Wagarachchi, N. M., 665
Shukur, Zarina, 869 Wang, Xiyu, 633
Shwetha, N., 497
Shyamali Dilhani, M. H. M. R., 665
Silva, R. K. Omega H., 567 Y
Silva, Thushari, 583 Yashodhara, P. H. A. H. K., 615
Simran, K., 81 Yuvaraj, N., 813
Singh, Ashutosh Kumar, 421
Singh, Bhavesh, 221 Z
Singh, Poonam, 285 Zhao, Baohua, 633
Sivaram, M., 813
