Latest Seminar Report Yash Ingole
on
Machine Learning Technique for Early Prediction
of Disease from Various Symptoms
Submitted by
Yash J. Ingole
Seventh Semester
B.E. (Computer Science & Engineering)
Guided by
Dr. R. R. Karwa
With great pleasure I hereby acknowledge the help given to me by various individuals
throughout the process. This is an acknowledgement of the inspiration, drive and
technical assistance contributed by many individuals. This report would never have
seen the light of day without the help and guidance I have received.
I would like to express my profound thanks to Dr. R. R. Karwa for their guidance
and constant supervision, as well as for providing the necessary information regarding
the seminar report. I would also like to thank Dr. G. R. Bamnote, Principal of
PRMIT&R and Head, Department of Computer Science & Engineering, and the faculty
for their kind co-operation and encouragement, which helped me complete this report.
I owe an incalculable debt to all the staff of the Department of Computer Science &
Engineering for their direct and indirect help.
I extend my heartfelt thanks to my parents, friends and well-wishers for their
support and timely help. Last but not least, I thank God Almighty for guiding me
every step of the way.
Abstract
Most medical diagnoses require visiting a doctor and booking an appointment for a
consultation. To get an accurate indication of the disease, patients sometimes have to
wait for blood reports, and they may also have to travel long distances to consult a
doctor. When people are not feeling well, the first thing they usually do is check their
temperature to get a baseline idea of their fever, so that they can consult a doctor if
the temperature is high enough. Similarly, a medical disease prediction application can
give users a baseline idea of the disease, and the prediction model can indicate whether
they should consult a doctor immediately or can at least start some home remedies for
temporary relief. Combining machine learning with an application interface lets users
interact easily with the machine learning model and obtain more accurate predictions.
Contents
1 Introduction
1.1 Motivation
2 Literature Survey
3 System Objectives
3.1 Problem Definition
3.2 Objectives
4 System Architecture
5 Methodology
5.1 Algorithms and their significance
5.2 Parameters used for Calculating the Accuracies of the Machine Learning Models
5.3 Case Study
6 Conclusion
6.1 Future Scope
References
1 Introduction
In terms of data collection and processing, healthcare is one of the most challenging
industries. With the advent of the digital era and technological advancements, a vast
quantity of multidimensional patient data is created, including clinical factors,
hospital resources, disease diagnostic information, patients' records, and medical
equipment. This enormous, dense, and complex data must be processed and evaluated
in order to extract knowledge for effective decision making. Medical data mining
offers a lot of potential for uncovering hidden patterns in medical data sets [1].
…techniques with physician expertise [9]. Significant efforts are being made to use data
mining and machine learning approaches to intelligently translate accessible data into
valuable information and so improve the efficiency of the diagnostic process. Several
studies have explored the diagnostic abilities of machine learning. One found that,
whereas the most experienced physician could diagnose with 79.97% accuracy, machine
learning algorithms could do so with 91.1% accuracy [10]. Machine learning techniques
are applied to disease datasets to extract features for better diagnosis, prediction,
prevention, and therapy.
1.1 Motivation
Within the healthcare ecosystem there are many stakeholder groups, each of
which has a specific set of needs with regard to early detection and diagnosis (ED&D).
These groups include patients, primary and secondary care clinicians, radiologists,
epidemiologists, geneticists, clinical researchers, and policymakers.
Each of these stakeholders acts on different information, knowledge and objectives;
however, these views must be integrated in order to build a shared approach that makes
early detection and diagnosis as powerful and efficient as possible. What is needed is an
overarching ecosystem capable of pulling together the broad range of available data,
needs, and perspectives, a task well suited to machine learning. By contextualizing and
presenting information in a manner specific to each user and situation, machine
learning can extract actionable intelligence across these different datasets and tailor
insights to specific stakeholders. This cannot be done at the level of a standard,
standalone model (for example, one that is given information and simply makes and
displays a prediction on the basis of that information).
There is a demand for such an ecosystem, one that helps end users predict diseases on
the basis of their symptoms without visiting a hospital. This reduces the rush at hospital
outpatient departments (OPDs) and brings down the workload on medical staff. In
addition, the system can help avoid costly treatment and panic at the late stages of a
disease, so that proper medication can be provided at the right time, which can also
lower the death rate.
2 Literature Survey
This section outlines the work done so far on machine learning techniques for early
prediction of disease from various symptoms.
Kunal Takke et al. [16] aimed to identify and predict patients suffering from more
prevalent ailments using Multinomial Naive Bayes (MNB), Random Forest Classifier
(RFC) and K-Nearest Neighbours (KNN), with accuracies of 98% (MNB), 98.2% (RFC)
and 97% (KNN).
Naganna Chetty et al. [17] developed a system that gives improved results for disease
prediction using a fuzzy approach, with techniques such as the KNN classifier, fuzzy
c-means clustering, and the fuzzy KNN classifier. The paper predicts diabetes and liver
disorders, with accuracies of 97.02% and 96.13% respectively.
Dhiraj Dahiwade et al. [18] designed a disease prediction model using machine learning
approaches, namely KNN and CNN. The paper predicts disease based on the patient's
symptoms; the accuracy of KNN is 95% and that of CNN is 98%.
Rati Shukla et al. [22] proposed the prediction and detection of breast cancer using
machine learning techniques such as Decision Tree, Support Vector Machine, Random
Forest, Naïve Bayes, Neural Network, and KNN. In this system, the Support Vector
Machine gives more accurate results than all the other algorithms.
Rashmi G Saboji et al. [23] sought a scalable solution that predicts heart disease using
classification mining with the Random Forest algorithm. The system is compared against
a Naïve Bayes classifier, and Random Forest gives more accurate results, with an
accuracy of 98%.
Anjan Nikhil Repaka et al. [24] designed and implemented a prediction model for heart
disease using Naive Bayes. Any user can use this system from a smartphone and get the
prediction results. The accuracy of this system is 89.77%.
Lambodar Jena et al. [25] focused on risk prediction for chronic diseases using
distributed machine learning classifiers, namely Naive Bayes and Multilayer Perceptron.
The paper predicts Chronic Kidney Disease, with accuracies of 95% for Naïve Bayes and
99.7% for the Multilayer Perceptron.
Ankita Dewan et al. [26] recommended a disease prediction system that uses a hybrid
data mining classification technique to predict heart disease, combining Neural
Network, Decision Tree, and Naive Bayes. The accuracy of this system is 87%.
Md. Ehtisham Farooqui et al. [27] presented various disease prediction models based on
supervised learning algorithms, namely Support Vector Machine (SVM), K-Nearest
Neighbor (KNN), Decision Tree (DT), Naïve Bayes and Random Forest (RF), and analyzed
their performance, reporting an average accuracy of 95%.
3 System Objectives
3.2 Objectives
The objectives of the studied system are as follows:
To store the name of the patient's disease in a database, so that it can be used as a
past record and help in predicting diseases in the future.
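The storage objective can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the studied system's design: the table name `prediction_history`, its columns, and the helper functions `open_history`, `record_prediction` and `past_diseases` are all hypothetical.

```python
import sqlite3

def open_history(path=":memory:"):
    """Open (or create) a hypothetical prediction-history database."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per prediction made for a patient.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS prediction_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               patient_name TEXT NOT NULL,
               symptoms TEXT NOT NULL,
               predicted_disease TEXT NOT NULL,
               recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def record_prediction(conn, patient_name, symptoms, disease):
    """Store one prediction so it can later serve as a past record."""
    conn.execute(
        "INSERT INTO prediction_history (patient_name, symptoms, predicted_disease) "
        "VALUES (?, ?, ?)",
        (patient_name, ",".join(sorted(symptoms)), disease),
    )
    conn.commit()

def past_diseases(conn, patient_name):
    """Return the diseases previously predicted for a patient, oldest first."""
    rows = conn.execute(
        "SELECT predicted_disease FROM prediction_history "
        "WHERE patient_name = ? ORDER BY id",
        (patient_name,),
    ).fetchall()
    return [row[0] for row in rows]

conn = open_history()
record_prediction(conn, "A. Patient", {"fever", "cough"}, "flu")
record_prediction(conn, "A. Patient", {"rash", "fever"}, "measles")
print(past_diseases(conn, "A. Patient"))  # ['flu', 'measles']
```

Parameterized queries (`?` placeholders) are used so that patient names and symptoms typed by users are never spliced into the SQL text.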
4 System Architecture
Researchers have proposed a system that offers a simple, cost-effective and elegant user
interface and is also time efficient. The system bridges the gap between doctors and
patients, which helps both classes of users achieve their goals. It predicts diseases
according to symptoms: it takes symptoms from the users and evaluates them by
applying algorithms such as Decision Tree and Naïve Bayes to obtain an accurate
prediction. The system will explore and merge more datasets covering a large diversity
of population to get more effective results, thereby improving the accuracy of the
results. Along with the increased accuracy, the reliability of the system for this job
increases, which can help the system gain the trust of patients.
Figure 4.1.1 shows the entire flow of the model: the system asks the user a set of
questions to identify the symptoms, applies ML techniques to predict the expected
disease, and returns the result to the user.
3) The patient enters his or her symptoms and then answers the follow-up sub-questions.
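The question-and-answer flow ends with the system turning the user's answers into input for the models. A minimal sketch, assuming the models expect one 0/1 column per symptom; the vocabulary `SYMPTOMS` and the function `encode_answers` are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Hypothetical symptom vocabulary; a real system would derive it from the
# columns of its training dataset.
SYMPTOMS: List[str] = ["fever", "cough", "rash", "headache", "fatigue"]

def encode_answers(answers: Dict[str, bool], vocabulary: List[str] = SYMPTOMS) -> List[int]:
    """Turn yes/no answers to the symptom questions into a 0/1 feature vector.

    Symptoms the user was not asked about (or skipped) are treated as absent.
    Unknown symptom names are rejected so that typos are not silently dropped.
    """
    unknown = set(answers) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if answers.get(name, False) else 0 for name in vocabulary]

vector = encode_answers({"fever": True, "cough": True, "headache": False})
print(vector)  # [1, 1, 0, 0, 0]
```

Treating unanswered questions as "absent" is a design assumption; a system could instead store them as missing values if its models support that.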
5 Methodology
1) DECISION TREE :
A decision tree is a commonly used data mining method for building classification
and prediction systems based on multiple explanatory parameters, i.e. for developing
prediction models for a target instance. It classifies a population into branch-like
segments that form an inverted tree with a root node, internal nodes, and leaf nodes.
A decision tree is a non-parametric algorithm that can efficiently deal with large,
complicated data sets without imposing a complicated parametric structure. If the
sample size is large enough, the study data can be divided into training and validation
data sets: the training data set is used to build the decision tree model, and the
validation data set is used to decide on the appropriate tree size that yields the
optimal final model.
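The training/validation procedure described above can be sketched in plain Python. This is a minimal illustration, not the report's implementation: it assumes binary (present/absent) symptom features, uses entropy-based information gain to choose splits (ID3-style), and treats the maximum depth as the "tree size" that is chosen on the validation set. The toy records and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, labels, features):
    """Pick the binary feature whose split gives the largest information gain."""
    def gain(f):
        g = entropy(labels)
        for v in (0, 1):
            subset = [y for x, y in zip(rows, labels) if x[f] == v]
            if subset:
                g -= len(subset) / len(labels) * entropy(subset)
        return g
    return max(features, key=gain)

def build_tree(rows, labels, features, max_depth):
    """Grow the tree top-down. A leaf is a disease label; an internal node is
    a (feature_index, {0: subtree, 1: subtree}) pair."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, labels, features)
    rest = [other for other in features if other != f]
    children = {}
    for v in (0, 1):
        idx = [i for i, x in enumerate(rows) if x[f] == v]
        children[v] = (build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                  rest, max_depth - 1) if idx else majority)
    return (f, children)

def predict(tree, x):
    """Follow the branches for symptom vector x until a leaf is reached."""
    while isinstance(tree, tuple):
        f, children = tree
        tree = children[x[f]]
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, x) == y for x, y in zip(rows, labels)) / len(labels)

def choose_depth(train, val, depths):
    """Build one tree per candidate depth on the training set and keep the
    shallowest tree with the best validation accuracy."""
    (x_tr, y_tr), (x_val, y_val) = train, val
    features = list(range(len(x_tr[0])))
    best = None
    for d in sorted(depths):
        tree = build_tree(x_tr, y_tr, features, d)
        acc = accuracy(tree, x_val, y_val)
        if best is None or acc > best[1]:
            best = (d, acc, tree)
    return best

# Hypothetical toy data: columns are [fever, cough, rash] (1 = present).
train = ([[1, 1, 0], [1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]],
         ["flu", "measles", "cold", "flu", "allergy", "cold"])
val = ([[1, 1, 0], [0, 1, 0], [1, 0, 1]], ["flu", "cold", "measles"])

depth, val_acc, tree = choose_depth(train, val, depths=[1, 2, 3])
print(depth, val_acc)            # 2 1.0
print(predict(tree, [0, 0, 1]))  # allergy
```

Keeping the shallowest depth among ties is one simple way to prefer smaller trees; pruning a fully grown tree against the validation set is a common alternative.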
iv) Entropy : A decision tree is built top-down from a root node and involves
partitioning the data into subsets that contain instances with similar values.
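Entropy measures how mixed a node is: $H(S) = -\sum_i p_i \log_2 p_i$, where $p_i$ is the fraction of records in $S$ that belong to class $i$. It is 0 for a pure node and 1 for an evenly split two-class node. Information gain is the parent's entropy minus the size-weighted entropy of the child subsets, and the split with the largest gain is preferred. The sketch below computes both; the counts (14 records, 9 with the disease and 5 without, split on one symptom) are hypothetical and not taken from the report.

```python
import math

def entropy(counts):
    """H(S) = -sum_i p_i * log2(p_i), computed from per-class record counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted

# Hypothetical node: 14 records, 9 with the disease and 5 without.
parent = [9, 5]
# Hypothetical split on one symptom: present -> [6, 2], absent -> [3, 3].
children = [[6, 2], [3, 3]]
print(round(entropy(parent), 3))                     # 0.94
print(round(information_gain(parent, children), 3))  # 0.048
```

The small gain (about 0.048 bits) says this symptom barely separates the two classes; a tree builder would prefer a symptom with a larger gain for this node.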
2) NAIVE BAYES:
The Naive Bayes algorithm is a classification algorithm that uses Bayesian techniques
and is based on Bayes' theorem in predictive modelling [11]. It is less computationally
intensive than many other algorithms; as a result, it can be used to quickly generate
mining models that find relationships between input columns and predictable columns.
The Naive Bayes algorithm is primarily used to build classifiers [12]. Classifiers
predict categorical class labels, that is, they determine which class the given inputs
belong to. To build its prediction framework, the proposed smart healthcare system
employs a data mining technique known as the “Naive Bayes classifier” [13]. The system
uses a larger number of records and attributes, collected from expert data, in order to
predict the disease accurately from the symptoms. Several AI and data mining
techniques are based on the “Naive Bayes” (Bayes) rule, which is used to build
predictive models. It uses evidence to determine the relationship between the target
(i.e., dependent) variable and the other variables. The Naive Bayes algorithm relies on
Bayes' theorem together with the naive (independence) approximation. Bayes' theorem
states that, for a class $c$ and an observed feature vector $x$:
$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$
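To make Bayes' theorem concrete, the sketch below computes the probability of a disease given a single symptom. The function `posterior` and all of the numbers are hypothetical illustrations, not values from the report; the evidence term $P(x)$ is expanded with the law of total probability over "disease" and "no disease".

```python
def posterior(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(D | S) = P(S | D) P(D) / P(S), where the evidence P(S) is expanded with
    the law of total probability: P(S) = P(S | D) P(D) + P(S | not D) P(not D)."""
    evidence = (p_symptom_given_disease * prior
                + p_symptom_given_healthy * (1 - prior))
    return p_symptom_given_disease * prior / evidence

# Hypothetical numbers: 1% prevalence; the symptom is reported by 90% of
# patients with the disease and by 10% of patients without it.
p = posterior(prior=0.01, p_symptom_given_disease=0.9, p_symptom_given_healthy=0.1)
print(round(p, 4))  # 0.0833
```

Even a symptom strongly associated with a disease gives a low posterior (about 8%) when the disease is rare, which is one reason a classifier combines evidence from several symptoms.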
The Naive Bayes algorithm is a simple technique for developing models that assign class
labels to problem instances, where the class labels are drawn from a finite set. Naive
Bayes is not a single algorithm but a family of algorithms based on a common principle
[14]: all Naive Bayes classifiers assume that the value of each feature is independent of
the values of the other features, given the class. There are many probability models,
but the Naive Bayes model can be trained efficiently and is one of the most effective in
supervised learning.
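The independence assumption lets the classifier multiply per-symptom probabilities. The sketch below is a Bernoulli Naive Bayes for 0/1 symptom vectors with Laplace smoothing; the choice of the Bernoulli variant, the smoothing constant `alpha`, the toy data and the function names are assumptions for illustration, not the report's exact method. Log-probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow.

```python
import math
from collections import Counter

def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate P(class) and P(symptom present | class) from 0/1 symptom rows.

    alpha is the Laplace smoothing constant, so a symptom/class combination
    unseen in training never gets probability exactly 0 or 1.
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    present = {c: [0] * n_features for c in class_counts}
    for x, y in zip(rows, labels):
        for j, v in enumerate(x):
            present[y][j] += v
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {
        c: [(present[c][j] + alpha) / (class_counts[c] + 2 * alpha)
            for j in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods

def predict_nb(model, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    priors, likelihoods = model
    def log_posterior(c):
        score = math.log(priors[c])
        for j, v in enumerate(x):
            p = likelihoods[c][j]
            score += math.log(p if v else 1.0 - p)
        return score
    return max(priors, key=log_posterior)

# Hypothetical toy data: columns are [fever, cough, rash] (1 = present).
rows = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
labels = ["flu", "flu", "measles", "cold", "cold", "allergy"]
model = train_bernoulli_nb(rows, labels)
print(predict_nb(model, [1, 1, 0]))  # flu
print(predict_nb(model, [0, 0, 1]))  # allergy
```

The evidence term $P(x)$ is the same for every class, so it is dropped when only the most probable class is needed.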
Let us summarize the “wolf-prediction” model using a 2x2 confusion matrix that depicts
all four possible outcomes:
A true positive is an outcome where the model correctly predicts the positive class.
Similarly, a true negative is an outcome where the model correctly predicts the
negative class.
A false positive is an outcome where the model incorrectly predicts the positive
class. And a false negative is an outcome where the model incorrectly predicts the
negative class.
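The four outcomes can be tallied directly from paired lists of actual and predicted labels. A minimal sketch; the function name `confusion_counts` and the five "wolf" observations are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN for one chosen positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "TN", "FP", "FN")}

# Hypothetical observations: did a wolf actually appear, and did the model say so?
actual = ["wolf", "no wolf", "no wolf", "wolf", "no wolf"]
predicted = ["wolf", "wolf", "no wolf", "no wolf", "no wolf"]
print(confusion_counts(actual, predicted, positive="wolf"))
# {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
```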
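i) Accuracy : Accuracy is one metric for evaluating classification models. Informally,
accuracy is the fraction of predictions the model got right. Formally, accuracy has the
following definition:
$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$
For binary classification, accuracy can also be calculated in terms of positives and
negatives as follows:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$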
Accuracy comes out to 0.91, or 91% (91 correct predictions out of 100 total examples).
At first glance, that suggests the tumor classifier is doing a great job of identifying
malignancies. However, a closer analysis of positives and negatives gives more insight
into the model's performance.
Of the 100 tumor examples, 91 are benign (90 TNs and 1 FP) and 9 are malignant
(1 TP and 8 FNs).
Of the 91 benign tumors, the model correctly identifies 90 as benign. That's good.
However, of the 9 malignant tumors, the model correctly identifies only 1 as malignant,
a terrible outcome, as 8 out of 9 malignancies go undiagnosed.
While 91% accuracy may seem good at first glance, another tumor-classifier model that
always predicts benign would achieve exactly the same accuracy (91/100 correct
predictions) on these examples. In other words, this model is no better than one that
has zero predictive ability to distinguish malignant tumors from benign tumors.
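The comparison above can be checked with the binary accuracy formula, using the tumor counts from the example (malignant taken as the positive class). The function name `accuracy` is illustrative; the "always benign" counts follow from that model never predicting malignant.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN): the fraction of predictions that were right."""
    return (tp + tn) / (tp + tn + fp + fn)

# Counts from the tumor example: 1 TP, 90 TN, 1 FP, 8 FN.
model_acc = accuracy(tp=1, tn=90, fp=1, fn=8)
# A classifier that always answers "benign" never predicts malignant,
# so all 9 malignant tumors become false negatives.
baseline_acc = accuracy(tp=0, tn=91, fp=0, fn=9)
print(model_acc, baseline_acc)  # 0.91 0.91
```

Identical accuracies for a useful-looking model and a trivial one are the reason the report turns to precision and recall on this class-imbalanced data.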
The model has a precision of 0.5, since precision is $\frac{TP}{TP + FP} = \frac{1}{1 + 1}$; in
other words, when it predicts that a tumor is malignant, it is correct 50% of the time.
ii) Recall : For a good classifier, recall should ideally be 1 (high). Recall is 1 only
when its numerator and denominator are equal, i.e. TP = TP + FN, which means that FN
is zero. As FN increases, the denominator becomes larger than the numerator and the
recall value decreases.
Mathematically, recall is defined as follows:
$\text{Recall} = \frac{TP}{TP + FN}$
The model has a recall of 1/(1 + 8) ≈ 0.11; in other words, it correctly identifies 11%
of all malignant tumors.
iii) F1 Score : The F1 score is the harmonic mean of precision and recall:
$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
It is 1 only if both precision and recall are 1, and it is high only if both are high,
which makes it a better measure than precision alone.
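Applying the precision, recall and F1 formulas to the tumor counts (1 TP, 1 FP, 8 FN) gives the values quoted in the text. In this sketch the function names are illustrative, and returning 0.0 when a denominator is zero is a common convention that the report does not specify.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 if the model never predicted the positive class."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """TP / (TP + FN); 0.0 if there are no actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

p = precision(tp=1, fp=1)  # 0.5
r = recall(tp=1, fn=8)     # 1/9, about 0.11
f1 = f1_score(p, r)        # 2/11, about 0.18
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.11 0.18
```

The F1 score of about 0.18 exposes the poor malignancy detection that the 91% accuracy hid.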
TP (True Positive): It denotes the number of records classified as true while they
were actually true.
FN (False Negative): It denotes the number of records classified as false while they
were actually true.
FP (False Positive): It denotes the number of records classified as true while they
were actually false.
TN (True Negative): It denotes the number of records classified as false while they
were actually false.
PERFORMANCE EVALUATION :
6 Conclusion
Providing a chatbot service so that users can get instant solutions to their problems.
References
[4] P. Prabhu, S. Selvabharathi. Deep Belief Neural Network Model for Prediction of
Diabetes Mellitus. In 2019 3rd International Conference on Imaging, Signal Processing
and Communication, ICISPC 2019 (pp. 138–142). Institute of Electrical and Electronics
Engineers Inc. ISBN: 9781728136639. 2019.
[6] H. Polat, H. Danaei Mehr, A. Cetin. Diagnosis of chronic kidney disease based on
support vector machine by feature selection methods, J. Med. Syst. 41(4) 2017 55.
[8] E. Gürbüz, E. Kılıç, A new adaptive support vector machine for diagnosis of
diseases, Expert Syst. 31 (5) (2014) 389–397.
http://refhub.elsevier.com/S2214-7853(21)05220-2/h0040
[9] M. Seera, C.P. Lim, A hybrid intelligent system for medical data classification,
Expert Syst. Appl. 41 (5) (2014) 2239–2249.
http://refhub.elsevier.com/S2214-7853(21)05220-2/h0045
[11] rshad Ullah, “Data Mining Algorithms And Medical Sciences”, IJCSIT, Vol 2, No 6,
December 2016.
[12] Amit Kumar Das, Aman Kedia, “Data mining techniques in Indian Healthcare: A
Short Review”, 2015 International Conference on Man and Machine Interfacing
(MAMI), 978-1-5090-0225-2/15.
[17] Naganna Chetty, Kunwar Singh Vaisla and Nagamma Patil, “An Improved Method
for Disease Prediction using Fuzzy Approach”, IEEE, DOI 10.1109/ICACCE.2021.67,
pp. 569-572, 2021.
[18] Dhiraj Dahiwade, Gajanan Patle and Ektaa Meshram, “Designing Disease Prediction
Model Using Machine Learning Approach”, IEEE Xplore Part Number: CFP19K25-ART;
ISBN: 978-1-5386-7808-4, pp. 1211-1215, 2019.
[19] Pahulpreet Singh Kohli and Shriya Arora, “Application of Machine Learning in
Disease Prediction”, IEEE, 978-1-5386-6947-1/18, pp. 1-4, 2018.
[21] Dhomse Kanchan B. and Mahale Kishor M., “Study of Machine Learning Algorithms
for Special Disease Prediction using Principal of Component Analysis”, IEEE,
978-1-5090-0467-6/18, pp. 5-10, 2018.
[22] Rati Shukla, Vikash Yadav, Parashu Ram Pal and Pankaj Pathak, “Machine
Learning Techniques for Detecting and Predicting Breast Cancer”, IJITEE, ISSN:
2278-3075, Volume-8, pp. 2658-2662, 2019.
[23] Rashmi G Saboji and Prem Kumar Ramesh, “A Scalable Solution for Heart Disease
Prediction using Classification Mining Technique”, IEEE, 978-1-5386-1887-5/17,
pp. 1780-1785, 2017.
[24] Anjan Nikhil Repaka, Sai Deepak Ravikanti and Ramya G Franklin, “Design And
Implementing Heart Disease Prediction Using Naives Bayesian”, IEEE Xplore
Part Number: CFP19J32-ART; ISBN: 978-1-5386-9439-8, pp. 292-297, 2019.
[25] Lambodar Jena and Ramakrushna Swain, “Chronic Disease Risk Prediction using
Distributed Machine Learning Classifiers”, IEEE, 978-1-5386-2924-6/17,
pp. 170-173, 2017.
[26] Ankita Dewan and Meghna Sharma, “Prediction of Heart Disease Using a Hybrid
Technique in Data Mining Classification”, IEEE, 978-9-3805-4416-8/15,
pp. 704-706, 2015.
[27] Md. Ehtisham Farooqui, Dr. Jameel Ahmad, “A Detailed Review on Disease
Prediction Models that uses Machine Learning”, International Journal of Innovative
Research in Computer Science & Technology (IJIRCST), ISSN: 2347-5552,
Volume 8, Issue 4, July 2020.