
Latest Seminar Report Yash Ingole

The document discusses using machine learning techniques for early prediction of disease from symptoms. It motivates the use of machine learning models by noting that diagnosis is complex due to evolving medical knowledge and limited physician cognitive abilities. Machine learning models have been shown to diagnose with higher accuracy than experienced physicians alone. The document proposes developing a system using machine learning algorithms trained on disease datasets to extract features and enable optimal disease diagnosis, prediction, prevention, and treatment.

Uploaded by Pranjal Hejib
Copyright © All Rights Reserved

(7KS09) Seminar Report

on
Machine Learning Technique for Early Prediction
of Disease from Various Symptoms
Submitted by

Yash J. Ingole

Seventh Semester
B.E. (Computer Science & Engineering)

Guided by

Dr. R. R. Karwa

Department of Computer Science & Engineering,


Prof. Ram Meghe Institute of Technology & Research,
Badnera - Amravati
2022-2023
CERTIFICATE

This is to certify that the Seminar report entitled

Machine Learning Technique for Early Prediction of Disease from Various Symptoms

is a bona fide work and is submitted to the

Sant Gadge Baba Amravati University, Amravati
by

Mr. Yash J. Ingole

Seventh Semester B.E. (Computer Science & Engineering), in
partial fulfillment of the requirement for the degree of
Bachelor of Engineering in Computer Science & Engineering,
during the academic year 2022-2023 under my guidance.

Dr. R. R. Karwa
Guide
Department of Computer Sci. & Engg.
PRM Institute of Technology & Research, Badnera

Dr. G. R. Bamnote
HOD
Department of Computer Sci. & Engg.
PRM Institute of Technology & Research, Badnera

Department of Computer Science & Engineering,


Prof. Ram Meghe Institute of Technology & Research,
Badnera - Amravati
2022-2023
Acknowledgment

With great pleasure I hereby acknowledge the help given to me by various individuals throughout the process. This is an acknowledgement of the inspiration, drive, and technical assistance contributed by many individuals. This report would never have seen the light of day without the help and guidance I have received.

I would like to express my profound thanks to Dr. R. R. Karwa for their guidance and constant supervision, as well as for providing the necessary information regarding the seminar report. I would also like to thank Dr. G. R. Bamnote, Principal of PRMIT&R and Head, Department of Computer Science & Engineering, and the faculty for their kind co-operation and encouragement, which helped me complete this report. I owe an incalculable debt to all the staff of the Department of Computer Science & Engineering for their direct and indirect help.

My thanks and appreciation also go to my colleagues who helped in developing the report and to the people who have willingly helped me out with their abilities.

I extend my heartfelt thanks to my parents, friends, and well-wishers for their support and timely help. Last but not least, I thank God Almighty for guiding me every step of the way.

Yash Jayant Ingole

Abstract

Most medical diagnoses require visiting a doctor and booking an appointment for a consultation. To get accurate disease indications, patients sometimes have to wait for blood reports, and they may have to travel long distances to see a doctor. When people are not feeling well, the first thing they often do is check their temperature to get a baseline idea of their fever, so that they can consult a doctor if the temperature is high enough. Similarly, a medical disease prediction application can give a baseline idea of the disease, and the prediction model can indicate whether the user should seek immediate medical consultation or at least start some home remedies for temporary relief. Combining machine learning with an application interface makes it easy for users to interact with the machine learning model and obtain more accurate predictions.

Contents

1 Introduction
1.1 Motivation

2 Literature Survey

3 System Objectives
3.1 Problem Definition
3.2 Objectives

4 System Architecture

5 Methodology
5.1 Algorithms and their significance
5.2 Parameters used for Calculating the Accuracies of the Machine Learning Models
5.3 Case Study

6 Conclusion
6.1 Future Scope

References

1 Introduction

In terms of data collection and processing, healthcare is one of the most challenging industries. With the advent of the digital era and technological advancements, a vast quantity of multidimensional patient data is created, including clinical factors, hospital resources, disease diagnostic information, patients’ records, and medical equipment data. This enormous, dense, and complex data must be processed and evaluated in order to extract knowledge for effective decision making. Medical data mining offers considerable potential for uncovering hidden patterns in medical data sets [1].

The use of various data mining tools and machine learning approaches has changed healthcare organizations by identifying significant patterns and detecting correlations and relationships among many variables in huge databases [2,3]. These techniques serve as an important instrument in the medical sector, providing and comparing existing data to guide the future course of action. This technology combines multiple analytic methodologies with modern and complex algorithms, allowing the exploration of massive amounts of data [4]. In healthcare, it is used to gather, organize, and analyze patient data systematically. It can be used to identify inherent inefficiencies and best practices for providing better services, which may lead to improved diagnosis, better medicine, and more successful treatment, as well as providing a platform for deeper knowledge of the mechanisms in practically all elements of the medical domain. Overall, it assists in the early detection and prevention of disease epidemics by searching medical databases for pertinent information.

The process of determining a condition based on a person’s symptoms and signs is known as medical diagnosis. In the diagnostic process, one or more diagnostic procedures, such as diagnostic tests, are performed. Diagnosis of chronic illnesses is a vital issue in the medical industry because it depends on many symptoms. It is a complex procedure that frequently leads to incorrect assumptions. When diagnosing illnesses, clinical judgment is based mostly on the patient’s symptoms as well as the physician’s knowledge and experience [5]. Furthermore, as medical systems evolve and new treatments become available, it becomes more difficult for physicians to keep up with current innovations in clinical practice [6]. For effective therapy, medical practitioners must be well versed in all pertinent diagnostic criteria, the patient’s history, and combinations of medication therapy. However, mistakes are possible because they make judgments intuitively, based on information and experience gained from past patients. Because of factors such as multitasking, limited capacity for analysis, and limited memory, their cognitive capacities are restricted [7]. As a result, it is difficult for a physician to make the right judgment consistently without the support of clinical tests and patient history information. Even experienced physicians can benefit from a computer-aided diagnostic system in making sound medical judgments [8]. Thus, medical professionals are very interested in automating the diagnosis process by integrating machine learning techniques with physician expertise [9]. Data mining and machine learning approaches are making significant efforts to intelligently translate accessible data into valuable information in order to improve the efficiency of the diagnostic process. Several studies have explored the diagnostic abilities of machine learning. One study found that machine learning algorithms could diagnose with 91.1% accuracy, compared with 79.97% for the most experienced physician [10]. Machine learning techniques are applied to illness datasets to extract features for optimal illness diagnosis, prediction, prevention, and therapy.

1.1 Motivation
Within the healthcare ecosystem there are many stakeholder groups, each of which has a specific set of needs with regard to early detection and diagnosis (ED&D). These groups include patients, primary and secondary care clinicians, radiologists, epidemiologists, geneticists, clinical researchers, and policymakers.

Each of these stakeholders acts on different information, knowledge, and objectives; however, these views must be integrated in order to build a shared approach that makes early detection and diagnosis as powerful and efficient as possible. What is needed is an overarching ecosystem capable of pulling together the broad range of available data, needs, and perspectives, a task well suited to machine learning. By contextualizing and presenting information in a manner specific to each user and situation, machine learning can extract actionable intelligence across this range of datasets and tailor insights to specific stakeholders. This goes beyond the standard use of a single model (for example, where a model is given information and makes and displays a prediction on the basis of that information).

There is a demand for such an ecosystem to help end users predict diseases on the basis of their symptoms without visiting hospitals. This can reduce crowding in hospital outpatient departments (OPDs) and bring down the workload on medical staff. The system can also help avoid costly treatment and panic at the late stages of a disease, so that proper medication can be provided at the right time, which can lower the death rate as well.

2 Literature Survey

This section outlines prior work on machine learning techniques for early prediction of disease from various symptoms.

Kunal Takke et al. [16] aimed to identify and predict patients suffering from more prevalent ailments using Multinomial Naive Bayes (MNB), Random Forest Classifier (RFC), and K-Nearest Neighbours (KNN). The reported accuracies are MNB: 98%, RFC: 98.2%, and KNN: 97%.

Naganna Chetty et al. [17] developed a system that gives improved results for disease prediction using a fuzzy approach, with techniques such as the KNN classifier, Fuzzy c-means clustering, and the Fuzzy KNN classifier. This paper predicts diabetes and liver disorder; the accuracy is 97.02% for diabetes and 96.13% for liver disorder.

Dhiraj Dahiwade et al. [18] designed a model for disease prediction using machine learning approaches such as KNN and Convolutional Neural Network (CNN). This paper predicts disease based on the patient’s symptoms. The accuracy of KNN is 95% and the accuracy of CNN is 98%.

Pahulpreet Singh Kohli et al. [19] suggested disease prediction using applications and methods of machine learning, with techniques such as Logistic Regression, Decision Tree, Support Vector Machine, Random Forest, and Adaptive Boosting. This paper focuses on predicting heart disease, breast cancer, and diabetes. The highest accuracies are obtained using Logistic Regression: 95.71% for breast cancer, 84.42% for diabetes, and 87.12% for heart disease.

Senthilkumar Mohan et al. [20] focused on hybrid machine learning techniques that can be used to predict heart disease effectively, using algorithms such as Decision Tree, Support Vector Machine, Random Forest, Naïve Bayes, Neural Network, and KNN. The accuracy of this system is 88.47%.

Dhomse Kanchan B. et al. [21] studied special disease prediction using principal component analysis with machine learning algorithms, involving techniques such as Naive Bayes classification, Decision Tree, and Support Vector Machine. The accuracy of this system is 34.89% for diabetes and 53% for heart disease.

Rati Shukla et al. [22] suggested prediction and detection of breast cancer using machine learning techniques such as Decision Tree, Support Vector Machine, Random Forest, Naïve Bayes, Neural Network, and KNN. In this system, the Support Vector Machine gives more accurate results than all the other algorithms.

Rashmi G Saboji et al. [23] sought a scalable solution that can predict heart disease using classification mining, and used the Random Forest algorithm. The system is compared against a Naïve Bayes classifier, but Random Forest gives more accurate results, with an accuracy of 98%.

Anjan Nikhil Repaka et al. [24] designed and implemented a prediction model for heart disease using naive Bayesian classification. Any user can use this system from a smartphone and get the prediction results. The accuracy of this system is 89.77%.

Lambodar Jena et al. [25] focused on risk prediction for chronic diseases by taking advantage of distributed machine learning classifiers, using techniques such as Naive Bayes and Multilayer Perceptron. This paper predicts chronic kidney disease, and the accuracy of Naïve Bayes and Multilayer Perceptron is 95% and 99.7% respectively.

Ankita Dewan et al. [26] recommended a disease prediction system that uses a hybrid data mining classification technique for predicting heart disease. This system uses techniques such as Neural Network, Decision Tree, and Naive Bayes. The accuracy of this system is 87%.

Md. Ehtisham Farooqui et al. [27] presented various models based on supervised learning algorithms and analyzed their performance: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Naïve Bayes, and Random Forest (RF), with an average accuracy of 95%.


Table 2.1 Process & Techniques of conventional models

| Sr. no. | Year | Author | Purpose | Technique Used | Accuracy |
|---|---|---|---|---|---|
| 1 | 2022 | Kunal Takke et al. [16] | To identify and predict patients suffering from more prevalent ailments | Multinomial Naive Bayes, Random Forest Classifier, K-Nearest Neighbors | MNB: 98%; RFC: 98.2%; KNN: 97% |
| 2 | 2021 | Naganna Chetty et al. [17] | Developed a system that gives improved results for disease prediction and used a fuzzy approach | KNN classifier, Fuzzy c-means clustering, and Fuzzy KNN classifier | Diabetes: 97.02%; Liver disorder: 96.13% |
| 3 | 2019 | Dhiraj Dahiwade et al. [18] | Designed a model for prediction of the disease using approaches of machine learning | K-Nearest Neighbor (KNN) and Convolutional Neural Network (CNN) | KNN: 95%; CNN: 98% |
| 4 | 2018 | Pahulpreet Singh Kohli et al. [19] | Suggested disease prediction by using applications and methods of machine learning | Logistic Regression, Decision Tree | Logistic Regression: Breast Cancer 95.71%, Diabetes 84.42%, Heart Disease 87.12%; Decision Tree: Breast Cancer 94.29%, Diabetes 74.03%, Heart Disease 70.97% |
| 5 | 2019 | Senthilkumar Mohan et al. [20] | Focused on hybrid techniques in machine learning that can be used for effectively predicting heart disease | Decision Tree, Support Vector Machine, Random Forest, Naïve Bayes, Neural Network and KNN | 88.4% |
| 6 | 2018 | Dhomse Kanchan B. et al. [21] | Studied special disease prediction utilizing principal component analysis using machine learning algorithms | Naive Bayes classification, Decision Tree and Support Vector Machine | Diabetes: 34.89%; Heart Disease: 53% |
| 7 | 2019 | Rati Shukla et al. [22] | Suggested prediction and detection for breast cancer by utilizing machine learning techniques | Naive Bayes Classifier, Logistic Regression, Support Vector Machines (SVM), Artificial Neural Networks and K-Nearest Neighbor | 98.2% |
| 8 | 2017 | Rashmi G Saboji et al. [23] | Tried to find a scalable solution that can predict heart disease utilizing classification mining | Random Forest Algorithm | 98% |
| 9 | 2019 | Anjan Nikhil Repaka et al. [24] | Designed and implemented a prediction model for heart disease using naive Bayesian | Naive Bayes | 89.7% |
| 10 | 2017 | Lambodar Jena et al. [25] | Focused on risk prediction for chronic diseases by taking advantage of distributed machine learning classifiers | Naive Bayes, Multilayer Perceptron | Naive Bayes: 95%; Multilayer Perceptron: 99.7% |
| 11 | 2015 | Ankita Dewan et al. [26] | Recommended a disease prediction system that uses a data mining classification hybrid technique | Neural Network, Decision Tree and Naive Bayes | 87% |
| 12 | 2020 | Md. Ehtisham Farooqui et al. [27] | Presented various models based on such algorithms and techniques and analyzed their performance; research was conducted on various models of supervised learning algorithms | Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree (DT), Naïve Bayes and Random Forest (RF) | Average accuracy: 95% |

3 System Objectives

3.1 Problem Definition


The aim is a system that offers a simple, cost-effective, elegant user interface and is time-efficient. It bridges the gap between doctors and patients and helps both classes of users achieve their goals. The system predicts diseases according to symptoms: it takes symptoms from the users and evaluates them by applying algorithms such as Decision Tree and Naïve Bayes, which help produce accurate predictions. It also explores and merges more datasets that include a large diversity of population to get more effective results, so that the system improves the accuracy of its results. Along with the increased accuracy, this improves the reliability of the system for the task and can gain the trust of patients. Apart from all this, the system includes a database for storing the data entered by the users and the name of the disease the patient is suffering from, which can be used as a reference for further treatment in the future. Hence, the system contributes to easier health management with better user satisfaction.

3.2 Objectives
The following are the objectives of the studied system:

• To collect data regarding different diseases and their corresponding symptoms.

• To manipulate the available data as needed.

• To fit the given data into a model.

• To use the ML algorithm that provides maximum accuracy in disease prediction.

• To store the name of the patient's disease in the database as a past record, which will help in future disease prediction.

• To provide a user-friendly GUI.

4 System Architecture

Researchers have proposed a system that offers a simple, cost-effective, elegant user interface and is time-efficient. The system bridges the gap between doctors and patients, which helps both classes of users achieve their goals. The system predicts diseases according to symptoms: it takes symptoms from the users and evaluates them by applying algorithms such as Decision Tree and Naïve Bayes to obtain accurate predictions. The system will explore and merge more datasets covering a large diversity of population to get more effective results, and thus improve the accuracy of its results. Along with the increased accuracy, this improves the reliability of the system for the task and can help gain patients' trust in the system.

Figure 4.1 : Architecture of the system


Figure 4.1.1 shows the entire flow of the model: the system asks the user a set of questions to identify the symptoms, applies ML techniques to them to predict the expected disease, and returns the results to the user.

Figure 4.1.1 Data Flow Diagram of Model

1) The user logs in to the portal.

2) The application asks the user for symptoms.

3) The patient enters their symptoms and answers follow-up sub-questions.

4) The system searches the database for disease and symptom details and returns them.

5) The ML model finds the related disease for the given symptoms and returns the calculated result.

6) The system returns the confirmed diagnosis.
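The six-step flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the report's implementation: an in-memory SQLite database stands in for the system database, the `records` table and the `diagnose`, `history`, and `toy_model` functions are hypothetical names, login (step 1) is omitted, and `predict` is any callable that maps a set of symptom names to a disease name, for example one of the models described in Chapter 5.

```python
import sqlite3
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical schema: one row per consultation.
SCHEMA = "CREATE TABLE IF NOT EXISTS records (user TEXT, symptoms TEXT, disease TEXT)"


def diagnose(conn: sqlite3.Connection, user: str, symptoms: Iterable[str],
             predict: Callable[[Set[str]], str]) -> str:
    """Steps 3 to 6: normalize the symptoms, run the model, store and return the result."""
    symptom_set = {s.strip().lower() for s in symptoms}
    disease = predict(symptom_set)
    # Keep the prediction as a past record for future reference.
    conn.execute("INSERT INTO records VALUES (?, ?, ?)",
                 (user, ",".join(sorted(symptom_set)), disease))
    conn.commit()
    return disease


def history(conn: sqlite3.Connection, user: str) -> List[Tuple[str, str]]:
    """Database lookup: return the stored (symptoms, disease) pairs for one user."""
    return conn.execute(
        "SELECT symptoms, disease FROM records WHERE user = ?", (user,)).fetchall()


def toy_model(symptoms: Set[str]) -> str:
    """Hypothetical rule standing in for a trained classifier."""
    return "Common Cold" if "sneezing" in symptoms else "Unknown"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    print(diagnose(conn, "user1", ["Sneezing", " cough"], toy_model))
    print(history(conn, "user1"))
```

Passing the model in as a plain callable keeps the storage and the prediction logic independent, so a Decision Tree or Naive Bayes model can be swapped in without touching the database code.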

5 Methodology

A system comprising a feature extraction strategy is designed and developed as follows.

1) Dataset: The data, a study of diseases and their corresponding symptoms collected by Columbia University, is available on Kaggle.
2) Training Data: Training data is also known as a training dataset or training set. It is an important aspect of a machine learning model, as it helps the model make accurate predictions and perform its tasks. Simply put, training data shapes a machine learning model and shows it what the expected result looks like. The model iteratively analyzes the dataset to understand its attributes precisely and makes appropriate changes to enhance its performance.
3) Testing Data: The test dataset is a subset of the data that is held out from training and used to make an objective evaluation of the final model.
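Holding out a test set can be sketched in a few lines. This is a minimal sketch, assuming the data is a list of (symptom vector, disease) rows; the function name `train_test_split`, the 20% default, and the fixed seed are illustrative choices rather than details from the report. In practice, scikit-learn's `sklearn.model_selection.train_test_split` does the same job and can also stratify by class.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` and split it into disjoint (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical rows: (symptom vector, disease)
    rows = [([1, 0, 1], "Flu"), ([0, 1, 0], "Cold"), ([1, 1, 0], "Flu"),
            ([0, 0, 1], "Allergy"), ([1, 0, 0], "Cold")]
    train, test = train_test_split(rows, test_fraction=0.4)
    print(len(train), len(test))
```

The key property is that no row appears in both lists, so the test accuracy measures performance on data the model has not seen.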
4) Balanced Data:

Figure 5.1 : Balanced Data

In supervised machine learning, it is important to train an estimator on balanced data so that the model is evenly informed about all classes. Observation and visualization of the dataset show that the data is balanced, with no class imbalance, which means that the accuracy obtained in training and testing is not skewed toward a dominant class.
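The balance check described above amounts to counting rows per disease. A minimal sketch, assuming the disease labels are available as a list; the function name `class_balance` and the example labels are hypothetical, and using the largest-to-smallest class ratio as the summary figure is an illustrative choice, not the report's stated method.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def class_balance(labels: Iterable[Hashable]) -> Tuple[Dict[Hashable, int], float]:
    """Return per-class counts and the ratio of the largest class to the smallest."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return dict(counts), max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical label column of a symptom/disease dataset
    labels = ["Flu"] * 120 + ["Malaria"] * 120 + ["Typhoid"] * 118
    counts, ratio = class_balance(labels)
    print(counts, round(ratio, 3))
```

A ratio of exactly 1.0 means every disease has the same number of rows; the threshold at which data counts as imbalanced is a judgment call that the report does not specify.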
5) Correlation of Disease with Respect to their Corresponding Symptoms: A matrix data structure is used when there are multiple variables and the aim is to find the correlations between all of them. Such a matrix is known as a correlation matrix: a table that shows the correlation coefficients between a set of variables. The correlation matrix identifies pairs of variables that are closely related, and it can also reveal relationships between variables that may not be immediately apparent. Correlation matrices are therefore a useful tool for researchers and analysts who want to understand the relationships between multiple variables.
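A correlation matrix over symptom columns can be computed with plain Python. This is a minimal sketch, assuming each symptom is a column of 0/1 values (for such columns, the Pearson coefficient is the phi coefficient); the helper names and the example columns are hypothetical. With pandas, `DataFrame.corr()` produces the same Pearson matrix by default.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # a constant column has no defined correlation
    return cov / (sx * sy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations between named columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical 0/1 symptom columns over five patients
    cols = {"itching": [1, 1, 0, 0, 1],
            "skin_rash": [1, 1, 0, 0, 0],
            "high_fever": [0, 0, 1, 1, 0]}
    for name, row in correlation_matrix(cols).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Values near +1 mark symptoms that tend to appear together, values near -1 mark symptoms that rarely co-occur, and the diagonal is always 1.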


Figure 5.2: Correlation of Disease with Respect to their Corresponding Symptoms

5.1 Algorithms and their significance


The system performs multiple disease prediction in accordance with the symptoms entered by the patient. The first task is to define the problem statement. Next, the dataset is made ready to work on. The data is then visualized using scatter plots, distribution graphs, etc., which reveals anomalies, missing values, and similar issues; these are handled to make the dataset ready for prediction. Finally, the main component is machine learning: algorithms such as Decision Tree and Naive Bayes are used to predict the disease early for better patient care. The model uses Python as the platform for executing these machine learning algorithms.
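The preparation step described above (finding missing values and anomalies before training) can be illustrated with a small cleaning function. A minimal sketch, assuming each row is a dict of hashable values and that a missing value shows up as `None` or an empty string; the function name `clean_rows` is hypothetical, and dropping rows is only one option (imputation is another). The report does not list its actual cleaning steps.

```python
from typing import Dict, Hashable, Iterable, List


def clean_rows(rows: Iterable[Dict[str, Hashable]],
               required: Iterable[str]) -> List[Dict[str, Hashable]]:
    """Drop rows with a missing (None or empty-string) required field, then exact duplicates."""
    required = list(required)
    seen = set()
    cleaned = []
    for row in rows:
        if any(row.get(key) in (None, "") for key in required):
            continue  # missing value: drop the row
        fingerprint = tuple(sorted(row.items()))  # hashable, key-order-independent
        if fingerprint in seen:
            continue  # exact duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"fever": 1, "cough": 0, "disease": "Flu"},
        {"fever": 1, "cough": 0, "disease": "Flu"},      # duplicate
        {"fever": None, "cough": 1, "disease": "Cold"},  # missing value
        {"fever": 0, "cough": 1, "disease": ""},         # missing label
    ]
    print(clean_rows(raw, ["fever", "cough", "disease"]))
```

Note that a symptom value of 0 is kept: it means "absent", not "missing".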

1) DECISION TREE :
The decision tree is a commonly used data mining method for building classification and prediction systems based on multiple explanatory parameters, and for developing prediction models for a target variable. It classifies a population into branch-like segments that form an inverted tree with a root node, internal nodes, and leaf nodes. A decision tree is a non-parametric algorithm that can efficiently deal with large, complicated data sets without imposing a complicated parametric structure. If the sample size is large enough, the study data can be divided into training and validation data sets: the training data set is used to build the decision tree model, and the validation data set is used to decide on the appropriate tree size needed to achieve the optimal final model.


Figure 5.1.1 : Decision Tree

The decision tree works with the underlying symptoms and predicts a disease. First, the symptoms entered by the user are placed in an array in which each reported symptom is assigned the value 1 (and every other symptom 0). This array is passed as input to the model for predicting the disease. The array is matched against the disease data set along the branches of the tree and ends at a leaf node, which gives the disease with the highest degree of confidence.
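The encoding step described above, where each reported symptom becomes a 1 in the input array, can be sketched as follows. The symptom vocabulary `SYMPTOMS`, its column order, and the name normalization (lower case, spaces replaced by underscores) are hypothetical; in a real system the order must match the columns the model was trained on.

```python
from typing import Iterable, List

# Hypothetical symptom columns, in the order the model expects them.
SYMPTOMS = ["itching", "skin_rash", "continuous_sneezing", "high_fever", "headache"]


def encode_symptoms(user_symptoms: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Return a 0/1 list aligned with `vocabulary`: 1 where the user reported the symptom."""
    reported = {s.strip().lower().replace(" ", "_") for s in user_symptoms}
    unknown = reported - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown symptoms: {sorted(unknown)}")
    return [1 if symptom in reported else 0 for symptom in vocabulary]


if __name__ == "__main__":
    print(encode_symptoms(["Skin rash", "headache"], SYMPTOMS))
```

Rejecting unknown symptoms, rather than silently ignoring them, surfaces typos before they quietly change the prediction.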

i) Choose the best attribute / feature.
ii) The best attribute is the one that best separates or divides the data into subsets.
iii) Repeat recursively on each subset; the recursive process ends when all elements belong to the same class, or when no more attributes or instances are left.


Figure 5.1.2 : Attribute / Feature

A decision tree builds classification or regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast, and Rainy). A leaf node (e.g., Play) represents a classification or decision. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
The core algorithm for building decision trees is ID3 by J. R. Quinlan (the predecessor of C4.5), which employs a top-down, greedy search through the space of possible branches with no backtracking. ID3 uses entropy and information gain to construct a decision tree.

iv) Entropy: A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between two classes, it has an entropy of one. To build a decision tree, two types of entropy need to be calculated using frequency tables, as follows:
a) Entropy using the frequency table of one attribute:

$$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

where c is the number of classes and p_i is the proportion of samples in S that belong to class i.

b) Entropy using the frequency table of two attributes:

$$E(T, X) = \sum_{v \in X} P(v)\, E(v)$$

where P(v) is the fraction of samples for which attribute X takes the value v, and E(v) is the entropy of the target T over those samples.

v) Information Gain: The information gain is based on the decrease in entropy after a dataset is split on an attribute:

$$\text{Gain}(T, X) = E(T) - E(T, X)$$

Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on the different attributes and calculate the entropy of each branch. Add the branch entropies proportionally to get the total entropy for the split, and subtract it from the entropy before the split. The result is the information gain, or decrease in entropy.
Step 3: Choose the attribute with the largest information gain as the decision node.
Step 4a: A branch with an entropy of 0 is a leaf node.
Step 4b: A branch with an entropy greater than 0 needs further splitting.
Step 5: Run the ID3 algorithm recursively on the non-leaf branches until all data is classified.
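The entropy, information-gain, and recursive steps above can be sketched in plain Python. This is a minimal ID3-style sketch under stated assumptions, not the report's implementation: attributes are categorical (for example 0/1 symptom flags), there is no pruning or validation-based tree-size selection, ties between attributes go to the first one listed, and the toy symptom data is hypothetical. C4.5 differs (it uses the gain ratio and handles numeric attributes); scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"` is a production alternative, though it builds binary CART-style trees rather than ID3 trees.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E(S) = sum(-p_i * log2(p_i)) over the class proportions in `labels`."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(T, X) = E(T) - E(T, X): entropy before the split minus weighted branch entropy."""
    n = len(labels)
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attribute], []).append(label)
    split_entropy = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - split_entropy


def id3(rows, labels, attributes):
    """Return a nested dict {attribute: {value: subtree}}; leaves are class labels."""
    if len(set(labels)) == 1:  # Step 4a: entropy 0 -> leaf node
        return labels[0]
    if not attributes:  # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    # Steps 1 to 3: pick the attribute with the largest information gain
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in dict.fromkeys(row[best] for row in rows):  # Steps 4b and 5: recurse per branch
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


def classify(tree, row, default=None):
    """Follow the branch matching `row` at each decision node until a leaf is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(row.get(attribute), default)
    return tree


if __name__ == "__main__":
    # Hypothetical toy data: 1 = symptom present, 0 = absent
    rows = [{"fever": 1, "cough": 1, "rash": 0}, {"fever": 1, "cough": 0, "rash": 1},
            {"fever": 0, "cough": 1, "rash": 0}, {"fever": 1, "cough": 1, "rash": 0},
            {"fever": 0, "cough": 0, "rash": 1}]
    labels = ["Flu", "Chickenpox", "Cold", "Flu", "Allergy"]
    tree = id3(rows, labels, ["fever", "cough", "rash"])
    print(tree)
    print(classify(tree, {"fever": 0, "cough": 1, "rash": 0}))
```

The `entropy` function reproduces the two boundary cases stated in the text: a homogeneous sample gives 0, and a sample split evenly between two classes gives 1.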


2) NAIVE BAYES:
The Naive Bayes algorithm is a classification algorithm that uses Bayesian techniques and is based on Bayes' theorem in predictive modelling [11]. It is less computationally intensive than many other algorithms; as a result, it can be used to quickly generate mining models that find relationships between input columns and predictable columns.

The Naive Bayes algorithm is primarily used to create classifiers [12], which predict categorical class labels, that is, they determine which class a given input belongs to. To build the prediction framework, the proposed smart healthcare system employs a data mining technique known as the "Naive Bayes classifier" [13]. This system contains a larger number of data records and attributes, genuinely collected from expert data, in order to determine the exact symptom-based prediction. Several AI and data mining techniques are based on the Naive Bayes, or Bayes, rule, which is used to create predictive models. It exploits the evidence by determining the relationship between the target (i.e., dependent) variable and the other variables.

The Naive Bayes algorithm uses Bayes' theorem and Bayes approximation. Bayes' theorem states:

$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$

where:
• P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.

Naive Bayes is a simple technique for developing models that assign class labels to problem instances, where the class labels are drawn from a finite set. It is not a single algorithm but a family of algorithms based on a common principle: every Naive Bayes classifier assumes that the value of each feature is independent of the values of the other features, given the class [14]. Under this assumption, the likelihood of an input x = (x_1, ..., x_n) factorizes as

$$P(x \mid c) = \prod_{i=1}^{n} P(x_i \mid c)$$

Among the many probability models, the Naive Bayes model can be trained very efficiently and is one of the most effective in supervised learning.
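The report does not say which Naive Bayes variant it uses. Because the model input is a 0/1 symptom array, the sketch below assumes a Bernoulli Naive Bayes with Laplace smoothing (`alpha`); the class name `BernoulliNaiveBayes` and the toy data are hypothetical. It picks the class that maximizes log P(c) + Σ log P(x_i | c), which is Bayes' theorem combined with the independence assumption above; P(x) is dropped because it is the same for every class. scikit-learn ships tested versions of this idea as `BernoulliNB` and `MultinomialNB` in `sklearn.naive_bayes`.

```python
from collections import Counter
from math import log


class BernoulliNaiveBayes:
    """Naive Bayes for binary (0/1) symptom vectors with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        if alpha <= 0:
            raise ValueError("alpha must be positive so no probability is exactly 0 or 1")
        self.alpha = alpha

    def fit(self, X, y):
        n_features = len(X[0])
        counts = Counter(y)
        self.classes_ = sorted(counts)
        # log P(c): prior probability of each class
        self.log_prior_ = {c: log(counts[c] / len(y)) for c in self.classes_}
        # How often each symptom is present within each class
        present = {c: [0] * n_features for c in self.classes_}
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                present[label][j] += value
        # Smoothed P(x_j = 1 | c)
        self.p_ = {
            c: [(present[c][j] + self.alpha) / (counts[c] + 2 * self.alpha)
                for j in range(n_features)]
            for c in self.classes_
        }
        return self

    def _log_posterior(self, c, x):
        # log P(c) + sum_j log P(x_j | c); P(x) is constant across classes and dropped
        return self.log_prior_[c] + sum(
            log(p) if value else log(1 - p) for p, value in zip(self.p_[c], x))

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._log_posterior(c, x)) for x in X]


if __name__ == "__main__":
    # Hypothetical symptom columns: [fever, cough, rash]
    X = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
    y = ["Flu", "Flu", "Allergy", "Allergy"]
    model = BernoulliNaiveBayes().fit(X, y)
    print(model.predict([[1, 1, 0], [0, 0, 1]]))
```

Working with log probabilities avoids numerical underflow when there are many symptom columns, and smoothing keeps a symptom never seen with a disease from ruling that disease out entirely.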

5.2 Parameters used for Calculating the Accuracies of the Machine Learning Models :
The confusion matrix is an N x N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with the values predicted by the machine learning model. This gives a complete picture of the classification model's performance and of the types of errors it makes.
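The comparison described above can be sketched in a few lines of Python. This is an illustrative helper, not code from the report: the three class names and the label lists are hypothetical, and the orientation used here (rows = actual class, columns = predicted class) is one common convention that some tools transpose.

```python
def confusion_matrix(actual, predicted, classes):
    """Build an N x N matrix: rows = actual class, columns = predicted class."""
    index = {c: k for k, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical 3-class example
classes = ["flu", "cold", "allergy"]
actual = ["flu", "flu", "cold", "allergy", "cold", "allergy"]
predicted = ["flu", "cold", "cold", "allergy", "cold", "flu"]
cm = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, cm):
    print(name, row)
# flu [1, 1, 0]
# cold [0, 2, 0]
# allergy [1, 0, 1]
```

Correct predictions lie on the diagonal (4 of the 6 here); every off-diagonal cell records one specific kind of mistake, such as an allergy case predicted as flu.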

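As an example, consider a model that predicts whether a wolf is present, with the following definitions:

• "Wolf" is the positive class.
• "No wolf" is the negative class.

The "wolf-prediction" model can be summarized using a 2x2 confusion matrix that depicts all four possible outcomes: true positive, false positive, false negative and true negative.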


A true positive is an outcome where the model correctly predicts the positive class.
Similarly, a true negative is an outcome where the model correctly predicts the
negative class.
A false positive is an outcome where the model incorrectly predicts the positive class, and a false negative is an outcome where the model incorrectly predicts the negative class.
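The four outcomes can be counted directly from lists of actual and predicted labels. The sketch below is a hypothetical helper (the name binary_counts is not from the report) that reuses the labels of the wolf example; the five toy predictions are made up purely for illustration.

```python
def binary_counts(actual, predicted, positive):
    """Count TP, TN, FP and FN for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


actual = ["wolf", "no wolf", "wolf", "no wolf", "no wolf"]
predicted = ["wolf", "wolf", "no wolf", "no wolf", "no wolf"]
print(binary_counts(actual, predicted, positive="wolf"))  # (1, 2, 1, 1)
```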
i) Accuracy: Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions the model got right. Formally, accuracy has the following definition:

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
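Consider a model that classified 100 tumors as malignant (the positive class) or benign (the negative class), with TP = 1, FP = 1, FN = 8 and TN = 90. Its accuracy is:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{1 + 90}{1 + 90 + 1 + 8} = 0.91$$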


Accuracy comes out to 0.91, or 91% (91 correct predictions out of 100 total examples). At first glance, this suggests that the tumor classifier is doing a great job of identifying malignancies. A closer analysis of the positives and negatives, however, gives more insight into the model's performance.

Of the 100 tumor examples, 91 are benign (90 TNs and 1 FP) and 9 are malignant
(1 TP and 8 FNs).

Of the 91 benign tumors, the model correctly identifies 90 as benign. That’s good.
However, of the 9 malignant tumors, the model only correctly identifies 1 as malig-
nant—a terrible outcome, as 8 out of 9 malignancies go undiagnosed!

While 91% accuracy may seem good at first glance, another tumor-classifier model that always predicts benign would achieve exactly the same accuracy (91/100 correct predictions) on these examples. In other words, this model is no better than one that has zero predictive ability to distinguish malignant tumors from benign tumors.
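The arithmetic behind this comparison can be checked with a small Python sketch. It uses the counts stated in the text for the tumor model (1 TP, 90 TN, 1 FP, 8 FN) and, for the always-benign baseline, the counts that follow from never predicting the positive class (all 91 benign tumors become TNs, all 9 malignant tumors become FNs).

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


model_acc = accuracy(tp=1, tn=90, fp=1, fn=8)     # tumor model from the text
baseline_acc = accuracy(tp=0, tn=91, fp=0, fn=9)  # always predicts "benign"
print(model_acc, baseline_acc)  # 0.91 0.91
```

Both classifiers score 0.91, which is why accuracy alone is a poor guide when one class (here, benign) dominates the data.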

ii) Precision: Precision is the proportion of positive predictions that were actually correct. It is defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

The precision of a good classifier should ideally be 1 (high). Precision is 1 only if the numerator and denominator are equal, that is, TP = TP + FP, which means that FP is zero. As FP increases, the denominator becomes larger than the numerator and the precision decreases.

Calculating precision for the tumor classifier:

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{1}{1 + 1} = 0.5$$

The model has a precision of 0.5; in other words, when it predicts that a tumor is malignant, it is correct 50% of the time.


iii) Recall: Recall is the proportion of actual positives that the model identified correctly. Mathematically, recall is defined as follows:

$$\text{Recall} = \frac{TP}{TP + FN}$$

For a good classifier, the recall should ideally be 1 (high). Recall is 1 only if the numerator and denominator are equal, that is, TP = TP + FN, which means that FN is zero. As FN increases, the denominator becomes larger than the numerator and the recall decreases.

Calculating recall for the tumor classifier:

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{1}{1 + 8} \approx 0.11$$

The model has a recall of 0.11; in other words, it correctly identifies 11% of all malignant tumors.

iv) F1 Score: The F1 score is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

The F1 score is 1 only if both precision and recall are 1, and it is high only if both precision and recall are high. It is therefore a better measure than precision alone.
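A small Python sketch applying the precision, recall and F1 formulas to the tumor counts stated in the text (TP = 1, FP = 1, FN = 8). Returning 0.0 when a denominator is zero is an assumed convention for the degenerate case; libraries differ in how they handle it.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP); 0.0 if the model made no positive predictions."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Recall = TP / (TP + FN); 0.0 if there are no actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


p = precision(tp=1, fp=1)  # 1 / 2 = 0.5
r = recall(tp=1, fn=8)     # 1 / 9, about 0.11
f1 = f1_score(p, r)        # 2 * 0.5 * (1/9) / (0.5 + 1/9) = 2/11, about 0.18
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.11 0.18
```

The F1 score of about 0.18 is pulled down toward the weak recall, exposing the problem that the 91% accuracy hid.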


5.3 Case Study

Case Study on Disease Prediction - Heart Disease :
In this case study, Decision Tree and Naive Bayes models are used to predict heart disease [28].

A confusion matrix is obtained to calculate the classification accuracy. A confusion matrix shows how many instances have been assigned to each class. Here there are two classes, and therefore a 2x2 confusion matrix:
Class a = YES (has heart disease)
Class b = NO (no heart disease)

TP (True Positive): It denotes the number of records classified as true while they
were actually true.
FN (False Negative): It denotes the number of records classified as false while they
were actually true.
FP (False Positive): It denotes the number of records classified as true while they
were actually false.
TN (True Negative): It denotes the number of records classified as false while they
were actually false.

CONFUSION MATRIX FOR NAIVE BAYES :


CONFUSION MATRIX FOR DECISION TREES :

PERFORMANCE EVALUATION :

PERFORMANCE METRICS OF DT AND NB :


THE CLASSIFICATION ACCURACY – DECISION TREE CLASSIFIER - CROSS VALIDATION :


THE CLASSIFICATION ACCURACY – NAIVE BAYES CLASSIFIER - CROSS VALIDATION :


PERFORMANCE ANALYSIS FOR TWO CLASSIFIERS - CROSS VALIDATION :
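The case study reports classification accuracy under cross-validation, but the exact protocol (number of folds, shuffling, stratification) is not recoverable from this text. As a hedged sketch of the general k-fold procedure only, the Python below holds each fold out once as the test set and averages the fold accuracies. The majority-class classifier is a hypothetical stand-in for a Decision Tree or Naive Bayes model, and the ten toy records are made up.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(rows, labels, k, train, predict):
    """Mean accuracy over k folds; each fold is held out once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_rows, train_labels)
        correct = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical stand-in classifier: always predicts the most frequent training label.
def train_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


rows = [(i,) for i in range(10)]      # dummy feature vectors
labels = ["yes"] * 7 + ["no"] * 3     # toy YES/NO heart-disease labels
acc = cross_val_accuracy(rows, labels, 5, train_majority, predict_majority)
print(acc)  # 0.7
```

The folds here are contiguous; in practice the records are usually shuffled (or stratified by class) before splitting, otherwise sorted data such as this toy example produces folds with very unbalanced classes.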

6 Conclusion

In this work on Machine Learning Technique for Early Prediction of Disease from Various Symptoms, machine learning algorithms such as Decision Tree and Naive Bayes were implemented for prediction and achieved a mean accuracy of more than 97%. This is a notable improvement in accuracy over previous work; it makes this system more reliable than existing ones for this task and hence more satisfying for the user. The system also stores the data entered by the user, together with the name of the disease the patient is suffering from, in a database. These past records can help in predicting diseases with similar symptoms in the future, thus increasing the efficiency of the model. A GUI is also provided for easy interaction with the system, and the system places no restriction on its users, since anyone can use it. Hence, machine learning algorithms can be used to predict disease easily from different parameters.

6.1 Future Scope


• Reading pathology lab reports and predicting the disease on the basis of their results.
• Providing a chatbot service so that users can get instant solutions to their problems.
• Suggesting medical practitioners, according to the predicted disease, so that the patient can get proper treatment.
• Suggesting a diet chart, medicines, therapists, etc. on the basis of the predicted result.

References

[1] R. Manne, S.C. Kantheti, Application of artificial intelligence in healthcare: chances and challenges, Curr. J. Appl. Sci. Technol. 40 (6) (2021) 78–89, https://doi.org/10.9734/cjast/2021/v40i631320.

[2] M. Sivakami, P. Prabhu. Classification of algorithms supported factual knowledge recovery from cardiac data set, Int. J. Curr. Res. Rev. 13 (6) 161–166. ISSN: 2231-2196 (Print), ISSN: 0975-5241.

[3] M. Sivakami, P. Prabhu. A Comparative Review of Recent Data Mining Techniques for Prediction of Cardiovascular Disease from Electronic Health Records. In: Hemanth D., Shakya S., Baig Z. (eds) Intelligent Data Communication Technologies and Internet of Things. ICICI 2019. Lecture Notes on Data Engineering and Communications Technologies, vol 38. Springer, Cham 477–484. ISSN 2367-4512, ISSN 2367-4520 (electronic), ISBN 978-3-030-34079-7, ISBN 978-3-030-34080-3.

[4] P. Prabhu, S. Selvabharathi. Deep Belief Neural Network Model for Prediction of Diabetes Mellitus. In 2019 3rd International Conference on Imaging, Signal Processing and Communication, ICISPC 2019 (pp. 138–142). Institute of Electrical and Electronics Engineers Inc. ISBN: 9781728136639. 2019.

[5] N. Jothi, N.A. Rashid, W. Husain, Data mining in healthcare – A review, Procedia Comput. Sci. 72 (2015)–313. https://www.sciencedirect.com/science/article/pii/S1877050915036066

[6] H. Polat, H. Danaei Mehr, A. Cetin. Diagnosis of chronic kidney disease based on support vector machine by feature selection methods, J. Med. Syst. 41 (4) 2017 55.

[7] K.B. Wagholikar, V. Sundararajan, A.W. Deshpande, Modeling paradigms for medical diagnostic decision support: a survey and future directions, J. Med. Syst. 36 (5) (2012) 3029–3049. https://link.springer.com/article/10.1007/s10916-011-9780-4

[8] E. Gürbüz, E. Kılıç, A new adaptive support vector machine for diagnosis of diseases, Expert Syst. 31 (5) (2014) 389–397. http://refhub.elsevier.com/S2214-7853(21)05220-2/h0040

[9] M. Seera, C.P. Lim, A hybrid intelligent system for medical data classification, Expert Syst. Appl. 41 (5) (2014) 2239–2249. http://refhub.elsevier.com/S2214-7853(21)05220-2/h0045


[10] Y. Kazemi, S.A. Mirroshandel, A novel method for predicting kidney stone type using ensemble learning, Artif. Intell. Med. 84 (2018) 117–126. http://refhub.elsevier.com/S2214-7853(21)05220-2/h0050

[11] Irshad Ullah, "Data Mining Algorithms And Medical Sciences", IJCSIT, Vol 2, No 6, December 2016.

[12] Amit Kumar Das, Aman Kedia, "Data mining techniques in Indian Healthcare: A Short Review", 2015 International Conference on Man and Machine Interfacing (MAMI), 978-1-5090-0225-2/15.

[13] T. Devasia, T. P. Vinushree and V. Hegde, "Prediction of students performance using

[14] W. Song, C. H. Li and S. C. Park, "Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures", Expert Syst. Appl., vol. 36, no. 5, pp. 9090–9110, 2018.

[15] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, "Disease prediction by machine learning over big data from healthcare communities", IEEE Access, vol. 5, no. 1, pp. 8869–8879, 2017.

[16] Kunal Takke, "Medical Disease Prediction using Machine Learning Algorithm", International Journal for Research in Applied Science & Engineering Technology (IJRASET), ISSN: 2321-9653, IC Value: 45.98, SJ Impact Factor: 7.538, Volume 10, Issue V, May 2022.

[17] Naganna Chetty, Kunwar Singh Vaisla and Nagamma Patil, "An Improved Method for Disease Prediction using Fuzzy Approach", IEEE, DOI 10.1109/ICACCE.2021.67, pp. 569–572, 2021.

[18] Dhiraj Dahiwade, Gajanan Patle and Ektaa Meshram, "Designing Disease Prediction Model Using Machine Learning Approach", IEEE Xplore Part Number: CFP19K25-ART; ISBN: 978-1-5386-7808-4, pp. 1211–1215, 2019.

[19] Pahulpreet Singh Kohli and Shriya Arora, "Application of Machine Learning in Disease Prediction", IEEE, 978-1-5386-6947-1/18, pp. 1–4, 2018.

[20] Senthilkumar Mohan, Chandrasegar Thirumalai and Gautam Srivastava, "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques", IEEE Access, DOI 10.1109/ACCESS.2019.2923707, pp. 81542–81554, 2019.

[21] Dhomse Kanchan B. and Mahale Kishor M., "Study of Machine Learning Algorithms for Special Disease Prediction using Principal of Component Analysis", IEEE, 978-1-5090-0467-6/18, pp. 5–10, 2018.


[22] Rati Shukla, Vikash Yadav, Parashu Ram Pal and Pankaj Pathak, "Machine Learning Techniques for Detecting and Predicting Breast Cancer", IJITEE, ISSN: 2278-3075, Volume-8, pp. 2658–2662, 2019.

[23] Rashmi G Saboji and Prem Kumar Ramesh, "A Scalable Solution for Heart Disease Prediction using Classification Mining Technique", IEEE, 978-1-5386-1887-5/17, pp. 1780–1785, 2017.

[24] Anjan Nikhil Repaka, Sai Deepak Ravikanti and Ramya G Franklin, "Design And Implementing Heart Disease Prediction Using Naives Bayesian", IEEE Xplore Part Number: CFP19J32-ART; ISBN: 978-1-5386-9439-8, pp. 292–297, 2019.

[25] Lambodar Jena and Ramakrushna Swain, "Chronic Disease Risk Prediction using Distributed Machine Learning Classifiers", IEEE, 978-1-5386-2924-6/17, pp. 170–173, 2017.

[26] Ankita Dewan and Meghna Sharma, "Prediction of Heart Disease Using a Hybrid Technique in Data Mining Classification", IEEE, 978-9-3805-4416-8/15, pp. 704–706, 2015.

[27] Md. Ehtisham Farooqui, Dr. Jameel Ahmad, "A Detailed Review on Disease Prediction Models that uses Machine Learning", International Journal of Innovative Research in Computer Science & Technology (IJIRCST), ISSN: 2347-5552, Volume-8, Issue-4, July 2020.

[28] A. Sankari Karthiga, M. Safish Mary, M. Yogasini, "Early Prediction of Heart Disease Using Decision Tree Algorithm", International Journal of Advanced Research in Basic Engineering Sciences and Technology (IJARBEST), ISSN (ONLINE): 2395-695X, Volume-3, March 2017.
