Disease Prediction Project
Department of
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
By
SRIDHISHNEE S REG NO. 22AD135
SOWMYA A REG NO. 22AD132
CHENNAI- 69
CERTIFICATE
This is to certify that the “Core Course Project” submitted by Sowmya A and
Sridhishnee S (Reg. No. 22AD132 and 22AD135) is work done by them and
submitted during the 2023-2024 academic year, in partial fulfilment of the
requirements for the award of the degree of in at Chennai Institute of Technology.
We express our gratitude to our Chairman Shri. P. SRIRAM and all trust members of
Chennai Institute of Technology for providing the facility and opportunity to do
this project as a part of our undergraduate course.
We are grateful to our Principal Dr.A.RAMESH M.E, Ph.D. for providing us the
facility and encouragement during the course of our work.
We would like to extend our thanks to our faculty coordinators of the Department of
Artificial Intelligence and Data Science, for their valuable suggestions throughout this
project.
We wish to extend our sincere thanks to all Faculty members of the Department of
Artificial Intelligence and Data Science for their valuable suggestions and their kind
cooperation for the successful completion of our project.
We wish to acknowledge the help received from the Lab Instructors of the Department
of Artificial Intelligence and Data Science and others for providing valuable suggestions
and for the successful completion of the project.
Health care systems collect data and reports from hospital and patient databases.
Machine learning and data processing techniques are applied to this data to predict
diseases and to generate reports based on the results. Medical reports and data have
been extracted from various databases to predict some of the diseases most commonly
found in people today, namely breast cancer, heart disease and diabetes, which have
been among the leading causes of death in recent years and make patients' lives more
difficult. Technological advances in the health care industry also help people by
suggesting which hospitals and doctors to visit for treatment, where to be admitted
and which hospitals are best for treating a given disease. We have implemented such a
system in our application to make people's lives simpler. Users enter certain data
from their medical reports and the application returns a positive or negative
prediction for the disease. Users then have the option of getting recommendations of
the best nearby hospitals and doctors, based on feedback from past users or guardians.
Introduction
Properly analyzing clinical documents about patients' health makes it possible to
anticipate the occurrence of various diseases. In addition, obtaining information about
specialists in a particular disease facilitates proper and efficient diagnosis. This
project uses data mining techniques, namely the logistic regression and random forest
classification algorithms, for disease prediction. The likelihood of a disease is
predicted from a patient's medical profile, such as heart rate and blood pressure
measured through sensors, and from other externally observable symptoms such as fever,
cold and headache. The logistic regression and random forest classifiers take these
symptoms as input and predict the disease. Furthermore, all necessary information about
the predicted disease, as well as the recommended doctors, is provided. The
recommendation module (future implementation) suggests the location, contact and other
necessary details of disease specialists based on the filters chosen by the user: lower
fees, more experience, nearest location and feedback reviews of the doctors. The user
can thus get appropriate treatment and necessary medical advice as quickly as possible.
Additionally, users provide feedback on the recommended doctors, which is then added to
the analysis so that further recommendations can be based on reviews.
The healthcare industry generates terabytes of data every year. The medical documents
that are maintained are a pool of information about patients. Extracting useful
information from them for quality healthcare is difficult but important. By analyzing
this voluminous data, we can predict the occurrence of disease and safeguard people. An
intelligent system for disease prediction therefore plays a major role in controlling
disease and maintaining good health by providing accurate and trustworthy disease risk
prediction.
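The prediction step described in this introduction can be sketched as follows. This is a minimal illustration that assumes scikit-learn is available. The three features (heart rate, blood pressure, fever), the labelling rule and the patient record are made up for the example and are not the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, illustrative data: heart rate (bpm), systolic blood
# pressure (mmHg) and fever (0 = no, 1 = yes).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(80, 12, n),
    rng.normal(125, 15, n),
    rng.integers(0, 2, n),
])

# Hypothetical labelling rule, used only to make the example learnable.
high_hr_bp = (X[:, 0] > 85) & (X[:, 1] > 130)
fever_bp = (X[:, 2] == 1) & (X[:, 1] > 140)
y = (high_hr_bp | fever_bp).astype(int)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

patient = np.array([[95.0, 150.0, 1.0]])  # one made-up patient record
risk = {}
for name, model in models.items():
    model.fit(X, y)
    # predict_proba returns [P(negative), P(positive)] for each row.
    risk[name] = float(model.predict_proba(patient)[0, 1])
    print(f"{name}: estimated probability of disease = {risk[name]:.2f}")
```

Reporting a probability rather than only a positive or negative label lets the application show the user how confident each model is.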
Literature Survey
Architecture Diagram
Existing System
• Several online healthcare systems have introduced new ideas to benefit people, and
many online applications offer recommendations for hospitals and doctors.
• However, they lack reliability and accuracy, and their features and modules need
improvement. Health care systems may not publish people's opinions when the response is
negative. When feedback is collected manually, patients may hesitate to give their
complete opinion of doctors or hospitals in front of other people, which lowers the
quality of the feedback.
• Overall, we have not found all of these features and modules together in one
application. There are different applications for different diseases, and separate
applications for recommending doctors and hospitals.
• In this work we address the issues of the existing systems. We aim for accuracy,
reliability and efficiency by supporting three of the most common diseases, heart
disease, cancer and diabetes, in a single application. The application predicts the
three diseases by analyzing the symptoms collected from the patient's record, and it
collects positive and negative opinions from patients, according to which hospitals and
doctors are rated from best to worst.
• Guardians' opinions are also very important. They can give feedback on how the
patients were treated (was the staff friendly or strict?) and on the hospital
management (was it clean? how was the hospitality?). When feedback is collected online,
patients and guardians can give both positive and negative opinions completely and
without hesitation.
• Based on this feedback, we can provide truthful recommendations of hospitals and
doctors. Following the prediction of a particular disease, we will recommend the most
suitable hospital and doctor to consult and to be admitted to.
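One possible way to turn positive and negative opinions into a best-to-worst rating is sketched below. The hospital names, the feedback records and the smoothed scoring formula are assumptions made for illustration. The report does not specify how the ratings are computed.

```python
from collections import Counter

# Hypothetical feedback records: (hospital name, opinion).
feedback = [
    ("Hospital A", "positive"), ("Hospital A", "positive"),
    ("Hospital A", "positive"), ("Hospital A", "positive"),
    ("Hospital A", "negative"),
    ("Hospital B", "positive"),
    ("Hospital C", "negative"), ("Hospital C", "negative"),
    ("Hospital C", "positive"),
]


def rank_hospitals(records):
    """Return (hospital, score) pairs sorted from best to worst."""
    positive, total = Counter(), Counter()
    for hospital, opinion in records:
        total[hospital] += 1
        if opinion == "positive":
            positive[hospital] += 1
    # Laplace-smoothed share of positive opinions: adding one positive and
    # one negative pseudo-review keeps a hospital with a single review from
    # automatically outranking one with many, mostly positive, reviews.
    scores = {h: (positive[h] + 1) / (total[h] + 2) for h in total}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_hospitals(feedback)
for hospital, score in ranking:
    print(f"{hospital}: {score:.2f}")
```

The same function could rank doctors instead of hospitals by passing doctor names in the records.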
The primary objective of the "Disease Prediction Using Machine Learning" project is to
design and implement a healthcare system that uses machine learning and data analytics
to improve the accuracy and early detection of diseases. The system is intended to
personalize disease risk assessment for individuals, foster proactive healthcare
interventions, optimize resource allocation in the healthcare system, and enhance
patient care by enabling informed decision-making. It should also ensure robust data
security and compliance with privacy regulations, and lay the foundation for further
advances in machine learning and healthcare analytics. The project aims to improve
disease prediction, with the goal of better patient outcomes, reduced healthcare costs,
and a contribution to the evolving landscape of data-driven healthcare.
Modules used
The user inputs data, which is stored in the database, and the prediction is then made
for the disease the user chooses. The user data is read from the database and the
chosen disease is predicted. If the result is negative, the process ends. If it is
positive, the user will get recommendations (future work) for hospitals where they can
best be treated.
• Application Flowchart.
• Data collection (from the user) to make dataset.
• Importing packages.
• Data pre-processing.
• Data fitting and training.
• Prediction as opted by the user.
• Result or output.
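The flow described in this section can be sketched as follows. The in-memory SQLite database, the table layout, the `predict_stub` function and its thresholds are all hypothetical stand-ins. In the application, a trained model would replace the stub.

```python
import json
import sqlite3


def predict_stub(disease, record):
    """Stand-in for a trained model; returns True for a positive result.

    The thresholds are placeholders for illustration, not clinical rules.
    """
    thresholds = {"diabetes": ("glucose", 126), "heart": ("cholesterol", 240)}
    field, limit = thresholds[disease]
    return record[field] >= limit


def run_prediction(conn, disease, record):
    # 1. Store the user's input in the database.
    conn.execute(
        "INSERT INTO inputs (disease, data) VALUES (?, ?)",
        (disease, json.dumps(record)),
    )
    # 2. Read the latest input back and predict the chosen disease.
    row = conn.execute(
        "SELECT data FROM inputs ORDER BY id DESC LIMIT 1"
    ).fetchone()
    positive = predict_stub(disease, json.loads(row[0]))
    # 3. Negative: end the process. Positive: recommend hospitals (future work).
    if not positive:
        return {"result": "negative", "recommendations": []}
    return {"result": "positive", "recommendations": ["<to be implemented>"]}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inputs (id INTEGER PRIMARY KEY, disease TEXT, data TEXT)"
)
first = run_prediction(conn, "diabetes", {"glucose": 140})
second = run_prediction(conn, "heart", {"cholesterol": 180})
print(first)
print(second)
```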
Methodology
This project employs a diverse range of machine learning algorithms, including but not
limited to deep learning, decision trees, and ensemble techniques, to analyze large-
scale healthcare data. The methodology encompasses data collection, preprocessing,
feature selection, model training, and validation. Furthermore, it utilizes electronic
health records, patient demographics, and clinical history as valuable inputs for the
machine learning models.
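The stages named above (preprocessing, feature selection, model training and validation) can be chained in a single scikit-learn pipeline. The sketch below uses synthetic data and assumed settings (median imputation, six selected features, a random forest, five-fold cross-validation). None of these choices are taken from the project itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: synthetic stand-in for patient records.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0
)
# Blank out about 5% of the values to mimic missing entries in real records.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),         # preprocessing
    ("scale", StandardScaler()),                          # preprocessing
    ("select", SelectKBest(score_func=f_classif, k=6)),   # feature selection
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Validation: 5-fold cross-validation refits the whole pipeline per fold,
# so imputation and feature selection never see the held-out fold.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping every stage inside the pipeline avoids leaking information from the validation folds into preprocessing.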
Results
Technology Used
• Pandas: Pandas is a software library written for Python for data manipulation and
analysis. It offers data structures and operations for manipulating numerical tables and
time series.
• NumPy: NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of
high-level mathematical functions to operate on these arrays.
• Scikit-learn Package: Scikit-learn is a free machine learning library for Python. It
features various algorithms such as SVM, Random Forest, K-neighbours and Decision Tree.
• Confusion Matrix: A confusion matrix summarizes the performance of a classification
algorithm by counting correct and incorrect predictions for each class. Classification
accuracy alone can be misleading if you have an unequal number of observations in each
class or if you have more than two classes in your dataset. Syntax: from sklearn.metrics
import confusion_matrix.
• Classification Report: The classification report shows the precision, recall, F1 and
support scores of the model for each class. Syntax: from sklearn.metrics import
classification_report.
• Accuracy Score: Accuracy is one metric for evaluating classification models.
Informally, accuracy is the fraction of predictions our model got right. Formally,
accuracy has the following definition: Accuracy = Number of correct predictions / Total
number of predictions. Syntax: from sklearn.metrics import accuracy_score.
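The three metrics listed above can be computed as in the short example below. The true and predicted labels are made up for illustration, with 1 standing for a positive prediction and 0 for a negative one.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
)

# Made-up labels for ten patients (1 = disease, 0 = no disease).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)

# Accuracy = correct predictions / total predictions.
acc = accuracy_score(y_true, y_pred)
manual_acc = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy = {acc:.2f}")

print(classification_report(y_true, y_pred, target_names=["negative", "positive"]))
```

Here eight of the ten predictions are correct, so the accuracy is 0.8, and the confusion matrix shows that the two errors are one false positive and one false negative.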
Data Training
Model fitting is a measure of how well a machine learning model generalizes to data
similar to that on which it was trained. A well-fitted model produces more accurate
outcomes. An overfitted model matches the training data too closely, while an
underfitted model does not match it closely enough. Training data is the initial dataset
used to train machine learning algorithms. Models create and refine their rules using
this data. It is a set of data samples used to fit the parameters of a machine learning
model, training it by example. Training data is also known as the training dataset,
learning set or training set. It is an essential component of every machine learning
model and helps it make accurate predictions or perform a desired task.
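Underfitting and overfitting can be made visible by comparing accuracy on the training set with accuracy on held-out test data. The sketch below uses synthetic data and decision trees of different depths. The dataset, the 75/25 split and the chosen depths are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise, so a perfect fit means memorizing noise.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until it fits every sample
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = results[depth]
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A very shallow tree that scores low on both sets suggests underfitting, while an unrestricted tree that scores perfectly on the training set but noticeably lower on the test set suggests overfitting.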
Training Data Set Glimpse
Testing Data Set Glimpse
Project Photos
Conclusion
PO & PSO Attainment
PO 1    Engineering knowledge                       Yes / No
PO 3    Design/Development of solutions             Yes / No
PO 4    Conduct investigation of complex problem    Yes / No
PO 7    Environment and Sustainability              Yes / No
PO 8    Ethics                                      Yes / No
PO 10   Communication                               Yes / No
PO 11   Project management and finance              Yes / No