Mini Project Report
Submitted to
Jawaharlal Nehru Technological University, Hyderabad in
partial fulfilment of the requirements for the award of Degree of
Bachelor of Technology in
By
S. TEJASREE 216Y1A6692
R. SOWMYA 216Y1A6684
P. SPURTHY 216Y1A6673
SIDRA MAHEEN 216Y1A6696
Ms. M. SWETHA
Asst. Professor
2024-2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
External Examiner
ACKNOWLEDGEMENT
We wish to take this opportunity to express our deep gratitude to all the people who
have extended their cooperation in various ways during our project work. It is our pleasure to
acknowledge the help of all those individuals.
We would like to thank our project guide Ms. M. Swetha, Asst. Prof., Computer Science
and Engineering Department, for her guidance and help throughout the development of this
project work by providing us with the required information. With her guidance, cooperation and
encouragement we learnt many new things during our project tenure.
We would like to thank our project coordinator Mr. V. SRINIVAS, Asst. Prof.,
Computer Science and Engineering Department, for his continuous coordination throughout the
project tenure.
We specially thank Dr. E. SUDARSHAN, Asst. Prof., HOD, Computer Science and
Engineering Department, for his continuous encouragement and valuable guidance in bringing
shape to this dissertation.
In completing this project successfully, all our faculty members have given excellent
cooperation by guiding us in every aspect. We also thank our lab faculty and librarians.
S. TEJASREE 216Y1A6692
R. SOWMYA 216Y1A6684
P. SPURTHY 216Y1A6673
SIDRA MAHEEN 216Y1A6696
ABSTRACT
Diabetes Mellitus is a serious health issue that affects many individuals globally. It is often
caused by factors such as aging, obesity, lack of physical activity, genetic predisposition, unhealthy
eating habits, and high blood pressure. People with diabetes are at greater risk of complications
like cardiovascular diseases, kidney problems, strokes, vision impairments, and nerve damage.
Currently, healthcare professionals rely on various diagnostic tests to gather patient information
and provide suitable treatments. The use of Big Data Analytics has become increasingly important
in the healthcare industry due to the vast amount of data generated. This technology allows for
analysing large datasets, uncovering hidden trends, and making predictions. However, existing
methods for predicting and classifying diabetes often lack high accuracy.
To address this issue, this paper proposes an advanced diabetes prediction model that considers
additional external factors along with standard parameters like glucose levels, BMI, age, and
insulin. The updated dataset used in this model improves classification accuracy compared to
traditional datasets. Additionally, a pipeline framework has been developed to enhance the
efficiency and precision of diabetes prediction.
CONTENTS
1. INTRODUCTION
1.1. Overview
1.2. Objectives
1.3. Methodology
2. LITERATURE SURVEY
3. IMPLEMENTATION
5.1. Conclusion
5.2. Future Scope
REFERENCES
CHAPTER 1 INTRODUCTION
1.1 INTRODUCTION
Diabetes is a common disease that affects a large number of people. It has many causes, the
main ones being heredity, excess weight, physical inactivity, lifestyle and high blood pressure.
Diabetes, also known as Diabetes Mellitus, is a disease caused by a high level of glucose (sugar)
in the human body. It is a major cause of heart disease, stroke, kidney failure and blindness,
and one of the major causes of death in the world. Early prediction of diabetes can therefore
help to control the disease and save lives. There are two types of diabetes: Type 1 and
Type 2.
Type 1 Diabetes:
It is IDDM - Insulin Dependent Diabetes Mellitus.
In this type of diabetes, the pancreas does not produce a sufficient amount of insulin, which helps
to regulate the glucose level in the human body.
Type 2 Diabetes:
It is NIDDM – Non-Insulin Dependent Diabetes Mellitus.
In this type of diabetes, cells do not respond to the insulin that is produced in the body.
Machine Learning (ML) is a technology in which computers learn automatically from past data.
ML algorithms extract knowledge from data by building classification and ensemble models,
and such models are capable of predicting diabetes. In this project, machine learning
techniques are applied to the PIMA Indian diabetes dataset for prediction, and the resulting
models are reasonably accurate in predicting diabetes.
1.2 OBJECTIVES
1. The main objective of the project is to predict whether a person has diabetes
or not.
3. To predict diabetes early so that it can be controlled by taking preventive
measures.
1.3 METHODOLOGY
2. K-Nearest Neighbor:
It is a supervised machine learning algorithm. It assumes that similar
things lie near each other and uses this similarity to classify new data. For a new
data point, the algorithm calculates the distance to the existing data points and finds the ‘k’
closest ones, called its neighbors. Based on the calculated distances and the classes of these
neighbors, it classifies the new data point.
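The idea can be illustrated with a minimal sketch written with the Python standard library only. It is not the implementation used in this project; the function name knn_predict, the two features and the toy values are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [glucose, BMI] -> outcome (1 = diabetic, 0 = not diabetic).
train_X = [[85, 22.0], [90, 24.5], [100, 26.0], [160, 33.0], [170, 35.5], [180, 38.0]]
train_y = [0, 0, 0, 1, 1, 1]

high_risk = knn_predict(train_X, train_y, [165, 34.0], k=3)
low_risk = knn_predict(train_X, train_y, [88, 23.0], k=3)
print(high_risk, low_risk)
```

Because the distance is computed on raw feature values, a feature with a large numeric range would dominate the result, which is why the data is standardized before KNN is applied.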
In this project, data is collected from the dataset and then preprocessed. During
preprocessing, the data is standardized. The data is then split into two parts: training data
and testing data. In this project, the training data is 80% of the dataset and the testing
data is 20%. Various prediction algorithms are compared in order to select a suitable
algorithm or framework. The software and hardware are selected based on the
requirements. Each algorithm is then implemented with the dataset, and the most accurate
model, as determined by the results, is used for the prediction.
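The 80/20 split can be sketched as follows. This is an illustrative helper written with the standard library, not the project's code; the name split_rows, the fixed seed and the use of row indices in place of real records are assumptions.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Row indices stand in for the 768 records of the dataset.
rows = list(range(768))
train, test = split_rows(rows)
print(len(train), len(test))  # 614 154
```

Shuffling before splitting avoids a test set that contains only the last rows of the file, which matters when the records are stored in some meaningful order.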
CHAPTER 2 LITERATURE REVIEW
Machine learning algorithms allow machines to learn on their own, and they are actively
used in real life. They help computers learn automatically from past experience. With ML,
machines can perform activities automatically, without human intervention.
Boshra Farajollahi, Mayam Mehmannavaz, Mohammad Javad Sayadi, Hafez Mehrjoo and Fateme
Moghbeli, in ‘Diabetes Diagnosis using Machine Learning’, compared
algorithms such as Logistic Regression, Decision Tree, Support Vector Machine, XGBoost, Random
Forest and AdaBoost. The comparison is based on the performance measures Accuracy, F1-score,
Recall and Precision.
In ‘Predicting Diabetes Mellitus with Machine Learning Techniques’ (2018), Quan Zou, Kaiyang
Qu, Dehui Yin, Yamel Luo, Ying Ju and Hua Tang implemented machine learning algorithms and
obtained an accurate model using neural networks. They identified the more accurate model as
the best model for predicting diabetes.
In ‘Diabetes Prediction using Machine Learning Algorithms’ (2019), Dr. Vaidehi V and Aishwarya
Mujumdar compared the accuracies obtained on two different datasets using different machine
learning algorithms and pipelining. One of the two datasets is the PIMA Indian diabetes
dataset.
Faizan Zafar, Muhammad Umair Khalid and Saad Raza, in ‘Predictive Analysis in Health care for
Diabetes Prediction’, implemented machine learning algorithms and compared their performance
based on F1 score. They used artificial intelligence in bioinformatics for the discovery of
knowledge and for the prediction of future occurrences.
CHAPTER 3 IMPLEMENTATION
In this project, the dataset contains attributes such as number of pregnancies, glucose, blood pressure,
skin thickness, insulin, body mass index, diabetes pedigree function and age.
Data Collection:
The data is collected from the PIMA Indian diabetes dataset. It contains
the details of 768 patients, stored as 768 rows and 9 columns (the eight attributes and the outcome).
Data Preprocessing:
• Data preprocessing is the transformation of raw data into an understandable format.
• The data should be prepared in such a way that it is suitable for building a machine
learning model.
• It is the most important step in creating an ML model.
• The quality of the data should be checked in this process.
• Quality is assessed in terms of completeness, accuracy, timeliness, etc.
Data Standardization:
• Data standardization is an important technique.
• It is performed as a preprocessing step.
• It prevents features with wider ranges from
dominating the distance calculation.
• It is the process of standardizing the range of the features of the input dataset.
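One common way to standardize a feature is the z-score, which rescales a column to zero mean and unit standard deviation. The sketch below assumes this form of standardization and uses the population standard deviation; the function name standardize and the sample glucose values are hypothetical.

```python
import statistics

def standardize(column):
    """Return the z-scores (x - mean) / std for one feature column."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical glucose readings.
glucose = [85, 100, 120, 160, 185]
z = standardize(glucose)
print([round(value, 2) for value in z])
```

After this step every feature is expressed in the same unit (standard deviations from its mean), so no single feature dominates a distance-based algorithm such as K-Nearest Neighbor.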
Splitting Data:
Prediction:
• The applied machine learning model is trained on the training dataset and gives predictions
for the testing dataset.
• In this project, the quantity to be predicted is diabetes.
• The model reports whether the person has diabetes or not.
• The accuracy of the applied machine learning models is compared.
• The most accurate model for predicting diabetes is identified.
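The comparison step can be sketched as below. Accuracy is the fraction of test records whose predicted label matches the true label. The labels and the names model_a and model_b are made up for illustration and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical true outcomes and predictions from two placeholder models.
y_test = [1, 0, 0, 1, 0, 1, 0, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 0],
    "model_b": [1, 0, 1, 0, 0, 0, 0, 1],
}

scores = {name: accuracy(y_test, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```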
K-Nearest Neighbor:
It is a supervised machine learning algorithm. It is mainly used in pattern
recognition.
Advantages:
• The algorithm has a very short training period.
• New data can be added without affecting the predictions made for previous data.
• It is one of the easiest algorithms to implement.
Disadvantages:
• Prediction becomes slow and difficult on large-scale datasets.
• The algorithm does not work well when the data has a high number of dimensions.
3.1 FLOWCHART
A Software Requirement Specification (SRS) is a document that describes the requirements
that a system must meet for the given project. It also serves as a guide for the
project and is the basic document used to build and run the
application.
Certain rules must be followed when preparing the SRS document. It covers
the functional requirements, purpose and scope of the project, and components such as
software and hardware.
It also contains other requirements, such as information about the system
to be used and the security that must be ensured.
HARDWARE REQUIREMENTS:
SOFTWARE REQUIREMENTS
HTML
HTML code describes a webpage. The web browser uses the “tags” in the document
to display the information on the monitor. HTML is the markup language used to create
webpages.
HTML
• HTML stands for Hyper Text Markup Language.
• It uses markup to describe the structure of the webpage.
• HTML pages are built from HTML elements.
• HTML elements are written using tags.
• Other tags used in HTML include “heading”, “head” and “title”.
• Tags are not displayed on the webpage; the browser uses them to render the
contents.
An HTML file is saved with .htm or .html as the file extension.
A text editor such as Notepad or Notepad++ is used to write the HTML.
Three-letter extensions such as .htm are common, but using .html is also safe.
File names are compared letter by letter, including case, so Indexhome.html and
IndexHome.html are treated as separate files.
HTML contains two types of tags: logical tags and physical tags. Logical
tags tell the browser that the enclosed text has greater importance in the web page,
for example <strong>………</strong>.
Physical tags give the browser exact instructions on how
to display the content in the web page. Some examples are:
<html>
<head>
<title>Webpage</title>
</head>
<body>
</body>
</html>
<b>This is webpage</b>
The HTML element starts with a start tag: <p>
The element content is: This text is bold
The HTML element ends with an end tag: </p>
<body>
homepage. <b>Webpage</b>
</body>
<body> is the start tag and </body> is the end tag. This is the body tag, which represents
the body of the HTML document. It is one of the most important tags in an HTML document
because it defines the document’s body.
Nested Tags:
Nested tags are tags placed inside other tags, so that several tags apply to the same text.
Here the <body> tag also contains other tags, such as the <b> tag and the <p> tag. The main
rule in HTML is that when multiple tags are nested in the document, the tag that was opened
last must be closed first.
For example, in <p><b>text</b></p> the <b> tag is opened last and is therefore closed first.
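The last-opened, first-closed rule behaves like a stack, and it can be checked programmatically. The following sketch uses Python's standard html.parser module; the class name NestingChecker, the helper is_properly_nested and the short list of void tags are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Tags that never have an end tag (a partial list, enough for this sketch).
VOID_TAGS = {"br", "img", "input", "meta", "link", "hr"}

class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and flags any out-of-order end tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # The tag being closed must be the one opened most recently.
        if not self.stack or self.stack.pop() != tag:
            self.ok = False

def is_properly_nested(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Valid only if no mismatch occurred and nothing is left open.
    return checker.ok and not checker.stack

print(is_properly_nested("<body><p><b>text</b></p></body>"))  # True
print(is_properly_nested("<p><b>text</p></b>"))               # False
```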
Tag Attributes:
HTML tags can have attributes, which provide additional information. The
tag names the operation, while the attribute tells the tag how to perform the
operation. For example, if bgcolor is used in a tag, the background color attribute
changes the background color of the web page.
CSS
CSS stands for Cascading Style Sheets and is used to design the webpage. It is a simple way
to style web pages. Image settings and designs can also be changed
using a style sheet.
The benefits of CSS are visible on the web page itself: the text color,
style, font and many other properties of the page can be changed
using CSS.
CSS is used with HTML and also with the Extensible Hypertext Markup Language (XHTML).
An HTML document becomes more attractive when CSS is applied to it, because CSS
improves the presentation of the content.
CSS consists of rules that attach styles to the elements of the
HTML document.
A CSS rule contains three parts: a selector, a property and a value.
How to Insert CSS:
There are three ways to insert a style sheet:
• External
• Internal
• Inline
1. External Style sheet:
With an external style sheet, the look of an entire website can be changed by editing one file.
The <link> element, placed inside the head section, attaches the style sheet to the page.
An internal style sheet is defined with the <style> element in the head section of
the HTML page.
In this project, we have used an internal style sheet, defined with <style> in the HTML document.
PYTHON
• Its syntax resembles natural English, so Python code is much easier to
understand than code in many other programming languages. It is designed for
application-level programs in real-world use.
• Python is an interpreted language: the code is executed line by line, so
a separate compilation step is not necessary for the whole code.
PYTHON MODULES:
A Python module is a library that contains functions and statements which can be used in
the application being developed.
Python is popular largely because of these libraries, which make coding easier
compared to other languages.
The “import” statement is used to load the libraries needed in the code.
• NumPy: It is a numerical module that provides multidimensional
array objects.
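As a brief illustration of the import statement and of a NumPy multidimensional array, the sketch below stores a few records as a two-dimensional array. The values and the choice of three columns are hypothetical.

```python
import numpy as np

# Hypothetical records: each row is [glucose, BMI, age].
data = np.array([
    [85, 22.0, 31],
    [160, 33.0, 50],
    [120, 28.5, 42],
])

print(data.shape)  # (3, 3): three rows, three columns
print(data.ndim)   # 2: a two-dimensional array

# Mean of each column, computed along axis 0 (down the rows).
col_means = data.mean(axis=0)
print(col_means)
```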
FLASKWEB FRAMEWORK
Web Framework:
A web application framework is a collection of modules and packages used for web
development. It allows web pages to be built without low-level
programming, so the application can be developed quickly.
What is Flask?
Flask is used for creating web applications. It is mainly based on
Werkzeug, which is a WSGI toolkit.
• Python web applications are created using the Flask web framework.
• Flask is easier to use than other Python web frameworks such as Django.
• Many real-world sites, such as Pinterest and LinkedIn, use the Flask framework.
WSGI
WSGI (Web Server Gateway Interface) defines the interface between the web server and the
web application. Python web applications use WSGI to communicate with the web server.
Hence, WSGI is used for Python web application development.
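At its core, a WSGI application is a callable that receives the request environment and a start_response function, and returns the response body as an iterable of bytes. The sketch below shows this contract without starting a server; the names application and fake_start_response and the minimal environ dictionary are assumptions for illustration.

```python
def application(environ, start_response):
    """Minimal WSGI application: the server calls this once per request."""
    body = b"Hello from WSGI"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

# Stand-in for the web server: records what the application reports.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
body = b"".join(application(environ, fake_start_response))
print(captured["status"], body)
```

To serve such an application over HTTP during development, the standard library's wsgiref.simple_server.make_server can be used; Flask builds this kind of callable on top of Werkzeug.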
How to run a Flask app?
A Flask application is run by executing its Python script, which creates the application object and calls app.run().
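A minimal Flask application might look like the sketch below. It assumes Flask is installed; the route text is a placeholder, and the built-in test client is used here so that the route can be exercised without starting a server.

```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # A real application would usually render a template here.
    return "Diabetes Prediction"

# Exercise the route in-process, without opening a network port.
client = app.test_client()
response = client.get("/")
print(response.status_code, response.get_data(as_text=True))
```

To serve the application in a browser, app.run(debug=True) is called under an `if __name__ == '__main__':` guard and the script is started from the command line; the development server then prints the local address to open.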
CHAPTER 4
In this chapter, the implementation of the algorithms and the framework is described. Results are explained
in terms of tables.
In our project, the algorithms are analysed using the PIMA Indian diabetes dataset
(diabetes.csv). The algorithms used in this project are:
• Support vector machine algorithm.
• K-Nearest neighbor algorithm.
• Random forest algorithm.
from flask import Flask, render_template, request

app = Flask(__name__)

# Home page: renders the input form.
@app.route('/')
def home():
    return render_template('index.html')

# Reads the eight values submitted from the form in index.html.
@app.route('/predict', methods=['POST'])
def predict():
    if request.method == 'POST':
        preg = int(request.form['pregnancies'])
        glucose = int(request.form['glucose'])
        bp = int(request.form['bloodpressure'])
        st = int(request.form['skinthickness'])
        insulin = int(request.form['insulin'])
        bmi = float(request.form['bmi'])
        dpf = float(request.form['dpf'])
        age = int(request.form['age'])
        # The model prediction and the rendering of the result page
        # are not included in this listing.

if __name__ == '__main__':
    app.run(debug=True)
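The listing stops after reading the form values. As a hedged sketch of the step that would typically follow, the fields can be gathered into a single feature row in the same order as the dataset attributes. The helper name form_to_features and the sample values are hypothetical and are not part of the original project code.

```python
# Form field names as read in predict(), paired with their numeric types.
# The order follows the dataset attributes listed in Chapter 3.
FIELDS = [
    ("pregnancies", int),
    ("glucose", int),
    ("bloodpressure", int),
    ("skinthickness", int),
    ("insulin", int),
    ("bmi", float),
    ("dpf", float),
    ("age", int),
]

def form_to_features(form):
    """Convert submitted form values (strings) into one numeric feature row."""
    return [cast(form[name]) for name, cast in FIELDS]

# Hypothetical submission; a plain dict stands in for request.form.
form = {
    "pregnancies": "2", "glucose": "120", "bloodpressure": "70",
    "skinthickness": "20", "insulin": "80", "bmi": "25.5",
    "dpf": "0.45", "age": "33",
}
row = form_to_features(form)
print(row)
```

A trained classifier could then be given this row (after the same standardization that was applied to the training data), and its output passed to the result template as the prediction variable.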
<style>
body {
  width: 100%;
  height: 100%;
  font-family: 'Open Sans', sans-serif;
  background: rgb(91,77,133);
  background: linear-gradient(0deg, rgba(91,77,133,1) 5%, rgba(128,44,102,1) 100%);
  color: white;
  font-size: 18px;
  text-align: center;
  letter-spacing: 1.2px;
  background: -ms-radial-gradient(0% 100%, ellipse cover, rgba(104,128,138,.4) 10%, rgba(138,114,76,0) 40%), -ms-linear-gradient(top, rgba(57,173,219,.25) 0%, rgba(42,60,87,.4) 100%), -ms-linear-gradient(-45deg, #670d10 0%, #092756 100%);
}
.login {
  position: absolute;
  top: 27%;
  left: 50%;
  margin: -200px;
  width: 400px;
  height: 100%;
}
input {
  width: 100%;
  margin-bottom: 10px;
  background: rgba(0,0,0,0.3);
  border: none;
  outline: none;
  padding: 10px;
  font-size: 13px;
  color: #fff;
  text-align: left;
  text-shadow: 1px 1px 1px rgba(0,0,0,0.3);
  border: 1px solid rgba(0,0,0,0.3);
  border-radius: 4px;
  box-shadow: inset 0 -5px 45px rgba(100,100,100,0.2), 0 1px 1px rgba(255,255,255,0.2);
  -webkit-transition: box-shadow .5s ease;
  -moz-transition: box-shadow .5s ease;
  -o-transition: box-shadow .5s ease;
  -ms-transition: box-shadow .5s ease;
  transition: box-shadow .5s ease;
}
input:focus {
  box-shadow: inset 0 -5px 45px rgba(100,100,100,0.4), 0 1px 1px rgba(255,255,255,0.2);
}
</style>
</head>
<body>
<div class="login">
<h1>
<font face="times new roman">Diabetes Prediction<br> </font>
</h1>
<form action="{{ url_for('predict')}}" method="post">
<font face="times new roman"> Number of Pregnancies </font>
<input class="form-input" type="text" name="pregnancies" placeholder="Number of Pregnancies eg. 0 for male"><br>
<font face="times new roman">Glucose</font>
<input class="form-input" type="text" name="glucose" placeholder="Glucose (mg/dL) eg. 80"><br>
</form>
</div>
</body>
</html>
Result.html
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<meta charset="utf-8">
<title>Diabetes Predictor</title>
<style>
CSS Code
</style>
</head>
<body>
<!-- Result -->
<div class="results">
{% if prediction==1 %}
<h1>Prediction: <span class='danger'>You have a greater risk of having diabetes. Please consult your doctor.</span></h1>
<img class="gif" src="https://i.pinimg.com/736x/e3/7e/14/e37e14e207070d62cfc4d0b050f3ad91.jpg" style="width:40%">
{% elif prediction==0 %}
<h1>Prediction: <span class='safe'>You are safe. You have a lower risk of diabetes.</span></h1>
<img class="gif" src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQRjSPoym3xeoeYjxzig1UAF0GUjP7ee75oVF3kL91Lt5xwzxrrIIV3DMFD-koLCx4z0w&usqp=CAU" style="width:35%">
{% endif %}
</div>
</body>
</html>
4.2 RESULTS AND DISCUSSION
In this chapter, a detailed analysis of the output of the algorithms is described.
4.2.1 Data set used in the project:
A sample of the dataset is shown in Fig. 4.1.
Outcome
0 – Absence of diabetes
1 – Presence of diabetes
4.2.3 Prediction
Fig 4.3 Interface
5.1 CONCLUSIONS
In this report, several algorithms are explained and implemented for the prediction of diabetes.
The method helps in predicting whether a person has diabetes or not. The ML algorithms used in
this work are classification and ensemble learning algorithms: K-Nearest Neighbor, Support Vector
Machine and Random Forest.
MODEL ACCURACY
Among them, the Random Forest algorithm gives the highest accuracy and helps in predicting diabetes.
Predicting diabetes in its early stages is key to treatment.
5.2 FUTURE SCOPE
This work can be further extended to predict the type of diabetes a person has. It can
also be extended to estimate how likely people without diabetes are to develop it in the next
few years.
REFERENCES
1. Mitushi Soni, Dr. Sunita Varma, “Diabetes Prediction using Machine Learning
Techniques”, IJERT Volume:09, Issue:09, September 2020.
2. Lomani Nayak, Dr. Gayatri S Pandi, “Diabetes Disease Prediction using Machine
Learning”, IRJET Volume:08, Issue:01, January 2021.
3. Aishwarya Mujumdar, Dr. Vaidehi V, “Diabetes Prediction using Machine Learning
Algorithms”, ICRTAC, 2019.
4. Boshra Farajollahi, Mayam Mehmannavaz, Hafez Mehrjoo, Fateme Moghbeli, Mohammad Javad
Sayadi, “Diabetes Diagnosis using Machine Learning”.
5. Quan Zou, Kaiyang Qu, Yamel Luo, Dehui Yin, Ying Ju, Hua Tang, “Predicting Diabetes
Mellitus with Machine Learning Techniques”, 06 November 2018.