Attachment 1
INFECTION MODEL
Project Overview and Plan
2042827
ABSTRACT
Contents
ABSTRACT
List of Figures
CHAPTER 1 INTRODUCTION
1.1. Motivation
1.2. Problem Statement
1.3. Aim And Objectives
1.4. The Methodology
1.5. Report Organization
CHAPTER 2 BACKGROUND AND LITERATURE REVIEW
2.1. Diabetes Background
2.2. Machine Learning
2.2.1. Supervised Learning
2.2.2. Unsupervised Learning
2.2.3. Reinforcement Learning
2.3. Related Works
References
List of Figures
Figure 1 CRISP-DM Steps
Figure 2 The report structure
Figure 3 Machine Learning's type and techniques
CHAPTER 1
INTRODUCTION
Today, Machine Learning (ML) is a powerful technique because it can handle the massive amounts
of data that surround us. The purpose of ML is to dig deeper into large amounts of data, a task
that is complex and often challenging for humans. ML is one of the artificial intelligence
approaches that has shown promising results in the classification and prediction domains. Given
the importance of classification in daily life, machine learning has been recognized as a means
to classify in a wide range of applications. Machine learning algorithms are widely used in
medical diagnosis, business, marketing, stock market prediction, forecasting, fraud detection,
etc. In recent years, the use of ML techniques has expanded dramatically in the health care
system, a developing industry that requires high predictive precision.
1.1. Motivation
Examining a huge amount of data manually or with traditional methods is difficult, especially in
a medical system, because it takes more time and is less productive and less efficient. Machine
learning, on the other hand, provides outcomes that are more accurate, less time-consuming, and
reproducible, and it learns from previous computations. For this reason, there is an urgent need
to benefit from the huge amounts of medical data by employing them for prediction and
classification purposes.
1.2. Problem Statement
The diagnosis of type II diabetes using ML techniques and patient data acquired from clinical
trials and routine health check-ups is a crucial subject that still requires extensive research.
This is due to a variety of factors, including the quantity and quality of the datasets and the
machine learning techniques applied. These factors must be carefully considered jointly, which
makes it very challenging to obtain optimal results.
1.3. Aim And Objectives
The main aim of this project is to discover the relationship between patients' data and the level
of diabetes infection. Machine learning algorithms are applied to model the collected data, and
the results can identify the significant factors influencing diabetes infection in patients. To
achieve this aim, the following objectives are set:
1. Review the latest research to identify the major factors that contribute to the wide spread
of diabetes.
2. Prepare the dataset for analysis using several data mining techniques.
3. Build several models using machine learning algorithms.
1.4. The Methodology
This project will be based on the Cross-Industry Standard Process for Data Mining (CRISP-DM)
because it is widely used in data science and predictive analytics projects (Chapman). The
CRISP-DM framework contains six major phases: business understanding, data understanding, data
preparation, modeling, evaluation, and deployment. Figure 1 shows the CRISP-DM model.
Understanding the nature and purpose of the problem is the most crucial aspect of the entire
procedure. Therefore, before initiating the life cycle, it is necessary to comprehend the
problem, because a successful outcome depends on that understanding. Chapter 3 goes through each
of these steps in greater detail.
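The order of the six CRISP-DM phases listed above can be sketched in Python as follows. This is only an illustrative sketch and not part of this project's implementation: the names CrispDmPhase, next_phase and walk_phases are hypothetical, and the sketch models only the forward order of the phases, whereas in practice earlier phases are often revisited.

```python
from enum import Enum


class CrispDmPhase(Enum):
    """The six CRISP-DM phases, in the order the text lists them."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase):
    """Return the phase that follows `phase`, or None after deployment."""
    if phase is CrispDmPhase.DEPLOYMENT:
        return None
    return CrispDmPhase(phase.value + 1)


def walk_phases(start=CrispDmPhase.BUSINESS_UNDERSTANDING):
    """List the phases from `start` through deployment, in order."""
    phases = []
    current = start
    while current is not None:
        phases.append(current)
        current = next_phase(current)
    return phases


# Print the phases in order, starting from business understanding.
for phase in walk_phases():
    print(phase.value, phase.name.replace("_", " ").title())
```

Starting the walk at business understanding reflects the point made above that the problem must be understood before the rest of the life cycle begins.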
1.5. Report Organization
The first chapter has introduced machine learning, the problem statement, and the methodology
used to address the problem, as well as the motivation behind this project and its aim and
objectives.
Chapter Two: presents the background of machine learning techniques and their types, and
provides a brief literature review.
Chapter Three: presents the design of this project and reviews the machine learning algorithms
that will be used.
Chapter Four: evaluates the algorithms' outcomes based on the analytical results.
Chapter Five: concludes the report and highlights the achievements made in this project.
Figure 2 depicts the report's structure.
CHAPTER 2
BACKGROUND AND LITERATURE REVIEW
2.1. Diabetes Background
Diabetes complications can vary. The most common co-morbid conditions, however, are kidney
disease, amputations, blindness, cardiovascular disease, obesity, hypertension, hypoglycemia,
and the risk of a heart attack or stroke.
Diabetes-related healthcare costs are extremely high due to the disease's widespread nature.
Diabetes is becoming more common among young people. According to the American Diabetes
Association, the number of diabetic youths is expected to increase significantly by 2050. In
particular, the disease's occurrence among youths is expected to rise by 49 percent over the next
40 years.
Diabetes studies help people all over the world have a better future. By analyzing the
correlations between lifestyle factors and the spread of this disease, researchers hope to find
ways to improve the quality of life for billions of people around the world (OCTC).
2.2. Machine Learning
The field of machine learning has emerged as a part of Artificial Intelligence (AI) that uses
algorithms and statistical theory to build mathematical models by learning from past experience.
There are three main types of learning, depicted in Figure 3 and explained as follows (Khadka):
2.2.1. Supervised Learning
In supervised learning, data sets are provided together with an idea of what the desired result
should look like. There are two main types of supervised learning problems: classification and
regression.
Regression: Outputs are predicted as continuous values; the goal is to map the input variables to
some continuous function. An example is predicting the price of a house from its size: here the
price is a function of the size of the house, and it is the continuous output.
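The house price example can be sketched in Python as a straight-line fit. This is a minimal sketch under stated assumptions, not the method used in this project: ordinary least squares with a single input is assumed as the regression technique, the function names fit_line and predict_price are hypothetical, and the sizes and prices are made-up numbers chosen only for illustration.

```python
def fit_line(sizes, prices):
    """Fit price = slope * size + intercept by ordinary least squares."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    # Spread of the inputs, and how inputs and outputs vary together.
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Return the continuous price predicted for a given size."""
    return slope * size + intercept


# Made-up training data: size in square metres, price in thousands.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(sizes, prices)
print(predict_price(90.0, slope, intercept))
```

The prediction for an unseen size can be any real number, which is what distinguishes regression from classification.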
Classification: Outputs are predicted as discrete values such as yes or no, true or false, 0 or
1, diabetes or not, male or female, positive or negative, and so on. An example is classifying
whether a person has diabetes or not based on health data.
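The diabetes example can be sketched in Python as a small classifier. This is a minimal sketch under stated assumptions, not the model built in this project: a nearest-centroid rule is assumed as the classification technique, the function names centroid, fit_centroids and classify are hypothetical, and the records are made-up values, not clinical data.

```python
def centroid(rows):
    """Mean of each feature across a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def fit_centroids(features, labels):
    """Compute one centroid (mean feature vector) per class label."""
    by_label = {}
    for row, label in zip(features, labels):
        by_label.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_label.items()}


def classify(row, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(row, centroids[label]))


# Made-up records: [fasting glucose (mg/dL), body-mass index].
features = [[85.0, 22.0], [92.0, 24.0], [150.0, 31.0], [165.0, 34.0]]
labels = ["no diabetes", "no diabetes", "diabetes", "diabetes"]

centroids = fit_centroids(features, labels)
print(classify([155.0, 30.0], centroids))
```

Unlike the regression case, the output here is always one of the discrete labels seen during training.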