
Machine learning for the prediction of infectious diseases

Abstract—Infectious diseases harm both the health and economic sectors of any society. By predicting an outbreak of such diseases, the appropriate parties can better prepare and, in certain cases, avoid the outbreak altogether. For this reason, we looked into how machine learning is used to predict infectious diseases. We first examined the currently available models that use structured data and how they compare with other, similar infectious disease prediction algorithms. We then looked into how algorithms can use unstructured data to achieve the same goal. Comparing both approaches, it was evident that unstructured data can positively impact this field of study, given the high prediction rates obtained. However, further research that uses unstructured data as its basis for prediction is required.

Index Terms—Machine Learning, Prediction, Detection, Disease Outbreak, Infectious Diseases

I. INTRODUCTION

In recent years, a pandemic struck worldwide, and at the time of writing it is still ongoing. COVID-19, an infectious disease, is estimated to have killed up to approximately 10.5 million people [1]. Fortunately, through advances in technology, more specifically Artificial Intelligence (AI), a vaccine was found in a shorter period than for influenza, another infectious disease that caused a pandemic. Furthermore, AI aided the health sector during COVID-19 [2], [3], for example by screening and evaluating COVID-19 on CT images [4] and by helping combat COVID-19 throughout its entire life cycle [5].

Apart from the negative impact on health, both diseases negatively impacted various economic sectors. For example, "Spanish flu reduced real GDP per capita by around 6 per cent in the typical country over the period 1918–21" [6]. Moreover, recent statistics [7] show that COVID-19 has had severe negative consequences for the economic and financial sectors.

Although the two terms are often used interchangeably, AI and Machine Learning (ML) differ. AI is a technology that allows us to create intelligent systems that can simulate human intelligence. In contrast, ML, a subfield of AI, enables machines to learn from past experience without being explicitly programmed.

Over the past two decades, AI and ML have gradually been integrated into our day-to-day lives, from the health and financial sectors, such as hospitals and banks, to leisure sectors, such as gaming and betting companies. These organizations are turning to AI to solve problems that were previously intractable.

Bioinformatics and disease detection are no exception. By applying these technologies within the biological domain, as clearly shown by the imaging techniques used to diagnose COVID-19 [4], we gain a deeper understanding of what we currently know. With the help of AI, we can tackle highly complex problems, such as disease prediction and possible cures for those diseases. Furthermore, AI algorithms allow us to be proactive rather than reactive, by modelling and predicting future events from what we have observed and learnt from history. Such information minimizes negative impact and allows us to avoid unnecessary danger.

The main goal of this research is to see how ML can be used to predict infectious diseases. We start by reviewing and analyzing the approaches studied in the systematic review [10], which gives us insight into the research conducted between 2010 and 2020 and into how the models compare with each other. Using this information, we then look into the study by Kim and Ahn [8]. Studies of this type implement the previously reviewed approaches in a specific setting, thus providing detail about how outbreaks are predicted. Finally, this paper compares algorithms and their heuristics.

Section II describes in detail how we selected the literature for this paper. Section III then presents our results, Section IV discusses our findings, and Section V concludes by identifying gaps in the current knowledge.

II. METHOD

We carried out a methodical search to find relevant sources that provide critical insight into our research question. Because our field of research is diverse by nature, we dedicated ample time to devising suitable search criteria. We started by deriving the following list of critical keywords from our objective:
• Predict / Prediction
• Detect / Detection
• Machine Learning
• Infectious disease(s)
• Review

These were then combined into the following query string:
(“Predict” OR “Prediction”) AND (“Detect” OR “Detection”) AND (“Machine Learning”) AND (“Infectious disease” OR “Infectious diseases”) AND “Review”

Using the Delft University of Technology (TU Delft) Library website as our search database, this complete query string returned 295 results. As this was a large number, we decided to narrow our results down. We therefore experimented with different forms of the keywords to obtain a dataset more closely related to our primary objective. After numerous attempts, the following query string yielded the most relevant and concise results, amounting to 35 results in total:
“predict” AND “detect” AND “machine learning” AND “infectious disease” AND “review”
No other formatting or filtering was applied through the university’s website.

For each of the 35 sources, we first reviewed the title to see whether the work tackles a problem of interest to us, for example, how ML was used to predict an outbreak. A source was disregarded if its title did not contribute, or was of little relevance, to our main question; otherwise, it was saved locally for later review.

After screening all titles, we read and evaluated the abstract, introduction, and conclusion of every source we thought could contribute to our research. If a source proved relevant, we recorded the reference in EndNote, along with a short note highlighting the critical points of the research and any essential characteristics.

III. RESULTS

In this section, we discuss the papers found using the method described in Section II. To facilitate understanding, we split our results into two subsections. The first subsection covers the systematic review [10], which gives an in-depth analysis of the ML models currently available, how they are used within the disease detection domain, and any drawbacks found in these models. The second subsection looks at a specific approach and reviews the research relating to it.

A. Systematic Review

In the systematic review [10], the authors examined numerous ML models. They targeted, however, only algorithms and evaluation criteria that can detect and predict the spread and outbreak of a disease. The review covers references from 2010 to 2020, found using the search terms “Predicting Disease Outbreaks” and “Detecting Disease” using Machine Learning.

To select the appropriate research and models, the authors created a five-stage selection model. Through this model, they filtered multiple papers and journal articles found in Scopus, obtaining a diverse yet detailed body of research on which to base their study. After reviewing the relevant research, the authors methodically tabulated their findings according to each paper’s research question.

They graded how well each paper’s research question aligned with that of the systematic review, using the equation

$$\mathrm{Score}(S_j) = \frac{1}{|AQ|} \sum_{i=1}^{|AQ|} AQ_{ij}, \qquad (1)$$

where
• $\mathrm{Score}(S_j)$ is the total score of study $S_j$, ranging from 0 to 1;
• AQ is the set of relevance-assessment questions defined by the authors and listed in Table I (so $|AQ| = 10$), and $AQ_{ij}$ is the score of study $S_j$ on question $i$.

TABLE I
QUALITY ASSESSMENT QUESTIONS, ADAPTED FROM [10, P. 3]

ID     Assessment question
AQ1    Does the study define a main research objective or problem related to the spread of deadly disease outbreaks (e.g., prediction, detection, responses)?
AQ2    Does the study specify the relevant disease datasets used?
AQ3    Does the study specify the availability of these datasets (e.g., public datasets, private datasets)?
AQ4    Does the study define the parameters or variables used or learnt by the machine learning algorithms?
AQ5    Does the study define the type of parameters used or learnt by the machine learning algorithms?
AQ6    Does the study specify the type of machine learning models used (e.g., classification, regression, clustering) in solving the problem?
AQ7    Does the study specify the individual models explicitly (e.g., neural network, linear regression)?
AQ8    Does the study specify the evaluation measures (e.g., Accuracy, Precision, Recall, F-Measure, ROC) used to assess the performance of the proposed machine learning approach?
AQ9    Does the study specify the evaluation approaches (e.g., cross-validation, holdout) used to assess the performance of the proposed machine learning approach?
AQ10   Does the study specify the ensemble models (e.g., bagging, boosting) used and compare their performance with individual models?

Rayner Alfred and Joe Henry Obit only retained research with a score greater than or equal to 0.65, which they rated as good. After filtering and evaluating the papers and journal articles, they summarized and categorized the relevant content of each paper at a technical level, that is, datasets and parameters, problems addressed, assessment measures, and methods used.

Finally, the authors summarized their findings and listed the observations made during this research. They noted the following observations and conclusions:
• Further research needs to be conducted on ensemble/hybrid models based on deep learning methods using multi-source data, primarily because these have demonstrated an improvement over the base models’ performance.
• By integrating multi-source data, a deeper understanding of a particular disease outbreak can be obtained.
• Analyzing the complex relationships between multi-source data can produce better modelling results.
• Few studies revolve around unstructured data (e.g., blogs, websites, and news articles), even though such data have demonstrated improvements in the prediction of disease outbreaks.

B. Outbreak prediction

Juhyeon Kim and Insung Ahn [8] aimed to detect emerging infectious disease patterns using Internet news articles. They used news articles related to infectious diseases from 2019.
Using different ML models, such as a Support Vector Machine (SVM), Semi-supervised Learning (SSL), and a Deep Neural Network (DNN), they attempted to detect the pattern of emerging infectious diseases.

As their data set, the authors gathered approximately 1300 articles per day from 237 different countries, covering 115 different diseases. For each article, several types of data were extracted and evaluated: the article title, description, publication date, related disease, and latitude and longitude.

TABLE II
ACCURACY COMPARISON USING SVM, SSL, AND DNN WITH TRAINING SET USING 6 MONTHS OF DATA, ADAPTED FROM [8, P. 9]

                           Accuracy
Validation set period      SVM     SSL     DNN
2019.07.01–2019.09.30      0.736   0.834   0.803
2019.08.01–2019.10.31      0.724   0.839   0.805
2019.09.01–2019.11.30      0.735   0.837   0.809
2019.10.01–2019.12.31      0.734   0.842   0.809
Average                    0.732   0.838   0.806

TABLE III
ROC COMPARISON USING SVM, SSL, AND DNN WITH TRAINING SET USING 6 MONTHS OF DATA, ADAPTED FROM [8, P. 9]

                           ROC
Validation set period      SVM     SSL     DNN
2019.07.01–2019.09.30      0.651   0.775   0.739
2019.08.01–2019.10.31      0.646   0.791   0.737
2019.09.01–2019.11.30      0.657   0.810   0.756
2019.10.01–2019.12.31      0.648   0.788   0.751
Average                    0.650   0.791   0.746

TABLE IV
F1 SCORE COMPARISON USING SVM, SSL, AND DNN WITH TRAINING SET USING 6 MONTHS OF DATA, ADAPTED FROM [8, P. 9]

                           F1 Score
Validation set period      SVM     SSL     DNN
2019.07.01–2019.09.30      0.768   0.825   0.811
2019.08.01–2019.10.31      0.760   0.829   0.816
2019.09.01–2019.11.30      0.771   0.831   0.822
2019.10.01–2019.12.31      0.778   0.842   0.826
Average                    0.769   0.832   0.819

Tables II, III, and IV compare the SVM, SSL, and DNN models in terms of accuracy, ROC, and F1 score, respectively. On all three measures, the SSL model performed best, with average scores of 0.838, 0.791, and 0.832, respectively. The SVM and DNN nevertheless showed reasonable prediction performance, with the DNN consistently ranking second.

Such predictions are affected by many factors, such as climate, lifestyle, diplomatic relations, and population. Countries with similar infectious disease outbreak patterns can be identified by analyzing the severity patterns of various types of infectious diseases across countries.

Based on their findings, the authors suggest further improvements that could yield better results. First, a more systematic approach is needed to handle exceptional cases such as seasonal infectious diseases. Second, the factors mentioned above should be taken into account, which was not done in their study.

IV. DISCUSSION

As seen in Section III-A, after evaluating numerous research papers, the authors of the systematic review [10] reached a clear conclusion regarding multi-source data: using it to predict infectious diseases noticeably improves the algorithms, produces better modelling results, and provides a better understanding of the disease outbreak.

Even though unstructured data, such as news articles, have shown improvements in the predictability of disease outbreaks, the authors of the systematic review state that not enough research has been conducted in this area. For this reason, we dedicated our efforts to researching papers and journal articles that predict disease outbreaks using such data.

Juhyeon Kim and Insung Ahn [8] detected disease outbreaks using an unstructured data approach, specifically news articles. They evaluated several ML models on three measures: accuracy, ROC, and F1 score. Their results show that, on such data, Semi-supervised Learning (SSL) yielded the best results compared to the Support Vector Machine (SVM) and the Deep Neural Network (DNN).

V. CONCLUSION

We initially set out to see how Machine Learning (ML) can be used to predict infectious diseases, and through our research, we have shown numerous algorithms and models capable of this task. The systematic review [10] conducted an exhaustive evaluation of the research from 2010 to 2020 to determine which models are best suited to predict a disease outbreak. Its authors concluded that further research is needed on multi-source data and that there is a lack of research on models that use unstructured data.

Following these findings, we investigated a paper [8] that used news articles (a form of unstructured data) to predict outbreaks around the world. Its results were promising, with Semi-supervised Learning (SSL) leading on all evaluation criteria.

Through our research, we have analyzed and evaluated different techniques and algorithms used to predict outbreaks of different diseases. Even though most current research uses structured data, we might obtain more accurate predictions by exploring unstructured data. Such an advance would let us notify the appropriate parties in a timely fashion, giving them ample time to act or prepare against an outbreak.

Unfortunately, this approach has considerable limitations. For example, when working with unstructured data, researchers must dedicate time to retrieving, parsing, cleaning, and classifying the data before it can be fed into an ML model and used for prediction. In addition, one must account for noise in the data set (such as irrelevant or misreported news), which might negatively impact the model's predictions.
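The cleaning step described in the last paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used in [8]: it assumes each article is a record with the fields Kim and Ahn extracted (title, description, publication date, related disease, latitude, and longitude), and the `Article` type, the helper functions, and the keyword list are all hypothetical. It handles only two kinds of noise, irrelevant articles (via a crude keyword filter) and duplicate reports (same cleaned headline on the same day), plus impossible coordinates; detecting misreported news is beyond this sketch.

```python
import html
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Hypothetical relevance vocabulary; a real pipeline would need a curated,
# multilingual list per disease rather than this handful of terms.
DISEASE_KEYWORDS = {"outbreak", "epidemic", "cholera", "dengue", "measles", "influenza"}

TAG_RE = re.compile(r"<[^>]+>")       # leftover HTML tags from scraped pages
SPACE_RE = re.compile(r"\s+")         # runs of whitespace
TOKEN_RE = re.compile(r"[a-z0-9-]+")  # crude word tokenizer for lower-case text


@dataclass(frozen=True)
class Article:
    """Hypothetical record holding the fields listed for [8]."""
    title: str
    description: str
    published: date
    disease: str
    latitude: float
    longitude: float


def clean_text(raw: str) -> str:
    """Remove HTML tags and entities, collapse whitespace, and lower-case."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return SPACE_RE.sub(" ", text).strip().lower()


def is_relevant(article: Article) -> bool:
    """Keep an (already cleaned) article only if it mentions a disease keyword."""
    tokens = set(TOKEN_RE.findall(article.title + " " + article.description))
    return not tokens.isdisjoint(DISEASE_KEYWORDS)


def has_valid_location(article: Article) -> bool:
    """Reject records whose coordinates cannot exist on Earth."""
    return -90.0 <= article.latitude <= 90.0 and -180.0 <= article.longitude <= 180.0


def clean_articles(articles: Iterable[Article]) -> List[Article]:
    """Return cleaned, relevant, located, de-duplicated articles in input order."""
    seen = set()
    kept: List[Article] = []
    for art in articles:
        cleaned = Article(
            title=clean_text(art.title),
            description=clean_text(art.description),
            published=art.published,
            disease=clean_text(art.disease),
            latitude=art.latitude,
            longitude=art.longitude,
        )
        if not has_valid_location(cleaned) or not is_relevant(cleaned):
            continue
        key = (cleaned.title, cleaned.published)  # same headline, same day
        if key in seen:
            continue  # treat as a duplicate report of the same story
        seen.add(key)
        kept.append(cleaned)
    return kept


# Usage with made-up records (not data from [8]).
raw = [
    Article("<b>Cholera outbreak</b> reported", "Cases rise &amp; spread",
            date(2019, 7, 1), "Cholera", 23.8, 90.4),
    Article("Cholera  outbreak reported", "Syndicated copy of the same story",
            date(2019, 7, 1), "cholera", 23.8, 90.4),
    Article("Stock markets rally", "No health news here",
            date(2019, 7, 1), "none", 40.7, -74.0),
    Article("Measles cases climb", "Coordinates out of range",
            date(2019, 7, 2), "measles", 123.0, 10.0),
]
result = clean_articles(raw)
print([a.title for a in result])  # prints ['cholera outbreak reported']
```

The keyword filter is deliberately crude: it drops relevant articles that never use a listed term, so in practice the relevance step would more likely be a trained text classifier, with the cleaning functions above serving only as preprocessing.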
REFERENCES
[1] "Coronavirus excess deaths estimates," 2021. [Online]. Available: https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates
[2] R. Vaishya, M. Javaid, I. H. Khan, and A. Haleem, "Artificial intelligence (AI) applications for COVID-19 pandemic," Diabetes & Metabolic Syndrome: Clinical Research & Reviews, vol. 14, no. 4, pp. 337–339, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1871402120300771
[3] "Artificial intelligence and digital transformation: early lessons from the COVID-19 crisis." [Online]. Available: https://publications.jrc.ec.europa.eu/repository/handle/JRC121305
[4] H. Xie, Q. Li, P.-F. Hu, S.-H. Zhu, J.-F. Zhang, H.-D. Zhou, and H.-B. Zhou, "Helping roles of artificial intelligence (AI) in the screening and evaluation of COVID-19 based on the CT images," J. Inflamm. Res., vol. 14, pp. 1165–1172, 2021.
[5] "Using artificial intelligence to help combat COVID-19." [Online]. Available: https://www.oecd.org/coronavirus/policy-responses/using-artificial-intelligence-to-help-combat-covid-19-ae4c5c21/
[6] J. Bishop, "Economic effects of the Spanish flu," Reserve Bank of Australia, Jun. 2020. [Online]. Available: https://www.rba.gov.au/publications/bulletin/2020/jun/economic-effects-of-the-spanish-flu.html
[7] "Coronavirus: How the pandemic has changed the world economy," 24 Jan. 2021. [Online]. Available: https://www.bbc.com/news/business-51706225
[8] J. Kim and I. Ahn, "Infectious disease outbreak prediction using media articles with machine learning models," Scientific Reports, vol. 11, no. 1, 2021.
[9] S. V. Scarpino and G. Petri, "On the predictability of infectious disease outbreaks," Nature Communications, vol. 10, no. 1, pp. 1–8, 2019.
[10] R. Alfred and J. H. Obit, "The roles of machine learning methods in limiting the spread of deadly diseases: A systematic review," Heliyon, vol. 7, no. 6, 2021.
