20.k1.0038 Proposal Project Report Kelar
ABSTRACT
Stroke is one of the most serious medical conditions and has a significant impact on public health. Accurate prediction of stroke risk matters because it allows appropriate treatment and intervention to be given to individuals at risk of developing the disease. In recent years, machine learning methods have become popular for improving stroke disease prediction. This research applies the Adaboost method to the C4.5 and K-Nearest Neighbor (KNN) algorithms with the aim of improving stroke prediction performance. Using a relevant dataset, the C4.5 and KNN algorithms were first used separately to predict stroke disease; the Adaboost method was then combined with each of the two algorithms. The results show that applying the Adaboost method to the C4.5 and KNN algorithms successfully improved the performance of stroke disease prediction, providing more accurate and reliable predictions to assist in the diagnosis and treatment of stroke. The combination of KNN with Adaboost reached 91% and the combination of C4.5 with Adaboost reached 95%, a difference of 4%. Therefore, C4.5 is more effective in improving the performance of stroke disease prediction.
TABLE OF CONTENTS
COVER
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1. Background
1.2. Problem Formulation
1.3. Scope
1.4. Objective
CHAPTER 2 LITERATURE STUDY
CHAPTER 3 RESEARCH METHODOLOGY
3.1. Research Methodology
3.2. Dataset Collection
3.3. Data Preprocessing
3.4. Data Splitting
3.5. C4.5 Algorithm
3.6. K-Nearest Neighbor Algorithm
3.7. Adaptive Boosting Method
3.8. Evaluation
CHAPTER 4 IMPLEMENTATION AND RESULTS
4.1. Environment
4.2. Implementation
4.3. Result
4.4. Discussion
CHAPTER 5 CONCLUSION
REFERENCES
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1
INTRODUCTION
1.1. Background
Stroke is a significant global health issue, ranking as the second leading cause of death worldwide and a major contributor to disability. Indonesia, in particular, faces a pressing challenge with increasing stroke cases and high mortality rates [1]. According to data from the 2018 Riskesdas, North Sulawesi Province has the highest prevalence of stroke (14.2%), while Papua Province has the lowest (4.1%) [2]. In addition, according to the Centers for Disease Control and Prevention (CDC), stroke is one of the leading causes of death in the United States: it is a non-communicable disease that accounts for about 11% of all deaths, and more than 795,000 people in the United States experience its adverse effects [3]. The C4.5 algorithm can be used to predict or classify an event by forming a decision tree [4]. K-Nearest Neighbor performs classification based on the closest distance between new data and old data, beginning with the choice of the value of K [5]. The Adaboost method is a supervised data mining algorithm that is widely applied to build classification models.
With the advancement of technology in the medical field, machine learning can be used to predict stroke. Machine learning algorithms are helpful for making accurate predictions and providing accurate analyses. Machine learning has been widely applied to classification and optimisation problems when building intelligent systems that support healthcare providers. Selecting the right method for stroke symptom detection is essential because it affects the results that will be produced [6].
This research applies the Adaboost method to the C4.5 and K-Nearest Neighbor algorithms for stroke disease classification in order to obtain accurate prediction results. In this context, the C4.5 algorithm builds a decision tree model that classifies stroke symptoms as stroke or non-stroke. The K-Nearest Neighbor algorithm calculates the distance between each old data point and the new data and then performs classification based on a predetermined K value. The Adaboost method improves the accuracy of the classification model by combining several weak classification models into one strong classification model. Accuracy can be interpreted as the level of closeness between the predicted value and the actual value [7]. In addition, the test results should be analysed to see how effective each algorithm is.
1.2. Problem Formulation
1. Is the combination of the Adaboost method with the C4.5 algorithm effective in predicting stroke disease?
2. Is the combination of the Adaboost method with the K-Nearest Neighbor algorithm effective in predicting stroke disease?
3. Of the two combinations above, which one is more effective at prediction?
1.3. Scope
The dataset used is Stroke Dataset | kaggle.com, which contains patient data such as id, gender, age, hypertension, heart disease, ever married, work type, residence type, average glucose level, bmi, smoking status, and patient status (stroke or non-stroke). The classification models were built by applying the Adaboost method to the C4.5 and KNN algorithms. This research does not discuss risk factors or causes of stroke; it focuses only on the classification of stroke symptoms to obtain accurate prediction results.
1.4. Objective
The main objective of this research is to show that the Adaboost method applied to the C4.5 and KNN algorithms can provide higher performance for stroke disease classification, because the Adaboost method is considered capable of improving the accuracy of several algorithms across various datasets. The results of this research can then be applied in the health sector to assist health workers in classifying stroke symptoms and producing accurate predictions.
CHAPTER 2
LITERATURE STUDY
In research on applying Adaboost to improve the performance of data mining classification for diabetes, Novianti et al. [5] used the K-Nearest Neighbor algorithm to measure classification performance. The test was carried out five times with K values of 7, 13, 19, 25, and 31. For the KNN algorithm alone, the highest result came from the second test, with 92.90% accuracy. For the KNN algorithm with Adaboost, the highest results came from the first and second tests, both with 95.40% accuracy. The Adaboost method thus increased accuracy by 2.50%.
Research on diagnosing stroke risk levels by Puspitawuri et al. [6] used a dataset with both numerical and categorical attributes, so the researchers used the K-Nearest Neighbor method for the numerical data and Naïve Bayes for the categorical data. The first test examined the effect of data distribution on balanced training classes using 30, 45, and 60 records; for example, 30 training records contained 10 low-risk, 10 medium-risk, and 10 high-risk records. The second test examined the effect of data distribution on unbalanced training classes using 30, 45, and 60 records; for example, 30 training records contained 8 low-risk, 8 medium-risk, and 14 high-risk records. The highest accuracy on balanced class data was 96.67%, with 45 training records and K = 15-22, while on unbalanced classes the highest accuracy was 100%, with 60 training records and K = 20-30. The combined KNN and Naïve Bayes method is therefore suitable for diagnosis because it produces accurate results.
The C4.5 Decision Tree algorithm can optimise classification results to obtain good accuracy. In their research, Pambudi et al. [8] explained that the C4.5 Decision Tree model uses 23 rules, consisting of 14 non-stroke rules and 9 stroke rules. The researchers used two main approaches, qualitative and quantitative, and tested the C4.5 Decision Tree algorithm with confusion matrix measurements and AUC values; the tests produced a prediction accuracy of 96.05%. Meanwhile, Rohman et al. [9] stated that an Adaboost-based C4.5 algorithm, using looping and attribute weighting, was able to improve accuracy in predicting heart disease. The dataset contained 867 records, in which 364 patients were detected as sick and 503 as healthy; after preprocessing, 567 records remained, with 257 detected as sick and 310 as healthy. In testing the C4.5 algorithm and the Adaboost-based C4.5 algorithm with K-Fold Cross Validation, the researchers conducted 10 trials with stratified sampling and local random seeds, yielding higher accuracy results. The tests proved that the Adaboost-based algorithm has a higher accuracy than the plain C4.5 algorithm: 86.59% for the C4.5 model versus 92.24% for the Adaboost-based C4.5 model, a difference of 5.65%. In the ROC-curve evaluation, the AUC value was 0.957 for the C4.5 model and 0.982 for the Adaboost-based C4.5 model. Thus, from the model testing above it can be concluded that the Adaboost-based C4.5 method is better than C4.5 alone for heart disease models.
Based on experiments using three data-split scenarios, Hermawan et al. [10] stated that early prediction of stroke disease from medical records using the Classification and Regression Tree (CART) algorithm produced the highest accuracy, 89.83%, in the split scenario with 80% training data and 20% test data. Their analysis showed that the larger the training data, the greater the accuracy obtained, because in the confusion-matrix evaluation the true positive and true negative counts grow with the larger dataset scenario. This affects the accuracy value, since a true positive is a positive prediction that is correct and a true negative is a negative prediction that is correct; therefore the greatest accuracy is found in the largest dataset scenario.
In research on hepatitis disease prediction, Buani [11] tested the prediction results of the Naïve Bayes algorithm with genetic-algorithm feature selection and obtained an accuracy of 96.77%. This improved on previous research that used the same data and the same Naïve Bayes algorithm, which achieved 83.71%. The 13.06% difference proves that the accuracy of the Naïve Bayes algorithm is better after feature selection with a genetic algorithm.
Handayani et al. [12] explained that the Decision Tree algorithm has a higher true positive count than the Neural Network algorithm. The C4.5 model takes the form of a decision tree; to build it, the first step is to count, from the training data, the number of liver-positive and liver-negative cases in each class for the predetermined attributes, and then to calculate the total entropy using the corresponding equation. The training data used for the Neural Network model is the same, except that the attribute values are converted into numerical values. The network consists of three layers: an input layer of ten neurons (nine attribute neurons and one bias neuron), one hidden layer of eight neurons, and two output neurons representing the Positive Liver and Negative Liver predictions, built with the RapidMiner Framework version 5.2.001. In testing, the C4.5 model produced an accuracy of 75.56% and an AUC of 0.898, while the Neural Network model produced an accuracy of 74.1% and an AUC of 0.671. From these results it can be concluded that the Decision Tree model is more accurate than the Neural Network model.
In a research journal on a coronary heart disease prediction system, Larassati et al. [13] used the Naïve Bayes method on 303 data records consisting of 13 variables and 1 class. Data processing involved cleaning, selection, and transformation, and the Naïve Bayes algorithm was implemented to predict coronary artery disease. Performance was evaluated by measuring the prediction ability against the training data to obtain the accuracy of the applied method. The first experiment used a 60% split, obtaining 177 training records and 119 test records, with an accuracy of 83.1%. The second experiment, with a 70%/30% split, produced an accuracy of 82.02%, while the third, with an 80%/20% split, produced 81.6%. The three experiments show that the amount of data significantly affects the accuracy and that the Naïve Bayes algorithm can be applied to predict coronary artery disease from initial patient examination data.
Based on the literature study above, the C4.5 and KNN algorithms are able to provide high accuracy in classifying diseases. However, research [5][9] proved that applying the Adaboost method to the KNN and C4.5 algorithms can provide higher performance than the KNN and C4.5 algorithms alone. Therefore, in this research the author will show that applying the Adaboost method to the C4.5 and KNN algorithms provides higher classification performance for predicting stroke disease.
CHAPTER 3
RESEARCH METHODOLOGY
3.1. Research Methodology
To achieve good results in this research, a structured research method is essential; the problem-solving steps are described in the following subsections.
3.2. Dataset Collection
The dataset used is Stroke Prediction, taken from Kaggle. The data consists of 43,401 observations with 12 attributes. The data attributes used in this study are presented in Table 3.1.
3.3. Data Preprocessing
3.3.1. Data Cleaning
Data cleaning was carried out to remove duplicate records and empty values, since duplicates and missing data can hinder data processing. Therefore, this research needs to perform data cleaning.
3.3.3. SMOTE Oversampling
The last preprocessing step is oversampling using SMOTE. This changes the amount of data with the label "stroke": the stroke attribute has two values, stroke and non-stroke, and the number of non-stroke records exceeds the number of stroke records. Oversampling is therefore needed so that the class counts become equal and the model produces good accuracy.
3.4. Data Splitting
The data is split into two parts: training and testing. The training set is the part of the dataset used to fit the machine learning algorithm's predictive function, while the testing set is the part used to measure its accuracy. In this research, the module used is sklearn.model_selection.
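As a hedged illustration of the preprocessing and splitting steps above, the following minimal sketch cleans the data, balances the classes with SMOTE, and performs a 70/30 split. The file name is an assumption for illustration only; the label column "stroke" follows the dataset description:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

df = pd.read_csv('stroke.csv')      # hypothetical file name
df.dropna(inplace=True)             # data cleaning: drop empty values
df.drop_duplicates(inplace=True)    # data cleaning: drop identical rows

# Encode categorical columns to numeric values.
for col in df.select_dtypes(include='object').columns:
    df[col] = LabelEncoder().fit_transform(df[col])

X, y = df.drop(columns=['stroke']), df['stroke']
print(y.value_counts())             # imbalanced class counts

# SMOTE synthesizes minority-class samples until the classes are equal.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(y_res.value_counts())         # balanced class counts

# 70% training data and 30% test data, as used in this research.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, random_state=42)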
3.5. C4.5 Algorithm
This research uses the C4.5 algorithm as the classification method for stroke disease. The attribute selection process chooses attributes as nodes, either root nodes or internal nodes, based on the highest gain value among the existing attributes. The data processing steps of the C4.5 algorithm are calculating the entropy value, computing the gain value, and constructing the decision tree and rules accordingly. The formulas for calculating entropy and gain are given in equations (1) and (2), respectively [8]:
\mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2(p_i) \qquad (1)

Description:
S : set of cases
n : number of partitions of S
p_i : proportion of S_i to S

\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \qquad (2)

Description:
S : set of cases
A : attribute
n : number of partitions of attribute A
|S_i| : number of cases in partition i
|S| : number of cases in S
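To make equations (1) and (2) concrete, here is a small self-contained sketch that computes entropy and information gain for one candidate attribute; the ten-patient toy sample is invented for illustration:

import numpy as np

def entropy(labels):
    # Equation (1): Entropy(S) = sum_i -p_i * log2(p_i)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, attribute):
    # Equation (2): Gain(S, A) = Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)
    gain = entropy(labels)
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Toy sample: 10 patients, candidate attribute "hypertension", class "stroke".
stroke       = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
hypertension = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
print(entropy(stroke))                         # ~0.971
print(information_gain(stroke, hypertension))  # ~0.256

C4.5 would compute this gain for every attribute and place the attribute with the highest gain at the node.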
3.6. K-Nearest Neighbor Algorithm
The K-Nearest Neighbor algorithm classifies new data based on its distance to several nearest neighbours. The number of nearest neighbours is determined by the user and is expressed as K. K-Nearest Neighbor works from the minimum distance between the new data and the nearest neighbours that have been selected. The goal of this algorithm is to classify new objects based on attributes and training samples. The proximity of neighbours is usually calculated with the Euclidean distance, presented in equation (3) [5]:
E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (3)
Description:
x_i : sample data
y_i : testing data
n : data dimension
i : data index
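A minimal sketch of equation (3) and the K-Nearest Neighbor majority vote follows; the two-feature toy data and K = 3 are invented for illustration:

import numpy as np
from collections import Counter

def euclidean(x, y):
    # Equation (3): E(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return float(np.sqrt(((np.asarray(x) - np.asarray(y)) ** 2).sum()))

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new sample to every training sample.
    distances = [euclidean(x, x_new) for x in X_train]
    # Take the K closest neighbours and let the majority label decide.
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy data: (age, average glucose level) with a binary stroke label.
X_train = [[60, 200], [65, 210], [30, 90], [25, 85], [70, 190]]
y_train = [1, 1, 0, 0, 1]
print(knn_predict(X_train, y_train, [62, 195], k=3))  # -> 1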
3.7. Adaptive Boosting Method
Adaboost is used to classify data into their respective classes. Adaboost assigns class categories based on the weight values the classes hold, and this process is repeated so that the weights are updated: at each iteration, the weights of misclassified samples are increased. Adaboost is a typical ensemble learning algorithm, and the results it obtains have a strong level of accuracy. An Adaboost ensemble can be formed with the following formula [1][5]:
Y_M(x) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m\, y_m(x)\right) \qquad (4)
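As a hedged illustration of equation (4), the sketch below combines invented weak learners with invented weights α_m; the ensemble output is the sign of the α-weighted sum of their ±1 votes:

import numpy as np

def adaboost_predict(x, weak_learners, alphas):
    # Equation (4): Y_M(x) = sign(sum_{m=1}^{M} alpha_m * y_m(x)),
    # where each weak learner y_m returns +1 or -1.
    votes = np.array([learner(x) for learner in weak_learners])
    return int(np.sign(np.dot(alphas, votes)))

# Three illustrative decision stumps on (age, glucose) inputs.
weak_learners = [
    lambda x: 1 if x[0] > 55 else -1,   # older age suggests stroke
    lambda x: 1 if x[1] > 150 else -1,  # high glucose suggests stroke
    lambda x: 1 if x[0] > 40 else -1,
]
alphas = np.array([0.9, 0.6, 0.3])      # weights learned during boosting

print(adaboost_predict([62, 120], weak_learners, alphas))  # 0.9 - 0.6 + 0.3 > 0 -> 1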
3.8. Evaluation
The data that has been processed and tested is then compared. The main metrics used to evaluate classification models are accuracy, precision, and recall. In this research, model evaluation uses the confusion matrix; based on the confusion-matrix results, the values of accuracy, recall, precision, and F1 score can be determined.
1. Accuracy
Accuracy is the ratio of correct predictions to the overall data.
\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% \qquad (5)
2. Precision
Precision is the ratio of correct positive predictions to all positive predictions.
\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\% \qquad (6)
3. Recall
Recall is the ratio of correct positive predictions to all actual positive data.
\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\% \qquad (7)
4. F1 Score
The F1 score is a weighted comparison of average precision and recall.
\mathrm{F1\;Score} = \frac{2 \times (\mathrm{Recall} \times \mathrm{Precision})}{\mathrm{Recall} + \mathrm{Precision}} \qquad (8)
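The following sketch evaluates equations (5)-(8) directly from confusion-matrix counts; the counts are invented for illustration:

def classification_metrics(tp, tn, fp, fn):
    # Equations (5)-(8), expressed in terms of the confusion matrix.
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1

# Invented counts, roughly the shape of a balanced 30% test split.
acc, prec, rec, f1 = classification_metrics(tp=10700, tn=10900, fp=500, fn=720)
print(f'accuracy={acc:.2%} precision={prec:.2%} recall={rec:.2%} f1={f1:.2%}')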
CHAPTER 4
IMPLEMENTATION AND RESULTS
4.1. Environment
This research was conducted on an Asus VivoBook 14/15 laptop running the Windows 10 operating system with an Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz (2.30 GHz) processor and 8 GB of RAM. The programming language used is Python 3, run online on Google Colaboratory.
4.2. Implementation
This research combines the Adaboost algorithm with C4.5 and with K-Nearest Neighbors to compare how much each combination improves stroke disease prediction performance. Before making the comparison, this research uses several libraries in the process.
1. import numpy as np
2. import pandas as pd
3. from sklearn.preprocessing import LabelEncoder
4. from sklearn.preprocessing import MinMaxScaler
5. from sklearn.neighbors import KNeighborsClassifier
6. from sklearn.tree import DecisionTreeClassifier
7. from sklearn.ensemble import AdaBoostClassifier
8. from sklearn.ensemble import VotingClassifier
9. from sklearn.model_selection import train_test_split
10. from imblearn.over_sampling import RandomOverSampler
11. from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, f1_score, roc_auc_score, classification_report, accuracy_score
12. from google.colab import drive
13. import warnings
14. warnings.filterwarnings('ignore')
Line 1 imports numpy for numerical computation, and line 2 imports pandas to convert CSV data to and from numerical form. Lines 3, 4, and 10 import the libraries used in data preprocessing. Lines 5-8 import the libraries for data modeling with C4.5, KNN, and Adaboost, while line 11 is used to display the accuracy, precision, recall, and f1-score results. Line 9 is used to divide the data into training and testing sets, and the library on line 12 is used to access and manage datasets stored on Google Drive. The libraries on lines 13 and 14 are used to suppress the warnings generated by the program.
13
15. drive.mount('/content/drive/')
16. dataframe = pd.read_csv("/content/drive/MyDrive/Kuliah smt 7/project_strokes.csv")
17. dataframe
Lines 15-17 mount Google Drive in Google Colab so that the dataset file can be accessed and its existing data structure read.
18. dataframe.dropna(inplace=True)
19. dataframe.drop_duplicates(inplace=True)
20. dataframe.isnull().sum()
21. dataframe.duplicated().sum()
22. labelencoder = LabelEncoder()
Lines 18 and 19 remove rows that contain null or empty values and rows with duplicate content. Lines 20 and 21 verify the cleaning by counting any remaining null or duplicate rows. Line 22 creates the LabelEncoder needed to perform the encoding process.
23. !pip install -U imbalanced-learn
24. from imblearn.over_sampling import SMOTE
25. smote = SMOTE(sampling_strategy='auto', random_state=42)
26. X_resampled, y_resampled = smote.fit_resample(X, y)
27. X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.3, random_state=42)
Lines 23 and 24 install and import the library used for handling class imbalance. Lines 25 and 26 apply SMOTE to the dataset, producing a new dataset in which minority-class samples have been synthetically added until the classes are balanced; the variables X_resampled and y_resampled hold this new dataset (the feature matrix X and label vector y are assumed to have been derived from the encoded dataframe). Line 27 uses the train_test_split function to split the SMOTE-resampled dataset into training data (X_train, y_train) and testing data (X_test, y_test), allocating 30% of the data as test data.
28. c45 = DecisionTreeClassifier(criterion='gini', splitter='random', max_depth=5)
29. c45.fit(X_train, y_train)
30. y_pred_c45 = c45.predict(X_test)
31. y_pred_train_c45 = c45.predict(X_train)
32. knn = KNeighborsClassifier(n_neighbors=5)
33. knn.fit(X_train, y_train)
34. y_pred_knn = knn.predict(X_test)
35. y_pred_train_knn = knn.predict(X_train)
In lines 28 to 47, the program creates and trains the machine learning models: C4.5, KNN, Adaboost, and an ensemble built with the Voting Classifier technique. These models are then used to make predictions on the test (`X_test`) and training (`X_train`) data. The C4.5 model is also combined with Adaboost to get the best prediction results, and an ensemble model is built by combining the votes of the Adaboost and KNN models with the 'hard voting' method.
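Lines 36-47 of the listing are not reproduced in this report. The sketch below is a hedged reconstruction of what that modeling code plausibly looks like, based on the imports above and the parameters reported later in this chapter (max_depth 5, 5 neighbours, 20 estimators); it is not the authors' exact code, and `estimator` is the scikit-learn >= 1.2 keyword (older versions use `base_estimator`):

from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier

# Assumes X_train, X_test, y_train, y_test from lines 23-27 above.
# C4.5-style tree boosted directly: the tree is AdaBoost's base estimator.
ada_c45 = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=5),
    n_estimators=20, random_state=42)
ada_c45.fit(X_train, y_train)
y_pred_ada_c45 = ada_c45.predict(X_test)

# KNN does not support sample weights, so it cannot serve as an AdaBoost
# base estimator; instead it is combined with AdaBoost by hard voting.
vote_knn_ada = VotingClassifier(
    estimators=[('adaboost', AdaBoostClassifier(n_estimators=20, random_state=42)),
                ('knn', KNeighborsClassifier(n_neighbors=5))],
    voting='hard')
vote_knn_ada.fit(X_train, y_train)
y_pred_vote = vote_knn_ada.predict(X_test)

Hard voting takes the majority of the two models' predicted labels, which matches the report's statement that KNN needs ensemble assistance to be combined with Adaboost.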
4.3. Result
The results are presented starting from preprocessing; the data is then divided into training and testing sets, and accuracy is calculated using the C4.5 algorithm. The optimal experiment uses 70% training data and 30% test data; for C4.5 the best max_depth is 5. The following table shows the calculated results.
[Chart: Confusion Matrix of C4.5 — metric values between 0.75 and 0.95 for test sizes of 20%, 30%, and 40%, classes (0) and (1)]
This is the result of the C4.5 algorithm in predicting stroke disease with the data divided into 20%, 30%, and 40% testing sets. Precision is the percentage of correct positive predictions relative to the total positive predictions. Recall is the percentage of correct positive predictions relative to the actual total of positive data. F1-Score is the weighted harmonic mean of precision and recall; the closer to 1, the better the model. Accuracy is the overall score of the prediction process, calculated from the three values above.
The results are presented starting from preprocessing; the data is then divided into training and testing sets, and accuracy is calculated using the KNN algorithm. The optimal experiment uses 70% training data and 30% test data; for KNN the best number of neighbours is 5. The following table shows the calculated results.
[Table: KNN classification report (surviving row) — class 1: precision 0.80, recall 0.94, f1-score 0.87, support 11,422]
This is the result of the KNN algorithm in predicting stroke disease with the data divided into 20%, 30%, and 40% testing sets; precision, recall, F1-score, and accuracy are defined as above.
The results are presented starting from preprocessing; the data is then divided into training and testing sets, and accuracy is calculated using the Adaboost method. The optimal experiment uses 70% training data and 30% test data; for Adaboost the best number of estimators is 20. The following table shows the calculated results.
[Table: Adaboost classification report (surviving row) — class 1: precision 0.92, recall 0.91, f1-score 0.91, support 11,422]
This is the result of the Adaboost method in predicting stroke disease with the data divided into 20%, 30%, and 40% testing sets; precision, recall, F1-score, and accuracy are defined as above.
The results are presented starting from preprocessing; the data is then divided into training and testing sets, C4.5 is combined with Adaboost, and the accuracy is calculated using the Adaboost method. The following table shows the calculated results.
[Table: C4.5 + Adaboost classification report (surviving row) — class 1: precision 0.96, recall 0.94, f1-score 0.95, support 11,422]
This is the result of the C4.5 and Adaboost combination in predicting stroke disease with the data divided into 20%, 30%, and 40% testing sets; precision, recall, F1-score, and accuracy are defined as above.
The results are presented starting from preprocessing; the data is then divided into training and testing sets, KNN is combined with Adaboost, and the accuracy is calculated using the Adaboost method. The following table shows the calculated results.
[Table: KNN + Adaboost classification report (surviving rows, test size 40%) — class 0: precision 0.87, recall 0.96, f1-score 0.91, support 11,398, accuracy 0.91; class 1: precision 0.96, recall 0.85, f1-score 0.90, support 11,422]
This is the result of the KNN and Adaboost combination in predicting stroke disease with the data divided into 20%, 30%, and 40% testing sets; precision, recall, F1-score, and accuracy are defined as above.
Based on the algorithm testing above, which used a max_depth of 5, 5 neighbours, and 20 estimators, processing with a 30% test size gave good results, although the algorithms differ only slightly in their precision, recall, and f1-score values. For more detail, see the chart below.
[Chart: Confusion Matrix of C4.5 + Adaboost — metric values between 0.90 and 0.96 for test size 30%, classes (0) and (1)]
Based on Figure 4.6 and Figure 4.7, the C4.5 combination achieves higher results than the KNN combination: 95% versus 91%, a difference of 4%.
4.4. Discussion
The tests above use test sizes of 20%, 30%, and 40%. Max depth and the number of neighbours were each tested 20 times, with the optimal value at 5; the number of estimators was tested 10 times, with the optimal value at 20. The best results were not obtained immediately: the researchers applied oversampling so that the classes had the same amount of data, because before oversampling the counts of labels 0 and 1 were highly imbalanced. In addition, the KNN algorithm cannot be combined directly with Adaboost the way C4.5 can; the KNN and Adaboost combination requires ensemble assistance, because the two algorithms have incompatible parameters. After carrying out all the testing steps, good results were finally obtained.
The combination of C4.5 and Adaboost scores above both the plain C4.5 and plain Adaboost algorithms, whereas the combination of KNN and Adaboost scores above plain KNN but below plain Adaboost. Therefore, in this test, the combination of C4.5 and Adaboost performs better than the combination of KNN and Adaboost.
CHAPTER 5
CONCLUSION
Based on the test results of the two combinations, it can be concluded that both combinations help improve the performance of stroke disease prediction, although the performance of the C4.5 and KNN combinations differs. In the C4.5 algorithm, the higher the max depth value, the higher the resulting score, while in KNN the neighbour value makes no significant difference, as is also the case for the Adaboost algorithm. Testing with a max_depth of 5, 20 estimators, and a 30% test size produced a performance of 95% for the combination of C4.5 with Adaboost, while 5 neighbours and 20 estimators with the same 30% test size produced a performance of 91% for the combination of KNN with Adaboost.
The precision, recall, and f1-score results of each combination show no significant difference. In terms of processing time, the C4.5 algorithm processes faster than the KNN algorithm, due to the parameters of each algorithm. It can be concluded that performance results are influenced by the amount of data and the parameters used.
A suggestion for future research is to try combining KNN with Adaboost without the ensemble method, and to try combining other algorithms to find better prediction performance.
REFERENCES