Paper 3
Keywords
Breast cancer, Random Forest, k-Nearest Neighbor, Naïve Bayes
Introduction
In this paper, the authors train three machine learning algorithms, Random Forest, kNN (k-Nearest Neighbor), and Naïve Bayes, and compare how well each detects breast cancer on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The main experiment is a comparative analysis of these popular machine learning methods on this dataset, and the comparison uses accuracy and precision. The proposed method has three steps: preprocessing, then feature extraction, and finally training and testing of the models on the WDBC dataset. The WDBC dataset has two classes (benign and malignant) and a total of 569 instances with 32 attributes.
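The summary does not say which preprocessing, feature-extraction, or train/test split settings the authors used. The pure-Python code below is a minimal sketch of the preprocess, train, test flow for one of the three models (kNN), under stated assumptions. It uses min-max scaling fitted on the training rows as the preprocessing step and the raw features with no separate feature extraction. It also uses Euclidean distance with k = 3. The function names and the toy data are hypothetical and are not taken from the paper or from WDBC.

```python
import math
from collections import Counter

# Hypothetical helper names; a sketch of preprocess -> train -> test, not the paper's code.

def fit_min_max(rows):
    """Preprocessing: record each feature's (min, max) on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, bounds):
    """Rescale every feature to [0, 1]; a feature constant in training maps to 0.0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training rows closest to `query` (Euclidean)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def evaluate(train_x, train_y, test_x, test_y, k=3):
    """Fit scaling on the training rows, classify the test rows, return (predictions, accuracy)."""
    bounds = fit_min_max(train_x)
    scaled_train = apply_min_max(train_x, bounds)
    scaled_test = apply_min_max(test_x, bounds)
    predictions = [knn_predict(scaled_train, train_y, q, k) for q in scaled_test]
    correct = sum(p == y for p, y in zip(predictions, test_y))
    return predictions, correct / len(test_y)

# Hypothetical toy data: two made-up features per case, labels "B" (benign) / "M" (malignant).
# Illustrative values only; they are NOT taken from WDBC.
TRAIN_X = [[12.0, 15.0], [13.0, 17.0], [11.5, 14.0], [20.0, 25.0], [21.5, 27.0], [19.0, 24.0]]
TRAIN_Y = ["B", "B", "B", "M", "M", "M"]
TEST_X = [[12.5, 16.0], [20.5, 26.0]]
TEST_Y = ["B", "M"]

if __name__ == "__main__":
    preds, acc = evaluate(TRAIN_X, TRAIN_Y, TEST_X, TEST_Y)
    print(preds, acc)
```

Scaling matters for kNN because its distance is sensitive to feature ranges, and fitting the bounds on the training rows only keeps test data out of preprocessing. In the UCI release of WDBC, the 32 attributes are an ID, the diagnosis label, and 30 real-valued features. A real run would therefore drop the ID and use the diagnosis as the label.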
The models were evaluated on 108 benign and 63 malignant cases. For Random Forest, 103 of the 108 benign cases were correctly predicted as benign and 5 were misclassified as malignant. Of the 63 malignant cases, 59 were predicted correctly and 4 were misclassified as benign. Random Forest achieved an accuracy of 94.74%. For kNN, 107 of the 108 benign cases were predicted correctly and 1 was misclassified as malignant. Of the 63 malignant cases, 57 were predicted correctly and 6 were misclassified as benign, giving an accuracy of 95.90%. For Naïve Bayes, 101 of the 108 benign cases were predicted correctly and 7 were misclassified as malignant. Of the 63 malignant cases, 54 were predicted correctly and 9 were misclassified as benign. The reported accuracy for Naïve Bayes is 94.47%, although these counts correspond to (101 + 54)/171 ≈ 90.64%. The paper concludes that kNN is the best-performing model on the WDBC dataset.
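The comparison is based on accuracy and precision, so both can be recomputed directly from the confusion-matrix counts reported above. The sketch below does that arithmetic. For precision it assumes that malignant is the positive class, because the summary does not say which class the paper used. The values come purely from the counts, so they may differ from the paper's reported percentages. For example, 164/171 for kNN rounds to 95.91%.

```python
# Confusion-matrix counts as reported in the summary, per model:
# (benign->benign, benign->malignant, malignant->benign, malignant->malignant)
REPORTED_COUNTS = {
    "Random Forest": (103, 5, 4, 59),
    "kNN": (107, 1, 6, 57),
    "Naive Bayes": (101, 7, 9, 54),
}

def accuracy(bb, bm, mb, mm):
    """Share of all evaluated cases whose predicted class equals the true class."""
    return (bb + mm) / (bb + bm + mb + mm)

def precision_malignant(bb, bm, mb, mm):
    """Assumes malignant is the positive class: correct malignant calls / all malignant calls."""
    return mm / (mm + bm)

def summarize(counts):
    """Map each model name to (accuracy %, precision %) rounded to two decimals."""
    return {
        name: (round(100 * accuracy(*c), 2), round(100 * precision_malignant(*c), 2))
        for name, c in counts.items()
    }

if __name__ == "__main__":
    for name, (acc, prec) in summarize(REPORTED_COUNTS).items():
        print(f"{name:<13} accuracy {acc:6.2f}%   precision (malignant) {prec:6.2f}%")
```

On these counts, kNN leads on both measures. However, it misclassifies 6 malignant cases as benign, compared with 4 for Random Forest, and accuracy and precision alone do not highlight that difference.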