Breast Cancer Classification
This study was based on genetic programming and machine learning algorithms that aim to construct a system to accurately differentiate between benign and malignant breast tumors. The aim of the study was to optimize the learning algorithm. In this context, the authors applied the genetic programming technique to select the best features and the optimal parameter values of the machine learning classifiers. The performance of the proposed method was assessed using sensitivity, specificity, precision, accuracy, and ROC curves.
Breast Cancer Survival Prediction
Sathipati and Ho used an optimized SVM regression to identify miRNA signatures associated with survival time in patients with lung adenocarcinoma. They used a novel feature selection algorithm called IBCGA, and the selected features were then fed into a traditional SVR. Although their custom SVR outperformed other regression methods, it did not generalize well to unseen validation data. Another limitation of this work was the limited size of the datasets used.
3.DATASET
Breast cancer can develop in any part of the breast. The most common form of breast cancer is Invasive Ductal Carcinoma (IDC). IDC is detected through various methods such as mammography, ultrasound, and biopsy. Histopathology images are obtained from biopsy samples. The dataset used for training and testing the image classification model is the Breast Histopathology Images dataset. Since the full dataset is very large, I have taken one-eighth of it, i.e. 46,253 images.
https://www.ncbi.nlm.nih.gov/pubmed/27563488 and http://spie.org/Publications/Proceedings/Paper/10.1117/12.2043872.

Firstly, the dataset is imported entirely in my Jupyter notebook. Then the patches are visualized and some observations are drawn out.

Data Visualization:

What does the image look like? And what is the ratio of IDC-positive to IDC-negative patches? A single image from the dataset is visualized:

Fig. 2: Healthy Patches
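The import-and-visualize step described above can be sketched as follows. The local folder name, the glob pattern, and the use of matplotlib are assumptions about the notebook setup rather than code taken from it; the class label is read from the patch filenames, which in this dataset end in "class0" (IDC negative) or "class1" (IDC positive).

```python
# Sketch of importing the patches and answering the two questions above;
# the folder name "IDC_regular_ps50_idx5" is an assumption about the local copy.
import glob
import random
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

all_patches = glob.glob("IDC_regular_ps50_idx5/**/*.png", recursive=True)
negatives = [p for p in all_patches if p.endswith("class0.png")]  # IDC negative
positives = [p for p in all_patches if p.endswith("class1.png")]  # IDC positive

print("Total patches:", len(all_patches))
print("IDC negative:", len(negatives), "IDC positive:", len(positives))
print("Positive ratio:", len(positives) / len(all_patches))

# Visualize a single randomly chosen patch
sample = mpimg.imread(random.choice(all_patches))
plt.imshow(sample)
plt.axis("off")
plt.show()
```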
My custom CNN has 4 convolution and max pooling layers, 1 flattening layer, and 2 dense layers. The Conv2D layers apply convolutional operations with different filter sizes to capture image features. The MaxPooling2D layers downsample the spatial dimensions to reduce computational complexity. The flattening layer flattens the output from the previous layer into a 1D array. The dense layers are responsible for combining the extracted features and making the final classification. The first dense layer uses the ReLU activation function and the last layer uses the Softmax activation function to produce probability scores for each class, cancerous and non-cancerous.
The number of Epochs was 25 with a batch size of 75. The loss
function used here is Binary Cross Entropy.
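A minimal sketch of the described architecture in Keras is given below. The filter counts, kernel sizes, dense-layer width, optimizer, and the 50x50x3 patch size are assumptions, since the text specifies only the layer types, the activations, the loss, the epoch count, and the batch size.

```python
# Sketch of the custom CNN described above; filter counts (32/64/128/256),
# 3x3 kernels, the 128-unit dense layer, Adam, and the 50x50x3 input size
# are illustrative assumptions, not values taken from the report.
from tensorflow.keras import layers, models

def build_custom_cnn(input_shape=(50, 50, 3), num_classes=2):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # 4 blocks of Conv2D + MaxPooling2D, as described in the text
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        # flatten the feature maps into a 1D vector
        layers.Flatten(),
        # two dense layers: ReLU, then Softmax over the two classes
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # binary cross-entropy as stated in the text; with the 2-unit softmax
    # output this assumes one-hot encoded labels
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_custom_cnn()
# Training with the stated settings (x_train, y_train, x_val, y_val assumed):
# history = model.fit(x_train, y_train, epochs=25, batch_size=75,
#                     validation_data=(x_val, y_val))
```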
The improved model's accuracy has increased from 85 to 87 percent.
Accuracy: 0.87, Precision: 0.76, Recall: 0.70, F1-score: 0.73

The model accuracy curve and the loss curve have been plotted below.

Fig. 6: Accuracy Curve for Model 2
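Accuracy and loss curves such as the one in Fig. 6 can be drawn from the Keras training history; the sketch below assumes `history` is the object returned by `model.fit` in the training step above.

```python
# Sketch of plotting the accuracy and loss curves from a Keras History object.
import matplotlib.pyplot as plt

def plot_history(history):
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    # accuracy curve
    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history.get("val_accuracy", []), label="validation")
    ax_acc.set_title("Model Accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.legend()

    # loss curve
    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history.get("val_loss", []), label="validation")
    ax_loss.set_title("Model Loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.legend()

    plt.show()

# plot_history(history)
```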
Linear Regression Model

For cancerous patches:
Average Color (BGR): [169.7552 130.784 187.0252]
RGB Ratios: [0.66570667 0.51287843 0.73343216]
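The color statistics above can be reproduced with OpenCV and NumPy along the following lines. The helper name and the variable holding the cancerous patch paths are illustrative assumptions; the reported ratios appear to be the mean channel values scaled by 1/255, so the same channel ordering is kept here.

```python
# Sketch of computing per-class color statistics from a set of patches.
import cv2
import numpy as np

def color_features(image_paths):
    """Return the mean BGR color and the per-channel ratios (mean / 255)."""
    per_image_means = []
    for path in image_paths:
        img = cv2.imread(path)                      # OpenCV loads images in BGR order
        per_image_means.append(img.reshape(-1, 3).mean(axis=0))
    avg_bgr = np.mean(per_image_means, axis=0)
    ratios = avg_bgr / 255.0                        # reproduces the ratios reported above
    return avg_bgr, ratios

# Example usage (cancerous_paths is an assumed list of cancerous patch files):
# avg_bgr, ratios = color_features(cancerous_paths)
# print("Average Color (BGR):", avg_bgr)
# print("RGB Ratios:", ratios)
```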
7.RESULTS
On assessing the performance of the models in classifying cancerous and non-cancerous images on a test set comprising images from each class, the two models achieve accuracies of 85 and 87 percent, respectively.
On assessing the performance of the three models, SVM, Linear Regression, and CNN, all three give a low accuracy of around 19-20 percent. This can be further improved by using data augmentation techniques, applying more advanced machine learning models, and incorporating several other important factors, such as tumor size and tissue hardness, alongside a high-quality, standard histopathology image dataset.
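As a sketch of the data augmentation idea mentioned above, Keras' ImageDataGenerator can generate randomly transformed copies of the training patches; the specific transforms and parameter values below are illustrative assumptions, not settings used in this report.

```python
# Sketch of data augmentation for the histopathology patches.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # small random rotations
    width_shift_range=0.1,    # small horizontal shifts
    height_shift_range=0.1,   # small vertical shifts
    horizontal_flip=True,     # tissue patches have no fixed orientation
    vertical_flip=True,
    zoom_range=0.1,
    rescale=1.0 / 255,        # normalize pixel values
)

# Example usage with in-memory arrays (x_train, y_train assumed):
# train_flow = augmenter.flow(x_train, y_train, batch_size=75)
# model.fit(train_flow, epochs=25, validation_data=(x_val, y_val))
```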
8.CONCLUSION
These are the final results obtained from the models. Each model was carefully curated, with modifications and improvements made at each step of the code, in order to obtain the desired results.
9.REFERENCES
[1] Khalid, Arslan, Arif Mehmood, Amerah Alabrah, Bader Fahad Alkhamees, Farhan Amin, Hussain AlSalman, and Gyu Sang Choi. 2023. "Breast Cancer Detection and Prevention Using Machine Learning." Diagnostics 13, no. 19: 3113. https://doi.org/10.3390/diagnostics13193113