Deep Learning Based
Abstract
This study uses hematoxylin and eosin (H&E) stained whole-slide images (WSIs) for molecular
subtyping of breast cancer. The two-step pipeline first classifies tissue as tumor or non-tumor
and then predicts the molecular subtype from the tumor regions. The proposed method achieved an F1
score of 0.95 for tumor detection and a macro F1 score of 0.73 for subtype classification. These
findings underscore the potential of deep learning models as cost-effective and efficient
alternatives to traditional methods like immunohistochemistry and gene expression profiling.
Introduction
Breast cancer is one of the leading causes of cancer-related deaths, and molecular subtyping is
essential for individualized treatment planning. Conventional techniques such as gene expression
profiling and immunohistochemistry (IHC) are costly and can be subjective. Deep learning offers
an automated, economical way to classify subtypes directly from H&E-stained whole-slide images
(WSIs). This paper proposes a multi-step deep learning pipeline for accurate and efficient
subtype classification. By restricting the analysis to tumor tissue, the method aims to improve
subtype prediction, widen accessibility, and lower diagnostic costs.
Objectives
1. Create a workflow that uses H&E-stained WSIs to classify the molecular subtypes of
breast cancer (Luminal A, Luminal B, HER2-enriched, and Basal-like).
2. Ensure that subtype classification is based only on tumor tissue by training a deep
learning model to differentiate between tumor and non-tumor regions.
3. Use binary classifiers in a One-vs-Rest (OvR) approach, and use an XGBoost model to
aggregate their outputs for subtype prediction.
4. Improve model accuracy by using preprocessing methods including data balancing, color
normalization, and tile extraction.
5. Evaluate the pipeline's performance on a sizable dataset using measures such as recall,
precision, and F1 score. Verify the model's ability to generalize and its usefulness
in research and clinical contexts.
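As an illustration of the evaluation measures named in objective 5, the following is a minimal sketch of a macro F1 score in plain Python. The function name and the toy labels are hypothetical and are not taken from the study; macro F1 is computed here as the unweighted mean of the per-class F1 scores, so each subtype counts equally regardless of how many slides it has.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example only; these labels are invented for illustration.
SUBTYPES = ["LumA", "LumB", "HER2", "BL"]
y_true = ["LumA", "LumA", "LumB", "HER2", "BL"]
y_pred = ["LumA", "LumB", "LumB", "HER2", "BL"]
score = macro_f1(y_true, y_pred, SUBTYPES)
```

Because every class contributes equally, a weak minority class lowers the macro score more than it would lower plain accuracy.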
Methodology
Data Collection: The study utilized publicly available datasets such as TCGA-BRCA,
BRACS, CPTAC-BRCA, and HER2-Warwick, collectively comprising 1433 WSIs.
These datasets were annotated and categorized based on breast cancer molecular subtypes
(LumA, LumB, HER2, and BL).
Preprocessing:
WSI regions were divided into tiles of 512x512 pixels at a fixed magnification.
Tumor regions were identified using a binary classifier to exclude irrelevant tiles.
Color normalization was performed using the Macenko method to reduce
color variation introduced during staining and image acquisition.
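The tiling step can be sketched as a small coordinate generator. This is an assumption-laden illustration, not the study's code: the function name, the stride parameter, and the choice to drop partial edge tiles are all hypothetical. Only the 512x512 tile size comes from the text. A stride smaller than the tile size yields overlapping tiles.

```python
def tile_origins(width, height, tile=512, stride=512):
    """Top-left (x, y) corners of all full tiles inside a region.

    Hypothetical helper: partial tiles at the right and bottom
    edges are dropped, which is an assumption of this sketch.
    """
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    return [
        (x, y)
        for y in range(0, height - tile + 1, stride)
        for x in range(0, width - tile + 1, stride)
    ]


# Non-overlapping tiles over a 1024x1024 region.
plain = tile_origins(1024, 1024)
# 50% overlap (stride 256) over the same region gives more tiles.
overlapping = tile_origins(1024, 1024, stride=256)
```

In practice the coordinates would be passed to a WSI reader to fetch pixel data at the chosen magnification; that part is omitted here because the study does not name the reader it used.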
Segmentation: Tumor tiles were separated from non-tumor tiles using Inception_V3, a
convolutional neural network architecture optimized for hierarchical feature extraction.
Feature Extraction and Classification:
The molecular subtype classification used a One-vs-Rest (OvR) strategy with four
binary classifiers.
Results were aggregated using an eXtreme Gradient Boosting (XGBoost) model.
Class imbalance was addressed by augmenting the HER2 class with overlapping tiles
and by balanced sampling.
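One way to picture the aggregation step is to turn the tile-level outputs of the four OvR classifiers into a single feature vector per slide. The sketch below makes several assumptions that the text does not confirm: the features are per-subtype mean probabilities, the helper names are hypothetical, and a plain argmax stands in for the XGBoost model so that the example stays dependency-free.

```python
SUBTYPES = ("LumA", "LumB", "HER2", "BL")


def slide_features(tile_probs):
    """Mean OvR probability per subtype across a slide's tumor tiles.

    tile_probs: list of dicts mapping subtype -> probability from
    that subtype's binary classifier (assumed input format).
    """
    if not tile_probs:
        raise ValueError("slide has no tumor tiles")
    n = len(tile_probs)
    return [sum(t[s] for t in tile_probs) / n for s in SUBTYPES]


def argmax_subtype(features):
    """Stand-in for the XGBoost aggregator: pick the largest mean."""
    best = max(range(len(SUBTYPES)), key=lambda i: features[i])
    return SUBTYPES[best]


# Invented probabilities for two tiles of one slide.
tiles = [
    {"LumA": 0.8, "LumB": 0.3, "HER2": 0.1, "BL": 0.2},
    {"LumA": 0.6, "LumB": 0.5, "HER2": 0.1, "BL": 0.0},
]
features = slide_features(tiles)
predicted = argmax_subtype(features)
```

A learned aggregator such as XGBoost would be trained on feature vectors like these together with the slide labels, which lets it weigh the four classifiers unequally instead of trusting the largest score.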
Future Scope
Enhancing the dataset with diverse and balanced samples to improve minority class
performance.
Expanding the pipeline to include multimodal inputs like immunohistochemistry or
radiological data.
Deploying the system in real-world clinical settings for large-scale validation.
Conclusion
The study highlights the feasibility of using deep learning for breast cancer molecular subtyping
via H&E WSIs. The approach reduces cost, time, and dependency on specialized tools, making it
suitable for broader adoption in clinical settings. With improvements in data diversity and model
robustness, this methodology could transform precision oncology practices.