Abstract
Unlike common cancers, such as those of the prostate and breast, tumor grading in rare cancers is difficult and largely undefined because of small sample sizes, the sheer volume of time and experience needed to undertake such a task, and the inherent difficulty of extracting human-observed patterns. One of the most challenging examples is intrahepatic cholangiocarcinoma (ICC), a primary liver cancer arising from the biliary system, for which there is well-recognized tumor heterogeneity and no grading paradigm or prognostic biomarkers. In this paper, we propose a new unsupervised deep convolutional autoencoder-based clustering model that groups together cellular and structural morphologies of tumor in 246 digitized whole slides, based on visual similarity. Clusters based on this visual dictionary of histologic patterns are interpreted as new ICC subtypes and evaluated by training Cox proportional hazards survival models, resulting in statistically significant patient stratification.
1 Introduction
Cancer subtyping is an important tool used to determine disease prognosis and direct therapy. Commonly occurring cancers, such as those of the breast and prostate, have well-established subtypes validated on large sample sizes [4]. The manual labor required to subtype a cancer, by identifying different histologic patterns and using them to stratify patients into different risk groups, is an extremely complex task requiring years of effort and repeated review of large amounts of visual data, often by a single pathologist.
Subtyping a rare cancer poses a unique set of challenges. Intrahepatic cholangiocarcinoma (ICC), a primary liver cancer emanating from the bile duct, has an incidence of approximately 1 in 160,000 in the United States, and that incidence is rising [14]. Currently, there exists no universally accepted histopathology-based subtyping or grading system for ICC, and studies classifying ICC into different risk groups have been inconsistent [1, 12, 15]. A major limiting factor in subtyping ICC is that only small cohorts are available to each institution. There is an urgent need for efficient identification of prognostically relevant cellular and structural morphologies from limited histology datasets of rare cancers, such as ICC, to build the risk stratification systems that are currently lacking across many cancer types.
Computational pathology offers a new set of tools and, more importantly, a new way of approaching the historical challenges of subtyping cancers: computer vision-based deep learning, which leverages the digitization of pathology slides and takes advantage of the latest advances in computational processing power. In this paper, we offer a new deep learning-based model which uses a unique neural network-based clustering approach to group histologic patterns by visual similarity. With this visual dictionary, we interpret clusters as subtypes and train a survival model, showing significant results for the first time in ICC.
2 Materials and Methods
Cancer histopathology images are extremely large (as large as tens of billions of pixels) and exhibit high intra- and inter-tumoral heterogeneity. Different spatial or temporal samplings of a tumor can contain sub-populations of cells with unique genomes, theoretically resulting in visually different patterns of histology [3]. In order to effectively cluster this extremely large amount of high intra-variance data into subsets based on similar morphologies, we propose combining a neural network-based clustering cost function, previously shown to outperform traditional clustering techniques on images of hand-written digits [16], with a novel deep convolutional architecture. We hypothesize that a k-means style clustering cost function, constrained by an image reconstruction objective driven by adaptive learning of filters, will produce clusters of histopathology relevant to patient outcome. Finally, we assess the performance and usefulness of this clustering model by conducting survival analysis, using both Cox proportional hazards modeling and Kaplan-Meier survival estimation, to measure whether each cluster of histomorphologies correlates significantly with recurrence of cancer after resection. While other studies have performed unsupervised clustering of whole-slide tiles based on image features, they have been used to address the problem of image segmentation [11] and have relied on clustering a previously developed latent space [7, 8]. Our study adjusts the latent space with each iteration of clustering.
2.1 Deep Clustering Convolutional Autoencoder
A convolutional autoencoder is made of two parts, an encoder and a decoder. The encoder layers project an image into a lower-dimensional representation, an embedding, through a series of convolution, pooling, and activation functions. This is described in Eq. 1a, where \(x_i\) is an input image (or input batch of images) transformed by \(f_\theta(\cdot)\), and \(z_i\) is the resulting representation embedding. The decoder layers try to reconstruct the original input image from its embedding using similar functions. Mean-squared-error (MSE) loss is commonly used to optimize such a model, updating the model weights \(\theta \) relative to the error between the original (input, \(x_i\)) image and the reconstructed (output, \(x_i^{'}\)) image over a set of N images, as shown in Eq. 1b:
\[ z_i = f_\theta(x_i), \qquad \text{(1a)} \]
\[ L_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N} \Vert x_i - x_i^{'} \Vert^2. \qquad \text{(1b)} \]
Although a convolutional autoencoder can learn effective lower-dimensional representations of a set of images, it does not cluster together samples with similar morphology. To overcome this problem, we amend the traditional MSE loss function with the reconstruction-clustering error function first proposed by Song et al. [16]:
\[ L = \sum_{i=1}^{N} \Vert x_i - x_i^{'} \Vert^2 + \lambda \sum_{i=1}^{N} \Vert z_i - c^*_i \Vert^2, \qquad \text{(2)} \]
where \(z_i\) is the embedding as defined in Eq. 1a, \(c^*_i\) is the centroid assigned to sample \(x_i\) in the previous training epoch, and \(\lambda \) is a weighting parameter. Cluster assignment is determined by finding the shortest Euclidean distance between a sample embedding from epoch t and a centroid from epoch \(t-1\), across all J centroids:
\[ c^*_i = \arg\min_{j} \Vert z_i^{t} - c_j^{t-1} \Vert^2. \qquad \text{(3)} \]
The algorithm is initialized by assigning a random cluster to each sample. Centroid locations are calculated for each cluster class by Eq. 4, where \(S_j\) is the set of embeddings currently assigned to cluster j:
\[ c_j = \frac{1}{|S_j|} \sum_{z_i \in S_j} z_i. \qquad \text{(4)} \]
Each mini-batch is forwarded through the model and the network weights are updated accordingly. At the end of an epoch, defined by the forward-passing of all mini-batches, cluster assignments are updated by Eq. 3, given the new embedding space. Finally, the centroid locations are updated from the new cluster assignments. This process is repeated until convergence. Figure 1 shows a visualization of this training procedure.
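For concreteness, a minimal PyTorch-style sketch of this reconstruction-clustering objective and the per-epoch centroid update is shown below. The function and tensor names are illustrative, it is written against a current PyTorch API rather than the exact version used in our experiments, and it is a sketch of the technique rather than our implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_clustering_loss(x, x_recon, z, centroids, assignments, lambda_=0.2):
    """Eq. 2: reconstruction error plus a k-means style penalty pulling each
    embedding toward its assigned centroid from the previous epoch."""
    recon_term = F.mse_loss(x_recon, x)
    cluster_term = F.mse_loss(z, centroids[assignments])
    return recon_term + lambda_ * cluster_term

def reassign_and_update_centroids(z_all, centroids):
    """Eq. 3 and Eq. 4: assign each embedding to its nearest centroid, then
    recompute every centroid as the mean of its assigned embeddings."""
    dists = torch.cdist(z_all, centroids)          # (N, J) Euclidean distances
    assignments = dists.argmin(dim=1)              # nearest centroid per sample
    new_centroids = centroids.clone()
    for j in range(centroids.shape[0]):
        members = z_all[assignments == j]
        if len(members) > 0:                       # keep old centroid if a cluster is empty
            new_centroids[j] = members.mean(dim=0)
    return assignments, new_centroids
```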
2.2 Dataset
Whole slide images were obtained from Memorial Sloan Kettering Cancer Center (MSK) and Erasmus Medical Center with approval from each respective Institutional Review Board. In total, 246 patients with resected ICC and no neoadjuvant chemotherapy were included in the analysis. All slides were digitized at MSK using Aperio AT2 scanners (Leica Biosystems; Wetzlar, Germany). Up-to-date retrospective data on recurrence-free survival after resection were also obtained. Though small compared to cohorts for commonly occurring cancers, this collection is the largest known ICC dataset in the world.
A library of extracted image tiles was generated from all digitized slides. First, each slide was reduced to a thumbnail, in which one pixel represents a \(224\times 224\)px tile in the slide at 20x magnification. Next, Otsu thresholding was applied to the thumbnail to generate a binary mask of tissue (positive) vs. background (negative). Connected components of tissue smaller than 10 thumbnail pixels were treated as background, to exclude dirt and other insignificant material in the digitized slide. Finally, mathematical morphology was used to erode the tissue mask by one thumbnail pixel to minimize tiles with partial background. To separate the problem of cancer subtyping, as discussed in this paper, from the problem of tumor segmentation, areas of tumor were manually annotated using a web-based whole slide viewer. Using a touchscreen (Surface Pro 3, Surface Studio; Microsoft Inc., Redmond, WA, USA), a pathologist painted over regions of tumor to identify where tiles should be extracted. Tiles were added to the training set only if they lay completely within these annotated tumor regions.
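A minimal sketch of this thumbnail-based tissue masking is given below, assuming OpenSlide for reading the slide, scikit-image for thresholding and morphology, and a slide whose base resolution is 20x; the file path and variable names are illustrative.

```python
import numpy as np
import openslide
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.morphology import remove_small_objects, binary_erosion

TILE = 224  # tile edge length in pixels at 20x

slide = openslide.OpenSlide("slide.svs")                      # hypothetical path
w, h = slide.dimensions
# Thumbnail in which one pixel corresponds to one 224x224 tile.
thumb = np.asarray(slide.get_thumbnail((w // TILE, h // TILE)))

gray = rgb2gray(thumb)
tissue = gray < threshold_otsu(gray)                          # tissue is darker than background
tissue = remove_small_objects(tissue, min_size=10)            # drop components < 10 thumbnail px
tissue = binary_erosion(tissue)                               # shrink the mask by one thumbnail pixel

# Each remaining True pixel (r, c) maps to the tile at (c * TILE, r * TILE) in slide coordinates.
tile_coords = [(c * TILE, r * TILE) for r, c in zip(*np.nonzero(tissue))]
```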
Quality Control. Scanning artifacts, such as out-of-focus areas of an image, can impact model performance on smaller datasets. A deep convolutional neural network was trained to detect blurred tiles and further reduce noise in the dataset. Training a detector on real blur data was beyond the scope of this study, because obtaining annotations for blurred regions in the slide is infeasible and would also create a strong class imbalance between blurred and sharp tiles. To prepare data for training the blur detector, we used an approach similar to a method described in [6]: half of the tiles were artificially blurred by applying a Gaussian blur with a random radius between 1 and 10; the other half were labeled “sharp” and left unchanged. A ResNet18 was trained to output an image quality score by regressing on the applied blur radius using MSE, with a target value of 0 for images in the sharp class. Finally, a threshold on the detector output was manually selected to exclude blurred tiles.
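A sketch of this synthetic-blur training setup is shown below. The ResNet18 regression head and the Gaussian radius range follow the description above, while the dataset wrapper and any names are illustrative assumptions rather than the exact code used.

```python
import random
import torch
import torch.nn as nn
from PIL import ImageFilter
from torchvision import models
from torchvision.transforms import functional as TF

class BlurRegressionDataset(torch.utils.data.Dataset):
    """Wraps a list of PIL tiles; half are synthetically blurred, half kept sharp."""
    def __init__(self, tiles):
        self.tiles = tiles

    def __len__(self):
        return len(self.tiles)

    def __getitem__(self, idx):
        tile = self.tiles[idx]
        if random.random() < 0.5:
            radius = random.uniform(1, 10)                     # random blur radius in [1, 10]
            tile = tile.filter(ImageFilter.GaussianBlur(radius))
            target = radius
        else:
            target = 0.0                                       # sharp tiles regress to 0
        return TF.to_tensor(tile), torch.tensor(target, dtype=torch.float32)

# ResNet18 with a single-output regression head, trained with MSE on the blur radius.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 1)
criterion = nn.MSELoss()
```

At inference time, tiles whose predicted radius exceeds the manually chosen threshold are discarded.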
2.3 Architecture and Training
We propose a novel convolutional autoencoder architecture optimized for image reconstruction. The encoder is a ResNet18 [9] pretrained on ImageNet [13]; the parameters of all encoder layers are updated when training the full model on pathology data. The decoder comprises five convolutional layers, each with a padding and stride of 1 so that the tensor size is unchanged by each convolution operation. Upsampling is applied before each convolution step to increase the size of the feature map. Empirically, batch normalization layers did not improve reconstruction performance and were therefore excluded.
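The decoder described above could look roughly like the following sketch; the channel widths, kernel size, and upsampling mode are illustrative choices, not the exact configuration used.

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Five stride-1 convolutions, each preceded by 2x upsampling, mapping the
    ResNet18 feature map (512 x 7 x 7 for a 224 x 224 input) back to 3 x 224 x 224."""
    def __init__(self):
        super().__init__()
        def block(in_ch, out_ch):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True),
            )
        self.blocks = nn.Sequential(
            block(512, 256),                                  # 7 -> 14
            block(256, 128),                                  # 14 -> 28
            block(128, 64),                                   # 28 -> 56
            block(64, 32),                                    # 56 -> 112
            nn.Upsample(scale_factor=2, mode="nearest"),      # 112 -> 224
            nn.Conv2d(32, 3, kernel_size=3, stride=1, padding=1),
        )

    def forward(self, z):
        return self.blocks(z)
```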
Training the model is the fourth phase of our complete pipeline. At each iteration, the model is updated in two steps. After each forward pass of a mini-batch, the network weights are updated. At the end of each epoch, centroid locations are updated by reassigning all samples in the newly updated embedding space to the nearest centroid from the previous epoch, as described in Eq. 3. Finally, each centroid location is recalculated using Eq. 4. All centroids are randomly initialized before training.
Two properties of the model need to be optimized: first, the weights of the network, \(\theta \), and second, the locations of the cluster centers (centroids) in the embedding space, \(C_j\). In order to minimize Eq. 2 and update \(\theta \), the previous training epoch's set of centroids, \(C_j^{t-1}\), is used; in the first training epoch, centroid locations are randomly assigned upon initialization. A training epoch is defined by the forward-passing of all mini-batches once through the network. After \(\theta \) has been updated, all samples are reassigned to the nearest centroid using Eq. 3. Finally, all centroid locations are updated using Eq. 4 and used in the calculations of the next training epoch. Figure 1 illustrates this process and architecture.
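Putting the pieces together, the alternating update can be sketched as follows. Here encoder, decoder, loader (yielding image batches with their dataset indices), num_clusters, embed_dim, and num_samples are placeholders, and the loss and centroid updates refer to the functions sketched in Sect. 2.1; this is an illustrative sketch, not the exact training script.

```python
import torch

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()),
    lr=1e-2, weight_decay=1e-4)
centroids = torch.randn(num_clusters, embed_dim)                  # random centroid init
assignments = torch.randint(num_clusters, (num_samples,))         # random cluster per sample

for epoch in range(150):
    embeddings, indices = [], []
    for x, idx in loader:                                         # one epoch = all mini-batches
        z = encoder(x)
        x_recon = decoder(z)
        loss = reconstruction_clustering_loss(
            x, x_recon, z.flatten(1), centroids, assignments[idx], lambda_=0.2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                          # step 1: update network weights
        embeddings.append(z.flatten(1).detach())
        indices.append(idx)
    order = torch.cat(indices).argsort()                          # restore dataset order
    z_all = torch.cat(embeddings)[order]
    assignments, centroids = reassign_and_update_centroids(z_all, centroids)  # step 2
```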
All training was done on DGX-1 compute nodes (NVIDIA, Santa Clara, CA) using PyTorch 0.4 on Linux CentOS 7. The models were trained with Adam optimization for 150 epochs, a learning rate of \(10^{-2}\), and weight decay of \(10^{-4}\). To save computation time, 100,000 tiles were randomly sampled from the complete tile library to train each model, corresponding to approximately 400 tiles per slide on average. The following section describes the selection process for the hyper-parameters \(\lambda \) and J, the clustering weight and the number of clusters, respectively.
Experiments. The Calinski-Harabasz Index (CHI) [5], also known as the variance ratio criterion, was used to measure clustering performance; it is defined as the ratio of the mean between-cluster dispersion to the within-cluster dispersion. A higher CHI indicates stronger cluster separation and lower variance within each cluster.
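In practice the CHI can be computed directly with scikit-learn on the tile embeddings and their cluster assignments; a one-call sketch, with illustrative array names:

```python
from sklearn.metrics import calinski_harabasz_score  # named calinski_harabaz_score in older releases

# embeddings: (N, D) array of tile embeddings; labels: (N,) cluster assignments
chi = calinski_harabasz_score(embeddings, labels)
```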
A series of experiments was conducted to optimize \(\lambda \) and J for model selection. First, with \(\lambda \) set to 0.2, five models were trained with the number of clusters J ranging from 5 to 25. Second, five models were trained with \(\lambda \) varying from 0.2 to 1, with J set to the value that yielded the highest CHI in the previous experiment. A final model was trained with the optimized J and \(\lambda \) to cluster all tiles in the dataset. Each slide was then assigned a class based on the cluster occupying the largest area in the slide, an approach analogous to how a pathologist would classify a cancer into a subtype based on the most commonly occurring histologic pattern.
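Because all tiles are the same size, the slide-level class assignment amounts to a majority vote over tile clusters; a sketch, assuming a hypothetical mapping tiles_by_slide from slide ID to that slide's tile cluster labels:

```python
from collections import Counter

def slide_class(tile_clusters):
    """Assign a slide the cluster occupying the largest area, i.e. the most
    frequent cluster label among its (equal-sized) tumor tiles."""
    return Counter(tile_clusters).most_common(1)[0][0]

slide_labels = {sid: slide_class(clusters) for sid, clusters in tiles_by_slide.items()}
```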
Survival Modeling. To measure the usefulness and effectiveness of the clustered morphological patterns, we conducted slide-level survival analysis based on the assigned classes and the associated outcome data. Survival data often include right-censored time durations, meaning that the time of the event of interest, in our case recurrence detected by radiology, is unknown for some patients. However, the duration without recurrence, measured up to the patient's last follow-up date, is still useful information which can be harnessed for modeling. Cox proportional hazards modeling is commonly used to deal with right-censored data:
\[ H(t) = h_0(t)\exp\Big(\sum_{i} b_i x_i\Big), \]
where H(t) is the hazard function dependent on time t, \(h_0\) is a baseline hazard, and each covariate \(x_i\) is weighted by a coefficient \(b_i\). The hazard ratio, or relative likelihood of the event, is given by \(e^{b_i}\). A hazard ratio greater than one indicates that a cluster class contributes to a worse prognosis; conversely, a hazard ratio less than one indicates a favorable prognostic factor. To assess significance in the survival model, p-values based on the Wald statistic, the likelihood ratio, and the log-rank test are presented for each model.
Five univariate Cox models were trained, each with one cluster class held out as the reference in order to measure its impact on survival relative to the other classes. In addition, we show Kaplan-Meier curves to illustrate survival outcomes within each class by estimating the survival function S(t):
\[ S(t) = \prod_{t_i \le t} \Big(1 - \frac{d_i}{n_i}\Big), \]
where \(d_i\) is the number of recurrence events at time \(t_i\) and \(n_i\) is the number of subjects at risk of death or recurrence just prior to time \(t_i\).
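Both models can be fit with the lifelines library; a minimal sketch, assuming a hypothetical DataFrame df with a recurrence-free survival time, an event indicator, and the slide-level cluster class, with cluster 0 chosen as the reference:

```python
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

# df columns: 'time' (months to recurrence or last follow-up), 'event' (1 = recurrence observed),
# 'class' (slide-level cluster label). Illustrative layout, not the actual data file.
covariates = pd.get_dummies(df["class"], prefix="cluster").drop(columns=["cluster_0"])  # reference class
cph = CoxPHFitter()
cph.fit(pd.concat([df[["time", "event"]], covariates], axis=1),
        duration_col="time", event_col="event")
cph.print_summary()          # hazard ratios (exp(coef)) and per-covariate p-values

kmf = KaplanMeierFitter()
for label, grp in df.groupby("class"):
    kmf.fit(grp["time"], event_observed=grp["event"], label=f"cluster {label}")
    kmf.plot_survival_function()
```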
3 Results
Results of model selection by varying \(\lambda \) and J are shown in Table 1. The best performance, as measured by CHI, was achieved with \(\lambda \) set to 0.2 and J set to 5.
Cox proportional hazards modeling showed strongly significant differences in recurrence-free survival between patients when their tissue was classified by the clusters produced by the unsupervised model. Table 2 details the hazard ratios of each cluster relative to the others across five different models, each with one cluster held out as the reference. Figure 2 visualizes the survival outcomes using Kaplan-Meier analysis.
4 Conclusion
Our model offers a novel approach for identifying histologic patterns of potential prognostic significance, circumventing the tasks of tedious tissue labeling and laborious human evaluation of multiple whole slides. As a point of comparison, a recent study showed that an effective prognostic score for colorectal cancer could be achieved by first segmenting a slide into eight predefined categorical regions using supervised learning [10]. Such approaches limit the model to predefined histologic components (tumor, fat, debris, etc.), and the protocol may not extend to extra-colonic anatomic sites that lack similar tumor-specific stromal interactions [2]. In contrast, our model has no predefined tissue classes and can analyze any number of clusters, removing potential bias introduced during training and increasing flexibility in how the model is applied. We hope that novel subtyping approaches such as this will lead to better grading of cholangiocarcinoma and improve treatment and outcomes for patients.
5 Disclosures
T.J.F. is the Chief Scientific Officer, a co-founder, and an equity holder of Paige.AI. H.M. and T.J.F. have intellectual property interests relevant to the work that is the subject of this paper. MSK has financial interests in Paige.AI and intellectual property interests relevant to the work that is the subject of this paper.
References
Aishima, S., et al.: Proposal of progression model for intrahepatic cholangiocarcinoma: clinicopathologic differences between hilar type and peripheral type. Am. J. Surg. Pathol. 31(7), 1059–1067 (2007)
Balkwill, F.R., Capasso, M., Hagemann, T.: The tumor microenvironment at a glance (2012)
Bedard, P.L., Hansen, A.R., Ratain, M.J., Siu, L.L.: Tumour heterogeneity in the clinic. Nature 501(7467), 355 (2013)
Bloom, H., Richardson, W.: Histological grading and prognosis in breast cancer: a study of 1409 cases of which 359 have been followed for 15 years. Br. J. Cancer 11(3), 359 (1957)
Caliński, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974)
Campanella, G., et al.: Towards machine learned quality control: a benchmark for sharpness quantification in digital pathology. Comput. Med. Imaging Graph. 65, 142–151 (2018)
Dercksen, K., Bulten, W., Litjens, G.: Dealing with label scarcity in computational pathology: a use case in prostate cancer classification. arXiv preprint arXiv:1905.06820 (2019)
Fouad, S., Randell, D., Galton, A., Mehanna, H., Landini, G.: Unsupervised morphological segmentation of tissue compartments in histopathological images. PLoS ONE 12(11), e0188717 (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Kather, J.N., et al.: Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16(1), e1002730 (2019)
Moriya, T., et al.: Unsupervised pathology image segmentation using representation learning with spherical k-means. arXiv preprint arXiv:1804.03828 (2018)
Nakajima, T., Kondo, Y., Miyazaki, M., Okui, K.: A histopathologic study of 102 cases of intrahepatic cholangiocarcinoma: histologic classification and modes of spreading. Hum. Pathol. 19(10), 1228–1234 (1988)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. (IJCV) 115(3), 211–252 (2015)
Saha, S.K., Zhu, A.X., Fuchs, C.S., Brooks, G.A.: Forty-year trends in cholangiocarcinoma incidence in the U.S.: intrahepatic disease on the rise. Oncologist 21(5), 594–599 (2016)
Sempoux, C., et al.: Intrahepatic cholangiocarcinoma: new insights in pathology. In: Seminars in Liver Disease, vol. 31, pp. 049–060. Thieme Medical Publishers (2011)
Song, C., Liu, F., Huang, Y., Wang, L., Tan, T.: Auto-encoder based data clustering. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds.) CIARP 2013. LNCS, vol. 8258, pp. 117–124. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41822-8_15