Abstract
Esophageal fistula (EF) is a critical and life-threatening complication following radiotherapy treatment for esophageal cancer (EC). Although tabular clinical data contains clinically valuable information, it is inherently different from CT images, and the heterogeneity between the two modalities may impede the effective fusion of multi-modal data and thus degrade the performance of deep learning methods. Current methodologies, however, do not explicitly address this limitation. To tackle this gap, we present an adaptive multi-information dual-layer cross-attention (MDC) model that uses both CT images and tabular clinical data for early-stage EF detection before radiotherapy. Our MDC model comprises a clinical data encoder, an adaptive 3D Trans-CNN image encoder, and a dual-layer cross-attention (DualCrossAtt) module. The image encoder utilizes both a CNN and a transformer to extract multi-level local and global features, followed by global depth-wise convolution to remove redundancy from these features for robust adaptive fusion. To mitigate the heterogeneity among multi-modal features and enhance fusion effectiveness, our DualCrossAtt applies a first layer of cross-attention to align the features of the clinical data and the images, and feeds the resulting commonly attended features into a second cross-attention layer that models the global relationships among the multi-modal features for prediction. Furthermore, we introduce a contrastive learning-enhanced hybrid loss function to further boost performance. Comparative evaluations against eight state-of-the-art multi-modality predictive models demonstrate the superiority of our method in EF prediction, with potential to assist personalized stratification and precision EC treatment planning.
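The dual-layer cross-attention idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the token counts, feature dimension, and the absence of learned query/key/value projections are simplifying assumptions made purely to show the two-stage flow (first align clinical features with image features, then let the aligned features attend over the full multi-modal set).

```python
import numpy as np

def cross_attention(queries, keys_values, d):
    """Scaled dot-product cross-attention (no learned projections, for illustration)."""
    scores = queries @ keys_values.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values

d = 32  # hypothetical shared feature dimension
rng = np.random.default_rng(0)
clin_feats = rng.normal(size=(4, d))    # e.g. 4 encoded clinical-variable tokens
img_feats = rng.normal(size=(16, d))    # e.g. 16 encoded CT image tokens

# Layer 1: clinical queries attend over image keys/values, producing
# commonly attended features that are aligned across modalities.
aligned = cross_attention(clin_feats, img_feats, d)

# Layer 2: the aligned features attend over the full multi-modal feature
# set, modeling global relationships among both modalities for prediction.
fused = cross_attention(aligned, np.concatenate([clin_feats, img_feats], axis=0), d)
print(fused.shape)  # (4, 32)
```

In the actual model each attention layer would carry learned projection matrices and multiple heads; the sketch only conveys how the first layer's output becomes the second layer's query.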
J. Zhang and H. Xiong—Equal first-author contribution.
Ethics declarations
Disclosure of Interests
The authors have no competing interests in the paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, J. et al. (2024). A Multi-information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15005. Springer, Cham. https://doi.org/10.1007/978-3-031-72086-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72085-7
Online ISBN: 978-3-031-72086-4