A Multi-information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis

  • Conference paper
In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Abstract

Esophageal fistula (EF) is a critical, life-threatening complication of radiotherapy for esophageal cancer (EC). Although tabular clinical data contain clinically valuable information complementary to CT images, the two modalities are inherently different, and this heterogeneity may impede effective multi-modal fusion and thus degrade the performance of deep learning methods. Current methodologies, however, do not explicitly address this limitation. To close this gap, we present an adaptive multi-information dual-layer cross-attention (MDC) model that uses both CT images and tabular clinical data for early-stage EF detection before radiotherapy. Our MDC model comprises a clinical data encoder, an adaptive 3D Trans-CNN image encoder, and a dual-layer cross-attention (DualCrossAtt) module. The image encoder uses both a CNN and a transformer to extract multi-level local and global features, followed by global depth-wise convolution that removes redundancy from these features for robust adaptive fusion. To mitigate the heterogeneity among multi-modal features and improve fusion effectiveness, DualCrossAtt applies a first cross-attention layer to align clinical-data features with image features, and feeds the resulting commonly attended features to a second cross-attention layer that models the global relationships among multi-modal features for prediction. Furthermore, we introduce a contrastive learning-enhanced hybrid loss function to further boost performance. Comparative evaluations against eight state-of-the-art multi-modality predictive models demonstrate the superiority of our method in EF prediction, with potential to assist personalized stratification and precision EC treatment planning.

J. Zhang and H. Xiong—Equal first-author contribution.
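The dual-layer cross-attention idea described in the abstract can be sketched in a few lines: a first cross-attention layer lets clinical-feature queries attend over image features to produce aligned, commonly attended features, and a second layer lets those aligned features attend over the image features again. The sketch below is a minimal NumPy illustration of this two-stage pattern only; the token counts, dimensions, and single-head unprojected attention are illustrative assumptions, not the paper's actual architecture (which uses learned projections, a 3D Trans-CNN encoder, and further fusion steps).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    # Scaled dot-product attention: queries come from one modality,
    # keys/values from the other (projections omitted for brevity).
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(16, 64))   # 16 image tokens, dim 64 (illustrative)
clin_feats = rng.normal(size=(8, 64))   # 8 clinical-feature tokens (illustrative)

# Layer 1: align clinical queries with image keys/values,
# yielding "commonly attended" features.
aligned = cross_attention(clin_feats, img_feats, img_feats)

# Layer 2: the aligned features attend over the image features again,
# modeling global cross-modal relationships before prediction.
fused = cross_attention(aligned, img_feats, img_feats)
print(fused.shape)  # (8, 64)
```

In the actual model each attention layer would carry learned query/key/value projections and multiple heads; the point of the sketch is only the two-stage query flow: clinical → aligned → fused.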



Author information

Corresponding author

Correspondence to Hui Cui.


Ethics declarations

Disclosure of Interests

The authors have no competing interests.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, J. et al. (2024). A Multi-information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15005. Springer, Cham. https://doi.org/10.1007/978-3-031-72086-4_3


  • DOI: https://doi.org/10.1007/978-3-031-72086-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72085-7

  • Online ISBN: 978-3-031-72086-4

  • eBook Packages: Computer Science (R0)
