Abstract
Esophageal fistula (EF) is a critical and life-threatening complication following radiotherapy treatment for esophageal cancer (EC). Although tabular clinical data contains clinically valuable information, it is inherently different from CT images, and the heterogeneity between the two modalities may impede the effective fusion of multi-modal data and thus degrade the performance of deep learning methods. Current methodologies, however, do not explicitly address this limitation. To tackle this gap, we present an adaptive multi-information dual-layer cross-attention (MDC) model that uses both CT images and tabular clinical data for early-stage EF detection before radiotherapy. Our MDC model comprises a clinical data encoder, an adaptive 3D Trans-CNN image encoder, and a dual-layer cross-attention (DualCrossAtt) module. The image encoder utilizes both a CNN and a transformer to extract multi-level local and global features, followed by global depth-wise convolution to remove redundancy from these features for robust adaptive fusion. To mitigate the heterogeneity among multi-modal features and enhance fusion effectiveness, our DualCrossAtt applies a first layer of cross-attention to align the features of the clinical data and the images, and feeds the resulting commonly attended features into a second cross-attention layer that models the global relationships among the multi-modal features for prediction. Furthermore, we introduce a contrastive learning-enhanced hybrid loss function to further boost performance. Comparative evaluations against eight state-of-the-art multi-modality predictive models demonstrate the superiority of our method in EF prediction, with potential to assist personalized stratification and precision EC treatment planning.
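The dual-layer cross-attention idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the token counts, feature dimension, and the absence of learned query/key/value projections are simplifying assumptions made purely to show the two-stage flow (first align clinical features with image features, then let the aligned features attend over the full multi-modal set).

```python
import numpy as np

def cross_attention(queries, keys_values, d):
    """Scaled dot-product cross-attention (no learned projections, for illustration)."""
    scores = queries @ keys_values.T / np.sqrt(d)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ keys_values

d = 32  # hypothetical shared feature dimension
rng = np.random.default_rng(0)
clin_feats = rng.normal(size=(4, d))    # e.g. 4 encoded clinical-variable tokens
img_feats = rng.normal(size=(16, d))    # e.g. 16 encoded CT image tokens

# Layer 1: clinical queries attend over image keys/values, producing
# commonly attended features that are aligned across modalities.
aligned = cross_attention(clin_feats, img_feats, d)

# Layer 2: the aligned features attend over the full multi-modal feature
# set, modeling global relationships among both modalities for prediction.
fused = cross_attention(aligned, np.concatenate([clin_feats, img_feats], axis=0), d)
print(fused.shape)  # (4, 32)
```

In the actual model each attention layer would carry learned projection matrices and multiple heads; the sketch only conveys how the first layer's output becomes the second layer's query.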
J. Zhang and H. Xiong—Equal first-author contribution.
Ethics declarations
Disclosure of Interests
The authors have no competing interests in the paper.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, J. et al. (2024). A Multi-information Dual-Layer Cross-Attention Model for Esophageal Fistula Prognosis. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15005. Springer, Cham. https://doi.org/10.1007/978-3-031-72086-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72085-7
Online ISBN: 978-3-031-72086-4