Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray

Deng, Qiao; Huang, Zhongzhen; Wang, Yunqi; Wang, Zhichuan; Wang, Zhao; Zhang, Xiaofan; Dou, Qi; Hui, Yeung Yu; Hui, Edward S.

arXiv:2404.14750 (cs)

[Submitted on 23 Apr 2024 (v1), last revised 17 Feb 2025 (this version, v2)]

Abstract: Medical foundation models have the potential to revolutionize healthcare by providing robust and generalized representations of medical data. Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text. However, current algorithms that exploit global and local alignment between medical images and text could be marred by redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by a transformer-based grounded knowledge-enhanced module, which performs fine-grained alignment between the textual features of medical knowledge and the corresponding anatomical region-level visual features. GK-MVLP performed competitively with, or better than, the state of the art on downstream image understanding tasks (chest X-ray disease classification and disease localization), a generative task (report generation), and a vision-language understanding task (medical visual question answering). Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
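The abstract describes the grounding module only at a high level. As a rough illustration, and assuming that "grounding knowledge to anatomical regions" can be approximated by cross-attention in which knowledge-token text features query region-level visual features, the sketch below pairs single-head scaled dot-product cross-attention with a symmetric InfoNCE-style contrastive loss for alignment. The function names (`cross_attend`, `info_nce`), feature shapes, temperature `tau = 0.07`, and random inputs are all hypothetical; this is not the GK-MVLP implementation.

```python
# Illustrative sketch only; not the GK-MVLP implementation.
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def cross_attend(knowledge, regions, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention.

    knowledge: (T, d) textual features of knowledge tokens (queries).
    regions:   (R, d) anatomical region-level visual features (keys/values).
    Returns grounded knowledge features (T, d_v) and attention weights (T, R).
    """
    q, k, v = knowledge @ w_q, regions @ w_k, regions @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, R) scaled dot products
    attn = softmax(scores, axis=-1)          # each token's weights over regions sum to 1
    return attn @ v, attn


def info_nce(a, b, tau=0.07):
    """Hypothetical symmetric InfoNCE: row i of `a` is the positive pair of row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                   # (N, N) cosine similarity / temperature
    i = np.arange(len(a))
    loss_ab = -log_softmax(logits, axis=1)[i, i].mean()  # pick a_i's match among all b
    loss_ba = -log_softmax(logits, axis=0)[i, i].mean()  # pick b_i's match among all a
    return 0.5 * (loss_ab + loss_ba)


rng = np.random.default_rng(0)
d = 16
regions = rng.normal(size=(5, d))    # hypothetical: 5 anatomical region features
knowledge = rng.normal(size=(3, d))  # hypothetical: 3 knowledge-token features
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

grounded, attn = cross_attend(knowledge, regions, w_q, w_k, w_v)
print(grounded.shape, attn.shape)  # (3, 16) (3, 5)

# Aligned pairs give a lower loss than mismatched (rolled) pairs.
print(info_nce(regions, regions) < info_nce(regions, np.roll(regions, 1, axis=0)))  # True
```

Cross-attention is one common way to let text tokens select the image regions they describe, and the attention weights `attn` give a per-token distribution over regions that can be inspected. The actual module, its losses, and its hyperparameters in the paper may differ.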

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.14750 [cs.CV]
	(or arXiv:2404.14750v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.14750

Submission history

From: Qiao Deng
[v1] Tue, 23 Apr 2024 05:16:24 UTC (1,497 KB)
[v2] Mon, 17 Feb 2025 02:49:16 UTC (28,398 KB)
