
Expert Insight-Enhanced Follow-Up Chest X-ray Summary Generation

  • Conference paper

Artificial Intelligence in Medicine (AIME 2024)

Abstract

A chest X-ray radiology report describes not only abnormal findings from the X-ray obtained at the current examination, but also findings on disease progression or changes in device placement with reference to the X-ray from a previous examination. The majority of efforts on automatic generation of radiology reports pertain to the former, but not the latter, type of findings. To the best of the authors’ knowledge, there is only one work dedicated to generating a summary of the latter findings, i.e., a follow-up summary. In this study, we therefore propose a transformer-based framework to tackle this task. Motivated by our observations on the significance of the medical lexicon to the fidelity of summary generation, we introduce two mechanisms to bestow expert insight on our model, namely expert soft guidance and a masked entity modeling loss. The former employs a pretrained expert disease classifier to guide the presence level of specific abnormalities, while the latter directs the model's attention toward the medical lexicon. Extensive experiments demonstrate that the performance of our model is competitive with or exceeds the state-of-the-art.


References

  1. Tanida, T., Müller, P., Kaissis, G., Rueckert, D.: Interactive and explainable region-guided radiology report generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7433–7442. IEEE, Vancouver (2023)

  2. Miura, Y., Zhang, Y., Tsai, E.B., Langlotz, C.P., Jurafsky, D.: Improving factual completeness and consistency of image-to-text radiology report generation. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 5288–5304. ACL (2021)

  3. Hu, X., Gu, L., An, Q., Zhang, M., Liu, L., et al.: Expert knowledge-aware image difference graph representation learning for difference-aware medical visual question answering. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 4156–4165. ACM (2023)

  4. Qiu, Y., Yamamoto, S., Nakashima, K., Suzuki, R., Iwata, K., Kataoka, H., Satoh, Y.: Describing and localizing multiple changes with transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1971–1980. IEEE (2021)

  5. Yao, L., Wang, W., Jin, Q.: Image difference captioning with pre-training and contrastive learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 3108–3116. AAAI (2022)

  6. Johnson, A.E., et al.: MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6(1), 317 (2019)

  7. Jhamtani, H., Berg-Kirkpatrick, T.: Learning to describe differences between pairs of similar images. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4024–4034. ACL, Brussels (2018)

  8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE (2016)

  9. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  10. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318. ACL, Philadelphia (2002)

  11. Denkowski, M., Lavie, A.: Meteor Universal: language specific translation evaluation for any target language. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 376–380. ACL, Baltimore (2014)

  12. Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pp. 74–81. ACL, Barcelona (2004)

  13. Vedantam, R., Lawrence Zitnick, C., Parikh, D.: CIDEr: consensus-based image description evaluation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4566–4575. IEEE, Boston (2015)

  14. Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 590–597. AAAI, Honolulu (2019)

  15. Ye, W., Yao, J., Xue, H., Li, Y.: Weakly supervised lesion localization with probabilistic-CAM pooling. arXiv preprint arXiv:2005.14480 (2020)

  16. Pellegrini, C., Özsoy, E., Busam, B., Navab, N., Keicher, M.: RaDialog: a large vision-language model for radiology report generation and conversational assistance. arXiv preprint arXiv:2311.18681 (2023)

  17. Smit, A., Jain, S., Rajpurkar, P., Pareek, A., Ng, A.Y., Lungren, M.P.: CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. arXiv preprint arXiv:2004.09167 (2020)


Author information

Corresponding author

Correspondence to Edward S. Hui.

A Appendix

A.1 Lightweight EIE-all

In this section, we present a streamlined version of EIE-all, denoted EIE-light, which relies on expert guidance exclusively during the training phase, excluding the expert soft guidance branch during inference while maintaining commendable performance. To achieve this, we propose a random dropout training strategy. Specifically, we assigned a dropout probability \(\upbeta\) to the expert soft guidance: at each training iteration, we drew a sample \(r\) from a uniform distribution between 0 and 1; if \(r < \upbeta\), the guidance from the expert classifier was replaced with zero vectors of the same dimension, and otherwise the guidance was enabled and fed into the injection layer. The results of EIE-light are presented in Table S.1, and the results of the hyperparameter sensitivity study of \(\upbeta\) are shown in Fig. S.1.
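To make the strategy concrete, below is a minimal sketch of the gating step, assuming the expert guidance arrives as a single tensor consumed by the injection layer; the function name and signature are hypothetical and only mirror the mechanics described above, not the actual implementation.

```python
import torch

def gate_expert_guidance(guidance: torch.Tensor, beta: float,
                         training: bool = True) -> torch.Tensor:
    """Gate the expert soft guidance fed to the injection layer.

    Training: with probability beta, the guidance is replaced by a
    zero vector of the same dimension (the random dropout strategy).
    Inference with EIE-light: the expert branch is omitted, which is
    equivalent to always returning the zero vector.
    """
    if not training or torch.rand(1).item() < beta:
        return torch.zeros_like(guidance)  # guidance dropped
    return guidance  # guidance enabled this iteration
```

Exposing the model to the zero-vector path during training is what allows the expert branch to be dropped at inference with little loss in performance.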

Notably, EIE-light outperformed state-of-the-art methods by a considerable margin, achieving gains of 1.1%, 2.1%, 2.9%, and 3.4% in BLEU, and 4.3% and 2.9% in METEOR and \({\text{ROUGE}}_{\text{L}}\), respectively, together with a notable improvement of 0.538 in CIDEr, a 52.39% increase. It is worth mentioning that EIE-light exhibited performance superior to EIE-mem and comparable to EIE-esg, indicating that even without guidance during inference, guidance during training significantly enhances the performance of EIE-base and EIE-mem. Additionally, as depicted in Fig. S.1, performance fluctuated minimally with changes in \(\upbeta\), remaining consistently above 1.30, with the optimum achieved at \(\upbeta = 0.6\).

Table S.1. Comparison with state-of-the-art (SOTA): EIE-all surpasses the existing SOTA by a substantial margin across all metrics. Furthermore, each proposed mechanism notably enhances the performance of the base model, and the combination of both mechanisms leads to further improvement. \(\dag\) Results as reported by EKAID. \(\ddagger\) Results reproduced using the official code.

A.2 Expert Soft Guidance with More Observations

As stated in the main paper, we adopted PCAM [15], which was the top-1 solution for CheXpert [14] classification, as our expert classifier. However, PCAM can only produce predictions for the 5 most common diseases in the CheXpert test set, resulting in suboptimal guidance. It is therefore intuitive that performance would improve further if the expert classifier could provide guidance on more observations. A recent work [16] is capable of offering classification results for all 14 CheXpert observations; we denote this classifier Expert-14 and incorporated it into our model for assessment. When the expert classifier of EIE-esg is substituted with Expert-14, we denote the resulting model EIE-esg-14; when the classifier of EIE-all is replaced with Expert-14, we denote it EIE-all-14.

All results are presented in Table S.1. Model performance improved further when more guidance was available. Specifically, EIE-esg-14 brought larger performance gains than EIE-esg on all metrics: 1.8%, 0.9%, 1.9%, and 0.132 on BLEU, METEOR, \({\text{ROUGE}}_{\text{L}}\), and CIDEr, respectively. Together with guidance from all 14 observations and MEM, EIE-all-14 consistently achieved further improvements: compared with EIE-all, it gained 2.4%, 2.1%, 2.1%, and 2.1% in BLEU, and 0.4% and 0.082 in \({\text{ROUGE}}_{\text{L}}\) and CIDEr, respectively. These results suggest that more expert guidance can further boost the overall performance of our proposed model. It is worth noting that although PCAM can only predict the presence of the most common abnormalities on chest X-ray, it significantly improved overall performance. This could be because the presence probabilities of these common abnormalities are also informative about other, related abnormalities.

Fig. S.1. Hyperparameter \(\upbeta\) sensitivity study of EIE-light

A.3 Entity Selection

To enrich expert insight into abnormality prediction and to complement EIE-esg, we selected relevant terms associated with the CheXpert diseases, namely atelectasis, edema, pneumothorax, cardiomegaly, consolidation, cardiac silhouette, fracture, lung opacity, pleural effusion, and pneumonia.
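As a rough illustration of how such a term list could feed the masked entity modeling (MEM) loss, the sketch below corrupts a report by masking words from these terms at a higher rate than ordinary tokens; the masking rates and helper names are hypothetical, and multi-word terms are matched word-by-word for brevity rather than as spans.

```python
import random

# Individual words drawn from the selected CheXpert-related terms;
# proper span matching of multi-word terms such as "pleural effusion"
# is omitted for brevity.
ENTITY_WORDS = {
    "atelectasis", "edema", "pneumothorax", "cardiomegaly",
    "consolidation", "cardiac", "silhouette", "fracture", "lung",
    "opacity", "pleural", "effusion", "pneumonia",
}

def mask_for_mem(tokens, mask_token="[MASK]",
                 entity_rate=0.5, base_rate=0.15):
    """Corrupt a token sequence, preferentially masking entity words.

    Returns the corrupted sequence and the positions whose original
    tokens the model must reconstruct (the targets of the MEM loss).
    """
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        rate = entity_rate if tok.lower() in ENTITY_WORDS else base_rate
        if random.random() < rate:
            corrupted.append(mask_token)
            targets.append(i)
        else:
            corrupted.append(tok)
    return corrupted, targets
```

For example, `mask_for_mem("there is a small left pleural effusion".split())` would mask "pleural" and "effusion" far more often than the surrounding function words, steering the reconstruction loss toward the medical lexicon.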

A.4 Metrics

Following [1], we incorporated the clinical efficacy metrics Acc5 and Acc14 to showcase the effectiveness of our model. Specifically, Acc5 is the micro-averaged accuracy across the 5 most common observations: atelectasis, cardiomegaly, consolidation, edema, and pleural effusion. Acc14 is the example-based averaged accuracy across all 14 observations in CheXpert, encompassing pneumonia, fracture, consolidation, enlarged cardiomediastinum, no finding, pleural other, cardiomegaly, pneumothorax, atelectasis, support devices, edema, pleural effusion, lung lesion, and lung opacity. We employed the code from [1] to calculate these clinical efficacy metrics, utilizing the CheXbert [17] labeler to determine the presence of the most common observations. It is noteworthy that, following [1], instances labeled as negative, uncertain, or not mentioned were all treated as negative results.
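As a minimal sketch of how the two averaging schemes differ, assume binary label matrices of shape (num_examples, 14) obtained from the CheXbert labeler after mapping uncertain and no-mention labels to negative; the column ordering below is an illustrative assumption, not the labeler's actual output order.

```python
import numpy as np

# Assumed column order of the 14 CheXpert observations.
OBSERVATIONS = [
    "atelectasis", "cardiomegaly", "consolidation", "edema",
    "enlarged cardiomediastinum", "fracture", "lung lesion",
    "lung opacity", "no finding", "pleural effusion", "pleural other",
    "pneumonia", "pneumothorax", "support devices",
]
COMMON5 = [OBSERVATIONS.index(o) for o in (
    "atelectasis", "cardiomegaly", "consolidation",
    "edema", "pleural effusion")]

def acc5(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Micro-averaged accuracy over the 5 most common observations:
    every (example, observation) decision is pooled before scoring."""
    return float((y_true[:, COMMON5] == y_pred[:, COMMON5]).mean())

def acc14(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Example-based accuracy over all 14 observations: each example
    is scored across its 14 labels first, then the per-example
    accuracies are averaged."""
    return float((y_true == y_pred).mean(axis=1).mean())
```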


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, Z. et al. (2024). Expert Insight-Enhanced Follow-Up Chest X-ray Summary Generation. In: Finkelstein, J., Moskovitch, R., Parimbelli, E. (eds) Artificial Intelligence in Medicine. AIME 2024. Lecture Notes in Computer Science, vol. 14845. Springer, Cham. https://doi.org/10.1007/978-3-031-66535-6_21


  • DOI: https://doi.org/10.1007/978-3-031-66535-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-66534-9

  • Online ISBN: 978-3-031-66535-6

  • eBook Packages: Computer Science, Computer Science (R0)

