Abstract
Medical image captioning models generate text describing the semantic content of an image, aiding non-experts in its understanding and interpretation. We propose a weakly supervised approach to improve the performance of image captioning models on small image-text datasets by leveraging a large anatomically-labelled image classification dataset. Our method generates pseudo-captions (weak labels) for caption-less but anatomically-labelled (class-labelled) images using an encoder-decoder sequence-to-sequence model. The augmented dataset is then used to train an image-captioning model in a weakly supervised manner. For fetal ultrasound, we demonstrate that the proposed augmentation approach outperforms the baseline on semantic and syntactic metrics, with nearly twice the improvement in BLEU-1 and ROUGE-L scores. Moreover, models trained with the proposed data augmentation outperform those trained with existing regularization techniques. This work allows seamless automatic annotation of images that lack human-prepared descriptive captions for training image-captioning models. Using pseudo-captions in the training data is particularly useful for medical image captioning, where obtaining real captions demands significant time and effort from medical experts.
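The abstract reports gains on BLEU-1 and ROUGE-L. As a minimal illustration only (not the authors' evaluation code), the sketch below shows how these two caption-quality metrics are commonly computed with the nltk and rouge-score Python packages; the reference and candidate caption strings are made-up examples.

```python
# Minimal sketch of BLEU-1 and ROUGE-L scoring for a single generated caption.
# Not the authors' evaluation pipeline; the caption strings are illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "transverse section of the fetal head showing the cavum septi pellucidi"
candidate = "fetal head in transverse section showing the cavum"

# BLEU-1: unigram precision only (all weight on 1-grams), with smoothing
# so that short captions with no higher-order matches are handled gracefully.
bleu1 = sentence_bleu(
    [reference.split()],
    candidate.split(),
    weights=(1.0, 0.0, 0.0, 0.0),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-L: longest-common-subsequence based precision, recall and F-measure.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, candidate)["rougeL"]

print(f"BLEU-1:     {bleu1:.3f}")
print(f"ROUGE-L F1: {rouge_l.fmeasure:.3f}")
```

In practice such scores are averaged over all test captions; BLEU-1 rewards word-level overlap (syntax-oriented), while ROUGE-L rewards in-order subsequence overlap, which is why the two are often reported together.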