
Weakly Supervised Captioning of Ultrasound Images

  • Conference paper
Medical Image Understanding and Analysis (MIUA 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13413)


Abstract

Medical image captioning models generate text that describes the semantic content of an image, aiding non-experts in understanding and interpretation. We propose a weakly supervised approach that improves the performance of image captioning models on small image-text datasets by leveraging a large anatomically-labelled image classification dataset. Our method generates pseudo-captions (weak labels) for caption-less but anatomically-labelled (class-labelled) images using an encoder-decoder sequence-to-sequence model. The augmented dataset is then used to train an image-captioning model in a weakly supervised manner. For fetal ultrasound, we demonstrate that the proposed augmentation approach outperforms the baseline on semantics- and syntax-based metrics, with nearly twofold improvements on BLEU-1 and ROUGE-L. Moreover, models trained with the proposed data augmentation are superior to those trained with existing regularization techniques. This work enables seamless automatic annotation of images that lack human-prepared descriptive captions for training image-captioning models. Using pseudo-captions in the training data is particularly useful for medical image captioning, where obtaining real captions demands significant time and effort from medical experts.
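To make the pseudo-captioning step concrete, below is a minimal sketch in PyTorch. The toy architecture, vocabulary size, class count, and special token ids are all illustrative assumptions rather than the authors' implementation; they stand in for an encoder-decoder sequence-to-sequence model that maps an anatomical class label to a caption.

import torch
import torch.nn as nn

VOCAB_SIZE = 1000    # assumed caption vocabulary size
NUM_CLASSES = 13     # assumed number of anatomical classes
EMB_DIM, HID_DIM = 64, 128
BOS, EOS = 1, 2      # assumed special token ids

class LabelToCaption(nn.Module):
    """Toy encoder-decoder mapping a class label to a caption."""
    def __init__(self):
        super().__init__()
        # "Encoder": embed the anatomical class label into the decoder state.
        self.label_emb = nn.Embedding(NUM_CLASSES, HID_DIM)
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.decoder = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, labels, captions):
        # Teacher forcing on the small image-text dataset's (label, caption)
        # pairs: predict each next caption token conditioned on the label.
        h0 = self.label_emb(labels).unsqueeze(0)            # (1, B, HID_DIM)
        hidden, _ = self.decoder(self.tok_emb(captions), h0)
        return self.out(hidden)                             # (B, T, VOCAB_SIZE)

    @torch.no_grad()
    def generate(self, label, max_len=20):
        # Greedy decoding: produce a pseudo-caption (weak label) for one
        # caption-less but class-labelled image.
        h = self.label_emb(label.view(1)).unsqueeze(0)
        tok = torch.tensor([[BOS]])
        tokens = []
        for _ in range(max_len):
            step, h = self.decoder(self.tok_emb(tok), h)
            tok = self.out(step[:, -1]).argmax(-1, keepdim=True)
            if tok.item() == EOS:
                break
            tokens.append(tok.item())
        return tokens

model = LabelToCaption()
# Once trained, every class-labelled image gains a weak caption; the real and
# pseudo-captioned images together form the augmented training set for the
# downstream image-captioning model.
pseudo_caption = model.generate(torch.tensor(5))  # class id 5 is illustrative

In this reading, the weak supervision enters only through the training data: the downstream captioning model is trained conventionally, but part of its targets are machine-generated pseudo-captions rather than expert-written ones.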



Author information


Corresponding author

Correspondence to Mohammad Alsharid.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Alsharid, M., Sharma, H., Drukker, L., Papageorghiou, A.T., Noble, J.A. (2022). Weakly Supervised Captioning of Ultrasound Images. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, C.-B. (eds) Medical Image Understanding and Analysis. MIUA 2022. Lecture Notes in Computer Science, vol 13413. Springer, Cham. https://doi.org/10.1007/978-3-031-12053-4_14


  • DOI: https://doi.org/10.1007/978-3-031-12053-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-12052-7

  • Online ISBN: 978-3-031-12053-4

  • eBook Packages: Computer Science, Computer Science (R0)

