Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

  • Conference paper
  • In: Computer Vision – ECCV 2022 (ECCV 2022)

Abstract

Entropy modeling is a key component of high-performance image compression algorithms. Recent developments in autoregressive context modeling have helped learning-based methods surpass their classical counterparts. However, these models leave room for improvement: spatio-channel dependencies in the latent space remain underexploited, and context adaptivity is implemented suboptimally. Inspired by the adaptive characteristics of transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings over the Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.
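
The abstract gives only the high-level idea of spatio-channel attention: the latent is tokenized jointly over spatial positions and channel segments, and a causally masked transformer predicts entropy parameters for each token from the already-decoded ones. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the class name SpatioChannelContext, the segment count, the coding order, and the Gaussian entropy-parameter head are all assumptions made for illustration.

    # Minimal sketch of spatio-channel attention for autoregressive
    # context modeling (illustrative assumptions, not the authors' code).
    import torch
    import torch.nn as nn

    class SpatioChannelContext(nn.Module):
        """Predicts Gaussian entropy parameters per latent token, where a
        token is one channel segment at one spatial position."""

        def __init__(self, channels=192, num_segments=4, d_model=256,
                     num_heads=8, num_layers=4):
            super().__init__()
            assert channels % num_segments == 0
            self.num_segments = num_segments
            seg_dim = channels // num_segments
            self.embed = nn.Linear(seg_dim, d_model)
            self.start = nn.Parameter(torch.zeros(1, 1, d_model))  # start token
            layer = nn.TransformerEncoderLayer(
                d_model, num_heads, dim_feedforward=4 * d_model,
                batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)
            self.head = nn.Linear(d_model, 2 * seg_dim)  # mean and scale

        def forward(self, y):
            b, c, h, w = y.shape
            s = self.num_segments
            seg_dim = c // s
            # Tokenize the latent: spatial positions first, then channel
            # segments within each position (one possible coding order).
            tokens = (y.reshape(b, s, seg_dim, h * w)
                       .permute(0, 3, 1, 2)
                       .reshape(b, h * w * s, seg_dim))
            x = self.embed(tokens)
            # Shift right so token i is predicted only from tokens 0..i-1.
            x = torch.cat([self.start.expand(b, -1, -1), x[:, :-1]], dim=1)
            n = x.size(1)
            causal = torch.triu(torch.ones(n, n, dtype=torch.bool,
                                           device=x.device), diagonal=1)
            x = self.encoder(x, mask=causal)
            mean, scale = self.head(x).chunk(2, dim=-1)
            return mean, nn.functional.softplus(scale)  # positive scales

    # Toy usage on a random latent of shape (1, 192, 16, 16).
    ctx = SpatioChannelContext()
    mean, scale = ctx(torch.randn(1, 192, 16, 16))
    print(mean.shape, scale.shape)  # (1, 1024, 48) each

In a real codec the predicted mean and scale would parameterize the conditional distribution fed to an arithmetic coder, and decoding would run token by token in the same causal order.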


Author information

Corresponding author: A. Burakhan Koyuncu.


Electronic supplementary material

Supplementary material 1 (pdf 13525 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Koyuncu, A.B., Gao, H., Boev, A., Gaikov, G., Alshina, E., Steinbach, E. (2022). Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13679. Springer, Cham. https://doi.org/10.1007/978-3-031-19800-7_26

  • DOI: https://doi.org/10.1007/978-3-031-19800-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19799-4

  • Online ISBN: 978-3-031-19800-7

  • eBook Packages: Computer Science, Computer Science (R0)

