Abstract
Entropy modeling is a key component of high-performance image compression algorithms. Recent developments in autoregressive context modeling have helped learning-based methods surpass their classical counterparts. However, the performance of those models can be improved further, since they underexploit the spatio-channel dependencies in the latent space and implement context adaptivity suboptimally. Inspired by the adaptive characteristics of transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model yields rate savings of up to 11% over the Versatile Video Coding (VVC) Test Model (VTM) 16.2 and outperforms various learning-based models in terms of PSNR and MS-SSIM.
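To make the core idea concrete, the sketch below shows one way masked, autoregressive spatio-channel attention could be realized in PyTorch: the latent tensor is split into channel segments, each (spatial position, channel segment) pair becomes a token, and a causally masked transformer predicts the mean and scale of a Gaussian entropy model for every latent element. This is a minimal illustration under assumed shapes, segment count, and coding order; all class and parameter names are ours, and the actual Contextformer architecture differs in detail.

```python
# Minimal sketch of a spatio-channel attention context model.
# Assumptions (not the authors' implementation): 4 channel segments,
# a raster spatio-channel coding order, and a plain TransformerEncoder.
import torch
import torch.nn as nn


class SpatioChannelContextModel(nn.Module):
    def __init__(self, latent_channels=192, num_segments=4, d_model=256,
                 num_heads=8, num_layers=4):
        super().__init__()
        assert latent_channels % num_segments == 0
        self.num_segments = num_segments              # channel segments per position
        self.seg_dim = latent_channels // num_segments
        self.embed = nn.Linear(self.seg_dim, d_model)
        self.start = nn.Parameter(torch.zeros(1, 1, d_model))  # learned start token
        layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Predict mean and scale of a Gaussian for each latent element.
        self.head = nn.Linear(d_model, 2 * self.seg_dim)

    def forward(self, y):
        # y: quantized latent, shape (B, C, H, W).
        b, c, h, w = y.shape
        # Tokenize: one token per (spatial position, channel segment), ordered
        # so all segments of a position precede the next position (assumed order).
        tokens = (y.reshape(b, self.num_segments, self.seg_dim, h * w)
                   .permute(0, 3, 1, 2)               # (B, HW, S, seg_dim)
                   .reshape(b, h * w * self.num_segments, self.seg_dim))
        x = self.embed(tokens)
        # Shift right with the start token so the parameters for token i
        # depend only on tokens 0..i-1 (strict causality for entropy coding).
        x = torch.cat([self.start.expand(b, -1, -1), x[:, :-1]], dim=1)
        n = x.shape[1]
        # Boolean causal mask: True entries are positions a token may NOT attend to.
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool, device=x.device),
                            diagonal=1)
        x = self.encoder(x, mask=causal)
        mean, scale = self.head(x).chunk(2, dim=-1)
        return mean, nn.functional.softplus(scale)    # scale must be positive


# Usage on a toy latent tensor:
model = SpatioChannelContextModel()
y = torch.randn(1, 192, 4, 4)
mean, scale = model(y)
print(mean.shape, scale.shape)  # torch.Size([1, 64, 48]) twice
```

The right-shift with a learned start token keeps the model strictly causal, so a decoder can reproduce the same distribution parameters one token at a time during arithmetic decoding.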
References
Versatile Video Coding. Standard, Rec. ITU-T H.266 and ISO/IEC 23090–3 (2020)
Asuni, N., Giachetti, A.: TESTIMAGES: a large-scale archive for testing visual devices and basic image processing algorithms. In: STAG, pp. 63–70 (2014)
Ballé, J., Chou, P.A., Minnen, D., Singh, S., Johnston, N., Agustsson, E., Hwang, S.J., Toderici, G.: Nonlinear transform coding. IEEE Journal of Selected Topics in Signal Processing 15(2), 339–353 (2020)
Ballé, J., Laparra, V., Simoncelli, E.P.: Density modeling of images using a generalized normalization transformation. In: 4th International Conference on Learning Representations, ICLR 2016 (2016)
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. In: 5th International Conference on Learning Representations, ICLR 2017 (2017)
Ballé, J., Minnen, D., Singh, S., Hwang, S.J., Johnston, N.: Variational image compression with a scale hyperprior. In: International Conference on Learning Representations (2018)
Bégaint, J., Racapé, F., Feltman, S., Pushparaja, A.: CompressAI: a PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029 (2020)
Bellard, F.: BPG image format (2015). Accessed 01 Jun 2022. https://bellard.org/bpg
Bjontegaard, G.: Calculation of average PSNR differences between RD-curves. VCEG-M33 (2001)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Chen, T., Liu, H., Ma, Z., Shen, Q., Cao, X., Wang, Y.: End-to-end learnt image compression via non-local attention optimization and improved context modeling. IEEE Transactions on Image Processing 30, 3179–3191 (2021). https://doi.org/10.1109/TIP.2021.3058615
Cheng, Z., Sun, H., Takeuchi, M., Katto, J.: Learned image compression with discretized Gaussian mixture likelihoods and attention modules. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7939–7948 (2020)
Cui, Z., Wang, J., Bai, B., Guo, T., Feng, Y.: G-VAE: A continuously variable rate deep image compression framework. arXiv preprint arXiv:2003.02012 (2020)
Cui, Z., Wang, J., Gao, S., Guo, T., Feng, Y., Bai, B.: Asymmetric gained deep image compression with continuous rate adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10532–10541 (2021)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol.1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)
Esser, P., Rombach, R., Ommer, B.: Taming transformers for high-resolution image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12873–12883 (2021)
Franzen, R.: Kodak lossless true color image suite (1999)
Goyal, V.K.: Theoretical foundations of transform coding. IEEE Signal Processing Magazine 18(5), 9–21 (2001)
Guo, Z., Zhang, Z., Feng, R., Chen, Z.: Causal contextual prediction for learned image compression. IEEE Transactions on Circuits and Systems for Video Technology (2021)
He, D., Zheng, Y., Sun, B., Wang, Y., Qin, H.: Checkerboard context model for efficient learned image compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14771–14780 (2021)
Jiang, Y., Chang, S., Wang, Z.: TransGAN: two transformers can make one strong GAN. arXiv preprint arXiv:2102.07074 (2021)
Katharopoulos, A., Vyas, A., Pappas, N., Fleuret, F.: Transformers are RNNs: fast autoregressive transformers with linear attention. In: Daumé III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, 13–18 Jul 2020, vol. 119, pp. 5156–5165. PMLR (2020). https://proceedings.mlr.press/v119/katharopoulos20a.html
Kim, D.W., Chung, J.R., Jung, S.W.: GRDN: grouped residual dense network for real image denoising and GAN-based real-world noise modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (2019)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Koyuncu, A.B., Cui, K., Boev, A., Steinbach, E.: Parallelized context modeling for faster image coding. In: 2021 International Conference on Visual Communications and Image Processing (VCIP), pp. 1–5. IEEE (2021)
Lee, J., Cho, S., Beack, S.K.: Context-adaptive entropy model for end-to-end optimized image compression. In: 6th International Conference on Learning Representations, ICLR 2018 (2018)
Lee-Thorp, J., Ainslie, J., Eckstein, I., Ontanon, S.: FNet: mixing tokens with Fourier transforms. arXiv preprint arXiv:2105.03824 (2021)
Li, D., et al.: Involution: inverting the inherence of convolution for visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12321–12330 (2021)
Li, M., Ma, K., You, J., Zhang, D., Zuo, W.: Efficient and effective context-based convolutional entropy modeling for image compression. IEEE Transactions on Image Processing 29, 5900–5911 (2020). https://doi.org/10.1109/TIP.2020.2985225
Liu, H., Chen, T., Shen, Q., Ma, Z.: Practical stacked non-local attention modules for image compression. In: CVPR Workshops (2019)
Mentzer, F., Agustsson, E., Tschannen, M., Timofte, R., Van Gool, L.: Conditional probability models for deep image compression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4394–4402 (2018)
Minnen, D., Ballé, J., Toderici, G.: Joint autoregressive and hierarchical priors for learned image compression. In: NeurIPS (2018)
Minnen, D., Singh, S.: Channel-wise autoregressive entropy models for learned image compression. In: 2020 IEEE International Conference on Image Processing (ICIP), pp. 3339–3343. IEEE (2020)
Naseer, M., Ranasinghe, K., Khan, S., Hayat, M., Khan, F.S., Yang, M.H.: Intriguing properties of vision transformers. arXiv preprint arXiv:2105.10497 (2021)
Niu, Z., Zhong, G., Yu, H.: A review on the attention mechanism of deep learning. Neurocomputing 452, 48–62 (2021). https://doi.org/10.1016/j.neucom.2021.03.091. https://www.sciencedirect.com/science/article/pii/S092523122100477X
Parmar, N., et al.: Image transformer. In: International Conference on Machine Learning, pp. 4055–4064. PMLR (2018)
Qian, Y., Sun, X., Lin, M., Tan, Z., Jin, R.: Entroformer: a transformer-based entropy model for learned image compression. In: International Conference on Learning Representations (2021)
Qian, Y., et al.: Learning accurate entropy model with global reference for image compression. In: International Conference on Learning Representations (2020)
Rissanen, J., Langdon, G.G.: Arithmetic coding. IBM Journal of Research and Development 23(2), 149–162 (1979)
Roy, A., Saffar, M., Vaswani, A., Grangier, D.: Efficient content-based sparse attention with routing transformers. Transactions of the Association for Computational Linguistics 9, 53–68 (2021)
Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22(12), 1649–1668 (2012)
Joint Video Experts Team (JVET): Versatile Video Coding (VVC) reference software: VVC Test Model (VTM) (2022). Accessed 01 Jun 2022. https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM
Toderici, G., et al.: Workshop and challenge on learned image compression (CLIC2020) (2020)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Wallace, G.K.: The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38(1), xviii–xxxiv (1992)
Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003, vol. 2, pp. 1398–1402. IEEE (2003)
Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. International Journal of Computer Vision (IJCV) 127(8), 1106–1125 (2019)
Zhang, Y., Li, K., Li, K., Zhong, B., Fu, Y.: Residual non-local attention networks for image restoration. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=HkeGhoA5FX
Zhao, J., Li, B., Li, J., Xiong, R., Lu, Y.: A universal encoder rate distortion optimization framework for learned compression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1880–1884 (2021)
Zhou, J., Wen, S., Nakagawa, A., Kazui, K., Tan, Z.: Multi-scale and context-adaptive entropy model for image compression. arXiv preprint arXiv:1910.07844 (2019)
Ziv, J.: On universal quantization. IEEE Transactions on Information Theory 31(3), 344–347 (1985)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Koyuncu, A.B., Gao, H., Boev, A., Gaikov, G., Alshina, E., Steinbach, E. (2022). Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13679. Springer, Cham. https://doi.org/10.1007/978-3-031-19800-7_26
DOI: https://doi.org/10.1007/978-3-031-19800-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19799-4
Online ISBN: 978-3-031-19800-7