Enhancing the ability of convolutional neural networks for remote sensing image segmentation using transformers

  • Original Article
  • Published in Neural Computing and Applications

Abstract

The segmentation of remote sensing images has become a compelling task in computer vision owing to its role in a wide range of applications. The U-Net architecture has been used extensively in image segmentation, with remarkable results. Nevertheless, U-Net has limitations for remote sensing image segmentation, stemming mainly from the limited receptive field of its convolution kernels. The transformer is a deep learning model originally developed for sequence-to-sequence tasks. Its self-attention mechanism processes many inputs efficiently, adjusting attention weights to retain relevant information and suppress irrelevant inputs. However, its localization ability is limited because it does not capture low-level features. This work presents U-Net–transformer, a hybrid approach that combines the U-Net and transformer models for remote sensing image segmentation. The proposed solution surpasses the individual models by combining their complementary strengths. First, the transformer captures global context by encoding tokenized image patches derived from the feature maps of a convolutional neural network (CNN). The encoded feature maps are then upsampled by a decoder and merged with the high-resolution feature maps of the CNN, enabling more accurate localization. The transformer thus serves as an unconventional encoder for remote sensing image segmentation, enhancing U-Net's ability to capture both fine spatial detail and global structure. The proposed U-Net–transformer achieves strong performance on several benchmark remote sensing segmentation datasets, demonstrating the efficacy of integrating the U-Net and transformer models for this task.
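The pipeline the abstract describes, tokenizing CNN feature maps into patches, applying self-attention for global context, then upsampling and fusing with high-resolution skip features, can be sketched as follows. This is a minimal NumPy illustration under assumed toy shapes, not the authors' implementation; all function names, patch sizes, and dimensions here are assumptions for exposition.

```python
import numpy as np

def tokenize(feat, patch=2):
    """Split an (H, W, C) CNN feature map into flattened patch tokens."""
    H, W, C = feat.shape
    t = feat.reshape(H // patch, patch, W // patch, patch, C)
    t = t.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return t  # (num_tokens, dim)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention: every token
    attends to every other, giving each a global-context encoding."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # row-wise softmax
    return w @ tokens                             # (num_tokens, dim)

def upsample_and_fuse(encoded, skip, grid_hw):
    """Reshape tokens back to a spatial grid, nearest-neighbour
    upsample to the skip connection's resolution, and fuse with the
    high-resolution CNN features by channel concatenation."""
    gh, gw = grid_hw
    sh, sw, _ = skip.shape
    grid = encoded.reshape(gh, gw, -1)
    up = np.repeat(np.repeat(grid, sh // gh, axis=0), sw // gw, axis=1)
    return np.concatenate([up, skip], axis=-1)

# Toy shapes: a bottleneck CNN feature map and a high-resolution skip map.
feat = np.random.rand(8, 8, 4)
skip = np.random.rand(16, 16, 8)
tokens = tokenize(feat, patch=2)                  # (16, 16)
encoded = self_attention(tokens)                  # (16, 16)
fused = upsample_and_fuse(encoded, skip, (4, 4))  # (16, 16, 24)
print(fused.shape)
```

A decoder head (not shown) would then map the fused features to per-pixel class scores; in the paper's design the fusion with high-resolution CNN features is what restores the localization that attention alone lacks.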

Figs. 1–3: figure previews; full figures available in the published article.


Data availability

Data will be made available on request.


Author information


Corresponding author

Correspondence to Mohammad Barr.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Barr, M. Enhancing the ability of convolutional neural networks for remote sensing image segmentation using transformers. Neural Comput & Applic 36, 13605–13616 (2024). https://doi.org/10.1007/s00521-024-09743-6


