Abstract
Remote sensing image segmentation has become an important task in computer vision owing to its role in many downstream applications. The U-Net architecture has been widely used in image segmentation and has achieved remarkable results. Nevertheless, U-Net has limitations in the context of remote sensing image segmentation, mainly stemming from the limited receptive field of its convolution kernels. The transformer is a deep learning model originally developed for sequence-to-sequence tasks. It uses a self-attention mechanism to process long inputs efficiently, adjusting attention weights to retain relevant information and suppress irrelevant inputs. However, transformers have limited localization capability because they lack low-level spatial features. This work presents a hybrid approach, called U-Net–transformer, that combines the U-Net and transformer models for remote sensing image segmentation. The proposed method surpasses the individual models, U-Net and the transformer, by combining their complementary strengths. First, the transformer captures global context by encoding tokenized image patches derived from the feature maps of a convolutional neural network (CNN). The encoded feature maps are then upsampled by a decoder and merged with the high-resolution feature maps of the CNN, enabling more accurate localization. The transformer thus serves as an alternative encoder for segmenting remote sensing images, complementing the U-Net model's ability to capture localized spatial detail. The proposed U-Net–transformer demonstrated excellent performance in remote sensing image segmentation across several benchmark datasets. These findings demonstrate the efficacy of integrating the U-Net and transformer models for segmenting remote sensing images.
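The pipeline described above (tokenize the CNN's low-resolution feature maps, encode them with self-attention for global context, upsample, and fuse with the CNN's high-resolution skip features) can be sketched in NumPy. This is a minimal illustrative sketch, not the authors' implementation: the feature-map shapes, the single-head attention, and the nearest-neighbour upsampling are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(tokens):
    # tokens: (n_tokens, d). Single-head scaled dot-product self-attention
    # (queries, keys, and values are the tokens themselves for simplicity).
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over keys
    return weights @ tokens

# Hypothetical shapes: a CNN has reduced the input image to a 16x16 map with
# 64 channels (low resolution, high-level features) ...
low_res = rng.standard_normal((16, 16, 64))
# ... and also produced a 64x64 high-resolution skip feature map.
high_res = rng.standard_normal((64, 64, 32))

# 1. Tokenize the low-resolution feature map: each spatial position is a token.
tokens = low_res.reshape(-1, 64)                         # (256, 64)

# 2. The transformer encoder captures global context via self-attention.
encoded = self_attention(tokens)

# 3. Reshape tokens back to a spatial map and upsample to the skip resolution.
spatial = encoded.reshape(16, 16, 64)
upsampled = spatial.repeat(4, axis=0).repeat(4, axis=1)  # nearest-neighbour x4

# 4. Merge with the high-resolution CNN features (U-Net-style skip connection).
fused = np.concatenate([upsampled, high_res], axis=-1)
print(fused.shape)                                       # prints (64, 64, 96)
```

A real model would follow step 4 with further convolutional decoder layers and a per-pixel classification head; the sketch only shows how the global transformer context and the local CNN detail are brought together.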



Data availability
Data will be made available on request.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Barr, M. Enhancing the ability of convolutional neural networks for remote sensing image segmentation using transformers. Neural Comput & Applic 36, 13605–13616 (2024). https://doi.org/10.1007/s00521-024-09743-6