Abstract
Unmanned aerial vehicle (UAV) image object detection has extensive applications in both civilian and military domains. However, the standard YOLOv8 detection algorithm struggles with small objects in UAV imagery: its missed-detection rate is high and its parameter count is excessive. To address these issues, this paper introduces an enhanced small object detection approach, the Small-Size Object Detection Algorithm Based on Improved YOLOv8 for UAV Imagery (SS-YOLOv8). First, given the difficulty of detecting small objects in UAV aerial images and the stringent real-time requirements of the task, the model is streamlined by removing the large object detection head and its associated redundant network layers, which substantially reduces the parameter count. Second, a specialized tiny object detection head, designed specifically for small objects, is added. Third, to enhance the model’s capacity for extracting fine features of small objects and to mitigate information loss during feature extraction, the convolution module in the backbone is replaced with space-to-depth convolution (SPD-Conv). Fourth, the self-designed GCU module is incorporated into the neck, where it fuses deep semantic features with shallow positional features. Finally, while keeping the parameter cost consistent, multi-scale features are reused and combined to achieve more comprehensive feature fusion. Experimental results on the VisDrone2019 dataset show that, compared with YOLOv8, SS-YOLOv8 improves mAP by 6.9% on the validation set and 5.8% on the test set, while reducing the number of parameters and the model size by 65.93% and 64.85%, respectively. SS-YOLOv8 also achieves higher detection accuracy than the best-performing comparison algorithm, indicating the superiority of the method.
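The space-to-depth rearrangement that SPD-Conv relies on can be illustrated with a minimal pure-Python sketch. It assumes a downsampling scale of 2 and a feature map stored as nested lists in channel, row, column order. The function name `space_to_depth` and the ordering of the output channels are illustrative choices, not taken from the SS-YOLOv8 implementation. In SPD-Conv a non-strided convolution follows this step; it is omitted here.

```python
def space_to_depth(feature_map, scale=2):
    """Rearrange a C x H x W feature map into (C * scale**2) x (H/scale) x (W/scale).

    feature_map is a list of channels, each a list of rows (hypothetical layout
    chosen for this sketch). Every input value is kept: spatial positions are
    moved into extra channels instead of being skipped.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    if height % scale != 0 or width % scale != 0:
        raise ValueError("height and width must be divisible by scale")

    output = []
    # One sub-map per (row offset, column offset) pair; ordering is an assumption.
    for row_offset in range(scale):
        for col_offset in range(scale):
            for c in range(channels):
                sub_map = [
                    row[col_offset::scale]
                    for row in feature_map[c][row_offset::scale]
                ]
                output.append(sub_map)
    return output


# Toy input: one channel, 4 x 4 pixels.
feature_map = [[
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]]
rearranged = space_to_depth(feature_map)
print(len(rearranged), len(rearranged[0]), len(rearranged[0][0]))  # 4 2 2
```

Each input value reappears in exactly one output channel, so the resolution is halved without discarding any values at the downsampling step. In a deep learning framework the same rearrangement would normally be written with tensor slicing rather than Python lists.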









Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was funded by the National Key Research and Development Program of China, grant number 2021YFB1407005, and the Key Technology Research and Development Program of Shandong Province, grant number 2021SFGC0401.
Author information
Contributions
Jinlong Qu proposed the research idea, designed and implemented the model, conducted experiments, and served as the primary writer. Qi Li and Jie Pan contributed to the model design, result analysis, and manuscript revisions. Mingzheng Sun, Xingzheng Lu, Ying Zhou and Hongliang Zhu supervised and participated in revising the paper.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical standard
All datasets used in this paper are publicly available, and no consent was required for their use.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Additional information
Communicated by Junyu Gao.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qu, J., Li, Q., Pan, J. et al. SS-YOLOv8: small-size object detection algorithm based on improved YOLOv8 for UAV imagery. Multimedia Systems 31, 42 (2025). https://doi.org/10.1007/s00530-024-01622-3
DOI: https://doi.org/10.1007/s00530-024-01622-3