Evaluating The Evolution of YOLO (You Only Look Once)
arXiv:2411.00201v1 [cs.CV] 31 Oct 2024

Abstract—This study presents a comprehensive benchmark analysis of various YOLO (You Only Look Once) algorithms.
Fig. 1. Timeline of YOLO version release dates, from Jun 8, 2015 to Sep 30, 2024.
This object detection algorithm has undergone several developments, as seen in Figure 1, achieving competitive results in terms of accuracy and speed and making it the preferred algorithm in various fields such as ADAS (Advanced Driver-Assistance Systems) [47], video surveillance [38], face detection [39], and many more [18]. For instance, YOLO plays a crucial role in the agriculture field, where it has been implemented in numerous applications such as crop classification [1] [17], pest detection [33], automated farming [67] [37], and virtual fencing [62]. Moreover, YOLO has been utilized on numerous occasions in the field of healthcare, such as cancer detection [?] [45], ulcer detection [2], medicine classification [36] [42], and health protocol enforcement [11].

YOLOv6 adopted RepVGG, an architecture that simplifies convolutional layers during inference, and CSPStackRep blocks, which improve accuracy by splitting the feature map into two parts to process them separately. In addition, YOLOv6 employed a hybrid channel strategy for better feature representation. YOLOv7 [65] leveraged the Extended Efficient Layer Aggregation Network (E-ELAN), a novel architecture that improved efficiency and effectiveness by enhancing information flow between layers.

The most recent versions of YOLO, including YOLOv8, YOLOv9, YOLOv10, and YOLO11, represent the forefront of the model's development. YOLOv8 [58], released by Ultralytics, introduced semantic segmentation capabilities, allowing the model to classify each pixel of an image, and provided scalable versions to meet various application needs, from resource-constrained environments to high-performance systems, alongside other tasks such as pose estimation, image classification, and oriented object detection (OBB). YOLOv9 [66] built on its predecessors' architectural advancements with Programmable Gradient Information (PGI), which optimizes gradient flow during training, and the Generalized Efficient Layer Aggregation Network (GELAN), which further improved performance by enhancing layer information flow. YOLOv10 [64], developed by Tsinghua University, eliminated the need for the Non-Maximum Suppression (NMS) used by its predecessors, a technique for discarding duplicate predictions and keeping the most confident bounding boxes, by introducing a dual assignment strategy in its training protocol. Additionally, YOLOv10 features lightweight classification heads, spatial-channel decoupled downsampling, and rank-guided block design, making it one of the most efficient and effective YOLO models to date. Lastly, YOLO11 [26], also introduced by Ultralytics, retains the capabilities of YOLOv8, with applications such as instance segmentation, pose estimation, and oriented object detection, while providing five scalable versions for different use cases. YOLO11 replaces the C2f block from YOLOv8 with the more efficient C3k2 block, delivering improved performance without compromising speed. Additionally, it introduces the C2PSA (Cross Stage Partial with Spatial Attention) module, which improves spatial attention in feature maps, increasing accuracy, especially for small and overlapping objects.

In recent years, Ultralytics has played a crucial role in the advancement of YOLO by maintaining, improving, and making these models more accessible [46]. Notably, Ultralytics has streamlined the process of fine-tuning and customizing YOLO models, a task that was considerably more complex in earlier iterations. The introduction of user-friendly interfaces, comprehensive documentation, and pre-built modules has greatly simplified essential tasks such as data augmentation, model training, and evaluation. Moreover, the development of scalable model versions allows users to select models tailored to specific resource constraints and application requirements, thereby facilitating more effective fine-tuning. For instance, YOLOv8n is favorable over YOLOv8m in scenarios where speed and computational efficiency are prioritized over accuracy, making it ideal for resource-constrained environments. The integration of advanced tools for hyperparameter tuning, automated learning rate scheduling, and model pruning has further refined the customization process. Continuous updates and robust community support have also contributed to making YOLO models more accessible and adaptable for a wide range of applications.

This paper aims to present a comprehensive comparative analysis of the YOLO algorithm's evolution. It makes a significant contribution to the field by offering the first comprehensive evaluation of YOLO11, the newest member of the YOLO family. By leveraging pre-trained models and fine-tuning them, we evaluate their performance across three diverse custom datasets, each with varying sizes and objectives. Consistent hyperparameters are applied to ensure a fair and unbiased comparison. The analysis delves into critical performance metrics, including speed, efficiency, accuracy, and computational complexity, as measured by GFLOPs count and model size. In addition, we explore the real-world applications of each YOLO version, highlighting their strengths and limitations across different use cases. Through this comparative study, we aim to provide valuable insights for researchers and practitioners, offering a deeper understanding of how these models can be effectively applied in various scenarios.

The rest of this paper is organized as follows: Section 2 covers related work. Section 3 describes the datasets, the models, and the experimental setup, including the hyperparameters and evaluation metrics used. Section 4 presents the experimental results and comparative analysis alongside a discussion. Finally, Section 5 concludes with insights drawn from the study.

II. RELATED WORK

The YOLO (You Only Look Once) algorithm is considered one of the most prominent object detection algorithms. It achieves state-of-the-art speed and accuracy, and its various applications have made it indispensable in numerous fields and industries. Numerous researchers have shown interest in this object detection algorithm by publishing papers reviewing its evolution, fine-tuning its models, and benchmarking its performance against other computer vision algorithms. This widespread interest underscores YOLO's important role in advancing the field of computer vision.

The paper in [14] examines seven semantic segmentation and detection algorithms, including YOLOv8, for cloud segmentation from remote sensing imagery. It conducts a benchmark analysis to evaluate their architectural approaches and identify the best-performing ones based on accuracy, speed, and potential applications. The research aims to produce machine learning algorithms that can perform cloud segmentation using only a few spectral bands, including RGB and RGBN-IR combinations.

The authors of the paper in [22] review the evolution of the YOLO variants from version 1 to version 8, examining their internal architecture, key innovations, and benchmarked performance metrics. However, YOLOv9, YOLOv10, and YOLO11 are not considered in the analysis. The paper highlights the models' applications across domains like autonomous driving and healthcare and proposes incorporating federated learning to improve privacy, adaptability, and generalization in collaborative training. The review, however, limits its focus to mAP (mean Average Precision) for accuracy evaluation, neglecting other key metrics such as Recall and Precision. Additionally, it considers FPS (frames per second) as the sole measure of computational efficiency, excluding the impact of preprocessing, inference, and postprocessing times, GFLOPs, and model size.

The paper in [12] thoroughly analyzes single-stage object detectors, particularly YOLOs from YOLOv1 to YOLOv4, with updates to their architecture, performance metrics, and regression formulation. Additionally, it provides an overview of the comparison between two-stage and single-stage object detectors, several YOLO versions from version 1 to version 4, applications utilizing two-stage detectors, and future research prospects.

The authors of the paper in [53] explore the evolution of the YOLO algorithms from version 1 to 10, highlighting their impact on automotive safety, healthcare, industrial manufacturing, surveillance, and agriculture. The paper highlights incremental technological advances and challenges in each version, indicating a potential integration with multimodal, context-aware, and General Artificial Intelligence systems for future AI-driven applications. However, the paper does not include a benchmarking study or a comparative analysis of the YOLO models, leaving out performance comparisons across the versions.

The paper in [61] explores the development of the YOLO algorithm up to the fourth version. It highlights the algorithm's challenges and suggests new approaches, underscoring its impact on object detection and the need for ongoing study.

The authors of the work in [27] analyze the YOLO algorithm, focusing on its development and performance. They conduct a comparative analysis of the different versions of YOLO up to the 8th version, highlighting the algorithm's potential to provide insights into image and video recognition and addressing its issues and limitations. The paper focuses exclusively on the mAP metric, overlooking other accuracy measures such as Precision and Recall. Additionally, it neglects speed and efficiency metrics, limiting the scope of the comparative study. The paper also omits the evaluation of the most recent models, YOLOv9, YOLOv10, and YOLO11.

This paper makes several key contributions: (i) it pioneers a comprehensive comparison of YOLO11 against its predecessors across their scaled variants from nano to extra-large; (ii) it offers deep insights into the structural evolution of these algorithms by evaluating their performance across three diverse datasets with various object properties; and (iii) our performance evaluation extends beyond mAP and FPS to include critical metrics such as Precision, Recall, Preprocessing, Inference, and Postprocessing Time, GFLOPs, and model size. These metrics provide valuable insights to guide the selection of the optimal YOLO algorithm for specific use cases, for both industry professionals and academics.

III. BENCHMARK SETUP

A. Datasets

This study aims to conduct in-depth benchmark research and assess the YOLO algorithms provided by the Ultralytics framework.
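As a rough illustration of how such an assessment can be scripted with the Ultralytics package, the sketch below validates one pretrained model and collects the metrics compared in this study. It is a hypothetical harness, not the paper's actual code: the weight files and dataset YAML named in the usage comment are placeholders, and `ultralytics` must be installed separately (`pip install ultralytics`).

```python
def evaluate(weights: str, data_yaml: str) -> dict:
    """Validate one pretrained YOLO model and collect the metrics
    compared in this study (hypothetical harness; paths are placeholders)."""
    # Deferred import so this module loads even without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)            # e.g. "yolo11n.pt" or "yolov8s.pt"
    r = model.val(data=data_yaml)    # runs validation on the dataset's val split
    return {
        "mAP50": r.box.map50,        # mAP at IoU threshold 0.50
        "mAP50-95": r.box.map,       # mAP averaged over IoU 0.50:0.95
        "precision": r.box.mp,       # mean precision across classes
        "recall": r.box.mr,          # mean recall across classes
        "speed_ms": dict(r.speed),   # preprocess / inference / postprocess times
    }

# Usage (requires downloaded weights and a dataset config, both placeholders):
# for w in ("yolo11n.pt", "yolo11m.pt", "yolov10n.pt"):
#     print(w, evaluate(w, "traffic_signs.yaml"))
```

Running the same function over every scaled variant with a fixed dataset YAML is what keeps the comparison fair: identical data, identical validation settings, one metrics dictionary per model.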
of YOLOv9's architecture. This property allows the network to retain a complete information flow, enabling more accurate updates to the model's parameters. Moreover, YOLOv9 offers five scaled versions for different uses, focusing on lightweight models, which are often under-parameterized and prone to losing significant information during the feedforward process.

Programmable Gradient Information (PGI) is a significant advancement introduced in YOLOv9. PGI is a method that dynamically adjusts the gradient information during training to optimize learning efficiency. By selectively focusing on the most informative gradients, PGI helps preserve crucial information that might otherwise be lost in lightweight models. This advancement ensures the model retains the essential features for accurate object detection, improving overall performance.

In addition, YOLOv9 incorporates the Generalized Efficient Layer Aggregation Network (GELAN), a new architectural advancement designed to improve parameter utilization and computational efficiency, as illustrated in Figure 5. GELAN achieves this by optimizing the computational pathways within the network, allowing for better resource management and adaptability to various applications without compromising speed or accuracy.

For inference, YOLOv10's One-to-One Head generates a single best prediction per object, eliminating the need for Non-Maximum Suppression (NMS). By removing the need for NMS, YOLOv10 reduces latency and improves postprocessing speed. In addition, YOLOv10 includes NMS-free training, which uses consistent dual assignments to reduce inference latency, and a model design that optimizes various components from both efficiency and accuracy perspectives. This includes lightweight classification heads, spatial-channel decoupled downsampling, and rank-guided block design. In addition, the model incorporates large-kernel convolutions and partial self-attention modules to enhance performance without significant computational costs.

Fig. 6. YOLOv10 architecture showcasing the dual label assignment strategy for improving accuracy and the PAN layer for enhancing feature representation, alongside a one-to-many head for regression and classification tasks and a one-to-one head for precise localization [64].
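Since NMS figures prominently in this comparison (YOLOv10 removes it, while the older families depend on it), a minimal greedy NMS sketch helps make the eliminated postprocessing step concrete. This is an illustrative pure-Python version, not Ultralytics' actual implementation; the corner-format boxes and the 0.5 overlap threshold are assumptions for the example.

```python
# Minimal greedy NMS sketch. Boxes are (x1, y1, x2, y2) corner coordinates.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop overlapping duplicates, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate detections of one object plus one distinct detection:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the 0.8 duplicate is suppressed -> [0, 2]
```

The loop over surviving candidates is exactly the postprocessing cost that YOLOv10's one-to-one head avoids: when each object already receives a single prediction, there is nothing to suppress.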
TABLE IV. YOLO versions and scaled versions.

The size metric reflects the actual disk size of the model and the number of its parameters.

These metrics are essential for providing a comprehensive overview of YOLO models' performance, allowing for effective comparison and evaluation. By employing these metrics, we can thoroughly assess the accuracy and efficiency of different YOLO model versions, ensuring a robust benchmark for their performance and application in various real-world scenarios.

IV. BENCHMARK RESULTS AND DISCUSSION

A. Results

1) Traffic Signs Dataset: Table VI presents a comparative analysis of the YOLO algorithms' performance on the Traffic Signs dataset, evaluated based on accuracy, computational efficiency, and model size. The Traffic Signs dataset is a medium-sized dataset with varied object sizes, making it favorable for benchmarking. The results highlight the effectiveness of YOLO models in detecting traffic signs, demonstrating a range of precision. The highest mAP50-95 was 0.799, while the lowest recorded was 0.64. On the other hand, the highest mAP50 is 0.893, while the lowest is 0.722. The substantial gap between the mAP50 and mAP50-95 results suggests that the models encounter difficulties in uniformly handling traffic signs of different sizes at higher IoU thresholds, reflecting areas for potential improvement in their detection algorithms.

TABLE VI. Evaluation results for the Traffic Signs dataset.

a) Accuracy: As illustrated in Figure 8, YOLOv5ul demonstrates the highest accuracy, achieving a mAP50 of 0.866 and a mAP50-95 of 0.799. This is followed by YOLO11m with a mAP50-95 of 0.795 and YOLO11l with a mAP50-95 of 0.794. In contrast, YOLOv10n exhibits the lowest precision, with a mAP50 of 0.722 and a mAP50-95 of 0.64, closely followed by YOLOv5un with a mAP50-95 of 0.665, as evidenced by the data points in Figure 8.

Fig. 8. mAP50 and mAP50-95 YOLO results on the Traffic Signs dataset. Each model is represented by two bars: the left bar shows the mAP50 score, while the right bar represents the mAP50-95 score.

b) Precision and Recall: Figure 9 elucidates the trade-off between precision and recall, taking the size of the models into consideration. Models such as YOLO11m, YOLOv10l, YOLOv9m, YOLOv5ux, and YOLO11l exhibit high precision and recall, specifically with YOLO11m achieving a precision of 0.898 and a recall of 0.826 while having a size of 67.9 Mb, and YOLOv10l achieving a precision of 0.873 and a recall of 0.807 with a significantly bigger size (126.8 Mb). In contrast, smaller models such as YOLOv10n (precision 0.722, recall 0.602), YOLOv8n (precision 0.749, recall 0.688), and YOLO11n (precision 0.768, recall 0.695) underperform in both metrics. This underscores the superior performance of larger models on the Traffic Signs dataset. Moreover, the high precision (0.849) and low recall (0.701) of YOLOv5um indicate a propensity for false negatives, while YOLOv3u's high recall (0.849) and low precision (0.75) suggest a tendency for false positives.

Fig. 9. Precision vs. recall based on size results on the Traffic Signs dataset. The size of each circle represents the size of the model, with larger circles indicating larger models.

c) Computational Efficiency: In terms of computational efficiency, YOLOv10n is the most efficient, with a processing time of 2 ms per image and a GFLOPs count of 8.3, as shown in Figures 10 and 11. YOLO11n closely trails this at 2.2 ms with a 6.4 GFLOPs count, followed by YOLOv3u-tiny with a processing time of 2.4 ms and a GFLOPs count of 19, making it relatively computationally inefficient compared to the other fast models. However, the data indicates that YOLOv9e, YOLOv9m, YOLOv9c, and YOLOv9s are the least efficient, with inference times of 16.1 ms, 12.1 ms, 11.6 ms, and 11.1 ms, and GFLOPs counts of 189.4, 76.7, 102.6, and 26.8, respectively. These findings delineate a clear trade-off between accuracy and computational efficiency.

Fig. 10. Total processing time results on the Traffic Signs dataset. Each bar represents the total processing time, divided into three sections: Preprocessing Time (bottom), Inference Time (middle), and Postprocessing Time (top).

Fig. 11. Total processing time and GFLOPs count results on the Traffic Signs dataset.

d) Overall Performance: When evaluating overall performance, which includes accuracy, size, and model efficiency, YOLO11m emerges as a consistently top-performing model. It achieves a mAP50-95 of 0.795, an inference time of 2.4 ms, a model size of 38.8 Mb, and a 67.9 GFLOPs count, as detailed in Figures 8, 10, and 11, and Table VI. This is followed by YOLO11l (mAP50-95 of 0.794, inference time of 4.6 ms, size of 49 Mb, and 86.8 GFLOPs count) and YOLOv10m (mAP50-95 of 0.781, inference time of 2.4 ms, size of 32.1 Mb, 63.8 GFLOPs count). These results highlight the robustness of these models in detecting traffic signs of various sizes while maintaining short inference times and small model sizes. Notably, the YOLO11 and YOLOv10 families significantly outperform other YOLO families in terms of accuracy and computational efficiency on this dataset, as their models consistently surpass counterparts from other families.

2) Africa Wildlife Dataset: The results in Table VII showcase the performance of the YOLO models on the Africa Wildlife dataset. This dataset contains large object sizes, focusing on the ability of YOLO models to predict large objects and their risk of overfitting due to the size of the dataset. The models demonstrate robust accuracy across the board, with the highest-performing models achieving a mAP50-95 ranging from 0.725 to 0.832. This relatively narrow range reflects the effectiveness of the models in detecting and classifying large wildlife objects by maintaining high accuracy.

TABLE VII. Evaluation results for the Africa Wildlife dataset.

a) Accuracy: As illustrated in Figure 12, YOLOv9s demonstrates exceptional performance with a high mAP50-95 of 0.832 and a mAP50 of 0.956, showcasing its robust accuracy across various IoU thresholds. YOLOv9c and YOLOv9t follow closely, with mAP50 scores of 0.96 and 0.948 and mAP50-95 scores of 0.83 and 0.825, respectively. These results highlight the YOLOv9 family's ability to effectively learn patterns from a small sample of images, making it particularly suited for smaller datasets. In contrast, YOLOv5un, YOLOv10n, and YOLOv3u-tiny show lower mAP50-95 scores of 0.791, 0.786, and 0.725, indicating their limitations in accuracy. The underperformance of larger models like YOLO11x, YOLOv5ux, YOLOv5ul, and YOLOv10l can be attributed to overfitting, especially given the small dataset size.

Fig. 12. mAP50 and mAP50-95 YOLO results on the Africa Wildlife dataset. Each model is represented by two bars: the left bar shows the mAP50 score, while the right bar represents the mAP50-95 score.

b) Precision and Recall: Figure 13 reveals that YOLOv8l and YOLO11l achieve the highest precision and recall, with values of 0.942 and 0.937 for precision, and 0.898 and 0.896 for recall, respectively. Notably, YOLOv8n achieves similar results (0.932 for precision, 0.908 for recall) with a compact size of 6.55 Mb, demonstrating its efficiency. In contrast, YOLOv3u and YOLOv5ul exhibit lower precision and recall scores (0.91 and 0.88 for YOLOv3u, 0.916 and 0.881 for YOLOv5ul), despite their larger sizes (204.86 Mb for YOLOv3u, 106.85 Mb for YOLOv5ul), which may be attributed to overfitting issues.

Fig. 13. Precision vs. recall based on size results on the Africa Wildlife dataset. The size of each circle represents the size of the model, with larger circles indicating larger models.

c) Computational Efficiency: YOLOv10n, YOLOv8n, and YOLOv3u-tiny are the fastest models, achieving processing times of 2 ms and 1.8 ms, with GFLOPs counts of 8.2 and 19.1, respectively. The first two models share the same processing speed and GFLOPs count, as showcased in Figures 14 and 15. Conversely, YOLOv9e exhibits the slowest processing time at 11.2 ms and a GFLOPs count of 189.3, followed by YOLOv5ux at 7.5 ms and a 246.2 GFLOPs count. These results indicate that larger models tend to require more processing time and hardware usage compared to smaller models, emphasizing the trade-off between model size and processing efficiency.

Fig. 14. Total processing time results on the Africa Wildlife dataset. Each bar represents the total processing time, divided into three sections: Preprocessing Time (bottom), Inference Time (middle), and Postprocessing Time (top).

Fig. 15. Total processing time and GFLOPs count results on the Africa Wildlife dataset.

d) Overall Performance: YOLOv9t and YOLOv9s consistently excel across all metrics, delivering high accuracy while maintaining small model sizes, low GFLOPs, and short inference times, as shown in Table VII and Figures 13, 14, and 15. This demonstrates the robustness of YOLOv9's smaller models and their effectiveness on small datasets. In contrast, YOLOv5ux and YOLO11x show suboptimal accuracy despite their larger sizes and longer inference times, likely due to overfitting. Most large models underperformed on this dataset, with the exception of YOLOv10x, which benefited from a modern architecture that prevents overfitting.

3) Ships and Vessels Dataset: Table VIII presents the performance of YOLO models on the Ships and Vessels dataset, a large dataset featuring tiny objects with varying rotations. Overall, the models demonstrated moderate effectiveness in detecting ships and vessels, with mAP50-95 ranging from 0.273 to 0.327. This performance suggests that YOLO algorithms may face challenges in accurately detecting smaller objects, and the dataset's diversity in object sizes and rotations provides a comprehensive test of the models' capabilities in these conditions.

TABLE VIII. Evaluation results for the Ships and Vessels dataset.

a) Accuracy: The disparity between mAP50-95 and mAP50, illustrated in Figure 16, underscores the challenges YOLO models face with higher IoU thresholds when detecting small objects. Additionally, YOLO models struggle with detecting objects of varying rotations. Among the models, YOLO11x achieved the highest accuracy, with a mAP50 of 0.529 and a mAP50-95 of 0.327, closely followed by YOLO11l, YOLO11m, and YOLO11s, which recorded mAP50 values of 0.529, 0.528, and 0.53, and mAP50-95 values of 0.327, 0.325, and 0.325, respectively. These results highlight the robustness of the YOLO11 family in detecting small and tiny objects. In contrast, YOLOv3u-tiny, YOLOv8n, YOLOv3u, and YOLOv5un exhibited the lowest accuracy, with mAP50 scores of 0.489, 0.515, 0.519, and 0.514, and mAP50-95 scores of 0.273, 0.297, 0.298, and 0.298, respectively. This suggests the outdated architecture of YOLOv3u and the potential underfitting of smaller models given the large dataset size.

Fig. 16. mAP50 and mAP50-95 YOLO results on the Ships and Vessels dataset. Each model is represented by two bars: the left bar shows the mAP50 score, while the right bar represents the mAP50-95 score.

b) Precision and Recall: Figure 17 indicates that YOLOv5ux outperformed other models, achieving a precision of 0.668 and a recall of 0.555. It was closely followed by YOLOv9m (precision of 0.668, recall of 0.551) and YOLOv8m (precision of 0.669, recall of 0.525), both of which are significantly smaller in size (40.98 Mb for YOLOv9m and 52.12 Mb for YOLOv8m). In contrast, YOLO11n and YOLOv10s exhibited lower performance, with precisions of 0.574 and 0.586 and recalls of 0.51 and 0.511, respectively, likely due to underfitting issues. Generally, YOLO11 models tended to produce false positives, reflected in their low precision and high recall. Meanwhile, YOLOv10 underperformed in both precision and recall, despite being one of the newest models in the YOLO family.

Fig. 17. Precision vs. recall based on size results on the Ships and Vessels dataset. The size of each circle represents the size of the model, with larger circles indicating larger models.

c) Computational Efficiency: As illustrated in Figures 18 and 19, YOLOv3u-tiny achieved the fastest processing time at 2 ms, closely followed by YOLOv8n and YOLOv5un, both recording 2.3 ms. YOLOv10 and YOLO11 models also excelled in speed, with YOLOv10n and YOLO11n achieving rapid inference times of 2.4 ms and 2.5 ms, along with GFLOPs counts of 8.2 and 6.3, respectively. In contrast, YOLOv9e exhibited the slowest speed, with an inference time of 7.6 ms and a GFLOPs count of 189.3, highlighting the trade-off between accuracy and efficiency within the YOLOv9 family.

Fig. 18. Total processing time results on the Ships and Vessels dataset. Each bar represents the total processing time, divided into three sections: Preprocessing Time (bottom), Inference Time (middle), and Postprocessing Time (top).

d) Overall Performance: The results in Table VIII and Figures 16, 17, and 18 demonstrate that YOLO11s and YOLOv10s excelled in accuracy while maintaining compact sizes, low GFLOPs, and quick processing times. In contrast, YOLOv3u, YOLOv8x, and YOLOv8l fell short of expectations despite their larger sizes and longer processing times. These findings highlight the robustness and reliability of the YOLO11 family, particularly in improving the YOLO family's performance in detecting small and tiny objects while ensuring efficient processing. Additionally, the results reveal the underperformance of YOLOv9 models when faced with large datasets and small objects, despite their modern architecture.

Fig. 19. Total processing time and GFLOPs count results on the Ships and Vessels dataset.
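The recurring gap between mAP50 and mAP50-95 in these results has a mechanical cause: a loosely localized box counts as a true positive at an IoU threshold of 0.50 but fails the stricter thresholds averaged into mAP50-95. A small illustrative sketch with toy boxes (not data from this study) makes the threshold sweep concrete:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

gt = (0.0, 0.0, 10.0, 10.0)    # ground-truth box
pred = (2.0, 0.0, 12.0, 10.0)  # slightly shifted prediction of the same object

overlap = iou(gt, pred)  # intersection 8*10 = 80, union 200 - 80 = 120
# mAP50 accepts this detection; the 0.70+ thresholds in the 0.50:0.95
# sweep reject it, which is what drags mAP50-95 down for loose boxes.
matches = {t / 100: overlap >= t / 100 for t in range(50, 100, 5)}
print(overlap, matches)
```

Small objects amplify this effect: a fixed localization error of a few pixels costs proportionally far more IoU on a tiny box than on a large one, which is consistent with the wide mAP50/mAP50-95 gap observed on the Ships and Vessels dataset.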
B. Discussion

Based on the performance of the models across the three datasets, we ranked them by accuracy, speed, GFLOPs count, and size, as shown in Table IX, to facilitate a comprehensive evaluation. For accuracy, the mAP50-95 metric was employed due to its capacity to assess models across a range of IoU thresholds, thus providing detailed insight into each model's performance. For speed, models were sorted based on the total processing time, which encompasses preprocessing, inference, and postprocessing durations. The rankings range from Rank 1, indicating the highest performance, to Rank 28, denoting the lowest, with the respective rankings highlighted in bold within the table.

TABLE IX
OVERALL RANKING OF YOLO ALGORITHMS

Version      | Accuracy Rank | Speed Rank | GFLOPs Rank | Size Rank
YOLOv3u-tiny | 28 | 1  | 6  | 11
YOLOv3u      | 20 | 24 | 28 | 28
YOLOv5un     | 27 | 6  | 2  | 4
YOLOv5us     | 24 | 7  | 8  | 9
YOLOv5um     | 17 | 15 | 13 | 18
YOLOv5ul     | 14 | 19 | 21 | 23
YOLOv5ux     | 17 | 27 | 26 | 27
YOLOv8n      | 26 | 5  | 4  | 5
YOLOv8s      | 23 | 9  | 11 | 10
YOLOv8m      | 15 | 17 | 16 | 20
YOLOv8l      | 13 | 22 | 22 | 22
YOLOv8x      | 8  | 26 | 27 | 26
YOLOv9t      | 20 | 12 | 3  | 1
YOLOv9s      | 7  | 15 | 10 | 6
YOLOv9m      | 4  | 21 | 15 | 15
YOLOv9c      | 9  | 25 | 19 | 19
YOLOv9e      | 12 | 28 | 24 | 25
YOLOv10n     | 25 | 2  | 5  | 3
YOLOv10s     | 19 | 3  | 9  | 7
YOLOv10m     | 5  | 10 | 12 | 12
YOLOv10b     | 9  | 12 | 18 | 14
YOLOv10l     | 11 | 17 | 20 | 17
YOLOv10x     | 2  | 22 | 23 | 21
YOLO11n      | 22 | 3  | 1  | 2
YOLO11s      | 16 | 8  | 7  | 8
YOLO11m      | 1  | 11 | 14 | 13
YOLO11l      | 3  | 14 | 17 | 16
YOLO11x      | 5  | 19 | 25 | 24

The analysis of Table IX yields several critical observations:

1) Accuracy: YOLO11m consistently emerged as a top performer, frequently ranking among the highest, closely followed by YOLOv10x, YOLO11l, YOLOv9m, and YOLO11x. This underscores the robust performance of the YOLO11 family across varying IoU thresholds and object sizes, which can be attributed to their use of C2PSA for the preservation of contextual information, leading to improved convergence and overall performance. In addition, the implementation of large-kernel convolutions and partial self-attention modules helped increase the performance of the algorithm.

Conversely, YOLOv3u-tiny exhibited the lowest accuracy, particularly on the Africa Wildlife and Ships and Vessels datasets, with YOLOv5un and YOLOv8n showing slightly better but still sub-par results. This suggests that YOLO11 models are currently the most reliable for applications demanding high accuracy.

Closely following the performance of the YOLO11 family, the YOLOv9 models demonstrate their effectiveness in detecting objects across various sizes and different IoU thresholds. However, they may struggle with small objects, as seen in the Ships and Vessels dataset. In contrast, the YOLOv10 family, despite its later introduction, exhibited relatively lower accuracy on the Traffic Signs and Africa Wildlife datasets, resulting in an average accuracy drop of 2.075% compared to the YOLOv9 models on those datasets. The slight underperformance of YOLOv10 can be attributed to its adoption of the One-to-One Head approach instead of Non-Maximum Suppression (NMS) for defining bounding boxes. This strategy can struggle to capture objects effectively, particularly when dealing with overlapping items, as it relies on a single prediction per object. This limitation helps explain the relatively subpar results observed in the second dataset.

Similarly, the outdated architecture of YOLOv3u contributed to its inferior performance, averaging 6.5% lower accuracy than the YOLO11 models. This decline can be traced back to its reliance on the older Darknet-53 framework, first introduced in 2018, which may not adequately address contemporary detection challenges.

2) Computational Efficiency: YOLOv10n consistently outperformed other models in terms of speed and GFLOPs count, ranking among the top across all three datasets in terms of speed and 5th in terms of GFLOPs count. YOLOv3u-tiny, YOLOv10s, and YOLO11n also demonstrated notable computational efficiency.

YOLOv9e exhibited the slowest inference times and a very high GFLOPs count across the datasets, illustrating the trade-off between accuracy and efficiency. YOLO11's speed improvements, attributable to its use of the C3k2 block, make it suitable for applications where rapid processing is essential, surpassing the YOLOv10 and YOLOv9 models in terms of speed by 1.41% and 31% on average, respectively.

While YOLOv9 models excelled in accuracy, their inference times were among the slowest, making them less ideal for time-sensitive applications. This positions YOLOv9 as a viable choice for applications where precision is prioritized over speed. In contrast, YOLOv10 models, though slightly slower than the YOLO11 variants, still offer a commendable balance between efficiency and speed. Their performance is well-suited for time-sensitive scenarios, providing rapid processing without significantly sacrificing accuracy, making them a viable option for real-time applications.

3) Model Size: YOLOv9t was the smallest model, ranking first across all three datasets, followed by YOLO11n and YOLOv10n. This efficiency in model size underscores the advancements in newer YOLO versions, especially YOLOv10, showcasing the effectiveness of implementing Spatial-Channel Decoupled Downsampling for efficient parameter utilization.

In addition, YOLOv8 and YOLOv5u exhibited competitive results, surpassing YOLOv3u in accuracy, which is likely due to YOLOv3u's older architecture. However, their accuracy still fell significantly short compared to the newer models, such as YOLOv9, YOLOv10, and YOLO11. While YOLOv8 and YOLOv5u had faster processing times than YOLOv9, their overall performance remains inferior to that of the newer models.

5) Object Size and Rotation Detection: The YOLO algorithm is effective in detecting large and medium-sized objects, as evidenced by its high accuracy on the Africa Wildlife and Traffic Signs datasets. However, it struggles with small object detection, probably due to its division of images into grids, which makes identifying small, low-resolution objects challenging. In addition, YOLO faces challenges when handling objects of different rotations due to its inability to tightly enclose rotated objects, leading to sub-par results overall.

To handle rotated objects, models such as YOLO11 OBB [26] and YOLOv8 OBB [25] (Oriented Bounding Box) can be implemented. Keeping the same foundational architecture as the standard YOLOv8 and YOLO11, YOLOv8 OBB and YOLO11 OBB replace the standard bounding box prediction head with one that predicts the four corner points of a rotated rectangle, allowing for more accurate localization and representation of arbitrarily oriented objects.

6) The Rise of YOLO11 Over YOLOv8: Although YOLOv8 [25] has been the algorithm of choice for its versatility in tasks such as pose estimation, instance segmentation, and oriented object detection (OBB), YOLO11 [26] has now emerged as a more efficient and accurate alternative. With its ability to handle the same tasks while offering improved contextual understanding and better architectural modules, YOLO11 sets a new standard in performance, surpassing YOLOv8 in both speed and accuracy across various applications.

7) Dataset Size: The size of the dataset significantly influ-
YOLOv3u was the largest model, highlighting its ineffi- ences the performance of YOLO models. For instance, large
ciency compared to its more modern counterparts due to its models did not perform optimally on the small African wildlife
outdated architecture. dataset compared to their results on the Traffic Signs and Ships
4) Overall Performance: Considering accuracy, speed, and Vessels datasets due to being more prone to overfitting.
size, and GFLOPs, YOLO11m, YOLOv11n, YOLO11s, and Conversely, small models such as YOLOv9t and YOLOv9s
YOLOv10s emerged as the most consistent performers. They performed significantly better on the Africa Wildlife dataset
achieved high accuracy, low processing time and power, and compared to their results on the other datasets, showcasing the
efficient disk usage, making them suitable for a wide range of effectiveness of small-scaled models when handling limited
applications where both speed and accuracy are crucial. datasets.
Conversely, YOLOv9e, YOLOv5ux, and YOLOv3u demon- 8) Impact of Training Datasets: The performance of YOLO
strated poor results across all metrics, being computation- models is influenced by the training datasets used, as shown
ally inefficient and underperforming relative to their sizes. in Tables VI, VII, and VIII. Different datasets yield varying
YOLO11 models showed the best overall performance, likely results and top performers, indicating that dataset complexity
due to recent enhancements such as the C3k2 block and affects algorithm performance. This underscores the impor-
C2PSA module. Following closely, YOLOv10 models, despite tance of using diverse datasets during benchmarking to obtain
slightly underperforming in accuracy excelled in efficiency comprehensive results on the strengths and limitations of each
thanks to its use of implementation of One-to-One head for model.
prediction. While YOLOv9 showed underperformance in com- This discussion highlights the need for a balanced consid-
putational efficiency, it remains competitive with YOLOv10 eration of accuracy, speed, and model size when selecting
and YOLO11 in terms of accuracy, thanks to its PGI in- YOLO models for specific applications. The consistent perfor-
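The balance between accuracy, speed, and model size discussed in this section can be made concrete with a simple weighted ranking. The sketch below is illustrative only: the helper `rank_models`, the metric weights, and the sample numbers are our own assumptions, not measurements from this study.

```python
def rank_models(candidates, w_acc=0.5, w_speed=0.3, w_size=0.2):
    """Rank models by a weighted score of normalised benchmark metrics.

    `candidates` maps a model name to (mAP50-95, latency in ms, size in MB).
    Higher mAP is better, so it is normalised against the best mAP;
    lower latency and size are better, so they are normalised as
    best / value. All three terms therefore lie in (0, 1].
    """
    best_map = max(m for m, _, _ in candidates.values())
    best_lat = min(l for _, l, _ in candidates.values())
    best_size = min(s for _, _, s in candidates.values())
    scores = {
        name: w_acc * (m / best_map)
        + w_speed * (best_lat / l)
        + w_size * (best_size / s)
        for name, (m, l, s) in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative numbers only, not results from this benchmark.
models = {
    "yolo11m": (0.62, 4.7, 38.8),
    "yolov10n": (0.53, 1.8, 5.5),
    "yolov9e": (0.64, 14.0, 112.0),
}
print(rank_models(models))  # the small, fast model wins under these weights
```

Shifting the weights toward accuracy (e.g. `w_acc=1.0`) would instead favour the larger, slower, but more accurate model, mirroring the precision-versus-speed trade-off noted above for YOLOv9.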
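The oriented-bounding-box output discussed in the object size and rotation subsection can be illustrated with a small geometric sketch. The helper `obb_corners` below is hypothetical (it is not part of the Ultralytics API); it converts the common (centre, size, angle) parameterisation of a rotated rectangle into the four corner points that an OBB head predicts.

```python
import math


def obb_corners(cx, cy, w, h, angle_rad):
    """Four corner points of a rotated rectangle.

    (cx, cy) is the box centre, (w, h) its width and height, and
    angle_rad the counter-clockwise rotation angle in radians.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets of the axis-aligned box before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset about the origin, then translate to the centre.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# An axis-aligned 4x2 box centred at (10, 10): angle 0 leaves it unrotated.
print(obb_corners(10, 10, 4, 2, 0.0))
# [(8.0, 9.0), (12.0, 9.0), (12.0, 11.0), (8.0, 11.0)]
```

With a non-zero angle the same four points trace the tilted rectangle, which is why OBB variants can localize arbitrarily oriented objects that an axis-aligned box would only loosely enclose.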
The consistent performance of the YOLO11 models across various metrics makes them highly recommended for versatile situations where accuracy and speed are essential. Meanwhile, YOLOv10 models can perform similarly while achieving faster processing times and smaller model sizes. Additionally, YOLOv9 can deliver comparable results in terms of accuracy but sacrifices speed, making it suitable for applications where precision is prioritized over rapid processing.

V. CONCLUSION

This benchmark study thoroughly evaluates the performance of various YOLO algorithms. It pioneers a comprehensive comparison of YOLO11 against its predecessors, evaluating their performance across three diverse datasets: Traffic Signs, African Wildlife, and Ships and Vessels. The datasets were carefully selected to encompass a wide range of object properties, including varying object sizes, aspect ratios, and object densities. We showcase the strengths and weaknesses of each YOLO version and family by examining a wide range of metrics such as Precision, Recall, Mean Average Precision (mAP), Processing Time, GFLOPs count, and Model Size. Our study addresses the following key research questions:

• Which YOLO algorithm demonstrates superior performance across a comprehensive set of metrics?
• How do different YOLO versions perform on datasets with diverse object characteristics, such as size, aspect ratio, and density?
• What are the specific strengths and limitations of each YOLO version, and how can these insights inform the selection of the most suitable algorithm for various applications?

In particular, the YOLO11 family emerged as the most consistent, with YOLO11m striking an optimal balance between accuracy, efficiency, and model size. While YOLOv10 delivered slightly lower accuracy than YOLO11, it excelled in speed and efficiency, making it a strong choice for applications requiring fast, efficient processing. Additionally, YOLOv9 performed well overall and stood out particularly on smaller datasets. These findings provide valuable insights for industry and academia, guiding the selection of the most suitable YOLO algorithms and informing future developments and enhancements. While the evaluated algorithms demonstrate promising performance, there is still room for refinement. Future research could focus on optimizing YOLOv10 to enhance its accuracy while preserving its speed and efficiency advantage. Additionally, continued advancements in architectural design may pave the way for even more groundbreaking YOLO algorithms. Our future work includes in-depth studies of the identified gaps in these algorithms, along with proposed improvements to demonstrate their potential impact on overall efficiency.

REFERENCES

[1] Oluibukun Ajayi, John Ashi, and Blessed Guda. Performance evaluation of YOLO v5 model for automatic crop and weed classification on UAV images. Smart Agricultural Technology, 5:100231, 04 2023.
[2] Bader Aldughayfiq, Farzeen Ashfaq, NZ Jhanjhi, and Mamoona Humayun. YOLO-based deep learning model for pressure ulcer detection and classification. In Healthcare, volume 11, page 1222. MDPI, 2023.
[3] Alaa Ali and Magdy A Bayoumi. Towards real-time DPM object detector for driver assistance. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3842–3846. IEEE, 2016.
[4] Isaiah Francis E Babila, Shawn Anthonie E Villasor, and Jennifer C Dela Cruz. Object detection for inventory stock counting using YOLOv5. In 2022 IEEE 18th International Colloquium on Signal Processing & Applications (CSPA), pages 304–309. IEEE, 2022.
[5] Chetan Badgujar, Daniel Flippo, Sujith Gunturu, and Carolyn Baldwin. Tree trunk detection of eastern red cedar in rangeland environment with deep learning technique. Croatian Journal of Forest Engineering, 44, 06 2023.
[6] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934, 2020.
[7] Yining Cao, Chao Li, Yakun Peng, and Huiying Ru. MCS-YOLO: A multiscale object detection method for autonomous driving road environment recognition. IEEE Access, 11:22342–22354, 2023.
[8] Libo Cheng, Jia Li, Ping Duan, and Mingguo Wang. A small attentional YOLO model for landslide detection from satellite remote sensing images. Landslides, 18(8):2751–2765, 2021.
[9] Yuan Dai, Weiming Liu, Haiyu Li, and Lan Liu. Efficient foreign object detection between PSDs and metro doors via deep neural networks. IEEE Access, PP:1–1, 03 2020.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893, 2005.
[11] Sheshang Degadwala, Dhairya Vyas, Utsho Chakraborty, Abu Raihan Dider, and Haimanti Biswas. YOLO-v4 deep learning model for medical face mask detection. In 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), pages 209–213. IEEE, 2021.
[12] Tausif Diwan, G Anirudh, and Jitendra V Tembhurne. Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimedia Tools and Applications, 82(6):9243–9275, 2023.
[13] Yunus Egi, Mortaza Hajyzadeh, and Engin Eyceyurt. Drone-computer communication based tomato generative organ counting model using YOLO v5 and Deep-SORT. Agriculture, 12:1290, 08 2022.
[14] Loddo Fabio, Dario Piga, Michelucci Umberto, and El Ghazouali Safouane. BenchCloudVision: A benchmark analysis of deep learning approaches for cloud detection and segmentation in remote sensing imagery. arXiv preprint arXiv:2402.13918, 2024.
[15] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010.
[16] Di Feng, Ali Harakeh, Steven L Waslander, and Klaus Dietmayer. A review and comparative study on probabilistic object detection in autonomous driving. IEEE Transactions on Intelligent Transportation Systems, 23(8):9961–9980, 2021.
[17] Rongli Gai, Na Chen, and Hai Yuan. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Computing and Applications, 35(19):13895–13906, 2023.
[18] Dweepna Garg, Parth Goel, Sharnil Pandya, Amit Ganatra, and Ketan Kotecha. A deep learning approach for face detection using YOLO. In 2018 IEEE Punecon, pages 1–4. IEEE, 2018.
[19] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation, 2014.
[20] Juan Guerrero-Ibáñez, Sherali Zeadally, and Juan Contreras-Castillo. Sensor technologies for intelligent transportation systems. Sensors, 18(4), 2018.
[21] Muhammad Hussain. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines, 11(7):677, 2023.
[22] Muhammad Hussain. YOLOv1 to v8: Unveiling each variant – a comprehensive review of YOLO. IEEE Access, 12:42816–42833, 2024.
[23] Rasheed Hussain and Sherali Zeadally. Autonomous cars: Research results, issues, and future challenges. IEEE Communications Surveys & Tutorials, 21(2):1275–1313, 2019.
[24] Glenn Jocher. Ultralytics YOLOv5, 2020.
[25] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLOv8, 2023.
[26] Glenn Jocher and Jing Qiu. Ultralytics YOLO11, 2024.
[27] Chang Ho Kang and Sun Young Kim. Real-time object detection and segmentation technology: an analysis of the YOLO algorithm. JMST Advances, 5(2):69–76, 2023.
[28] Nyoman Karna, Made Adi Paramartha Putra, Syifa Rachmawati, Mideth Abisado, and Gabriel Sampedro. Toward accurate fused deposition modeling 3D printer fault detection using improved YOLOv8 with hyperparameter optimization. IEEE Access, PP:1–1, 01 2023.
[29] Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976, 2022.
[30] Guofa Li, Zefeng Ji, Xingda Qu, Rui Zhou, and Dongpu Cao. Cross-domain object detection for autonomous driving: A stepwise domain adaptative YOLO approach. IEEE Transactions on Intelligent Vehicles, 7(3):603–615, 2022.
[31] Min Li, Zhijie Zhang, Liping Lei, Xiaofan Wang, and Xudong Guo. Agricultural greenhouses detection in high-resolution satellite images based on convolutional neural networks: Comparison of Faster R-CNN, YOLO v3 and SSD. Sensors, 20(17):4938, 2020.
[32] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection, 2018.
[33] Martina Lippi, Niccolò Bonucci, Renzo Fabrizio Carpio, Mario Contarini, Stefano Speranza, and Andrea Gasparri. A YOLO-based pest detection system for precision agriculture. In 2021 29th Mediterranean Conference on Control and Automation (MED), pages 342–347. IEEE, 2021.
[34] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, and Matti Pietikäinen. Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128:261–318, 2020.
[35] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot MultiBox Detector, pages 21–37. Springer International Publishing, 2016.
[36] Jueal Mia, Hasan Imam Bijoy, Shoreef Uddin, and Dewan Mamun Raza. Real-time herb leaves localization and classification using YOLO. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pages 1–7. IEEE, 2021.
[37] Hamzeh Mirhaji, Mohsen Soleymani, Abbas Asakereh, and Saman Abdanan Mehdizadeh. Fruit detection and load estimation of an orange orchard using the YOLO models through simple approaches in different imaging and illumination conditions. Computers and Electronics in Agriculture, 191:106533, 2021.
[38] Miand Mostafa and Milad Ghantous. A YOLO based approach for traffic light recognition for ADAS systems. In 2022 2nd International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), pages 225–229. IEEE, 2022.
[39] Huy Hoang Nguyen, Thi Nhung Ta, Ngoc Cuong Nguyen, Hung Manh Pham, Duc Minh Nguyen, et al. YOLO based real-time human detection for smart video surveillance at the edge. In 2020 IEEE Eighth International Conference on Communications and Electronics (ICCE), pages 439–444. IEEE, 2021.
[40] Radu Oprea. Traffic signs detection europe dataset. https://universe.roboflow.com/radu-oprea-r4xnm/traffic-signs-detection-europe, Feb 2024. Visited on 2024-07-12.
[41] Rafael Padilla, Sergio L Netto, and Eduardo AB Da Silva. A survey on performance metrics for object-detection algorithms. In 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), pages 237–242. IEEE, 2020.
[42] Govind S Patel, Ashish A Desai, Yogesh Y Kamble, Ganesh V Pujari, Priyanka A Chougule, and Varsha A Jujare. Identification and separation of medicine through robot using YOLO and CNN algorithms for healthcare. In 2023 International Conference on Artificial Intelligence for Innovations in Healthcare Industries (ICAIIHI), volume 1, pages 1–5. IEEE, 2023.
[43] Paul Paul Tsoi. YOLO11: The cutting-edge evolution in object detection – a brief review of the latest in the YOLO series. https://medium.com, October 2024. Accessed: 2024-10-17.
[44] Minh-Tan Pham, Luc Courtrai, Chloé Friguet, Sébastien Lefèvre, and Alexandre Baussard. YOLO-Fine: One-stage detector of small objects under various backgrounds in remote sensing images. Remote Sensing, 12(15):2501, 2020.
[45] Francesco Prinzi, Marco Insalaco, Alessia Orlando, Salvatore Gaglio, and Salvatore Vitabile. A YOLO-based model for breast cancer detection in mammograms. Cognitive Computation, 16(1):107–120, 2024.
[46] Sovit Rath. YOLOv8 Ultralytics: State-of-the-art YOLO models. LearnOpenCV – Learn OpenCV, PyTorch, Keras, Tensorflow with Examples and Tutorials, 2023.
[47] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[48] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7263–7271, 2017.
[49] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[50] Arunabha M Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay Raj. WilDect-YOLO: An efficient and robust computer vision-based accurate object localization model for automated endangered wildlife detection. Ecological Informatics, 75:101919, 2023.
[51] Arunabha Mohan Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay Raj. A computer vision-based object localization model for endangered wildlife detection. Ecological Economics, Forthcoming, 2022.
[52] Siddharth Sah. Ships/vessels in aerial images. https://www.kaggle.com/datasets/siddharthkumarsah/ships-in-aerial-images/data, July 2023. Visited on 2024-07-12.
[53] Ranjan Sapkota, Rizwan Qureshi, Marco Flores Calero, Muhammad Hussain, Chetan Badjugar, Upesh Nepal, Alwin Poulose, Peter Zeno, Uday Bhanu Prakash Vaddevolu, Hong Yan, et al. YOLOv10 to its genesis: A decadal and comprehensive review of the You Only Look Once series. arXiv preprint arXiv:2406.19407, 2024.
[54] Abhishek Sarda, Shubhra Dixit, and Anupama Bhan. Object detection for autonomous driving using YOLO [You Only Look Once] algorithm. In 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pages 1370–1374. IEEE, 2021.
[55] Maged Shoman, Gabriel Lanzaro, Tarek Sayed, and Suliman Gargoum. Autonomous vehicle-pedestrian interaction modeling platform: A case study in four major cities. Journal of Transportation Engineering, Part A: Systems, 06 2024.
[56] Maged Shoman, Dongdong Wang, Armstrong Aboah, and Mohamed Abdel-Aty. Enhancing traffic safety with parallel dense video captioning for end-to-end event analysis, 2024.
[57] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition, 2015.
[58] Mupparaju Sohan, Thotakura Sai Ram, Rami Reddy, and Ch Venkata. A review on YOLOv8 and its advancements. In International Conference on Data Intelligence and Cognitive Informatics, pages 529–545. Springer, 2024.
[59] Suranaree University of Technology. Africa wild life dataset. https://universe.roboflow.com/suranaree-university-of-technology-wqhl6/africa-wild-life, Feb 2023. Visited on 2024-07-12.
[60] Ultralytics. YOLOv5: A state-of-the-art real-time object detection system. https://docs.ultralytics.com, 2021. Accessed: insert date here.
[61] Amir Ulykbek, Azamat Serek, and Magzhan Zhailau. A comprehensive review of object detection in YOLO: Evolution, variants, and applications.
[62] NL Vidya, M Meghana, P Ravi, and Nithin Kumar. Virtual fencing using YOLO framework in agriculture field. In 2021 Third International Conference on Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), pages 441–446. IEEE, 2021.
[63] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I–I, 2001.
[64] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458, 2024.
[65] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7464–7475, 2023.
[66] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616, 2024.
[67] Yifan Wang, Lin Yang, Hong Chen, Aamir Hussain, Congcong Ma, and Malek Al-gabri. Mushroom-YOLO: A deep learning algorithm for mushroom growth recognition based on improved YOLOv5 in Agriculture 4.0. In 2022 IEEE 20th International Conference on Industrial Informatics (INDIN), pages 239–244. IEEE, 2022.
[68] Tingting Zhao, Xiaoli Yi, Zhiyong Zeng, and Tao Feng. MobileNet-YOLO based wildlife detection model: A case study in Yunnan Tongbiguan Nature Reserve, China. Journal of Intelligent & Fuzzy Systems, 41(1):2171–2181, 2021.
[69] Yifei Zheng and Hongling Zhang. Video analysis in sports by lightweight object detection network under the background of sports