0% found this document useful (0 votes)

80 views20 pages

Evaluating The Evolution of YOLO You Only Look Onc

Uploaded by

Aimin Nur

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

80 views20 pages

Evaluating The Evolution of YOLO You Only Look Onc

Uploaded by

Aimin Nur

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 20

1

Evaluating the Evolution of YOLO (You Only Look

Once) Models: A Comprehensive Benchmark Study
of YOLO11 and Its Predecessors
Nidhal Jegham, Chan Young Koh, Marwan Abdelatti, and Abdeltawab Hendawi

Abstract—This study presents a comprehensive benchmark mary approach. These methods are usually classified into two
analysis of various YOLO (You Only Look Once) algorithms,
arXiv:2411.00201v1 [cs.CV] 31 Oct 2024

categories: one-stage and two-stage approaches.

from YOLOv3 to the newest addition. It represents the first re- One-stage methods such as RetinaNet [32] and SSD (Sin-
search to comprehensively evaluate the performance of YOLO11,
the latest addition to the YOLO family. It evaluates their per- gle Shot MultiBox Detector) [35] perform detection in a
formance on three diverse datasets: Traffic Signs (with varying single pass, balancing speed and accuracy. In contrast, two-
object sizes), African Wildlife (with diverse aspect ratios and at stage methods, such as Region-based Convolutional Neural
least one instance of the object per image), and Ships and Vessels Networks (R-CNN) [19], generate region proposals and then
(with small-sized objects of a single class), ensuring a comprehen- perform classification, offering high precision but being com-
sive assessment across datasets with distinct challenges. To ensure
a robust evaluation, we employ a comprehensive set of metrics, putationally intensive.
including Precision, Recall, Mean Average Precision (mAP), Among one-stage object detection methods, YOLO (You
Processing Time, GFLOPs count, and Model Size. Our analysis Only Look Once) stands out for its robustness and effi-
highlights the distinctive strengths and limitations of each YOLO ciency. Initially introduced in 2015 by Redmon et al. [21],
version. For example: YOLOv9 demonstrates substantial accu- YOLO redefined object detection by predicting bounding
racy but struggles with detecting small objects and efficiency
whereas YOLOv10 exhibits relatively lower accuracy due to boxes and class probabilities directly from full images in
architectural choices that affect its performance in overlapping a single evaluation [47]. This innovative approach allowed
object detection but excels in speed and efficiency. Additionally, YOLOv1 to achieve real-time object detection with impressive
the YOLO11 family consistently shows superior performance in accuracy. Building upon this foundation, YOLOv2 [48] incor-
terms of accuracy, speed, computational efficiency, and model porated several key enhancements. It integrated the Darknet-
size. YOLO11m achieved a remarkable balance of accuracy
and efficiency, scoring mAP50-95 scores of 0.795, 0.81, and 19 framework, a 19-layer convolutional neural network that
0.325 on the Traffic Signs, African Wildlife, and Ships datasets, improved feature extraction. YOLOv2 also introduced batch
respectively, while maintaining an average inference time of normalization and employed data augmentation techniques
2.4ms, a model size of 38.8Mb, and around 67.6 GFLOPs on inspired by the VGG architecture [57] to enhance the model’s
average. These results provide critical insights for both industry generalization. YOLOv3 [49] further advanced the model with
and academia, facilitating the selection of the most suitable
YOLO algorithm for diverse applications and guiding future the Darknet-53 framework, a deeper network that significantly
enhancements. improved feature extraction capabilities. This version also
utilized a Feature Pyramid Network (FPN)-inspired design,
Index Terms—YOLO (You Only Look Once), YOLO11,
YOLOv10, Object detection, Ultralytics, Benchmark Analysis. which allowed for better detection across various object scales
by combining high-level semantic features with low-level
detailed features, and a Three-Scale detection mechanism that
improved accuracy for objects of different sizes.
I. I NTRODUCTION AND P OSITIONING Following YOLOv3, the model’s development branched into
various communities, leading to several notable iterations.
Object detection is an essential component of computer YOLOv4 [6], developed by Bochkovskiy et al., introduced
vision systems, enabling automated identification and local- enhancements such as Spatial Pyramid Pooling (SPP) and the
ization of objects within images or video frames [34]. Its Path Aggregation Network (PAN). SPP aggregates features
applications span from autonomous driving and robotics [16] from multiple scales, preserving spatial information, while
[5] [20] [56] to inventory management, video surveillance, and PAN improves the fusion of features between layers, resulting
sports analysis [4] [23] [55] [69]. in improved speed and accuracy. YOLOv5 [60] marked a
Over the years, object detection has developed significantly. significant transition by moving from the Darknet framework
Initially, traditional methods such as the Viola-Jones algorithm to PyTorch, a popular deep learning library. This transition
[63] and the Deformable Part-based Model (DPM) [15] used made the model more accessible and easier to customize. The
handcrafted features and were effective for applications such architecture incorporated strided convolution layers, which
as face detection [63], pedestrian detection [10], and video reduced computational load, and Spatial Pyramid Pooling Fast
surveillance [3]. However, these methods had limitations in (SPPF) layers, optimizing memory usage while maintaining
robustness and generalization. With the advancement of deep high performance. YOLOv6 and YOLOv7 continued this tra-
learning, network-based methods have since become the pri- jectory with innovative architectures. YOLOv6 [29] introduced
2

YOLOv3 Scaled-YOLOv4 YOLOv6 YOLO-NAS YOLOv11

Farhadi, A., & Redmon, J. Wang, C. Y., Bochkovskiy, A., et al. Li, Chuyi et al. "YOLOv6: A single- DECI-AI Glenn Jocher (Ultralytics)
"Yolov3: An incremental "Scaled-YOLOv4: Scaling cross stage object detection framework YOLO-NAS - GitHub YOLOv11 GitHub
improvement" stage partial network" for industrial applications."

YOLO is introduced YOLOS YOLOv9

Redmon, J. "You only look YOLOv5 Fang, Y. et al. "You only look at one YOLOv8 Wang, C. Y. et al. "Yolov9: Learning
once: Uni�ed, real-time Glenn Jocher (Ultralytics) sequence: Rethinking transformer Glenn Jocher (Ultralytics) what you want to learn using
object detection". YOLOv5 GitHub in vision through object detection. YOLOv8 GitHub programmable gradient information"

Jun 8, 2015 Apr 8, 2018 Jun 9, 2020 Nov 16, 2020 Jun 1, 2021 Jun,1,2022
Jun 2021 Jan 10, 2023 May 2, 2023 Feb 21, 2024 Sep 30, 2024

Dec 25, 2016 Apr 23, 2020 Jul 23, 2020 May 10, 2021 Jul 18, 2021 Jul 6, 2022 Jan 13, 2023 Jan 30, 2024 May 23, 2024

YOLOv2 (YOLO9000) PP-YOLO YOLOX YOLOv6 3.0 YOLOv10

Redmon, J., & Farhadi, A. Long, X. et al. "PP-YOLO: An Zheng Ge et al. "YOLOX: Exceeding Li, Chuyi et al. "Yolov6 v3. 0: A full- Wang, A., Chen et al. "Yolov10: Real-
"YOLO9000: better, faster, stronger" e�ective and e�cient YOLO Series in 2021" scale reloading" time end-to-end object detection"
implementation of object detector"

YOLOv4 YOLOR YOLOv7 YOLO-World

Bochkovskiy, A., et al. "Yolov4: Wang, C. Y. et al. "You only learn one Wang, C. Y., Bochkovskiy, A., et al. Cheng, T. et al. "Yolo-world: Real-
Optimal speed and accuracy of representation: Uni�ed network for "YOLOv7: Trainable bag-of-freebies time open-vocabulary object
object detection" multiple tasks" sets new state-of-the-art for real- detection"
time object detectors"

Fig. 1. Evolution of YOLO Algorithms throughout the years.

RepVGG, an architecture that simplified convolutional layers block, delivering improved performance without compromis-
during inference, and CSPStackRep blocks, which improve ing speed. Additionally, it introduces the C2PSA (Cross Stage
accuracy by splitting the feature map into two parts to process Partial with Spatial Attention) module, which improves spatial
them separately. In addition, YOLOv6 employed a hybrid attention in feature maps, increasing accuracy, especially for
channel strategy for better feature representation. YOLOv7 small and overlapping objects.
[65] leveraged the Extended Efficient Layer Aggregation Net-
This object detection algorithm has undergone several de-
work (E-ELAN), a novel architecture that improved efficiency
velopments as seen in Figure 1 achieving competitive results
and effectiveness by enhancing information flow between
in terms of accuracy and speed, making it the preferred
layers.
algorithm in various fields such as ADAS (Advanced Driver-
Assist System) [47], video surveillance [38], face detection
The most recent versions of YOLO, including YOLOv8,
[39], and many more [18]. For instance, YOLO plays a crucial
YOLOv9, YOLOv10, and YOLO11 represent the forefront
role in the agriculture field by being implemented in numerous
of the model’s development. YOLOv8 [58], released by Ul-
applications such as crop classification [1] [17], pest detection
tralytics, introduced semantic segmentation capabilities, al-
[33], automated farming [67] [37], and virtual fencing [62].
lowing the model to classify each pixel of an image, and
Moreover, YOLO has been utilized on numerous occasions in
provided scalable versions to meet various application needs,
the field of healthcare such as cancer detection [?] [45], ulcer
from resource-constrained environments to high performance
detection [2], medicine classification [36] [42], and health
systems alongside other tasks such as pose estimation, image
protocols enforcement [11].
classification, and oriented object detection (OOB). YOLOv9
[66] built on its predecessors’ architectural advancements In recent years, Ultralytics has played a crucial role in the
with Programmable Gradient Information (PGI), which op- advancement of YOLO by maintaining, improving, and mak-
timizes gradient flow during training, and the Generalized ing these models more accessible [46]. Notably, Ultralytics has
Efficient Layer Aggregation Network (GELAN), which further streamlined the process of fine-tuning and customizing YOLO
improved performance by enhancing layer information flow. models, a task that was considerably more complex in earlier
YOLOv10 [64], developed by Tsinghua University, eliminated iterations. The introduction of user-friendly interfaces, com-
the need for Non-Maximum Suppression (NMS) used by its prehensive documentation, and pre-built modules has greatly
predecessors, a technique used to eliminate duplicate predic- simplified essential tasks such as data augmentation, model
tions and pick the bounding boxes with the most confidence, training, and evaluation. Moreover, the development of scal-
by introducing a dual assignment strategy in its training proto- able model versions allows users to select models tailored
col. Additionally, YOLOv10 features lightweight classification to specific resource constraints and application requirements,
heads, spatial-channel decoupled downsampling, and rank- thereby facilitating more effective fine-tuning. For instance,
guided block design, making it one of the most efficient and YOLOv8n is favorable over YOLOv8m in scenarios where
effective YOLO models to date. Lastly, YOLO11 [26], also speed and computational efficiency are prioritized over accu-
introduced by Ultralytics, retains the capabilities of YOLOv8 racy, making it ideal for resource-constrained environments.
with applications such as Instance Segmentation, Pose Es- The integration of advanced tools for hyperparameter tuning,
timation, and Oriented Object Detection while providing 5 automated learning rate scheduling, and model pruning has
scalable versions for different use cases. YOLO11 replaces further refined the customization process. Continuous updates
the C2f block from YOLOv8 with the more efficient C3k2 and robust community support have also contributed to making
3

YOLO models more accessible and adaptable for a wide range neglecting other key metrics such as Recall and Precision.
of applications. Additionally, it considers FPS (frames per second) as the sole
This paper aims to present a comprehensive comparative measure of computational efficiency, excluding the impact of
analysis of the YOLO algorithm’s evolution. It makes a signif- preprocessing, inference, postprocessing times, GFLOPs, and
icant contribution to the field by offering the first comprehen- size.
sive evaluation of YOLO11, the newest member of the YOLO The paper in [12] thoroughly analyzes single-stage object
family. By leveraging pre-trained models and fine-tuning them, detectors, particularly YOLOs from YOLOv1 to YOLOv4,
we evaluate their performance across three diverse custom with updates to their architecture, performance metrics, and
datasets, each with varying sizes and objectives. Consistent regression formulation. Additionally, it provides an overview
hyperparameters are applied to ensure a fair and unbiased of the comparison between two-stage and single-stage object
comparison. The analysis delves into critical performance met- detectors, several YOLO versions from version 1 to version 4,
rics, including speed, efficiency, accuracy, and computational applications utilizing two-stage detectors, and future research
complexity, as measured by GFLOPs count and model size. In prospects.
addition, we explore the real-world applications of each YOLO The authors of the paper in [53] explore the evolution
version, highlighting their strengths and limitations across of the YOLO algorithms from version 1 to 10, highlighting
different use cases. Through this comparative study, we aim their impact on automotive safety, healthcare, industrial man-
to provide valuable insights for researchers and practitioners, ufacturing, surveillance, and agriculture. The paper highlights
offering a deeper understanding of how these models can be incremental technological advances and challenges in each
effectively applied in various scenarios. version, indicating a potential integration with multimodal,
The rest of this paper is organized as follows: Section context-aware, and General Artificial Intelligence systems for
2 covers related work. Section 3 describes the datasets, the future AI-driven applications. However, the paper does not
models, and the experimental setup, including the hyperpa- include a benchmarking study or a comparative analysis of the
rameters and evaluation metrics used. Section 4 presents the YOLO models, leaving out performance comparisons across
experimental results and comparative analysis alongside a the versions.
discussion. Finally, Section 5 concludes with insights drawn The paper in [61] explores the development of the YOLO
from the study. algorithm till the fourth version. It highlights its challenges
and suggests new approaches, highlighting its impact on object
II. R ELATED W ORK detection and the need for ongoing study.
The authors in the work in [27] analyze the YOLO algo-
The YOLO (You Only Look Once) algorithm is considered rithm, focusing on its development and performance. They
one of the most prominent object detection algorithms. It conduct a comparative analysis of the different versions of
achieves state-of-the-art speed and accuracy, and its various YOLO till the 8th version, highlighting the algorithm’s po-
applications have made it indispensable in numerous fields tential to provide insights into image and video recognition
and industries. Numerous researchers have shown interest in and addressing its issues and limitations. The paper focuses
this object detection algorithm by publishing papers reviewing exclusively on the mAP metric, overlooking other accuracy
its evolution, fine-tuning its models, and benchmarking its measures such as Precision and Recall. Additionally, it ne-
performance against other computer vision algorithms. This glects speed and efficiency metrics limiting the scope of the
widespread interest underscores YOLO’s important role in comparative study. The paper also omits the evaluation of the
advancing the field of computer vision. most recent models, YOLOv9, YOLOv10, and YOLO11.
The paper in [14] examines seven semantic segmentation This paper makes several key contributions: (i) It pioneers a
and detection algorithms, including YOLOv8, for cloud seg- comprehensive comparison of YOLO11 against its predeces-
mentation from remote sensing imagery. It conducts a bench- sors across their scaled variants from nano- to extra-large; (ii)
mark analysis to evaluate their architectural approaches and it offers deep insights into the structural evolution of these al-
identify the most performing ones based on accuracy, speed, gorithms by evaluating their performance across three diverse
and potential applications. The research aims to produce ma- datasets of various object properties; and (iii) our performance
chine learning algorithms that can perform cloud segmentation evaluation extends beyond mAP and FPS to include critical
using only a few spectral bands, including RGB and RGBN-IR metrics such as Precision, Recall, Preprocessing, Inference,
combinations. and Postprocessing Time, GFLOPs, and model size. These
The authors of the paper in [22] review the evolution of the metrics provide valuable insights to guide the selection of
YOLO variants from version 1 to version 8, examining their in- the optimal YOLO algorithm for specific use cases for both
ternal architecture, key innovations, and benchmarked perfor- industry professionals and academics.
mance metrics. However, YOLOv9, YOLOv10, and YOLO11
are not considered in the analysis. The paper highlights the
models’ applications across domains like autonomous driving III. B ENCHMARK S ETUP
and healthcare and proposes incorporating federated learn-
A. Datasets
ing to improve privacy, adaptability, and generalization in
collaborative training. The review, however, limits its focus This study aims to conduct in-depth benchmark research
to mAP (mean Average Precision) for accuracy evaluation, and assess the YOLO algorithms provided by the Ultralytics
4

library. The main goal is to provide a thorough and com- TABLE I

parative analysis of these models and explain their strengths, U LTRALYTICS - SUPPORTED LIBRARY TASKS AND MODELS
deficiencies, and possible applications. YOLO Pre-
Inference Validation Training
This paper is made possible using several publicly accessi- Version trained
YOLOv1 No No No No
ble datasets on Kaggle and Roboflow. The selection of the YOLOv2 No No No No
datasets is based on the increasing implementation of the YOLOv3-u (Ultralytics) Yes Yes Yes Yes
YOLO algorithms in the fields of Autonomous driving [47] YOLOv4 No No No No
YOLOv5-u (Ultralytics) Yes Yes Yes Yes
[54] [30] [7], satellite imagery [31] [8] [44], and wildlife YOLOv6 Yes Yes Yes No
conservation [50] [68] [51]. Moreover, each picked dataset YOLOv7 No No No No
presents unique difficulties and situations for object detection YOLOv8 Yes Yes Yes Yes
YOLOv9 Yes Yes Yes Yes
with varying image sizes and number of observations along- YOLOv10 Yes Yes Yes Yes
side the number of classes. YOLO11 Yes Yes Yes Yes
1) Traffic Signs Dataset: The Traffic Signs dataset by Radu
Oprea is an open-source dataset on Kaggle that contains
around 55 classes across 3253 training and 1128 validation analysis between the models provided by Ultralytics and their
images of traffic signs in different sizes and environments original counterparts on the Traffic Signs dataset provided by
[40]. All of the images in the dataset are initially in a size Radu Oprea [40] using the same hyperparameters in Table
of 640×640 with no labels for False Positives detection. V. The objective is to highlight the differences between
Undersampling techniques were applied to this dataset to Ultralytics models and the original versions, which justifies the
balance the different classes. This dataset is crucial for applica- exclusion of YOLOv4 [6], YOLOv6 [29], and YOLOv7 [65]
tions in autonomous driving, traffic management, road safety, from this paper due to the lack of support for these models
and intelligent transportation systems. However, it presents by Ultralytics. This analysis will demonstrate why focusing
challenges due to the varying sizes of target objects and the exclusively on Ultralytics-supported models ensures a fair and
similarities in patterns across different classes, complicating consistent benchmark evaluation.
the detection process. a) Ultralytics Supported Models and Tasks:: Ultralytics
2) Africa Wild Life Dataset: The Africa Wild Life dataset library provides researchers and programmers various YOLO
is an open-source Kaggle dataset by Bianca Ferreira, designed models for inference, validation, training, and export. Based
for real-time animal detection in nature reserves [59]. It fea- on the results of Table I, we notice that Ultralytics does not
tures four common African animal classes: Buffalo, elephant, support YOLOv1, YOLOv2, YOLOv4, and YOLOv7. Con-
rhino, and zebra. Each class is represented by at least 376 cerning YOLOv6, the library only supports the configuration
images collected via Google image searches and manually *.yaml files without the pre-trained *.pt models.
labeled in the YOLO format. The challenges of this dataset b) Performance Comparison of Ultralytics and Original
are the varying aspect ratios, with each image containing at Models:: Based on the results of our comparative analysis of
least one instance of the specified animal class and potentially the Ultralytics models and their original counterparts on the
multiple instances or occurrences of other classes. Moreover, Traffic Signs dataset presented in Table II, we observe signifi-
overlapping these target objects makes the detection process cant discrepancies between the performance of the Ultralytics
more challenging. This dataset is essential for applications models and their original counterparts. Notably, Ultralytics’
in wildlife conservation, anti-poaching efforts, biodiversity versions of YOLOv5n (nano) and YOLOv3 demonstrate su-
monitoring, and ecological research. perior performance, underscoring the enhancements and opti-
3) Ships/Vessels Dataset: The Ships dataset is an exten- mizations implemented by Ultralytics. Conversely, the original
sive open-source collection containing approximately 13.5k YOLOv9c (compact) slightly outperforms its Ultralytics ver-
images, collected by Siddharth Sah from numerous Roboflow sion, potentially due to the lack of extensive optimization for
datasets, curated explicitly for ship detection [52]. Each image this newer model by Ultralytics. These observations highlight
has been manually annotated with bounding boxes in the that the Ultralytics models have undergone substantial mod-
YOLO format, ensuring precise and efficient detection of ifications, making a direct comparison with the original ver-
ships. This dataset features a single class, ”ship,” allowing sions inequitable. Consequently, the noticeable performance
for streamlined and focused analysis. However, the relatively discrepancy between the two models, including the original
small size of the target objects and their varying rotations and Ultralytics models in the same benchmarking study, would
pose challenges for detection, particularly for the YOLO not provide a fair or accurate assessment. Therefore, this paper
algorithm, which often struggles with small object detection will focus exclusively on the Ultralytics-supported versions to
and objects with varying orientations. The dataset is essential ensure consistent and fair benchmarks.
for various practical applications such as maritime safety, 2) YOLOv3u: YOLOv3, based on its predecessors, aims to
fisheries management, marine pollution monitoring, defense, improve localization errors and detection efficiency, particu-
maritime security, and more. larly for smaller objects. It uses the Darknet-53 framework,
which has 53 convolutional layers and achieves double the
B. Models speed of ResNet-152 [49]. YOLOv3 also incorporates ele-
1) Comparative Analysis: Ultralytics vs. Original YOLO ments from the Feature Pyramid Network (FPN), such as
Models: In this subsection, we will conduct a comparative residual blocks, skip connections, and up-sampling, to enhance
5

TABLE II an anchor-free detection method and achieving better overall

U LTRALYTICS AND ORIGINAL YOLO PERFORMANCE COMPARISON performance, especially on complex objects of different sizes.
Version Source mAP50 mAP50-95
Ultralytics 0.845 0.748
YOLOv9c (compact)
Github 0.881 0.786
Ultralytics 0.756 0.663
YOLOv5n (nano)
Github 0.429 0.367
Ultralytics 0.766 0.67
YOLOv3
Github 0.562 0.471

its ability to detect objects efficiently across varying scales, as

seen in Figure 2. The algorithm generates feature maps at
three distinct scales, down-sampling the input at factors of
32, 16, and 8, and uses a three-scale detection mechanism to
detect large, medium, and small-sized objects using distinct
feature maps. Despite its improvements, YOLOv3 still faces
challenges in achieving precise results for medium and large-
sized objects, so Ultralytics released YOLOv3u. YOLOv3u is Fig. 3. Detailed architecture of YOLOv5 including the CSPDarknet Back-
bone, PANet Neck, and YOLO Layer Head [13].
an improved version of YOLOv3 that utilizes an anchor-free
detection method used later in YOLOv8 and improves upon
the accuracy and speed of YOLOv3, especially for medium 4) YOLOv8: Ultralytics has introduced YOLOv8, a signif-
and large-sized objects. icant evolution in the YOLO series, with five scaled versions
[58] [25]. Alongside object detection, YOLOv8 also provides
various applications such as image classification, pose esti-
mation, instance segmentation, and oriented object detection
(OOB). Key features include a backbone similar to YOLOv5,
with adjustments in the CSPLayer, now known as the C2f
module, which combines high-level features with contextual
information for enhanced detection accuracy highlighted in
Figure 4. YOLOv8 also introduces a semantic segmentation
model called YOLOv8-Seg, which combines a CSPDarknet53
feature extractor with a C2F module, achieving state-of-the-
art results in object detection and semantic segmentation
benchmarks while maintaining high efficiency.
Fig. 2. YOLOv3 architecture showcasing the residual blocks and the
upsampling layers to enhance object detection efficiency through different
scales [9].

3) YOLOv5u: YOLOv5, proposed by Glenn Jocher, tran-

sitions from the Darknet framework to PyTorch, retaining
many improvements from YOLOv4 [60] [24] and utilizing
CSPDarknet as its backbone. CSPDarknet is a modified ver-
sion of the original Darknet architecture that incorporates
Cross-Stage Partial connections by splitting feature maps into
separate paths, allowing for more efficient feature extraction
and reduced computational costs. YOLOv5 features a strided
convolution layer with a large window size, aiming to reduce
memory and computational costs, as showcased in Figure 3. Fig. 4. Detailed architecture of YOLOv8 showcasing the backbone network’s
Moreover, this version adopts the Spatial Pyramid Pooling multiple convolutional layers to extract hierarchical features, the Feature
Pyramid Network (FPN) for enhances detection at different scales, and the
Fast (SPPF) module to provide a multiscale representation of network head to perform final predictions, incorporating convolutional blocks
the input feature maps. The SPPF module works by pooling and upsample blocks to refine features [28].
features at different scales and concatenating them, allowing
the network to capture fine and coarse information. This helps 5) YOLOv9: YOLOv9, developed by Chien-Yao Wang, I-
recognize objects of various sizes more effectively. In addition, Hau Yeh, and Hong-Yuan Mark Liao, uses the Information
YOLOv5 implements several augmentations, such as Mosaic, Bottleneck Principle and Reversible Functions to preserve
copy-paste, random affine, MixUp, HSV augmentation, and essential data across the network’s depth, ensuring reliable
random horizontal flip. YOLOv5 is available in five variants, gradient generation and improved model convergence and
varying in width and depth of convolution modules. Ultralytics performance [66]. Reversible functions, which can be in-
is actively improving this model through YOLOv5u, adopting verted without loss of information, are another cornerstone
6

of YOLOv9’s architecture. This property allows the network For inference, the One-to-One Head generates a single
to retain a complete information flow, enabling more accurate best prediction per object, eliminating the need for Non-
updates to the model’s parameters. Moreover, YOLOv9 offers Maximum Suppression (NMS). By removing the need for
five scaled versions for different uses, focusing on lightweight NMS, YOLOv10 reduces latency and improves the post-
models, which are often under-parameterized and prone to processing speed. In addition, YOLOv10 includes NMS-Free
losing significant information during the feedforward process. Training, which uses consistent dual assignments to reduce
Programmable Gradient Information (PGI) is a significant inference latency, and a model design that optimizes various
advancement introduced in YOLOv9. PGI is a method that components from both efficiency and accuracy perspectives.
dynamically adjusts the gradient information during training This includes lightweight classification heads, spatial-channel
to optimize learning efficiency. By selectively focusing on decoupled downsampling, and rank-guided block design. In
the most informative gradients, PGI helps preserve crucial addition, the model incorporates large-kernel convolutions and
information that might otherwise be lost in lightweight models. partial self-attention modules to enhance performance without
This advancement ensures the model retains the essential significant computational costs.
features for accurate object detection, improving overall per-
formance.
In addition, YOLOv9 incorporates GELAN (Gradient En-
hanced Lightweight Architecture Network), a new architec-
tural advancement designed to improve parameter utiliza-
tion and computational efficiency as illustrated in Figure
5. GELAN achieves this by optimizing the computational
pathways within the network, allowing for better resource
management and adaptability to various applications without
compromising speed or accuracy.
Fig. 6. YOLOv10 architecture showcasing the dual label assignment strategy
for improving accuracy and the PAN layer for enhancing feature representation
alongside one-to-many head for regression and classification tasks and one-
to-one head for precise localization [64].

7) YOLO11: YOLO11 [26] is the latest innovation in the

YOLO series developed by Ultralytics, building upon the
developments of its predecessors, especially YOLOv8. This
iteration offers five scaled models from nano to extra large,
catering to various applications. Like YOLOv8, YOLO11
includes numerous applications such as object detection, in-
stance segmentation, image classification, pose estimation, and
oriented object detection (OBB).
Key improvements in YOLO11 include the introduction of
Fig. 5. YOLOv9 architecture featuring CSPNet, ELAN, and GELAN mod- the C2PSA (Cross-Stage Partial with Self-Attention) module,
ules. CSPNet optimizes gradient flow and reduces computational complexity
via feature map partitioning. ELAN enhances learning efficiency by linearly as seen in Figure 7, which combines the benefits of cross-
aggregating features, and GELAN extends this concept by integrating features stage partial networks with self-attention mechanisms. This
from various depths and pathways, offering increased flexibility and accuracy enables the model to capture contextual information more
in feature extraction [66].
effectively across multiple layers, improving object detection
6) YOLOv10: YOLOv10, developed by Tsinghua Univer- accuracy, especially for small and colluded objects. Addition-
sity researchers, builds upon previous models’ strengths with ally, in YOLO11, the C2f block has been replaced by C3k2,
key innovations [64]. The architecture has an enhanced CSP- a custom implementation of the CSP Bottleneck that uses two
Net (Cross Stage Partial Network) backbone for improved convolutions, unlike YOLOv8’s use of one large convolution.
gradient flow and reduced computational redundancy, as seen This block uses a smaller kernel, retaining accuracy while
in Figure 6. The network is structured into three main compo- improving efficiency and speed.
nents: the backbone, the neck, and the detection head. The
neck includes PAN (Path Aggregation Network) layers for C. Hardware and Software Setup
effective multiscale feature fusion. PAN is designed to enhance Table III showcases the libraries and packages used through-
information flow by aggregating features from different layers, out this paper. During this experiment, 23 models were trained
enabling the network to better capture and combine details at of 5 different YOLO versions found in Table IV. To ensure a
various scales, which is crucial for detecting objects of differ- fair comparative analysis, similar hyperparameters were used
ent sizes. At the same time, the One-to-Many Head generates throughout the whole experiment on all models found below
multiple predictions per object during training to provide rich in table V. We have used 2 NVIDIA RTX 4090 GPUs for the
supervisory signals and improve learning accuracy. Moreover, training and evaluation, each with 16,384 CUDA cores.
this version also offers five scaled versions, from nano to extra- For the traffic signs dataset, undersampling techniques were
large. applied to ensure a balanced dataset, reducing the number of
7

TABLE IV
YOLO VERSIONS AND SCALED VERSIONS

Version Scaled Version

YOLOv3u-tiny
YOLOv3u
YOLOv3u
YOLOv5un (nano)
YOLOv5us (small)
YOLOv5u YOLOv5um (medium)
YOLOv5ul (large)
YOLOv5ux (extra-large)
YOLOv8n (nano)
YOLOv8s (small)
YOLOv8 YOLOv8m (medium)
YOLOv8l (large)
YOLOv8x (extra-large)
YOLOv9t (tiny)
YOLOv9s (small)
YOLOv9 YOLOv9m (medium)
YOLOv9c (compact)
YOLOv9e (extended)
YOLOv10n (nano)
YOLOv10s (small)
YOLOv10m (medium)
YOLOv10
YOLOv10b (balanced)
YOLOv10l (large)
YOLOv10x (extra-large)
Fig. 7. YOLO11 architecture showcasing the new C3k2 blocks and the YOLO11n (nano)
C2PSA module. [26] [43]. YOLO11s (small)
YOLO11 YOLO11m (medium)
YOLO11l (large)
images from 4381 to 3233 images split into 70% training, 20% YOLO11x (extra-large)
validation, and 10% testing. After balancing the dataset, 24
classes remained, with an average of 100 images per class in TABLE V
the training dataset. This dataset contains numerous images of TABLE OF PARAMETERS
traffic sizes of different sizes that render it suitable for diverse Parameter Value
object detection. Epochs 100
The Africa Wildlife dataset contains 1504 images dis- Optimizer AdamW
Batch Size 16
tributed equally among all 4 classes following the 70% train- Image Size (640, 640)
ing, 20% validation, and 10% testing split. This dataset is used Initial & Final Learning rate (0.0001, 0.01)
for large object detection in this section, Dropout rate 0.15
The Ships dataset contains 13.4k images and is divided into Data Split (70, 20, 10)
70% training, 20% validation, and 10% testing. It has only one
class (ship) and is focused on small object detection.
highlighting the occurrence of False Positives. Conversely, Re-
TABLE III
call [41] measures the ratio of correctly predicted observations
S OFTWARE SETUP to all actual observations, thus emphasizing the occurrence of
False Negatives. Both mAP50 and mAP50-95 [70] provide
Name Version Description
Python 3.12 Programming language
a comprehensive summary of Precision and Recall. While
Ubuntu 22.04 Linux operating system mAP50 calculates the Mean Average Precision at an IoU
CUDA 12.5 Platform for GPU based processing threshold of 0.50, mAP50-95 extends this calculation across
cuDNN 8.9.7 CUDA library for deep neural networks multiple IoU thresholds from 0.50 to 0.95, with a step size of
Ultralytics 8.2.55 YOLO Object Detection Library
WandB 0.17.4 ML experiment tracking 0.05.
Regarding computational efficiency metrics, Preprocessing
Time, Inference Time, and Postprocessing Time will be uti-
lized to evaluate the model’s speed. Preprocessing Time refers
D. Metrics
to the duration taken to prepare raw data for input into the
This study evaluates the performance of YOLO models us- model. Inference Time is the duration required for the model to
ing three primary metrics: accuracy, computational efficiency, process the input data and generate predictions. Postprocessing
and size. The accuracy metrics include Precision, Recall, Time denotes the time needed to convert the model’s raw
mAP50 (Mean Average Precision at an IoU (Intersection over predictions into a final, usable format. These metrics were
Union) threshold of 0.50), and mAP50-95 (Mean Average measured using a sample of images for testing after training
Precision across IoU (Intersection over Union) thresholds from the models. Additionally, the GFLOPs (Giga Floating-Point
0.50 to 0.95). Precision [41] measures the ratio of correctly Operations Per Second) measure the computational power for
predicted observations to the total predicted observations, thus the model training, reflecting its efficiency. In contrast, the
8

TABLE VI
E VALUATION RESULTS FOR THE T RAFFIC S IGNS DATASET.

Preprocess Inference Postprocess Total

Versions Precision Recall mAP50 mAP50-95 GFLOPs Size
Time Time Time Time
YOLOv3u 0.75 0.849 0.874 0.781 0.7 8.5 0.4 9.6 207.86 282.4
YOLOV3u tiny 0.845 0.667 0.772 0.682 1.4 0.7 0.3 2.4 24.44 19
YOLOv5un 0.805 0.679 0.749 0.665 0.6 6.6 0.4 7.6 5.65 7.1
YOLOv5us 0.85 0.777 0.827 0.744 0.5 7.8 0.4 8.7 18.58 23.9
YOLOv5um 0.849 0.701 0.83 0.744 1.1 9.5 0.4 11 50.54 64.1
YOLOv5ul 0.831 0.836 0.886 0.799 0.6 9.7 0.4 10.7 106.85 134.9
YOLOv5ux 0.863 0.795 0.867 0.777 1.1 9.8 0.4 11.3 195.2 246.3
YOLOv8n 0.749 0.688 0.777 0.689 0.6 6.8 0.4 7.8 6.55 8.1
YOLOv8s 0.766 0.788 0.806 0.718 0.6 7.8 0.4 8.8 22.59 28.6
YOLOv8m 0.838 0.805 0.845 0.763 1.6 9.1 0.4 11.1 52.12 78.9
YOLOv8l 0.771 0.789 0.853 0.767 0.6 9.2 0.4 10.2 87.77 165
YOLOv8x 0.902 0.744 0.874 0.78 0.6 9.4 0.4 10.4 136.9 257.7
YOLOv9t 0.792 0.748 0.812 0.731 0.5 10 0.4 10.9 4.93 7.7
YOLOv9s 0.763 0.81 0.828 0.75 0.6 11.1 0.4 12.1 15.33 26.8
YOLOv9m 0.864 0.796 0.864 0.784 1 12.1 0.4 13.5 40.98 76.7
YOLOv9c 0.827 0.807 0.852 0.769 1.3 11.6 0.4 13.3 51.8 102.6
YOLOv9e 0.819 0.824 0.854 0.764 0.8 16.1 0.4 17.3 117.5 189.4
YOLOv10n 0.722 0.602 0.722 0.64 1 0.8 0.2 2 5.59 8.3
YOLOv10s 0.823 0.742 0.834 0.744 1.2 1.1 0.2 2.5 15.9 24.7
YOLOv10m 0.834 0.843 0.88 0.781 1.2 2.4 0.2 3.8 32.1 63.8
YOLOv10b 0.836 0.764 0.859 0.765 1 3.1 0.2 4.3 39.7 98.4
YOLOv10l 0.873 0.807 0.866 0.771 1.1 3.8 0.2 5.1 50 126.8
YOLOv10x 0.773 0.854 0.88 0.787 1 6.3 0.2 7.5 61.4 170.4
YOLO11n 0.768 0.695 0.757 0.668 1.2 0.6 0.4 2.2 5.35 6.4
YOLO11s 0.819 0.758 0.838 0.742 1.2 1 0.4 2.6 18.4 21.4
YOLO11m 0.898 0.826 0.893 0.795 1.2 2.4 0.4 4 38.8 67.9
YOLO11l 0.862 0.839 0.889 0.794 1.2 3 0.4 4.6 49 86.8
YOLO11x 0.819 0.816 0.885 0.784 0.9 6.1 0.4 7.4 109 194.8

size metric reflects the actual disk size of the model and the YOLO11m with a mAP50-95 of 0.795 and YOLO11l with
number of its parameters. a mAP50-95 of 0.794. In contrast, YOLOv10n exhibits the
These metrics are essential for providing a comprehensive lowest precision, with a mAP50 of 0.722 and a mAP50-95
overview of YOLO models’ performance, allowing for effec- of 0.64, closely followed by YOLOv5un with a mAP50-95 of
tive comparison and evaluation. By employing these metrics, 0.665, as evidenced by the data points in Figure 8.
we can thoroughly assess the accuracy and efficiency of b) Precision and Recall: : Figure 9 elucidates the trade-
different YOLO model versions, ensuring a robust benchmark off between precision and recall taking the size of the models
for their performance and application in various real-world into consideration. Models such as YOLO11m, YOLO10l,
scenarios. YOLOv9m, YOLOv5ux, and YOLO11l exhibit high precision
and recall, specifically with YOLO11m achieving a precision
IV. B ENCHMARK R ESULTS AND D ISCUSSION of 0.898 and a recall of 0.826 while having a size of 67.9Mb,
and YOLOv10l achieving a precision of 0.873 and a recall
A. Results of 0.807 with a significantly bigger size (126.8 Mb). In
1) Traffic Signs Dataset: Table VI presents a comparative contrast, smaller models such as YOLOv10n (precision 0.722,
analysis of the YOLO algorithms’ performance on the Traffic recall 0.602), YOLOv8n (precision 0.749, recall 0.688), and
Signs dataset, evaluated based on accuracy, computational YOLO11n (precision 0.768, recall 0.695) underperform in
efficiency, and model size. The Traffic Signs dataset is a both metrics. This underscores the superior performance of
medium-sized dataset with varied object sizes, making it favor- larger models on the Traffic Signs dataset. Moreover, the
able for benchmarking. The results highlight the effectiveness high precision (0.849) and low recall (0.701) of YOLOv5um
of YOLO models in detecting traffic signs, demonstrating a indicate a propensity for false negatives, while YOLOv3u’s
range of precision. The highest mAP50-95 was 0.799, while high recall (0.849) and low precision (0.75) suggest a tendency
the lowest recorded precision was 0.64. On the other hand, for false positives.
the highest mAP50 is 0.893 while the lowest is 0.722. The c) Computational Efficiency:: In terms of computational
substantial gap between the mAP50 and mAP50-95 results efficiency, YOLOv10n is the most efficient, with a processing
suggests that the models encounter difficulties in uniformly time of 2ms per image and a GFLOPs count of 8.3, as
handling traffic signs with different sizes at higher thresholds, shown in Figures 10 and 11. YOLO11n closely trails this
reflecting areas for potential improvement in their detection at 2.2ms with a 6.4 GFLOPs count, and YOLOv3u-tiny
algorithms. with a processing time of 2.4ms and a GFLOPs count of
a) Accuracy:: As illustrated in Figure 8, YOLOv5ul 19, making it relatively computationally inefficient compared
demonstrates the highest accuracy, achieving a mAP50 of to the other fast models. However, the data indicates that
0.866 and a mAP50-95 of 0.799. This is followed by YOLOv9e, YOLOv9m, YOLOv9c, and YOLOv9s are the least
9

Fig. 8. mAP50 and mAP50-95 YOLO results on traffic signs dataset. Each model is represented by two bars: the left bar shows the mAP50 score, while the
right bar represents the mAP50-95 score.

Fig. 9. Precision vs. Recall based on size results on traffic signs dataset. The size of each circle represents the size of the model, with larger circles indicating
larger models.

efficient, with inference times of 16.1ms, 12.1ms, 11.6ms, and of 49Mb, and 86.8 GFLOPs count), and YOLOv10m (mAP50-
11.1ms, and GFLOPs count of 189.4, 76.7, 102.6, and 26.8 95 of 0.781, inference time of 2.4ms, size of 32.1Mb, 63.8
respectively. These findings delineate a clear trade-off between GFLOPs count). These results highlight the robustness of these
accuracy and computational efficiency. models in detecting traffic signs of various sizes while main-
d) Overall Performance:: When evaluating overall per- taining short inference times and small model sizes. Notably,
formance, which includes accuracy, size, and model efficiency, the YOLO11 and YOLOv10 families significantly outperform
YOLO11m emerges as a consistently top-performing model. It other YOLO families, in terms of accuracy and computational
achieves a mAP50-95 of 0.795, an inference time of 2.4ms, a efficiency in this dataset, as their models consistently surpass
model size of 38.8Mb, and a 67.9 GFLOPs count, as detailed counterparts from other families.
in Figures 8, 10, 11, and Table VI. This is followed by 2) Africa Wildlife Dataset: The results in Table VII show-
YOLO11l (mAP50-95 of 0.794, inference time of 4.6ms, size case the performance of the YOLO models on the Africa
10

Fig. 10. Total processing time results on traffic signs dataset. Each bar represents the total processing time, divided into three sections: Preprocessing Time
(bottom), Inference Time (middle), and Postprocessing Time (top).

Fig. 11. Total processing time and GFLOPs count results on traffic signs dataset.

Wildlife dataset. This dataset contains large object sizes fo- mAP50-95 scores of 0.83 and 0.825, respectively. These
cusing on the ability of YOLO models to predict large objects results highlight the YOLOv9 family’s ability to effectively
and their risk of overfitting due to the size of the dataset. The learn patterns from a small sample of images, making it par-
models demonstrate robust accuracy across the board, with ticularly suited for smaller datasets. In contrast, YOLOv5un,
the highest-performing models achieving a mAP50-95 ranging YOLOv10n, and YOLOv3u-tiny show lower mAP50-95 scores
from 0.832 to 0.725. This relatively shorter range reflects the of 0.791, 0.786, and 0.725, indicating their limitations in accu-
effectiveness of the models in detecting and classifying large racy. The underperformance of larger models like YOLO11x,
wildlife objects by maintaining high accuracy. YOLOv5ux, YOLOv5ul, and YOLOv10l can be attributed to
overfitting, especially given the small dataset size.
a) Accuracy:: As illustrated in Figure 12, YOLOv9s
demonstrates exceptional performance with a high mAP50-95 b) Precision and Recall:: Figure 13 reveals that
of 0.832 and a mAP50 of 0.956, showcasing its robust accu- YOLO8l and YOLO11l achieve the highest precision and
racy across various IoU thresholds. YOLOv9c and YOLOv9t recall, with values of 0.942 and 0.937 for precision, and 0.898
follow closely, with mAP50 scores of 0.96 and 0.948 and and 0.896 for recall, respectively. Notably, YOLOv8n achieves
11

TABLE VII
E VALUATION RESULTS FOR THE A FRICA W ILDLIFE DATASET.

Preprocess Inference Postprocess

Versions Precision Recall mAP50 mAP50-95 Total Time Size GFLOPs
Time Time Time
YOLOv3u 0.91 0.88 0.943 0.803 0.5 6.2 0.4 7.1 207.86 282.2
YOLOV3u tiny 0.897 0.866 0.921 0.725 0.7 0.7 0.4 1.8 24.44 19.1
YOLOv5un 0.949 0.862 0.948 0.791 1.1 0.5 0.4 2 5.65 7
YOLOv5us 0.924 0.882 0.853 0.804 1 0.8 0.4 2.2 18.58 23.8
YOLOv5um 0.935 0.887 0.947 0.807 0.6 2.1 0.4 3.1 50.54 64
YOLOv5ul 0.916 0.881 0.948 0.797 0.7 3.3 0.4 4.4 106.85 135
YOLOv5ux 0.932 0.867 0.946 0.797 0.5 6.6 0.4 7.5 195.2 246.2
YOLOv8n 0.932 0.908 0.953 0.794 1.1 0.5 0.4 2 6.55 8.2
YOLOv8s 0.962 0.88 0.943 0.812 0.9 1 0.4 2.3 22.59 28.7
YOLOv8m 0.928 0.909 0.953 0.822 0.8 2.5 0.4 3.7 52.12 78.9
YOLOv8l 0.942 0.898 0.957 0.817 0.9 3.9 0.4 5.2 87.77 165.1
YOLOv8x 0.912 0.906 0.953 0.819 0.5 7.1 0.4 8 136.9 257.6
YOLOv9t 0.944 0.875 0.948 0.825 1.3 1.1 0.4 2.8 4.93 7.7
YOLOv9s 0.921 0.897 0.956 0.832 1 1.2 0.4 2.6 15.33 26.9
YOLOv9m 0.924 0.901 0.952 0.823 0.9 2.8 0.4 4.1 40.98 76.5
YOLOv9c 0.934 0.897 0.96 0.83 0.9 3.4 0.4 4.7 51.8 102.7
YOLOv9e 0.932 0.864 0.944 0.809 0.5 7.6 0.4 8.5 117.5 189.3
YOLOv10n 0.901 0.9 0.936 0.786 1.1 0.7 0.2 2 5.59 8.2
YOLOv10s 0.929 0.888 0.947 0.799 0.9 1.1 0.2 2.2 15.9 24.5
YOLOv10m 0.91 0.88 0.945 0.8 1 2.4 0.2 3.6 32.1 63.4
YOLOv10b 0.905 0.899 0.944 0.809 0.8 3.2 0.2 4.2 39.7 98
YOLOv10l 0.922 0.894 0.944 0.8 0.7 3.8 0.2 4.7 50 126.3
YOLOv10x 0.96 0.862 0.949 0.819 0.8 6.3 0.2 7.3 61.4 169.8
YOLO11n 0.964 0.877 0.964 0.802 1.1 0.7 0.4 2.2 5.35 6.3
YOLO11s 0.952 0.892 0.959 0.8 1.1 1 0.4 2.5 18.4 21.3
YOLO11m 0.922 0.906 0.96 0.81 0.9 2.4 0.4 3.7 38.8 67.7
YOLO11l 0.937 0.896 0.965 0.805 0.9 3 0.4 4.3 49 86.6
YOLO11x 0.908 0.886 0.945 0.795 0.6 6.1 0.4 7.1 109 194.4

Fig. 12. mAP50 and mAP50-95 YOLO results on Africa wildlife dataset. Each model is represented by two bars: the left bar shows the mAP50 score, while
the right bar represents the mAP50-95 score.
12

Fig. 13. Precision vs. Recall based on size results on Africa wildlife dataset. The size of each circle represents the size of the model, with larger circles
indicating larger models.

Fig. 14. Total processing time results on Africa wildlife dataset. Each bar represents the total processing time, divided into three sections: Preprocessing
Time (bottom), Inference Time (middle), and Postprocessing Time (top).

similar results (0.932 for precision, 0.908 for recall) with Figures 14 and 15. Conversely, YOLOv9e exhibits the slowest
a compact size of 6.55Mb, demonstrating its efficiency. In processing time at 11.2ms and a GFLOPs count of 189.3,
contrast, YOLOv3u and YOLOv5ul exhibit lower precision followed by YOLOv5ux at 7.5ms and 246.2 GFLOPs count.
and recall scores (0.91 and 0.88 for YOLOv3u, 0.916 and These results indicate that larger models tend to require more
0.881 for YOLOv5ul), despite their larger sizes (204.86Mb processing time and hardware usage compared to smaller
for YOLOv3u, 106.85Mb for YOLOv5ul), which may be models, emphasizing the trade-off between model size and
attributed to overfitting issues. processing efficiency.
c) Computational Efficiency:: YOLOv10n, YOLOv8n, d) Overall Performance:: YOLOv9t and YOLOv9s con-
and YOLOv3u-tiny are the fastest models, achieving pro- sistently excel across all metrics, delivering high accuracy
cessing times of 2ms and 1.8ms, with GFLOPs counts of while maintaining small model sizes, low GFLOPs, and short
8.2 and 19.1, respectively. The first two models share the inference times, as shown in Table VII, and Figures 13, 14, and
same processing speed and GFLOPs count, as showcased in 15. This demonstrates the robustness of YOLOv9’s smaller
13

Fig. 15. Total processing time and GFLOPs count results on Africa wildlife dataset.

models and their effectiveness on small datasets. In contrast, of 0.668 and a recall of 0.555. It was closely followed
YOLO5ux and YOLO11x show suboptimal accuracy despite by YOLOv9m (precision of 0.668, recall of 0.551) and
their larger sizes and longer inference times, likely due to YOLOv8m (precision of 0.669, recall of 0.525), both of which
overfitting. Most large models underperformed on this dataset, are significantly smaller in size (40.98 Mb for YOLOv9m
with the exception of YOLOv10x, which benefited from a and 52.12 Mb for YOLOv8m). In contrast, YOLO11n and
modern architecture that prevents overfitting. YOLOv10s exhibited lower performance, with precisions of
3) Ships and Vessels Dataset: Table VIII presents the per- 0.574 and 0.586 and recalls of 0.51 and 0.511, respectively,
formance of YOLO models on the Ships and Vessels dataset, likely due to underfitting issues. Generally, YOLO11 models
a large dataset featuring tiny objects with varying rotations. tended to produce false positives, reflected in their low preci-
Overall, the models demonstrated moderate effectiveness in sion and high recall. Meanwhile, YOLOv10 underperformed
detecting ships and vessels, with mAP50-95 ranging from in both precision and recall, despite being one of the newest
0.273 to 0.327. This performance suggests that YOLO al- models in the YOLO family.
gorithms may face challenges in accurately detecting smaller
c) Computational Efficiency:: As illustrated in Figures
objects, and the dataset’s diversity in object sizes and rotations
18 and 19, YOLOv3u-tiny achieved the fastest processing
provides a comprehensive test of the models’ capabilities in
time at 2 ms, closely followed by YOLOv8n and YOLOv5un,
these conditions.
both recording 2.3 ms. YOLOv10 and YOLO11 models also
a) Accuracy: : The disparity between mAP50-95 and
excelled in speed, with YOLOv10n and YOLO11n achieving
mAP50, illustrated in Figure 16, underscores the challenges
rapid inference times of 2.4 ms and 2.5 ms, along with
YOLO models face with higher IoU thresholds when de-
GFLOPs counts of 8.2 and 6.3, respectively. In contrast,
tecting small objects. Additionally, YOLO models strug-
YOLOv9e exhibited the slowest speed, with an inference time
gle with detecting objects of varying rotations. Among the
of 7.6 ms and a GFLOPs count of 189.3, highlighting the
models, YOLO11x achieved the highest accuracy, with a
trade-off between accuracy and efficiency within the YOLOv9
mAP50 of 0.529 and a mAP50-95 of 0.327, closely followed
family.
by YOLO11l, YOLO11m, and YOLO11s, which recorded
mAP50 values of 0.529, 0.528, and 0.53, and mAP50-95 d) Overall Performance:: The results in Table VIII
values of 0.327, 0.325, and 0.325, respectively. These results and Figures 16, 17, and 18 demonstrate that YOLO11s and
highlight the robustness of the YOLO11 family in detecting YOLOv10s excelled in accuracy while maintaining compact
small and tiny objects. In contrast, YOLOv3u-tiny, YOLOv8n, sizes, low GFLOPs, and quick processing times. In contrast,
YOLOv3u, and YOLOv5n exhibited the lowest accuracy, with YOLOv3u, YOLOv8x, and YOLOv8l fell short of expecta-
mAP50 scores of 0.489, 0.515, 0.519, and 0.514, and mAP50- tions despite their larger sizes and longer processing times.
95 scores of 0.273, 0.297, 0.298, and 0.298, respectively. These findings highlight the robustness and reliability of the
This suggests the outdated architecture of YOLOv3u and the YOLO11 family, particularly in improving the YOLO family’s
potential underfitting of smaller models due to the large dataset performance in detecting small and tiny objects while ensur-
size. ing efficient processing. Additionally, the results reveal the
b) Precision and Recall: : Figure 17 indicates that underperformance of YOLOv9 models when faced with large
YOLOv5ux outperformed other models, achieving a precision datasets and small objects, despite their modern architecture.
14

TABLE VIII
E VALUATION RESULTS FOR THE SHIPS AND VESSELS DATASET.

Preprocess Inference Postprocess Total

Versions Precision Recall mAP50 mAP50-95 Size GFLOPs
Time Time Time Time
YOLOv3u 0.679 0.534 0.519 0.298 0.8 6.2 0.3 7.3 207.86 282.5
YOLOV3u tiny 0.647 0.511 0.489 0.273 1 0.7 0.3 2 24.44 18.9
YOLOv5un 0.635 0.532 0.514 0.298 1.5 0.6 0.3 2.4 5.65 7.2
YOLOv5us 0.653 0.541 0.518 0.299 1.2 0.8 0.3 2.3 18.58 24
YOLOv5um 0.667 0.541 0.526 0.308 0.9 2.1 0.3 3.3 50.54 64
YOLOv5ul 0.654 0.545 0.525 0.305 0.9 3.3 0.3 4.5 106.85 134.8
YOLOv5ux 0.668 0.555 0.531 0.309 0.8 6.7 0.3 7.8 195.2 246.2
YOLOv8n 0.655 0.533 0.515 0.297 1.5 0.5 0.3 2.3 6.55 8
YOLOv8s 0.647 0.545 0.518 0.301 1.1 1 0.3 2.4 22.59 28.5
YOLOv8m 0.669 0.547 0.525 0.302 0.8 2.5 0.3 3.6 52.12 79
YOLOv8l 0.659 0.551 0.526 0.303 0.9 3.9 0.3 5.1 87.77 165
YOLOv8x 0.655 0.55 0.529 0.306 0.8 7.1 0.3 8.2 136.9 257.7
YOLOv9t 0.647 0.516 0.512 0.3 1.4 1.1 0.3 2.8 4.93 7.5
YOLOv9s 0.655 0.552 0.522 0.308 1.4 1.2 0.3 2.9 15.33 26.9
YOLOv9m 0.668 0.551 0.529 0.307 1.1 2.7 0.3 4.1 40.98 76.8
YOLOv9c 0.663 0.547 0.523 0.303 1.2 3.4 0.3 4.9 51.8 102.4
YOLOv9e 0.667 0.537 0.524 0.308 1.1 7.6 0.3 9 117.5 189.5
YOLOv10n 0.584 0.487 0.506 0.31 1.4 0.8 0.2 2.4 5.59 8.2
YOLOv10s 0.586 0.511 0.515 0.319 1.1 1.1 0.2 2.4 15.9 24.4
YOLOv10m 0.588 0.517 0.522 0.322 1 2.4 0.1 3.5 32.1 63.4
YOLOv10b 0.603 0.509 0.523 0.319 1.1 3.2 0.1 4.4 39.7 97.9
YOLOv10l 0.601 0.511 0.522 0.322 1.1 3.8 0.1 5 50 126.3
YOLOv10x 0.6 0.523 0.526 0.321 1 6.3 0.2 7.5 61.4 169.8
YOLO11n 0.574 0.51 0.505 0.311 1.5 0.7 0.3 2.5 5.35 6.3
YOLO11s 0.585 0.535 0.521 0.323 1.3 1 0.3 2.6 18.4 21.3
YOLO11m 0.588 0.541 0.53 0.325 1 2.4 0.3 3.7 38.8 67.6
YOLO11l 0.596 0.531 0.528 0.325 1.1 3 0.4 4.5 49 86.6
YOLO11x 0.596 0.538 0.529 0.327 0.8 6.1 0.3 7.2 109 194.4

Fig. 16. mAP50 and mAP50-95 YOLO results on ships and vessel dataset. Each model is represented by two bars: the left bar shows the mAP50 score,
while the right bar represents the mAP50-95 score.
15

Fig. 17. Precision vs. Recall based on size results on ships and vessels dataset. The size of each circle represents the size of the model, with larger circles
indicating larger models.

Fig. 18. Total processing time results on ships and vessels dataset. Each bar represents the total processing time, divided into three sections: Preprocessing
Time (bottom), Inference Time (middle), and Postprocessing Time (top).
16

Fig. 19. Total processing time and GFLOPs count results on ships and vessels dataset.

B. Discussion TABLE IX
OVERALL RANKING OF YOLO ALGORITHMS
Based on the performance of the models across the three
datasets, we ranked them by accuracy, speed, GFLOps count, Accuracy Speed GFLOPs Size
Version
Rank Rank Rank Rank
and size, as shown in Table IX to facilitate a comprehensive YOLOv3u-tiny 28 1 6 11
evaluation. For accuracy, the mAP50-95 metric was employed YOLOv3u 20 24 28 28
due to its capacity to assess models across a range of IoU YOLOv5un 27 6 2 4
YOLOv5us 24 7 8 9
thresholds, thus providing a detailed insight into each model’s YOLOv5um 17 15 13 18
performance. For speed, models were sorted based on the total YOLOv5ul 14 19 21 23
processing time, which encompasses preprocessing, inference, YOLOv5ux 17 27 26 27
YOLOv8n 26 5 4 5
and postprocessing durations. The rankings range from Rank YOLOv8s 23 9 11 10
1, indicating the highest performance, to Rank 28, denoting YOLOv8m 15 17 16 20
the lowest, with the respective rankings highlighted in bold YOLOv8l 13 22 22 22
YOLOv8x 8 26 27 26
within the table.
YOLOv9t 20 12 3 1
The analysis of Table IX yields several critical observations: YOLOv9s 7 15 10 6
1) Accuracy: YOLO11m consistently emerged as a top per- YOLOv9m 4 21 15 15
YOLOv9c 9 25 19 19
former, frequently ranking among the highest, closely followed YOLOv9e 12 28 24 25
by YOLOv10x, YOLO11l, YOLOv9m, and YOLO11x. This YOLOv10n 25 2 5 3
underscores the robust performance of the YOLO11 family YOLOv10s 19 3 9 7
YOLOv10m 5 10 12 12
across varying IoU thresholds and object sizes, which can YOLOv10b 9 12 18 14
be attributed to their use of C2PSA for the preservation of YOLOv10l 11 17 20 17
contextual information, leading to improved convergence and YOLOv10x 2 22 23 21
overall performance. In addition, the implementation of large- YOLO11n 22 3 1 2
YOLO11s 16 8 7 8
kernel convolutions and partial self-attention modules helped YOLO11m 1 11 14 13
increase the performance of the algorithm. YOLO11l 3 14 17 16
YOLO11x 5 19 25 24
Conversely, YOLOv3u-tiny exhibited the lowest accuracy,
particularly in the Africa Wildlife and Ships and Vessels
datasets, with YOLOv5un and YOLOv8n showing slightly bet- family, despite its later introduction, exhibited relatively lower
ter but still sub-par results. This suggests that YOLO11 models accuracy in the Traffic Signs and Africa Animals datasets,
are currently the most reliable for applications demanding high resulting in an average accuracy drop of 2.075% compared
accuracy. to the YOLOv9 models in those datasets. The slight under-
Closely following the performance of the YOLO11 family, performance of YOLOv10 can be attributed to its adoption
the YOLOv9 models demonstrate their effectiveness in detect- of the One-to-One Head approach instead of Non-Maximum
ing objects across various sizes and different IoU thresholds. Suppression (NMS) for defining bounding boxes. This strategy
However, they may struggle with small objects, as seen in can struggle to capture objects effectively, particularly when
the Ships and Vessels dataset. In contrast, the YOLOv10 dealing with overlapping items, as it relies on a single pre-
17

diction per object. This limitation helps explain the relatively tegration. This positions YOLOv9 as a viable choice for
subpar results observed in the second dataset. applications where precision is prioritized over speed.
Similarly, the outdated architecture of YOLOv3u con- In addition, YOLOv8 and YOLOv5u exhibited competitive
tributed to its inferior performance, averaging 6.5% lower results, surpassing YOLOv3u in accuracy, which is likely due
accuracy than the YOLO11 models. This decline can be to YOLOv3u’s older architecture. However, their accuracy still
traced back to its reliance on the older Darknet-53 framework, fell significantly short compared to the newer models, such
first introduced in 2018, which may not adequately address as YOLOv9, YOLOv10, and YOLO11. While YOLOv8 and
contemporary detection challenges. YOLOv5u had faster processing times than YOLOv9, their
2) Computational Efficiency: YOLOv10n consistently out- overall performance remains inferior to that of the newer
performed other models in terms of speed and GFLOPs count, models.
ranking among the top across all three datasets in terms 5) Object Size and Rotation Detection: The YOLO algo-
of speed and 5th in terms of GFLOPs count. YOLOv3u- rithm is effective in detecting large and medium-sized objects,
tiny, YOLOv10s, and YOLO11n also demonstrated notable as evidenced by high accuracy in the Africa Wildlife and
computational efficiency. Traffic Signs datasets. However, it struggles with small object
detection, probably due to its division of images into grids,
YOLOv9e exhibited the slowest inference times and a
making identifying small, low-resolution objects challenging.
very high GFLOPs count across the datasets, illustrating the
Adding to that, YOLO faces challenges when handling objects
trade-off between accuracy and efficiency. YOLO11’s speed
of different rotations due to the inability to enclose rotated
improvements, attributable to their use of the C3k2 block,
objects, leading to sub-par results overall.
make it suitable for applications where rapid processing is
essential, surpassing YOLOv10 and YOLOv9 models, in terms To handle rotated objects, models such as YOLO11 OBB
of speed by %1.41 and %31 on average, respectively. [26] and YOLOv8 OBB [25] (Oriented Bounding Box) can
be implemented. Keeping the same foundational architecture
While YOLOv9 models excelled in accuracy, their inference
as the standard YOLOv8 and YOLO11, YOLOv8 OBB and
times were among the slowest, making them less ideal for
YOLO11 OBB replace the standard bounding box prediction
time-sensitive applications. In contrast, YOLOv10 models,
head with one that predicts the four corner points of a
though slightly slower than the YOLO11 variants, still offer
rotated rectangle, allowing for more accurate localization and
a commendable balance between efficiency and speed. Their
representation of arbitrarily oriented objects.
performance is well-suited for time-sensitive scenarios, provid-
6) The Rise of YOLO11 Over YOLOv8: Although YOLOv8
ing rapid processing without significantly sacrificing accuracy,
[25] has been the algorithm of choice for its versatility in tasks
making them a viable option for real-time applications.
such as pose estimation, instance segmentation, and oriented
3) Model Size: YOLOv9t was the smallest model, ranking
object detection (OBB), YOLO11 [26] has now emerged as
first across all three datasets, followed by YOLO11n and
a more efficient and accurate alternative. With its ability to
YOLOv10n. This efficiency in model size underscores the
handle the same tasks while offering improved contextual
advancements in newer YOLO versions, especially YOLOv10,
understanding and better architectural modules, YOLO11 sets
showcasing the effectiveness of implementing the Spatial-
a new standard in performance, surpassing YOLOv8 in both
Channel Decoupled Downsampling for efficient parameter
speed and accuracy across various applications.
utilization.
7) Dataset Size: The size of the dataset significantly influ-
YOLOv3u was the largest model, highlighting its ineffi- ences the performance of YOLO models. For instance, large
ciency compared to its more modern counterparts due to its models did not perform optimally on the small African wildlife
outdated architecture. dataset compared to their results on the Traffic Signs and Ships
4) Overall Performance: Considering accuracy, speed, and Vessels datasets due to being more prone to overfitting.
size, and GFLOPs, YOLO11m, YOLOv11n, YOLO11s, and Conversely, small models such as YOLOv9t and YOLOv9s
YOLOv10s emerged as the most consistent performers. They performed significantly better on the Africa Wildlife dataset
achieved high accuracy, low processing time and power, and compared to their results on the other datasets, showcasing the
efficient disk usage, making them suitable for a wide range of effectiveness of small-scaled models when handling limited
applications where both speed and accuracy are crucial. datasets.
Conversely, YOLOv9e, YOLOv5ux, and YOLOv3u demon- 8) Impact of Training Datasets: The performance of YOLO
strated poor results across all metrics, being computation- models is influenced by the training datasets used, as shown
ally inefficient and underperforming relative to their sizes. in Tables VI, VII, and VIII. Different datasets yield varying
YOLO11 models showed the best overall performance, likely results and top performers, indicating that dataset complexity
due to recent enhancements such as the C3k2 block and affects algorithm performance. This underscores the impor-
C2PSA module. Following closely, YOLOv10 models, despite tance of using diverse datasets during benchmarking to obtain
slightly underperforming in accuracy excelled in efficiency comprehensive results on the strengths and limitations of each
thanks to its use of implementation of One-to-One head for model.
prediction. While YOLOv9 showed underperformance in com- This discussion highlights the need for a balanced consid-
putational efficiency, it remains competitive with YOLOv10 eration of accuracy, speed, and model size when selecting
and YOLO11 in terms of accuracy, thanks to its PGI in- YOLO models for specific applications. The consistent perfor-
18

mance of YOLO11 models across various metrics makes them [3] Alaa Ali and Magdy A Bayoumi. Towards real-time dpm object detector
highly recommended for versatile situations where accuracy for driver assistance. In 2016 IEEE International Conference on Image
Processing (ICIP), pages 3842–3846. IEEE, 2016.
and speed are essential. Meanwhile, YOLOv10 models can [4] Isaiah Francis E Babila, Shawn Anthonie E Villasor, and Jennifer C Dela
perform similarly while achieving faster processing times Cruz. Object detection for inventory stock counting using yolov5.
and with smaller model sizes. Additionally, YOLOv9 can In 2022 IEEE 18th International Colloquium on Signal Processing &
Applications (CSPA), pages 304–309. IEEE, 2022.
deliver comparable results in terms of accuracy but sacrifices [5] Chetan Badgujar, Daniel Flippo, Sujith Gunturu, and Carolyn Baldwin.
speed, making it suitable for applications where precision is Tree trunk detection of eastern red cedar in rangeland environment with
prioritized over rapid processing. deep learning technique. Croatian Journal of Forest Engineering, 44,
06 2023.
[6] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao.
V. C ONCLUSION Yolov4: Optimal speed and accuracy of object detection. arXiv preprint
This benchmark study thoroughly evaluates the performance arXiv:2004.10934, 2020.
of various YOLO algorithms. It pioneers a comprehensive [7] Yining Cao, Chao Li, Yakun Peng, and Huiying Ru. Mcs-yolo:
A multiscale object detection method for autonomous driving road
comparison of YOLO11 against its predecessors, evaluating environment recognition. IEEE Access, 11:22342–22354, 2023.
their performance across three diverse datasets: Traffic Signs, [8] Libo Cheng, Jia Li, Ping Duan, and Mingguo Wang. A small attentional
African Wildlife, and Ships and Vessels. The datasets were yolo model for landslide detection from satellite remote sensing images.
Landslides, 18(8):2751–2765, 2021.
carefully selected to encompass a wide range of object prop- [9] Yuan Dai, Weiming Liu, Haiyu Li, and Lan Liu. Efficient foreign object
erties, including varying object sizes, aspect ratios, and object detection between psds and metro doors via deep neural networks. IEEE
densities. We showcase the strengths and weaknesses of each Access, PP:1–1, 03 2020.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for human
YOLO version and family by examining a wide range of detection. In 2005 IEEE Computer Society Conference on Computer
metrics such as Precision, Recall, Mean Average Precision Vision and Pattern Recognition (CVPR’05), volume 1, pages 886–893
(mAP), Processing Time, GFLOPs count, and Model Size. Our vol. 1, 2005.
study addresses the following key research questions: [11] Sheshang Degadwala, Dhairya Vyas, Utsho Chakraborty, Abu Raihan
Dider, and Haimanti Biswas. Yolo-v4 deep learning model for medical
• Which YOLO algorithm demonstrates superior perfor- face mask detection. In 2021 International Conference on Artificial
mance across a comprehensive set of metrics? Intelligence and Smart Systems (ICAIS), pages 209–213. IEEE, 2021.
[12] Tausif Diwan, G Anirudh, and Jitendra V Tembhurne. Object detection
• How do different YOLO versions perform on datasets
using yolo: Challenges, architectural successors, datasets and applica-
with diverse object characteristics, such as size, aspect tions. multimedia Tools and Applications, 82(6):9243–9275, 2023.
ratio, and density? [13] Yunus Egi, Mortaza Hajyzadeh, and Engin Eyceyurt. Drone-computer
• What are the specific strengths and limitations of each communication based tomato generative organ counting model using
yolo v5 and deep-sort. Agriculture, 12:1290, 08 2022.
YOLO version, and how can these insights inform the [14] Loddo Fabio, Dario Piga, Michelucci Umberto, and El Ghazouali
selection of the most suitable algorithm for various ap- Safouane. Benchcloudvision: A benchmark analysis of deep learning
plications? approaches for cloud detection and segmentation in remote sensing
imagery. arXiv preprint arXiv:2402.13918, 2024.
In particular, the YOLO11 family emerged as the most [15] Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester, and Deva
consistent, with YOLO11m striking an optimal balance be- Ramanan. Object detection with discriminatively trained part-based
tween accuracy, efficiency, and model size. While YOLOv10 models. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 32(9):1627–1645, 2010.
delivered slightly lower accuracy than YOLO11, it excelled [16] Di Feng, Ali Harakeh, Steven L Waslander, and Klaus Dietmayer.
in speed and efficiency, making it a strong choice for appli- A review and comparative study on probabilistic object detection in
cations requiring efficiency and fast processing. Additionally, autonomous driving. IEEE Transactions on Intelligent Transportation
Systems, 23(8):9961–9980, 2021.
YOLOv9 performed well overall and particularly stood out [17] Rongli Gai, Na Chen, and Hai Yuan. A detection algorithm for cherry
in smaller datasets. These findings provide valuable insights fruits based on the improved yolo-v4 model. Neural Computing and
for industry and academia, guiding the selection of the most Applications, 35(19):13895–13906, 2023.
suitable YOLO algorithms and informing future developments [18] Dweepna Garg, Parth Goel, Sharnil Pandya, Amit Ganatra, and Ketan
Kotecha. A deep learning approach for face detection using yolo. In
and enhancements. While the evaluated algorithms demon- 2018 IEEE Punecon, pages 1–4. IEEE, 2018.
strate promising performance, there is still room for refine- [19] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich
ment. Future research could focus on optimizing YOLOv10 to feature hierarchies for accurate object detection and semantic segmen-
tation, 2014.
enhance its accuracy while preserving its speed and efficiency [20] Juan Guerrero-Ibáñez, Sherali Zeadally, and Juan Contreras-Castillo.
advantage. Additionally, continued advancements in architec- Sensor technologies for intelligent transportation systems. Sensors,
tural design may pave the way for even more groundbreaking 18(4), 2018.
[21] Muhammad Hussain. Yolo-v1 to yolo-v8, the rise of yolo and its
YOLO algorithms. Our future work includes in-depth studies complementary nature toward digital manufacturing and industrial defect
of the identified gaps in these algorithms, along with proposed detection. Machines, 11(7):677, 2023.
improvements to demonstrate their potential impact on overall [22] Muhammad Hussain. Yolov1 to v8: Unveiling each variant–a compre-
efficiency. hensive review of yolo. IEEE Access, 12:42816–42833, 2024.
[23] Rasheed Hussain and Sherali Zeadally. Autonomous cars: Research
results, issues, and future challenges. IEEE Communications Surveys &
R EFERENCES Tutorials, 21(2):1275–1313, 2019.
[1] Oluibukun Ajayi, John Ashi, and BLESSED Guda. Performance [24] Glenn Jocher. Ultralytics yolov5, 2020.
evaluation yolo v5 model for automatic crop and weed classification [25] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics yolov8, 2023.
on uav images. Smart Agricultural Technology, 5:100231, 04 2023. [26] Glenn Jocher and Jing Qiu. Ultralytics yolo11, 2024.
[2] Bader Aldughayfiq, Farzeen Ashfaq, NZ Jhanjhi, and Mamoona Hu- [27] Chang Ho Kang and Sun Young Kim. Real-time object detection and
mayun. Yolo-based deep learning model for pressure ulcer detection segmentation technology: an analysis of the yolo algorithm. JMST
and classification. In Healthcare, volume 11, page 1222. MDPI, 2023. Advances, 5(2):69–76, 2023.
19

[28] Nyoman Karna, Made Adi Paramartha Putra, Syifa Rachmawati, Mideth the IEEE conference on computer vision and pattern recognition, pages
Abisado, and Gabriel Sampedro. Toward accurate fused deposition 779–788, 2016.
modeling 3d printer fault detection using improved yolov8 with hy- [48] Joseph Redmon and Ali Farhadi. Yolo9000: better, faster, stronger. In
perparameter optimization. IEEE Access, PP:1–1, 01 2023. Proceedings of the IEEE conference on computer vision and pattern
[29] Chuyi Li, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang recognition, pages 7263–7271, 2017.
Li, Zaidan Ke, Qingyuan Li, Meng Cheng, Weiqiang Nie, et al. Yolov6: [49] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement.
A single-stage object detection framework for industrial applications. arXiv preprint arXiv:1804.02767, 2018.
arXiv preprint arXiv:2209.02976, 2022. [50] Arunabha M Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay Raj.
[30] Guofa Li, Zefeng Ji, Xingda Qu, Rui Zhou, and Dongpu Cao. Cross- Wildect-yolo: An efficient and robust computer vision-based accurate
domain object detection for autonomous driving: A stepwise domain object localization model for automated endangered wildlife detection.
adaptative yolo approach. IEEE Transactions on Intelligent Vehicles, Ecological Informatics, 75:101919, 2023.
7(3):603–615, 2022. [51] Arunabha Mohan Roy, Jayabrata Bhaduri, Teerath Kumar, and Kislay
[31] Min Li, Zhijie Zhang, Liping Lei, Xiaofan Wang, and Xudong Guo. Raj. A computer vision-based object localization model for endangered
Agricultural greenhouses detection in high-resolution satellite images wildlife detection. Ecological Economics, Forthcoming, 2022.
based on convolutional neural networks: Comparison of faster r-cnn, [52] SIDDHARTH SAH. Ships/vessels in aerial images. https://www.kaggle.
yolo v3 and ssd. Sensors, 20(17):4938, 2020. com/datasets/siddharthkumarsah/ships-in-aerial-images/data, july 2023.
[32] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. visited on 2024-07-12.
Focal loss for dense object detection, 2018. [53] Ranjan Sapkota, Rizwan Qureshi, Marco Flores Calero, Muhammad
[33] Martina Lippi, Niccolò Bonucci, Renzo Fabrizio Carpio, Mario Con- Hussain, Chetan Badjugar, Upesh Nepal, Alwin Poulose, Peter Zeno,
tarini, Stefano Speranza, and Andrea Gasparri. A yolo-based pest Uday Bhanu Prakash Vaddevolu, Hong Yan, et al. Yolov10 to its genesis:
detection system for precision agriculture. In 2021 29th Mediterranean A decadal and comprehensive review of the you only look once series.
Conference on Control and Automation (MED), pages 342–347. IEEE, arXiv preprint arXiv:2406.19407, 2024.
2021. [54] Abhishek Sarda, Shubhra Dixit, and Anupama Bhan. Object detection
[34] Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xin- for autonomous driving using yolo [you only look once] algorithm.
wang Liu, and Matti Pietikäinen. Deep learning for generic object In 2021 Third international conference on intelligent communication
detection: A survey. International journal of computer vision, 128:261– technologies and virtual mobile networks (ICICV), pages 1370–1374.
318, 2020. IEEE, 2021.
[35] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott [55] Maged Shoman, Gabriel Lanzaro, Tarek Sayed, and Suliman Gargoum.
Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single Shot Autonomous vehicle-pedestrian interaction modeling platform: A case
MultiBox Detector, page 21–37. Springer International Publishing, 2016. study in four major cities. Journal of Transportation Engineering Part
[36] Jueal Mia, Hasan Imam Bijoy, Shoreef Uddin, and Dewan Mamun Raza. A Systems, 06 2024.
Real-time herb leaves localization and classification using yolo. In [56] Maged Shoman, Dongdong Wang, Armstrong Aboah, and Mohamed
2021 12th International Conference on Computing Communication and Abdel-Aty. Enhancing traffic safety with parallel dense video captioning
Networking Technologies (ICCCNT), pages 1–7. IEEE, 2021. for end-to-end event analysis, 2024.
[37] Hamzeh Mirhaji, Mohsen Soleymani, Abbas Asakereh, and Saman Ab- [57] Karen Simonyan and Andrew Zisserman. Very deep convolutional
danan Mehdizadeh. Fruit detection and load estimation of an orange networks for large-scale image recognition, 2015.
orchard using the yolo models through simple approaches in different [58] Mupparaju Sohan, Thotakura Sai Ram, Rami Reddy, and Ch Venkata. A
imaging and illumination conditions. Computers and Electronics in review on yolov8 and its advancements. In International Conference on
Agriculture, 191:106533, 2021. Data Intelligence and Cognitive Informatics, pages 529–545. Springer,
[38] Miand Mostafa and Milad Ghantous. A yolo based approach for 2024.
traffic light recognition for adas systems. In 2022 2nd International [59] suranaree university of technology. africa wild life dataset. https:
Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), //universe.roboflow.com/suranaree-university-of-technology-wqhl6/
pages 225–229. IEEE, 2022. africa-wild-life, feb 2023. visited on 2024-07-12.
[39] Huy Hoang Nguyen, Thi Nhung Ta, Ngoc Cuong Nguyen, Hung Manh [60] Ultralytics. YOLOv5: A state-of-the-art real-time object detection
Pham, Duc Minh Nguyen, et al. Yolo based real-time human detection system. https://docs.ultralytics.com, 2021. Accessed: insert date here.
for smart video surveillance at the edge. In 2020 IEEE eighth inter- [61] Amir Ulykbek, Azamat Serek, and Magzhan Zhailau. A comprehensive
national conference on communications and electronics (ICCE), pages review of object detection in yolo: Evolution, variants, and applications.
439–444. IEEE, 2021. [62] NL Vidya, M Meghana, P Ravi, and Nithin Kumar. Virtual fencing
[40] Radu Oprea. Traffic signs detection europe dataset. https://universe. using yolo framework in agriculture field. In 2021 Third International
roboflow.com/radu-oprea-r4xnm/traffic-signs-detection-europe, feb Conference on Intelligent Communication Technologies and Virtual
2024. visited on 2024-07-12. Mobile Networks (ICICV), pages 441–446. IEEE, 2021.
[41] Rafael Padilla, Sergio L Netto, and Eduardo AB Da Silva. A survey on [63] P. Viola and M. Jones. Rapid object detection using a boosted cascade
performance metrics for object-detection algorithms. In 2020 interna- of simple features. In Proceedings of the 2001 IEEE Computer Society
tional conference on systems, signals and image processing (IWSSIP), Conference on Computer Vision and Pattern Recognition. CVPR 2001,
pages 237–242. IEEE, 2020. volume 1, pages I–I, 2001.
[42] Govind S Patel, Ashish A Desai, Yogesh Y Kamble, Ganesh V Pujari, [64] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and
Priyanka A Chougule, and Varsha A Jujare. Identification and separation Guiguang Ding. Yolov10: Real-time end-to-end object detection. arXiv
of medicine through robot using yolo and cnn algorithms for healthcare. preprint arXiv:2405.14458, 2024.
In 2023 International Conference on Artificial Intelligence for Innova- [65] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao.
tions in Healthcare Industries (ICAIIHI), volume 1, pages 1–5. IEEE, Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-
2023. time object detectors. In Proceedings of the IEEE/CVF conference on
[43] Paul Paul Tsoi. YOLO11: The cutting-edge evolution in object detection computer vision and pattern recognition, pages 7464–7475, 2023.
— a brief review of the latest in the yolo series. https://medium.com, [66] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. Yolov9:
October 2024. Accessed: 2024-10-17. Learning what you want to learn using programmable gradient infor-
[44] Minh-Tan Pham, Luc Courtrai, Chloé Friguet, Sébastien Lefèvre, and mation. arXiv preprint arXiv:2402.13616, 2024.
Alexandre Baussard. Yolo-fine: One-stage detector of small objects [67] Yifan Wang, Lin Yang, Hong Chen, Aamir Hussain, Congcong Ma, and
under various backgrounds in remote sensing images. Remote Sensing, Malek Al-gabri. Mushroom-yolo: A deep learning algorithm for mush-
12(15):2501, 2020. room growth recognition based on improved yolov5 in agriculture 4.0.
[45] Francesco Prinzi, Marco Insalaco, Alessia Orlando, Salvatore Gaglio, In 2022 IEEE 20th International Conference on Industrial Informatics
and Salvatore Vitabile. A yolo-based model for breast cancer detection (INDIN), pages 239–244. IEEE, 2022.
in mammograms. Cognitive Computation, 16(1):107–120, 2024. [68] Tingting Zhao, Xiaoli Yi, Zhiyong Zeng, and Tao Feng. Mobilenet-yolo
[46] Sovit Rath. Yolov8 ultralytics: State-of-the-art yolo models. based wildlife detection model: A case study in yunnan tongbiguan na-
LearnOpenCV–Learn OpenCV, PyTorch, Keras, TensorflowWith Exam- ture reserve, china. Journal of Intelligent & Fuzzy Systems, 41(1):2171–
ples and Tutorials, 2023. 2181, 2021.
[47] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You [69] Yifei Zheng and Hongling Zhang. Video analysis in sports by
only look once: Unified, real-time object detection. In Proceedings of lightweight object detection network under the background of sports
20

industry development. Computational Intelligence and Neuroscience,

2022:1–10, 08 2022.
[70] Zhengxia Zou, Keyan Chen, Zhenwei Shi, Yuhong Guo, and Jieping
Ye. Object detection in 20 years: A survey. Proceedings of the IEEE,
111(3):257–276, 2023.

Mastering All YOLO Models From YOLOv1 To YOLO
100% (1)
Mastering All YOLO Models From YOLOv1 To YOLO
58 pages
Unix PPT Lesson
75% (4)
Unix PPT Lesson
70 pages
AR253 History 2 - Structuralism and Metabolism
No ratings yet
AR253 History 2 - Structuralism and Metabolism
55 pages
Object Detection Week 2 YOLOv1-YOLOv8
100% (1)
Object Detection Week 2 YOLOv1-YOLOv8
264 pages
Advantage Workstation 4.3 SM
100% (1)
Advantage Workstation 4.3 SM
346 pages
Batch Record
No ratings yet
Batch Record
11 pages
Paper 5
No ratings yet
Paper 5
13 pages
Evolution of Yolo Algorithm and Yolov5: The State-Of-The-Art Object Detection Algorithm
100% (1)
Evolution of Yolo Algorithm and Yolov5: The State-Of-The-Art Object Detection Algorithm
61 pages
Features of Yolo11
No ratings yet
Features of Yolo11
9 pages
YOLOv1 v8综述
No ratings yet
YOLOv1 v8综述
36 pages
YOLO Algorithm For Real-Time Object Detection: 2.1. Network Design
No ratings yet
YOLO Algorithm For Real-Time Object Detection: 2.1. Network Design
3 pages
AR Yolo 12: A - B E - P V: Eview of V Ttention Ased Nhancements VS Revious Ersions
No ratings yet
AR Yolo 12: A - B E - P V: Eview of V Ttention Ased Nhancements VS Revious Ersions
18 pages
Deep Research
No ratings yet
Deep Research
18 pages
Yolo1 11
No ratings yet
Yolo1 11
38 pages
W Yolo 5: A: Hat Is V Deep Look Into The Internal Features of The Popular Object Detector
No ratings yet
W Yolo 5: A: Hat Is V Deep Look Into The Internal Features of The Popular Object Detector
8 pages
YOLOv12 - A Breakdown of The Key Architectural Features
No ratings yet
YOLOv12 - A Breakdown of The Key Architectural Features
9 pages
Real Time Object Detection
No ratings yet
Real Time Object Detection
8 pages
Deep Learning YOLOv2
No ratings yet
Deep Learning YOLOv2
3 pages
You Only Look Once - Object Detection Models A Review
No ratings yet
You Only Look Once - Object Detection Models A Review
8 pages
Image Detection and Segmentation Using YOLO v5 For
No ratings yet
Image Detection and Segmentation Using YOLO v5 For
6 pages
Enhancing Real-Time Object Detection With YOLO Alg
No ratings yet
Enhancing Real-Time Object Detection With YOLO Alg
9 pages
Make 05 00083 v2
No ratings yet
Make 05 00083 v2
37 pages
1 s2.0 S1877050922001363 Main
No ratings yet
1 s2.0 S1877050922001363 Main
8 pages
YOLO You Only Look Once For Object
No ratings yet
YOLO You Only Look Once For Object
1 page
Csit 121602
No ratings yet
Csit 121602
12 pages
Overview of YOLO ObjectDetectionAlgorithm
No ratings yet
Overview of YOLO ObjectDetectionAlgorithm
7 pages
YOLO Evolution - 2017 To 2025
No ratings yet
YOLO Evolution - 2017 To 2025
8 pages
Yolo
No ratings yet
Yolo
34 pages
Yolov5 Paper
No ratings yet
Yolov5 Paper
12 pages
Yolo
No ratings yet
Yolo
32 pages
A Comprehensive Review of YOLO From YOLOv1 To YOLO
No ratings yet
A Comprehensive Review of YOLO From YOLOv1 To YOLO
27 pages
Synopsis - Internship - Group-53
No ratings yet
Synopsis - Internship - Group-53
8 pages
Yolov10 To Its Genesis A Decadal and Comprehensive
No ratings yet
Yolov10 To Its Genesis A Decadal and Comprehensive
49 pages
Yolo Paper
No ratings yet
Yolo Paper
10 pages
Analyzing Real-Time Object Detection With YOLO Alg
No ratings yet
Analyzing Real-Time Object Detection With YOLO Alg
43 pages
YOLOv8 A Novel Object Detection Algorithm With Enhanced Performance and Robustness
No ratings yet
YOLOv8 A Novel Object Detection Algorithm With Enhanced Performance and Robustness
6 pages
Object Detection Document
No ratings yet
Object Detection Document
4 pages
Object Detection and Classification Using Yolov3 IJERTV10IS020078
No ratings yet
Object Detection and Classification Using Yolov3 IJERTV10IS020078
6 pages
Algoritm For MOD
No ratings yet
Algoritm For MOD
32 pages
Yolov 8
No ratings yet
Yolov 8
31 pages
MJEER-Volume 30-Issue 1 - Page 52-57
No ratings yet
MJEER-Volume 30-Issue 1 - Page 52-57
6 pages
YOLO Based Object Detection Models: A Review and Its Applications
No ratings yet
YOLO Based Object Detection Models: A Review and Its Applications
40 pages
Yolo Algorithm
No ratings yet
Yolo Algorithm
37 pages
Yolo Comprehensive (v1 To v8)
No ratings yet
Yolo Comprehensive (v1 To v8)
34 pages
Efficient Object Detection With YOLO A C
No ratings yet
Efficient Object Detection With YOLO A C
13 pages
2023 - Comparison of Transfer Learning Techniques For Object Detection
No ratings yet
2023 - Comparison of Transfer Learning Techniques For Object Detection
10 pages
You Only Look Once Model-Based Object Identification in Computer Vision
No ratings yet
You Only Look Once Model-Based Object Identification in Computer Vision
12 pages
You Only Look Once - Unified, Real-Time Object Detection
No ratings yet
You Only Look Once - Unified, Real-Time Object Detection
10 pages
YOLO
No ratings yet
YOLO
10 pages
Abir
No ratings yet
Abir
10 pages
YOLO Based Detection and Classification of Objects in Video Records
No ratings yet
YOLO Based Detection and Classification of Objects in Video Records
5 pages
Mca Format Weapon Detection
No ratings yet
Mca Format Weapon Detection
76 pages
14489-Article Text-52397-76611-10-20240112
No ratings yet
14489-Article Text-52397-76611-10-20240112
10 pages
YOLOv1 To v8 Unveiling Each VariantA Comprehensive Review of YOLO
No ratings yet
YOLOv1 To v8 Unveiling Each VariantA Comprehensive Review of YOLO
18 pages
Evolution of Yolov3
No ratings yet
Evolution of Yolov3
2 pages
YOLO Is The State-Of-The-Art, Real Time System Built On Deep Learning For Solving Object Detection Problems
50% (2)
YOLO Is The State-Of-The-Art, Real Time System Built On Deep Learning For Solving Object Detection Problems
8 pages
(IJCST-V8I3P4) :sakshi Gupta, Dr. T. Uma Devi
No ratings yet
(IJCST-V8I3P4) :sakshi Gupta, Dr. T. Uma Devi
5 pages
Yolo Real-Time Object Detection System
No ratings yet
Yolo Real-Time Object Detection System
16 pages
YOLOv 5
No ratings yet
YOLOv 5
10 pages
Ex No 06
No ratings yet
Ex No 06
4 pages
Final Synopsis1
No ratings yet
Final Synopsis1
10 pages
Project
100% (1)
Project
30 pages
YOLO Object Detection Explained: Definitive Reference for Developers and Engineers
From Everand
YOLO Object Detection Explained: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
CH11-Digital Logic
No ratings yet
CH11-Digital Logic
6 pages
Time Is Money - Estimating The Cost of Latency in Trading
No ratings yet
Time Is Money - Estimating The Cost of Latency in Trading
61 pages
Ec34 Question Bank
No ratings yet
Ec34 Question Bank
6 pages
Classroom Inventory List SCHOOL YEAR
No ratings yet
Classroom Inventory List SCHOOL YEAR
1 page
MLE1101 - Tutorial 2 - Suggested Solutions
No ratings yet
MLE1101 - Tutorial 2 - Suggested Solutions
8 pages
Marantz SR 4500 Brochure
No ratings yet
Marantz SR 4500 Brochure
4 pages
Ionic Equilibrium DPP
No ratings yet
Ionic Equilibrium DPP
33 pages
Java Sript
No ratings yet
Java Sript
2 pages
Data Handling Practice Sheets
No ratings yet
Data Handling Practice Sheets
8 pages
Mip Report
No ratings yet
Mip Report
22 pages
Rexroth Servo Drives Programming:: Page 1 of 56
No ratings yet
Rexroth Servo Drives Programming:: Page 1 of 56
56 pages
HITEC PowerPRO2700 - 2016 PDF
100% (4)
HITEC PowerPRO2700 - 2016 PDF
55 pages
Recent Advances in Diagnostic Aids
No ratings yet
Recent Advances in Diagnostic Aids
59 pages
P235GH Engl PDF
No ratings yet
P235GH Engl PDF
4 pages
Cbds 2103
No ratings yet
Cbds 2103
11 pages
Mcp737Pro: Cpflight Operations Manual
No ratings yet
Mcp737Pro: Cpflight Operations Manual
12 pages
1000-4 European Union EN12975
No ratings yet
1000-4 European Union EN12975
26 pages
APEC 2015 Intro Small Signal Modeling Seminar
No ratings yet
APEC 2015 Intro Small Signal Modeling Seminar
171 pages
قوانين الفصول بملف واحد فيزياء السادس علمي للاستاذ سعيد محي تومان PDF PDF Mathematical Analysis Teaching Mathematics
No ratings yet
قوانين الفصول بملف واحد فيزياء السادس علمي للاستاذ سعيد محي تومان PDF PDF Mathematical Analysis Teaching Mathematics
1 page
Chemical Shift
No ratings yet
Chemical Shift
10 pages
KRNT fx175qtv Data Cheet PDF
No ratings yet
KRNT fx175qtv Data Cheet PDF
2 pages
Timber Home Living 2015-09-10
No ratings yet
Timber Home Living 2015-09-10
84 pages
Structural Analysis
No ratings yet
Structural Analysis
3 pages
SAL Event Documentation
No ratings yet
SAL Event Documentation
13 pages
Homomorphism
No ratings yet
Homomorphism
10 pages
Jr. Chemistry 2024 AP
No ratings yet
Jr. Chemistry 2024 AP
12 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Evaluating The Evolution of YOLO You Only Look Onc

Uploaded by

Evaluating The Evolution of YOLO You Only Look Onc

Uploaded by

1

Evaluating the Evolution of YOLO (You Only Look

categories: one-stage and two-stage approaches.

YOLOv3 Scaled-YOLOv4 YOLOv6 YOLO-NAS YOLOv11

YOLO is introduced YOLOS YOLOv9

YOLOv2 (YOLO9000) PP-YOLO YOLOX YOLOv6 3.0 YOLOv10

YOLOv4 YOLOR YOLOv7 YOLO-World

Fig. 1. Evolution of YOLO Algorithms throughout the years.

library. The main goal is to provide a thorough and com- TABLE I

TABLE II an anchor-free detection method and achieving better overall

its ability to detect objects efficiently across varying scales, as

3) YOLOv5u: YOLOv5, proposed by Glenn Jocher, tran-

7) YOLO11: YOLO11 [26] is the latest innovation in the

Version Scaled Version

Preprocess Inference Postprocess Total

Preprocess Inference Postprocess

Preprocess Inference Postprocess Total

industry development. Computational Intelligence and Neuroscience,

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.