YOLO-LLTS: Real-Time Low-Light Traffic Sign Detection Via Prior-Guided Enhancement and Multi-Branch Feature Interaction

Abstract—Detecting traffic signs effectively under low-light conditions remains a significant challenge. To address this issue, we propose YOLO-LLTS, an end-to-end real-time traffic sign detection algorithm specifically designed for low-light environments. Firstly, we introduce the High-Resolution Feature Map for Small Object Detection (HRFM-TOD) module to address indistinct small-object features in low-light scenarios. By leveraging high-resolution feature maps, HRFM-TOD effectively mitigates the feature dilution problem encountered in conventional PANet frameworks, thereby enhancing both detection accuracy and inference speed. Secondly, we develop the Multi-branch Feature Interaction Attention (MFIA) module, which facilitates deep feature interaction across multiple receptive fields in both channel and spatial dimensions, significantly improving the model's information extraction capabilities. Finally, we propose the Prior-Guided Enhancement Module (PGFE) to tackle common image quality challenges in low-light environments, such as noise, low contrast, and blurriness. This module employs prior knowledge to enrich image details and enhance visibility, substantially boosting detection performance. To support this research, we construct a novel dataset, the Chinese Nighttime Traffic Sign Sample Set (CNTSSS), covering diverse nighttime scenarios, including urban, highway, and rural environments under varying weather conditions. Experimental evaluations demonstrate that YOLO-LLTS achieves state-of-the-art performance, outperforming the previous best methods by 2.7% mAP50 and 1.6% mAP50:95 on TT100K-night, 1.3% mAP50 and 1.9% mAP50:95 on CNTSSS, and achieving superior results on the CCTSDB2021 dataset. Moreover, deployment experiments on edge devices confirm the real-time applicability and effectiveness of our proposed approach.

Index Terms—Traffic Sign Detection, Low-Light Conditions, Traffic Sign Dataset, End-to-End Algorithm, Edge Device Deployment.

This work is expected to be submitted to IEEE Transactions on Instrumentation and Measurement.
This project is jointly supported by the National Natural Science Foundation of China (Nos. 52172350, W2421069, 51775565 and 61003143), the Guangdong Basic and Applied Research Foundation (No. 2022B1515120072), the Guangzhou Science and Technology Plan Project (No. 2024B01W0079), the Nansha Key R&D Program (No. 2022ZD014), and the Science and Technology Planning Project of Guangdong Province (No. 2023B1212060029). (Corresponding author: Ronghui Zhang.)
Ziyu Lin, Yunfan Wu, Yuhang Ma, Junzhou Chen, and Ronghui Zhang are with the Guangdong Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China (e-mail: linzy88@mail2.sysu.edu.cn, wuyf227@mail2.sysu.edu.cn, mayh39@mail2.sysu.edu.cn, chenjunzhou@mail.sysu.edu.cn, zhangrh25@mail.sysu.edu.cn).
Jiaming Wu is with the Department of Architecture and Civil Engineering, Chalmers University of Technology, Sven Hultins gata 6, SE-412 96 Gothenburg, Sweden (e-mail: jiaming.wu@chalmers.se).
Guodong Yin is with the School of Mechanical Engineering, Southeast University, Nanjing 211189, China (e-mail: ygd@seu.edu.cn).
Liang Lin is with the School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China (e-mail: linliang@ieee.org).

Fig. 1. The comparison of traffic signs in normal and low-light environments.

I. INTRODUCTION

TRAFFIC sign recognition plays a crucial role in advanced driver-assistance systems (ADAS) and autonomous vehicles, ensuring road safety and assisting in navigation. As shown in Fig. 2, after the camera captures the traffic signs, the vehicle equipped with an assisted driving system performs calculations through a mobile edge computing device. The system uses a deep learning network to detect the traffic signs, ensuring driving safety [1]. Existing object detection methods have demonstrated strong capabilities in accurately detecting various traffic elements, including pedestrians [2]–[5], vehicles [6]–[10], and traffic lights [11]–[13]. However, the detection of traffic signs remains a considerable challenge, primarily due to their small size and the complexity involved in distinguishing them from other objects in the scene, making this task particularly difficult in real-world scenarios. In high-resolution images with dimensions of 2048×2048 pixels, a sign may occupy only a small area of approximately 30×30 pixels. Due to the extremely low resolution and limited information, significant efforts have been made in recent years to enhance the performance of small object detection. Existing traffic sign detection algorithms [14]–[16] have been improved to address the characteristics of small targets, enabling effective detection of traffic signs during the daytime.

However, with the increasing number of low-light traffic accidents and the growing demand for all-weather systems, the importance of traffic sign recognition under low-light conditions has garnered more attention.
Fig. 2. Application scenarios of traffic sign detection in advanced driver-assistance systems [17] [18] [19].
As shown in Fig. 1, visibility decreases and image noise increases, complicating the driving scenario in low-light environments. Facing the dual challenges of low-resolution small objects and low-visibility low-light conditions, existing methods struggle to clearly capture the features of traffic signs for detection and classification.

A straightforward approach is to pre-enhance the images using advanced low-light enhancement techniques and then apply object detection algorithms to the enhanced images. However, treating image enhancement and object detection as separate tasks frequently leads to compatibility problems. Moreover, simply concatenating the two models results in slow inference speeds, which fails to meet the real-time requirements of ADAS. Chowdhury et al. [20] utilized optimal reinforcement learning strategies and various Generative Adversarial Network (GAN) [21] models to enhance the training data for traffic sign recognition. However, this approach is heavily reliant on data labels, making it suitable only for specific datasets. Zhang et al. [22] enhanced model robustness through exposure, tone, and brightness filters, enabling end-to-end training. However, simply brightening low-light images led to the loss of original information and introduced more noise, which undermined the original intent. Sun et al. [23] proposed LLTH-YOLOv5, which enhanced images using pixel-level adjustments and non-reference loss functions. However, this method introduced an enhancement-specific loss function on top of YOLO, meaning that end-to-end training was not fully achieved.

To address the dual challenges of small targets with low resolution and low visibility in low-light environments, we have designed an end-to-end traffic sign recognition algorithm specifically for low-light conditions. We have improved YOLOv8 by utilizing high-resolution images to extract clearer features and designed a new multi-branch feature interaction attention module to fuse features from different receptive fields. Additionally, we have developed a prior-providing module that not only enhances the image but also supplements its details.
Our algorithm effectively mitigates the poor performance of existing methods under low-light conditions, thereby improving the safety of autonomous driving systems. Furthermore, to address the lack of low-light scene data in existing traffic sign datasets, we have constructed a multi-scene Chinese traffic sign dataset ranging from dusk to deep night, providing a foundational resource for industry research in this area.

In summary, the main contributions of our proposed method are as follows:
1) Chinese Nighttime Traffic Sign Sample Set: To address the lack of traffic sign datasets under low-light conditions, we constructed a novel dataset named CNTSSS. This dataset was collected across 17 cities in China, containing images captured under various nighttime lighting conditions ranging from dusk to deep night. It covers diverse scenarios, including urban, highway, and rural environments, as well as clear and rainy weather conditions.
2) High-Resolution Feature Map for Small Object Detection (HRFM-TOD): To address the challenge of indistinct features for small objects in low-light conditions, we propose the HRFM-TOD module, which utilizes high-resolution feature maps for detection. This module effectively alleviates the feature dilution problem encountered in traditional PANet frameworks when detecting small targets, thus improving both detection accuracy and inference speed.
3) Multi-branch Feature Interaction Attention Module (MFIA): To enhance the model's capability of capturing information from multiple receptive fields, we introduce the MFIA module. This module facilitates deep interaction and fusion of multi-scale features across both channel and spatial dimensions. Unlike previous methods that focus solely on single-scale attention mechanisms, MFIA effectively integrates multi-scale, semantically diverse features.
4) Prior-Guided Enhancement Module (PGFE): To overcome common image quality issues such as noise, reduced contrast, and blurriness inherent in low-light environments, we propose the PGFE module. This module utilizes prior knowledge to enhance images and supplement image details, significantly boosting detection performance under challenging low-light conditions.

II. RELATED WORKS

For the task of detecting traffic signs, the greatest challenge lies in the small size of the traffic signs and the precise detection and localization of the traffic signs in various complex scenarios. Therefore, we systematically review the relevant research status in three aspects: low-light image enhancement (LLIE) methods, target detection methods in complex scenarios, and small target detection methods.

A. Low-Light Image Enhancement Methods (LLIE Methods)

LLIE methods are able to effectively enhance the quality of images under low-light conditions. Currently, the enhancement methods in this field mainly consist of two classes: traditional methods and machine learning methods.

Traditional methods in LLIE primarily emphasized histogram equalization and strategies derived from Retinex theory [24]. Histogram equalization enhanced image brightness by expanding the dynamic range of pixel values, which included global methods [25] and local methods [26]. Methods based on Retinex decomposed the image into illumination and reflection components, assuming that the reflection component remained consistent under different illumination conditions. For example, Fu et al. [27], [28] first applied 2-norm constraints to the illumination and proposed a 2-norm-based optimization solution, while the Retinex model proposed by Li et al. [29] considers noise and estimates the illumination map by solving an optimization problem. However, these traditional methods usually rely on manually extracted features and may struggle to achieve ideal enhancement effects under complex lighting conditions.

Machine-learning-based LLIE methods mainly consist of two categories: supervised learning and unsupervised learning. Supervised learning approaches generally rely on extensive datasets of low-light images as well as their paired images under normal-light conditions to facilitate effective training. For example, LLNet [30], the first deep learning method for low-light image enhancement (LLIE) to introduce an end-to-end network, was trained on simulated data with random gamma correction. Furthermore, Wei et al. [31] creatively combined Retinex theory with convolutional neural networks, dividing the network into decomposition, adjustment, and reconstruction modules, and trained it using their self-built LOL dataset. The performance of these methods relies heavily on the quality and diversity of the paired training datasets.

Unsupervised learning methods focus on enhancing low-light images without paired training data. For example, Zero-DCE [32] approached image enhancement as a task of estimating image-specific curves using a deep network, driving the network's learning process through a series of meticulously crafted loss functions to achieve enhancement without the need for paired data. EnlightenGAN [33] used an attention-mechanism-based U-Net [34] as a generator and employed a GAN to perform image enhancement without paired training data. Cui et al. [35] proposed the Illumination Adaptive Transformer (IAT) model, which uses attention mechanisms to adjust parameters related to the image signal processor (ISP), effectively enhancing targets under various lighting conditions. These methods demonstrate the potential of unsupervised learning in LLIE as well as the flexibility and adaptability of deep learning models to diverse lighting conditions.

B. Small Target Detection Methods

Detecting small targets in object detection is a challenging task. Small targets are often plagued by low resolution, and their feature extraction and precise detection are extremely difficult due to interference from various background information. Moreover, since the positions of small target detection objects are usually not fixed and may appear anywhere in the image, including peripheral areas or overlapping objects, their precise localization is even more challenging.
Data augmentation, multi-receptive-field learning, and context learning are effective strategies to boost the performance of small target detection.

Data augmentation, as a simple and effective strategy, can enhance the ability to extract features of small target objects by increasing the diversity of the training set. Cui et al. [36] pasted objects into different backgrounds to directly augment the rare classes in the dataset; Zhang et al. [37] used a GAN to perform data augmentation, optimizing the model's stability and robustness; Xie et al. [38] increased the number of high-difficulty negative samples through dataset balancing and data augmentation during training, expanding the existing dataset by simulating complex environmental changes.

Multi-scale fusion learning integrates deep semantic information with shallow representation information, effectively mitigating the layer-by-layer fading of small object features and positional details in the detection network. SODNet [39] adaptively acquires corresponding spatial information through multi-scale fusion, thereby strengthening the network's capacity to extract features of small target objects. Ma et al. [40] used deconvolution to upsample deep semantic information and then combined it with shallow representation information to build a feature pyramid, improving detection accuracy. TsingNet [41] constructed a bidirectional attention feature pyramid, using both top-down and bottom-up subnets to perceive foreground features and reduce semantic gaps between multiple scales, effectively detecting small targets. MIAF-Net [15], consisting of a lightweight FCSP-Net backbone, an Attention-Balanced Feature Pyramid Network (ABFPN), and a Multi-Scale Information Fusion Detection Head (MIFH), can not only effectively extract features of small objects but also enhance the association between foreground features and contextual information through a self-attention mechanism, thereby performing well in small target detection tasks.

The appearance features of small targets are usually not very prominent, so appropriate context modeling can improve the detector's performance on small target detection tasks.

[Figure: CNTSSS data-collection cities (including Beijing, Tianjin, Shangqiu, Zhenjiang, Nanchong, Wuhan, Nanjing, Shanghai, Chengdu, Jingdezhen, Chongqing, Shangrao, Changsha, and Guilin) and the class distribution of prohibitory, warning, and mandatory signs (64%, 22%, 14%).]

... method to simplify the training process, effectively completing target detection tasks under various low-light conditions. Yang et al. [45] proposed a Dual-Mode Serial Night Road Object Detection Model (OMOT) based on depthwise separable convolution and a self-attention mechanism. OMOT significantly improves the detection accuracy of vehicles and pedestrians at night by leveraging a lightweight object proposal module and a classification module enhanced with self-attention mechanisms. The model not only considers vehicle light features, but also enhances the extraction of nighttime features through self-attention, demonstrating robust performance in complex environments.

The problem of traffic sign detection under low-light conditions is a subtask of object detection in complex scenes. It can be decomposed into two challenges: object detection in complex scenes and small object detection. There is limited existing research on this topic. Zhang et al. [22] enhanced images using exposure, tone, and luminance filters with a small convolutional network for predicting filter parameters during training, and they improved object detection accuracy with a Feature Encoder and an explicit objectness branch. Sun et al. [23] enhanced images using pixel-level adjustments and non-reference loss functions at the low-light enhancement step. Their framework improved upon YOLOv5 by replacing PANet with BiFPN and introducing a transformer-based detection head for better small target detection during the object detection phase.
As shown in Fig. 6, deep multi-scale features are uniformly adjusted to the high-resolution feature size of 160×160×128 through 1×1 convolution and bilinear upsampling. The formulas are as follows:

F2 = Upsampling(Conv(P2))   (1)
F3 = Upsampling(Conv(P3))   (2)

where P2, P3, and P4 represent the input features, Conv(·) denotes the convolution operation, and Upsampling(·) refers to the upsampling operation. F2, F3, and F4 represent the output features.

In the feature extraction process, we adjust the receptive field obtained from the backbone using the computational method proposed in [16]. As a result, the receptive field received by the HRFM-TOD module varies with the size of the target.

The high-resolution feature maps of the same scale are fed into the MFIA module (MFIA1 to MFIA4) for feature fusion. This module is designed to focus more on the resolution of small objects. By integrating the feature maps F1, F2, F3, F4 from four different receptive fields, it significantly enhances the feature representation of small objects while reducing computational costs. Finally, the four features are fused through concatenation, as shown in the following formula:

Ffuse = Concat(MFIAi(F1, F2, F3, F4))   (4)

where i represents the i-th pass through the MFIA (i = 1, 2, 3, 4).
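To make the unification step concrete, the following is a minimal PyTorch sketch of Eqs. (1)–(2) and the concatenation in Eq. (4). The 160×160×128 target size and the 20×20×512 shape of P4 follow the figure annotations; the remaining channel widths, the shallow branch F1, and the omission of the MFIA refinement are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureUnify(nn.Module):
    """Eqs. (1)-(2): 1x1 convolution to 128 channels, then bilinear
    upsampling to the shared 160x160 high-resolution grid."""
    def __init__(self, in_channels, out_channels=128, out_size=(160, 160)):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.out_size = out_size

    def forward(self, p):
        return F.interpolate(self.proj(p), size=self.out_size,
                             mode="bilinear", align_corners=False)

# Hypothetical backbone outputs; only P4 = 20x20x512 follows the figure label.
p2 = torch.randn(1, 128, 80, 80)
p3 = torch.randn(1, 256, 40, 40)
p4 = torch.randn(1, 512, 20, 20)

f2 = FeatureUnify(128)(p2)
f3 = FeatureUnify(256)(p3)
f4 = FeatureUnify(512)(p4)
f1 = torch.randn(1, 128, 160, 160)   # assumed shallow high-resolution branch

# Eq. (4): the branches (after MFIA refinement, omitted here) are concatenated.
f_fuse = torch.cat([f1, f2, f3, f4], dim=1)
print(f_fuse.shape)                  # torch.Size([1, 512, 160, 160])
```

Keeping the detection grid at 160×160 is what preserves the few pixels that small signs occupy, at the cost of a wider concatenated tensor.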
Fig. 7. Overview of the Multi-branch Features Interaction Attention Module (MFIAi). MFIA is applied to Fi (i = 1, 2, 3, 4) four times throughout the network. The illustration shows one MFIAi module along with its corresponding output feature Fi^mfia.

C. Multi-branch Features Interaction Attention Module

In low-light conditions, small targets have low resolution and contain less information, requiring the model to have a stronger ability to capture information. The attention mechanism has been extensively studied and applied to enhance the model's ability to capture key features (e.g., SENet [50], CANet [51], CBAM [52]). However, most existing attention mechanisms focus on processing a single feature, neglecting the potential complementarity and interactions between features. Dai et al. [53] proposed a method for attentional feature fusion of local and global features, yet this method failed to address the challenge of fusing features across more than two scales. Zhao et al. [54] proposed BA-Net, which improved SENet by leveraging information from shallow convolutional layers but overlooked the spatial domain of the image. Therefore, we propose an attention mechanism called the Multi-branch Features Interaction Attention Module (MFIA).

As shown in Fig. 7, MFIA consists of two components: the Multi-branch Features Channel Attention Module (MFCA) and the Multi-branch Features Spatial Attention Module (MFSA). It effectively utilizes features with semantic and scale inconsistencies, enabling efficient interaction between multiple features in both the channel and spatial dimensions. Notably, MFIA is employed four times throughout the network. The illustration shows one MFIAi module along with its corresponding output feature Fi^mfia.

1) Multi-branch Features Channel Attention Module: MFCAi achieves preliminary interaction of multi-receptive-field features through two consecutive Channel Attention Modules (CAM1 and CAM2). Each CAM layer processes the input features using 1×1 convolutions followed by ReLU activation functions, and then generates attention weights α1 and α2 sequentially through the Sigmoid function. These weights are used to adjust the channel importance of the input features. Our experiments demonstrate that by iterating through lightweight channel attention layers multiple times, feature interactions can be enhanced without compromising model performance, thus avoiding the potential bottleneck that may arise from the initial feature map integration. Additionally, both weight multiplications are applied to the initial feature Fi to prevent the loss of original information during feature transmission. The entire process can be expressed as:

αn = CAMn(F1, F2, F3, F4),   Fi^cam = αn · Fi   (5)

where n denotes the n-th pass through the CAM (n = 1, 2), and i represents the i-th pass through the MFIA (i = 1, 2, 3, 4).

2) Multi-branch Features Spatial Attention Module: The MFSA module achieves deep interaction between feature maps through the Spatial Attention Mechanism (SAM). SAM extracts spatial information from the feature maps using average pooling and max pooling layers, followed by further processing of this information through a 7×7 convolutional layer. Finally, the spatial attention weight β is generated through a Sigmoid function. This weight is used to adjust the spatial importance of the feature maps, and the process can be expressed as:

β = SAM(F1^c, F2^c, F3^c, F4^c),   Fi^mfia = β · Fi^c   (6)

where i denotes the i-th pass through the MFIA (i = 1, 2, 3, 4).
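As a reference for prototyping, here is a minimal PyTorch sketch of one MFIAi branch following Eqs. (5)–(6). The text does not fully specify how the four branches are pooled and mixed inside CAM and SAM, so the internal wiring below follows CBAM-style conventions and should be read as an assumption, not as the authors' exact implementation.

```python
import torch
import torch.nn as nn

class CAM(nn.Module):
    """Channel attention from the four pooled branches to a per-channel
    weight for one branch (a CBAM-style reading of Eq. (5))."""
    def __init__(self, branch_channels=128, branches=4, reduction=16):
        super().__init__()
        c_in = branch_channels * branches
        self.fc = nn.Sequential(
            nn.Conv2d(c_in, c_in // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_in // reduction, branch_channels, 1), nn.Sigmoid())

    def forward(self, feats):                        # feats: list [F1..F4]
        pooled = torch.cat([f.mean(dim=(2, 3), keepdim=True) for f in feats], dim=1)
        return self.fc(pooled)                       # alpha: (B, C, 1, 1)

class SAM(nn.Module):
    """Spatial attention: channel-wise avg/max of the stacked branches,
    a 7x7 convolution, and a Sigmoid (Eq. (6))."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, feats):
        x = torch.cat(feats, dim=1)
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(s))           # beta: (B, 1, H, W)

class MFIA_i(nn.Module):
    """One MFIA branch: two stacked CAMs re-weight the channels of F_i
    (both weights multiply the initial F_i, as in Eq. (5)), then SAM
    re-weights spatial positions (Eq. (6))."""
    def __init__(self, channels=128, branches=4):
        super().__init__()
        self.cam1 = CAM(channels, branches)
        self.cam2 = CAM(channels, branches)
        self.sam = SAM()

    def forward(self, feats, i):
        a1, a2 = self.cam1(feats), self.cam2(feats)
        f_cam = feats[i] * a1 * a2                   # F_i^cam
        f_c = [f * a1 * a2 for f in feats]           # channel-refined branches F_i^c
        beta = self.sam(f_c)
        return f_cam * beta                          # F_i^mfia

feats = [torch.randn(2, 128, 160, 160) for _ in range(4)]
print(MFIA_i()(feats, i=0).shape)                    # torch.Size([2, 128, 160, 160])
```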
D. Prior-Guided Feature Enhancement Module

Images captured in conditions of diminished lighting frequently suffer from poor quality, which manifests as increased noise, reduced contrast, blurred edges, and hidden information in dark areas, significantly affecting the accuracy of traffic sign detection. However, simply using existing networks that increase exposure may amplify image noise, leaving the originally low-quality image still unclear, which negatively impacts subsequent object detection tasks.

To address this challenge, we propose the Prior-Guided Feature Enhancement (PGFE) Module, which replaces the original P0 layer of YOLOv8. This module transforms RGB images from 3 channels to 64 channels to enhance low-light images and provide prior knowledge for the subsequent detection. The module consists of two main components: the Prior-Guided Enhancement (PGE) Module and the Detail Texture Recovery (DTR) Module.

1) Prior-Guided Enhancement Module: Retinex theory [55] indicates that there is a connection between the desired clear image z and the low-light observation y: y = z ⊗ x, where x represents the illumination component. Inspired by SCI [56], the image brightness can be improved to some extent by learning a residual representation between the illumination and the low-light observation. Compared to directly mapping the low-light observation to the illumination, learning the residual representation significantly reduces computational complexity and helps avoid exposure issues caused by imbalanced brightness enhancement. This design allows our network to maintain strong performance at night while preserving its performance during the day. As shown in Fig. 8, L(u) applies n consecutive residual operations to the input features, and then the final output is added to the initial features to enhance them. The formula is as follows:

G(ut) = ut−1 + CBR(ut−1),   L(u) = u1 + G(uT)   (7)

where ut represents the output at the t-th stage (t = 0, ..., T), and G(·) denotes the residual operation. u1 represents the input feature, while uT represents the output feature. The output of L(·) is fed into C(·) for contrast enhancement, and then into E(·) for edge enhancement. The principles of both operations are similar. C(·) enhances the contrast by amplifying the differences between different pixel values, which can be represented by the following formula:

C(x) = γ · (x − x̄) + x̄   (8)

where x̄ represents the mean value of x, and γ is the contrast enhancement coefficient, which is set to 2. E(y) enhances the edges by amplifying the difference between the input image and its blurred version obtained through Gaussian filtering. The formula can be expressed as:

E(y) = δ · |y − Gaussian(y)| + y   (9)

where Gaussian(·) denotes the Gaussian filter, and δ represents the sharpening intensity coefficient, which is set to 2.5.

2) Detail Texture Recovery Module: Feature enhancement may lead to the loss of original information in low-light images, thus requiring a network capable of effectively extracting image details and textures to supplement it. Inspired by [57]–[60], the Invertible Neural Network (INN) extracts local information that is highly correlated with the high-frequency features in the frequency domain, specifically the edges and lines within the image. INN prevents information loss by dividing the input parameters into two parts, allowing the input and output features to mutually generate each other. This can be considered a lossless feature extraction, which is particularly suitable in this context. As shown in Fig. 8, our DTR module is built on this invertible structure.
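The PGE operations in Eqs. (7)–(9) can be sketched directly in PyTorch. The snippet below is an illustrative approximation: the number of residual stages, the 5×5 Gaussian kernel used for the blur, and the exact placement of BatchNorm inside CBR are assumptions; only γ = 2, δ = 2.5, and the Conv-BN-ReLU reading of CBR come from the text and the figure legend.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBR(nn.Module):
    """Conv-BN-ReLU block used inside the residual chain of Eq. (7)."""
    def __init__(self, channels=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class PGE(nn.Module):
    """Prior-Guided Enhancement sketch: residual CBR steps (Eq. 7),
    contrast stretching (Eq. 8) and Gaussian-unsharp edge boosting (Eq. 9)."""
    def __init__(self, channels=64, stages=3, gamma=2.0, delta=2.5):
        super().__init__()
        self.stages = nn.ModuleList([CBR(channels) for _ in range(stages)])
        self.gamma, self.delta = gamma, delta
        # Fixed 5x5 Gaussian kernel (assumed) for the blur in Eq. (9).
        g = torch.tensor([1., 4., 6., 4., 1.])
        k = torch.outer(g, g); k = k / k.sum()
        self.register_buffer("gauss", k.expand(channels, 1, 5, 5).clone())

    def forward(self, u1):
        u = u1
        for cbr in self.stages:                   # G(u_t) = u_{t-1} + CBR(u_{t-1})
            u = u + cbr(u)
        x = u1 + u                                # L(u) = u_1 + G(u_T)
        mean = x.mean(dim=(2, 3), keepdim=True)
        x = self.gamma * (x - mean) + mean        # C(x), gamma = 2
        blur = F.conv2d(x, self.gauss, padding=2, groups=x.shape[1])
        return self.delta * (x - blur).abs() + x  # E(y), delta = 2.5

feat = torch.randn(1, 64, 320, 320)               # 3->64-channel stem output
print(PGE()(feat).shape)                          # torch.Size([1, 64, 320, 320])
```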
Fig. 8. [Structure of the PGFE module: a 640×640×3 input is transformed into 320×320×64 features by the Prior-Guided Enhancement Module (PGE) and the Detail Texture Recovery Module (DTR). CBS: Conv-BN-SiLU; CBR: Conv-BN-ReLU; CDC: Conv-ReLU-Depthwise Conv-Conv-ReLU.]

IV. EXPERIMENTS

In this section, we provide a detailed description of the dataset, parameter settings, and evaluation metrics used in the experiments. The effectiveness of the algorithm and the rationality of the structure are demonstrated through experimental results. Finally, we perform error analysis on the experimental outcomes and conduct real-world testing.

A. Datasets

To evaluate the performance of our model in recognizing traffic signs under nighttime conditions, we conduct a comprehensive assessment using the publicly available datasets TT100K and CCTSDB2021, and our proprietary dataset CNTSSS.

Fig. 9. Use of CycleGAN to generate a low-light image dataset, TT100K-night, from the TT100K dataset.

1) TT100K-night: The TT100K dataset, organized and released by the joint laboratory of Tsinghua University and Tencent, is modified following the approach of Zhu et al. [46], where categories with fewer than 100 samples were excluded, narrowing the focus to 45 categories. The training set consists of 6,105 images, while the test set includes 3,071 images, each with a resolution of 2048×2048 pixels. As shown in Fig. 9, we employed CycleGAN [61] to augment the TT100K dataset, ensuring a more accurate evaluation of the model's performance.
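As an illustration of how such a day-to-night conversion can be applied in bulk, the sketch below runs a trained CycleGAN day→night generator over a folder of TT100K images. The generator class, weight source, and folder paths are placeholders; an identity module stands in for the real network, which is not reproduced here.

```python
from pathlib import Path
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms
from torchvision.utils import save_image

class FakeNightGenerator(nn.Module):
    """Identity stand-in for a trained CycleGAN day->night generator."""
    def forward(self, x):
        return x

to_tensor = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize([0.5] * 3, [0.5] * 3)])

@torch.no_grad()
def convert_folder(src_dir: str, dst_dir: str, generator: nn.Module) -> None:
    """Run every daytime image through the generator and save the night-style result."""
    dst = Path(dst_dir); dst.mkdir(parents=True, exist_ok=True)
    generator.eval()
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        x = to_tensor(Image.open(img_path).convert("RGB")).unsqueeze(0)
        fake_night = generator(x)
        save_image(fake_night * 0.5 + 0.5, dst / img_path.name)  # undo normalization

# Placeholder paths; annotations are reused unchanged since only appearance shifts.
convert_folder("TT100K/train", "TT100K-night/train", FakeNightGenerator())
```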
2) CNTSSS: Converting daytime data to nighttime conditions cannot accurately assess a model's performance during the night. Therefore, we have constructed our own dataset consisting of traffic sign images captured exclusively at night. The CNTSSS dataset includes 3,276 images in the training set and 786 images in the test set, with traffic signs classified into mandatory, prohibitory, or warning types. Further details have been outlined in the previous section.

3) CCTSDB2021: The CCTSDB2021 dataset was created by Changsha University of Science and Technology and comprises 17,856 images across the training and test sets, with traffic signs classified into mandatory, prohibitory, or warning types. The training set contains 16,356 images, of which approximately 700 are captured at night, while the remaining images are taken during the day. Although the training set does not include as much nighttime data as the CNTSSS dataset, this distribution better reflects real-world driving conditions, making it a challenging yet valuable benchmark. To assess the model's performance at night, we selected 500 nighttime images from the test set as the basis for performance evaluation.

B. Experimental Settings

1) Training details: Our experiments are carried out on a machine with four NVIDIA GeForce RTX 4090 GPUs. The input images are resized to a resolution of 640×640 pixels. The number of training epochs is set to 200 for the CNTSSS dataset and 300 for the TT100K and CCTSDB2021 datasets. The batch size is set to 48. We utilize Stochastic Gradient Descent (SGD) with a learning rate of 0.01 and a momentum of 0.937.
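For reference, these hyperparameters map roughly onto the public Ultralytics training API of the YOLOv8 baseline as follows; the YOLO-LLTS modules themselves are not part of that package, and the dataset YAML name is a placeholder.

```python
from ultralytics import YOLO

# Baseline the paper builds on; not the full YOLO-LLTS architecture.
model = YOLO("yolov8l.pt")
model.train(
    data="cntsss.yaml",        # hypothetical dataset config
    epochs=200,                # 300 for TT100K-night / CCTSDB2021
    imgsz=640,
    batch=48,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.937,
    device=[0, 1, 2, 3],       # four RTX 4090 GPUs
)
```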
2) Evaluation metrics: We evaluate the performance of the proposed algorithm using Precision, Recall, F1 score, mean Average Precision at 50% IoU (mAP50), mean Average Precision from 50% to 95% IoU (mAP50:95), and speed (FPS). These metrics [62] are calculated using the following formulas:

Precision = TP / (TP + FP)   (11)
Recall = TP / (TP + FN)   (12)
F1 = (2 × Precision × Recall) / (Precision + Recall)   (13)
mAP50 = (1/N) Σ_{i=1}^{N} AP_i^50   (14)
mAP50:95 = (1/M) Σ_{k=1}^{M} [ (1/N) Σ_{i=1}^{N} P_{50:95}^{i,k} ]   (15)

where TP represents the number of correctly identified positive instances, FP refers to the count of instances incorrectly identified as positive, and FN represents the count of instances incorrectly identified as negative. N represents the total number of categories, and M represents the number of Intersection over Union (IoU) threshold intervals, which is equal to 10, spanning from 0.5 to 0.95 with an increment of 0.05.
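A small, self-contained sketch of Eqs. (11)–(13), together with the class-wise averaging behind Eqs. (14)–(15), is given below; the counts and AP values in the usage example are illustrative, not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Eqs. (11)-(13): precision, recall and F1 from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

def mean_ap(ap_per_class: list[float]) -> float:
    """Eq. (14): mAP50 is the class-wise average of AP at IoU = 0.5; Eq. (15)
    additionally averages this quantity over the ten thresholds 0.50, 0.55, ..., 0.95."""
    return sum(ap_per_class) / len(ap_per_class)

# Toy example with made-up counts.
p, r, f1 = precision_recall_f1(tp=772, fp=228, fn=427)
print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
print("mAP50 =", mean_ap([0.71, 0.68, 0.75]))
```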
C. Comparisons with the State-of-the-Art

In comparison with existing state-of-the-art technologies, our model demonstrates superior performance in both accuracy and speed. We compare YOLO-LLTS with several models, including YOLOv5 [63], YOLOv6 [64], GOLD-YOLO [65], YOLOv8 [66], YOLOv9 [67], YOLOv10 [68], Zhang et al. [22], MIAF-net [15], YOLOv11 [69], YOLO-TS [16], and YOLOv12 [70]. These comparison models are all advanced models developed in the past five years.

As shown in Tab. I, although YOLO-TS demonstrates exceptionally high performance on the TT100K-night dataset, our model still outperforms it. Specifically, our model achieves a precision of 77.2%, surpassing the second-ranked YOLO-TS by 2.0%. In terms of recall, our model reaches 64.4%, 1.2% higher than YOLO-TS. Furthermore, our model achieves an F1 score of 70.2, surpassing YOLO-TS by 1.5. In the mAP50 and mAP50:95 evaluation metrics, which measure model detection performance at different IoU thresholds, our model also performs the best, achieving 71.4% and 50.0%, respectively. With a parameter count of 9.9M and an FPS of 83.3, our model maintains high efficiency in processing speed. Due to the lack of open-source code for MIAF-net, its experimental results are based on our own reproduction. Zhang et al. only released the trained weights for the CCTSDB2021 dataset; therefore, experimental results for datasets other than CCTSDB2021 were obtained by training the model using the open-source code. The model's poor performance on the TT100K-night dataset may be attributed to the fact that TT100K-night is generated from daytime images, and the model does not generalize well to such generated data.

Tab. II presents a performance comparison of different traffic sign detection models on the CNTSSS dataset. Our model achieves a precision of 88.3%, surpassing the second-ranked YOLO-TS by 1.0%. In terms of recall, our approach reaches 74.9%, which is 1.0% higher than the second-ranked GOLD-YOLO-L, demonstrating superior detection coverage. Our model's F1 score is 81.0, outperforming GOLD-YOLO-L by 1.5. The mAP50 of our model is 81.2%, surpassing GOLD-YOLO-L by 1.3%, and the mAP50:95 is 60.1%, which is 1.9% higher than YOLO-TS. With a parameter count of 13.9M and an FPS of 82.0, these results indicate that our approach maintains high accuracy and robustness across different detection difficulties.

Tab. III presents a performance comparison of different traffic sign detection models on the CCTSDB2021 dataset. Despite the fact that the nighttime data used for training account for only a small portion of the dataset, with the majority consisting of daytime traffic sign data, our model still achieves outstanding results across multiple metrics. Specifically, our model reaches a precision of 88.8%, surpassing the second-ranked YOLO-TS by 0.7%. In terms of recall, our model achieves 81.1%, surpassing the second-ranked YOLO-TS by 0.3%. With an F1 score of 84.8, our model outperforms YOLO-TS by 0.5, reflecting a good balance between precision and recall. Our method also excels in the mAP50 and mAP50:95 metrics, which evaluate the model's performance across different IoU thresholds, achieving 87.8% and 57.5%,
surpassing YOLO-TS by 1.8% and 0.3%, respectively. With a parameter count of 10.2M and an FPS of 93.6, our model not only exceeds existing state-of-the-art models in accuracy but also demonstrates competitive processing speed, enabling rapid and accurate detection of traffic signs in practical applications.

Fig. 10. Comparison experimental results on the CNTSSS dataset are shown in the images. The first to third columns represent the latest advanced algorithms, the fourth column shows our model, and the fifth column represents the ground truth. Our detection results are consistent with the ground truth.

As shown in Fig. 10, from the second to the fifth row, several models demonstrate instances of missed detections. In contrast, our model's results align perfectly with the ground truth, accurately recognizing traffic signs.
TABLE I
Performance comparison on TT100K-night dataset; the first and second best results are indicated in blue and green.
Method Venue Input Size Precision(%) Recall(%) F1 mAP50(%) mAP50:95(%) Param(M) FPS GPU
YOLOv5-L [63] 2020 640×640 70.6 57.2 63.2 62.4 44.4 46.1 99.0 RTX 4090
YOLOv6-L [64] CVPR2022 640×640 65.7 52.8 58.5 57.4 40.9 59.5 43.0 RTX 4090
GOLD-YOLO-L [65] NeurIPS2023 640×640 71.3 54.7 61.9 60.5 43.1 75.0 55.2 RTX 4090
YOLOv8-L [66] 2023 640×640 72.1 59.3 65.1 65.2 46.3 43.6 107.5 RTX 4090
YOLOv9-C [67] CVPR2024 640×640 71.3 59.7 65.0 64.6 45.5 50.7 80.6 RTX 4090
YOLOv10-L [68] NeurIPS2024 640×640 68.4 57.4 62.4 61.4 43.8 25.7 72.5 RTX 4090
Zhang et al. [22] TETCI2024 640×640 17.3 18.9 18.1 21.9 - 20.8 105.1 RTX 4090
MIAF-net [15] TIM2024 640×640 69.6 57.7 63.1 60.4 41.2 34.4 102.0 RTX 4090
YOLOv11-L [69] 2024 640×640 69.5 57.3 62.8 62.5 44.1 25.3 73.0 RTX 4090
YOLO-TS [16] arXiv2024 640×640 75.2 63.2 68.7 68.7 48.4 11.1 109.9 RTX 4090
YOLOv12-L [70] arXiv2025 640×640 72.6 54.5 62.3 61.4 43.8 26.4 46.3 RTX 4090
YOLO-LLTS(ours) - 640×640 77.2(+2.0)↑ 64.4(+1.2)↑ 70.2(+1.5)↑ 71.4(+2.7)↑ 50.0(+1.6)↑ 9.9 83.3 RTX 4090
TABLE II
Performance comparison on CNTSSS dataset; the first and second best results are indicated in blue and green.
Method Venue Input Size Precision(%) Recall(%) F1 mAP50(%) mAP50:95(%) Param(M) FPS GPU
YOLOv5-L [63] 2020 640×640 83.7 66.7 74.2 75.0 53.4 46.1 97.1 RTX 4090
YOLOv6-L [64] CVPR2022 640×640 86.4 73.1 79.2 79.8 55.9 59.5 45.2 RTX 4090
GOLD-YOLO-L [65] NeurIPS2023 640×640 86.0 73.9 79.5 79.9 56.2 75.0 42.9 RTX 4090
YOLOv8-L [66] 2023 640×640 85.1 67.7 75.4 75.1 53.3 43.6 100.0 RTX 4090
YOLOv9-C [67] CVPR2024 640×640 84.1 66.2 74.1 74.7 52.8 50.7 80.0 RTX 4090
YOLOv10-L [68] NeurIPS2024 640×640 83.9 63.8 72.5 72.6 52.1 25.7 78.1 RTX 4090
Zhang et al. [22] TETCI2024 640×640 42.7 44.6 43.6 48.9 - 20.8 110.8 RTX 4090
MIAF-net [15] TIM2024 640×640 84.1 69.0 75.8 76.2 52.6 34.2 158.7 RTX 4090
YOLOv11-L [69] 2024 640×640 80.4 66.9 73.0 73.2 51.9 25.3 65.4 RTX 4090
YOLO-TS [16] arXiv2024 640×640 87.3 70.9 78.2 79.6 58.2 15.1 117.6 RTX 4090
YOLOv12-L [70] arXiv2025 640×640 85.0 62.6 72.1 72.8 51.7 26.3 46.1 RTX 4090
YOLO-LLTS(ours) - 640×640 88.3(+1.0)↑ 74.9(+1.0)↑ 81.0(+1.5)↑ 81.2(+1.3)↑ 60.1(+1.9)↑ 13.9 82.0 RTX 4090
Note: The article [22] only released the trained weights for the CCTSDB2021 dataset. Therefore, experimental results for datasets other than CCTSDB2021
were obtained by training the model using the open-source code.
TABLE III
Performance comparison on CCTSDB2021 dataset; the first and second best results are indicated in blue and green.
Method Venue Input Size Precision(%) Recall(%) F1 mAP50(%) mAP50:95(%) Param(M) FPS GPU
YOLOv5-L [63] 2020 640×640 85.4 72.2 78.2 78.6 50.2 46.1 88.5 RTX 4090
YOLOv6-L [64] CVPR2022 640×640 87.4 76.5 81.6 82.0 52.6 59.5 52.2 RTX 4090
GOLD-YOLO-L [65] NeurIPS2023 640×640 83.8 77.0 80.3 80.6 51.2 75.0 63.3 RTX 4090
YOLOv8-L [66] 2023 640×640 84.4 74.4 79.1 80.5 53.0 43.6 91.7 RTX 4090
YOLOv9-C [67] CVPR2024 640×640 84.8 76.0 80.2 81.9 53.7 50.7 42.4 RTX 4090
YOLOv10-L [68] NeurIPS2024 640×640 83.9 73.2 78.2 78.7 51.4 25.7 68.0 RTX 4090
Zhang et al. [22] TETCI2024 640×640 82.7 80.7 81.6 78.1 - 20.8 101.1 RTX 4090
MIAF-net [15] TIM2024 640×640 60.1 49.6 54.5 52.7 36.2 34.4 151.5 RTX 4090
YOLOv11-L [69] 2024 640×640 85.8 74.2 79.6 81.2 54.1 25.3 70.4 RTX 4090
YOLO-TS [16] arXiv2024 640×640 88.1 80.8 84.3 86.0 57.2 12.9 138.9 RTX 4090
YOLOv12-L [70] arXiv2025 640×640 84.9 74.2 79.2 82.8 55.7 26.3 44.8 RTX 4090
YOLO-LLTS(ours) - 640×640 88.8(+0.7)↑ 81.1(+0.3) 84.8(+0.5) 87.8(+1.8)↑ 57.5(+0.3)↑ 10.2 93.6 RTX 4090
Note: Since the receptive field of the YOLO-LLTS model adjusts according to the target size, both the number of parameters and FPS vary with changes
in the dataset. Due to the lack of open-source code for MIAF-net, the experimental results are based on our own reproduction.
TABLE IV
Ablation study for different components on CNTSSS. The best result is marked in blue.

Model      HRFM-TOD   PGFE   DFEDR   mAP50   mAP50:95   FPS
Baseline                               75.1     53.3     100.0
              ✓                        77.6     55.5     112.4
                        ✓              78.3     55.5      74.1
              ✓         ✓              79.5     59.1      78.7
              ✓                 ✓      79.6     58.4     102.0
Ours          ✓         ✓       ✓      81.2↑    60.1↑     82.0

TABLE V
Ablation study for γ and δ on CNTSSS. The best result is marked in blue.

γ      mAP50   mAP50:95      δ      mAP50   mAP50:95
1       79.9     58.7        1.5     80.8     58.7
1.5     80.0     58.5        2       80.7     59.3
2       81.2↑    60.1↑       2.5     81.2↑    60.1↑
2.5     80.6     58.1        3       80.0     58.8
Fig. 11. An analysis of several critical performance metrics of our model during the training phase, evaluated on the TT100K-night, CNTSSS, and CCTSDB2021
datasets.
Fig. 12. Conducting field tests using Mobile Edge Computing Devices.
With the PGFE module alone, the mAP50:95 increases from 53.3% to 55.5%, an increase of 2.2%. When PGFE and HRFM-TOD are combined, the mAP50 increases by 4.4% and the mAP50:95 increases by 5.8%. The experiments demonstrate that the PGFE module significantly enhances low-quality images, resulting in better detection performance. Additionally, we conducted experiments with different parameter settings in the formula. As shown in the table, the model performs best when γ and δ are set to 2 and 2.5, respectively.

3) Effectiveness of DFEDR: Since the DFEDR module can only be used in conjunction with the HRFM-TOD module, we validated the combined effect of both modules. As shown in the second-to-last row of the table, the mAP50 and mAP50:95 increased by 4.5% and 5.1%, respectively, compared to the baseline. When compared to the HRFM-TOD-only configuration, mAP50 and mAP50:95 increased by 2.0% and 2.9%, respectively. The experiment demonstrates that the DFEDR module effectively integrates multi-receptive-field features, improving detection performance.

E. Error Analysis and Model Deployment

In this study, we conducted a detailed analysis of the training errors of the traffic sign detection model across three different datasets [71]. Fig. 11 presents the evolution of key training metrics, including normalized loss, precision, recall, mAP50, and mAP50:95, with respect to the number of training epochs. The normalized loss is the sum of the box loss, classification loss, and distribution focal loss, which we have normalized for ease of presentation. As shown in the figure, the normalized loss decreases rapidly and eventually stabilizes as the number of training epochs increases across all datasets, indicating that the model exhibits good convergence on these datasets. The other critical performance metrics quickly increase in the early stages of training and eventually stabilize.

At the same time, we performed inference speed testing on the mobile edge device, an NVIDIA Jetson AGX Orin. The experiments were conducted in an Ubuntu 18.04 operating system environment, using the PyTorch deep learning framework (version 2.1.0), with optimization support provided by JetPack 5.1. We selected 1,500 test images from the CCTSDB2021 dataset for the speed testing. Without utilizing TensorRT acceleration, the inference time for YOLO-LLTS was 44.9 ms per image, corresponding to 22.3 FPS. The experiments demonstrate that our model exhibits good real-time performance on edge devices. This indicates that our model has promising applications in advanced driver-assistance systems (ADAS) and autonomous driving systems. In addition, we evaluated the model using a mobile edge computing device in real-world road scenarios. In the scene depicted in Fig. 12, traffic signs were successfully detected and accurately classified. This demonstrates the effectiveness of our model in real-world applications.
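For reproducibility, per-image latency of this kind can be measured with a simple warm-up-and-synchronize loop such as the sketch below; the stand-in model, the batch of random inputs, and the warm-up length are assumptions, not the exact benchmarking script used in the paper.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model: torch.nn.Module, images, device="cuda", warmup=20):
    """Average per-image latency and FPS (PyTorch only, no TensorRT);
    'images' is any iterable of preprocessed input tensors."""
    model.eval().to(device)
    batch = list(images)
    for x in batch[:warmup]:                   # warm up kernels and allocator
        model(x.to(device))
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    start = time.perf_counter()
    for x in batch:
        model(x.to(device))
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    ms = 1000.0 * elapsed / len(batch)
    return ms, 1000.0 / ms

# Example with a stand-in model and random 640x640 inputs.
dummy = torch.nn.Conv2d(3, 16, 3, padding=1)
imgs = [torch.randn(1, 3, 640, 640) for _ in range(50)]
ms, fps = measure_latency(dummy, imgs,
                          device="cuda" if torch.cuda.is_available() else "cpu")
print(f"{ms:.1f} ms/image, {fps:.1f} FPS")
```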
V. CONCLUSION
In this paper, we propose YOLO-LLTS, an end-to-end real-time traffic sign detection algorithm specifically designed for low-light environments. To address the lack of nighttime-specific traffic sign datasets, we constructed a novel dataset named the Chinese Nighttime Traffic Sign Sample Set (CNTSSS). This dataset includes images captured under varying low-light conditions from dusk to midnight, covering diverse scenarios such as urban, highway, and rural environments under different weather conditions. We introduce the High-Resolution Feature Map for Small Object Detection (HRFM-TOD) module to effectively address the indistinct features of small objects under low-light conditions, significantly improving both detection accuracy and inference speed. Moreover, we design the Multi-branch Feature Interaction Attention (MFIA) module, enabling deep interaction and fusion of features across multiple receptive fields, thereby enhancing the model's ability to capture and utilize critical information. Furthermore, we propose the Prior-Guided Enhancement Module (PGFE) to alleviate challenges such as increased noise, reduced contrast, and blurriness under low-light environments, significantly boosting the detection performance. Experimental results demonstrate that our approach achieves state-of-the-art performance on the TT100K-night, CNTSSS, and CCTSDB2021 datasets. The deployment experiments on edge devices further validate the practical effectiveness and real-time applicability of our method. In future work, we plan to expand our CNTSSS dataset with more diverse scenarios and further optimize the algorithm to enhance robustness and generalizability for real-world autonomous driving applications. We would also like to release our source code and dataset to facilitate further research in this area.

ACKNOWLEDGMENTS

This project is jointly supported by the National Natural Science Foundation of China (Nos. 52172350, W2421069, 51775565 and 61003143), the Guangdong Basic and Applied Research Foundation (No. 2022B1515120072), the Guangzhou Science and Technology Plan Project (No. 2024B01W0079), the Nansha Key R&D Program (No. 2022ZD014), and the Science and Technology Planning Project of Guangdong Province (No. 2023B1212060029).

REFERENCES

[1] S. Shirmohammadi and A. Ferrero, “Camera as the instrument: the rising trend of vision based measurement,” IEEE Instrum. Meas. Mag., vol. 17, no. 3, pp. 41–47, Jun. 2014.
[2] Q. Yuan et al., “Triangular chain closed-loop detection network for dense pedestrian detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–14, Dec. 2024.
[3] W.-Y. Hsu and P.-C. Chen, “Pedestrian detection using stationary wavelet dilated residual super-resolution,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–11, Jan. 2022.
[4] X. Chu, A. Zheng, X. Zhang, and J. Sun, “Detection in crowded scenes: One proposal, multiple predictions,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 12214–12223.
[5] L. Jiang et al., “Mffsodnet: Multiscale feature fusion small object detection network for uav aerial images,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–14, Mar. 2024.
[6] T. Ye, W. Qin, Z. Zhao, X. Gao, X. Deng, and Y. Ouyang, “Real-time object detection network in uav-vision based on cnn and transformer,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–13, Feb. 2023.
[7] L. Chen et al., “Deep neural network based vehicle and pedestrian detection for autonomous driving: A survey,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 6, pp. 3234–3246, Jun. 2021.
[8] G. Li, Z. Ji, and X. Qu, “Stepwise domain adaptation (sda) for object detection in autonomous vehicles using an adaptive centernet,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 17729–17743, Apr. 2022.
[9] H. Wang, C. Liu, Y. Cai, L. Chen, and Y. Li, “Yolov8-qsd: An improved small object detection algorithm for autonomous vehicles based on yolov8,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–16, Mar. 2024.
[10] Q. Zhang and X. Hu, “Msffa-yolo network: Multiclass object detection for traffic investigations in foggy weather,” IEEE Trans. Instrum. Meas., vol. 72, pp. 1–12, Sep. 2023.
[11] H. Wang et al., “Yolov5-fog: A multiobjective visual detection algorithm for fog driving scenes based on improved yolov5,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, Aug. 2022.
[12] G. Golcarenarenji, I. Martinez-Alpiste, Q. Wang, and J. M. Alcaraz-Calero, “Robust real-time traffic light detector on small-form platform for autonomous vehicles,” J. Intell. Transport. Syst., vol. 28, no. 5, pp. 668–678, 2024.
[13] Q. Wang, Q. Zhang, X. Liang, Y. Wang, C. Zhou, and V. I. Mikulovich, “Traffic lights detection and recognition method based on the improved yolov4 algorithm,” Sensors, vol. 22, no. 1, p. 200, 2021.
[14] J. Chen, K. Jia, W. Chen, Z. Lv, and R. Zhang, “A real-time and high-precision method for small traffic-signs recognition,” Neural Comput. Appl., vol. 34, no. 3, pp. 2233–2245, 2022.
[15] Y. Zhao, C. Wang, X. Ouyang, J. Zhong, N. Zhao, and Y. Li, “Miaf-net: A multi-information attention fusion network for field traffic sign detection,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–14, Aug. 2024.
[16] J. Chen et al., “Yolo-ts: Real-time traffic sign detection with enhanced accuracy using optimized receptive fields and anchor-free fusion,” 2024, arXiv:2410.17144.
[17] “Iconfont,” iconfont.cn. Accessed: Mar. 16, 2025. [Online]. Available: https://www.iconfont.cn/
[18] “Image search engine,” 52112.com. Accessed: Mar. 16, 2025. [Online]. Available: https://www.52112.com/
[19] “Iconfinder,” iconfinder.com. Accessed: Mar. 16, 2025. [Online]. Available: https://www.iconfinder.com/
[20] S. R. Chowdhury et al., “Automated augmentation with reinforcement learning and gans for robust identification of traffic signs using front camera images,” in Proc. IEEE Asilomar Conf. Signals Syst. Comput., 2019, pp. 79–83.
[21] I. J. Goodfellow et al., “Generative adversarial networks,” 2014, arXiv:1406.2661.
[22] J. Zhang, Y. Lv, J. Tao, F. Huang, and J. Zhang, “A robust real-time anchor-free traffic sign detector with one-level feature,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 8, no. 2, pp. 1437–1451, Jan. 2024.
[23] X. Sun, K. Liu, L. Chen, Y. Cai, and H. Wang, “Llth-yolov5: a real-time traffic sign detection algorithm for low-light scenes,” Automot. Innov., vol. 7, no. 1, pp. 121–137, 2024.
[24] Z. ur Rahman, D. J. Jobson, and G. A. Woodell, “Retinex processing for automatic image enhancement,” J. Electron. Imaging, 2002.
[25] D. Coltuc, P. Bolon, and J.-M. Chassery, “Exact histogram specification,” IEEE Trans. Image Process., vol. 15, no. 5, pp. 1143–1152, Apr. 2006.
[26] J. Stark, “Adaptive image contrast enhancement using generalizations of histogram equalization,” IEEE Trans. Image Process., vol. 9, no. 5, pp. 889–896, May 2000.
[27] X. Fu, Y. Liao, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 4965–4977, Dec. 2015.
[28] X. Fu, D. Zeng, Y. Huang, X.-P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2782–2790.
[29] M. Li, J. Liu, W. Yang, X. Sun, and Z. Guo, “Structure-revealing low-light image enhancement via robust retinex model,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2828–2841, Feb. 2018.
[30] K. G. Lore, A. Akintayo, and S. Sarkar, “Llnet: A deep autoencoder approach to natural low-light image enhancement,” 2016, arXiv:1511.03995.
[31] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” 2018, arXiv:1808.04560.
[32] C. Guo et al., “Zero-reference deep curve estimation for low-light image enhancement,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1777–1786.
[33] Y. Jiang et al., “Enlightengan: Deep light enhancement without paired supervision,” 2021, arXiv:1906.06972.
[34] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” 2015, arXiv:1505.04597.
[35] Z. Cui et al., “You only need 90k parameters to adapt light: A light weight transformer for image enhancement and exposure correction,” 2022, arXiv:2205.14871.
[36] L. Cui et al., “Context-aware block net for small object detection,” IEEE Trans. Cybern., vol. 52, no. 4, pp. 2300–2313, Jul. 2022.
[37] X. Zhang, Z. Wang, D. Liu, Q. Lin, and Q. Ling, “Deep adversarial data augmentation for extremely low data regimes,” IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 1, pp. 15–28, Jan. 2021.
[38] J. Zhang, Z. Xie, J. Sun, X. Zou, and J. Wang, “A cascaded r-cnn with multiscale attention and imbalanced samples for traffic sign detection,” IEEE Access, vol. 8, pp. 29742–29754, Feb. 2020.
[39] G. Qi, Y. Zhang, K. Wang, N. Mazur, Y. Liu, and D. Malaviya, “Small object detection method based on adaptive spatial parallel convolution and fast multi-scale fusion,” Remote Sens., vol. 14, p. 420, Jan. 2022.
[40] X. Ma, X. Li, X. Tang, B. Zhang, R. Yao, and J. Lu, “Deconvolution feature fusion for traffic signs detection in 5g driven unmanned vehicle,” Phys. Commun., vol. 47, p. 101375, 2021.
[41] Y. Liu, J. Peng, J.-H. Xue, Y. Chen, and Z.-H. Fu, “Tsingnet: Scale-aware and context-rich feature learning for traffic sign detection and recognition in the wild,” Neurocomputing, vol. 447, pp. 10–22, 2021.
[42] T. Zhang, L. Li, S. Cao, T. Pu, and Z. Peng, “Attention-guided pyramid context networks for detecting infrared small target under complex background,” IEEE Trans. Aerosp. Electron. Syst., vol. 59, no. 4, pp. 4250–4261, Jan. 2023.
[43] W. Liu, G. Ren, R. Yu, S. Guo, J. Zhu, and L. Zhang, “Image-adaptive yolo for object detection in adverse weather conditions,” 2022, arXiv:2112.08088.
[44] X. Yin, Z. Yu, Z. Fei, W. Lv, and X. Gao, “Pe-yolo: Pyramid enhancement network for dark object detection,” 2023, arXiv:2307.10953.
[45] Q. Yang, Y. Ma, L. Li, and Z. Zhao, “Dual-mode serial night road object detection model based on depthwise separable and self-attention mechanism,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–9, Apr. 2024.
[46] Z. Zhu, D. Liang, S. Zhang, X. Huang, B. Li, and S. Hu, “Traffic-sign detection and classification in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2110–2118.
[47] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, “The german traffic sign recognition benchmark: A multi-class classification competition,” in Proc. Int. Joint Conf. Neural Netw., Jul. 2011, pp. 1453–1460.
[48] J. Zhang, X. Zou, L.-D. Kuang, J. Wang, R. S. Sherratt, and X. Yu, “Cctsdb 2021: a more comprehensive traffic sign detection benchmark,” Hum.-Centric Comput. Inf. Sci., vol. 12, p. 23, 2022.
[49] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 8759–8768.
[50] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 7132–7141.
[51] Q. Hou, D. Zhou, and J. Feng, “Coordinate attention for efficient mobile network design,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13713–13722.
[52] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2018, pp. 3–19.
[53] Y. Dai, F. Gieseke, S. Oehmcke, Y. Wu, and K. Barnard, “Attentional feature fusion,” in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 3560–3569.
[54] Y. Zhao, J. Chen, Z. Zhang, and R. Zhang, “Ba-net: Bridge attention for deep convolutional neural networks,” in Proc. Eur. Conf. Comput. Vis. (ECCV). Springer, Oct. 2022, pp. 297–312.
[55] Z.-u. Rahman, D. J. Jobson, and G. A. Woodell, “Retinex processing for automatic image enhancement,” J. Electron. Imaging, vol. 13, no. 1, pp. 100–110, 2004.
[56] L. Ma, T. Ma, R. Liu, X. Fan, and Z. Luo, “Toward fast, flexible, and robust low-light image enhancement,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2022, pp. 5637–5646.
[57] H. Wang, X. Wu, Z. Huang, and E. P. Xing, “High-frequency component helps explain the generalization of convolutional neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8684–8694.
[58] A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, “The reversible residual network: Backpropagation without storing activations,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 2214–2224.
[59] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real nvp,” 2016, arXiv:1605.08803.
[60] Z. Zhao et al., “Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 5906–5916.
[61] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232.
[62] H. Al Osman and S. Shirmohammadi, “Machine learning in measurement part 2: Uncertainty quantification,” IEEE Instrum. Meas. Mag., vol. 24, no. 3, pp. 23–27, May 2021.
[63] G. Jocher, “Ultralytics yolov5,” 2020. [Online]. Available: https://github.com/ultralytics/yolov5
[64] C. Li et al., “Yolov6: A single-stage object detection framework for industrial applications,” 2022, arXiv:2209.02976.
[65] C. Wang et al., “Gold-yolo: Efficient object detector via gather-and-distribute mechanism,” in Proc. Adv. Neural Inf. Process. Syst., vol. 36, 2024, pp. 1–19.
[66] G. Jocher, A. Chaurasia, and J. Qiu, “Ultralytics yolov8,” 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[67] C.-Y. Wang, I.-H. Yeh, and H.-Y. Mark Liao, “Yolov9: Learning what you want to learn using programmable gradient information,” in Proc. Eur. Conf. Comput. Vis. (ECCV). Springer, Oct. 2024, pp. 1–21.
[68] A. Wang et al., “Yolov10: Real-time end-to-end object detection,” 2024, arXiv:2405.14458.
[69] G. Jocher and J. Qiu, “Ultralytics yolo11,” 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
[70] Y. Tian, Q. Ye, and D. Doermann, “Yolov12: Attention-centric real-time object detectors,” 2025, arXiv:2502.12524.
[71] S. Shirmohammadi and H. Al Osman, “Machine learning in measurement part 1: Error contribution and terminology confusion,” IEEE Instrum. Meas. Mag., vol. 24, no. 2, pp. 84–92, Apr. 2021.

Ziyu Lin is currently pursuing a B.Sc. degree in Traffic Engineering with the School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou, China. Her research interests include computer vision, deep learning, and autonomous driving technology, with a particular focus on exploring how these technologies can enhance the efficiency and safety of autonomous driving.

Yunfan Wu is currently pursuing a B.Sc. degree in Traffic Engineering with the School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou, China. His research interests include autonomous vehicles, computer vision and deep learning, with a commitment to reducing the likelihood of accidents in autonomous vehicles and enhancing their operational efficiency.

Yuhang Ma received his B.S. degree in Automation from Huazhong University of Science and Technology in 2020. He is currently pursuing the master's degree in Electronic and Information Engineering at Sun Yat-sen University, Shenzhen 518107, Guangdong, P.R. China. His current research interests include autonomous driving and computer vision.

Junzhou Chen received his Ph.D. in Computer Science and Engineering from the Chinese University of Hong Kong in 2008, following his M.Eng. degree in Software Engineering and B.S. in Computer Science and Applications from Sichuan University in 2005 and 2002, respectively. Between March 2009 and February 2019, he served as a Lecturer and later as an Associate Professor at the School of Information Science and Technology, Southwest Jiaotong University. He is currently an associate professor at the Guangdong Provincial Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China. His research interests include computer vision, machine learning, intelligent transportation systems, mobile computing and medical image processing.
Ronghui Zhang received a B.Sc. (Eng.) from the Department of Automation Science and Electrical Engineering, Hebei University, Baoding, China, in 2003, an M.S. degree in Vehicle Application Engineering from Jilin University, Changchun, China, in 2006, and a Ph.D. (Eng.) in Mechanical & Electrical Engineering from Changchun Institute of Optics, Fine Mechanics and Physics, the Chinese Academy of Sciences, Changchun, China, in 2009. After finishing his post-doctoral research work at INRIA, Paris, France, in February 2011, he is currently an Associate Professor with the Guangdong Provincial Key Laboratory of Intelligent Transportation System, School of Intelligent Systems Engineering, Sun Yat-sen University, Guangzhou 510275, China. His current research interests include computer vision, intelligent control and ITS.

Jiaming Wu received the B.S. and M.S. degrees from the School of Transportation Science and Engineering, Harbin Institute of Technology, in 2014, and the Ph.D. degree from Southeast University in 2019. He is currently an Assistant Professor in the Department of Architecture and Civil Engineering, Chalmers University of Technology, Gothenburg, Sweden. His research interests include electric vehicle fleet management (routing and charging), connected and automated vehicles, and intersection control.

Guodong Yin received the Ph.D. degree from the Department of Vehicle Engineering, Southeast University, Nanjing, China, in 2007. From 2011 to 2012, he was a Visiting Scholar with the Department of Mechanical and Aerospace Engineering, The Ohio State University, Columbus, OH, USA. He is currently a Professor with the School of Mechanical Engineering, Southeast University, Nanjing, China. His research interests include vehicle dynamics, connected vehicles, and multiagent control. He has published more than 150 peer-reviewed journal and conference papers.

Liang Lin (Fellow, IEEE) is a full Professor at Sun Yat-sen University. He served as the Executive R&D Director and Distinguished Scientist of SenseTime Group from 2016 to 2018, taking charge of transferring cutting-edge technology into products. He has authored or co-authored more than 200 papers in leading academic journals and conferences with more than 12,000 citations. He is an associate editor of IEEE Trans. Human-Machine Systems and IET Computer Vision. He served as Area Chair for numerous conferences such as CVPR, ICCV, and IJCAI. He is the recipient of numerous awards and honors including the Wu Wen-Jun Artificial Intelligence Award, the CISG Science and Technology Award, ICCV Best Paper Nomination in 2019, Annual Best Paper Award by Pattern Recognition (Elsevier) in 2018, Best Paper Diamond Award at IEEE ICME 2017, Google Faculty Award in 2012, and Hong Kong Scholars Award in 2014. He is a Fellow of IAPR and IET.