
electronics

Article
Research on Improved YOLOv5 for Low-Light Environment
Object Detection
Jing Wang 1 , Peng Yang 1, *, Yuansheng Liu 1 , Duo Shang 2 , Xin Hui 2 , Jinhong Song 3 and Xuehui Chen 3

1 Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China;
20211083510912@buu.edu.cn (J.W.); yuansheng@buu.edu.cn (Y.L.)
2 China Academy of Industrial Internet, Beijing 100102, China; shangduo@china-aii.com (D.S.);
huixin@china-aii.com (X.H.)
3 Jiaojia Gold Mine of Shandong Gold Mining Co., Ltd., Laizhou 261441, China; song_jh@sd-gold.com (J.S.);
chenxuehui@sd-gold.com (X.C.)
* Correspondence: yangpeng@buu.edu.cn; Tel.: +86-139-1096-6690

Abstract: Object detection in low-light scenarios has been widely acknowledged as a significant
research area in the field of computer vision, presenting a challenging task. Aiming at the low
detection accuracy of mainstream single-stage object detection models in low-light scenarios, this
paper proposes a detection model called DK_YOLOv5 based on YOLOv5, specifically designed for
such scenarios. First, a low-light image enhancement algorithm with better results is selected to
generate enhanced images that achieve relatively better visual effects and amplify target features.
Second, the SPPF layer is improved to an R-SPPF module with faster inference speed and stronger
feature expression ability. Next, we replace the C3 module with the C2f module and incorporate an
attention mechanism to develop the C2f_SKA module, enabling richer gradient information flow
and reducing the impact of noise features. Finally, the model detection head is replaced with a
decoupled head suitable for the object detection task in this scenario to improve model performance.
Additionally, we expand the Exdark dataset to include low-light data of underground mine scenario
targets, named Mine_Exdark. Experimental results demonstrate that the proposed DK_YOLOv5
model achieves higher detection accuracy than other models in low-light scenarios, with an mAP0.5
of 71.9% on the Mine_Exdark dataset, which is 4.4% higher than that of YOLOv5.
Keywords: low-light scenarios; object detection; image enhancement; YOLOv5; underground mine scenarios

Citation: Wang, J.; Yang, P.; Liu, Y.; Shang, D.; Hui, X.; Song, J.; Chen, X. Research on Improved YOLOv5 for Low-Light Environment Object Detection. Electronics 2023, 12, 3089. https://doi.org/10.3390/electronics12143089

Academic Editor: Stefanos Kollias

Received: 17 May 2023; Revised: 4 July 2023; Accepted: 11 July 2023; Published: 16 July 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In recent years, with the development of artificial intelligence technology, object detection [1] has experienced significant development and has been widely applied as a fundamental task and a research hotspot in the field of computer vision. Object detection has been applied in various domains of human life. For instance, in the field of autonomous driving [2], it is used to accurately recognize and locate pedestrians, vehicles, and other objects on the road. In the medical field [3], object detection assists in tasks such as lesion recognition and tumor detection, contributing to diagnosis and healthcare. Additionally, it plays a crucial role in domains such as facial recognition [4], agriculture [5], and industry [6].

Object detection can be classified into single-stage object detection algorithms, represented by YOLO (you only look once) [7] and SSD (single shot multibox detector) [8], and two-stage object detection algorithms, represented by RCNN (Region CNN) [9], Fast RCNN [10], and Faster RCNN [11], based on the detection approach. However, mainstream object detection algorithms are often developed and studied using high-quality image datasets with good lighting conditions. As a result, factors related to specific environments may not be adequately considered in practical application scenarios. For example, images
may suffer from distortion, inadequate exposure, and other challenges that can affect the
performance of object detection algorithms.
Mineral resources serve as a crucial foundation and support for the development of
human society. In recent years, the integration of artificial intelligence technology with
the mining industry has emerged as a popular research direction [12] to maintain normal
mining operations and to ensure the safety of underground workers. However, the actual
environment of underground mines is highly complex, posing various challenges to object
detection algorithms, including issues such as image distortion and insufficient exposure
mentioned earlier, thereby affecting the performance of object detection algorithms in un-
derground mining settings. To address the challenges posed by low lighting conditions and
other environmental factors in underground mines, researchers have proposed hardware-
based measures to enhance detection performance, such as using high-power light sources,
thermal imaging technology, or cameras with higher sensitivity [13]. However, these meth-
ods currently incur high costs, making it difficult for them to be widely adopted. Therefore,
researchers are directing their attention toward proposing or improving object detection
algorithms to enhance their performance in low-light scenarios like underground mines.
To mitigate the impact of low-light conditions on object detection performance in
specific scenarios, some researchers have made improvements to the network structure
of object detection models. For instance, Xiao et al. [14] argued that mainstream object
detection models are designed for normal lighting conditions. Therefore, they proposed a
specialized feature pyramid network and contextual fusion network based on the RFB-Net
(receptive field block net) model proposed by Liu et al. [15] to improve object detection
performance in low-light scenarios. Li et al. [16], addressing the challenges of poor il-
lumination and complex environments in underground mines, presented an improved
faster R-CNN method for pedestrian detection. They replaced the traditional handcrafted
feature engineering approach with a deep convolutional neural network to automatically
extract features from images. Additionally, they incorporated feature fusion techniques
to enhance the detection performance of pedestrians in underground mines, considering
factors such as blurriness, occlusion, and small target sizes. However, these approaches,
which solely focus on enhancing the capability of extracting low-light image features at the
network structure level, may not be effective under extremely low-lighting conditions. It
becomes necessary to consider improving the quality of low-light images themselves, as
the illumination intensity significantly influences the performance of object detection tasks.
Low-light image enhancement has been a popular research direction among re-
searchers. In recent years, with the advancement of artificial intelligence technology, several
deep learning-based algorithms for low-light image enhancement have emerged. For ex-
ample, Wei et al. [17] proposed RetinexNet, a low-light image enhancement network based
on Retinex, and Jiang et al. [18] designed EnlightenGAN, a generative adversarial network
with self-attention mechanisms. According to the research conducted by Sobbahi et al. [19],
significant progress has been made in utilizing deep learning methods for low-light image
enhancement. They comprehensively compare the impact of various deep learning-based
low-light image enhancement algorithms on object detection tasks and summarize that the
integration of deep learning techniques with image enhancement can yield positive results
to a certain extent. Therefore, some researchers have shifted their focus toward integrating
low-light image enhancement algorithms with object detection models. Li et al. [20] ad-
dressed the issue of poor object detection performance in low-light scenarios by combining
the perception-sensitive bi-directional similarity (PSB) image enhancement algorithm with
the SSD object detection algorithm. This approach effectively improved the detection speed
and accuracy of low-light images. Zhang Mingzhen [21] proposed the dense-YOLO model
for pedestrian detection in low-light underground mine scenarios. The model enhanced
low-light images using algorithms such as gamma transformation and weighted logarith-
mic transformation, followed by global denoising. These enhancements were then used
as inputs to the object detection network, resulting in improved pedestrian detection in
low-light underground scenarios. Xu et al. [22] proposed a salient object detection model
to address the challenges caused by the degradation of low-light images in object detection
tasks due to scene depth and environmental lighting. The principle of their approach
involves directly embedding the physical lighting model into the deep neural network,
resulting in an improved detection performance in low-light environments. In the context
of poor lighting conditions in foggy traffic scenes, Qiu et al. [23] proposed a module called
IDOD (AOD + SAIP) that combines the defogging algorithm AOD (an all-in-one network
for dehazing and beyond) with the image enhancement algorithm SAIP. They integrated
this image processing module, IDOD, with the YOLOv7 detection model for end-to-end
joint learning. Their research demonstrates that the integrated approach not only improves
the visual quality of the images, but also enhances the performance of object detection. In
order to overcome the challenges of insufficient illumination and high noise in low-light
environments, Cui et al. [24] proposed a multitask auto-encoding transformation model
(MAET). The model adopts a self-supervised approach to encode and decode realistic
illumination-degrading transformations, considering the physical noise model and image
signal processing (ISP), in order to learn the intrinsic visual structure. On the basis of the
above, it can be integrated with mainstream object detection architectures to achieve im-
proved object detection performance in low-light conditions. However, most of the research mentioned above improves detection performance by raising image quality through better image enhancement algorithms. The challenge for object detection models that incorporate low-light image enhancement is that, in addition to amplifying the desired target features, the enhancement process may also amplify noise characteristics. This issue requires continued attention from researchers, and it is therefore worth considering improvements on the object detection side that are designed to work together with low-light image enhancement algorithms.
We propose a low-light adaptive object detection model called DK_YOLOv5, based on
the YOLOv5 model. First, the model takes low-light enhanced images as input to achieve
relatively better visual effects and amplify the object information and features to some
extent. Second, we improve the last layer SPPF (spatial pyramid pooling fast) module of
the model’s backbone network to the R-SPPF module, which offers faster inference speed
and stronger feature representation. Next, we replace all C3 structures in the backbone
network, including the first C3 structure in the neck, with C2f structures. This modification
aims to reduce the network’s depth while enriching the gradient information flow in
feature extraction. Additionally, the last three C3 modules in the neck are replaced with
the proposed C2f_SKA module, which incorporates the attention mechanism SKAttention
to fuse multi-scale feature information. This helps the detection model focus more on the
target regions during the learning process, thereby reducing the impact of noise introduced
by low-light image enhancement algorithms and improving the network’s learning ability.
Finally, we replace the detection head of the network with a decoupled head that is more
suitable for low-light detection tasks in such environments. This improvement enhances
the detection accuracy of the network. Experimental evaluations conducted on the Exdark
(exclusively dark) low-light dataset [25] and the Mine_Exdark low-light dataset, augmented
specifically for underground targets, demonstrate that the DK_YOLOv5 model outperforms
other models in low-light scenarios and performs well in underground object detection
tasks. This confirms the rationality and effectiveness of the proposed improvements and
innovations in this paper.
This paper will be structured into six sections. In the introductory Section 1, we
will provide a detailed overview of the research background in low-light environment
object detection, survey related low-light image enhancement algorithms, and the existing
object detection models. Additionally, we will summarize the relevant improvements of
our work. In Section 2, we will delve into the theoretical foundations starting with an
exploration of the YOLO series algorithms, followed by the selection of the baseline model
for our research. Section 3, the related work section, will discuss the current low-light
environment object detection datasets, some low-light image enhancement algorithms,
and their integration with object detection models to explore their impact on detection
performance. Subsequently, in Section 4, we will provide detailed descriptions of the
proposed improvement modules, along with the corresponding network architecture and
low-light image enhancement diagram. Section 5 will focus on the experimental setup,
presenting and discussing the experimental results to derive meaningful conclusions.
Finally, in Section 6, we will summarize the contributions of our work, highlight the
remaining challenges, and outline future research directions.

2. Theoretical Background
2.1. YOLO Series Algorithms
The YOLO series algorithm has gained significant attention in the field of real-time
object detection due to its ability to strike a balance between inference speed and detection
accuracy. Since its introduction by J. Redmon et al. [26] at CVPR (IEEE Conference on
Computer Vision and Pattern Recognition) in 2016 with the YOLOv1 model, it has evolved
to the latest YOLOv8 version.
Currently, the widely used YOLO series models in real-time object detection in-
clude YOLOv5, YOLOv7 [27], and YOLOv8. YOLOv7 was published by the authors
of YOLOv4 [28] on ArXiv in July 2022. YOLOv5 and YOLOv8 are open-sourced and
maintained by Ultralytics. As of the writing of this paper, they have not yet published
academic papers on these models, but they continue to maintain and update YOLOv5
and YOLOv8. YOLOv5 has reached a relatively stable version, while YOLOv8 is still
undergoing continuous updates and improvements. In terms of computational efficiency
and accuracy, YOLOv8 has shown improvements over YOLOv5 and YOLOv7. However,
YOLOv5 has advantages in terms of training speed, inference speed, and memory usage,
especially in certain applications with mobile devices or limited resources.
Since the release of YOLOv5 v1.0 by Ultralytics in 2020, subsequent updates were
made, including version v6.1 in February 2022, v6.2 in August 2022, and v7.0 in Novem-
ber 2022. Considering that the updates after v6.2 focused on extending classification and instance segmentation tasks on top of object detection, YOLOv5 v6.1 is a sensible choice for work centered on object detection, although this choice still needs quantitative justification.

2.2. Scientific Justification for YOLO Algorithm Selection


Although we have provided a brief overview of the strengths and weaknesses of the
YOLO series algorithms, the selection of a specific YOLO version still requires a scientific
justification. Therefore, in this section, we will use a table to present a comparative analysis
of various models based on initial parameters, computational complexity, and inference
speed, aiming to quantitatively assess their respective merits. The comparison is shown in
Table 1.

Table 1. Comparison between YOLO algorithms.

Model      Parameters   FLOPs     FPS (bs = 1)
YOLOv5s    7.2 M        16.5 G    376
YOLOv6s    18.5 M       45.3 G    339
YOLOv7     36.9 M       104.7 G   110
YOLOv8s    11.2 M       28.6 G    311

We selected various YOLO models commonly used for evaluating and comparing
models. The comparison includes parameters, computational complexity (GFlops), and FPS
(frames per second), along with the mAP (mean average precision) accuracy on the Exdark
dataset as shown in Table 2. These parameters represent the model’s parameter count,
computational resource consumption, and frames per second during inference, combined
with the mAP metric to measure its accuracy on the Exdark dataset.

Table 2. Detection results for enhanced datasets.

Dataset          mAP 0.5:0.95   mAP 0.5
ExDark           0.381          0.642
RetinexNet       0.281          0.485
EnlightenGAN     0.379          0.645
Zero-DCE         0.347          0.596
Zero-DCE++       0.353          0.627
SGZ              0.350          0.618

It can be observed that among these models, YOLOv5s achieves the highest FPS
while using the fewest parameters and computational resources. It also demonstrates
commendable accuracy. Therefore, considering the potential application of real-time object
detection in low-light underground scenarios, where a balance between accuracy and speed
is crucial, we chose the YOLOv5s model from the YOLOv5 v6.1 version as our baseline.
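For readers who want to reproduce this kind of comparison, the parameter count and a rough FPS figure can be obtained with a few lines of PyTorch. The sketch below is our own illustration; the torch.hub entry point, input size, and iteration counts are assumptions, not the benchmarking setup used for Table 1.

```python
import time
import torch

# Load a YOLOv5s checkpoint via torch.hub (assumes internet access and the
# ultralytics/yolov5 repository; any nn.Module could be substituted here).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.eval()

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f} M")

# Rough FPS estimate at batch size 1 on a 640x640 input (device defaults to CPU here).
x = torch.zeros(1, 3, 640, 640)
with torch.no_grad():
    for _ in range(10):          # warm-up iterations
        model(x)
    start = time.time()
    for _ in range(100):
        model(x)
fps = 100 / (time.time() - start)
print(f"FPS (bs = 1): {fps:.0f}")
```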

2.3. YOLOv5 Algorithm Model


YOLOv5 is one of the classic object detection algorithms in the YOLO series. Although
it has slightly lower detection accuracy compared to subsequent versions of YOLO, it
stands out for its unique advantages of fast inference speed, lightweight design, and quick
deployment in specific scenarios.
YOLOv5 v6.1 is a newer and more stable version. Its core idea is to transform the
object detection task into a regression problem, where the network predicts the bounding
boxes and classes of various objects. Numerous improvements have been made in model
architecture, data augmentation, and post-processing strategies, resulting in enhanced
performance and optimized detection efficiency. YOLOv5 provides different scales of
models, including YOLOv5n, YOLOv5s, and YOLOv5l. These versions share a similar
network structure, with the only difference being the depth of the network to meet the
accuracy and real-time requirements of different application scenarios.
Among them, YOLOv5s is the model that strikes the best balance between detection
accuracy and real-time performance. Its network structure, as shown in Figure 1, consists
of three main parts: the backbone network, neck, and detection head.
The YOLOv5 algorithm incorporates the Mosaic data augmentation at its input, which
combines four different images together to enrich the training dataset. Moreover, to enable
the model to adapt to various object detection datasets, the authors employ an algorithm to
automatically compute the optimal anchor box values for each dataset. Additionally, the
input images are adaptively scaled to the desired size, enhancing the inference speed of
the model.
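The adaptive scaling step can be pictured as a letterbox resize: scale the image by the limiting ratio, then pad only the remainder. The following is a minimal sketch of that idea (our illustration, not the YOLOv5 implementation; the 640 target size and padding value 114 are conventional assumptions).

```python
import numpy as np
import cv2  # assumed available; any resize/pad routine would work equally well

def letterbox(img: np.ndarray, new_size: int = 640, pad_value: int = 114) -> np.ndarray:
    """Resize with preserved aspect ratio, then pad to a square new_size x new_size canvas."""
    h, w = img.shape[:2]
    r = min(new_size / h, new_size / w)            # limiting scale factor
    nh, nw = int(round(h * r)), int(round(w * r))
    resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((new_size, new_size, 3), pad_value, dtype=img.dtype)
    top, left = (new_size - nh) // 2, (new_size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```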
YOLOv5 utilizes Darknet53 as its backbone network. The C3 module is employed to
replace the CSP (cross stage partial network) module used in earlier versions, which plays a
crucial role in reducing the number of network parameters and improving both training and
inference speed. Furthermore, YOLOv5 introduces the SPPF module to replace the original
SPP module, maintaining the same level of effectiveness while further enhancing speed.
In the neck network, YOLOv5 adopts a PAN (path aggregation network) structure to
fuse the multi-scale feature maps extracted from the backbone network. This results in a
series of feature maps with varying scales and semantic information, thereby increasing
the network’s ability to express features.
The detection head module of YOLOv5 is primarily responsible for multi-scale object
detection based on the multi-scale feature maps obtained from the neck network. Currently,
it still utilizes a coupled head. Moreover, compared to earlier versions of YOLO, YOLOv5
employs the GIoU as the loss function for bounding boxes, optimizing their positions
and sizes.
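For reference, GIoU extends IoU by penalizing the empty area of the smallest enclosing box. Below is a minimal sketch of the metric for two axis-aligned boxes (an illustration, not the authors' loss implementation).

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    # Union area.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Smallest enclosing box.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    return iou - (c_area - union) / c_area if c_area > 0 else iou

# The corresponding bounding-box loss is typically 1 - giou(pred, target).
```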
Figure 1. YOLOv5 network model diagram.

3. Related Work
3.1. About the Low-Light Object Detection Datasets

In the field of object detection under low-light conditions, several well-known datasets have been widely applied, including NightOwls, Exdark, CityPersons, and DarkFace. These datasets share the common goal of providing image samples under low-light environments for training and evaluating low-light object detection algorithms.

The NightOwls dataset primarily focuses on pedestrian detection at night, capturing diverse images in urban nighttime scenes. It encompasses various object categories such as pedestrians, vehicles, and bicycles, along with complex background scenarios.

The Exdark dataset is specifically designed for low-light object detection. Its strength lies in providing diverse low-light images covering 12 object categories and a variety of dark scenes, effectively simulating real-world low-light conditions. Its diverse low-light images and multiple object categories make it a significant benchmark for studying object detection algorithms under low-light conditions.

The CityPersons dataset is dedicated to pedestrian detection in urban scenes. It offers challenging urban environment images, including different lighting conditions during the day and night. The aim of the CityPersons dataset is to encourage researchers to develop pedestrian detection algorithms suitable for real urban scenarios, providing detailed bounding box annotations.

The DarkFace dataset is specifically tailored for face detection in low-light environments. It includes face images captured under dim lighting conditions and serves as a benchmark for evaluating face detection algorithms under low-light situations. The
dataset’s distinctive feature is its inclusion of complex lighting variations found in real-
world scenarios, enabling algorithms to accurately detect and localize faces under low-
light conditions.
Although these datasets provide samples in low-light conditions, they focus on dif-
ferent application scenarios, such as pedestrian detection or face detection. Therefore, to
comprehensively improve the detection performance in low-light environments, it is prefer-
able to select datasets that provide more diverse scenes and object classes for evaluation.
Based on the aforementioned reasons, we have chosen the Exdark dataset as our
primary evaluation dataset. The Exdark dataset not only provides a rich collection of
samples under low-light conditions but also encompasses diverse object categories and
scenes. It covers 12 object categories and a variety of low-light scenes, allowing for a
comprehensive assessment of algorithm robustness and accuracy in low-light conditions.
Furthermore, due to the diversity of object categories and scenes in the Exdark dataset, we
have the opportunity to expand the dataset by adding our own application-specific image
data. This enables us to incorporate the desired application scenarios into the evaluation
process, thereby enhancing the practicality and relevance of low-light object detection
algorithms. Therefore, based on its advantages, we have chosen it as the primary dataset
for our research.

3.2. About the Low-Light Image Enhancement Algorithms


In selecting low-light image enhancement algorithms, we considered several key
factors. First, through literature review and research, we identified widely cited and
representative algorithms for low-light image enhancement, including MBLLEN [29],
RetinexNet, KinD [30], Zero-DCE (zero-reference deep curve estimation) [31], Enlighten-
GAN, Zero-DCE++ [32], URetinex-Net [33], SCI (self-calibrated illumination) [34], and
SGZ (semantic-guided zero-shot) [35]. These algorithms have received significant attention
in both academia and industry.
Second, we compared the technical approaches of these algorithms and categorized
them into different techniques based on classical Retinex theory, zero-shot learning, and
generative adversarial network (GAN) learning. Given the importance of diverse algorithm
selection, we aimed to cover these different technical approaches.
Therefore, we chose several representative algorithms to combine with object detection
algorithms and observe their impact on object detection performance. The specific algo-
rithms selected include RetinexNet, which is based on classical Retinex theory, Zero-DCE,
Zero-DCE++, and SGZ, which are based on zero-shot learning, as well as EnlightenGAN,
which utilizes GAN learning. These algorithms have achieved significant results in previous
research and demonstrate diverse technical strategies and practical application potential.
By combining these low-light image enhancement algorithms with object detection models,
we can make a fair comparison of the impact of different enhancement algorithms based
on different approaches, avoiding biases resulting from selecting algorithms with the same
technical approach.
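In practice, combining an enhancement algorithm with a detector can be as simple as enhancing a copy of the image set offline and then evaluating the detector on it with the original annotations. The sketch below assumes a generic `enhance_fn` callable (for example, a loaded EnlightenGAN or Zero-DCE generator) and hypothetical folder names; it is an illustration rather than the exact pipeline used here.

```python
from pathlib import Path

import numpy as np
from PIL import Image

def enhance_folder(src_dir: str, dst_dir: str, enhance_fn) -> None:
    """Apply a low-light enhancement callable to every image in src_dir.

    enhance_fn takes and returns an HxWx3 uint8 RGB array; labels are untouched,
    so the enhanced copy can be evaluated with the same annotation files.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(src_dir).glob("*.jpg")):
        rgb = np.array(Image.open(img_path).convert("RGB"))
        enhanced = enhance_fn(rgb)
        Image.fromarray(enhanced).save(out / img_path.name)

# Hypothetical usage: enhance_folder("ExDark/val", "ExDark_enlightengan/val", my_gan_enhance)
```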

3.3. Incorporating Low-Light Image Enhancement Algorithm into YOLOv5


In recent years, researchers have discovered the advantages of using low-light image
enhancement algorithms to address object detection in low-light environments, including
low cost, applicability to various environments, and high flexibility. However, it is essential
to consider the potential negative effects of such algorithms as well. In order to investigate
the impact of low-light image enhancement algorithms on object detection algorithms,
we selected several algorithms, namely EnlightenGAN, SGZ, RetinexNet, Zero-DCE, and
Zero-DCE++, to enhance the ExDark dataset. In Section 5, we provide a detailed description
of the ExDark dataset and its data partition. Here, we first trained the YOLOv5 model on
the original ExDark training set (4800 images) and validated it on the original ExDark vali-
dation set (2563 images) to obtain the detection performance of the YOLOv5 model on this
dataset. Next, to explore the impact of low-light image enhancement algorithms on object
detection performance, we applied these several low-light image enhancement algorithms to the ExDark validation set, resulting in enhanced validation sets corresponding to each algorithm. Figure 2 presents examples of the enhancement effects of these algorithms on the validation set. From Figure 2, we can observe that enhancement algorithms based on Retinex theory, zero-shot learning methods, and generative adversarial network learning methods exhibit different effects in image enhancement. Clearly, enhancement algorithms based on the latter two methods can achieve relatively better visual effects.

Figure 2. Effects of different enhancement algorithms.

As mentioned earlier, we trained the YOLOv5 model on the original ExDark training set and applied various image enhancement algorithms to the ExDark validation set to create corresponding enhanced validation sets. Subsequently, we used the trained model to perform detections on both the original ExDark validation set and each enhanced validation set to obtain the mAP values for each set when evaluated with the original YOLOv5 model. The two columns in Table 2 report mAP 0.5 and mAP 0.5:0.95, which measure the mean average precision (mAP) at an IoU threshold of 0.5 and across a range of IoU thresholds from 0.5 to 0.95, serving as metrics to evaluate the object detection performance after incorporating the low-light image enhancement algorithms. The complete results are presented in Table 2.

Based on the results shown in Figure 2 and Table 2, it can be observed that although combining YOLOv5 with low-light image enhancement algorithms can achieve relatively better visual effects, it did not yield the expected improvement in terms of object detection performance. Most of the selected enhancement algorithms did not contribute to an improvement in detection performance, and even the EnlightenGAN algorithm only achieved a marginal increase of 0.3% in mAP. This indicates that the noise introduced during the enhancement process by low-light image enhancement algorithms cannot be overlooked. Therefore, for the YOLOv5 model, while integrating low-light image enhancement algorithms to obtain enhanced images that achieve relatively better visual effects, there is a need for model improvements to enhance its ability to extract meaningful features and mitigate the impact of noise on detection performance, thus enhancing the overall object detection performance.

Therefore, it is highly necessary to analyze the first-layer feature maps of various images input to the models, using the EnlightenGAN enhancement algorithm as an example, which brings marginal improvement in object detection performance. Specifically, we compare the first-layer feature map of the original image input to the YOLOv5 model, the first-layer feature map of the image enhanced by the EnlightenGAN algorithm and input to the YOLOv5 model, and the first-layer feature map of the image enhanced by the EnlightenGAN algorithm and input to the DK_YOLOv5 model. The corresponding feature maps are shown in Figure 3.

Figure 3. Comparison of feature maps. (a) Original image input to the YOLOv5 model; (b) Image input enhanced by the EnlightenGAN algorithm and input to the YOLOv5 model; (c) Image input enhanced by the EnlightenGAN algorithm and input to the DK_YOLOv5 model.

As shown in Figure 3, from left to right, we have the first-layer feature maps of the original image input to the YOLOv5 model, the image enhanced by the EnlightenGAN algorithm and input to the YOLOv5 model, and the image enhanced by the EnlightenGAN algorithm and input to the DK_YOLOv5 model. Analyzing the comparison between Figure 3a,b, we can observe that Figure 3b represents the first-layer feature map of the original YOLOv5 model after image enhancement. It can be seen that Figure 3b contains richer target features around the person compared to Figure 3a. Additionally, the background and other noise features in Figure 3b are amplified and become more pronounced. Similarly, comparing Figure 3b with Figure 3c, both undergo image enhancement. However, Figure 3c represents the first-layer feature map input to our proposed model. It can be observed that, compared to Figure 3b, Figure 3c maintains the amplified target features while reducing the background noise features to some extent.

4. Methods

We employed YOLOv5 as the detection model and found that its performance in directly detecting objects in low-light conditions was subpar. Even when using low-light enhanced image data as input to the network, the detection performance showed limited improvement and, in some cases, even declined. To ensure that the enhanced images achieve relatively better visual effects while effectively improving the object detection performance, it is crucial to improve the network architecture of the YOLOv5 detection model. This enhancement aims to improve its feature extraction capabilities for low-light conditions and noise-enhanced data, reducing the impact of insufficient lighting and noisy features on the object detection performance, thereby meeting the requirements for object detection in low-light scenarios such as underground mine environments.

4.1. SKAttention-Based C2f_SKA Module

The YOLOv5 network utilizes multiple C3 structures to increase network depth and receptive field, as shown in Figure 1. Each C3 structure consists of three standard convolutional modules and multiple residual modules. However, in low-light conditions, the input image quality is poor, and the C3 module fails to extract effective low-level features and reduce the impact of insufficient lighting and noisy features, consequently affecting the object detection performance.

To address these issues, this paper draws inspiration from the design principles of the C2f structure and proposes a feature learning module called C2f_SKA based on the attention mechanism of SKNet [36]. The C2f_SKA module enhances the existing convolutional branch in the C2f module by incorporating a channel attention mechanism branch. It performs multi-scale fusion with the features learned by the residual structure, allowing for the acquisition of more informative low-level features and gradient flow information while reducing the network's depth. This improvement increases the network's feature learning capability. The structure of the proposed C2f_SKA module is illustrated in Figure 4.

Figure 4. Structure of the C2f_SKA module.

The C2f_SKA module first splits the input convolution branch into two identical branches. One branch is fed into the residual module for residual feature learning and multi-scale feature fusion. Additionally, a new feature learning branch with an attention mechanism is introduced to enable the network to extract low-level features of the target region effectively while reducing the influence of noise. This branch is then fused with the other convolution branch and the residual features at multiple scales, thereby significantly enhancing the detection capability of the network.

In Figure 4, the size of the feature maps in the Conv1 section is denoted as B × C × H × W, where B represents the batch size, C represents the number of channels, H represents the height, and W represents the width. After the split operation, the feature maps of the two convolution branches are represented as B × C/2 × H × W. One convolution branch goes through N residual module branches and one SKAttention branch, which performs attention, and is concatenated with the other convolution branch to form the input of Conv2. Therefore, the size of the feature maps in the Conv2 section is B × (N + 3)C/2 × H × W.

The SKAttention part takes the output X of Conv1 as input and splits it into multiple branches with different convolution kernel sizes, as shown in Figure 5, where U1 and U2 are two example branches.

Figure 5. Structure of the C2f_SKA module.
Then, the feature maps of U1 and U2 are element-wise summed to obtain the feature
map U. Subsequently, a global average pooling is applied to U to generate channel-wise
statistics. The calculation formula is as follows:

S = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} U_c(i, j)    (1)

where H and W represent the width and height of the feature map, respectively, and i
and j indicate their corresponding spatial feature positions. By utilizing this equation, the
channel-wise statistics of U, denoted as S, are calculated. Following that, a fully connected
layer is employed to map the original c-dimensional information to a z-dimensional space.
It is calculated using the following equation:

z(d × 1) = F_fc(s) = δ(β(W(d × c) s))    (2)

here, δ denotes the ReLU activation function, β represents batch normalization, d corre-
sponds to the dimension of features after the fully connected layer, and W has a dimension
of d × c, while z has a dimension of d × 1. Following this, z is linearly transformed back to
the original c-dimensional space, and a softmax normalization is applied to obtain weight
vectors for each channel. These weight vectors are multiplied with their respective U1 and
U2, resulting in the multiplied modules A1 and A2. The information fusion of A1 and A2
yields the final module A, which encompasses multi-scale information.
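A minimal PyTorch sketch of this selective-kernel attention computation is given below: two example branches, global average pooling as in Equation (1), a fully connected reduction as in Equation (2), and softmax-normalized channel weights. It is our illustration of the mechanism, not the authors' code, and the kernel sizes and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class SKAttention(nn.Module):
    """Two-branch selective-kernel attention over a C-channel feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # U1 and U2: same channel count, different receptive fields (3x3 and 5x5).
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        d = max(channels // reduction, 4)
        # Fully connected reduction z = delta(beta(W s)) from Equation (2).
        self.fc = nn.Sequential(nn.Linear(channels, d), nn.BatchNorm1d(d), nn.ReLU(inplace=True))
        # One weight vector per branch, normalized with softmax across branches.
        self.attn = nn.Linear(d, channels * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        u1, u2 = self.branch3(x), self.branch5(x)
        u = u1 + u2                                    # element-wise fusion of U1 and U2
        s = u.mean(dim=(2, 3))                         # Equation (1): global average pooling
        z = self.fc(s)                                 # Equation (2): reduce to d dimensions
        w = self.attn(z).view(b, 2, c).softmax(dim=1)  # per-channel weights for each branch
        a1, a2 = w[:, 0].view(b, c, 1, 1), w[:, 1].view(b, c, 1, 1)
        return a1 * u1 + a2 * u2                       # multi-scale fused output A

# Example: SKAttention(128)(torch.randn(2, 128, 40, 40)).shape -> torch.Size([2, 128, 40, 40])
```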
In response to the challenges presented by low-light environments, such as under-
ground mine scenarios, we propose a solution by replacing the C3 modules in the backbone
network and the first C3 module in the neck part of YOLOv5 with C2f modules. This im-
provement not only reduces the network’s depth but also enhances the backbone network’s
capability to extract gradient information flow. To further improve the network’s focus on
crucial information in the target region and mitigate the impact of noise, the last three C3
modules in the neck part are replaced with the newly proposed C2f_SKA modules. Finally,
adjustments are made to the fusion layers of concatenation at the 19th and 22nd layers of
the network, allowing them to, respectively, concatenate with the first C2f layer in the neck
part and the last layer of the backbone network.
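To make the channel bookkeeping of Figure 4 concrete, the rough sketch below assembles a C2f-style block with an extra attention branch, producing the B × (N + 3)C/2 concatenation described above. It reuses an SK-style attention module such as the one sketched in the previous block and should be read as an assumed structure, not the released implementation.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual 3x3-3x3 bottleneck operating on C/2 channels."""

    def __init__(self, c: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.SiLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return x + self.block(x)

class C2fSKA(nn.Module):
    def __init__(self, c_in: int, c_out: int, n: int = 2):
        super().__init__()
        self.c = c_in // 2
        self.conv1 = nn.Conv2d(c_in, c_in, 1)               # Conv1 in Figure 4
        self.bottlenecks = nn.ModuleList(Bottleneck(self.c) for _ in range(n))
        self.ska = SKAttention(self.c)                      # attention branch (class from the previous sketch)
        self.conv2 = nn.Conv2d((n + 3) * self.c, c_out, 1)  # Conv2 over (N + 3) * C/2 channels

    def forward(self, x):
        y1, y2 = self.conv1(x).split(self.c, dim=1)         # split into two C/2 branches
        feats = [y1, y2]
        for m in self.bottlenecks:                          # N chained residual outputs
            feats.append(m(feats[-1]))
        feats.append(self.ska(y2))                          # SKAttention output
        return self.conv2(torch.cat(feats, dim=1))          # fuse the (N + 3) feature maps
```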

4.2. Improving the SPPF Module


YOLOv5 introduces a novel structure called SPPF based on SPP [37]. SPPF retains the
ability of the SPP structure to fuse local and global features while improving the model’s
speed. The convolutional module in SPPF utilizes SiLU as the activation function, defined
by the following equation:

silu(x) = x × sigmoid(x) = x / (1 + e^(−x))    (3)

In the equation, x represents the input of the function, and sigmoid( x ) represents the
sigmoid activation function with x as its input.
The activation function, SiLU, possesses several desirable properties, including being
unbounded with a lower bound and exhibiting smooth non-monotonic behavior, making it
perform well in deeper networks. However, SiLU also has some clear disadvantages. In
extreme cases of excessively large or small data, it may suffer from gradient vanishing or ex-
ploding issues. Additionally, the trade-off between the computational overhead introduced
by SiLU and the corresponding performance improvement needs to be further balanced.
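As a quick numerical illustration of the two activations discussed in this subsection (example values of our own choosing):

```python
import torch

x = torch.tensor([-1000.0, -1.0, 0.0, 1.0, 1000.0])
silu = x * torch.sigmoid(x)      # Equation (3): smooth and non-monotonic near zero
relu = torch.clamp(x, min=0.0)   # max(0, x): the cheaper piecewise-linear alternative adopted below
# silu is approximately [-0.0, -0.269, 0.0, 0.731, 1000.0]
# relu is exactly        [ 0.0,  0.0,   0.0, 1.0,   1000.0]
print(silu, relu)
```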
When using images from low-light environments, such as those encountered in un-
derground mine scenarios, as input to the network, extreme pixel values are likely to
be encountered. Additionally, the inference speed of the model needs to be taken into
consideration. Therefore, to address this issue, this paper replaces the activation function
in the convolutional modules of SPPF with ReLU, defined by the following equation:
duced by SiLU and the corresponding performance improvement needs to be further bal-
anced.
When using images from low-light environments, such as those encountered in un-
derground mine scenarios, as input to the network, extreme pixel values are likely to be
Electronics 2023, 12, 3089 encountered. Additionally, the inference speed of the model needs to be taken12 into
of 22 con-
sideration. Therefore, to address this issue, this paper replaces the activation function in
the convolutional modules of SPPF with ReLU, defined by the following equation:
> 0
 x, if x )
f ( xx,) =i f x > 0  = max(0, x)
(
(4)
f (x) = 0, if x ≤ 0= max(0, x ) (4)
0, i f x ≤ 0
here, x represents the input to the function.
here, x represents the input to the function.
TheReLU
The ReLUactivation
activation function
function cancan
be be regarded
regarded as aas a piecewise
piecewise function
function and exhibits
and exhibits
favorablecomputational
favorable computational properties
properties during
during neuralneural
networknetwork training.
training. It further
It further enhancesenhances
the
the inference
inference speedspeed of YOLOv5
of YOLOv5 comparedcompared to theSiLU
to the original original SiLU function.
activation activationAdditionally,
function. Addi-
tionally,
in the casein
ofthe case
input of input
images fromimages from
low-light low-lightsuch
conditions conditions such as mine
as underground underground
scenarios,mine
scenarios,
extreme extreme
values values
are likely are likely
to occur. to occur.the
By replacing BySiLU
replacing the SiLU
activation activation
function function
with ReLU
with
in the ReLU in the SPPF
SPPF module, module, approach
the proposed the proposed approach
significantly significantly
mitigates mitigates
the issues the issues
of gradient
explosion
of gradientorexplosion
vanishingor gradients
vanishing caused by extreme
gradients causedinputs. This improvement
by extreme ensures
inputs. This improvement
the retention of the SPPF module’s capability to integrate local and
ensures the retention of the SPPF module’s capability to integrate local and globalglobal features. The fea-
modified SPPF module is referred to as R-SPPF, as illustrated in Figure
tures. The modified SPPF module is referred to as R-SPPF, as illustrated in Figure 6.6.

R-SPPF = Conv BN ReLU Concat Conv BN ReLU

MaxPool2d

MaxPool2d

MaxPool2d

Figure6.6.Structure
Figure Structureofof the
the R-SPPF
R-SPPF module.
module.

4.3.
4.3.Introducing
IntroducingDecoupled
DecoupledHead
Head
The
Thedetection
detectionhead
head used
usedin in
thethe
YOLOv5
YOLOv5 algorithm
algorithmadopts
adopts a coupled headhead
a coupled structure,
structure,
where the weight parameters are shared between the classification
where the weight parameters are shared between the classification and regressionand regression taskstasks
in the object detection task. However, it is evident that these two tasks conflict with each
in the object detection task. However, it is evident that these two tasks conflict with each
other. Therefore, inspired by the drawbacks of the coupled head structure, Ge et al. [38]
other. Therefore, inspired by the drawbacks of the coupled head structure, Ge et al. [38]
proposed a decoupled detection head, where the coupled head structure is redesigned
proposed a decoupled detection head, where the coupled head structure is redesigned
into a decoupled head structure. In the decoupled head structure, the classification and
into a decoupled
regression tasks arehead structure.
separately In theusing
processed decoupled head
different structure,
branch heads.the
Thisclassification
decoupling and
regression tasks are separately processed using different branch heads.
approach effectively enhances the detection performance of the object detection network. This decoupling
approach
The effectively
structures enhances
of the coupled head theand
detection performance
the decoupled head areof illustrated
the object in
detection
Figure 7.network.
The From
structures of the coupled
the diagram, it can be head
observedandthat
the the
decoupled
coupled head are illustrated
head structure utilizesinthe
Figure
same 7.
weight parameters to process both the classification and regression tasks using the feature
map from the upper layers. The decoupled head structure separates the classification and
regression tasks, employing different network branches with distinct weight parameters for
each task. Furthermore, within the regression task branch, it further divides the tasks into
regressing the target position information and regressing the confidence. This approach
improves the detection performance of the model.
Based on the analysis above and considering the issue of low detection accuracy
in underground and low-light conditions, it is more reasonable to use the decoupled
head structure. Therefore, in this paper, we replaced the coupled head structure in the
YOLOv5 algorithm with the decoupled head structure, effectively enhancing the detection
performance of the network.
Electronics 2023, 12, 3089 13 of 22
Electronics 2023, 12, 3089 13 of 22

Coupled Head

1×1 Conv Feature

Decoupled Head
3×3 Conv 3×3 Conv 1×1 Conv Cls

Reg
1×1 Conv

3×3 Conv 3×3 Conv

1×1 Conv IoU

1×1 Conv

Figure
Figure 7.
7. Coupled head and
Coupled head anddecoupled
decoupledhead.
head.

4.4. Overall Network Structure

We propose an object detection model called DK_YOLOv5, which is based on YOLOv5 and designed for low-light environments. The overall network structure is illustrated in Figure 8.

Figure 8. Overall network structure of DK_YOLOv5.

In this network, the SPPF module in the last layer of the backbone network is improved to the R-SPPF module. This modification enhances the model’s inference speed while achieving the fusion of local and global features, thereby enhancing the feature representation capability. Furthermore, all C3 modules in the backbone network and the first C3 module in the neck are replaced with C2f modules. This not only reduces the network depth but also enriches the extraction of gradient information in the backbone network. Additionally, the last three layers of C3 modules are replaced with the C2f_SKA modules proposed in this paper. By incorporating the SKAttention mechanism, these modules integrate multi-scale feature information and focus more on the informative features of the target area, thereby reducing the impact of noise and enhancing the network’s ability to filter out relevant information. Finally, the detection head of the network is replaced with a decoupled head that is more suitable for object detection tasks. By separately learning the classification and regression tasks, the network’s detection performance can be effectively improved.
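To make this composition concrete, the sketch below shows one plausible way a C2f-style block can be wrapped with selective-kernel attention (Li et al. [36]). It is an illustrative approximation of the C2f_SKA module only: the two-branch kernels, reduction ratio, channel widths, and bottleneck layout are assumptions, not the exact configuration used in DK_YOLOv5.

```python
import torch
import torch.nn as nn

class SKAttention(nn.Module):
    # Sketch of selective-kernel attention: two convolution branches with different
    # receptive fields, fused by channel-wise softmax weights.
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False)  # 5x5-like field
        hidden = max(channels // reduction, 16)
        self.squeeze = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(inplace=True))
        self.fc3 = nn.Linear(hidden, channels)
        self.fc5 = nn.Linear(hidden, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=(2, 3))                                      # global descriptor (B, C)
        z = self.squeeze(s)
        w = torch.stack((self.fc3(z), self.fc5(z)), dim=1).softmax(dim=1)   # branch weights (B, 2, C)
        w3, w5 = w[:, 0].view(b, c, 1, 1), w[:, 1].view(b, c, 1, 1)
        return u3 * w3 + u5 * w5

class Bottleneck(nn.Module):
    # Plain residual bottleneck used inside the C2f-style block.
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.SiLU(),
        )

    def forward(self, x):
        return x + self.block(x)

class C2f_SKA(nn.Module):
    # Sketch of C2f_SKA: a C2f-style block (channel split, stacked bottlenecks,
    # dense concatenation of intermediate outputs) followed by SKAttention.
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        c = c_out // 2
        self.cv1 = nn.Conv2d(c_in, 2 * c, 1)
        self.blocks = nn.ModuleList(Bottleneck(c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * c, c_out, 1)
        self.attn = SKAttention(c_out)

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, dim=1))
        for m in self.blocks:
            y.append(m(y[-1]))
        return self.attn(self.cv2(torch.cat(y, dim=1)))
```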
In regard to the enhancement part in the overall network architecture, it can be any low-light image enhancement algorithm. Based on Section 3 and the data and conclusions presented in Section 5, we have chosen to incorporate the EnlightenGAN enhancement algorithm in this context, as depicted in Figure 9.

Figure 9. Structure of the EnlightenGAN network.

EnlightenGAN is an image enhancement algorithm that utilizes a generative adversarial network framework. As shown in Figure 9, it consists of two main components: a generator network and a discriminator network. The generator network takes low-light images as input and aims to generate their enhanced versions. The discriminator network acts as a binary classifier to distinguish between the generated enhanced images and real high-quality images.
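At inference time, the enhancement stage simply runs in front of the detector. The snippet below sketches this wiring; `enhancer` and `detector` are placeholders standing in for the EnlightenGAN generator and the DK_YOLOv5 detector, and the clamping of the enhanced output is an assumption about the value range rather than part of either method.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def detect_low_light(image: torch.Tensor, enhancer: nn.Module, detector: nn.Module):
    """Enhance a low-light image, then run object detection on the result.

    `image` is a normalized tensor of shape (1, 3, H, W); `enhancer` stands in for
    the EnlightenGAN generator and `detector` for DK_YOLOv5 (both placeholders).
    """
    enhancer.eval()
    detector.eval()
    enhanced = enhancer(image)           # amplify target features in the dark image
    enhanced = enhanced.clamp(0.0, 1.0)  # keep pixel values in a valid range
    return detector(enhanced)            # raw predictions (boxes, scores, classes)
```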
5. Experiment and Results Discussion
5.1. Experimental Environment and Parameter Configuration
The experiments were conducted on the Windows 10 operating system with PyTorch 1.9.0. The hardware consisted of an Intel Core i5-12400F CPU and a GeForce RTX 3060 12 GB GPU. The software environment included CUDA 11.1, cuDNN 8.0, and Python 3.7.2, and all code was run in the PyCharm 2022.2.3 IDE (Professional Edition).
The training parameters have a direct impact on the convergence speed and generalization ability of the model. Therefore, to find a relatively balanced set of training parameters, it is necessary to adjust these parameters continuously during the experimental process, striking a balance between convergence speed and the prevention of underfitting and overfitting. We employed a controlled-variable approach to iteratively adjust these parameters, including selecting the optimization strategy, fine-tuning the initial learning rate, choosing pretraining weights, and adjusting the momentum. Additionally, we took hardware factors into consideration, such as the batch size and the number of workers. We evaluated convergence speed and generalization ability by analyzing the loss curves and the number of training epochs required for convergence. Based on these evaluations, we determined the following parameter settings: stochastic gradient descent (SGD) was used as the optimization strategy with an initial learning rate of 0.01 and a momentum of 0.937; Mosaic data augmentation was applied during training; eight data-loading workers were used; the batch size was set to 16; and the number of training epochs was set to 80.
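For reference, the optimization settings listed above correspond to the following standard PyTorch construction. This is a sketch only; the learning-rate schedule, weight decay, and data pipeline are left to the training script rather than specified here.

```python
import torch
import torch.nn as nn

def build_optimizer(model: nn.Module) -> torch.optim.SGD:
    # SGD with the initial learning rate and momentum reported above.
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

# Bookkeeping values matching the reported setup; the dataloader itself is assumed.
BATCH_SIZE = 16
NUM_WORKERS = 8
EPOCHS = 80
```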

5.2. Dataset Introduction


In order to evaluate the effectiveness of the proposed algorithm in low-light conditions,
the publicly available Exdark dataset is used as the base dataset. This dataset consists of a
total of 7363 images captured in low-light environments. It comprises 12 common object
classes, including bicycles (652 images), boats (679 images), bottles (547 images), buses
(527 images), cars (638 images), cats (735 images), chairs (648 images), cups (519 images),
dogs (801 images), motorbikes (503 images), people (609 images), and tables (505 images).
To create a balanced training set, 400 images were selected from each class, resulting in a
total of 4800 images. The remaining 2563 images were allocated for validation purposes.
To enable the model to perform object detection tasks in low-light environments in
underground mines, we further expanded the Exdark dataset by adding 1716 additional
images collected from Shandong Jiaojia Gold Mine and the internet. These images in-
clude various object classes such as underground track equipment (686 images), non-track
equipment (583 images), and signs (447 images). Out of these images, 1317 were used to
expand the training set of the Exdark dataset. Among them, 516 images were added for
the underground track equipment class, 478 images for the non-track equipment class,
and 323 images for the signs class. The remaining 399 images were used to expand the
validation set. As a result, the expanded dataset consists of 6117 images for training and
2962 images for validation. This expanded dataset is referred to as Mine_Exdark, and
Figure 10 illustrates some examples from the expanded dataset. The division ensures that
the dataset captures a representative distribution of the various object classes, facilitating
comprehensive training and evaluation of the proposed algorithms. The primary experi-
ments in this paper were conducted on the expanded dataset to validate the effectiveness
of the proposed improvements and innovative methods.
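For convenience, the class inventory and split sizes described above can be captured in a short snippet. The class names are taken directly from the text; their ordering in the released annotations is an assumption.

```python
# Twelve ExDark classes plus the three underground-mine classes added in Mine_Exdark.
EXDARK_CLASSES = [
    "bicycle", "boat", "bottle", "bus", "car", "cat",
    "chair", "cup", "dog", "motorbike", "people", "table",
]
MINE_CLASSES = ["underground track equipment", "non-track equipment", "sign"]
MINE_EXDARK_CLASSES = EXDARK_CLASSES + MINE_CLASSES

# Split sizes reported above for the expanded dataset.
NUM_TRAIN = 4800 + 1317   # 6117 training images
NUM_VAL = 2563 + 399      # 2962 validation images
assert len(MINE_EXDARK_CLASSES) == 15
```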

Figure 10. Mine_Exdark dataset examples.

5.3. Training mAP and Loss


During the training process of the model, the mAP and the validation loss for the detected objects are shown in Figure 11.

Figure 11. Training mAP and loss. (a) Training mAP; (b) Training loss.

From Figure 11a, it can be observed that the proposed model consistently achieves higher mAP during the training process compared to YOLOv5s. The improvement in mAP values provides preliminary evidence that our proposed model achieves higher detection accuracy compared to the YOLOv5s model. This confirms the effectiveness of the enhancements we proposed in improving the detection performance. In Figure 11b, depicting the loss, it can be seen that both our model and YOLOv5s exhibit a decreasing trend and have converged after 80 epochs of training. However, our model shows a faster convergence rate and a steeper decrease in loss. This demonstrates that our proposed model possesses better learning and optimization capabilities, enabling it to adapt to the training data in an efficient manner compared to YOLOv5s.

Based on the trends of these two values, it can be initially concluded that our proposed DK_YOLOv5 model converges faster and achieves higher detection accuracy compared to the original YOLOv5s model. Meanwhile, further experiments are needed to validate its performance comprehensively.
5.4. Experimental Results Discussion and Comparison

In the experimental section, precision (P), recall (R), and mean average precision (mAP) were used as evaluation metrics to assess the performance of the detection models. To provide a comprehensive evaluation of the model’s performance, we utilized mAP at an IoU threshold of 0.5 (mAP 0.5) and mAP across a range of IoU thresholds from 0.5 to 0.95 (mAP 0.5:0.95).
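For reference, these metrics follow their standard definitions. With TP, FP, and FN denoting true positives, false positives, and false negatives at a given IoU threshold, and N the number of object classes:

\[
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
\]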
To validate the improved feature extraction capability of the proposed DK_YOLOv5
model in low-light environments, a comparison was conducted among DK_YOLOv5,
multitask AET (MAET) proposed in reference [24], and the current mainstream YOLO
single-stage object detection models. This comparison involved training and result analysis
on the Mine_Exdark dataset. The detection results of various object detection models on
this dataset are presented in Table 3.
From the experimental results in Table 3, it can be observed that without incorporating
low-light image enhancement algorithms, the detection performance of different models on
the low-light environment dataset Mine_Exdark varies. Overall, the proposed improved
model DK_YOLOv5 exhibits the highest detection performance, achieving precision (P) of
0.750, recall (R) of 0.622, mAP 0.5:0.95 of 0.470, and mAP 0.5 of 0.715 across all evaluation
metrics. Meanwhile, it can be observed that the MAET method proposed in reference [24]
has the lowest mAP 0.5 and mAP 0.5:0.95 values in this dataset, which are 59.2% and 35.1%,
respectively. Looking at the results in Table 3, it can be seen that using YOLOv6s and
YOLOv7 models on the Mine_Exdark dataset even results in lower mAP 0.5 and mAP
0.5:0.95 compared to the YOLOv5s model, indicating their inferior performance in the
specific application environments of this paper. The latest YOLOv8s model demonstrates
better detection performance than the previous three models. The proposed improved
model DK_YOLOv5 exhibits better performance improvements across different IoU thresh-
olds compared to other comparative models. It achieves a mAP 0.5 of 71.5%, outperforming
the YOLOv8s model by 3.2% and demonstrating a 4.4% enhancement over the original
YOLOv5s model. Additionally, its mAP 0.5:0.95 of 0.470 surpasses the experimental results
of the previous models. This demonstrates that even without employing image enhance-
ment algorithms to amplify features, the proposed improved algorithm exhibits better
feature extraction capability in low-light environments compared to these algorithms, thus
validating the effectiveness of the proposed improvement algorithm.

Table 3. Performance comparison of different models on Mine_Exdark.

Detection Algorithm P R mAP 0.5:0.95 mAP 0.5


YOLOv5s 0.713 0.592 0.403 0.671
MAET 0.631 0.533 0.351 0.592
YOLOv6s 0.658 0.549 0.359 0.618
YOLOv7 0.724 0.602 0.395 0.669
YOLOv8s 0.706 0.596 0.469 0.683
DK_YOLOv5 (ours) 0.750 0.622 0.470 0.715

Furthermore, based on the conclusions drawn from the exploration of the impact of
low-light image enhancement algorithms on object detection performance, we select two
effective enhancement algorithms, namely EnlightenGAN and Zero_DCE++, to combine
with each object detection model in Table 3. This is done to validate the ability of the
proposed DK_YOLOv5 model to better suppress negative features introduced by image
enhancement algorithms, such as noise, while simultaneously enhancing the feature ex-
traction capability in target regions. The detection results of each combined model are
presented in Table 4.

Table 4. Performance comparison on the augmented Mine_Exdark.

Detection Algorithm P R mAP 0.5:0.95 mAP 0.5


EnlightenGAN + YOLOv5s 0.702 0.601 0.402 0.675
Zero_DCE++ + YOLOv5s 0.692 0.568 0.383 0.653
EnlightenGAN + MAET 0.652 0.369 0.351 0.595
Zero_DCE++ + MAET 0.633 0.375 0.339 0.579
EnlightenGAN + YOLOv6s 0.661 0.537 0.336 0.623
Zero_DCE++ + YOLOv6s 0.645 0.370 0.313 0.598
EnlightenGAN + YOLOv7 0.702 0.622 0.387 0.669
Zero_DCE++ + YOLOv7 0.681 0.584 0.354 0.657
EnlightenGAN + YOLOv8s 0.709 0.596 0.465 0.684
Zero_DCE++ + YOLOv8s 0.689 0.544 0.397 0.654
EnlightenGAN + DK_YOLOv5 (ours) 0.747 0.629 0.467 0.719
Zero_DCE++ + DK_YOLOv5 (ours) 0.723 0.615 0.456 0.701

From Table 4, it can be observed that the detection performance of various YOLO
detection algorithms on the low-light environment dataset is improved when combined
with the EnlightenGAN image enhancement algorithm, compared to the models without
the enhancement algorithm in Table 3. However, after applying the Zero_DCE++ enhance-
ment, the performance of the models slightly decreases. This demonstrates that combining
the detection models with the EnlightenGAN algorithm in the low-light dataset helps
amplify the target features in the images. Among them, the proposed model combined
with EnlightenGAN achieves a mAP 0.5 of 71.9%, which is higher than the previous results.
It exhibits a 4.4% improvement over the combination with YOLOv5s and a 3.5% improve-
ment over the combination with YOLOv8s. The results of mAP 0.5:0.95 are also higher than
the previous combination. This validates that the proposed DK_YOLOv5 model, when
combined with low-light image algorithms, effectively suppresses the noise introduced
by the enhancement algorithm and enhances the feature extraction capability in the target
regions, resulting in improved detection performance.

5.5. Ablation Experiments


To investigate the impact of various improvements and innovations in the proposed
DK_YOLOv5 model on detection performance, ablation experiments were conducted on
the Exdark and Mine_Exdark datasets. These experiments aimed to validate the rationality
and effectiveness of the proposed modules in different low-light datasets. The results of
these experiments are presented in Tables 5 and 6. The symbol √ is used to denote the
utilization of the corresponding module, while its absence indicates non-utilization.

Table 5. Exdark dataset ablation results.

Decoupled Head | C2f_SKA | R-SPPF | P     | R     | mAP 0.5
               |         |        | 0.673 | 0.573 | 0.642
√              |         |        | 0.701 | 0.591 | 0.669
               | √       |        | 0.721 | 0.587 | 0.676
               |         | √      | 0.696 | 0.565 | 0.651
√              | √       |        | 0.701 | 0.621 | 0.688
√              |         | √      | 0.701 | 0.598 | 0.670
               | √       | √      | 0.725 | 0.595 | 0.678
√              | √       | √      | 0.715 | 0.629 | 0.698

Table 6. Mine_Exdark dataset ablation results.

Decoupled Head | C2f_SKA | R-SPPF | P     | R     | mAP 0.5
               |         |        | 0.713 | 0.592 | 0.671
√              |         |        | 0.707 | 0.607 | 0.682
               | √       |        | 0.730 | 0.605 | 0.691
               |         | √      | 0.707 | 0.599 | 0.680
√              | √       |        | 0.737 | 0.615 | 0.706
√              |         | √      | 0.718 | 0.612 | 0.693
               | √       | √      | 0.719 | 0.627 | 0.705
√              | √       | √      | 0.750 | 0.622 | 0.715

Based on the experimental results shown in Tables 5 and 6, it can be observed that
the proposed improvements and innovations to the YOLOv5 model effectively enhance
the object detection capability across different datasets. By replacing the coupled head
of YOLOv5 with a decoupled head, the mAP 0.5 improves from 64.2% to 66.9% on the
Exdark dataset and from 67.1% to 68.2% on the Mine_Exdark dataset, demonstrating that
the decoupled head is a more suitable choice for object detection models.
Furthermore, replacing the C3 module with the C2f module, enhanced to the C2f_SKA
module, results in a performance improvement of 3.4% and 2.0% on the two datasets,
respectively. This improvement strengthens the model’s capability to extract effective features
from low-light images and suppresses the interference of noise features, thereby achieving
better object detection performance. The improvement in the SPPF module further enhances
the accuracy by 0.9%, contributing to the overall detection performance of the network.
By combining these improved modules, the experimental results on different low-light
datasets consistently demonstrate improved accuracy compared to the baseline models.
This validates the rationality and effectiveness of the proposed improvements and in-
novations, as well as their complementary advantages when used in combination. The
final mAP 0.5 reaches 69.8% and 71.5% on the two datasets, respectively, representing an
improvement of 5.6% and 4.4% compared to the original model.

5.6. Comparison of Multi-Object Detection Results


To visually compare the performance difference between the proposed DK_YOLOv5
model and the original YOLOv5 model for object detection in low-light scenarios, we
Electronics 2023, 12, 3089 selected several representative images to compare the effects of the models, as shown
20 in
of 22
Figure 12.

Figure 12. Comparison of detection results. (a) The original image; (b) The enhanced image; (c) YOLOv5 detection results; (d) DK_YOLOv5 detection results.

Figure 12a represents the original image captured in a low-light environment, Figure 12b
represents the image enhanced by EnlightenGAN, Figure 12c represents the image detected
by the original YOLOv5 model, and Figure 12d represents the image detected by the
proposed DK_YOLOv5 model. By analyzing the confidence levels and the number of
detected objects in the images, it can be observed that the DK_YOLOv5 model proposed
in this paper, combined with the image enhancement algorithm, effectively improves the
accuracy of object detection in low-light environments. This verifies the effectiveness of
our approach in practical applications.

6. Conclusions
Aiming at the challenges of existing object detection models in low-light environments
and the impact of noise introduced by low-light image enhancement algorithms, we
propose an improved object detection model, DK_YOLOv5, adapted to low-light conditions
based on the YOLOv5 model. The proposed model takes low-light enhanced images as
input, amplifying the object features while achieving relatively better visual effects. The
SPPF module in the backbone network is replaced with the R-SPPF module, which offers
faster inference speed and stronger feature representation. Additionally, all C3 modules
are substituted with C2f modules, further improved as C2f_SKA modules, to reduce the
noise introduced by the low-light enhancement algorithm and enhance the network’s
learning capabilities by enriching the gradient information flow while reducing network
depth. The detection head of the network is replaced with a more effective decoupled
head to adapt to object detection tasks in low-light scenarios. Furthermore, to adapt
the model for object detection tasks in underground low-light scenarios, we expanded
the Exdark dataset by including underground mine target images. Experimental results
demonstrate that the proposed DK_YOLOv5 model achieves higher detection accuracy
in low-light conditions compared to other models and performs well in object detection
tasks in underground mine scenarios. Although our work has achieved certain results,
there are still some issues that need further research and investigation. Under extremely
poor lighting conditions, excessive image noise leads to poor image quality and the loss of
significant details, imposing higher requirements on the image enhancement algorithms.
Additionally, there is a relative scarcity of object detection datasets for low-light scenarios,
making it necessary to further expand the relevant datasets for research and application.
Therefore, in future work, we will continue to explore the algorithmic aspects by optimizing
the image enhancement algorithms and considering the integration of other methods, such
as image denoising, to enhance image quality. Furthermore, we will persist in improving
the object detection model to enhance its detection performance in low-light environments.
Simultaneously, we will continue to collect low-light environment datasets, particularly
focusing on underground scenes, that are relevant to our research. Our future plan involves
integrating the image enhancement algorithm into the object detection model to create
a unified system, enabling further improvements that would allow its deployment on
mobile devices. These works will contribute to further refining the existing problems and
providing more reliable solutions for practical applications.

Author Contributions: Funding acquisition: P.Y.; investigation: J.W. and D.S.; conceptualization:
J.W.; methodology: J.W. and X.H.; software: J.W. and Y.L.; supervision: P.Y.; data curation: J.S. and
X.C.; formal analysis: J.W., Y.L. and X.H.; writing—original draft: J.W.; writing—review and editing:
J.W., P.Y. and D.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Key R&D Program of China, Grant Number
2021YFC3001300.
Data Availability Statement: Some of the datasets used to support this paper are available at the following link: https://github.com/cs-chan/Exclusively-Dark-Image-Dataset (accessed on 6 September 2022). The remaining datasets involve state-owned enterprise confidentiality and cannot be disclosed publicly.

Acknowledgments: We are greatly thankful for the contributions of Tao Wang in collecting underground mine scenario datasets and of Yijun Xing in providing English language support.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Bao, X.; Wang, S. Survey of Object Detection Algorithms Based on Deep Learning. Trans. Microsyst. 2022, 41, 5–9.
2. Mao, Z.; Zhu, J.; Wu, X. Review of YOLO Based Target Detection for Autonomous Driving. Comput. Eng. Appl. 2022, 58, 68–77.
3. Xia, L.; Shen, J.; Zhang, R. Application of Deep Learning Techniques in Medical Imaging Research. Med. J. Peking Union Med. Coll.
Hosp. 2018, 9, 10–14.
4. Hu, J.; Shi, Y.; Xie, S. Research and application of face recognition algorithm based on lightweight CNN. Trans. Microsyst. 2022, 41,
153–156.
5. Jiang, X.; Chen, T.; Wang, C. Survey of Deep Learning Algorithms for Agricultural Pest Detection. Comput. Eng. Appl. 2023, 59,
30–44.
6. Zhu, A.; Wang, R.; Zhang, Z. Design of aluminum surface defect detection system based on deep learning. Trans. Microsyst. 2022,
41, 96–99+103.
7. Terven, J.; Cordova-Esparza, D. A Comprehensive Review of YOLO: From YOLOv1 to YOLOv8 and Beyond. arXiv 2023,
arXiv:2304.00501.
8. Liu, W.; Anguelov, D.; Erhan, D. SSD: Single Shot Multibox Detector. In Proceedings of the 14th European Conference on Computer
Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg,
Germany, 2016; pp. 21–37.
9. Girshick, R.; Donahue, J.; Darrell, T. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014;
pp. 580–587.
10. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13
December 2015; pp. 1440–1448.
11. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Proceedings
of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
12. Lu, W.; Fu, H.; Zhao, H. Equipment Recognition of Mining Patrol Robot Based on Deep Learning Algorithm. Chin. J. Eng. Des.
2019, 26, 527–533.
13. Du, C. Research on Video-Assisted Driving System of Mine Trackless Rubber tire Vehicle. Coal Eng. 2020, 52, 178–182.
14. Xiao, Y.; Jiang, A.; Ye, J. Making of Night Vision: Object Detection under Low-Illumination. IEEE Access 2020, 8, 123075–123086.
[CrossRef]
15. Liu, S.; Huang, D. Receptive Field Block Net for Accurate and Fast Object Detection. In Proceedings of the European Conference
on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
16. Li, W.; Wei, C.; Wang, L. Improved Faster R-CNN for Pedestrian Detection in Underground Coal Mine. Comput. Eng. Appl. 2019,
55, 200–207.
17. Wei, C.; Wang, W.; Yang, W. Deep Retinex Decomposition for Low-Light Enhancement. arXiv 2018, arXiv:1808.04560.
18. Jiang, Y.; Gong, X.; Liu, D. EnlightenGAN: Deep Light Enhancement without Paired Supervision. IEEE Trans. Image Process. 2021,
30, 2340–2349. [CrossRef]
19. Al Sobbahi, R.; Tekli, J. Comparing deep learning models for low-light natural scene image enhancement and their impact
on object detection and classification: Overview, empirical evaluation, and challenges. Signal Process. Image Commun. 2022,
109, 116848. [CrossRef]
20. Li, L.; Liu, X.; Zhao, Y. Low Light Image Fusion and Detection Method Based on Lego Filter and SSD. Comput. Sci. 2021, 48,
213–218.
21. Zhang, M. Underground Pedestrian Detection Model Based on Dense-YOLO Network. J. Mine Autom. 2022, 48, 86–90.
22. Xu, X.; Wang, S.; Wang, Z. Exploring image enhancement for salient object detection in low light images. ACM Trans. Multimed.
Comput. Commun. Appl. 2021, 17, 1–19. [CrossRef]
23. Qiu, Y.; Lu, Y.; Wang, Y. IDOD-YOLOV7: Image-Dehazing YOLOV7 for Object Detection in Low-Light Foggy Traffic Environments.
Sensors 2023, 23, 1347. [CrossRef]
24. Cui, Z.; Qi, G.J.; Gu, L. Multitask aet with orthogonal tangent regularity for dark object detection. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 2553–2562.
25. Loh, Y.P.; Chan, C.S. Getting to Know Low-Light Images with the Exclusively Dark Dataset. Comput. Vis. Image Underst. 2019,
178, 30–42. [CrossRef]
26. Redmon, J.; Divvala, S.; Girshick, R. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
27. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object
Detectors. arXiv 2022, arXiv:2207.02696.
28. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020,
arXiv:2004.10934.
29. Lv, F.; Lu, F.; Wu, J. MBLLEN: Low-Light Image/Video Enhancement Using CNNs. In Proceedings of the BMVC 2018, Newcastle,
UK, 3–6 September 2018; Volume 220, p. 4.
30. Zhang, Y.; Zhang, J.; Guo, X. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the 27th ACM
International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 1632–1640.
31. Guo, C.; Li, C.; Guo, J. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 1780–1789.
32. Li, C.; Guo, C.; Loy, C.C. Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation. IEEE Trans. Pattern
Anal. Mach. Intell. 2021, 44, 4225–4238. [CrossRef]
33. Wu, W.; Weng, J.; Zhang, P. Uretinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022;
pp. 5901–5910.
34. Ma, L.; Ma, T.; Liu, R. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 5637–5646.
35. Zheng, S.; Gupta, G. Semantic-Guided Zero-Shot Learning for Low-Light Image/Video Enhancement. In Proceedings of the
IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 581–590.
36. Li, X.; Wang, W.; Hu, X. Selective Kernel Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519.
37. He, K.; Zhang, X.; Ren, S. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern
Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef] [PubMed]
38. Ge, Z.; Liu, S.; Wang, F. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
