Review
A Survey on Theories and Applications for
Self-Driving Cars Based on Deep Learning Methods
Jianjun Ni 1,2, * , Yinan Chen 1 , Yan Chen 1 , Jinxiu Zhu 1,2 , Deena Ali 1 and Weidong Cao 1,2
1 College of IOT Engineering, Hohai University, Changzhou 213022, China; chenyinan96@163.com (Y.C.);
stcy401@126.com (Y.C.); zhujx@hhuc.edu.cn (J.Z.); deena.ahd@gmail.com (D.A.);
caoweidong19890710@126.com (W.C.)
2 Jiangsu Universities and Colleges Key Laboratory of Special Robot Technology, Hohai University,
Changzhou 213022, China
* Correspondence: njjhhuc@gmail.com; Tel.: +86-519-8519-1711
Received: 23 March 2020; Accepted: 11 April 2020; Published: 16 April 2020
Abstract: Self-driving cars are a hot research topic in science and technology and have a great
influence on social and economic development. Deep learning is one of the current key areas in
the field of artificial intelligence research. It has been widely applied in image processing, natural
language understanding, and so on. In recent years, more and more deep learning-based solutions
have been presented in the field of self-driving cars and have achieved outstanding results. This paper
presents a review of recent research on theories and applications of deep learning for self-driving cars.
This survey provides a detailed explanation of the developments of self-driving cars and summarizes
the applications of deep learning methods in the field of self-driving cars. Then the main problems in
self-driving cars and their solutions based on deep learning methods are analyzed, such as obstacle
detection, scene recognition, lane detection, navigation and path planning. In addition, the details of
some representative approaches for self-driving cars using deep learning methods are summarized.
Finally, the future challenges in the applications of deep learning for self-driving cars are discussed.
Keywords: self-driving cars; deep learning method; obstacle detection; scene recognition; lane detection
1. Introduction
Recently, the rapid development of artificial intelligence has greatly promoted the progress of
unmanned driving, such as self-driving cars, unmanned aerial vehicles, and so on [1,2]. Among these
unmanned driving technologies, self-driving cars have attracted more and more attention for
their important economic effect [3]. However, there are lots of challenges in self-driving cars [4,5].
For example, safety is a key problem that must be solved effectively in self-driving cars; otherwise, it is impossible to allow self-driving cars on the road. Deep learning is an important part of machine learning and has been a hot topic recently [6,7]. Due to its excellent performance,
it has been applied by scientists in the research and development of self-driving cars. More and
more solutions based on deep learning for self-driving cars have been presented, including obstacle
detection, scene recognition, lane detection, and so on.
This paper provides a survey on theories and applications of deep learning for self-driving cars.
We introduce the theoretical foundation of the main deep learning methods used for self-driving
cars. On this basis, we focus exclusively on the applications of deep learning for self-driving cars.
Other relevant surveys in the field of deep learning and self-driving cars can be used as a supplement
to this paper (see e.g., [1,6,8,9]).
The main contributions of this paper are summarized as follows. (1) A comprehensive analysis and
review of the development of self-driving cars are presented. In addition, the challenges and limitations
of current methods are enumerated. (2) A survey of the theoretical foundation of deep learning methods is provided, focusing on their network structures. (3) An overview of the applications of deep learning in self-driving cars is provided, and the details of some representative approaches are summarized. Finally, some prospects for future studies in this field are discussed.
This paper is organized as follows: In Section 2, a general introduction to the development
of self-driving cars is provided, along with a comprehensive analysis of the development of the
hardware and software technologies in this field. Section 3 introduces the theoretical background
of some common deep learning methods used for self-driving cars. The main applications and
some representative approaches based on deep learning methods in the field of self-driving cars
are summarized in Section 4. Section 5 discusses the future research directions for the deep learning
methods in self-driving cars. Finally, conclusions are drawn in Section 6.
Figure 1. Self-driving cars of Google and Tesla: (a) Google’s self-driving car; (b) Tesla’s self-driving car.
In addition to the companies mentioned above, lots of Internet companies and car companies
worldwide have also been focusing on the self-driving car field recently. For example, in Sweden, Volvo and Autoliv established a joint company, Zenuity, which is committed to the safety of self-driving
cars [12]. In South Korea, Samsung received approval from the South Korean government to test
its driverless cars on public roads in 2017. It should be noted that Samsung applied for the highest
number of patents in the world in the field of self-driving cars from 2011 to 2017 [13]. In China,
the Baidu deep learning institute led the research project on self-driving cars in 2013. In 2014, Baidu
established the automotive networking business division and successively launched CarLife, My-car,
CoDriver and other products [14]. In 2016, Baidu held a strategic signing ceremony with Wuzhen
Tourism, announcing that Level 4 unmanned driving would be implemented in this scenic area
(see Figure 2a). Other information technology (IT) companies in China also have intensively studied
and made great progress in this field based on their technology in the field of artificial intelligence,
such as Tencent, Alibaba, Huawei, and so on [15]. For example, Tencent has displayed the Hongqi (Red Flag) Level 3 self-driving car in cooperation with FAW (First Automobile Works) (see Figure 2b).
Figure 2. Self-driving cars of the companies in China: (a) Baidu’s self-driving car; (b) Tencent’s
self-driving car.
According to the autonomous driving levels, a car at Level 0 to Level 2 requires the driver to monitor the environment. Advanced Driver Assistance Systems (ADAS) are intelligent systems that reside inside vehicles categorized from Level 0 to Level 2 and help the driver in the process of driving [16,17]. Risks can be minimized by ADAS, such as the electronic stability control system and the forward emergency braking system, which reduce driver errors, continuously alert drivers, and control the vehicle if the driver fails to respond [18].
In this paper, we focus on self-driving cars that are categorized as Level 3 or above. The overall
technical framework of self-driving cars that are equipped with a Level3 or higher autonomy system
can be divided into four parts, namely the driving environment perception system, the autonomous
decision system, the control execution system and the monitor system [19]. This architecture is
shown in Figure 3.
Figure 3. The overall technical framework of self-driving cars with a Level3 or higher autonomy system.
The environment perception system utilizes the prior knowledge of the environment to establish
an environmental model including obstacles, road structures, and traffic signs through obtaining
surrounding environmental information. The main function of the environment perception system is
to realize functions like lane detection, traffic signal detection, and obstacle detection, by using some
hardware devices such as cameras and laser radars.
The main function of the autonomous decision system is to make some decisions for the
self-driving car, including obstacle avoidance, path planning, navigation, and so on. For example,
in path planning, the autonomous decision system first plans a global path according to the current location and the target location, and then plans a reasonable local path for the self-driving car
by combining the global path and the local environment information provided by the environment
perception system.
The control execution system’s function is to execute the commands received from the autonomous
decision system, such as braking, steering, and accelerating to complete the speed control and
path-following control. Sometimes the control execution system performs actions directly according to the state of the environment, without any commands from the autonomous decision system, in order to deal with emergencies such as pedestrian avoidance.
The monitor system is responsible for checking whether the car is making actual progress towards its goal and for reacting with recovery actions when problems such as unexpected obstacles or faults occur.
The self-driving car is a complex autonomous system, which requires the support of both theory and technology. At the technical level, it would be impossible to achieve such rapid development of self-driving cars without the rapid development of hardware and software. There are various capable hardware and software platforms for rapid data analysis, as well as for the management and
understanding of self-driving cars [20]. For example, the NVIDIA DRIVE PX2 driverless car platform
can perform 30 trillion deep learning operations per second and can achieve Level4 autopilot [21].
It supports 12-channel camera inputs, laser positioning, radar, and ultrasonic sensors, and includes
two new-generation NVIDIA Tegra processors (see Figure 4). When it comes to software, TensorFlow is one of the main deep learning libraries used in the field of self-driving cars [22].
At the theoretical level, some methods used in the robotics field can be applied to self-driving cars because of the similarity between the two fields, including path planning, environmental sensing, autonomous
navigation and control, etc. [23,24]. Various artificial intelligence algorithms have been used in
self-driving cars, such as fuzzy logic, neural network, and so on. Among these methods, deep learning
methods have achieved great success in the self-driving field for their distinct advantages, such as
high accuracy, strong robustness, and low cost. This paper will focus on the deep learning methods
used in the field of self-driving cars.
Remark 1 (About the autonomy levels of self-driving cars). The National Highway Traffic Safety Administration (NHTSA) has adopted the level
classification provided by the Society of Automotive Engineers, which ranges from Level 0 to Level 5 [25].
In this standard, Level 0 represents a vehicle without any autonomy. Level 1 has basic driving assistance such as
adaptive cruise and emergency braking. Level 2 has partial autonomy while the driver needs to supervise the
system and perform some tasks. At Level 3, the system has full autonomy under certain conditions, but the
human operator is still required to take control if necessary. The vehicle at Level 4 is still a semi-autonomous
system, which has higher automation than Level 3. Vehicles in Level 5 are fully autonomous in all conditions.
Remark 2 (About TensorFlow). TensorFlow is a common deep learning platform that was created by Google. The design of TensorFlow is intended to simplify the construction of deep neural networks and speed up the learning process in a heterogeneous distributed computational environment [26]. TensorFlow provides lots
of Application Programming Interface (API) libraries to build and train deep learning models, which can
support Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and other deep neural
network models.
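For illustration only, the following is a minimal sketch of how a small CNN classifier might be assembled and compiled with the Keras API bundled in TensorFlow; the input size, layer widths, and class count are arbitrary placeholders rather than values taken from any of the surveyed systems.

```python
import tensorflow as tf

# A toy CNN: two convolution/pooling stages followed by a fully connected head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),        # placeholder image size
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),    # placeholder class count
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```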
Figure 5. The typical structure of a CNN: the input passes through alternating convolution and pooling layers and a fully connected layer to produce the output.
The convolution layer is the core component of a CNN, although the specific structures of CNNs may
be different. The convolution kernel (i.e., filter matrix) is convolved with a local region of the input
image, namely
$y_j = \sum_i w_{ij} * x_i + b_j$, (1)
where the operator ∗ represents two-dimensional discrete convolution operation; w represents the
filter matrix and b is the bias parameter; x is the input feature map and y represents the output feature map. The convolution kernel is generally initialized as a small matrix of 3 × 3 or 5 × 5. In the
training process of the network, the convolution kernel will be constantly updated through learning
and finally get a reasonable weight.
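To make Equation (1) concrete, the following NumPy sketch convolves one input feature map with a single 3 × 3 kernel using "valid" borders; the kernel values are illustrative only.

```python
import numpy as np

def conv2d_single(x, w, b=0.0):
    """Two-dimensional discrete convolution of one feature map x with kernel w (cf. Eq. (1))."""
    kh, kw = w.shape
    w_flipped = w[::-1, ::-1]                  # convolution flips the kernel (vs. correlation)
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * w_flipped) + b
    return y

x = np.random.rand(7, 7)                                      # toy input feature map
w = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])   # illustrative 3x3 kernel
print(conv2d_single(x, w).shape)                              # (5, 5)
```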
The output of the convolution operation is usually passed through a nonlinear activation function. The activation function helps to solve linearly inseparable problems. Functions such as
Sigmoid, Tanh, and ReLU are usually used as activation functions, which are listed as follows:
Sigmoid: $R = \frac{1}{1 + e^{-y}}$
Tanh: $R = \frac{e^{y} - e^{-y}}{e^{y} + e^{-y}}$   (2)
ReLU: $R = \max(0, y)$
When training with gradient descent, ReLU is preferred because it has a faster convergence rate than the traditional saturating nonlinear functions. In addition, the output of the convolution operation is usually modified by a pooling function. Mean-pooling and max-pooling are commonly used, which preserve more information about the background and the texture of the image, respectively.
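A small NumPy sketch of the activation functions in Equation (2) and of non-overlapping 2 × 2 mean/max pooling is given below; the feature map is a random placeholder.

```python
import numpy as np

sigmoid = lambda y: 1.0 / (1.0 + np.exp(-y))
tanh    = lambda y: np.tanh(y)
relu    = lambda y: np.maximum(0.0, y)

def pool2x2(feature_map, mode="max"):
    """Non-overlapping 2x2 pooling of an (H, W) feature map with even H and W."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fm = np.random.randn(4, 4)                     # toy feature map
print(relu(fm))
print(pool2x2(fm, "max"), pool2x2(fm, "mean"))
```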
The success of the Convolutional Neural Network is due to its three important characteristics:
local receptive fields, shared weights, and spatial sampling. Weight sharing reduces the number of connections between the layers of the network and reduces the possibility of over-fitting. The pooling layer (i.e., subsampling layer) reduces the dimension of the intermediate hidden layers and the calculation amount of the next layer, and provides a degree of invariance to small translations.
CNNs have yielded outstanding results in computer vision and general image classification tasks recently. Because many self-driving car technologies rely on image feature representation, they can be readily realized based on CNNs, such as obstacle detection, scene classification, and lane recognition.
Figure 6. The basic structure of an RNN: an input layer X, a hidden layer H with input weights U and recurrent weights W, and an output layer O with weights V.
In an RNN, the current state of each recurrent unit is determined by the input at the current moment and the previous state. Given the input sequence $X = \{x_1, x_2, \cdots, x_t\}$, the hidden state at time $t$ is updated by:
$H_t = \varphi \left( U X_t + W H_{t-1} + b \right)$, (3)
where the weight matrices $U$ and $W$ determine the importance given to the current input $X_t$ and to the previous state $H_{t-1}$ respectively; $\varphi(\cdot)$ is an activation function and $b$ is the bias term. Then the output $O_t$ is calculated by:
$O_t = V H_t + c$, (4)
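Equations (3) and (4) describe one step of a vanilla recurrent cell; a minimal NumPy sketch with arbitrary toy dimensions is:

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c, phi=np.tanh):
    """One recurrent step: H_t = phi(U X_t + W H_{t-1} + b), O_t = V H_t + c."""
    h_t = phi(U @ x_t + W @ h_prev + b)
    o_t = V @ h_t + c
    return h_t, o_t

d_in, d_hid, d_out = 8, 16, 4                  # toy dimensions
U = np.random.randn(d_hid, d_in)
W = np.random.randn(d_hid, d_hid)
V = np.random.randn(d_out, d_hid)
b, c = np.zeros(d_hid), np.zeros(d_out)

h = np.zeros(d_hid)
for x_t in np.random.randn(5, d_in):           # unroll over a length-5 sequence
    h, o = rnn_step(x_t, h, U, W, V, b, c)
```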
Figure 7. The basic structure of an auto-encoder: an encoding stage (weights W1) and a decoding stage (weights W2).
As shown in Figure 7, the encoder is used to convert the high-dimensional original data into a
low-dimensional vector representation, so that the compressed low-dimensional vector can retain the
typical characteristics of the input data. The decoder is used to restore the original data. The training
objective of AE is to minimize the error between the input and the output [35].
On the basis of the traditional AE, many improved methods have been proposed that add constraints on the hidden layer, forcing it to learn a representation that differs from the input. For example,
the Convolutional Auto-Encoder (CAE) uses the convolution layer and pooling layer to replace the full
connection layer of traditional AE, which can retain the spatial information of the two-dimensional
signal. In the decoding process of CAE, deconvolution is used, which can be viewed as the inverse
operation of convolution. By building a deconvolution network, low-resolution feature representations can be mapped back to the input resolution, and the network generates accurate boundary localization with pixel-wise supervision [36].
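As an illustration of the CAE idea only (not the exact architecture of [36]), a minimal Keras sketch with convolution and pooling in the encoder and transposed convolution ("deconvolution") in the decoder might look like the following; the input size and filter counts are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(64, 64, 1))                     # placeholder input size
# Encoder: convolution + pooling compress the input to a low-dimensional code.
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
code = layers.MaxPooling2D()(x)                              # 16 x 16 x 8 code
# Decoder: transposed convolution restores the input resolution.
x = layers.Conv2DTranspose(8, 3, strides=2, activation="relu", padding="same")(code)
x = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(x)
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

cae = tf.keras.Model(inputs, outputs)
cae.compile(optimizer="adam", loss="mse")   # training objective: reconstruction error
```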
where γ is the discount factor that controls the importance of immediate and future rewards.
Figure 8. The agent-environment interaction loop in reinforcement learning: the agent receives a state and a reward from the environment and outputs an action.
where $Q^*$ is the optimal value of $Q$. The optimal policy $\pi^*$ is the one that maximizes the value function, namely $\pi^*(s) = \arg\max_{a} Q^*(s, a)$.
Deep reinforcement learning (DRL) combines deep learning and reinforcement learning to solve
the problem of capacity limitation and sample correlation. DRL has both the perceptive ability of
deep learning and the decision-making ability of RL in a general form. DRL can learn a mapping
from the original input to the action output [41]. One of the DRL methods is Deep Q-Network (DQN),
which can utilize a deep neural network to map the relationships of actions and states, which is similar
to the Q-learning method [42]. The DQN uses a CNN as the function approximator with weights θ as
a Q-network, which is shown as follows:
$y_i = Q_i^{*}(s_t, a_t) = \mathbb{E}\left[ r(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta_{i-1}) \mid s_t, a_t \right]$, (8)
then the Q-network can be trained by updating the parameters θi in each iteration i by minimizing the
mean-squared error as follows:
$L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s_t, a_t; \theta_i) \right)^{2} \right]$. (9)
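A schematic sketch of the DQN target and loss of Equations (8) and (9) is given below, assuming a generic Keras Q-network, a target network, and an already-sampled mini-batch of transitions; it is not the full training loop of [42].

```python
import tensorflow as tf

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """Mean-squared error between the target y_i (Eq. (8)) and Q(s_t, a_t; theta_i) (Eq. (9))."""
    s, a, r, s_next, done = batch                        # tensors of a sampled mini-batch
    q_next = tf.reduce_max(target_net(s_next), axis=1)   # max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; theta_{i-1})
    y = r + gamma * (1.0 - done) * q_next                # bootstrap target; done is 0/1
    q_sa = tf.gather(q_net(s), a, batch_dims=1)          # Q(s_t, a_t; theta_i) for the taken actions
    return tf.reduce_mean(tf.square(tf.stop_gradient(y) - q_sa))
```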
As introduced above, deep learning methods have many excellent characteristics that can meet the
needs of self-driving cars. In this paper, the deep learning methods used in the field of self-driving cars
can be classified based on their network structures, as shown in Figure 9. The details of how these deep learning methods are applied to self-driving cars will be introduced in the next section.
Figure 9. The classification of the deep learning methods used in the field of self-driving cars.
In the binocular vision-based approach, images are acquired by a binocular camera, and a disparity map is obtained by the stereo matching algorithm. Then the disparity map is analyzed to determine whether each point belongs to an obstacle. Finally, the obstacle area is extracted.
Figure 10. The flow of binocular vision-based obstacle detection: image acquisition, stereo matching (matching cost computation, cost aggregation, and disparity computation), disparity image generation, and obstacle determination.
Typical stereo matching algorithms include the following four steps: matching cost computation,
cost aggregation, disparity computation, and disparity refinement [49]. There are many research results
on the binocular vision obstacle detection based on deep learning. For example, Žbontar, et al. [50]
designed a twin convolution structure MC-CNN (Matching Cost-Convolutional Neural network),
where the CNN was applied to the image similarity measure and matching cost calculation.
The structure of this MC-CNN is shown in Figure 11.
Figure 11. The twin convolution structure of MC-CNN: two convolutional branches process the left and right patches, their outputs are concatenated, and fully-connected layers produce a similarity score.
In the twin convolution structure of MC-CNN, there are many convolutional layers in the networks. Each layer is followed by a rectified linear unit. The two obtained vectors are then concatenated and propagated through the fully-connected layers. Finally, the last fully-connected layer produces a single number, which denotes the similarity between the input patches.
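A simplified Keras sketch of such a twin (Siamese) structure is shown below; the patch size, filter counts, and layer widths are made-up placeholders rather than the exact MC-CNN configuration of [50].

```python
import tensorflow as tf
from tensorflow.keras import layers

def branch():
    """Convolutional branch shared (identical weights) by the left and right patches."""
    return tf.keras.Sequential([
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.Flatten(),
    ])

left = layers.Input(shape=(9, 9, 1))               # placeholder patch size
right = layers.Input(shape=(9, 9, 1))
shared = branch()
features = layers.Concatenate()([shared(left), shared(right)])
x = layers.Dense(128, activation="relu")(features)
score = layers.Dense(1, activation="sigmoid")(x)    # similarity score of the two patches

siamese = tf.keras.Model([left, right], score)
```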
In the method of [50], the matching cost is directly initialized from the output of the network:
$C_{CNN}(\mathbf{p}, d) = -s\left( < \mathcal{P}^{L}(\mathbf{p}), \mathcal{P}^{R}(\mathbf{p} - d) > \right)$, (10)
where $s(\cdot)$ is the output function; $\mathcal{P}^{L}(\mathbf{p})$ and $\mathcal{P}^{R}(\mathbf{p} - d)$ are the input patches from the left image and the right image respectively; $\mathbf{p}$ denotes the position $(x, y)$, and $d$ is the correct disparity at this position.
The cost aggregation process of stereo matching in MC-CNN is as follows:
$C^{0}_{CBCA}(\mathbf{p}, d) = C_{CNN}(\mathbf{p}, d)$,
$C^{i}_{CBCA}(\mathbf{p}, d) = \frac{1}{|U_d(\mathbf{p})|} \sum_{\mathbf{q} \in U_d(\mathbf{p})} C^{i-1}_{CBCA}(\mathbf{q}, d)$, (11)
where $U_d(\mathbf{p})$ denotes the combined support region for $\mathbf{p}$, and $i$ is the number of iterations.
The final matching cost $C_{SGM}$ is defined as the average across the four directions $\mathbf{r}$, namely:
$C_{SGM}(\mathbf{p}, d) = \frac{1}{4} \sum_{\mathbf{r}} C_{\mathbf{r}}(\mathbf{p}, d)$, (12)
$C_{\mathbf{r}}(\mathbf{p}, d) = C^{4}_{CBCA}(\mathbf{p}, d) - \min_{k} C_{\mathbf{r}}(\mathbf{p} - \mathbf{r}, k) + \min \left\{ C_{\mathbf{r}}(\mathbf{p} - \mathbf{r}, d),\; C_{\mathbf{r}}(\mathbf{p} - \mathbf{r}, d - 1) + P_1,\; C_{\mathbf{r}}(\mathbf{p} - \mathbf{r}, d + 1) + P_1,\; \min_{k} C_{\mathbf{r}}(\mathbf{p} - \mathbf{r}, k) + P_2 \right\}$, (13)
where $P_1$ and $P_2$ are the penalty parameters. Then the disparity image $D$ is calculated by finding the disparity $d$ that minimizes $C(\mathbf{p}, d)$, namely
$D(\mathbf{p}) = \arg\min_{d} C_{SGM}(\mathbf{p}, d)$. (14)
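A minimal NumPy sketch of this winner-takes-all step, assuming the aggregated costs are stored in a cost volume of shape (height, width, max disparity):

```python
import numpy as np

def winner_takes_all(cost_volume):
    """Pick, for every pixel p, the disparity d that minimizes C(p, d) (cf. Eq. (14))."""
    return np.argmin(cost_volume, axis=2).astype(np.float32)

h, w, d_max = 120, 160, 64                    # toy dimensions
cost_volume = np.random.rand(h, w, d_max)     # stands in for the aggregated matching costs
disparity = winner_takes_all(cost_volume)     # disparity image D
```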
After the disparity is calculated, the obstacle area can be detected, which makes it possible for the self-driving car to detect obstacles. Example disparity results obtained by MC-CNN are shown in Figure 12.
Figure 12. Examples of the disparity results by MC-CNN in [50]: (a) error of 3.73%; (b) error of 2.36%.
The solution of the stereo matching algorithm lays a solid foundation for obstacle determination.
There are other methods based on deep learning used in the obstacle detection for self-driving cars.
For example, Nguyen, et al. [51] presented a network structure, which is constructed by a wide
context learning network and stacked encoder-decoder 2D CNNs. Zhang, et al. [52] proposed an
end-to-end multidimensional residual dense attention network, which focuses on more comprehensive
pixel-level feature extraction. The network includes a two-dimensional residual dense attention
network for feature extraction and a three-dimensional convolutional attention network for matching.
Kendall, et al. [53] presented a deep learning network for regressing the disparity of a pair of stereo
images, where the context is incorporated directly from the data employing 3D convolutional network.
Dairi et al. [54] proposed a method where a deep stacked auto-encoder (DSA) model is used. In this DSA model, the greedy learning of features is combined with the capacity for dimensionality reduction.
In addition, an unsupervised k-nearest neighbor algorithm is employed to detect the obstacles.
A summary of the deep-learning-based obstacle detection algorithms is illustrated in Table 1,
where the main methods used in the obstacle detection are given out and the corresponding references
are also listed.
Figure 13. The impacts of weather: (a) bad weather; (b) good weather.
Deep learning shows obvious advantages in scene recognition. In recent years, theoretical research on and applications of deep networks in scene recognition have also been abundant. For example, Wang et al. [60] proposed a multi-resolution CNN model, which won first prize in
the LSUN competition that year. This model includes coarse resolution CNNs and fine resolution
CNNs, which are used to capture the visual structures at a large scale and a relatively smaller scale
respectively. The architecture used in [60] takes two different resolution images as input (see Figure 15),
so that it can be used for the scene understanding with different scales.
Figure 14. The examples of various scenes: (a) urban main road; (b) suburban road; (c) mountain road;
(d) city street.
Figure 15. The multi-resolution CNN architecture in [60]: the coarse resolution CNN takes a 224 × 224 image as input and passes it through two convolution layers with max-pooling and ten inception layers (Incep3a to Incep5b), while the fine resolution CNN takes a 336 × 336 image region as input and processes it with a similar inception structure.
In the multi-resolution CNN in Figure 15, the coarse resolution architecture starts from two
convolution layers and max-pooling layers, which convert a 224 × 224 input image into 28 × 28 feature maps. The subsequent ten inception layers rapidly process these small feature maps.
Because the coarse resolution CNNs focus on the global arrangements or objects at a larger scale,
they capture visual appearance and structure at a relatively coarse resolution. The fine resolution
CNNs use the 336 × 336 image region as input. Three extra convolutional layers are added on top of
the inception layer. The purpose of this structure is to capture the visual content in a finer resolution
and enhance local detailed information.
In addition, the authors in [60] proposed a knowledge-based disambiguation method to deal
with the problem of label ambiguity. Firstly, the knowledge of extra networks is exploited to provide
supervised information for each image. Then the knowledge is used to guide the training process of the
CNN networks. In the training process, the CNN networks can predict the hard labels (the ground-truth
scene labels) and the soft labels (the predicted scene labels) simultaneously. The objective function
used in [60] is as follows:
$\ell(D) = -\left( \sum_{I_i \in D} \sum_{k=1}^{K_1} \mathbb{1}(y_i = k) \log p_{i,k} + \lambda \sum_{I_i \in D} \sum_{k=1}^{K_2} q_{i,k} \log f_{i,k} \right)$, (15)
where $I_i$ denotes the $i$-th image in the training dataset $D$; $y_i$ and $p_i$ are the hard label (with $K_1$ dimensions) and the soft label (with $K_2$ dimensions) respectively; $f_i$ is the soft code produced by an extra knowledge network; $q_i$ is the predicted soft code of $I_i$; and $\lambda$ is a balancing parameter.
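Under the notation of Equation (15), the objective can be sketched as a hard-label cross-entropy term plus a λ-weighted soft-label term. The following schematic TensorFlow function uses placeholder tensor names and treats the soft codes produced by the extra knowledge network as the targets of the soft branch; it is an interpretation of the equation, not the training code of [60].

```python
import tensorflow as tf

def disambiguation_loss(hard_logits, soft_logits, y_hard, f_soft, lam=0.5):
    """Hard-label cross-entropy plus lambda-weighted soft-label term (cf. Eq. (15))."""
    # Ground-truth scene labels y_hard supervise the K1-way hard branch.
    hard_term = tf.keras.losses.sparse_categorical_crossentropy(
        y_hard, hard_logits, from_logits=True)
    # Soft codes f_soft from the extra knowledge network supervise the K2-way soft branch.
    soft_term = tf.keras.losses.categorical_crossentropy(
        f_soft, soft_logits, from_logits=True)
    return tf.reduce_mean(hard_term + lam * soft_term)
```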
The results of experiments on different datasets based on the proposed method in [60] are listed
in Table 2, which shows that the multi-resolution learning framework can improve the generalization
ability by the knowledge obtained from extra networks. Furthermore, this framework can reduce the
over-fitting during the training process.
Table 2. The classification error on different datasets based on multi-resolution CNN in [60].
In addition to the above multi-resolution CNN method, there are lots of research results in
scene recognition based on deep learning methods. In terms of scene classification, Chen et al. [59]
proposed a road scene recognition method based on a multi-label neural network. This network
architecture integrates different classification modes into a cost function for training. Tang et al. [61]
employed the GoogLeNet model for scene recognition, which is divided into three parts of layers
from bottom to top and the output features from each of the three parts are fused to generate the
final decision for scene recognition. When it comes to scene understanding, Fu et al. [36] presented
a contextual deconvolution network embedding a channel contextual module and a spatial contextual
module. The decoder network uses hierarchical supervision for multi-level feature maps to improve
the representation of the scene semantic information. Byeon et al. [62] proposed a 2D-LSTM network
to learn surrounding context information and model spatial dependencies of scene labels. The final
layer outputs the class probabilities of each image patch.
A summary of the deep learning method-based scene recognition algorithms is illustrated in
Table 3, where the network structures of the corresponding methods are given out.
To deal with the disadvantages of the traditional image processing methods in lane detection,
many scholars have applied deep learning to lane recognition. For example, John et al. [67] proposed
a lane detection algorithm, where the semantic road lanes are estimated using a regression and classification framework based on extra trees. The input of this framework is extracted from a deconvolutional network. The training and testing process of the method proposed in [67] is shown
in Figure 16.
Figure 16. The training and testing pipeline of the method in [67]. In Stage A, a deconvolution network is fine-tuned on hue-saturation-depth (HSD) input (224 × 224) with ground-truth labels to extract road scene features. In Stage B, an extra trees classifier is trained on these features to predict scene labels. In Stage C, scene-based extra trees regressors are trained to predict lane marker locations.
As shown in Figure 16, in the training phase of the method in [67], Stage A is used to fine-tune
the deconvolution network. The input of the deconvolution network is the combination of the hue,
saturation, and depth (HSD). This deconvolution network is trained to extract the features of the road
scene. Stage B is used to train the extra trees-based classifier for modeling the relationship between the
road scene and the labels. Then this classifier is used to predict the scene labels based on the color and
depth features of the road scenes. In Stage C, for each road scene label, a separate extra tree regressor
is trained using the image-based deep features and the lane marker locations annotated manually.
This extra tree regressor can be used to predict the lane marker position.
In the testing phase of the method in [67] (see Figure 16), the trained deconvolution network is
used to predict the road surface and output the features of the road surface. Then, these features are
input to the classifier model to get the scene label. At last, the estimated scene labels and the road
features are input to the corresponding regression model, to predict the lane locations and the semantic
information. Results of the algorithm proposed in [67] are shown in Figure 17.
Figure 17. The results of lane prediction based on the method presented in [67].
In order to deal with complex noises and scenes in the lane detection for self-driving cars, lots
of methods based on deep learning have been proposed. For example, Xiao, et al. [68] proposed an
accurate and fast deep CNN, which combined self-attention and channel attention in lane marking
detection. Kim, et al. [69] proposed a stacked ELM (extreme learning machine) architecture for CNNs,
which was applied to lane detection. This method can reduce learning time and produce accurate
results. Liu [70] designed a gradient-guided deep convolution network to detect the presence of lanes,
where the gradient cues and geometric attributes are used. In this method, the spatial distribution of
detected lanes is represented by a recurrent neural layer.
Lane detection based on deep learning can be divided into two categories: one-stage methods and two-stage methods. A one-stage method directly outputs the lane parameters through the deep network. A two-stage method proceeds in two steps: first, semantic segmentation is carried out through the deep network to output the pixel collection of the lanes; second, a curve is fitted through these pixels to obtain the lane parametrization.
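For the two-stage category, the second step often amounts to fitting a parametric curve through the pixels returned by the segmentation network. A minimal NumPy sketch of such a fit (a second-order polynomial in image coordinates, a common but not universal choice) is shown below; the pixel values are illustrative.

```python
import numpy as np

def fit_lane(lane_pixels, order=2):
    """Fit x = f(y) through the (x, y) pixels of one segmented lane marking (Stage 2)."""
    xs, ys = lane_pixels[:, 0].astype(float), lane_pixels[:, 1].astype(float)
    coeffs = np.polyfit(ys, xs, order)         # lane parametrization
    return np.poly1d(coeffs)

# lane_pixels would come from the semantic segmentation output (Stage 1).
lane_pixels = np.array([[100, 400], [104, 380], [109, 360], [115, 340], [122, 320]])
lane = fit_lane(lane_pixels)
print(lane(300))                               # predicted x position of the lane at row y = 300
```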
A summary of the lane detection methods based on deep learning is illustrated in Table 4.
Figure 18. The actor-critic architecture of the method in [79]: an online policy network and a target policy network (the actor), an online Q network and a target Q network (the critic), a replay memory from which mini-batches are sampled, gradient-based updates of the online networks, and soft updates of the target networks.
In the method of [79], the actor includes an online policy network and a target policy network. In the actor, an action is obtained from the current state based on the deterministic strategy. The online policy network of the actor is updated with the sampled policy gradient:
$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a | \theta^{Q}) \big|_{s = s_i, a = \mu(s_i)} \nabla_{\theta^{\mu}} \mu(s | \theta^{\mu}) \big|_{s_i}$, (17)
where $\theta^{Q}$ and $\theta^{\mu}$ are the parameters of the critic's online Q network and the actor's online policy network respectively; $N$ is the number of samples in a mini-batch; and $\mu(s_i)$ denotes the current strategy at the state $s_i$.
The critic network includes an online Q network and a target Q network. In this critic network,
the Bellman equation is used to evaluate the quality of action. The online Q network in the critic
network is updated by:
$L = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i | \theta^{Q}) \right)^{2}$. (18)
In the method proposed in [79], the reinforcement learning model is trained based on a deep
deterministic policy gradient and a vehicle dynamic model. So it has the advantages of deep Q network
and can ensure the convergence of the network. The performance of this method is better than the
traditional trajectory planning method.
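A schematic sketch of the critic update of Equation (18) and of the soft target update used in such actor-critic methods is given below, with hypothetical Keras networks and a soft-update rate tau; it illustrates the general deep deterministic policy gradient recipe rather than the exact implementation of [79].

```python
import tensorflow as tf

def critic_loss(critic, target_y, states, actions):
    """L = (1/N) * sum_i (y_i - Q(s_i, a_i | theta_Q))^2  (cf. Eq. (18))."""
    q = critic([states, actions])                        # hypothetical two-input Keras model
    return tf.reduce_mean(tf.square(target_y - tf.squeeze(q, axis=-1)))

def soft_update(target_net, online_net, tau=0.005):
    """Slowly track the online network: theta_target <- tau * theta + (1 - tau) * theta_target."""
    for t_var, o_var in zip(target_net.trainable_variables, online_net.trainable_variables):
        t_var.assign(tau * o_var + (1.0 - tau) * t_var)
```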
Figure 19. An overview of the block diagram of the system proposed in [80]: the ground-truth steering angles are low-pass filtered and synchronized with the sensor data, encoded, and used to train the network with the backpropagation algorithm.
During the training process of the method in [80], the ground-truth steering angle φ is encoded as the following sine wave function:
$Y_i = \sin\left( \frac{2\pi (i - 1)}{N - 1} - \frac{\phi \pi}{2 \phi_{max}} \right), \quad 1 \le i \le N$, (19)
where Yi is the activation of the i-th output neuron and N is the number of output neurons. In this
method, a least squares regression is used in the classification layer, to fit the predicted function. In the
process of deployment, the steering angle is output based on the results of the least squares regression.
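A direct NumPy transcription of Equation (19), as reconstructed above, is given below; the angle values are toy placeholders and are assumed to be in the same units as φmax.

```python
import numpy as np

def encode_steering(phi, phi_max, n_neurons):
    """Sine-wave encoding of a ground-truth steering angle (cf. Eq. (19))."""
    i = np.arange(1, n_neurons + 1)
    return np.sin(2.0 * np.pi * (i - 1) / (n_neurons - 1) - phi * np.pi / (2.0 * phi_max))

targets = encode_steering(phi=0.2, phi_max=1.0, n_neurons=21)   # toy values
```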
Deep learning-based methods outperform traditional detection methods in discriminative and representative feature learning. So, the pedestrian detection methods based on deep learning have been studied extensively. For example, Shen et al. [81]
proposed a single-shot pedestrian detection method using a multi-receptive field-based framework.
The framework of the pedestrian detection in [81] is shown in Figure 20. First, the image is used as the
input of the Visual Geometry Group (VGG) network. Then, the multi-resolution and multi-receptive
field feature pyramid is built.
Figure 20. The framework of the pedestrian detection method in [81]: the input image is fed to a VGG network; a multi-receptive pooling pyramid (MRPP) module with 1 × 1, 3 × 3 and 5 × 5 convolutions and max-pooling layers builds a multi-resolution, multi-receptive field feature pyramid; a Graph CNN (GCN) module handles occlusion of body parts (head, body, leg); and convolution followed by non-maximum suppression (NMS) produces the final detections.
As shown in Figure 20, a multi-receptive pooling pyramid (MRPP) module is proposed to extract
feature maps. There are four max-pooling layers in this MRPP module, which are used to deal with
the spatial size of the final VGG feature maps. The MRPP module will output five feature maps with
different spatial resolutions. Then, a Graph CNN (GCN) module is used to handle the occlusion
problem based on the outputs of the MRPP module and one VGG feature. The final detection results
can be obtained based on the single-shot multi-box detection (SSD) algorithm and the non-maximum
suppression module.
In the training process of the method in [81], there are two parts in the objective function,
namely the classification loss (denoted as Lcls ) and the localisation loss (denoted as Lloc ). Lcls is a
multiple classes Softmax loss, which is defined as follows:
$L_{cls} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{k} t_{i,j} \log p_{i,j} = -\frac{1}{N} \sum_{i=1}^{N} \log p_{i,y}$, (20)
where $t_{i,j}$ denotes the indicator of the $i$-th sample in class $j$; $p_{i,j}$ is the predicted output; and $y$ is the class label of the ground truth. $L_{loc}$ is a bounding box regression loss, which is defined as follows:
$L_{loc} = \sum_{i=1}^{N} t_{i,j} \, \mathrm{smooth}_{L1}(p^{box}, g^{box})$, (21)
where $p^{box}$ and $g^{box}$ are the parameters of the predicted and ground truth bounding boxes respectively.
The total loss function is defined as a weighted combination of the classification loss and the localisation loss.
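A schematic sketch of the two loss terms of Equations (20) and (21) is given below, using TensorFlow's built-in cross-entropy and Huber (smooth L1) losses with placeholder tensors; the weighting alpha between the two terms is a hypothetical choice, not a value reported in [81].

```python
import tensorflow as tf

def detection_loss(cls_logits, y_true, box_pred, box_gt, pos_mask, alpha=1.0):
    """Classification loss (cf. Eq. (20)) plus smooth-L1 localisation loss (cf. Eq. (21))."""
    l_cls = tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(y_true, cls_logits, from_logits=True))
    huber = tf.keras.losses.Huber(delta=1.0, reduction="none")   # smooth L1 surrogate
    l_loc = tf.reduce_sum(pos_mask * huber(box_gt, box_pred))    # only positive samples contribute
    return l_cls + alpha * l_loc                                 # hypothetical weighting alpha
```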
Figure 21. The examples of various traffic signs and lights: (a) traffic signs; (b) traffic lights.
In traffic sign recognition, Xu et al. [82] proposed a traffic sign recognition approach based
on a CNN algorithm. First, the structural information of the traffic sign image is extracted based on the
hierarchical significance detection method. Then, a neural network model is used to extract the features
of the region of interest. Finally, the traffic sign is classified by the Softmax classifier to complete the
detection of the traffic sign. Alghmgham et al. [83] designed a deep-learning-based architecture and
applied it in the real-time traffic sign classification. The proposed architecture in [83] consists of two
convolutional layers, two max-pooling layers, one dropout layer and three dense layers.
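Based only on the layer counts quoted above for [83] (two convolutional layers, two max-pooling layers, one dropout layer, and three dense layers), a hedged Keras sketch of such an architecture could look like the following; the filter sizes, input resolution, and number of sign classes are placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers

traffic_sign_net = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 3)),                 # placeholder sign image size
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(43, activation="softmax"),          # placeholder number of sign classes
])
```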
In traffic light recognition, Lee and Kim [84] proposed a DNN-based method to detect traffic
lights in images. The detector in this paper has a DNN architecture of encoder-decoder. The encoder
is used to generate feature maps from the images by the ResNet-101. Then, the decoder is used to
generate a refined feature map from the results of the encoder, to output the final classification results
for the traffic lights. Kim et al. [85] proposed a traffic light recognition method based on deep learning,
which consists of a semantic segmentation network and a fully convolutional network. The semantic
segmentation network is employed to detect traffic lights and the fully convolutional network is used
for traffic light classification.
5. Future Directions
In recent years, deep learning has made a breakthrough in image recognition, and also promoted
the increasing development of self-driving cars technology. It can be seen from the achievements of
several scientific research organizations that the research on self-driving cars has made great progress.
However, the applications of deep learning for self-driving cars still have many challenges, which need
to be improved as follows:
(1) The sample problem of deep learning. Deep learning models are trained on samples. In order to achieve the required accuracy in a recognition task, a large number of correctly labeled samples is usually required to meet the needs of developers. The quantity and quality of data are still a problem for good generalization capability [86]. In addition, the development of more realistic virtual datasets is an open problem, which can be addressed by recording real cases [87–89].
(2) The complexity problem of deep learning. The complexity of deep learning algorithms is
described by the parameters of the model. The number of parameters in a deep learning model usually
exceeds millions [90]. The deep learning model is considerably complex with various functions to
realize, so it cannot be trained on simple equipment. Embedded hardware needs strong communication
and computing capabilities. So the hardware of self-driving cars needs to be improved, and the tradeoff between performance and price should be considered.
(3) The robustness problem of deep learning. The applications of self-driving cars based on deep
learning methods rely on the images obtained during driving. However, the images acquired while the vehicle is moving are easily affected by occlusion and illumination changes, which decreases the recognition accuracy. Robustness against such influences is a key challenge.
(4) The real-time problem of deep learning. The processing ability of the human brain is still
beyond the reach of current deep learning. The speed at which the autonomous driving model perceives the surrounding environment information is still far below the speed of the human response, so the real-time requirement still needs to be further improved.
(5) The high-dimensional state-space problem of deep learning. Real-world problems usually
involve high-dimensional continuous state spaces (a large number of states or actions) [91]. When faced with an overwhelming set of different states and/or actions, it becomes difficult to solve the optimization problems, which seriously restricts the development of practical applications based on deep learning.
An effective approach to deal with such problems remains a challenge.
(6) The 3D point cloud data processing based on deep learning. The methods focused on in this
paper are all based on image sensors. The range sensor is also the main sensor used in self-driving
cars. The 3D point clouds can be obtained based on range sensors (such as LiDAR), which are useful
for scene understanding [92], object detection [93], and so on. Deep learning can also process point cloud data; however, it faces many problems, such as the irregular arrangement of points in space. In the
future, there are still many challenges, including how to further solve the disorder problem of the point
cloud data, the sampling problem of non-uniform distribution and the noise problem of original data.
(7) The road support system based on deep learning. The road support system is a scheduling
and auxiliary system, which is often installed at the city traffic control center. So the tasks of the
road support system are different from those of self-driving cars introduced above, such as vehicle
detection and identification, pedestrian detection, and license plate recognition [94]. The road support
system can provide more accurate and effective assistance for self-driving cars. Because the road
support system can obtain much more information than the self-driving car, it is a new challenge for
the methods based on deep learning.
The ultimate goal for the development of self-driving cars is to build an automatic platform
capable of real-time, all-day and efficient driving service. Driverless technology can greatly improve
social productivity, generate huge social benefits, and improve the way people travel, to make a better
living environment. So there are lots of problems that need to be solved efficiently, which involve two aspects, namely the applications of self-driving cars based on deep learning and the improvement of
deep learning algorithms. Thus, self-driving cars based on deep learning are still on the road.
6. Conclusions
In this survey, the latest advances in the development of autonomous driving systems based on
deep learning techniques were analyzed. This paper first introduced the development of deep learning
and self-driving cars in recent years, as well as the mainstream architecture of self-driving cars. Then the
most common network architectures in deep learning used for self-driving cars were described.
Furthermore, the current research and applications of deep learning in the field of self-driving cars
were elaborated and the details of some representative approaches in the applications of self-driving
cars were given out. Finally, the key problems and challenges in deep learning for self-driving cars
were analyzed. Because the body of literature in this area is large and still growing, many good methods based on deep learning could not be included, so that this survey could focus on some of the key issues in this field.
It is clear that deep learning methods will become one of the hottest research topics in the
self-driving car field, including both theoretical and applied research. Deep learning will be able to further improve the intelligence and autonomy of self-driving cars and can resolve some bottlenecks of traditional technologies, such as accuracy, robustness, and safety. Currently, many basic problems
of self-driving cars based on deep learning have been explored and the results are exciting, which show
the potential of deep learning. However, there are still lots of problems that should be further studied,
such as real-time problems, reliability problems, and so on.
Author Contributions: Conceptualization, J.N. and Y.C. (Yinan Chen); methodology, J.N. and Y.C. (Yan Chen);
formal analysis, J.N. and J.Z. and W.C.; writing-original draft preparation, Y.C. (Yinan Chen); writing-review
and editing, J.N. and D.A. and W.C.; funding acquisition, J.N. and J.Z. All authors have read and agreed to the
published version of the manuscript.
Acknowledgments: The authors would like to thank the National Natural Science Foundation of China (61873086,
61573128). The authors would also like to thank the authors of all the references.
Conflicts of Interest: The authors declare that they have no conflicts of interest regarding this work.
References
1. Duarte, F. Self-driving cars: A city perspective. Sci. Robot. 2019, 4. [CrossRef]
2. Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.;
Guizani, M. Unmanned aerial vehicles (UAVs): A survey on civil applications and key research challenges.
IEEE Access 2019, 7, 48572–48634. [CrossRef]
3. Clements, L.M.; Kockelman, K.M. Economic effects of automated vehicles. Trans. Res. Rec. 2017, 2606, 106–114.
[CrossRef]
4. Hussain, R.; Zeadally, S. Autonomous cars: research results, issues, and future challenges. IEEE Commun.
Surv. Tutor. 2019, 21, 1275–1313. [CrossRef]
5. Xue, J.R.; Fang, J.W.; Zhang, P. A survey of scene understanding by event reasoning in autonomous driving.
Int. J. Autom. Comput. 2018, 15, 249–266. [CrossRef]
6. Wang, Z.; Liu, K.; Li, J.; Zhu, Y.; Zhang, Y. Various frameworks and libraries of machine learning and deep
learning: A survey. Arch. Comput. Methods Eng. 2019. [CrossRef]
7. Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019,
189, 102805. [CrossRef]
8. Greenblatt, N.A. Self-driving cars and the law. IEEE Spect. 2016, 53, 46–51. [CrossRef]
9. Bengio, Y.I.; Goodfellow, J.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
10. Birdsall, M. Google and ITE: The road ahead for self-driving cars. ITE J. (Inst. Transp. Eng.) 2014, 84, 36–39.
11. Dikmen, M.; Burns, C. Trust in autonomous vehicles: The case of Tesla autopilot and summon. In Proceedings
of the 2017 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2017, Banff, AB, Canada,
5–8 October 2017; pp. 1093–1098.
12. Coelingh, E.; Nilsson, J.; Buffum, J. Driving tests for self-driving cars. IEEE Spectr. 2018, 55, 41–45. [CrossRef]
13. Park, J.; Nam, C.; Kim, H.j. Exploring the key services and players in the smart car market. Telecommun. Policy
2019, 43, 101819. [CrossRef]
14. Toschi, A.; Sanic, M.; Leng, J.; Chen, Q.; Wang, C.; Guo, M. Characterizing perception module performance
and robustness in production-scale autonomous driving system. In Proceedings of the 16th IFIP WG
10.3 International Conference on Network and Parallel Computing, Hohhot, China, 23–24 August 2019;
pp. 235–247.
15. Li, Y.; Wang, J.; Xing, T.; Liu, T.; Li, C.; Su, K. TAD16K: An enhanced benchmark for autonomous
driving. In Proceedings of the 24th IEEE International Conference on Image Processing, Beijing, China,
17–20 September 2017; pp. 2344–2348.
16. Yue, L.; Abdel-Aty, M.; Wu, Y.; Wang, L. Assessment of the safety benefits of vehicles advanced driver
assistance, connectivity and low level automation systems. Accident Anal. Prev. 2018, 117, 55–64. [CrossRef]
[PubMed]
17. Bylykbashi, K.; Qafzezi, E.; Ikeda, M.; Matsuo, K.; Barolli, L. Fuzzy-based driver monitoring system
(FDMS): Implementation of two intelligent FDMSs and a testbed for safe driving in VANETs. Future Gener.
Comput. Syst. 2020, 105, 665–674. [CrossRef]
18. Schnelle, S.; Wang, J.; Jagacinski, R.; Su, H.j. A feedforward and feedback integrated lateral and longitudinal
driver model for personalized advanced driver assistance systems. Mechatronics 2018, 50, 177–188. [CrossRef]
19. Paden, B.; Cap, M.; Yong, S.Z.; Yershov, D.; Frazzoli, E. A survey of motion planning and control techniques
for self-driving urban vehicles. IEEE Trans. Intell. Veh. 2016, 1, 33–55. [CrossRef]
20. Mittal, S. A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform.
J. Syst. Arch. 2019, 97, 428–442. [CrossRef]
21. Jones, L. Driverless cars: When and where? Eng. Technol. 2017, 12, 36–40. [CrossRef]
22. Dekhtiar, J.; Durupt, A.; Bricogne, M.; Eynard, B.; Rowson, H.; Kiritsis, D. Deep learning for big data
applications in CAD and PLM research review, opportunities and case study. Comput. Ind. 2018, 100, 227–243.
[CrossRef]
23. Ni, J.; Wang, K.; Cao, Q.; Khan, Z.; Fan, X. A memetic algorithm with variable length chromosome for robot
path planning under dynamic environments. Int. J. Robot. Autom. 2017, 32, 414–424. [CrossRef]
24. Ni, J.; Wu, L.; Fan, X.; Yang, S.X. Bioinspired intelligent algorithm and its applications for mobile robot
control: A survey. Comput. Intell. Neurosci. 2016, 2016, 3810903. [CrossRef]
25. McCall, R.; McGee, F.; Mirnig, A.; Meschtscherjakov, A.; Louveton, N.; Engel, T.; Tscheligi, M. A taxonomy
of autonomous vehicle handover situations. Transp. Res. Part A Policy Pract. 2019, 124, 507–522. [CrossRef]
26. Liu, M.; Grana, D. Accelerating geostatistical seismic inversion using TensorFlow: A heterogeneous
distributed deep learning framework. Comput. Geosci. 2019, 124, 37–45. [CrossRef]
27. Zhao, B.; Feng, J.; Wu, X.; Yan, S. A survey on deep learning-based fine-grained object classification and
semantic segmentation. Int. J. Autom. Comput. 2017, 14, 119–135. [CrossRef]
28. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-spatial residual network for hyperspectral image
classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [CrossRef]
29. Agrawal, P.; Ganapathy, S. Modulation filter learning using deep variational networks for robust speech
recognition. IEEE J. Sel. Top. Sign. Process. 2019, 13, 244–253. [CrossRef]
30. Zhang, Z.; Geiger, J.; Pohjalainen, J.; Mousa, A.E.D.; Jin, W.; Schuller, B. Deep learning for environmentally
robust speech recognition: An overview of recent developments. ACM Trans. Intell. Syst. Technol. 2018, 9, 49.
[CrossRef]
31. Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language
processing. IEEE Comput. Intell. Mag. 2018, 13, 55–75. [CrossRef]
32. Sun, S.; Luo, C.; Chen, J. A review of natural language processing techniques for opinion mining systems.
Inform. Fus. 2017, 36, 10–25. [CrossRef]
33. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures.
Neural Comput. 2019, 31, 1235–1270. [CrossRef]
34. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. In Proceedings
of the 1999 the 9th International Conference on ’Artificial Neural Networks (ICANN99)’, Edinburgh, UK,
7–10 September 1999; Volume 2, pp. 850–855.
35. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep neural network concepts for background subtraction:
A systematic review and comparative evaluation. Neural Netw. 2019, 117, 8–66. [CrossRef]
36. Fu, J.; Liu, J.; Li, Y.; Bao, Y.; Yan, W.; Fang, Z.; Lu, H. Contextual deconvolution network for semantic
segmentation. Pattern Recognit. 2020, 101, 107152. [CrossRef]
37. Woo, J.; Kim, N. Collision avoidance for an unmanned surface vehicle using deep reinforcement learning.
Ocean Eng. 2020, in press. [CrossRef]
38. Ding, Y.; Ma, L.; Ma, J.; Suo, M.; Tao, L.; Cheng, Y.; Lu, C. Intelligent fault diagnosis for rotating
machinery using deep Q-network based health state classification: A deep reinforcement learning approach.
Adv. Eng. Inform. 2019, 42, 100977. [CrossRef]
39. Ni, J.; Liu, M.; Ren, L.; Yang, S.X. A multiagent Q-learning-based optimal allocation approach for urban
water resource management system. IEEE Trans. Autom. Sci. Eng. 2014, 11, 204–214. [CrossRef]
40. Ni, J.; Li, X.; Hua, M.; Yang, S.X. Bioinspired neural network-based Q-learning approach for robot path
planning in unknown environments. Int. J. Robot. Autom. 2016, 31, 464–474. [CrossRef]
41. Wen, S.; Zhao, Y.; Yuan, X.; Wang, Z.; Zhang, D.; Manfredi, L. Path planning for active SLAM based on deep
reinforcement learning under unknown environments. Intell. Serv. Robot. 2020, in press. [CrossRef]
42. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.;
Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015,
518, 529–533. [CrossRef]
43. Liu, H.; Sun, F.; Zhang, X. Robotic material perception using active multimodal fusion. IEEE Trans.
Indust. Electron. 2019, 66, 9878–9886. [CrossRef]
44. Zhang, X.; Zhou, M.; Qiu, P.; Huang, Y.; Li, J. Radar and vision fusion for the real-time obstacle detection
and identification. Indust. Robot 2019, 46, 391–395. [CrossRef]
45. Kriechbaumer, T.; Blackburn, K.; Breckon, T.P.; Hamilton, O.; Casado, M.R. Quantitative evaluation
of stereo visual odometry for autonomous vessel localisation in inland waterway sensing applications.
Sensors (Switzerland) 2015, 15, 31869–31887. [CrossRef]
46. Mancini, M.; Costante, G.; Valigi, P.; Ciarfuglia, T.A. J-MOD2 : Joint monocular obstacle detection and depth
estimation. IEEE Robot. Autom. Lett. 2018, 3, 1490–1497.
47. Chen, H. Monocular vision-based obstacle detection and avoidance for a multicopter. IEEE Access 2019,
7, 167869–167883.
48. Parmar, Y.; Natarajan, S.; Sobha, G. DeepRange: Deep-learning-based object detection and ranging in
autonomous driving. IET Intell. Trans. Syst. 2019, 13, 1256 – 1264.
49. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms.
Int. J. Comput. Vis. 2002, 47, 7–42.
50. Žbontar, J.; Lecun, Y. Stereo matching by training a convolutional neural network to compare image patches.
J. Mach. Learn. Res. 2016, 17, 1–32.
51. Nguyen, T.P.; Jeon, J.W. Wide context learning network for stereo matching. Signal Process. Image Commun.
2019, 78, 263–273.
52. Zhang, G.; Zhu, D.; Shi, W.; Ye, X.; Li, J.; Zhang, X. Multi-dimensional residual dense attention network for
stereo matching. IEEE Access 2019, 7, 51681–51690.
53. Kendall, A.; Martirosyan, H.; Dasgupta, S.; Henry, P.; Kennedy, R.; Bachrach, A.; Bry, A. End-to-end learning
of geometry and context for deep stereo regression. In Proceedings of the 16th IEEE International Conference
on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017.
54. Dairi, A.; Harrou, F.; Sun, Y.; Senouci, M. Obstacle detection for intelligent transportation systems using
deep stacked autoencoder and k-nearest neighbor scheme. IEEE Sens. J. 2018, 18, 5122–5132.
55. Mancini, M.; Costante, G.; Valigi, P.; Ciarfuglia, T.A. Fast robust monocular depth estimation for obstacle
detection with fully convolutional networks. In Proceedings of the 2016 IEEE/RSJ International Conference
on Intelligent Robots and Systems, IROS 2016, Daejeon, Korea, 9–14 October 2016; Volume 2016-November,
pp. 4296–4303.
56. Jia, B.; Feng, W.; Zhu, M. Obstacle detection in single images with deep neural networks. Signal Image
Video Process. 2016, 10, 1033 –1040.
57. Zhong, Y.; Li, H.; Dai, Y. Open-world stereo video matching with deep RNN. In Proceedings of the 15th
European Conference on Computer Vision, ECCV 2018, Munich, Germany, 8–14 September 2018.
58. Jie, Z.; Wang, P.; Ling, Y.; Zhao, B.; Wei, Y.; Feng, J.; Liu, W. Left-right comparative recurrent model for stereo
matching. In Proceedings of the 31st Meeting of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, Salt Lake City, UT, USA, 18–22 June 2018.
59. Chen, L.; Zhan, W.; Tian, W.; He, Y.; Zou, Q. Deep integration: A multi-label architecture for road scene
recognition. IEEE Trans. Image Process. 2019, 28, 4883–4898.
60. Wang, L.; Guo, S.; Huang, W.; Xiong, Y.; Qiao, Y. Knowledge guided disambiguation for large-scale scene
classification with multi-resolution CNNs. IEEE Trans. Image Process. 2017, 26, 2055–2068. [CrossRef]
61. Tang, P.; Wang, H.; Kwong, S. G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene
recognition. Neurocomputing 2017, 225, 188–197. [CrossRef]
62. Byeon, W.; Breuel, T.M.; Raue, F.; Liwicki, M. Scene labeling with LSTM recurrent neural networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston,
MA, USA, 7–12 June 2015.
63. Zhang, L.; Li, L.; Pan, X.; Cao, Z.; Chen, Q.; Yang, H. Multi-level ensemble network for scene recognition.
Multimed. Tools Appl. 2019, 78, 28209–28230. [CrossRef]
64. Liu, S.; Tian, G.; Xu, Y. A novel scene classification model combining ResNet based transfer learning and
data augmentation with a filter. Neurocomputing 2019, 338, 191–206. [CrossRef]
65. Cheng, X.; Lu, J.; Feng, J.; Yuan, B.; Zhou, J. Scene recognition with objectness. Pattern Recognit. 2018,
74, 474–487. [CrossRef]
66. Sun, N.; Li, W.; Liu, J.; Han, G.; Wu, C. Fusing object semantics and deep appearance features for scene
recognition. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 1715–1728. [CrossRef]
67. John, V.; Liu, Z.; Mita, S.; Guo, C.; Kidono, K. Real-time road surface and semantic lane estimation using
deep features. Signal Image Video Process. 2018, 12, 1133–1140. [CrossRef]
68. Xiao, D.; Yang, X.; Li, J.; Islam, M. Attention deep neural network for lane marking detection.
Knowl. Based Syst. 2020, 105584. [CrossRef]
69. Kim, J.; Kim, J.; Jang, G.J.; Lee, M. Fast learning method for convolutional neural networks using extreme
learning machine and its application to lane detection. Neural Netw. 2017, 87, 109–121. [CrossRef]
70. Liu, J. Learning full-reference quality-guided discriminative gradient cues for lane detection based on neural
networks. J. Vis. Commun. Image Represent. 2019, 65, 102675. [CrossRef]
71. Li, J.; Mei, X.; Prokhorov, D.; Tao, D. Deep neural network for structural prediction and lane detection in
traffic scene. IEEE Trans. Neural Networks Learn. Syst. 2017, 28, 690–703. [CrossRef] [PubMed]
72. Tian, Y.; Gelernter, J.; Wang, X.; Chen, W.; Gao, J.; Zhang, Y.; Li, X. Lane marking detection via deep
convolutional neural network. Neurocomputing 2018, 280, 46–55. [CrossRef] [PubMed]
73. Neven, D.; De Brabandere, B.; Georgoulis, S.; Proesmans, M.; Van Gool, L. Towards end-to-end lane
detection: An instance segmentation approach. In Proceedings of the 2018 IEEE Intelligent Vehicles
Symposium, IV 2018, Changshu, China, 26–30 June 2018.
74. Yang, W.; Cheng, Y.; Chung, P. Improved lane detection with multilevel features in branch convolutional
neural networks. IEEE Access 2019, 7, 173148–173156. [CrossRef]
75. Zou, Q.; Jiang, H.; Dai, Q.; Yue, Y.; Chen, L.; Wang, Q. Robust lane detection from continuous driving scenes
using deep neural networks. IEEE Trans. Veh. Technol. 2020, 69, 41–54. [CrossRef]
76. Ghafoorian, M.; Nugteren, C.; Baka, N.; Booij, O.; Hofmann, M. EL-GAN: Embedding loss driven generative
adversarial networks for lane detection. In Proceedings of the 15th European Conference on Computer
Vision, ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 256–272.
77. Ni, J.; Wu, L.; Shi, P.; Yang, S.X. A dynamic bioinspired neural network based real-time path planning
method for autonomous underwater vehicles. Comput. Intell. Neurosci. 2017, 2017, 9269742. [CrossRef]
78. Ni, J.; Yang, L.; Wu, L.; Fan, X. An improved spinal neural system-based approach for heterogeneous AUVs
cooperative hunting. Int. J. Fuzzy Syst. 2018, 20, 672–686. [CrossRef]
79. Yu, L.; Shao, X.; Wei, Y.; Zhou, K. Intelligent land-vehicle model transfer trajectory planning method based
on deep reinforcement learning. Sensors (Switzerland) 2018, 18, 2905.
80. Eraqi, H.M.; Moustafa, M.N.; Honer, J. End-to-end deep learning for steering autonomous vehicles
considering temporal dependencies. In Proceedings of the 31st Conference on Neural Information Processing
Systems (NIPS 2017), MLITS Workshop, Long Beach, CA, USA, 4–9 December 2017.
81. Shen, C.; Zhao, X.; Fan, X.; Lian, X.; Zhang, F.; Kreidieh, A.R.; Liu, Z. Multi-receptive field graph
convolutional neural networks for pedestrian detection. IET Intell. Trans. Syst. 2019, 13, 1319–1328.
[CrossRef]
82. Xu, H.; Srivastava, G. Automatic recognition algorithm of traffic signs based on convolution neural network.
Multimed. Tools Appl. 2020, in press. [CrossRef]
83. Alghmgham, D.A.; Latif, G.; Alghazo, J.; Alzubaidi, L. Autonomous traffic sign (ATSR) detection and
recognition using deep CNN. Procedia Comput. Sci. 2019, 163, 266–274. [CrossRef]
84. Lee, E.; Kim, D. Accurate traffic light detection using deep neural network with focal regression loss.
Image Vis. Comput. 2019, 87, 24–36. [CrossRef]
85. Kim, H.K.; Yoo, K.Y.; Park, J.H.; Jung, H.Y. Traffic light recognition based on binary semantic segmentation
network. Sensors (Switzerland) 2019, 19, 1700. [CrossRef]
86. Testolin, A.; Stoianov, I.; Zorzi, M. Letter perception emerges from unsupervised deep learning and recycling
of natural image features. Nat. Hum. Behav. 2017, 1, 657–664. [CrossRef]
87. Fraga-Lamas, P.; Ramos, L.; Mondejar-Guerra, V.; Fernandez-Carames, T.M. A review on IoT deep learning
UAV systems for autonomous obstacle detection and collision avoidance. Remote Sens. 2019, 11, 2144.
[CrossRef]
88. Micucci, A.; Mantecchini, L.; Sangermano, M. Analysis of the relationship between turning signal detection
and motorcycle driver’s characteristics on urban roads; A case study. Sensors (Switzerland) 2019, 19, 1802.
[CrossRef]
89. Micucci, A.; Sangermano, M. A study on cyclists behaviour and bicycles kinematic. Int. J. Trans. Dev. Integr.
2020, 4, 14–28. [CrossRef]
90. Khamparia, A.; Singh, K.M. A systematic review on deep learning architectures and applications. Expert Syst.
2019, 36, e12400. [CrossRef]
91. Carrio, A.; Sampedro, C.; Rodriguez-Ramos, A.; Campoy, P. A review of deep learning methods and
applications for unmanned aerial vehicles. J. Sens. 2017, 2017, 3296874. [CrossRef]
92. Zhang, R.; Li, G.; Li, M.; Wang, L. Fusion of images and point clouds for the semantic segmentation of
large-scale 3D scenes based on deep learning. ISPRS J. Photogram. Remote Sens. 2018, 143, 85–96. [CrossRef]
93. Yang, Y.; Chen, F.; Wu, F.; Zeng, D.; mu Ji, Y.; Jing, X.Y. Multi-view semantic learning network for point
cloud based 3D object detection. Neurocomputing 2020, in press. [CrossRef]
94. Sirohi, D.; Kumar, N.; Rana, P.S. Convolutional neural networks for 5G-enabled intelligent transportation
system: A systematic review. Comput. Commun. 2020, 153, 459–498. [CrossRef]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).