Article
Research on Deep Reinforcement Learning Control Algorithm
for Active Suspension Considering Uncertain Time Delay
Yang Wang 1, *, Cheng Wang 2 , Shijie Zhao 2 and Konghui Guo 2
Abstract: The uncertain delay characteristic of actuators is a critical factor that affects the control
effectiveness of the active suspension system. Therefore, it is crucial to develop a control algorithm
that takes into account this uncertain delay in order to ensure stable control performance. This
study presents a novel active suspension control algorithm based on deep reinforcement learning
(DRL) that specifically addresses the issue of uncertain delay. In this approach, a twin-delayed
deep deterministic policy gradient (TD3) algorithm with system delay is employed to obtain the
optimal control policy by iteratively solving the dynamic model of the active suspension system,
considering the delay. Furthermore, three different operating conditions were designed for simulation
to evaluate the control performance: deterministic delay, semi-regular delay, and uncertain delay. The
experimental results demonstrate that the proposed algorithm achieves excellent control performance
under various operating conditions. Compared to the passive suspension, the body vertical
acceleration is improved by more than 30%, and the proposed algorithm effectively mitigates
body vibration in the low-frequency range. It consistently maintains a more than 30% improvement
in ride comfort optimization even under the most severe operating conditions and at different speeds,
demonstrating the algorithm’s potential for practical application.
Keywords: active suspension; deep reinforcement learning; suspension control; uncertain time delay
dynamics [11], and the other is to consider system delays during the controller design
process. Ji et al. [12] proposed an improved variable universe fuzzy control strategy
with real-time adjustment of the contracting–expanding factor parameters to improve the
ride comfort of vehicles. Some scholars have proposed the use of delay compensation
techniques to reduce or offset the negative effects of delays [13–15]. Udwadia et al. [16]
proposed the application of delayed state positive feedback proportional control to the
active control of structures. Pan et al. [17] designed a suspension delay active controller
using an adaptive control strategy. Some scholars [18,19] have also attempted to design
delayed instantaneous optimal control laws for suspension systems with delay using a
state transformation approach to ensure the stability of the system. Du et al. [20] designed
full-frequency domain state feedback controllers considering input delay for automotive
suspensions and seat suspensions and achieved good damping effects within a certain delay
range. Li et al. [21], on the other hand, proposed a time-delayed full-frequency domain
robust control method based on dynamic output feedback. In addition, Kim et al. [22]
combined perception technology and proposed a model predictive control of a semi-active
suspension with a shift delay compensation using preview road information. Wu et al. [23]
proposed a time-delay control strategy together with the idea of using a linear motor as the actuator.
Based on this idea, they developed a linear equivalent excitation method to optimize the
time-delay control parameters under complex excitation. Li et al. [24] proposed
a fuzzy cooperative control strategy based on linear matrix inequality theory to weaken
the effect of perturbations on vehicle driving conditions by integrating the problems of
variable structure, disturbance immunity, and time delay. Based on Lyapunov theory and the
backstepping technique, Wang [25] studied the adaptive control problem of nonlinear active
suspension systems with random perturbations and time delay. Moreover, some scholars
have considered the time-delay characteristics of active suspension systems and controlled
them according to the Takagi–Sugeno fuzzy model [26–28]. Further, scholars have been
searching for more robust control methods to ensure the stability and performance of the
suspension system [29–31].
Although numerous scholars have conducted extensive research on the problem of
delay, the delay control system remains enigmatic due to its infinite-dimensional nature
and the inherent uncertainty associated with delay. Therefore, further in-depth research is
essential. Simultaneously, the automotive industry is rapidly adopting intelligence as part
of its processes, which has led to the integration of artificial intelligence (AI) technology
as a solution for controlling complex and uncertain systems. Notably, deep reinforcement
learning (DRL) techniques have demonstrated significant advantages in addressing high-
dimensional problems. By combining the robust perception and information-processing
capabilities of deep learning (DL) with the decision-making proficiency of reinforcement
learning (RL) in complex environments, these techniques are gradually gaining traction in
the study of various intricate systems.
Currently, DRL has shown its advantages in vision and decision-making [32,33].
Some scholars have also tried to apply it to suspension control [34,35]. In an intuitive
application, Pang et al. [36] proposed a non-fragile fault-tolerant control design for Markov-
type systems. Providing more depth, Kozek et al. [37] proposed a neural algorithm based
on reinforcement learning to optimize the creation of a linear quadratic regulator (LQR).
In recent years, Li et al. [38] used an actor–critic architecture to study the adaptive neural
network output feedback optimal control problem. In addition, many scholars have utilized
different DRL architectures for active or semi-active suspension control. Lin et al. [39]
studied a deep deterministic policy gradient (DDPG) control strategy for a full-vehicle
active Macpherson suspension system. Yong et al. [40] proposed learning and control
strategies for a semi-active suspension system in a full car using soft actor–critic (SAC)
models on real roads. Similarly, Lee et al. [41] conducted a study on semi-active suspension
control using DRL and proposed a state-normalization filter to improve the generalization
performance. Further, Du et al. [42] proposed the utilization of external knowledge in the
DDPG framework for suspension control and integration of speed planning to ensure ride
comfort. It is worth mentioning that Han et al. [43] and Dridi et al. [44] have tried to use
a proximal policy optimization (PPO) algorithm for semi-active and active suspension
control and achieved satisfactory control results. Although there have been some useful
studies on the application of DRL to suspension control, few attempts have been made to
solve the delay problem. However, it is worth noting that in sequential decisions such as
DRL control, the delay problem can have a significant impact on the actual control effect.
Therefore, using DRL to solve the delay problem is still a new idea to be investigated. In
this study, by adding a delay link to the twin-delayed deep deterministic policy gradient
(TD3) algorithm, the agent is guided to explore the possibility of obtaining a more robust
control strategy in a time-delayed environment, and then effectively suppress the effect of
uncertain delay on the active suspension system. To summarize, the main innovations of
this study are as follows:
1. To our knowledge, this study represents the first research endeavor to employ DRL
techniques within the realm of delay control. The primary aim of this investigation is
to alleviate the repercussions of uncertain delays through the implementation of active
suspension control strategies rooted in DRL. Furthermore, this study demonstrates
the utilization of high-dimensional advantages of DRL in an infinite-dimensional
delay control system, ultimately achieving commendable results.
2. In this study, multiple sets of simulations considering deterministic delay, semi-regular
delay, and uncertain delay are designed to test the control performance of the algorithm.
Various delay characteristics, and the uncertainty they introduce into the control system,
are considered to bring the simulations closer to actual working conditions.
3. The control algorithm proposed in this study maintains good control performance
in multiple sets of the simulation built by MATLAB/Simulink for different working
conditions and speeds, proving its application potential.
This paper is organized as follows. Section 2 presents the active suspension quarter
model and road model. The proposed controller algorithm and the model design associ-
ated with it are presented in Sections 3 and 4, respectively. In Section 5, the simulation
environments with deterministic, semi-regular, and uncertain delays set up to validate
the performance of the algorithm and their results are shown. In Section 6, the simulation
results are discussed in a broader context. Finally,
Section 7 provides the conclusions of this study. A table of notations and abbreviations
used in this paper is provided in Appendix A.
The inherent delay τ and the actuator delay τa in the control system are considered. Among them, the inherent
delay is the amount of delay caused by the acquisition and transmission of signals in the
active suspension control system and by the controller operation, and the actuator delay is
the amount of delay caused by the actuator's response lag, which is unavoidable in the
control loop.
Figure 1. Dynamics model considering delay time.
According to the second class of Lagrangian equations, the kinetic equations with delayed quantities at moment t can be obtained as
$$
\begin{cases}
m_b \ddot{x}_b(t) + k_b\left(x_b(t) - x_u(t)\right) + c_b\left(\dot{x}_b(t) - \dot{x}_u(t)\right) + u(t-\tau-\tau_a) = 0 \\
m_u \ddot{x}_u(t) + k_b\left(x_u(t) - x_b(t)\right) + k_u\left(x_u(t) - w(t)\right) + c_b\left(\dot{x}_u(t) - \dot{x}_b(t)\right) - u(t-\tau-\tau_a) = 0
\end{cases}
\quad (1)
$$
where mb is the sprung mass, kg; mu is the unsprung mass, kg; kb is the spring stiffness, N/m; ku is the equivalent tire stiffness, N/m; cb is the equivalent damping factor of the suspension damping element, N·s/m; xb is the vertical displacement of the sprung mass, m; xu is the vertical displacement of the unsprung mass, m; w is the vertical tire displacement, which can also be equated to the road displacement, m; and F = u(t − τ − τa) is the active suspension actuator control force for the time-delay system, N.
Take the system state variables as

$$x(t) = \begin{bmatrix} x_b(t) & x_u(t) & \dot{x}_b(t) & \dot{x}_u(t) \end{bmatrix}^T \quad (2)$$
Take the system output variable as

$$y(t) = \begin{bmatrix} \ddot{x}_b(t) & \dot{x}_b(t) & \dot{x}_b(t) - \dot{x}_u(t) & x_b(t) - x_u(t) & x_u(t) - w(t) \end{bmatrix}^T \quad (3)$$
Then, the state-space expression of the active suspension, considering the delay, is

$$
\begin{cases}
\dot{x}(t) = Ax(t) + Bu(t-\tau-\tau_a) + Ew(t) \\
y(t) = Cx(t) + Du(t-\tau-\tau_a) + Lw(t)
\end{cases}
\quad (4)
$$
where

$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -\dfrac{k_b}{m_b} & \dfrac{k_b}{m_b} & -\dfrac{c_b}{m_b} & \dfrac{c_b}{m_b} \\ \dfrac{k_b}{m_u} & -\dfrac{k_b+k_u}{m_u} & \dfrac{c_b}{m_u} & -\dfrac{c_b}{m_u} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ -\dfrac{1}{m_b} \\ \dfrac{1}{m_u} \end{bmatrix}, \quad E = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \dfrac{k_u}{m_u} \end{bmatrix},$$

$$C = \begin{bmatrix} -\dfrac{k_b}{m_b} & \dfrac{k_b}{m_b} & -\dfrac{c_b}{m_b} & \dfrac{c_b}{m_b} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} -\dfrac{1}{m_b} \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix}.$$
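For reference, the state-space model of Equation (4) can be assembled and stepped numerically as in the following Python sketch. The parameter values are illustrative placeholders rather than the values used in this study, and a simple forward-Euler integration is assumed.

```python
import numpy as np

# Illustrative quarter-car parameters (placeholders, not the values of this study)
m_b, m_u = 300.0, 40.0        # sprung / unsprung mass, kg
k_b, k_u = 18000.0, 180000.0  # spring / tire stiffness, N/m
c_b = 1200.0                  # suspension damping, N*s/m

# State x = [x_b, x_u, x_b_dot, x_u_dot]^T, delayed input u, road disturbance w
A = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [-k_b / m_b,  k_b / m_b, -c_b / m_b,  c_b / m_b],
    [ k_b / m_u, -(k_b + k_u) / m_u,  c_b / m_u, -c_b / m_u],
])
B = np.array([0.0, 0.0, -1.0 / m_b, 1.0 / m_u])
E = np.array([0.0, 0.0, 0.0, k_u / m_u])

# Outputs y = [x_b_ddot, x_b_dot, x_b_dot - x_u_dot, x_b - x_u, x_u - w]^T
C = np.array([
    [-k_b / m_b, k_b / m_b, -c_b / m_b, c_b / m_b],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
D = np.array([-1.0 / m_b, 0.0, 0.0, 0.0, 0.0])
L = np.array([0.0, 0.0, 0.0, 0.0, -1.0])

def step(x, u_delayed, w, dt=0.01):
    """One forward-Euler step of Equation (4) with an already-delayed input."""
    x_dot = A @ x + B * u_delayed + E * w
    y = C @ x + D * u_delayed + L * w
    return x + dt * x_dot, y
```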
Sensors 2023, 23, 7827 5 of 20
2.2. Road Model
The vehicle road model uses a filtered white-noise time-domain road input model, i.e.,

$$\dot{w}(t) = -2\pi f_0 w(t) + 2\pi \sqrt{S_q(n_0)\,v}\; w_0(t) \quad (5)$$

where w(t) is the road displacement; f0 is the lower cutoff frequency; Sq(n0) is the road unevenness coefficient, which is related to the road class, and the unevenness coefficients of class A, B, and C roads are 16, 64, and 256, respectively; v is the speed, m/s; and w0 is a uniformly distributed white noise with a mean of 0 and an intensity of 1.
The lower cutoff frequency is calculated as

$$f_0 = 2\pi n_{00} v \quad (6)$$
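The road input of Equations (5) and (6) can likewise be generated numerically. In the sketch below, the unevenness coefficient is assumed to be expressed in units of 10⁻⁶ m³ and a uniform white-noise source with unit variance is used; both are assumptions of the sketch rather than specifications from this study.

```python
import numpy as np

def road_profile(Sq_n0=64e-6, v=20.0, f0=0.1, dt=0.01, T=10.0, seed=0):
    """Filtered white-noise road input of Equation (5), integrated with forward Euler.

    Sq_n0 : road unevenness coefficient (class A/B/C values of 16/64/256 assumed
            to be in units of 1e-6 m^3)
    v     : vehicle speed, m/s
    f0    : lower cutoff frequency from Equation (6), Hz
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    w = np.zeros(n)
    # zero-mean, unit-variance uniformly distributed white-noise source w0(t)
    w0 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
    for k in range(n - 1):
        w_dot = -2.0 * np.pi * f0 * w[k] + 2.0 * np.pi * np.sqrt(Sq_n0 * v) * w0[k]
        w[k + 1] = w[k] + dt * w_dot
    return w
```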
3. Controller Algorithm
This section introduces the primary algorithm employed in constructing the controller for the active suspension delay system. Initially, the fundamental principles and framework of reinforcement learning (RL) are presented, followed by an elucidation of the advantages offered by the deep reinforcement learning (DRL) algorithm compared to traditional RL algorithms. Subsequently, the TD3 algorithm, known for its suitability in continuous control systems, is selected based on the characteristics of the delay control system. The algorithmic process and technical intricacies of TD3 are then described.
Figure 3. RL basic framework.
3.2. Deep Reinforcement Learning
In RL, the agent learns a function to formulate appropriate actions to maximize the goal. However, RL has limitations when dealing with large-scale problems. For example, in cases where the state space and action space are extremely large, traditional RL algorithms may face the issues of the curse of dimensionality and computational complexity. Additionally, traditional RL algorithms may require a large number of samples and a long time to learn a good policy, which can be time-consuming and inefficient for certain tasks. With the development of DL technology, we can use deep neural networks as function-approximation methods to learn value functions that imply multidimensional information. This extends the application of RL to higher dimensions, allowing it to suppress the curse of dimensionality in traditional optimization problems to some extent.
The three primary functions that can be learned in RL align with the three primary methods in DRL, namely, policy-based, value-based, and model-based. Researchers use these three major classes of methods individually or in combination to meet the needs of practical tasks. In this study, the active suspension control system considering delay is a typical continuous-state and continuous-action control system. The classical DRL algorithm in this field is DDPG [48]. DDPG utilizes an actor–critic architecture for learning, i.e., an actor network is added to the deep Q network (DQN) [49,50] to output action values directly. It helps the agent to obtain more feedback information and thus make more accurate decisions.
3.3. Twin-Delayed Deep Deterministic Policy Gradient
Although the classical DDPG algorithm has achieved satisfactory results in many
tasks, the traditional actor–critic framework suffers from the bias and variance problems
associated with function approximation, considering the cumulative nature of the DRL
iterative process. Specifically, on the one hand, the deviation (bias) can cause overestimation, while
on the other hand, high variance can cause the accumulation of errors, which in turn makes
the system less stable. Considering the characteristics of time-delayed control systems,
high variance means more aggressive control behavior and high deviation means gradual
deviation from the steady state, and the agent gains more benefit but also takes more
risk, which is more likely to cause instability of the control system. Therefore, we need to
suppress both overestimation and cumulative error. Based on the above practical control
requirements, this study uses the TD3 algorithm [51] as the basic algorithmic framework,
i.e., adding techniques such as clipped double Q-learning, delayed policy updates, and
target policy smoothing to the DDPG framework. The framework improvement of the TD3
algorithm is shown in Figure 4.
Figure 4. TD3 algorithm improvement structure.
When building a controller using the TD3 algorithm, first, randomly initialize the critic networks Q1(s, a|θQ1), Q2(s, a|θQ2) and the actor network µ(s|θµ) with parameters θQ1, θQ2, and θµ, respectively. At the same time, we need to initialize the target networks Q′1(s, a|θ′Q1), Q′2(s, a|θ′Q2), and µ′(s|θ′µ) with parameters θ′Q1, θ′Q2, and θ′µ, respectively. Then, synchronize the parameters.
Actions are performed in the environment to obtain rewards rt and new states st+1, where the exploration noise ξt ∼ Nt(0, σ) is added to the actor output. We store these experiences in the form of samples (st, at, rt, st+1) that are transferred into the experience buffer. When the experience buffer reaches a certain number of samples, we randomly sample a mini-batch of experience–transfer samples (si, ai, ri, si+1) from it to perform parameter updates. Let

$$y_i = r_i + \gamma \min_{j=1,2} Q'_j\left(s_{i+1},\, \mu'\left(s_{i+1}\middle|\theta'_{\mu}\right) + \zeta \,\middle|\, \theta'_{Q_j}\right), \qquad \zeta \sim \operatorname{clip}\left(N\left(0, \widetilde{\sigma}\right), -c, c\right) \quad (9)$$
where γ is the discount factor and takes the value of 0.99, which is taken to be equivalent to
considering the situation after 100 time steps. c is the noise clipping, which in this study
is taken as 0.5. The clipped double Q-learning idea and target policy smoothing idea of
TD3 are reflected here. With clipped double Q-learning, the value target cannot introduce
any additional overestimation bias using the standard Q-learning target. While this update
rule may induce an underestimation bias, this is far preferable to overestimation bias, as
unlike overestimated actions, the value of underestimated actions will not be explicitly
propagated through the policy update [51]. Moreover, the effect of inaccuracy caused by
function approximation error can be effectively reduced by target policy smoothing.
Update the parameters θQj of the critic network according to the minimization loss function, i.e.,

$$\nabla_{\theta_{Q_j}} L\left(\theta_{Q_j}\right) = \frac{1}{N}\sum_i \left(y_i - Q_j\left(s_i, a_i \middle| \theta_{Q_j}\right)\right)^2 \nabla_{\theta_{Q_j}} Q_j\left(s_i, a_i \middle| \theta_{Q_j}\right), \quad j = 1, 2 \quad (10)$$
where N is the size of the mini-batch sample, whose value we take as 128 in this study. According to our experience, the size of the mini-batch should be related to the complexity of the problem being studied. The parameters θµ of the actor network are updated according to the objective maximization function, i.e.,

$$\nabla_{\theta_{\mu}} J(\mu) = \frac{1}{N}\sum_i \nabla_{a} Q\left(s, a \middle| \theta_{Q}\right)\Big|_{s=s_i,\, a=\mu(s_i)} \cdot \nabla_{\theta_{\mu}} \mu\left(s \middle| \theta_{\mu}\right)\Big|_{s_i} \quad (11)$$
Finally, in order to avoid unknown oscillations affecting convergence during gradient
descent to ensure that the update of the network can balance stability and rapidity, we
perform the soft update process for the network parameters, i.e.,

$$\theta'_{Q_j} \leftarrow \varepsilon\, \theta_{Q_j} + (1-\varepsilon)\, \theta'_{Q_j}, \qquad \theta'_{\mu} \leftarrow \varepsilon\, \theta_{\mu} + (1-\varepsilon)\, \theta'_{\mu} \quad (12)$$

where ε is the soft update factor. Since the delay control system is a high-dimensional unstable system, ε = 0.001 is taken in this study to satisfy the robustness of the policy to some extent.
In addition, the critic network is updated twice as often as the actor network in order
to first minimize the errors induced by the value estimation before introducing policy
updates. Soft updating of network parameters and delayed policy updating ensure target
stabilization and thus reduce error increase. TD3 is summarized in Algorithm 1.
Algorithm 1 TD3
1: Randomly initialize critic networks Q1(s, a|θQ1), Q2(s, a|θQ2), and actor network µ(s|θµ) with random parameters θQ1, θQ2, and θµ.
2: Initialize target networks θQ1 → θ′Q1, θQ2 → θ′Q2, θµ → θ′µ.
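As an illustration of Equations (9)–(12) and Algorithm 1, the following Python (PyTorch) sketch performs one TD3 update step. The network widths, the tanh output scaling of the actor, and the target-smoothing noise scale are assumptions of the sketch, while the learning rates, discount factor, soft update factor, noise clipping, and policy-update delay follow Table 2; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    # Small fully connected network; the hidden width is an assumption of this sketch.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class TD3Sketch:
    def __init__(self, s_dim=4, a_dim=1, a_max=3000.0, gamma=0.99,
                 eps=1e-3, sigma_tilde=0.2, c=0.5, policy_delay=2):
        self.actor, self.actor_t = mlp(s_dim, a_dim), mlp(s_dim, a_dim)
        self.actor_t.load_state_dict(self.actor.state_dict())
        self.critics = nn.ModuleList([mlp(s_dim + a_dim, 1) for _ in range(2)])
        self.critics_t = nn.ModuleList([mlp(s_dim + a_dim, 1) for _ in range(2)])
        self.critics_t.load_state_dict(self.critics.state_dict())
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=1e-2)
        self.opt_c = torch.optim.Adam(self.critics.parameters(), lr=1e-3)
        self.gamma, self.eps, self.sigma, self.c = gamma, eps, sigma_tilde, c
        self.a_max, self.policy_delay, self.n_updates = a_max, policy_delay, 0

    def act(self, s):
        # Deterministic policy output scaled to the actuator range (tanh scaling assumed).
        return self.a_max * torch.tanh(self.actor(s))

    def update(self, s, a, r, s_next):
        # Equation (9): clipped double-Q target with target-policy smoothing noise.
        with torch.no_grad():
            noise = torch.clamp(torch.randn_like(a) * self.sigma, -self.c, self.c)
            a_next = torch.clamp(self.a_max * torch.tanh(self.actor_t(s_next)) + noise,
                                 -self.a_max, self.a_max)
            q1_t, q2_t = (qt(torch.cat([s_next, a_next], dim=1)) for qt in self.critics_t)
            y = r + self.gamma * torch.min(q1_t, q2_t)

        # Equation (10): update both critics by minimizing the TD error.
        critic_loss = sum(((y - q(torch.cat([s, a], dim=1))) ** 2).mean()
                          for q in self.critics)
        self.opt_c.zero_grad(); critic_loss.backward(); self.opt_c.step()

        # Equation (11) with delayed policy updates (every `policy_delay` critic updates).
        self.n_updates += 1
        if self.n_updates % self.policy_delay == 0:
            actor_loss = -self.critics[0](torch.cat([s, self.act(s)], dim=1)).mean()
            self.opt_a.zero_grad(); actor_loss.backward(); self.opt_a.step()
            # Equation (12): soft update of all target networks.
            with torch.no_grad():
                for net, net_t in [(self.actor, self.actor_t), (self.critics, self.critics_t)]:
                    for p, p_t in zip(net.parameters(), net_t.parameters()):
                        p_t.mul_(1.0 - self.eps).add_(self.eps * p)
```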
4. Controller Model
Solving a new problem using DRL involves the creation of an environment, so in this
section, we focus on the environment components involved in the controller model, namely,
the state, action, reward, and transfer function. Among them, the transfer function of DRL
is the dynamics model introduced in Section 2, which will not be repeated here.
4.1. State
The state is the information that describes the environment, and the RL environment
must provide the algorithm with enough information to solve the problem. In the active
suspension delay control system, multidimensional information such as displacement,
velocity, acceleration, control force, and time are included. These high-level states are the
raw states. The raw state should contain all the information relevant to the problem, but it
is often difficult to learn because it usually has a lot of redundancy and background noise.
Raw and complete information provides more freedom, but extracting and interpreting
useful signals from it is a much heavier burden. Therefore, the selection and design of the
state space becomes particularly important.
Based on the important suspension performance parameters included in the dynamics
model, the following state variables are chosen to characterize the state space considering
the control performance requirements of the active suspension. The designed state space
takes into account both real-world constraints and observability cases under the influence
of sensor arrangements.
$$s = \begin{bmatrix} \ddot{x}_b & \dot{x}_b & \dot{x}_b - \dot{x}_u & x_b - x_u \end{bmatrix}^T \quad (13)$$
The state information takes into account the ride comfort, actuator efficiency, and
suspension travel of the suspension system. This state contains much less information than
the original state, but it contains more straightforward information for the algorithm and is
easier to learn.
Further, the agent preprocesses the designed states for its own use. In order to
improve the generalization performance of the controller model, this study normalizes the
states, and the final state vector is represented as
$$s = \begin{bmatrix} \dfrac{\ddot{x}_b}{\lambda_1} & \dfrac{\dot{x}_b}{\lambda_2} & \dfrac{\dot{x}_b - \dot{x}_u}{\lambda_3} & \dfrac{x_b - x_u}{\lambda_4} \end{bmatrix}^T \quad (14)$$
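For illustration, the normalization of Equation (14) amounts to an element-wise scaling; the factors λ1–λ4 are not given in this excerpt, so the values in the sketch below are placeholders.

```python
import numpy as np

# Placeholder scales for [x_b_ddot, x_b_dot, x_b_dot - x_u_dot, x_b - x_u]
LAMBDA = np.array([10.0, 1.0, 1.0, 0.1])  # assumed orders of magnitude, not the paper's values

def normalize_state(raw_state):
    """Scale each observed channel to a comparable range (Equation (14))."""
    return np.asarray(raw_state, dtype=float) / LAMBDA
```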
4.2. Action
Action is the output of the agent, which is the amount of control output by the
controller. Actions change the environment by transitioning the dynamics model to the
next state and into the next round of iterations. How the action is designed affects the ease
of control of the system and thus directly affects the problem’s difficulty. In this study, the
action is the control force of the active suspension actuator, i.e.,

$$a_t = \mu\left(s_t \middle| \theta_{\mu}\right) \quad (15)$$

where µ(s|θµ) is the actor network. It should be noted that due to the delay, the control
force in the actual system often appears as
$$a_{t+\tau+\tau_a} = \mu\left(s_t \middle| \theta_{\mu}\right) \quad (16)$$
Considering the specific performance constraints of the actuator and adding a double truncation (saturation) constraint to the actor network output, the final action is represented as

$$a = \operatorname{clip}\left(\mu\left(s \middle| \theta_{\mu}\right),\, F_{\min},\, F_{\max}\right) \quad (17)$$

In this study, based on the saturation constraint of the actuator, we set Fmin = −3 kN and Fmax = 3 kN.
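The saturation constraint can be applied directly to the commanded force, as in the following minimal sketch.

```python
import numpy as np

F_MIN, F_MAX = -3000.0, 3000.0  # actuator saturation limits, N (-3 kN to 3 kN)

def saturate_action(actor_output_N):
    """Double-truncation (saturation) constraint on the commanded control force."""
    return float(np.clip(actor_output_N, F_MIN, F_MAX))
```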
4.3. Reward
The reward signal is used to define the objective that the agent should maximize to
guide the agent’s exploration to obtain a better policy from a global level. The reward
function can be somewhat analogous to the objective function of a traditional control prob-
lem, so reward design is an important issue in DRL. In the design of an active suspension
controller considering delay, the control performance requirements of multiple objectives
need to be considered.
The first issue is the ride comfort, which is closely related to the vertical acceleration
of the vehicle body. Therefore, a suitable active control force is needed to mitigate the impact of road unevenness on the body and to reduce the vertical acceleration.
Secondly, the practical constraints of suspension travel need to be considered. Dynamic
suspension travel needs to satisfy the following inequalities within a safe range:

$$|x_b - x_u| \le f_d \quad (18)$$

where fd is the maximum travel value of the suspension, and in this study, let fd = 0.15 m.
The value is determined by referring to the literature [52] and considering the actual
constraints of the suspension obtained.
Then, the grounding of the suspension needs to be considered, i.e., the following
inequalities need to be satisfied to ensure the handling stability of the vehicle:

$$\left|k_u\left(x_u - w\right)\right| < F_m \quad (19)$$

where Fm is the static load of the tire, and its calculation formula is
Fm = (mb + mu ) g (20)
Finally, the control characteristics of the actuator need to be considered. The actuator
delay τa has a close relationship with the total delay of the whole system, so in order to
suppress the effect of delay at the physical level, we should ensure that the control force remains relatively stable within a small interval as much as possible.
In summary, the reward function is defined as
$$r = -\left(k_1\left|\ddot{x}_b\right|^2 + k_2\left|x_b - x_u\right|^2 + k_3\left|x_u - w\right|^2 + k_4\left|F\right|^2\right) \quad (21)$$
where k1 = 0.7, k2 = 0.1, k3 = 0.1, k4 = 0.1 are the weight coefficients of the balanced
multi-objective optimization problem.
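A minimal sketch of Equation (21) is given below; it only shows the structure of the reward, and it assumes the individual terms have already been scaled to comparable magnitudes before the weights are applied.

```python
def reward(x_b_ddot, x_b, x_u, w, F, k=(0.7, 0.1, 0.1, 0.1)):
    """Multi-objective reward of Equation (21): penalize body acceleration,
    suspension travel, tire deflection, and control effort."""
    k1, k2, k3, k4 = k
    return -(k1 * abs(x_b_ddot) ** 2
             + k2 * abs(x_b - x_u) ** 2
             + k3 * abs(x_u - w) ** 2
             + k4 * abs(F) ** 2)
```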
It should be noted that the agent’s reward function in the training phase references
the state information of the system after the control force has been applied after the delay.
In contrast, the state referenced by the actor network and critic network is the current state.
In other words, the delayed control system in the experimental phase is equivalent to an
open-loop control system.
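In practice, the delay link added to the training environment can be realized by buffering commanded forces in a first-in, first-out queue, so that the force applied at the current step was issued τ + τa earlier. The sketch below is one such realization, assuming the delay is expressed as an integer number of 10 ms control steps; it is not the authors' implementation.

```python
from collections import deque

class DelayedActuator:
    """FIFO buffer that releases each commanded force `delay_steps` control
    periods after it was issued, emulating the total delay tau + tau_a."""

    def __init__(self, delay_steps=2, initial_force=0.0):
        self.queue = deque([initial_force] * delay_steps)

    def push(self, commanded_force):
        self.queue.append(commanded_force)   # newest command enters the buffer
        return self.queue.popleft()          # oldest command is applied now


# Example: a 20 ms delay with a 10 ms sample time is two buffered steps.
actuator = DelayedActuator(delay_steps=2)
applied = [actuator.push(f) for f in (100.0, 200.0, 300.0)]  # -> [0.0, 0.0, 100.0]
```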
we established a severe operating condition with uncertain delay to test the anti-disturbance
performance of the proposed algorithm and its improvement of ride comfort.
The critic and actor networks used for the agent were specifically designed for the control of active suspension systems, as shown in Figure 5, and the hyperparameters used to train the networks are shown in Table 2. In order to better verify the performance of the proposed algorithm in this study, we chose the active suspension DDPG control architecture proposed in the literature [53] as a baseline for comparison. The hyperparameters of the baseline algorithm were used in this study, and additional hyperparameters were selected based on the original TD3 algorithm. In addition, we performed combinatorial experiments on some of the hyperparameters; see Appendix B.
Figure 5. Critic and actor network. (a) Network architecture created for the critic, and (b) network architecture created for the actor. It should be noted that the last fully connected layer in the critic network directly outputs the result without the need for activation.
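The excerpt fixes only one architectural detail of Figure 5, namely that the last fully connected layer of the critic outputs the value directly without activation. The sketch below therefore assumes the remaining layer sizes and a tanh-scaled actor output; these are placeholders, not the exact architecture of Figure 5.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, F_MAX = 4, 1, 3000.0  # dimensions follow Eqs. (14)-(15); F_MAX in N
HIDDEN = 128                                 # assumed width, not taken from Figure 5

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),  # last FC layer outputs Q directly, no activation
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),  # bounded output
        )

    def forward(self, state):
        return F_MAX * self.net(state)  # scale to the +/- 3 kN actuator range
```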
Table 2. Agent hyperparameters.

Item                Hyperparameter                   Value
Critic              Learning rate                    1 × 10^-3
                    Gradient threshold               1
                    L2 regularization factor         1 × 10^-4
Actor               Learning rate                    1 × 10^-2
                    Gradient threshold               1
Agent               Sample time                      0.01
                    Target smoothing factor          1 × 10^-3
                    Experience buffer length         1 × 10^6
                    Discount factor                  0.99
                    Mini-batch size                  128
                    Soft update factor               1 × 10^-3
                    Delayed update frequency         2
                    Noise clipping                   0.5
                    Noise variance                   0.6
                    Decay rate of noise variance     1 × 10^-5
Training process    Max episodes                     2000
                    Max steps                        1000
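The training schedule of Table 2 maps onto a conventional episodic loop, sketched below; the env and agent objects and the per-step noise-decay rule are assumptions made for illustration.

```python
# Episodic training schedule following Table 2 (structure only).
MAX_EPISODES, MAX_STEPS, SAMPLE_TIME = 2000, 1000, 0.01
NOISE_VAR, NOISE_DECAY, NOISE_MIN = 0.6, 1e-5, 0.01  # NOISE_MIN is an assumption

def train(env, agent):
    noise_var = NOISE_VAR
    for episode in range(MAX_EPISODES):
        state = env.reset()
        for step in range(MAX_STEPS):
            action = agent.act(state, noise_var)          # exploration noise added here
            next_state, reward, done = env.step(action)   # one 10 ms simulation step
            agent.remember(state, action, reward, next_state)
            agent.update()                                 # TD3 update, Eqs. (9)-(12)
            noise_var = max(NOISE_MIN, noise_var - NOISE_DECAY)
            state = next_state
            if done:                                       # e.g., constraint (18) violated
                break
```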
The results of the proposed algorithm with different time delays in the simulation environment were compared with the passive results, as tabulated in Table 3. The simulation comparison results of the body acceleration and the frequency response are shown in Figure 6. In addition, we compared the proposed algorithm with the most classical and effective DDPG algorithm [53]; the comparison results are also presented in the graphs.

Table 3. RMS values of the body acceleration with deterministic time delay.

Delay     Controller    Body acceleration RMS (m/s²)    Optimization vs. passive
—         Passive       1.8778                          —
10 ms     Proposed      1.0595                          +43.58%
10 ms     DDPG          0.7772                          +58.61%
20 ms     Proposed      1.0347                          +44.9%
20 ms     DDPG          1.3901                          +25.97%
30 ms     Proposed      1.2716                          +32.28%
30 ms     DDPG          1.5439                          +17.78%

As we can see from the graphs, the proposed control algorithm optimized the ride comfort by 43.58%, 44.9%, and 32.28% for 10 ms, 20 ms, and 30 ms deterministic delays, respectively, compared to the passive suspension. Although the control performance of the proposed algorithm is slightly inferior to that of DDPG under the low-latency condition of 10 ms, the control performance of the proposed algorithm improves by 25.56% compared to that of DDPG under the latency condition of 20 ms. Further, under the large delay condition of 30 ms, the proposed algorithm still maintains an optimization result of 32.28%, compared to DDPG, which cannot maintain stability and crashes. The above results clearly demonstrate the superior control performance of the proposed algorithm. Although the proposed algorithm exhibited good control performance at deterministic delays, realistic control systems did not always have deterministic delays. Deterministic delay is equivalent to adding a deterministic dimension to the overall control system, which is still solvable to some extent. Therefore, the study of delayed control systems requires more in-depth discussion and analysis.
Figure 6. Control performance with deterministic time delay. (a–c) Vehicle body acceleration curves under 10 ms, 20 ms, and 30 ms delay, respectively. (d) Frequency response of $\ddot{x}_b/w$.
$$\tau_a = \begin{cases} \delta, & 0 < |\Delta F| \le f \\ \;\vdots & \end{cases}$$

where δ is the unit delay amount, and its value was taken as 10 ms in this study. f is the unit amount determined according to the maximum limiting control force of the actuator, and its value was taken as 2 kN in this study. This value is chosen by considering the bandwidth of the active suspension actuator and can actually be obtained by testing the actuation force response characteristics of the active suspension actuator under different loads.
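One plausible reading of the step-graded rule is that the actuator delay grows by one unit delay δ for each additional unit f of requested force change, as sketched below; the exact grading used in the study may differ.

```python
import math

DELTA = 0.010    # unit delay, s (10 ms)
F_UNIT = 2000.0  # unit force change, N (2 kN)

def graded_actuator_delay(delta_force_N):
    """Semi-regular delay: the actuator delay is a multiple of the unit delay,
    graded by the magnitude of the requested change in control force."""
    if delta_force_N == 0.0:
        return 0.0
    steps = math.ceil(abs(delta_force_N) / F_UNIT)  # assumed grading rule
    return steps * DELTA
```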
The semi-regular delay condition was based on the fuzzy relationship between actuator delay and actuation capacity, and the delay was graded in steps. It can make the system simple while retaining the actuator's role to a greater extent, so this condition had stronger practical significance for testing the control algorithm. The body acceleration and frequency response with the semi-regular delay condition are shown in Figure 7.
Figure 7. Control performance with semi-regular time delay. (a) Vehicle body acceleration curve under semi-regular time delay. (b) Frequency response of $\ddot{x}_b/w$.
In the semi-regular delay condition, the RMS result of the proposed algorithm is
0.9529 m/s2 , and the ride comfort is optimized by 44.13%. In comparison, the RMS value
under the DDPG baseline control is 1.1321 m/s2 , while the control performance of the
proposed algorithm exceeds the DDPG baseline by 15.8%. We can see that the proposed
algorithm still maintained good control performance in the operating conditions where the
fuzzy characteristics of the actuator were considered.
Figure 8. Control performance with uncertain time delay. (a) Vehicle body acceleration curve under uncertain time delay. (b) Frequency response of $\ddot{x}_b/w$.
It should be noted that in the comparative experiments, the DDPG algorithm was unable to complete the full verification due to triggering the termination condition within
the episode described in Equation (18) under the conditions of a 30 ms large delay, a semi-
regular delay, and an uncertain delay. In order to conduct the comparative experiments, we
had to remove the relevant termination conditions. In comparison, the proposed algorithm
consistently did not trigger the termination condition, ensuring the safety of the driving
process. Furthermore, in order to verify the generalization performance of the proposed
algorithm in different environments, we conducted experiments by varying the speed. The
control results are shown in Table 4. The table indicates that the proposed algorithm shows
good control performance at different speeds.
6. Discussion
In this study, we made some beneficial attempts using DRL to solve the challenging
problem of time delay control in active suspension systems. We set deterministic, semi-
regular, and uncertain time delays to simulate the changes from ideal working conditions to
real working conditions and then to harsh working conditions, thereby testing the control
performance of the proposed algorithm. Under deterministic delay, the proposed algorithm
demonstrated good control performance at working conditions of 10 ms, 20 ms, and 30 ms,
surpassing the DDPG baseline and maintaining good stability even under larger time
delays. In addition, the proposed algorithm effectively suppressed the first resonance peak
of road excitation and body, and improved ride comfort. The proposed algorithm includes
predictions of future rewards, thus possessing stronger robustness to a certain extent.
This condition corresponds to a relatively ideal working environment for the actuator,
where stable fixed delay is desirable. However, under actual conditions, system delay is
often closely related to the actuator’s manufacturing capability and response. Therefore,
we designed semi-regular delay conditions to simply simulate this characteristic. The
simulation results also reflected the good control performance of the proposed algorithm
and its improvement in ride comfort. We believe that these results are due to the fact
that the proposed algorithm, based on predicting the future, has imposed planning on the
output of control force, keeping it within a small range of variation that better aligns with
the actuator’s response characteristics. Furthermore, uncertain conditions are relatively
harsh working conditions, and it is necessary to conduct simulations under such conditions
to better test the performance of the proposed algorithm. It can be seen that under such
conditions, the proposed algorithm can still maintain a 37.56% optimization effect. We
believe this is because, for the infinite-dimensional delay control system, the data-driven
algorithm bypasses the analysis at the high-dimensional level and directly performs end-
to-end analysis. To a certain extent, it corresponds to re-architecting a solver that simulates
a high-dimensional environment, and this approach is undoubtedly novel and effective.
Of course, further research is needed to verify its effectiveness. It is encouraging that
Baek et al. [54] and Li et al. [55] have attempted to apply DRL to robotics, Zhu et al. [56]
have applied it in Mobile Sensor Networks, and Chen et al. [57] have generalized it even
more to continuous control. At the same time, it is also important for us to focus on choosing
more suitable algorithms for the control of delay systems. In recent years, PPO [58] has been
widely applied in the industry due to its robustness and performance. The characteristics
of PPO in parameter tuning and policy updating provide us with new ideas for our future
research, which will be a valuable direction for future studies.
Furthermore, the generalization ability of the algorithm has always been a contro-
versial issue for learning-based algorithms. For this reason, we conducted simulations at
different speeds, and the results showed that the proposed algorithm maintained over 30%
comfort optimization from 10 m/s to 50 m/s, which covers almost all driving speeds in
reality. Moreover, to apply the proposed algorithm in the real world, the complexity of the
algorithm must be considered, and its real-time calculation performance must be examined.
The trained DRL controller has 33,537 parameters, and it only takes 5.6 ms to compute on a
typical PC (Intel Core i9-12900KF, 16 GB RAM). Therefore, the proposed controller will not
be a problem in terms of real-time implementation.
7. Conclusions
This paper proposed an active suspension DRL control algorithm considering time
delay to study the uncertain delay problem in the actual control system of active suspen-
sion. Firstly, a dynamics model of the active suspension system considering time delay
was established. Secondly, the TD3 algorithm was enhanced by incorporating delay, en-
abling the agent to explore more robust policies. Finally, simulation experiments were
conducted under three different experimental conditions: deterministic delay, semi-regular
delay, and uncertain delay. The proposed algorithm’s control performance was evalu-
ated, and experimental validation was performed at various speeds. The results illustrate
the algorithm’s effectiveness in mitigating the impact of uncertain delay on the active
suspension system, resulting in significant improvements in ride comfort optimization.
Specifically, the proposed algorithm achieved comfort optimization rates of 43.58%, 44.9%,
and 32.28% for deterministic delays of 10 ms, 20 ms, and 30 ms, respectively. Additionally,
it obtained optimization rates of 44.13% and 37.56% for semi-regular and uncertain delay
conditions, respectively. Furthermore, when compared to the DDPG baseline algorithm, the
proposed algorithm demonstrates excellent stability and convergence even under complex
delay conditions.
Despite satisfactory results in the current research, the important characteristic of
time delay still requires further investigation. In future work, we aim to enhance our
understanding of the relationship between delay and control performance by incorporating
an integrated system model that accounts for actuator dynamics into the DRL environment.
By relying on a comprehensive model environment, the agent can derive improved control
policies that are better suited for real-world vehicle deployment scenarios.
Author Contributions: Conceptualization, Y.W., C.W., S.Z. and K.G.; methodology, Y.W. and C.W.;
software, Y.W.; validation, Y.W. and S.Z.; formal analysis, Y.W.; investigation, Y.W.; resources, Y.W.;
data curation, Y.W.; writing—original draft preparation, C.W.; writing—review and editing, Y.W. and
S.Z.; visualization, Y.W.; supervision, K.G.; project administration, K.G.; funding acquisition, K.G. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by The National Key Research and Development Program of
China (Grant No. 2022YFB3206602) and China Postdoctoral Science Foundation Funded Project
(Grant No. 2022M720433).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors sincerely thank the anonymous reviewers for their critical comments
and suggestions for improving the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
Neuronal assemblies    32–64     64–128    128–256    256–512
Optimization           24.99%    30.30%    35.62%     32.82%

Table A4. Critic network and actor network learning rates. The first row is the critic network learning rate, and the first column is the actor network learning rate.
References
1. Yan, G.H.; Wang, S.H.; Guan, Z.W.; Liu, C.F. PID Control Strategy of Vehicle Active Suspension Based on Considering Time-Delay
and Stability. Adv. Mater. Res. 2013, 706–708, 901–906. [CrossRef]
2. Xu, J.; Chung, K.W. Effects of Time Delayed Position Feedback on a van Der Pol–Duffing Oscillator. Phys. D Nonlinear Phenom.
2003, 180, 17–39. [CrossRef]
3. Zhang, H.; Wang, X.-Y.; Lin, X.-H. Topology Identification and Module–Phase Synchronization of Neural Network with Time
Delay. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 885–892. [CrossRef]
4. Min, H.; Lu, J.; Xu, S.; Duan, N.; Chen, W. Neural Network-Based Output-Feedback Control for Stochastic High-Order Non-Linear
Time-Delay Systems with Application to Robot System. IET Control. Theory Appl. 2017, 11, 1578–1588. [CrossRef]
5. Chen, X.; Leng, S.; He, J.; Zhou, L.; Liu, H. The Upper Bounds of Cellular Vehicle-to-Vehicle Communication Latency for
Platoon-Based Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6874–6887. [CrossRef]
6. Li, J.; Liu, X.; Xiao, M.; Lu, G. A Planning Control Strategy Based on Dynamic Safer Buffer to Avoid Traffic Collisions in an
Emergency for CAVs at Nonsignalized Intersections. J. Transp. Eng. Part A Syst. 2023, 149, 04023066. [CrossRef]
7. Xu, L.; Ma, J.; Zhang, S.; Wang, Y. Car Following Models for Alleviating the Degeneration of CACC Function of CAVs in Weak
Platoon Intensity. Transp. Lett. 2023, 15, 1–13. [CrossRef]
8. Samiayya, D.; Radhika, S.; Chandrasekar, A. An Optimal Model for Enhancing Network Lifetime and Cluster Head Selection
Using Hybrid Snake Whale Optimization. Peer-to-Peer Netw. Appl. 2023, 16, 1959–1974. [CrossRef]
Sensors 2023, 23, 7827 19 of 20
9. Reddy, P.Y.; Saikia, L.C. Hybrid AC/DC Control Techniques with Improved Harmonic Conditions Using DBN Based Fuzzy
Controller and Compensator Modules. Syst. Sci. Control Eng. 2023, 11, 2188406. [CrossRef]
10. Wang, R.; Jorgensen, A.B.; Liu, W.; Zhao, H.; Yan, Z.; Munk-Nielsen, S. Voltage Balancing of Series-Connected SiC Mosfets with
Adaptive-Impedance Self-Powered Gate Drivers. IEEE Trans. Ind. Electron. 2023, 70, 11401–11411. [CrossRef]
11. Klockiewicz, Z.; Slaski, G. Comparison of Vehicle Suspension Dynamic Responses for Simplified and Advanced Adjustable
Damper Models with Friction, Hysteresis and Actuation Delay for Different Comfort-Oriented Control Strategies. Acta Mech.
Autom. 2023, 17, 1–15. [CrossRef]
12. Ji, G.; Li, S.; Feng, G.; Wang, H. Enhanced Variable Universe Fuzzy Control of Vehicle Active Suspension Based on Adaptive
Contracting-Expanding Factors. Int. J. Fuzzy Syst. 2023, 1–15. [CrossRef]
13. Han, S.-Y.; Zhang, C.-H.; Tang, G.-Y. Approximation Optimal Vibration for Networked Nonlinear Vehicle Active Suspension with
Actuator Time Delay. Asian J. Control. 2017, 19, 983–995. [CrossRef]
14. Lei, J. Optimal Vibration Control of Nonlinear Systems with Multiple Time-Delays: An Application to Vehicle Suspension. Integr.
Ferroelectr. 2016, 170, 10–32. [CrossRef]
15. Bououden, S.; Chadli, M.; Zhang, L.; Yang, T. Constrained Model Predictive Control for Time-Varying Delay Systems: Application
to an Active Car Suspension. Int. J. Control Autom. Syst. 2016, 14, 51–58. [CrossRef]
16. Udwadia, F.E.; Phohomsiri, P. Active Control of Structures Using Time Delayed Positive Feedback Proportional Control Designs.
Struct. Control. Health Monit. 2006, 13, 536–552. [CrossRef]
17. Pan, H.; Sun, W.; Gao, H.; Yu, J. Finite-Time Stabilization for Vehicle Active Suspension Systems with Hard Constraints. IEEE
Trans. Intell. Transp. Syst. 2015, 16, 2663–2672. [CrossRef]
18. Yang, J.N.; Li, Z.; Danielians, A.; Liu, S.C. Aseismic Hybrid Control of Nonlinear and Hysteretic Structures I. J. Eng. Mech. 1992,
118, 1423–1440. [CrossRef]
19. Kwon, W.; Pearson, A. Feedback Stabilization of Linear Systems with Delayed Control. IEEE Trans. Autom. Control. 1980,
25, 266–269. [CrossRef]
20. Du, H.; Zhang, N. H∞ Control of Active Vehicle Suspensions with Actuator Time Delay. J. Sound Vib. 2007, 301, 236–252.
[CrossRef]
21. Li, H.; Jing, X.; Karimi, H.R. Output-Feedback-Based H∞ Control for Vehicle Suspension Systems with Control Delay. IEEE
Trans. Ind. Electron. 2014, 61, 436–446. [CrossRef]
22. Kim, J.; Lee, T.; Kim, C.-J.; Yi, K. Model Predictive Control of a Semi-Active Suspension with a Shift Delay Compensation Using
Preview Road Information. Control Eng. Pract. 2023, 137, 105584. [CrossRef]
23. Wu, K.; Ren, C.; Nan, Y.; Li, L.; Yuan, S.; Shao, S.; Sun, Z. Experimental Research on Vehicle Active Suspension Based on
Time-Delay Control. Int. J. Control 2023, 96, 1–17. [CrossRef]
24. Li, G.; Huang, Q.; Hu, G.; Ding, R.; Zhu, W.; Zeng, L. Semi-Active Fuzzy Cooperative Control of Vehicle Suspension with a
Magnetorheological Damper. J. Intell. Mater. Syst. Struct. 2023, 1045389X231157353. [CrossRef]
25. Wang, D. Adaptive Control for the Nonlinear Suspension Systems with Stochastic Disturbances and Unknown Time Delay. Syst.
Sci. Control Eng. 2022, 10, 208–217. [CrossRef]
26. Zhang, Z.; Dong, J. A New Optimization Control Policy for Fuzzy Vehicle Suspension Systems Under Membership Functions
Online Learning. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 3255–3266. [CrossRef]
27. Xie, Z.; You, W.; Wong, P.K.; Li, W.; Ma, X.; Zhao, J. Robust Fuzzy Fault Tolerant Control for Nonlinear Active Suspension Systems
via Adaptive Hybrid Triggered Scheme. Int. J. Adapt. Control Signal Process. 2023, 37, 1608–1627. [CrossRef]
28. Sakthivel, R.; Shobana, N.; Priyanka, S.; Kwon, O.M. State Observer-Based Predictive Proportional-Integral Tracking Control for
Fuzzy Input Time-Delay Systems. Int. J. Robust Nonlinear Control 2023, 33, 6052–6069. [CrossRef]
29. Gu, B.; Cong, J.; Zhao, J.; Chen, H.; Fatemi Golshan, M. A Novel Robust Finite Time Control Approach for a Nonlinear Disturbed
Quarter-Vehicle Suspension System with Time Delay Actuation. Automatika 2022, 63, 627–639. [CrossRef]
30. Ma, X.; Wong, P.K.; Li, W.; Zhao, J.; Ghadikolaei, M.A.; Xie, Z. Multi-Objective H2/H∞ Control of Uncertain Active Suspension
Systems with Interval Time-Varying Delay. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 2023, 237, 335–347. [CrossRef]
31. Lee, Y.J.; Pae, D.S.; Choi, H.D.; Lim, M.T. Sampled-Data L2–L∞ Filter-Based Fuzzy Control for Active Suspensions. IEEE Access
2023, 11, 21068–21080. [CrossRef]
32. Ma, G.; Wang, Z.; Yuan, Z.; Wang, X.; Yuan, B.; Tao, D. A Comprehensive Survey of Data Augmentation in Visual Reinforcement
Learning. arXiv 2022. [CrossRef]
33. Gao, Z.; Yan, X.; Gao, F.; He, L. Driver-like Decision-Making Method for Vehicle Longitudinal Autonomous Driving Based on
Deep Reinforcement Learning. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2022, 236, 3060–3070. [CrossRef]
34. Fares, A.; Bani Younes, A. Online Reinforcement Learning-Based Control of an Active Suspension System Using the Actor Critic
Approach. Appl. Sci. 2020, 10, 8060. [CrossRef]
35. Liu, M.; Li, Y.; Rong, X.; Zhang, S.; Yin, Y. Semi-Active Suspension Control Based on Deep Reinforcement Learning. IEEE Access
2020, 8, 9978–9986. [CrossRef]
36. Pang, H.; Luo, J.; Wang, M.; Wang, L. A Stability Guaranteed Nonfragile Fault-Tolerant Control Approach for Markov-Type
Vehicle Active Suspension System Subject to Faults and Disturbances. J. Vib. Control 2023, 10775463231160807. [CrossRef]
37. Kozek, M.; Smoter, A.; Lalik, K. Neural-Assisted Synthesis of a Linear Quadratic Controller for Applications in Active Suspension
Systems of Wheeled Vehicles. Energies 2023, 16, 1677. [CrossRef]
Sensors 2023, 23, 7827 20 of 20
38. Li, Y.; Wang, T.; Liu, W.; Tong, S. Neural Network Adaptive Output-Feedback Optimal Control for Active Suspension Systems.
IEEE Trans. Syst. Man Cybern Syst. 2022, 52, 4021–4032. [CrossRef]
39. Lin, Y.-C.; Nguyen, H.L.T.; Yang, J.-F.; Chiou, H.-J. A Reinforcement Learning Backstepping-Based Control Design for a Full
Vehicle Active Macpherson Suspension System. IET Control Theory Appl. 2022, 16, 1417–1430. [CrossRef]
40. Yong, H.; Seo, J.; Kim, J.; Kim, M.; Choi, J. Suspension Control Strategies Using Switched Soft Actor-Critic Models for Real Roads.
IEEE Trans. Ind. Electron. 2023, 70, 824–832. [CrossRef]
41. Lee, D.; Jin, S.; Lee, C. Deep Reinforcement Learning of Semi-Active Suspension Controller for Vehicle Ride Comfort. IEEE Trans.
Veh. Technol. 2023, 72, 327–339. [CrossRef]
42. Du, Y.; Chen, J.; Zhao, C.; Liao, F.; Zhu, M. A Hierarchical Framework for Improving Ride Comfort of Autonomous Vehicles via
Deep Reinforcement Learning with External Knowledge. Comput.-Aided Civ. Infrastruct. Eng. 2022, 38, 1059–1078. [CrossRef]
43. Han, S.-Y.; Liang, T. Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO
Approach. Appl. Sci. 2022, 12, 3078. [CrossRef]
44. Dridi, I.; Hamza, A.; Ben Yahia, N. A New Approach to Controlling an Active Suspension System Based on Reinforcement
Learning. Adv. Mech. Eng. 2023, 15, 16878132231180480. [CrossRef]
45. Kwok, N.M.; Ha, Q.P.; Nguyen, T.H.; Li, J.; Samali, B. A Novel Hysteretic Model for Magnetorheological Fluid Dampers and
Parameter Identification Using Particle Swarm Optimization. Sens. Actuators A Phys. 2006, 132, 441–451. [CrossRef]
46. Krauze, P.; Kasprzyk, J. Driving Safety Improved with Control of Magnetorheological Dampers in Vehicle Suspension. Appl. Sci.
2020, 10, 8892. [CrossRef]
47. Savaresi, S.M.; Spelta, C. Mixed Sky-Hook and ADD: Approaching the Filtering Limits of a Semi-Active Suspension. J. Dyn. Syst.
Meas. Control 2006, 129, 382–392. [CrossRef]
48. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep
Reinforcement Learning. arXiv 2019. [CrossRef]
49. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.;
Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [CrossRef]
50. Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI
Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [CrossRef]
51. Fujimoto, S.; Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 3 July 2018; pp. 1587–1596. [CrossRef]
52. Theunissen, J.; Sorniotti, A.; Gruber, P.; Fallah, S.; Ricco, M.; Kvasnica, M.; Dhaens, M. Regionless Explicit Model Predictive
Control of Active Suspension Systems with Preview. IEEE Trans. Ind. Electron. 2020, 67, 4877–4888. [CrossRef]
53. Liang, G.; Zhao, T.; Wei, Y. DDPG Based Self-Learning Active and Model-Constrained Semi-Active Suspension Control. In
Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31
October 2021; pp. 1–6. [CrossRef]
54. Baek, S.; Baek, J.; Choi, J.; Han, S. A Reinforcement Learning-Based Adaptive Time-Delay Control and Its Application to Robot
Manipulators. In Proceedings of the 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 2722–2729.
[CrossRef]
55. Li, S.; Ding, L.; Gao, H.; Liu, Y.-J.; Li, N.; Deng, Z. Reinforcement Learning Neural Network-Based Adaptive Control for State and
Input Time-Delayed Wheeled Mobile Robots. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 4171–4182. [CrossRef]
56. Zhu, W.; Garg, T.; Raza, S.; Lalar, S.; Barak, D.D.; Rahmani, A.W. Application Research of Time Delay System Control in Mobile
Sensor Networks Based on Deep Reinforcement Learning. Wirel. Commun. Mob. Comput. 2022, 2022, 7844719. [CrossRef]
57. Chen, B.; Xu, M.; Li, L.; Zhao, D. Delay-Aware Model-Based Reinforcement Learning for Continuous Control. Neurocomputing
2021, 450, 119–128. [CrossRef]
58. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.