Article
Research on Deep Reinforcement Learning Control Algorithm
for Active Suspension Considering Uncertain Time Delay
Yang Wang 1, *, Cheng Wang 2 , Shijie Zhao 2 and Konghui Guo 2
Abstract: The uncertain delay characteristic of actuators is a critical factor that affects the control
effectiveness of the active suspension system. Therefore, it is crucial to develop a control algorithm
that takes into account this uncertain delay in order to ensure stable control performance. This
study presents a novel active suspension control algorithm based on deep reinforcement learning
(DRL) that specifically addresses the issue of uncertain delay. In this approach, a twin-delayed
deep deterministic policy gradient (TD3) algorithm with system delay is employed to obtain the
optimal control policy by iteratively solving the dynamic model of the active suspension system,
considering the delay. Furthermore, three different operating conditions were designed for simulation
to evaluate the control performance: deterministic delay, semi-regular delay, and uncertain delay. The
experimental results demonstrate that the proposed algorithm achieves excellent control performance
under various operating conditions. Compared to the passive suspension, the body vertical
acceleration is improved by more than 30%, and the proposed algorithm effectively mitigates
body vibration in the low-frequency range. It consistently maintains a more than 30% improvement
in ride comfort optimization even under the most severe operating conditions and at different speeds,
demonstrating the algorithm’s potential for practical application.
Keywords: active suspension; deep reinforcement learning; suspension control; uncertain time delay
dynamics [11], and the other is to consider system delays during the controller design
process. Ji et al. [12] proposed an improved variable universe fuzzy control strategy
with real-time adjustment of the contracting–expanding factor parameters to improve the
ride comfort of vehicles. Some scholars have proposed the use of delay compensation
techniques to reduce or offset the negative effects of delays [13–15]. Udwadia et al. [16]
proposed the application of delayed state positive feedback proportional control to the
active control of structures. Pan et al. [17] designed a suspension delay active controller
using an adaptive control strategy. Some scholars [18,19] have also attempted to design
delayed instantaneous optimal control laws for suspension systems with delay using a
state transformation approach to ensure the stability of the system. Du et al. [20] designed
full-frequency domain state feedback controllers considering input delay for automotive
suspensions and seat suspensions and achieved good damping effects within a certain delay
range. Li et al. [21], on the other hand, proposed a time-delayed full-frequency domain
robust control method based on dynamic output feedback. In addition, Kim et al. [22]
combined perception technology and proposed a model predictive control of a semi-active
suspension with a shift delay compensation using preview road information. Wu et al. [23]
proposed a time-delay control strategy together with the idea of using a linear motor as the actuator.
Based on this idea, they developed a linear equivalent excitation method to optimize the
time-delay control parameters under complex excitation. Li et al. [24] proposed
a fuzzy cooperative control strategy based on linear matrix inequality theory to weaken
the effect of perturbations on vehicle driving conditions by integrating the problems of
variable structure, disturbance immunity, and time delay. Based on Lyapunov theory and the
backstepping technique, Wang [25] studied the adaptive control problem of nonlinear active
suspension systems with random perturbations and time delay. Moreover, some scholars
have considered the time-delay characteristics of active suspension systems and controlled
them according to the Takagi–Sugeno fuzzy model [26–28]. Further, scholars have been
searching for more robust control methods to ensure the stability and performance of the
suspension system [29–31].
Although numerous scholars have conducted extensive research on the problem of
delay, the delay control system remains enigmatic due to its infinite-dimensional nature
and the inherent uncertainty associated with delay. Therefore, further in-depth research is
essential. Simultaneously, the automotive industry is rapidly adopting intelligence as part
of its processes, which has led to the integration of artificial intelligence (AI) technology
as a solution for controlling complex and uncertain systems. Notably, deep reinforcement
learning (DRL) techniques have demonstrated significant advantages in addressing high-
dimensional problems. By combining the robust perception and information-processing
capabilities of deep learning (DL) with the decision-making proficiency of reinforcement
learning (RL) in complex environments, these techniques are gradually gaining traction in
the study of various intricate systems.
Currently, DRL has shown its advantages in vision and decision-making [32,33].
Some scholars have also tried to apply it to suspension control [34,35]. In an intuitive
application, Pang et al. [36] proposed a non-fragile fault-tolerant control design for Markov-
type systems. Providing more depth, Kozek et al. [37] proposed a neural algorithm based
on reinforcement learning to optimize the creation of a linear quadratic regulator (LQR).
In recent years, Li et al. [38] used an actor–critic architecture to study the adaptive neural
network output feedback optimal control problem. In addition, many scholars have utilized
different DRL architectures for active or semi-active suspension control. Lin et al. [39]
studied a deep deterministic policy gradient (DDPG) control strategy for a full-vehicle
active Macpherson suspension system. Yong et al. [40] proposed learning and control
strategies for a semi-active suspension system in a full car using soft actor–critic (SAC)
models on real roads. Similarly, Lee et al. [41] conducted a study on semi-active suspension
control using DRL and proposed a state-normalization filter to improve the generalization
performance. Further, Du et al. [42] proposed the utilization of external knowledge in the
DDPG framework for suspension control and integration of speed planning to ensure ride
comfort. It is worth mentioning that Han et al. [43] and Dridi et al. [44] have tried to use
a proximal policy optimization (PPO) algorithm for semi-active and active suspension
control and achieved satisfactory control results. Although there have been some useful
studies on the application of DRL to suspension control, few attempts have been made to
solve the delay problem. However, it is worth noting that in sequential decisions such as
DRL control, the delay problem can have a significant impact on the actual control effect.
Therefore, using DRL to solve the delay problem is still a new idea to be investigated. In
this study, by adding a delay link to the twin-delayed deep deterministic policy gradient
(TD3) algorithm, the agent is guided to explore the possibility of obtaining a more robust
control strategy in a time-delayed environment, and then effectively suppress the effect of
uncertain delay on the active suspension system. To summarize, the main innovations of
this study are as follows:
1. To our knowledge, this study represents the first research endeavor to employ DRL
techniques within the realm of delay control. The primary aim of this investigation is
to alleviate the repercussions of uncertain delays through the implementation of active
suspension control strategies rooted in DRL. Furthermore, this study demonstrates
the utilization of high-dimensional advantages of DRL in an infinite-dimensional
delay control system, ultimately achieving commendable results.
2. In this study, multiple sets of simulations considering deterministic delay, semi-regular
delay, and uncertain delay are designed to test the control performance of the algorithm.
Various delay characteristics, and the uncertainty they introduce into the control system,
are considered to bring the simulations closer to actual working conditions.
3. The control algorithm proposed in this study maintains good control performance
in multiple sets of the simulation built by MATLAB/Simulink for different working
conditions and speeds, proving its application potential.
This paper is organized as follows. Section 2 presents the active suspension quarter
model and road model. The proposed controller algorithm and the model design associ-
ated with it are presented in Sections 3 and 4, respectively. In Section 5, the simulation
environments with deterministic, semi-regular, and uncertain delays set up to validate
the performance of the algorithm and their results are shown. In Section 6, the simulation
results are discussed in a broader context. Finally,
Section 7 provides the conclusions of this study. A table of notations and abbreviations
used in this paper is provided in Appendix A.
The inherent delay τ and the actuator delay τa in the control system are considered. Among them, the inherent
delay is the amount of delay caused by the acquisition and transmission of signals in the
active suspension control system and by the controller operation, and the actuator delay is
the amount of delay caused by the actuator's response lag, which is unavoidable in the
control loop.
Figure 1. Dynamics model considering delay time.
According to the second class of Lagrangian equations, the kinetic equations with delayed quantities at moment t can be obtained as
$$
\begin{cases}
m_b \ddot{x}_b(t) + k_b\left(x_b(t) - x_u(t)\right) + c_b\left(\dot{x}_b(t) - \dot{x}_u(t)\right) + u(t-\tau-\tau_a) = 0 \\
m_u \ddot{x}_u(t) + k_b\left(x_u(t) - x_b(t)\right) + k_u\left(x_u(t) - w(t)\right) + c_b\left(\dot{x}_u(t) - \dot{x}_b(t)\right) - u(t-\tau-\tau_a) = 0
\end{cases}
\quad (1)
$$
where mb is the sprung mass, kg; mu is the unsprung mass, kg; kb is the spring stiffness, N/m; ku is the equivalent tire stiffness, N/m; cb is the equivalent damping factor of the suspension damping element, N·s/m; xb is the vertical displacement of the sprung mass, m; xu is the vertical displacement of the unsprung mass, m; w is the vertical tire displacement, which can also be equated to the road displacement, m; and F = u(t − τ − τa) is the active suspension actuator control force for the time-delay system, N.
Take the system state variables as

$$x(t) = \begin{bmatrix} x_b(t) & x_u(t) & \dot{x}_b(t) & \dot{x}_u(t) \end{bmatrix}^T \quad (2)$$
Take the system output variable as

$$y(t) = \begin{bmatrix} \ddot{x}_b(t) & \dot{x}_b(t) & \dot{x}_b(t) - \dot{x}_u(t) & x_b(t) - x_u(t) & x_u(t) - w(t) \end{bmatrix}^T \quad (3)$$
Then, the state-space expression of the active suspension, considering the delay, is

$$
\begin{cases}
\dot{x}(t) = Ax(t) + Bu(t-\tau-\tau_a) + Ew(t) \\
y(t) = Cx(t) + Du(t-\tau-\tau_a) + Lw(t)
\end{cases}
\quad (4)
$$
where

$$A = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -\dfrac{k_b}{m_b} & \dfrac{k_b}{m_b} & -\dfrac{c_b}{m_b} & \dfrac{c_b}{m_b} \\ \dfrac{k_b}{m_u} & -\dfrac{k_b+k_u}{m_u} & \dfrac{c_b}{m_u} & -\dfrac{c_b}{m_u} \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ -\dfrac{1}{m_b} \\ \dfrac{1}{m_u} \end{bmatrix}, \quad E = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \dfrac{k_u}{m_u} \end{bmatrix},$$

$$C = \begin{bmatrix} -\dfrac{k_b}{m_b} & \dfrac{k_b}{m_b} & -\dfrac{c_b}{m_b} & \dfrac{c_b}{m_b} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} -\dfrac{1}{m_b} \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad L = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix}.$$
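For reference, the state-space model of Equation (4) can be assembled and stepped numerically as in the following Python sketch. The parameter values are illustrative placeholders rather than the values used in this study, and a simple forward-Euler integration is assumed.

```python
import numpy as np

# Illustrative quarter-car parameters (placeholders, not the values of this study)
m_b, m_u = 300.0, 40.0        # sprung / unsprung mass, kg
k_b, k_u = 18000.0, 180000.0  # spring / tire stiffness, N/m
c_b = 1200.0                  # suspension damping, N*s/m

# State x = [x_b, x_u, x_b_dot, x_u_dot]^T, delayed input u, road disturbance w
A = np.array([
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [-k_b / m_b,  k_b / m_b, -c_b / m_b,  c_b / m_b],
    [ k_b / m_u, -(k_b + k_u) / m_u,  c_b / m_u, -c_b / m_u],
])
B = np.array([0.0, 0.0, -1.0 / m_b, 1.0 / m_u])
E = np.array([0.0, 0.0, 0.0, k_u / m_u])

# Outputs y = [x_b_ddot, x_b_dot, x_b_dot - x_u_dot, x_b - x_u, x_u - w]^T
C = np.array([
    [-k_b / m_b, k_b / m_b, -c_b / m_b, c_b / m_b],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
D = np.array([-1.0 / m_b, 0.0, 0.0, 0.0, 0.0])
L = np.array([0.0, 0.0, 0.0, 0.0, -1.0])

def step(x, u_delayed, w, dt=0.01):
    """One forward-Euler step of Equation (4) with an already-delayed input."""
    x_dot = A @ x + B * u_delayed + E * w
    y = C @ x + D * u_delayed + L * w
    return x + dt * x_dot, y
```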
Sensors 2023, 23, 7827 5 of 20
2.2. Road Model
The vehicle road model uses a filtered white-noise time-domain road input model, i.e.,

$$\dot{w}(t) = -2\pi f_0 w(t) + 2\pi \sqrt{S_q(n_0)\,v}\; w_0(t) \quad (5)$$

where w(t) is the road displacement; f0 is the lower cutoff frequency; Sq(n0) is the road unevenness coefficient, which is related to the road class, and the unevenness coefficients of class A, B, and C roads are 16, 64, and 256, respectively; v is the speed, m/s; and w0 is a uniformly distributed white noise with a mean of 0 and an intensity of 1.
The lower cutoff frequency is calculated as

$$f_0 = 2\pi n_{00} v \quad (6)$$
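The road input of Equations (5) and (6) can likewise be generated numerically. In the sketch below, the unevenness coefficient is assumed to be expressed in units of 10⁻⁶ m³ and a uniform white-noise source with unit variance is used; both are assumptions of the sketch rather than specifications from this study.

```python
import numpy as np

def road_profile(Sq_n0=64e-6, v=20.0, f0=0.1, dt=0.01, T=10.0, seed=0):
    """Filtered white-noise road input of Equation (5), integrated with forward Euler.

    Sq_n0 : road unevenness coefficient (class A/B/C values of 16/64/256 assumed
            to be in units of 1e-6 m^3)
    v     : vehicle speed, m/s
    f0    : lower cutoff frequency from Equation (6), Hz
    """
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    w = np.zeros(n)
    # zero-mean, unit-variance uniformly distributed white-noise source w0(t)
    w0 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
    for k in range(n - 1):
        w_dot = -2.0 * np.pi * f0 * w[k] + 2.0 * np.pi * np.sqrt(Sq_n0 * v) * w0[k]
        w[k + 1] = w[k] + dt * w_dot
    return w
```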
3. Controller Algorithm
This section introduces the primary algorithm employed in constructing the controller for the active suspension delay system. Initially, the fundamental principles and framework of reinforcement learning (RL) are presented, followed by an elucidation of the advantages offered by the deep reinforcement learning (DRL) algorithm compared to traditional RL algorithms. Subsequently, the TD3 algorithm, known for its suitability in continuous control systems, is selected based on the characteristics of the delay control system. The algorithmic process and technical intricacies of TD3 are then described.
Figure 3. RL basic framework.
3.2. Deep Reinforcement Learning
In RL, the agent learns a function to formulate appropriate actions to maximize the goal. However, RL has limitations when dealing with large-scale problems. For example, in cases where the state space and action space are extremely large, traditional RL algorithms may face the issues of the curse of dimensionality and computational complexity. Additionally, traditional RL algorithms may require a large number of samples and a long time to learn a good policy, which can be time-consuming and inefficient for certain tasks. With the development of DL technology, we can use deep neural networks as function-approximation methods to learn value functions that imply multidimensional information. This extends the application of RL to higher dimensions, allowing it to suppress the curse of dimensionality in traditional optimization problems to some extent.
The three primary functions that can be learned in RL align with the three primary methods in DRL, namely, policy-based, value-based, and model-based. Researchers use these three major classes of methods individually or in combination to meet the needs of practical tasks. In this study, the active suspension control system considering delay is a typical continuous-state and continuous-action control system. The classical DRL algorithm in this field is DDPG [48]. DDPG utilizes an actor–critic architecture for learning, i.e., an actor network is added to the deep Q network (DQN) [49,50] to output action values directly. It helps the agent to obtain more feedback information and thus make more accurate decisions.
3.3. Twin-Delayed Deep Deterministic Policy Gradient
Although the classical DDPG algorithm has achieved satisfactory results in many
tasks, the traditional actor–critic framework suffers from the bias and variance problems
associated with function approximation, considering the cumulative nature of the DRL
iterative process. Specifically, on the one hand, the deviation (bias) can cause overestimation, while
on the other hand, high variance can cause the accumulation of errors, which in turn makes
the system less stable. Considering the characteristics of time-delayed control systems,
high variance means more aggressive control behavior and high deviation means gradual
deviation from the steady state, and the agent gains more benefit but also takes more
risk, which is more likely to cause instability of the control system. Therefore, we need to
suppress both overestimation and cumulative error. Based on the above practical control
requirements, this study uses the TD3 algorithm [51] as the basic algorithmic framework,
i.e., adding techniques such as clipped double Q-learning, delayed policy updates, and
target policy smoothing to the DDPG framework. The framework improvement of the TD3
algorithm is shown in Figure 4.
Figure 4. TD3 algorithm improvement structure.
When building a controller using the TD3 algorithm, first, randomly initialize the critic networks Q1(s, a|θQ1), Q2(s, a|θQ2) and the actor network µ(s|θµ) with parameters θQ1, θQ2, and θµ, respectively. At the same time, we need to initialize the target networks Q′1(s, a|θ′Q1), Q′2(s, a|θ′Q2), and µ′(s|θ′µ) with parameters θ′Q1, θ′Q2, and θ′µ, respectively. Then, synchronize the parameters.
Actions are performed in the environment to obtain rewards rt and new states st+1, where the exploration noise ξt ∼ Nt(0, σ) is added to the actor output. We store these experiences in the form of samples (st, at, rt, st+1) that are transferred into the experience buffer. When the experience buffer reaches a certain number of samples, we randomly sample a mini-batch of experience–transfer samples (si, ai, ri, si+1) from it to perform parameter updates. Let

$$y_i = r_i + \gamma \min_{j=1,2} Q'_j\left(s_{i+1},\, \mu'\left(s_{i+1}\middle|\theta'_{\mu}\right) + \zeta \,\middle|\, \theta'_{Q_j}\right), \qquad \zeta \sim \operatorname{clip}\left(N\left(0, \widetilde{\sigma}\right), -c, c\right) \quad (9)$$
where γ is the discount factor and takes the value of 0.99, which is taken to be equivalent to
considering the situation after 100 time steps. c is the noise clipping, which in this study
is taken as 0.5. The clipped double Q-learning idea and target policy smoothing idea of
TD3 are reflected here. With clipped double Q-learning, the value target cannot introduce
any additional overestimation bias using the standard Q-learning target. While this update
rule may induce an underestimation bias, this is far preferable to overestimation bias, as
unlike overestimated actions, the value of underestimated actions will not be explicitly
propagated through the policy update [51]. Moreover, the effect of inaccuracy caused by
function approximation error can be effectively reduced by target policy smoothing.
Update the parameters θQj of the critic network according to the minimization loss function, i.e.,

$$\nabla_{\theta_{Q_j}} L\left(\theta_{Q_j}\right) = \frac{1}{N}\sum_i \left(y_i - Q_j\left(s_i, a_i \middle| \theta_{Q_j}\right)\right)^2 \nabla_{\theta_{Q_j}} Q_j\left(s_i, a_i \middle| \theta_{Q_j}\right), \quad j = 1, 2 \quad (10)$$
where N is the size of the mini-batch sample, whose value we take as 128 in this study. According to our experience, the size of the mini-batch should be related to the complexity of the problem being studied. The parameters θµ of the actor network are updated according to the objective maximization function, i.e.,

$$\nabla_{\theta_{\mu}} J(\mu) = \frac{1}{N}\sum_i \nabla_{a} Q\left(s, a \middle| \theta_{Q}\right)\Big|_{s=s_i,\, a=\mu(s_i)} \cdot \nabla_{\theta_{\mu}} \mu\left(s \middle| \theta_{\mu}\right)\Big|_{s_i} \quad (11)$$
Finally, in order to avoid unknown oscillations affecting convergence during gradient
descent to ensure that the update of the network can balance stability and rapidity, we
perform the soft update process for the network parameters, i.e.,

$$\theta'_{Q_j} \leftarrow \varepsilon\, \theta_{Q_j} + (1-\varepsilon)\, \theta'_{Q_j}, \qquad \theta'_{\mu} \leftarrow \varepsilon\, \theta_{\mu} + (1-\varepsilon)\, \theta'_{\mu} \quad (12)$$

where ε is the soft update factor. Since the delay control system is a high-dimensional unstable system, ε = 0.001 is taken in this study to satisfy the robustness of the policy to some extent.
In addition, the critic network is updated twice as often as the actor network in order
to first minimize the errors induced by the value estimation before introducing policy
updates. Soft updating of network parameters and delayed policy updating ensure target
stabilization and thus reduce error increase. TD3 is summarized in Algorithm 1.
Algorithm 1 TD3
1: Randomly initialize critic networks Q1(s, a|θQ1), Q2(s, a|θQ2), and actor network µ(s|θµ) with random parameters θQ1, θQ2, and θµ.
2: Initialize target networks θQ1 → θ′Q1, θQ2 → θ′Q2, θµ → θ′µ.
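As an illustration of Equations (9)–(12) and Algorithm 1, the following Python (PyTorch) sketch performs one TD3 update step. The network widths, the tanh output scaling of the actor, and the target-smoothing noise scale are assumptions of the sketch, while the learning rates, discount factor, soft update factor, noise clipping, and policy-update delay follow Table 2; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    # Small fully connected network; the hidden width is an assumption of this sketch.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class TD3Sketch:
    def __init__(self, s_dim=4, a_dim=1, a_max=3000.0, gamma=0.99,
                 eps=1e-3, sigma_tilde=0.2, c=0.5, policy_delay=2):
        self.actor, self.actor_t = mlp(s_dim, a_dim), mlp(s_dim, a_dim)
        self.actor_t.load_state_dict(self.actor.state_dict())
        self.critics = nn.ModuleList([mlp(s_dim + a_dim, 1) for _ in range(2)])
        self.critics_t = nn.ModuleList([mlp(s_dim + a_dim, 1) for _ in range(2)])
        self.critics_t.load_state_dict(self.critics.state_dict())
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=1e-2)
        self.opt_c = torch.optim.Adam(self.critics.parameters(), lr=1e-3)
        self.gamma, self.eps, self.sigma, self.c = gamma, eps, sigma_tilde, c
        self.a_max, self.policy_delay, self.n_updates = a_max, policy_delay, 0

    def act(self, s):
        # Deterministic policy output scaled to the actuator range (tanh scaling assumed).
        return self.a_max * torch.tanh(self.actor(s))

    def update(self, s, a, r, s_next):
        # Equation (9): clipped double-Q target with target-policy smoothing noise.
        with torch.no_grad():
            noise = torch.clamp(torch.randn_like(a) * self.sigma, -self.c, self.c)
            a_next = torch.clamp(self.a_max * torch.tanh(self.actor_t(s_next)) + noise,
                                 -self.a_max, self.a_max)
            q1_t, q2_t = (qt(torch.cat([s_next, a_next], dim=1)) for qt in self.critics_t)
            y = r + self.gamma * torch.min(q1_t, q2_t)

        # Equation (10): update both critics by minimizing the TD error.
        critic_loss = sum(((y - q(torch.cat([s, a], dim=1))) ** 2).mean()
                          for q in self.critics)
        self.opt_c.zero_grad(); critic_loss.backward(); self.opt_c.step()

        # Equation (11) with delayed policy updates (every `policy_delay` critic updates).
        self.n_updates += 1
        if self.n_updates % self.policy_delay == 0:
            actor_loss = -self.critics[0](torch.cat([s, self.act(s)], dim=1)).mean()
            self.opt_a.zero_grad(); actor_loss.backward(); self.opt_a.step()
            # Equation (12): soft update of all target networks.
            with torch.no_grad():
                for net, net_t in [(self.actor, self.actor_t), (self.critics, self.critics_t)]:
                    for p, p_t in zip(net.parameters(), net_t.parameters()):
                        p_t.mul_(1.0 - self.eps).add_(self.eps * p)
```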
4. Controller Model
Solving a new problem using DRL involves the creation of an environment, so in this
section, we focus on the environment components involved in the controller model, namely,
the state, action, reward, and transfer function. Among them, the transfer function of DRL
is the dynamics model introduced in Section 2, which will not be repeated here.
4.1. State
The state is the information that describes the environment, and the RL environment
must provide the algorithm with enough information to solve the problem. In the active
suspension delay control system, multidimensional information such as displacement,
velocity, acceleration, control force, and time are included. These high-level states are the
raw states. The raw state should contain all the information relevant to the problem, but it
is often difficult to learn because it usually has a lot of redundancy and background noise.
Raw and complete information provides more freedom, but extracting and interpreting
useful signals from it is a much heavier burden. Therefore, the selection and design of the
state space becomes particularly important.
Based on the important suspension performance parameters included in the dynamics
model, the following state variables are chosen to characterize the state space considering
the control performance requirements of the active suspension. The designed state space
takes into account both real-world constraints and observability cases under the influence
of sensor arrangements.
$$s = \begin{bmatrix} \ddot{x}_b & \dot{x}_b & \dot{x}_b - \dot{x}_u & x_b - x_u \end{bmatrix}^T \quad (13)$$
The state information takes into account the ride comfort, actuator efficiency, and
suspension travel of the suspension system. This state contains much less information than
the original state, but it contains more straightforward information for the algorithm and is
easier to learn.
Further, the agent preprocesses the designed states for its own use. In order to
improve the generalization performance of the controller model, this study normalizes the
states, and the final state vector is represented as
$$s = \begin{bmatrix} \dfrac{\ddot{x}_b}{\lambda_1} & \dfrac{\dot{x}_b}{\lambda_2} & \dfrac{\dot{x}_b - \dot{x}_u}{\lambda_3} & \dfrac{x_b - x_u}{\lambda_4} \end{bmatrix}^T \quad (14)$$
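For illustration, the normalization of Equation (14) amounts to an element-wise scaling; the factors λ1–λ4 are not given in this excerpt, so the values in the sketch below are placeholders.

```python
import numpy as np

# Placeholder scales for [x_b_ddot, x_b_dot, x_b_dot - x_u_dot, x_b - x_u]
LAMBDA = np.array([10.0, 1.0, 1.0, 0.1])  # assumed orders of magnitude, not the paper's values

def normalize_state(raw_state):
    """Scale each observed channel to a comparable range (Equation (14))."""
    return np.asarray(raw_state, dtype=float) / LAMBDA
```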
4.2. Action
Action is the output of the agent, which is the amount of control output by the
controller. Actions change the environment by transitioning the dynamics model to the
next state and into the next round of iterations. How the action is designed affects the ease
of control of the system and thus directly affects the problem’s difficulty. In this study, the
action is the control force of the active suspension actuator, i.e.,

$$a_t = \mu\left(s_t \middle| \theta_{\mu}\right) \quad (15)$$

where µ(s|θµ) is the actor network. It should be noted that due to the delay, the control
force in the actual system often appears as
$$a_{t+\tau+\tau_a} = \mu\left(s_t \middle| \theta_{\mu}\right) \quad (16)$$
Considering the specific performance constraints of the actuator and adding a double truncation (saturation) constraint to the actor network output, the final action is represented as

$$a = \operatorname{clip}\left(\mu\left(s \middle| \theta_{\mu}\right),\, F_{\min},\, F_{\max}\right) \quad (17)$$

In this study, based on the saturation constraint of the actuator, we set Fmin = −3 kN and Fmax = 3 kN.
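The saturation constraint can be applied directly to the commanded force, as in the following minimal sketch.

```python
import numpy as np

F_MIN, F_MAX = -3000.0, 3000.0  # actuator saturation limits, N (-3 kN to 3 kN)

def saturate_action(actor_output_N):
    """Double-truncation (saturation) constraint on the commanded control force."""
    return float(np.clip(actor_output_N, F_MIN, F_MAX))
```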
4.3. Reward
The reward signal is used to define the objective that the agent should maximize to
guide the agent’s exploration to obtain a better policy from a global level. The reward
function can be somewhat analogous to the objective function of a traditional control prob-
lem, so reward design is an important issue in DRL. In the design of an active suspension
controller considering delay, the control performance requirements of multiple objectives
need to be considered.
The first issue is the ride comfort, which is closely related to the vertical acceleration
of the vehicle body. Therefore, a suitable active control force is needed to mitigate the impact of road unevenness on the body and to reduce the vertical acceleration.
Secondly, the practical constraints of suspension travel need to be considered. Dynamic
suspension travel needs to satisfy the following inequalities within a safe range:

$$|x_b - x_u| \le f_d \quad (18)$$

where fd is the maximum travel value of the suspension, and in this study, let fd = 0.15 m.
The value is determined by referring to the literature [52] and considering the actual
constraints of the suspension obtained.
Then, the grounding of the suspension needs to be considered, i.e., the following
inequalities need to be satisfied to ensure the handling stability of the vehicle:

$$\left|k_u\left(x_u - w\right)\right| < F_m \quad (19)$$

where Fm is the static load of the tire, and its calculation formula is
Fm = (mb + mu ) g (20)
Finally, the control characteristics of the actuator need to be considered. The actuator
delay τa has a close relationship with the total delay of the whole system, so in order to
suppress the effect of delay at the physical level, we should ensure that the control force remains relatively stable within a small interval as much as possible.
In summary, the reward function is defined as
$$r = -\left(k_1\left|\ddot{x}_b\right|^2 + k_2\left|x_b - x_u\right|^2 + k_3\left|x_u - w\right|^2 + k_4\left|F\right|^2\right) \quad (21)$$
where k1 = 0.7, k2 = 0.1, k3 = 0.1, k4 = 0.1 are the weight coefficients of the balanced
multi-objective optimization problem.
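A minimal sketch of Equation (21) is given below; it only shows the structure of the reward, and it assumes the individual terms have already been scaled to comparable magnitudes before the weights are applied.

```python
def reward(x_b_ddot, x_b, x_u, w, F, k=(0.7, 0.1, 0.1, 0.1)):
    """Multi-objective reward of Equation (21): penalize body acceleration,
    suspension travel, tire deflection, and control effort."""
    k1, k2, k3, k4 = k
    return -(k1 * abs(x_b_ddot) ** 2
             + k2 * abs(x_b - x_u) ** 2
             + k3 * abs(x_u - w) ** 2
             + k4 * abs(F) ** 2)
```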
It should be noted that the agent’s reward function in the training phase references
the state information of the system after the control force has been applied after the delay.
In contrast, the state referenced by the actor network and critic network is the current state.
In other words, the delayed control system in the experimental phase is equivalent to an
open-loop control system.
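In practice, the delay link added to the training environment can be realized by buffering commanded forces in a first-in, first-out queue, so that the force applied at the current step was issued τ + τa earlier. The sketch below is one such realization, assuming the delay is expressed as an integer number of 10 ms control steps; it is not the authors' implementation.

```python
from collections import deque

class DelayedActuator:
    """FIFO buffer that releases each commanded force `delay_steps` control
    periods after it was issued, emulating the total delay tau + tau_a."""

    def __init__(self, delay_steps=2, initial_force=0.0):
        self.queue = deque([initial_force] * delay_steps)

    def push(self, commanded_force):
        self.queue.append(commanded_force)   # newest command enters the buffer
        return self.queue.popleft()          # oldest command is applied now


# Example: a 20 ms delay with a 10 ms sample time is two buffered steps.
actuator = DelayedActuator(delay_steps=2)
applied = [actuator.push(f) for f in (100.0, 200.0, 300.0)]  # -> [0.0, 0.0, 100.0]
```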
we established a severe operating condition with uncertain delay to test the anti-disturbance
performance of the proposed algorithm and its improvement of ride comfort.
The critic and actor networks used for the agent were specifically designed for the control of active suspension systems, as shown in Figure 5, and the hyperparameters used to train the networks are shown in Table 2. In order to better verify the performance of the proposed algorithm in this study, we chose the active suspension DDPG control architecture proposed in the literature [53] as a baseline for comparison. The hyperparameters of the baseline algorithm were used in this study, and additional hyperparameters were selected based on the original TD3 algorithm. In addition, we performed combinatorial experiments on some of the hyperparameters; see Appendix B.
Figure 5. Critic and actor network. (a) Network architecture created for the critic, and (b) network architecture created for the actor. It should be noted that the last fully connected layer in the critic network directly outputs the result without the need for activation.
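The excerpt fixes only one architectural detail of Figure 5, namely that the last fully connected layer of the critic outputs the value directly without activation. The sketch below therefore assumes the remaining layer sizes and a tanh-scaled actor output; these are placeholders, not the exact architecture of Figure 5.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, F_MAX = 4, 1, 3000.0  # dimensions follow Eqs. (14)-(15); F_MAX in N
HIDDEN = 128                                 # assumed width, not taken from Figure 5

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),  # last FC layer outputs Q directly, no activation
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh(),  # bounded output
        )

    def forward(self, state):
        return F_MAX * self.net(state)  # scale to the +/- 3 kN actuator range
```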
Table 2. Agent hyperparameters.

Item                Hyperparameter                   Value
Critic              Learning rate                    1 × 10^-3
                    Gradient threshold               1
                    L2 regularization factor         1 × 10^-4
Actor               Learning rate                    1 × 10^-2
                    Gradient threshold               1
Agent               Sample time                      0.01
                    Target smoothing factor          1 × 10^-3
                    Experience buffer length         1 × 10^6
                    Discount factor                  0.99
                    Mini-batch size                  128
                    Soft update factor               1 × 10^-3
                    Delayed update frequency         2
                    Noise clipping                   0.5
                    Noise variance                   0.6
                    Decay rate of noise variance     1 × 10^-5
Training process    Max episodes                     2000
                    Max steps                        1000
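The training schedule of Table 2 maps onto a conventional episodic loop, sketched below; the env and agent objects and the per-step noise-decay rule are assumptions made for illustration.

```python
# Episodic training schedule following Table 2 (structure only).
MAX_EPISODES, MAX_STEPS, SAMPLE_TIME = 2000, 1000, 0.01
NOISE_VAR, NOISE_DECAY, NOISE_MIN = 0.6, 1e-5, 0.01  # NOISE_MIN is an assumption

def train(env, agent):
    noise_var = NOISE_VAR
    for episode in range(MAX_EPISODES):
        state = env.reset()
        for step in range(MAX_STEPS):
            action = agent.act(state, noise_var)          # exploration noise added here
            next_state, reward, done = env.step(action)   # one 10 ms simulation step
            agent.remember(state, action, reward, next_state)
            agent.update()                                 # TD3 update, Eqs. (9)-(12)
            noise_var = max(NOISE_MIN, noise_var - NOISE_DECAY)
            state = next_state
            if done:                                       # e.g., constraint (18) violated
                break
```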
The results of the proposed algorithm with different time delays in the simulation environment were compared with the passive results, as tabulated in Table 3. The simulation comparison results of the body acceleration and the frequency response are shown in Figure 6. In addition, we compared the proposed algorithm with the most classical and effective DDPG algorithm [53]; the comparison results are also presented in the graphs.

Table 3. RMS values of the body acceleration with deterministic time delay.

Delay     Controller    Body acceleration RMS (m/s²)    Optimization vs. passive
—         Passive       1.8778                          —
10 ms     Proposed      1.0595                          +43.58%
10 ms     DDPG          0.7772                          +58.61%
20 ms     Proposed      1.0347                          +44.9%
20 ms     DDPG          1.3901                          +25.97%
30 ms     Proposed      1.2716                          +32.28%
30 ms     DDPG          1.5439                          +17.78%

As we can see from the graphs, the proposed control algorithm optimized the ride comfort by 43.58%, 44.9%, and 32.28% for 10 ms, 20 ms, and 30 ms deterministic delays, respectively, compared to the passive suspension. Although the control performance of the proposed algorithm is slightly inferior to that of DDPG under the low-latency condition of 10 ms, the control performance of the proposed algorithm improves by 25.56% compared to that of DDPG under the latency condition of 20 ms. Further, under the large delay condition of 30 ms, the proposed algorithm still maintains an optimization result of 32.28%, compared to DDPG, which cannot maintain stability and crashes. The above results clearly demonstrate the superior control performance of the proposed algorithm. Although the proposed algorithm exhibited good control performance at deterministic delays, realistic control systems did not always have deterministic delays. Deterministic delay is equivalent to adding a deterministic dimension to the overall control system, which is still solvable to some extent. Therefore, the study of delayed control systems requires more in-depth discussion and analysis.
Figure 6. Control performance with deterministic time delay. (a–c) Vehicle body acceleration curves under 10 ms, 20 ms, and 30 ms delay, respectively. (d) Frequency response of $\ddot{x}_b/w$.
$$\tau_a = \begin{cases} \delta, & 0 < |\Delta F| \le f \\ \;\vdots & \end{cases}$$

where δ is the unit delay amount, and its value was taken as 10 ms in this study. f is the unit amount determined according to the maximum limiting control force of the actuator, and its value was taken as 2 kN in this study. This value is chosen by considering the bandwidth of the active suspension actuator and can actually be obtained by testing the actuation force response characteristics of the active suspension actuator under different loads.
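One plausible reading of the step-graded rule is that the actuator delay grows by one unit delay δ for each additional unit f of requested force change, as sketched below; the exact grading used in the study may differ.

```python
import math

DELTA = 0.010    # unit delay, s (10 ms)
F_UNIT = 2000.0  # unit force change, N (2 kN)

def graded_actuator_delay(delta_force_N):
    """Semi-regular delay: the actuator delay is a multiple of the unit delay,
    graded by the magnitude of the requested change in control force."""
    if delta_force_N == 0.0:
        return 0.0
    steps = math.ceil(abs(delta_force_N) / F_UNIT)  # assumed grading rule
    return steps * DELTA
```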
The semi-regular delay condition was based on the fuzzy relationship between actuator delay and actuation capacity, and the delay was graded in steps. It can make the system simple while retaining the actuator's role to a greater extent, so this condition had stronger practical significance for testing the control algorithm. The body acceleration and frequency response with the semi-regular delay condition are shown in Figure 7.
Figure 7. Control performance with semi-regular time delay. (a) Vehicle body acceleration curve under semi-regular time delay. (b) Frequency response of $\ddot{x}_b/w$.
In the semi-regular delay condition, the RMS result of the proposed algorithm is
0.9529 m/s2 , and the ride comfort is optimized by 44.13%. In comparison, the RMS value
under the DDPG baseline control is 1.1321 m/s2 , while the control performance of the
proposed algorithm exceeds the DDPG baseline by 15.8%. We can see that the proposed
algorithm still maintained good control performance in the operating conditions where the
fuzzy characteristics of the actuator were considered.
Figure 8. Control performance with uncertain time delay. (a) Vehicle body acceleration curve under uncertain time delay. (b) Frequency response of $\ddot{x}_b/w$.
It should be noted that in the comparative experiments, the DDPG algorithm was unable to complete the full verification due to triggering the termination condition within
the episode described in Equation (18) under the conditions of a 30 ms large delay, a semi-
regular delay, and an uncertain delay. In order to conduct the comparative experiments, we
had to remove the relevant termination conditions. In comparison, the proposed algorithm
consistently did not trigger the termination condition, ensuring the safety of the driving
process. Furthermore, in order to verify the generalization performance of the proposed
algorithm in different environments, we conducted experiments by varying the speed. The
control results are shown in Table 4. The table indicates that the proposed algorithm shows
good control performance at different speeds.
6. Discussion
In this study, we made some beneficial attempts using DRL to solve the challenging
problem of time delay control in active suspension systems. We set deterministic, semi-
regular, and uncertain time delays to simulate the changes from ideal working conditions to
real working conditions and then to harsh working conditions, thereby testing the control
performance of the proposed algorithm. Under deterministic delay, the proposed algorithm
demonstrated good control performance at working conditions of 10 ms, 20 ms, and 30 ms,
surpassing the DDPG baseline and maintaining good stability even under larger time
delays. In addition, the proposed algorithm effectively suppressed the first resonance peak
of road excitation and body, and improved ride comfort. The proposed algorithm includes
predictions of future rewards, thus possessing stronger robustness to a certain extent.
This condition corresponds to a relatively ideal working environment for the actuator,
where stable fixed delay is desirable. However, under actual conditions, system delay is
often closely related to the actuator’s manufacturing capability and response. Therefore,
we designed semi-regular delay conditions to simply simulate this characteristic. The
simulation results also reflected the good control performance of the proposed algorithm
and its improvement in ride comfort. We believe that these results are due to the fact
that the proposed algorithm, based on predicting the future, has imposed planning on the
output of control force, keeping it within a small range of variation that better aligns with
the actuator’s response characteristics. Furthermore, uncertain conditions are relatively
harsh working conditions, and it is necessary to conduct simulations under such conditions
to better test the performance of the proposed algorithm. It can be seen that under such
conditions, the proposed algorithm can still maintain a 37.56% optimization effect. We
believe this is because, for the infinite-dimensional delay control system, the data-driven
algorithm bypasses the analysis at the high-dimensional level and directly performs end-
to-end analysis. To a certain extent, it corresponds to re-architecting a solver that simulates
a high-dimensional environment, and this approach is undoubtedly novel and effective.
Of course, further research is needed to verify its effectiveness. It is encouraging that
Baek et al. [54] and Li et al. [55] have attempted to apply DRL to robotics, Zhu et al. [56]
have applied it in Mobile Sensor Networks, and Chen et al. [57] have generalized it even
more to continuous control. At the same time, it is also important for us to focus on choosing
more suitable algorithms for the control of delay systems. In recent years, PPO [58] has been
widely applied in the industry due to its robustness and performance. The characteristics
of PPO in parameter tuning and policy updating provide us with new ideas for our future
research, which will be a valuable direction for future studies.
Furthermore, the generalization ability of the algorithm has always been a contro-
versial issue for learning-based algorithms. For this reason, we conducted simulations at
different speeds, and the results showed that the proposed algorithm maintained over 30%
comfort optimization from 10 m/s to 50 m/s, which covers almost all driving speeds in
reality. Moreover, to apply the proposed algorithm in the real world, the complexity of the
algorithm must be considered, and its real-time calculation performance must be examined.
The trained DRL controller has 33,537 parameters, and it only takes 5.6 ms to compute on a
typical PC (Intel Core i9-12900KF, 16 GB RAM). Therefore, the proposed controller will not
be a problem in terms of real-time implementation.
7. Conclusions
This paper proposed an active suspension DRL control algorithm considering time
delay to study the uncertain delay problem in the actual control system of active suspen-
sion. Firstly, a dynamics model of the active suspension system considering time delay
was established. Secondly, the TD3 algorithm was enhanced by incorporating delay, en-
abling the agent to explore more robust policies. Finally, simulation experiments were
conducted under three different experimental conditions: deterministic delay, semi-regular
delay, and uncertain delay. The proposed algorithm’s control performance was evalu-
ated, and experimental validation was performed at various speeds. The results illustrate
the algorithm’s effectiveness in mitigating the impact of uncertain delay on the active
suspension system, resulting in significant improvements in ride comfort optimization.
Specifically, the proposed algorithm achieved comfort optimization rates of 43.58%, 44.9%,
and 32.28% for deterministic delays of 10 ms, 20 ms, and 30 ms, respectively. Additionally,
it obtained optimization rates of 44.13% and 37.56% for semi-regular and uncertain delay
conditions, respectively. Furthermore, when compared to the DDPG baseline algorithm, the
proposed algorithm demonstrates excellent stability and convergence even under complex
delay conditions.
Despite satisfactory results in the current research, the important characteristic of
time delay still requires further investigation. In future work, we aim to enhance our
understanding of the relationship between delay and control performance by incorporating
an integrated system model that accounts for actuator dynamics into the DRL environment.
By relying on a comprehensive model environment, the agent can derive improved control
policies that are better suited for real-world vehicle deployment scenarios.
Author Contributions: Conceptualization, Y.W., C.W., S.Z. and K.G.; methodology, Y.W. and C.W.;
software, Y.W.; validation, Y.W. and S.Z.; formal analysis, Y.W.; investigation, Y.W.; resources, Y.W.;
data curation, Y.W.; writing—original draft preparation, C.W.; writing—review and editing, Y.W. and
S.Z.; visualization, Y.W.; supervision, K.G.; project administration, K.G.; funding acquisition, K.G. All
authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by The National Key Research and Development Program of
China (Grant No. 2022YFB3206602) and China Postdoctoral Science Foundation Funded Project
(Grant No. 2022M720433).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: The authors sincerely thank the anonymous reviewers for their critical comments
and suggestions for improving the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
Neuronal assemblies    32–64     64–128    128–256    256–512
Optimization           24.99%    30.30%    35.62%     32.82%

Table A4. Critic network and actor network learning rates. The first row is the critic network learning rate, and the first column is the actor network learning rate.
References
1. Yan, G.H.; Wang, S.H.; Guan, Z.W.; Liu, C.F. PID Control Strategy of Vehicle Active Suspension Based on Considering Time-Delay
and Stability. Adv. Mater. Res. 2013, 706–708, 901–906. [CrossRef]
2. Xu, J.; Chung, K.W. Effects of Time Delayed Position Feedback on a van Der Pol–Duffing Oscillator. Phys. D Nonlinear Phenom.
2003, 180, 17–39. [CrossRef]
3. Zhang, H.; Wang, X.-Y.; Lin, X.-H. Topology Identification and Module–Phase Synchronization of Neural Network with Time
Delay. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 885–892. [CrossRef]
4. Min, H.; Lu, J.; Xu, S.; Duan, N.; Chen, W. Neural Network-Based Output-Feedback Control for Stochastic High-Order Non-Linear
Time-Delay Systems with Application to Robot System. IET Control. Theory Appl. 2017, 11, 1578–1588. [CrossRef]
5. Chen, X.; Leng, S.; He, J.; Zhou, L.; Liu, H. The Upper Bounds of Cellular Vehicle-to-Vehicle Communication Latency for
Platoon-Based Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6874–6887. [CrossRef]
6. Li, J.; Liu, X.; Xiao, M.; Lu, G. A Planning Control Strategy Based on Dynamic Safer Buffer to Avoid Traffic Collisions in an
Emergency for CAVs at Nonsignalized Intersections. J. Transp. Eng. Part A Syst. 2023, 149, 04023066. [CrossRef]
7. Xu, L.; Ma, J.; Zhang, S.; Wang, Y. Car Following Models for Alleviating the Degeneration of CACC Function of CAVs in Weak
Platoon Intensity. Transp. Lett. 2023, 15, 1–13. [CrossRef]
8. Samiayya, D.; Radhika, S.; Chandrasekar, A. An Optimal Model for Enhancing Network Lifetime and Cluster Head Selection
Using Hybrid Snake Whale Optimization. Peer-to-Peer Netw. Appl. 2023, 16, 1959–1974. [CrossRef]
Sensors 2023, 23, 7827 19 of 20
9. Reddy, P.Y.; Saikia, L.C. Hybrid AC/DC Control Techniques with Improved Harmonic Conditions Using DBN Based Fuzzy
Controller and Compensator Modules. Syst. Sci. Control Eng. 2023, 11, 2188406. [CrossRef]
10. Wang, R.; Jorgensen, A.B.; Liu, W.; Zhao, H.; Yan, Z.; Munk-Nielsen, S. Voltage Balancing of Series-Connected SiC Mosfets with
Adaptive-Impedance Self-Powered Gate Drivers. IEEE Trans. Ind. Electron. 2023, 70, 11401–11411. [CrossRef]
11. Klockiewicz, Z.; Slaski, G. Comparison of Vehicle Suspension Dynamic Responses for Simplified and Advanced Adjustable
Damper Models with Friction, Hysteresis and Actuation Delay for Different Comfort-Oriented Control Strategies. Acta Mech.
Autom. 2023, 17, 1–15. [CrossRef]
12. Ji, G.; Li, S.; Feng, G.; Wang, H. Enhanced Variable Universe Fuzzy Control of Vehicle Active Suspension Based on Adaptive
Contracting-Expanding Factors. Int. J. Fuzzy Syst. 2023, 1–15. [CrossRef]
13. Han, S.-Y.; Zhang, C.-H.; Tang, G.-Y. Approximation Optimal Vibration for Networked Nonlinear Vehicle Active Suspension with
Actuator Time Delay. Asian J. Control. 2017, 19, 983–995. [CrossRef]
14. Lei, J. Optimal Vibration Control of Nonlinear Systems with Multiple Time-Delays: An Application to Vehicle Suspension. Integr.
Ferroelectr. 2016, 170, 10–32. [CrossRef]
15. Bououden, S.; Chadli, M.; Zhang, L.; Yang, T. Constrained Model Predictive Control for Time-Varying Delay Systems: Application
to an Active Car Suspension. Int. J. Control Autom. Syst. 2016, 14, 51–58. [CrossRef]
16. Udwadia, F.E.; Phohomsiri, P. Active Control of Structures Using Time Delayed Positive Feedback Proportional Control Designs.
Struct. Control. Health Monit. 2006, 13, 536–552. [CrossRef]
17. Pan, H.; Sun, W.; Gao, H.; Yu, J. Finite-Time Stabilization for Vehicle Active Suspension Systems with Hard Constraints. IEEE
Trans. Intell. Transp. Syst. 2015, 16, 2663–2672. [CrossRef]
18. Yang, J.N.; Li, Z.; Danielians, A.; Liu, S.C. Aseismic Hybrid Control of Nonlinear and Hysteretic Structures I. J. Eng. Mech. 1992,
118, 1423–1440. [CrossRef]
19. Kwon, W.; Pearson, A. Feedback Stabilization of Linear Systems with Delayed Control. IEEE Trans. Autom. Control. 1980,
25, 266–269. [CrossRef]
20. Du, H.; Zhang, N. H∞ Control of Active Vehicle Suspensions with Actuator Time Delay. J. Sound Vib. 2007, 301, 236–252.
[CrossRef]
21. Li, H.; Jing, X.; Karimi, H.R. Output-Feedback-Based H∞ Control for Vehicle Suspension Systems with Control Delay. IEEE
Trans. Ind. Electron. 2014, 61, 436–446. [CrossRef]
22. Kim, J.; Lee, T.; Kim, C.-J.; Yi, K. Model Predictive Control of a Semi-Active Suspension with a Shift Delay Compensation Using
Preview Road Information. Control Eng. Pract. 2023, 137, 105584. [CrossRef]
23. Wu, K.; Ren, C.; Nan, Y.; Li, L.; Yuan, S.; Shao, S.; Sun, Z. Experimental Research on Vehicle Active Suspension Based on
Time-Delay Control. Int. J. Control 2023, 96, 1–17. [CrossRef]
24. Li, G.; Huang, Q.; Hu, G.; Ding, R.; Zhu, W.; Zeng, L. Semi-Active Fuzzy Cooperative Control of Vehicle Suspension with a
Magnetorheological Damper. J. Intell. Mater. Syst. Struct. 2023, 1045389X231157353. [CrossRef]
25. Wang, D. Adaptive Control for the Nonlinear Suspension Systems with Stochastic Disturbances and Unknown Time Delay. Syst.
Sci. Control Eng. 2022, 10, 208–217. [CrossRef]
26. Zhang, Z.; Dong, J. A New Optimization Control Policy for Fuzzy Vehicle Suspension Systems Under Membership Functions
Online Learning. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 3255–3266. [CrossRef]
27. Xie, Z.; You, W.; Wong, P.K.; Li, W.; Ma, X.; Zhao, J. Robust Fuzzy Fault Tolerant Control for Nonlinear Active Suspension Systems
via Adaptive Hybrid Triggered Scheme. Int. J. Adapt. Control Signal Process. 2023, 37, 1608–1627. [CrossRef]
28. Sakthivel, R.; Shobana, N.; Priyanka, S.; Kwon, O.M. State Observer-Based Predictive Proportional-Integral Tracking Control for
Fuzzy Input Time-Delay Systems. Int. J. Robust Nonlinear Control 2023, 33, 6052–6069. [CrossRef]
29. Gu, B.; Cong, J.; Zhao, J.; Chen, H.; Fatemi Golshan, M. A Novel Robust Finite Time Control Approach for a Nonlinear Disturbed
Quarter-Vehicle Suspension System with Time Delay Actuation. Automatika 2022, 63, 627–639. [CrossRef]
30. Ma, X.; Wong, P.K.; Li, W.; Zhao, J.; Ghadikolaei, M.A.; Xie, Z. Multi-Objective H2/H∞ Control of Uncertain Active Suspension
Systems with Interval Time-Varying Delay. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 2023, 237, 335–347. [CrossRef]
31. Lee, Y.J.; Pae, D.S.; Choi, H.D.; Lim, M.T. Sampled-Data L2–L∞ Filter-Based Fuzzy Control for Active Suspensions. IEEE Access
2023, 11, 21068–21080. [CrossRef]
32. Ma, G.; Wang, Z.; Yuan, Z.; Wang, X.; Yuan, B.; Tao, D. A Comprehensive Survey of Data Augmentation in Visual Reinforcement
Learning. arXiv 2022. [CrossRef]
33. Gao, Z.; Yan, X.; Gao, F.; He, L. Driver-like Decision-Making Method for Vehicle Longitudinal Autonomous Driving Based on
Deep Reinforcement Learning. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2022, 236, 3060–3070. [CrossRef]
34. Fares, A.; Bani Younes, A. Online Reinforcement Learning-Based Control of an Active Suspension System Using the Actor Critic
Approach. Appl. Sci. 2020, 10, 8060. [CrossRef]
35. Liu, M.; Li, Y.; Rong, X.; Zhang, S.; Yin, Y. Semi-Active Suspension Control Based on Deep Reinforcement Learning. IEEE Access
2020, 8, 9978–9986. [CrossRef]
36. Pang, H.; Luo, J.; Wang, M.; Wang, L. A Stability Guaranteed Nonfragile Fault-Tolerant Control Approach for Markov-Type
Vehicle Active Suspension System Subject to Faults and Disturbances. J. Vib. Control 2023, 10775463231160807. [CrossRef]
37. Kozek, M.; Smoter, A.; Lalik, K. Neural-Assisted Synthesis of a Linear Quadratic Controller for Applications in Active Suspension
Systems of Wheeled Vehicles. Energies 2023, 16, 1677. [CrossRef]
Sensors 2023, 23, 7827 20 of 20
38. Li, Y.; Wang, T.; Liu, W.; Tong, S. Neural Network Adaptive Output-Feedback Optimal Control for Active Suspension Systems.
IEEE Trans. Syst. Man Cybern Syst. 2022, 52, 4021–4032. [CrossRef]
39. Lin, Y.-C.; Nguyen, H.L.T.; Yang, J.-F.; Chiou, H.-J. A Reinforcement Learning Backstepping-Based Control Design for a Full
Vehicle Active Macpherson Suspension System. IET Control Theory Appl. 2022, 16, 1417–1430. [CrossRef]
40. Yong, H.; Seo, J.; Kim, J.; Kim, M.; Choi, J. Suspension Control Strategies Using Switched Soft Actor-Critic Models for Real Roads.
IEEE Trans. Ind. Electron. 2023, 70, 824–832. [CrossRef]
41. Lee, D.; Jin, S.; Lee, C. Deep Reinforcement Learning of Semi-Active Suspension Controller for Vehicle Ride Comfort. IEEE Trans.
Veh. Technol. 2023, 72, 327–339. [CrossRef]
42. Du, Y.; Chen, J.; Zhao, C.; Liao, F.; Zhu, M. A Hierarchical Framework for Improving Ride Comfort of Autonomous Vehicles via
Deep Reinforcement Learning with External Knowledge. Comput.-Aided Civ. Infrastruct. Eng. 2022, 38, 1059–1078. [CrossRef]
43. Han, S.-Y.; Liang, T. Reinforcement-Learning-Based Vibration Control for a Vehicle Semi-Active Suspension System via the PPO
Approach. Appl. Sci. 2022, 12, 3078. [CrossRef]
44. Dridi, I.; Hamza, A.; Ben Yahia, N. A New Approach to Controlling an Active Suspension System Based on Reinforcement
Learning. Adv. Mech. Eng. 2023, 15, 16878132231180480. [CrossRef]
45. Kwok, N.M.; Ha, Q.P.; Nguyen, T.H.; Li, J.; Samali, B. A Novel Hysteretic Model for Magnetorheological Fluid Dampers and
Parameter Identification Using Particle Swarm Optimization. Sens. Actuators A Phys. 2006, 132, 441–451. [CrossRef]
46. Krauze, P.; Kasprzyk, J. Driving Safety Improved with Control of Magnetorheological Dampers in Vehicle Suspension. Appl. Sci.
2020, 10, 8892. [CrossRef]
47. Savaresi, S.M.; Spelta, C. Mixed Sky-Hook and ADD: Approaching the Filtering Limits of a Semi-Active Suspension. J. Dyn. Syst.
Meas. Control 2006, 129, 382–392. [CrossRef]
48. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous Control with Deep
Reinforcement Learning. arXiv 2019. [CrossRef]
49. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.;
Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [CrossRef]
50. Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI
Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [CrossRef]
51. Fujimoto, S.; Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 3 July 2018; pp. 1587–1596. [CrossRef]
52. Theunissen, J.; Sorniotti, A.; Gruber, P.; Fallah, S.; Ricco, M.; Kvasnica, M.; Dhaens, M. Regionless Explicit Model Predictive
Control of Active Suspension Systems with Preview. IEEE Trans. Ind. Electron. 2020, 67, 4877–4888. [CrossRef]
53. Liang, G.; Zhao, T.; Wei, Y. DDPG Based Self-Learning Active and Model-Constrained Semi-Active Suspension Control. In
Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31
October 2021; pp. 1–6. [CrossRef]
54. Baek, S.; Baek, J.; Choi, J.; Han, S. A Reinforcement Learning-Based Adaptive Time-Delay Control and Its Application to Robot
Manipulators. In Proceedings of the 2022 American Control Conference (ACC), Atlanta, GA, USA, 8–10 June 2022; pp. 2722–2729.
[CrossRef]
55. Li, S.; Ding, L.; Gao, H.; Liu, Y.-J.; Li, N.; Deng, Z. Reinforcement Learning Neural Network-Based Adaptive Control for State and
Input Time-Delayed Wheeled Mobile Robots. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 4171–4182. [CrossRef]
56. Zhu, W.; Garg, T.; Raza, S.; Lalar, S.; Barak, D.D.; Rahmani, A.W. Application Research of Time Delay System Control in Mobile
Sensor Networks Based on Deep Reinforcement Learning. Wirel. Commun. Mob. Comput. 2022, 2022, 7844719. [CrossRef]
57. Chen, B.; Xu, M.; Li, L.; Zhao, D. Delay-Aware Model-Based Reinforcement Learning for Continuous Control. Neurocomputing
2021, 450, 119–128. [CrossRef]
58. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.