
Wireless Networks (2021) 27:3537–3547
https://doi.org/10.1007/s11276-019-02225-x

Design and application of adaptive PID controller based on asynchronous advantage actor–critic learning method

Qifeng Sun1 · Chengze Du1 · Youxiang Duan1 · Hui Ren1 · Hongqiang Li2

Published online: 31 December 2019
© The Author(s) 2019

Abstract
To address the slow convergence and inefficiency of existing adaptive PID controllers, we propose a new adaptive PID controller based on the asynchronous advantage actor–critic (A3C) algorithm. Firstly, the controller trains multiple agents with actor–critic structures in parallel, exploiting the multi-thread asynchronous learning characteristics of the A3C framework. Secondly, to achieve the best control effect, each agent uses a multilayer neural network to approximate the policy function and the value function and to search for the best parameter-tuning strategy in a continuous action space. The simulation results indicate that the proposed controller achieves faster convergence and stronger adaptability than conventional controllers.

Keywords: Reinforcement learning · Asynchronous advantage actor–critic · Adaptive PID control · Stepping motor

1 Introduction

The PID controller is a control loop feedback mechanism widely used in industrial control systems [1]. Building on the conventional PID controller, the adaptive PID controller adjusts its parameters online according to the state of the system and therefore adapts better to the system. The fuzzy PID controller [2] adopts the ideology of matrix estimation [3, 4]. To satisfy the requirement of self-tuning PID parameters, the method adjusts the parameters by querying a fuzzy matrix table. The limitation of this method is that it needs a great deal of prior knowledge; moreover, it has a large number of parameters that need to be optimized [5].

The adaptive PID controller [6, 7] approximates a nonlinear structure with neural networks, which can achieve effective control without identifying the complex nonlinear controlled object. However, it is difficult to obtain the teacher signals required by its supervised learning process. The evolutionary adaptive PID controller [8] requires less prior knowledge but has difficulty achieving real-time control [9]. The adaptive PID controller based on reinforcement learning [10] solves this problem by obtaining the teacher's signal through an unsupervised learning process, and the optimization of the control parameters is simple. The actor–critic (AC) adaptive PID controller [11, 12] is the most widely used reinforcement learning controller. However, its convergence speed is limited by the correlation of the learning data in the AC algorithm [13].

Google's DeepMind team proposed the asynchronous advantage actor–critic (A3C) learning algorithm [14, 15]. This algorithm trains multiple agents in parallel with multiple exploration strategies, and each agent experiences different learning states, so the correlation of the learning samples is broken while the computational efficiency improves [16]. The algorithm has been applied in many fields [17, 18].

The proposed method aims to improve the convergence and adaptive ability of the PID controller. To achieve this purpose, we use the A3C algorithm, which raises the learning rate by training agents in parallel threads, and two BP neural networks are used to approximate the policy function and the value function separately. The experiments show that the proposed algorithm outperforms conventional PID control algorithms. The rest of the paper is arranged as follows: starting from a brief description of related work and the PID controller in Sects. 2 and 3, we introduce our new approach in Sect. 4 and show experimental results in Sect. 5. We conclude the paper in Sect. 6.

Corresponding author: Qifeng Sun, Sunqf@upc.edu.cn
1 China University of Petroleum, Qingdao 266580, China
2 China Petrochemical Group Victory Drilling Technology Research Institute, Dongying 257000, China


2 Related work

Conventional PID control algorithms can be roughly classified into three categories: fuzzy PID controllers, neural network PID controllers and reinforcement learning PID controllers.

2.1 Fuzzy PID controller

Tang [19] proposed a method that combined fuzzy math with PID control. However, this method still had limitations, such as requiring a great deal of manual experience to establish the rule table; besides, the rule table was often adapted only to a specific application scenario. To address these issues, Sun [20] developed a fuzzy PID controller based on an improved genetic algorithm, which used multiple fuzzy control rules and adjusted the parameters with the genetic algorithm. The controller abandoned much of the manual work and set up a rule table specific to the environment. Inspired by that work, Zhu [21] added a normalized velocity parameter reflecting the response of the system to the adjusting factor of the fuzzy rules. The method changed the mapping between input and output variables through fuzzy subsets, so that the controller could divide the error and the error rate into multiple control stages.

2.2 Adaptive controller based on neural network

Liao [22] proposed a method utilizing a neural network to reinforce the performance of the PID controller for nonlinear systems. Although the initial parameters of the neural network could be determined by manual testing, the reliability of the manual result could not be ensured. Based on this, Li [23] adopted a genetic algorithm to obtain the optimal initial parameters of the network. However, the genetic algorithm easily falls into local optima. To solve this problem, Patel [24] appended an immigration mechanism to the multilayer neural network adaptive PID controller (MN-PID), in which 10% of the elite population and the inferior population were selected as the variant population. In addition, Nie [25] presented an adaptive chaos particle swarm optimization for tuning the parameters of the PID controller (CSP-PID) to avoid local minima.

2.3 Reinforcement learning adaptive controller

Aziz Khater [26] proposed a PID controller that combines an ASN reinforcement-learning network with fuzzy math. Although this method did not need very accurate training samples compared with the neural network PID, its structure was too complex to guarantee real-time performance. In view of this, Adel [10] designed an adaptive PID controller based on the AC algorithm. This controller had a simple structure with one RBF network. However, its convergence was slow owing to the correlation of the learning samples in the AC algorithm.

3 Basic structure of PID controller

Incremental PID is a PID control algorithm that computes the increment of the control variable. The typical control system structure is shown in Fig. 1, and its formula is as follows:

u(t) = u(t-1) + \Delta u(t) = u(t-1) + K_i(t)\,e(t) + K_p(t)\,\Delta e(t) + K_d(t)\,\Delta^2 e(t)    (1)

where

e(t) = y_0(t) - y(t), \quad \Delta e(t) = e(t) - e(t-1), \quad \Delta^2 e(t) = e(t) - 2e(t-1) + e(t-2)

Here y_0(t), y(t), e(t), Δe(t) and Δ²e(t) represent the reference signal value, the output of the current system, the system output error, the first-order difference of the error and the second-order difference of the error, respectively. In the form of Eq. (1), incremental PID cancels the integral summation, which saves calculation time; in addition, it affects the system only slightly in the event of a malfunction. Considering these factors, incremental PID is a good choice for practical applications.

Fig. 1 PID control structure
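The incremental law of Eq. (1) needs only the last two errors as internal state. The following Python sketch (our illustration, not code from the paper) implements it directly:

```python
class IncrementalPID:
    """Incremental PID of Eq. (1): u(t) = u(t-1) + Ki*e + Kp*de + Kd*d2e."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e_prev = 0.0    # e(t-1)
        self.e_prev2 = 0.0   # e(t-2)
        self.u = 0.0         # u(t-1)

    def step(self, setpoint, measurement):
        e = setpoint - measurement                            # e(t) = y0(t) - y(t)
        de = e - self.e_prev                                  # first-order difference
        d2e = e - 2.0 * self.e_prev + self.e_prev2            # second-order difference
        self.u += self.ki * e + self.kp * de + self.kd * d2e  # increment of Eq. (1)
        self.e_prev2, self.e_prev = self.e_prev, e
        return self.u
```

Because only an increment is added at each step, a faulty sample perturbs the control signal once rather than accumulating, which is the robustness property mentioned above.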


4 A3C adaptive PID control

The A3C algorithm is a deep reinforcement learning algorithm. It introduces an asynchronous training method on the basis of the AC framework. The A3C learning framework consists of a central network (Global Net) and multiple AC structures, which are executed and learned in parallel by creating multiple agents in the same environment instances. The central network is responsible for updating and storing the AC network parameters. Each agent has its own AC structure, and the different agents transfer learning data to the central network to update the parameters of their AC networks. Within each structure, the Actor network is responsible for policy learning, while the Critic network is responsible for estimating the value function.

4.1 Structure of A3C-PID controller

The design of the A3C adaptive PID controller combines the asynchronous learning structure of A3C with the incremental PID controller. Its structure is shown in Fig. 2.

Fig. 2 Adaptive PID control diagram based on A3C learning

The whole process is as follows:

Step 1: For each thread, the initial error e_m(t) enters the state converter, which calculates Δe_m(t) and Δ²e_m(t) and outputs the state vector S_m(t) = [e_m(t), Δe_m(t), Δ²e_m(t)]^T.
Step 2: The Actor(m) maps the state vector S_m(t) to the three PID parameters K_p, K_i and K_d.
Step 3: The updated controller acts on the environment and receives the reward r_m(t).

After n steps, Critic(m) receives S_m(t+n), the state vector of the system, and produces the value function estimate V(S_{t+n}; W'_v) and the n-step TD error δ_TD, which are the main basis for updating the parameters. The reward function is given by Eq. (2):

r_m(t) = a_1 r_1(t) + a_2 r_2(t)

r_1(t) = \begin{cases} 0, & |e_m(t)| < \varepsilon \\ \varepsilon - e_m(t), & \text{otherwise} \end{cases}

r_2(t) = \begin{cases} 0, & |e_m(t)| \ge |e_m(t-1)| \\ |e_m(t)| - |e_m(t-1)|, & \text{otherwise} \end{cases}    (2)
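A minimal sketch of the reward of Eq. (2). The weights a1 and a2 are not given numerically in the text, so they are placeholders here, and the branches follow Eq. (2) as written above:

```python
def reward(e_t, e_prev, a1=1.0, a2=1.0, eps=1e-3):
    """r_m(t) = a1*r1(t) + a2*r2(t), Eq. (2)."""
    # r1: zero inside the tolerance band, otherwise eps - e(t)
    r1 = 0.0 if abs(e_t) < eps else eps - e_t
    # r2: zero unless the absolute error has decreased since the last step
    r2 = 0.0 if abs(e_t) >= abs(e_prev) else abs(e_t) - abs(e_prev)
    return a1 * r1 + a2 * r2
```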
In the next step, the Actor(m) and the Critic(m) send their own parameters W'_am, W'_vm and the generated δ_TD to the Global Net, which updates W_a and W_v with the policy gradient and gradient descent. The Global Net then passes W_a and W_v back to Actor(m) and Critic(m), so that they continue learning with the new parameters.

4.2 A3C learning with neural networks

The multilayer feed-forward neural network [27, 28], also known as the BP neural network, is a multilayer feed-forward network trained with the back-propagation algorithm. It has a strong ability for nonlinear mapping and is suitable for problems with complex internal mechanisms. Therefore, the method uses two BP neural networks to realize the learning of the policy function and the value function, respectively. The network structure is as follows.

As shown in Fig. 3, the Actor network has three layers. The first layer is the input layer; the input vector S = [e_m(t), Δe_m(t), Δ²e_m(t)]^T is the state vector. The second layer is the hidden layer, whose input is:


h_{ik}(t) = \sum_{i=1}^{n} w_{ik}\,x_i(t) - b_k, \quad k = 1, 2, 3, \ldots, 20    (3)

where k indexes the neurons in the hidden layer, w_ik are the weights connecting the input layer to the hidden layer, and b_k is the bias of the k-th neuron. The output of the hidden layer is:

h_{ok}(t) = \min(\max(h_{ik}(t), 0), 6), \quad k = 1, 2, 3, \ldots, 20    (4)

The third layer is the output layer. The input of the output layer is:

y_{io}(t) = \sum_{j=1}^{k} w_{ho}\,h_{oj} - b_o, \quad o = 1, 2, 3    (5)

where o indexes the neurons in the output layer, w_ho are the weights connecting the hidden layer to the output layer, and b_o is the bias of the o-th neuron. The output of the output layer is:

y_{oo}(t) = \log\!\left(1 + e^{y_{io}(t)}\right), \quad o = 1, 2, 3    (6)

The Actor network does not output the values of K_p, K_i and K_d directly; instead it outputs the mean and variance of the three parameters, and the actual values of K_p, K_i and K_d are then sampled from the Gaussian distribution.

Fig. 3 Actor network structure of actor–critic
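To make Eqs. (3)–(6) concrete, the sketch below runs one forward pass of a 3–20–3 Actor network in NumPy: the clipped-ReLU hidden activation of Eq. (4), the softplus output of Eq. (6), and Gaussian sampling of K_p, K_i and K_d. The random initial weights and the per-gain standard deviation are our own assumptions; the paper only states that the Actor outputs the means and variances of the three gains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-20-3 Actor network (Eqs. 3-6); in practice the weights are learned.
W1 = rng.normal(scale=0.1, size=(3, 20))   # input -> hidden weights w_ik
b1 = np.zeros(20)                          # hidden biases b_k
W2 = rng.normal(scale=0.1, size=(20, 3))   # hidden -> output weights w_ho
b2 = np.zeros(3)                           # output biases b_o
log_sigma = np.full(3, -2.0)               # assumed per-gain log std (not specified in the paper)

def actor_forward(s):
    """Map state s = [e, de, d2e] to sampled PID gains (Kp, Ki, Kd)."""
    h_in = s @ W1 - b1                      # Eq. (3): hidden-layer input
    h_out = np.clip(h_in, 0.0, 6.0)         # Eq. (4): ReLU clipped at 6
    y_in = h_out @ W2 - b2                  # Eq. (5): output-layer input
    mu = np.log1p(np.exp(y_in))             # Eq. (6): softplus keeps the gains positive
    sigma = np.exp(log_sigma)
    return rng.normal(mu, sigma), mu, sigma  # Gaussian sampling of Kp, Ki, Kd

gains, mu, sigma = actor_forward(np.array([0.1, 0.02, -0.01]))
```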
The Critic network structure is similar to the Actor network structure. As shown in Fig. 4, the Critic network also uses a BP neural network with a three-layer structure. The first two layers are the same as in the Actor network. The output layer of the Critic network has only one node, which outputs the value function V(S_t; W'_v) of the state.

Fig. 4 Critic network structure of actor–critic

In the A3C structure, the Actor and Critic networks use the n-step TD error method [29, 30] to learn the action probability function and the value function. In this learning method, the n-step TD error δ_TD is computed as the difference between the value estimate V(S_t; W'_v) of the initial state and the estimate obtained after n steps, as follows:

\delta_{TD} = q_t - V(S_t; W'_v)

q_t = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^n V(S_{t+n}; W'_v)    (7)

Here 0 < γ < 1 is the discount factor, which sets the ratio between delayed and immediate returns, and W'_v is the weight of the Critic network. The TD error δ_TD reflects the quality of the actions selected by the Actor network. The learning performance of the system is:

E(t) = \frac{1}{2}\,\delta_{TD}^2(t)    (8)

After calculating the TD error, each AC network in the A3C structure does not update its own network weights directly, but updates the AC network parameters of the central network (Global Net) with its own gradient. The update formulas are as follows:

W_a = W_a + \alpha_a \left( dW_a + \nabla_{W'_a} \log \pi(a \mid s; W'_a)\,\delta_{TD} \right)    (9)

W_v = W_v + \alpha_c \left( dW_v + \frac{\partial \delta_{TD}^2}{\partial W'_v} \right)    (10)

where W_a is the weight of the Actor network stored by the central network, W'_a represents the weights of the Actor network in each AC structure, W_v is the weight of the Critic network in the central network, and W'_v represents the Critic network weights of each AC structure; α_a is the learning rate of the Actor and α_c is the learning rate of the Critic.
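The n-step bookkeeping of Eq. (7) and the push of local gradients into the Global Net implied by Eqs. (9)–(10) can be sketched as follows. The actual gradient computation is framework-dependent, so it is abstracted away; only the bootstrapped return, the TD error and the additive central update are shown:

```python
def n_step_return(rewards, v_end, gamma=0.9):
    """q_t = r_{t+1} + gamma*r_{t+2} + ... + gamma^{n-1}*r_{t+n} + gamma^n * V(S_{t+n}), Eq. (7)."""
    q = v_end
    for r in reversed(rewards):       # accumulate from the last reward backwards
        q = r + gamma * q
    return q

def td_error(q_t, v_start):
    """delta_TD = q_t - V(S_t; W'_v), Eq. (7); E = 0.5*delta^2 as in Eq. (8)."""
    return q_t - v_start

def push_to_global(global_weights, local_gradients, lr):
    """Additive Global-Net update in the spirit of Eqs. (9)-(10): W <- W + lr * local gradient."""
    return [w + lr * g for w, g in zip(global_weights, local_gradients)]
```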


4.3 The network initialization of the A3C-PID controller

The initial parameters of the network directly affect the stability of the closed-loop control system. However, it is difficult for the neural network PID controller to obtain the teacher's signal, so the network parameters have to be determined by experience or manual trial. The unsupervised learning characteristics of reinforcement learning enable the controller to obtain the optimal initial network parameters through iterative learning. However, the AC-PID controller converges slowly because of the correlation between the learning samples obtained by the AC algorithm. The A3C-PID controller learns the network parameters asynchronously in multiple threads, which breaks the correlation of the samples and improves the convergence rate. The learning of the A3C-PID network parameters is similar to the process described in Sect. 4.1, except that during iterative learning A3C-PID sets m to the number of CPU cores of the computer, and m is then set to one for online control.

4.4 Working process of A3C-PID controller

Based on the asynchronous learning architecture and the learning mode that takes the n-step TD error as the performance measure, the working process of the A3C-PID controller is as follows (a code sketch of one worker thread is given after the list):

(a) Set the sampling period t_s, the number of threads m of the A3C algorithm and the update period n, and initialize the network parameters of each AC structure through K iterations of learning;
(b) Calculate the errors of the system and construct the state vectors as inputs to Actor(m) and Critic(m);
(c) Critic(m) outputs V(S_t; W'_v);
(d) Actor(m) outputs the values of K_p, K_i and K_d. The system then observes the error e_m(t+1) at the next sampling time and calculates the reward r_m(t) according to Eq. (2);
(e) Determine whether to update the parameters of Actor(m) and Critic(m). If the update period n has been reached, the Critic outputs the state value V(S_{t+n}; W'_v) and the system updates the parameters W_a and W_v of the Global Net according to Eqs. (9) and (10); otherwise, return to step (d);
(f) The Global Net transmits the new parameters W'_am and W'_vm to each Actor(m) and Critic(m);
(g) Determine whether the end condition is satisfied; if so, exit the control loop; otherwise, update S_m(t) and return to step (c).
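Under the assumptions above, one worker thread can be sketched as below. The env, actor, critic and global_net objects and their methods are placeholders standing for the plant-plus-PID loop, the two BP networks and the central network; this is an illustration of steps (a)–(g), not the authors' implementation:

```python
def a3c_pid_worker(env, actor, critic, global_net, n=30, gamma=0.9):
    """One A3C-PID worker thread following steps (a)-(g); a sketch under assumed interfaces."""
    s = env.reset()                          # state vector [e, de, d2e], step (b)
    while not env.done():                    # step (g): end condition
        states, rewards = [], []
        for _ in range(n):                   # collect an n-step segment, steps (c)-(d)
            gains = actor.sample_gains(s)    # Kp, Ki, Kd from the Gaussian policy
            s_next, r = env.step(gains)      # apply incremental PID, observe e_m(t+1), Eq. (2)
            states.append(s)
            rewards.append(r)
            s = s_next
        q = critic.value(s)                  # bootstrap with V(S_{t+n}; W'_v), step (e)
        for s_k, r_k in zip(reversed(states), reversed(rewards)):
            q = r_k + gamma * q              # backed-up n-step return, Eq. (7)
            delta = q - critic.value(s_k)    # n-step TD error
            actor.accumulate_grad(s_k, delta)
            critic.accumulate_grad(s_k, delta)
        global_net.apply(actor.grads, critic.grads)   # central update, Eqs. (9)-(10)
        actor.load(global_net.actor_weights)          # step (f): pull new parameters
        critic.load(global_net.critic_weights)
```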

5 Experiments

5.1 Simulation experiment of nonlinear signal

In order to verify the effectiveness and superiority of the algorithm, a nonlinear object is simulated and analyzed with the PID, CSP-PID, MN-PID, AC-PID and A3C-PID controllers, respectively. The discrete model of the object is:

y(k+1) = f\big(y(k), y(k-1), y(k-2), u(k), u(k-1)\big)    (11)

where

f(x_1, x_2, x_3, x_4, x_5) = \frac{x_1 x_2 x_3 x_5 (x_3 - 1) + x_4}{1 + x_3^2 + x_2^2}

The reference input rin is:

rin(k) = \begin{cases} 0.5\sin(\pi k/25), & k < 250 \\ 0.5, & 250 \le k < 460 \\ 0.5, & 460 \le k < 660 \\ 0.5, & 660 \le k < 870 \\ 0.3\sin(\pi k/25) + 0.4\sin(\pi k/32) + 0.3\sin(\pi k/40), & 870 \le k < 1000 \end{cases}    (12)

The parameters of the nonlinear signal simulation are set as follows: the sampling period is 1 s, m = 4, α_a = 0.001, α_c = 0.01, ε = 0.001, γ = 0.9, n = 30, K = 3000. The root mean square error (RMSE) and the mean absolute error (MAE) are used to describe the accuracy of the controllers.
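For reproducibility, the benchmark plant of Eq. (11) and the reference input of Eq. (12) translate directly into code; the expression for f follows the reconstruction given after Eq. (11), and the constant setpoint segments keep the values printed in the text:

```python
import numpy as np

def f(x1, x2, x3, x4, x5):
    """Nonlinear plant map of Eq. (11): y(k+1) = f(y(k), y(k-1), y(k-2), u(k), u(k-1))."""
    return (x1 * x2 * x3 * x5 * (x3 - 1.0) + x4) / (1.0 + x3**2 + x2**2)

def rin(k):
    """Reference input of Eq. (12)."""
    if k < 250:
        return 0.5 * np.sin(np.pi * k / 25)
    if k < 460:
        return 0.5        # constant setpoint segments, values as given in the text
    if k < 660:
        return 0.5
    if k < 870:
        return 0.5
    return 0.3 * np.sin(np.pi * k / 25) + 0.4 * np.sin(np.pi * k / 32) + 0.3 * np.sin(np.pi * k / 40)
```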


The simulation results are shown in Figs. 5, 6, 7, 8 and 9 and Table 1.

Fig. 5 Position tracking of PID
Fig. 6 Position tracking of CSP-PID
Fig. 7 Position tracking of MN-PID
Fig. 8 Position tracking of AC-PID
Fig. 9 Position tracking of A3C-PID

Table 1 The comparison of controller performance

Controller   RMSE     MAE
PID          0.1547   0.0705
CSP-PID      0.1201   0.0620
MN-PID       0.1203   0.0628
AC-PID       0.1196   0.0621
A3C-PID      0.0884   0.0326

The simulation results show that the A3C-PID controller reaches the minimum root mean square error (RMSE) and mean absolute error (MAE) values. Compared with the other controllers, the control accuracy of A3C-PID is higher. This not only proves that the design of the new PID controller is reasonable, but also shows that the controller has better control performance for the nonlinear system.

5.2 Simulation experiment of inverted pendulum

The control of the single inverted pendulum is a classic problem in control studies. The control task is to apply a force F to the bottom of the cart so that the cart stays at the set position while the angle between the rod and the vertical line stays within a deviation range.

Fig. 10 The structure of single inverted pendulum


Figure 10 shows the single inverted pendulum. As shown in Fig. 10, the mass of the cart is M, the mass of the pendulum is m, the position of the cart is x and the angle of the pendulum is θ. The equations of the single inverted pendulum are obtained as Eqs. (13) and (14):

\ddot{\theta} = \frac{m(m+M)gl}{(m+M)I + mMl^2}\,\theta - \frac{ml}{(m+M)I + mMl^2}\,F    (13)

\ddot{x} = -\frac{m^2 g l^2}{(m+M)I + mMl^2}\,\theta + \frac{I + ml^2}{(m+M)I + mMl^2}\,F    (14)

where I = \frac{1}{12} m L^2, l = \frac{1}{2} L, and F is the force acting on the cart, which takes continuous values in [−10, 10]. The sampling period is 20 ms. The single inverted pendulum has four control indexes: pendulum angle, swing speed, cart position and cart speed. The initial conditions are:

\theta(0) = 10^{\circ}, \quad \dot{\theta}(0) = 0, \quad x(0) = 0.2, \quad \dot{x}(0) = 0    (15)

The expected final state is:

\theta = 0^{\circ}, \quad \dot{\theta} = 0, \quad x = 0, \quad \dot{x} = 0    (16)

In the simulation, the parameters of the inverted pendulum are as follows:

g = 9.8 m/s², M = 10 kg, m = 0.1 kg, L = 0.5 m, μ_c = 0.005, μ_p = 2 × 10⁻⁵

where μ_c is the friction coefficient of the cart relative to the guide rail and μ_p is the friction coefficient of the rod relative to the cart. The parameters of the A3C-PID controller are set as follows:

m = 4, α_a = 0.002, α_c = 0.01, ε = 0.001, γ = 0.9, n = 50
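A minimal forward-Euler integration of the linearized model in Eqs. (13)–(14), using the parameter values listed above and the 20 ms sampling period as the integration step; the friction coefficients do not appear in Eqs. (13)–(14) and are therefore omitted from this sketch:

```python
import math

# Parameters as listed in the text
g, M, m, L = 9.8, 10.0, 0.1, 0.5
l = L / 2.0
I = m * L**2 / 12.0
den = (m + M) * I + m * M * l**2           # common denominator of Eqs. (13)-(14)
dt = 0.02                                   # 20 ms sampling period

def pendulum_step(theta, theta_dot, x, x_dot, F):
    """One Euler step of the linearized inverted-pendulum model, Eqs. (13)-(14)."""
    theta_acc = (m * (m + M) * g * l * theta - m * l * F) / den
    x_acc = (-m**2 * g * l**2 * theta + (I + m * l**2) * F) / den
    theta_dot += theta_acc * dt
    theta += theta_dot * dt
    x_dot += x_acc * dt
    x += x_dot * dt
    return theta, theta_dot, x, x_dot

# Initial conditions of Eq. (15): 10 degrees, x = 0.2 m
state = (math.radians(10.0), 0.0, 0.2, 0.0)
```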
The results of the simulation are shown in Figs. 11 and 12. Figure 11 shows the response of the four control indicators of the inverted pendulum over 10 s. From Fig. 11, it can be seen that under A3C-PID control the inverted pendulum quickly reaches a stable state in all four control indicators. Figure 12 shows the output of the A3C-PID, AC-PID and traditional PID controllers. It can be seen that the A3C-PID controller has better system tracking performance than the traditional PID and AC-PID.

Fig. 11 The response of the four indexes with A3C-PID
Fig. 12 The output of the controllers: PID, AC-PID and A3C-PID

5.3 Position control of two-phase hybrid stepping motor

5.3.1 Closed-loop control structure of the stepping motor

The stepper motor is a low-speed permanent magnet synchronous motor. Rather than being driven directly by a pulse sequence, it is used in the digital control system as an angle actuating element whose excitation state is switched. The stepper motor is usually equipped with a photoelectric encoder, a rotary transformer or another measuring feedback element to achieve high-precision positioning in closed-loop control. The block diagram of the closed-loop servo control system is shown in Fig. 13. In Fig. 13, the inner loops are the current loop and the speed loop. The current loop tracks the current of the two-phase hybrid stepping motor, so that the motor can output torque smoothly under micro-stepping. The speed loop enables the load to track the set speed and achieves the effect of speed control. The outer loop is the position loop, which makes the load output track a given position. The position loop controller usually adopts PID control. Therefore, we added the A3C-PID controller to the position loop to test the validity of the controller.


Fig. 13 The closed-loop servo control system of the hybrid stepping motor (input position → position controller → speed controller → torque controller → current controller → power drive → motor and load, with current and position feedback)

5.3.2 Modeling and simulation of the two-phase hybrid stepping motor

In this paper, a two-phase hybrid stepping motor is used as the controlled object in the simulation experiment. First, a mathematical model needs to be established. However, the two-phase hybrid stepping motor is a highly nonlinear electromechanical device, so it is difficult to describe it accurately. Therefore, the mathematical model of the two-phase hybrid stepping motor studied in this paper is simplified under the following assumptions: the flux linkage in the phase windings of the permanent magnet varies sinusoidally with the rotor position; magnetic hysteresis and the eddy current effect are not considered; only the mean and fundamental components of the air-gap magnetic conductance are considered; and the mutual inductance between the two phase windings is ignored. On this basis, the mathematical model of the two-phase hybrid stepping motor can be described by Eqs. (17)–(21):

u_a = L \frac{di_a}{dt} + R i_a - k_e \omega \sin(N_r \theta)    (17)

u_b = L \frac{di_b}{dt} + R i_b - k_e \omega \sin(N_r \theta)    (18)

T_e = k_e i_a \sin(N_r \theta) + k_e i_b \cos(N_r \theta)    (19)

J \frac{d\omega}{dt} + B \omega + T_L = T_e    (20)

\frac{d\theta}{dt} = \omega    (21)

In the above formulas, u_a and u_b are the two-phase voltages and i_a and i_b the currents of phases A and B, R is the winding resistance, L is the winding inductance, k_e is the torque coefficient, θ and ω are the rotation angle and angular velocity of the motor, N_r is the number of rotor teeth, T_e is the electromagnetic torque of the hybrid stepping motor, T_L is the load torque, and J and B are the load moment of inertia and the viscous friction coefficient, respectively. It can be seen from this mathematical model that, even under these simplifying assumptions, the two-phase hybrid stepping motor remains a highly nonlinear and coupled system.
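Equations (17)–(21) define a four-dimensional state (i_a, i_b, ω, θ). The sketch below evaluates their right-hand sides so that any explicit integrator can step the model; the default parameter values are taken from the simulation settings given later in this section (with J = 2 g·cm² converted to kg·m²), and the choice of integration step is left to the reader:

```python
import math

def stepper_derivatives(state, u_a, u_b, R=8.0, L=0.5, ke=17.5, Nr=50, J=2e-7, B=0.0, TL=0.0):
    """Right-hand side of Eqs. (17)-(21) for the two-phase hybrid stepping motor.

    state = (i_a, i_b, omega, theta). Parameter defaults follow the values used in
    the simulation of this section; they are placeholders, not a general motor model.
    """
    i_a, i_b, omega, theta = state
    di_a = (u_a - R * i_a + ke * omega * math.sin(Nr * theta)) / L          # Eq. (17)
    di_b = (u_b - R * i_b + ke * omega * math.sin(Nr * theta)) / L          # Eq. (18)
    Te = ke * i_a * math.sin(Nr * theta) + ke * i_b * math.cos(Nr * theta)  # Eq. (19)
    domega = (Te - B * omega - TL) / J                                      # Eq. (20)
    dtheta = omega                                                          # Eq. (21)
    return di_a, di_b, domega, dtheta
```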
Fig. 14 The simulation of the servo system

The simulation model of the two-phase hybrid stepping motor servo control system is built with Simulink in Matlab, as shown in Fig. 14. The parameters of the motor are as follows: L = 0.5 H, N_r = 50, R = 8 Ω, J = 2 g·cm², B = 0 N·m·s/rad, N = 100, T_L = 0, k_e = 17.5 N·m/A, where N is the reduction ratio of the harmonic reducer. The parameters of the A3C-PID controller are set as follows: m = 4, α_a = 0.001, t_s = 0.001 s, α_c = 0.01, ε = 0.001, γ = 0.9, n = 30, K = 3000. The results are shown in Figs. 15 and 16 and Table 2.

The dynamic performance of the A3C, BP and AC adaptive PID controllers is shown in Fig. 15. In the early stage of the simulation (20 cycles), the BP-PID controller has a faster response speed and a shorter rise time (12 ms), but it has a higher overshoot of 2.1705%. On the contrary, both the AC-PID and the A3C-PID controllers have smaller overshoots of 0.1571% and 0.1021%. However, the adjustment time of AC-PID is long (48 ms) and its rise time is 21 ms. In contrast, the A3C-PID controller has better stability and rapidity.


Fig. 15 Position tracking
Fig. 16 The result of controller parameter tuning
Fig. 17 Reward value curve of reinforcement learning

Table 2 The comparison of controller performance

Controller   Overshoot (%)   Rise time (ms)   Steady state error   Adjustment time (ms)
A3C-PID      0.1571          18               0                    33
AC-PID       0.1021          21               0                    48
BP-PID       2.1705          12               0                    32

Figure 16 shows the process of adaptive adjustment of the A3C-PID controller parameters. As can be seen from Fig. 16, the A3C-PID controller is able to adjust the PID parameters based on the errors in different periods. At the beginning of the simulation the tracking error of the system is large; to ensure a fast response speed, K_p is continuously increased while K_d is reduced, and the increase of K_i is limited to prevent a high overshoot. As the error decreases, K_p begins to decrease and K_i is gradually increased to eliminate the cumulative error, which at the same time causes a small amount of overshoot. Since K_d has a large influence on the system at this stage, it tends to remain stable. When the final tracking error reaches zero, K_p, K_i and K_d reach a steady state. These simulation results show that the A3C-PID controller has good adaptive capabilities.

The AC-PID and A3C-PID reward value curves are shown in Fig. 17. The goal of reinforcement learning is to learn the best strategy that maximizes the reward value U, computed as in Eq. (22):

U = E\!\left[\sum_{t=0}^{\text{end}} \gamma^t R(S_t)\right]    (22)

From the analysis of Fig. 17 we can conclude that after 3000 iterations the A3C-PID controller has a higher U value than the AC-PID controller. In addition, the U value of A3C-PID becomes stable after about 1800 iterations, while AC-PID converges only after about 2500 iterations. Therefore, A3C-PID has a faster convergence rate than AC-PID.
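Computed over one recorded episode, the criterion of Eq. (22) reduces to a simple discounted sum, for example:

```python
def discounted_reward(rewards, gamma=0.9):
    """U = sum_t gamma^t * R(S_t) over one recorded episode, Eq. (22)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```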
6 Conclusions


Machine learning and intelligent algorithms have been widely applied in many industrial fields [31–36]. The purpose of this paper has been to present our efforts to improve the convergence and adaptability of the adaptive PID controller. In this paper, a new PID controller based on the A3C algorithm is proposed. The controller uses BP neural networks to approximate the policy function and the value function; the BP neural network has a strong ability for nonlinear mapping, which enhances the adaptive ability of the controller. The learning speed of the A3C-PID controller is accelerated by parallel training on multiple CPU threads. The asynchronous multi-thread training reduces the correlation of the training data and makes the controller more stable and adaptable. Our experiments with a nonlinear signal and an inverted pendulum demonstrate that the A3C-PID controller has higher control accuracy than the other PID controllers. The experiments on the position control of a two-phase hybrid stepping motor show that the A3C-PID controller performs well in terms of overshoot, rise time, steady-state error and adjustment time. This work confirms the effectiveness and application significance of the new method. Our aim is to apply the controller to multi-axis motion control and actual industrial production.

Acknowledgements This work was supported by the National Science and Technology Major Project of China (Grant Number 2017ZX05009-001).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References

1. Adel, T., & Abdelkader, C. (2013). A particle swarm optimization approach for optimum design of PID controller for nonlinear systems. In International conference on electrical engineering and software applications (pp. 1–4). IEEE.
2. Savran, A. (2013). A multivariable predictive fuzzy PID control system. Applied Soft Computing, 13(5), 2658–2667.
3. Jiang, D., Wang, W., Shi, L., & Song, H. (2018). A compressive sensing-based approach to end-to-end network traffic reconstruction. IEEE Transactions on Network Science and Engineering, 5(3), 1–12.
4. Jiang, D., Huo, L., & Li, Y. (2018). Fine-granularity inference and estimations to network traffic for SDN. PLoS ONE, 13(5), 1–23.
5. Zhang, X., Bao, H., Du, J., & Wang, C. (2014). Application of a new membership function in nonlinear fuzzy PID controllers with variable gains. Information and Control, 5, 1–7.
6. Caocang, L., & Cuifang, Z. (2015). Adaptive neuron PID control based on minimum resource allocation network. Application Research of Computers, 32(1), 167–169.
7. Sheng, X., Jiang, T., Wang, J., et al. (2015). Speed-feed-forward PID controller design based on BP neural network. Journal of Computer Applications, 35(S2), 134–137.
8. Wang, X. S., Cheng, Y. H., & Wei, S. (2007). A proposal of adaptive PID controller based on reinforcement learning. Journal of China University of Mining and Technology, 17(1), 40–44.
9. Huo, L., Jiang, D., & Lv, Z. (2018). Soft frequency reuse-based optimization algorithm for energy efficiency of multi-cell networks. Computers and Electrical Engineering, 66(2), 316–331.
10. Akbarimajd, A. (2015). Reinforcement learning adaptive PID controller for an under-actuated robot arm. International Journal of Integrated Engineering, 7(2), 20–27.
11. Chen, X. S., & Yang, Y. M. (2011). A novel adaptive PID controller based on actor–critic learning. Control Theory and Applications, 28(8), 1187–1192.
12. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., et al. (2016). An actor–critic algorithm for sequence prediction. arXiv preprint arXiv:1607.07086.
13. Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., et al. (2016). Sample efficient actor–critic with experience replay. arXiv preprint arXiv:1611.01224.
14. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., et al. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937).
15. Jiang, D., Huo, L., Lv, Z., Song, H., & Qin, W. (2018). A joint multi-criteria utility-based network selection approach for vehicle-to-infrastructure networking. IEEE Transactions on Intelligent Transportation Systems, 19(10), 3305–3319.
16. Liu, Q., Zhai, J. W., Zhang, Z. Z., & Zhong, S. (2018). A survey on deep reinforcement learning. Chinese Journal of Computers, 41(01), 1–27.
17. Qin, R., Zeng, S., Li, J. J., & Yuan, Y. (2015). Parallel enterprises resource planning based on deep reinforcement learning. Zidonghua Xuebao/Acta Automatica Sinica, 43(9), 1588–1596.
18. Jiang, D., Wang, Y., Lv, Z., Qi, S., & Singh, S. (2019). Big data analysis-based network behavior insight of cellular networks for industry 4.0 applications. IEEE Transactions on Industrial Informatics. https://doi.org/10.1109/TII.2019.2930226.
19. Tang, H. C., Li, Z. X., Wang, Z. T., et al. (2005). A fuzzy PID control system. Electric Machines and Control, 2, 136–138.
20. Sun, J. P., Yan, L., Li, Y., et al. (2006). Design of fuzzy PID controllers based on improved genetic algorithms. Chinese Journal of Scientific Instrument, S3, 1991–1992.
21. Zhu, Y. H., Xue, L. Y., & Huang, W. (2011). Design of fuzzy PID controller based on self-organizing adjustment factors. Journal of System Simulation, 23(12), 2732–2737.
22. Liao, F. F., & Xiao, J. (2005). Research on self-tuning of PID parameters based on BP neural networks. Acta Simulata Systematica Sinica, 07, 1711–1713.
23. Li, G. Y., & Chen, X. L. (2008). Neural network self-learning PID controller based on real-coded genetic algorithm. Micromotors Servo Technique, 1, 43–45.
24. Patel, R., & Kumar, V. (2015). Multilayer neuro PID controller based on back propagation algorithm. Procedia Computer Science, 54, 207–214.
25. Nie, S. K., Wang, Y. J., Xiao, S., & Liu, Z. (2017). An adaptive chaos particle swarm optimization for tuning parameters of PID controller. Optimal Control Applications and Methods, 38(6), 1091–1102.
26. Aziz Khater, A., El-Bardini, M., & El-Rabaie, N. M. (2015). Embedded adaptive fuzzy controller based on reinforcement learning for DC motor with flexible shaft. Arabian Journal for Science and Engineering, 40(8), 2389–2406.
27. Liu, Z., Zeng, X., Liu, H., & Chu, R. (2015). A heuristic two-layer reinforcement learning algorithm based on BP neural networks. Journal of Computer Research and Development, 52(3), 579–587.
28. Zhu, J., Song, Y., Jiang, D., & Song, H. (2018). A new deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of things. IEEE Internet of Things Journal, 5(4), 2375–2385.
29. Xu, X., Zuo, L., & Huang, Z. (2014). Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences, 261, 1–31.
30. Jiang, D., Huo, L., & Song, H. (2018). Rethinking behaviors and activities of base stations in mobile cellular networks based on big data analysis. IEEE Transactions on Network Science and Engineering, 1(1), 1–12.
31. Wang, F., Jiang, D., & Qi, S. (2019). An adaptive routing algorithm for integrated information networks. China Communications, 7(1), 196–207.
32. Huo, L., & Jiang, D. (2019). Stackelberg game-based energy-efficient resource allocation for 5G cellular networks. Telecommunication System, 23(4), 1–11.
33. Jiang, D., Zhang, P., Lv, Z., & Song, H. (2016). Energy-efficient multi-constraint routing algorithm with load balancing for smart city applications. IEEE Internet of Things Journal, 3(6), 1437–1447.
34. Jiang, D., Li, W., & Lv, H. (2017). An energy-efficient cooperative multicast routing in multi-hop wireless networks for smart medical applications. Neurocomputing, 220, 160–169.
35. Wang, F., Jiang, D., Wen, H., & Song, H. (2019). Adaboost-based security level classification of mobile intelligent terminals. The Journal of Supercomputing, 75, 1–19.
36. Sun, M., Jiang, D., Song, H., & Liu, Y. (2017). Statistical resolution limit analysis of two closely-spaced signal sources using Rao test. IEEE Access, 5, 22013–22022.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Qifeng Sun was born in 1976, Ph.D. He graduated from China University of Petroleum and is now a lecturer at the College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China. His research interests include intelligent control, machine learning, etc.

Chengze Du was born in 1996, Postgraduate. He is a researcher at the College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China. His research interests include deep learning, industrial applications, etc.

Youxiang Duan was born in 1964, Ph.D. He graduated from China University of Petroleum and is now a professor at the College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China. His research interests include service computing, intelligent control, machine learning, etc.

Hui Ren was born in 1993 and received his B.S. degree in Electrical Engineering from China University of Petroleum, China. His research interests include intelligent control, actor–critic learning, etc.

Hongqiang Li was born in 1975. He is a Senior Engineer at the Shengli Drilling Institute of Sinopec Group, Dongying 257000, China. He has been engaged in the field service of instruments while drilling. He is dedicated to research on control accuracy analysis of horizontal wells with large displacement, azimuth gamma imaging while drilling, diameter measurement while drilling, and array imaging measurement methods while drilling.