Design and Application of Adaptive PID Controller Based On Asynchronous Advantage Actor-Critic Learning Method
https://doi.org/10.1007/s11276-019-02225-x
Abstract
To address the slow convergence and inefficiency of existing adaptive PID controllers, we propose a new adaptive PID controller based on the asynchronous advantage actor–critic (A3C) algorithm. First, the controller trains multiple actor–critic agents in parallel, exploiting the multi-thread asynchronous learning characteristics of the A3C structure. Second, to achieve the best control effect, each agent uses a multilayer neural network to approximate the policy function and the value function and to search for the best parameter-tuning strategy in a continuous action space. Simulation results indicate that the proposed controller achieves faster convergence and stronger adaptability than conventional controllers.
Keywords Reinforcement learning · Asynchronous advantage actor–critic · Adaptive PID control · Stepping motor
follows. Starting from a brief description of the PID controller in Sects. 2 and 3, we introduce our new approach in Sect. 4 and show experimental results in Sect. 5. We conclude the paper in Sect. 6.

2 Related work

Conventional PID control algorithms can be roughly classified into three categories: the fuzzy PID controller, the neural network PID controller and the reinforcement learning PID controller.

2.3 Reinforcement learning adaptive controller

Aziz Khater [26] proposed a PID controller that combined the ASN reinforcement-learning network with fuzzy mathematics. Although this method did not need as many accurate training samples as the neural network PID controller, its structure was too complex to guarantee real-time performance. In view of this, Adel [10] designed an adaptive PID controller based on the actor–critic (AC) algorithm. This controller had a simple structure with one RBF network. However, its convergence speed was slow owing to the correlation among the learning samples of the AC algorithm.
structures, which are executed and learned in parallel by creating multiple agents in the same environment instances. The central network is responsible for updating and storing the AC network parameters. Each agent has its own AC structure, and the different agents transfer their learning data to the central network to update the parameters of their AC networks. Furthermore, the Actor network is responsible for policy learning, while the Critic network is responsible for estimating the value function.

4.1 Structure of A3C-PID controller

The A3C adaptive PID controller combines the asynchronous learning structure of A3C with the incremental PID controller. Its structure is shown in Fig. 2. The whole process is as follows:

Step 1: For each thread, the error $e_m(t)$ enters the state converter, which calculates $\Delta e_m(t)$ and $\Delta^2 e_m(t)$ and outputs the state vector $S_m(t) = [e_m(t), \Delta e_m(t), \Delta^2 e_m(t)]^{\mathrm{T}}$.

Step 2: The Actor (m) maps the state vector $S_m(t)$ to the three parameters Kp, Ki and Kd of the PID controller.

Step 3: The updated controller acts on the environment and receives the reward $r_m(t)$.

After n steps, Critic (m) receives $S_m(t+n)$, the state vector of the system, and produces the value function estimate $V(S_{t+n}; W_v')$ and the n-step TD error $\delta_{TD}$, which are the basis for updating the parameters. The reward function is given by Formula (2):

$$
\begin{aligned}
r_m(t) &= a_1 r_1(t) + a_2 r_2(t)\\
r_1(t) &= \begin{cases} 0, & |e_m(t)| < \varepsilon \\ \varepsilon - |e_m(t)|, & \text{otherwise} \end{cases}\\
r_2(t) &= \begin{cases} 0, & |e_m(t)| \le |e_m(t-1)| \\ |e_m(t-1)| - |e_m(t)|, & \text{otherwise} \end{cases}
\end{aligned}
\tag{2}
$$

In the next step, the Actor (m) and the Critic (m) send their own parameters $W_{am}'$, $W_{vm}'$ and the generated $\delta_{TD}$ to the Global Net, which updates $W_a$ and $W_v$ using the policy gradient and gradient descent. In turn, the Global Net passes $W_a$ and $W_v$ back to Actor (m) and Critic (m), so that they continue learning with the new parameters.

4.2 A3C learning with neural networks

The multilayer feed-forward neural network [27, 28], also known as the BP neural network, is a feed-forward network trained with the back-propagation algorithm. It has a strong capability for nonlinear mapping and is suitable for problems with complex internal mechanisms. Therefore, the method uses two BP neural networks to learn the policy function and the value function, respectively. The network structure is as follows.

As shown in Fig. 3, the Actor network has three layers. The first layer is the input layer; the input vector $S = [e_m(t), \Delta e_m(t), \Delta^2 e_m(t)]^{\mathrm{T}}$ represents the state vector. The second layer is the hidden layer, and the input of the hidden layer is computed as follows:
Fig. 2 Structure of the A3C-PID controller for each thread m: the state converter feeds em(t) into the incremental PID Controller(m), whose gains Kp, Ki, Kd are set by Actor(m) and whose output Δum(t) drives Plant(m) to produce ym(t); Actor(m) and Critic(m) exchange W'Am, W'vm, δTD(t) and rm(t) with the Global Net (WA, Wv)
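To make the per-thread workflow of Sect. 4.1 and the reward of Formula (2) more concrete, the following minimal Python sketch walks one worker through Steps 1-3 and the n-step TD error. The toy first-order plant, the fixed gains standing in for Actor(m), and the constants EPS, A1, A2 and GAMMA are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal sketch of one A3C-PID worker thread (Sect. 4.1, Steps 1-3).
# The toy plant, the fixed gains standing in for Actor(m), and the constants
# EPS, A1, A2, GAMMA below are illustrative assumptions, not the paper's values.

EPS = 0.05          # error band epsilon in r1(t)
A1, A2 = 0.6, 0.4   # reward weights a1, a2
GAMMA = 0.9         # discount factor for the n-step return

def state_converter(e, e_prev, e_prev2):
    """Step 1: build S_m(t) = [e(t), de(t), d2e(t)]^T from the last three errors."""
    return np.array([e, e - e_prev, e - 2.0 * e_prev + e_prev2])

def incremental_pid(kp, ki, kd, s):
    """Steps 2-3: standard incremental PID law du(t) = Kp*de + Ki*e + Kd*d2e."""
    e, de, d2e = s
    return kp * de + ki * e + kd * d2e

def reward(e, e_prev):
    """Formula (2): r_m(t) = a1*r1(t) + a2*r2(t), both terms acting as penalties."""
    r1 = 0.0 if abs(e) < EPS else EPS - abs(e)                    # large-error penalty
    r2 = 0.0 if abs(e) <= abs(e_prev) else abs(e_prev) - abs(e)   # growing-error penalty
    return A1 * r1 + A2 * r2

def n_step_td_error(rewards, v_last, v_first):
    """n-step TD error: sum_i gamma^i * r_i + gamma^n * V(S_{t+n}) - V(S_t)."""
    g = sum(GAMMA ** i * r for i, r in enumerate(rewards)) + GAMMA ** len(rewards) * v_last
    return g - v_first

def rollout(n_steps=5, setpoint=1.0):
    """Roll one worker forward n steps against a toy first-order plant (stand-in for Plant(m))."""
    y, u = 0.0, 0.0
    e_prev = e_prev2 = setpoint - y
    rewards = []
    for _ in range(n_steps):
        e = setpoint - y
        s = state_converter(e, e_prev, e_prev2)
        kp, ki, kd = 1.2, 0.3, 0.05        # in the paper these come from Actor(m)(S_m(t))
        u += incremental_pid(kp, ki, kd, s)
        y = 0.9 * y + 0.1 * u              # toy plant dynamics
        rewards.append(reward(setpoint - y, e))
        e_prev2, e_prev = e_prev, e
    return rewards

if __name__ == "__main__":
    rs = rollout()
    print("rewards over n steps:", np.round(rs, 4))
    # With placeholder value estimates V(S_t) = V(S_{t+n}) = 0 from Critic(m):
    print("n-step TD error:", round(n_step_td_error(rs, v_last=0.0, v_first=0.0), 4))
```

In the full algorithm, the gradients of Actor (m) and Critic (m) computed from the n-step TD error would then be pushed to the Global Net, and the refreshed Wa and Wv pulled back, as described above.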
Fig. 3 Structure of the three-layer BP networks: the Actor network maps the inputs e(t), Δe(t), Δ²e(t) through a hidden layer to the outputs Kp, Ki and Kd, and the Critic network maps the same inputs to the value output v (each network has an input layer, a hidden layer and an output layer)
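The three-layer networks of Fig. 3 can be sketched as a plain forward pass. The hidden-layer width, the tanh and softplus activations, and the positivity constraint on Kp, Ki, Kd in the sketch below are assumptions for illustration and are not specified in the excerpt above.

```python
import numpy as np

# Sketch of the three-layer BP (feed-forward) Actor and Critic networks of Fig. 3.
# Hidden-layer size, activations and the softplus used to keep Kp, Ki, Kd positive
# are illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(0)
N_IN, N_HIDDEN = 3, 10     # input: [e(t), de(t), d2e(t)]

def init_net(n_out):
    """One hidden layer between the 3-dimensional state input and n_out outputs."""
    return {"W1": rng.normal(0.0, 0.1, (N_HIDDEN, N_IN)), "b1": np.zeros(N_HIDDEN),
            "W2": rng.normal(0.0, 0.1, (n_out, N_HIDDEN)), "b2": np.zeros(n_out)}

def actor_forward(net, s):
    """Actor network: map S = [e, de, d2e]^T to the PID gains Kp, Ki, Kd."""
    h = np.tanh(net["W1"] @ s + net["b1"])    # hidden layer
    z = net["W2"] @ h + net["b2"]             # output layer
    return np.log1p(np.exp(z))                # softplus keeps the gains positive

def critic_forward(net, s):
    """Critic network: map the same state to the scalar value estimate v."""
    h = np.tanh(net["W1"] @ s + net["b1"])
    return (net["W2"] @ h + net["b2"]).item()

if __name__ == "__main__":
    actor, critic = init_net(3), init_net(1)
    s = np.array([0.4, -0.05, 0.01])          # example state [e, de, d2e]
    print("Kp, Ki, Kd =", np.round(actor_forward(actor, s), 3))
    print("v(S)       =", round(critic_forward(critic, s), 3))
```

In the paper, both networks are trained by back-propagation from the A3C losses, with each worker's weights W'am and W'vm synchronized with the Global Net's Wa and Wv as described in Sect. 4.1.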
[Figure: position control system of the stepping motor, showing the current feedback and position feedback loops]
Controller   Overshoot   Rise time   Steady-state error   Adjustment time
A3C-PID      0.1571      18          0                    33
AC-PID       0.1021      21          0                    48
BP-PID       2.1705      12          0                    32
controller. In this paper, a new PID controller based on the A3C algorithm is proposed. The controller uses BP neural networks to approximate the policy function and the value function. The BP neural network has a strong capability for nonlinear mapping, which enhances the adaptive ability of the controller. The learning speed of the A3C-PID controller is accelerated by parallel training on multiple CPU threads. Asynchronous multi-thread training reduces the correlation of the training data and makes the controller more stable and adaptable. Our experiments on a nonlinear signal and an inverted pendulum demonstrate that the A3C-PID controller achieves higher control accuracy than the other PID controllers. The experiments on position control of a two-phase hybrid stepping motor show that the A3C-PID controller performs well in terms of overshoot, rise time, steady-state error and adjustment time. These results confirm the effectiveness and practical significance of the new method. Our aim is to apply the controller to multi-axis motion control and actual industrial production.

Acknowledgements This work was supported by the National Science and Technology Major Project of China (Grant Number 2017ZX05009-001).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Adel, T., & Abdelkader, C. (2013). A particle swarm optimization approach for optimum design of PID controller for nonlinear systems. In International conference on electrical engineering and software applications (pp. 1–4). IEEE.
2. Savran, A. (2013). A multivariable predictive fuzzy PID control system. Applied Soft Computing, 13(5), 2658–2667.
3. Jiang, D., Wang, W., Shi, L., & Song, H. (2018). A compressive sensing-based approach to end-to-end network traffic reconstruction. IEEE Transactions on Network Science and Engineering, 5(3), 1–12.
4. Jiang, D., Huo, L., & Li, Y. (2018). Fine-granularity inference and estimations to network traffic for SDN. PLoS ONE, 13(5), 1–23.
5. Zhang, X., Bao, H., Du, J., & Wang, C. (2014). Application of a new membership function in nonlinear fuzzy PID controllers with variable gains. Information and Control, 5, 1–7.
6. Caocang, Li, & Cuifang, Zhang. (2015). Adaptive neuron PID control based on minimum resource allocation network. Application Research of Computers, 32(1), 167–169.
7. Sheng, X., Jiang, T., Wang, J., et al. (2015). Speed-feed-forward PID controller design based on BP neural network. Journal of Computer Applications, 35(S2), 134–137.
8. Wang, X. S., Cheng, Y. H., & Wei, S. (2007). A proposal of adaptive PID controller based on reinforcement learning. Journal of China University of Mining and Technology, 17(1), 40–44.
9. Huo, L., Jiang, D., & Lv, Z. (2018). Soft frequency reuse-based optimization algorithm for energy efficiency of multi-cell networks. Computers and Electrical Engineering, 66(2), 316–331.
10. Akbarimajd, A. (2015). Reinforcement learning adaptive PID controller for an under-actuated robot arm. International Journal of Integrated Engineering, 7(2), 20–27.
11. Chen, X. S., & Yang, Y. M. (2011). A novel adaptive PID controller based on actor–critic learning. Control Theory and Applications, 28(8), 1187–1192.
12. Bahdanau, D., Brakel, P., Xu, K., Goyal, A., Lowe, R., Pineau, J., et al. (2016). An actor–critic algorithm for sequence prediction. arXiv preprint arXiv:1607.07086.
13. Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., et al. (2016). Sample efficient actor–critic with experience replay. arXiv preprint arXiv:1611.01224.
14. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., et al. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937).
15. Jiang, D., Huo, L., Lv, Z., Song, H., & Qin, W. (2018). A joint multi-criteria utility-based network selection approach for vehicle-to-infrastructure networking. IEEE Transactions on Intelligent Transportation Systems, 19(10), 3305–3319.
16. Liu, Q., Zhai, J. W., Zhang, Z. Z., & Zhong, S. (2018). A survey on deep reinforcement learning. Chinese Journal of Computers, 41(01), 1–27.
17. Qin, R., Zeng, S., Li, J. J., & Yuan, Y. (2015). Parallel enterprises resource planning based on deep reinforcement learning. Zidonghua Xuebao/Acta Automatica Sinica, 43(9), 1588–1596.
18. Jiang, D., Wang, Y., Lv, Z., Qi, S., & Singh, S. (2019). Big data analysis-based network behavior insight of cellular networks for industry 4.0 applications. IEEE Transactions on Industrial Informatics. https://doi.org/10.1109/TII.2019.2930226.
19. Tang, H. C., Li, Z. X., Wang, Z. T., et al. (2005). A fuzzy PID control system. Electric Machines and Control, 2, 136–138.
20. Sun, J. P., Yan, L., Li, Y., et al. (2006). Design of fuzzy PID controllers based on improved genetic algorithms. Chinese Journal of Scientific Instrument, S3, 1991–1992.
21. Zhu, Y. H., Xue, L. Y., & Huang, W. (2011). Design of fuzzy PID controller based on self-organizing adjustment factors. Journal of System Simulation, 23(12), 2732–2737.
22. Liao, F. F., & Xiao, J. (2005). Research on self-tuning of PID parameters based on BP neural networks. Acta Simulata Systematica Sinica, 07, 1711–1713.
23. Li, G. Y., & Chen, X. L. (2008). Neural network self-learning PID controller based on real-coded genetic algorithm. Micromotors Servo Technique, 1, 43–45.
24. Patel, R., & Kumar, V. (2015). Multilayer neuro PID controller based on back propagation algorithm. Procedia Computer Science, 54, 207–214.
25. Nie, S. K., Wang, Y. J., Xiao, S., & Liu, Z. (2017). An adaptive chaos particle swarm optimization for tuning parameters of PID controller. Optimal Control Applications and Methods, 38(6), 1091–1102.
26. Aziz Khater, A., El-Bardini, M., & El-Rabaie, N. M. (2015). Embedded adaptive fuzzy controller based on reinforcement learning for DC motor with flexible shaft. Arabian Journal for Science and Engineering, 40(8), 2389–2406.
27. Liu, Z., Zeng, X., Liu, H., & Chu, R. (2015). A heuristic two-layer reinforcement learning algorithm based on BP neural networks. Journal of Computer Research and Development, 52(3), 579–587.
28. Zhu, J., Song, Y., Jiang, D., & Song, H. (2018). A new deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of things. IEEE Internet of Things Journal, 5(4), 2375–2385.
29. Xu, X., Zuo, L., & Huang, Z. (2014). Reinforcement learning algorithms with function approximation: Recent advances and applications. Information Sciences, 261, 1–31.
30. Jiang, D., Huo, L., & Song, H. (2018). Rethinking behaviors and activities of base stations in mobile cellular networks based on big data analysis. IEEE Transactions on Network Science and Engineering, 1(1), 1–12.
31. Wang, F., Jiang, D., & Qi, S. (2019). An adaptive routing algorithm for integrated information networks. China Communications, 7(1), 196–207.
32. Huo, L., & Jiang, D. (2019). Stackelberg game-based energy-efficient resource allocation for 5G cellular networks. Telecommunication System, 23(4), 1–11.
33. Jiang, D., Zhang, P., Lv, Z., & Song, H. (2016). Energy-efficient multi-constraint routing algorithm with load balancing for smart city applications. IEEE Internet of Things Journal, 3(6), 1437–1447.
34. Jiang, D., Li, W., & Lv, H. (2017). An energy-efficient cooperative multicast routing in multi-hop wireless networks for smart medical applications. Neurocomputing, 220(2017), 160–169.
35. Wang, F., Jiang, D., Wen, H., & Song, H. (2019). Adaboost-based security level classification of mobile intelligent terminals. The Journal of Supercomputing, 75, 1–19.
36. Sun, M., Jiang, D., Song, H., & Liu, Y. (2017). Statistical resolution limit analysis of two closely-spaced signal sources using Rao test. IEEE Access, 5, 22013–22022.

Chengze Du was born in 1996. He is a postgraduate researcher at the College of Computer Science and Technology, China University of Petroleum, Qingdao, 26658, China. His research interests include deep learning, industrial applications, etc.

Youxiang Duan was born in 1964 and received his Ph.D. from China University of Petroleum. He is a professor at the College of Computer Science and Technology, China University of Petroleum, Qingdao, 26658, China. His research interests include service computing, intelligent control, machine learning, etc.

Hui Ren was born in 1993 and received his B.S. degree in Electrical Engineering from China University of Petroleum, China. His research interests include intelligent control, actor–critic learning, etc.