06722294
Abstract—Our aim in this paper is to present a novel method for online optimal control system design via state heuristic dynamic programming (HDP) that approximates the solution of the Hamilton-Jacobi-Bellman (HJB) equation by means of the recursive least-squares (RLS) approach. Because of the random nature of primary energy sources, the control of eolic and solar energy systems demands methods and techniques suited to a high degree of environmental uncertainty. The reinforcement learning (RL) and approximate dynamic programming (ADP) approaches furnish the key ideas and the mathematical formulations to develop optimal control methods and strategies for alternative energy systems. We propose an online design method to establish control strategies for the main unit of an eolic system, the doubly fed induction generator (DFIG). The performance of the proposed method is evaluated via computational experiments with discrete-time HDP algorithms that map eigenstructure assignments into the stable Z-plane.

Index Terms—Heuristic Dynamic Programming, Multivariable Control, Dynamic Programming, Optimal Control Tuning, Convergence, Discrete Linear Quadratic Regulator, Digital Control, DFIG Wind Turbines, FACTS Devices, Doubly Fed Induction Generator.

I. INTRODUCTION

Considerable effort is being made today to develop alternative energy systems, such as solar and eolic plants, which use these primary energy sources and transform them into electrical energy. These natural resources are subject to uncertainties caused by environmental changes in temperature and pressure and by human-induced changes to the environment. Due to these uncertainties, control systems must be robust enough to deal with random situations during normal operation. To minimize, and sometimes to avoid, the unwanted effects of these uncertainties, we present the first insights into an online optimal control design method based on reinforcement learning and on an approximate solution of the HJB equation, oriented to handle random and nonlinear processes.

Thorough studies of ADP have been conducted in [1], [2], where the authors propose new ideas and comment on the trends of ADP for this decade. The state of the art on the approximate solution of the HJB equation associated with the discrete algebraic Riccati equation (DARE) [3] can be found in [4], [5]. In [6] and [7], efficient reinforcement learning and temporal difference learning, respectively, are discussed together with function approximation in the context of least-squares solutions of the HJB equation.

Reports on applications of discrete linear quadratic regulator (DLQR) control to the DFIG show the importance of this device and the technological improvements promoted by optimal control strategies. Reference [8], on decentralized nonlinear control of wind turbines with DFIG, presents the linear quadratic regulator (LQR) design method to improve the transient stability of the power system and enhance system damping. An optimal control strategy based on LQR for the DFIG is presented in [9]; the strategies are designed to solve transient stability problems, and the gains of the linear quadratic controllers are adjusted via deviations of the weighting matrix values. In [10], the authors present an optimal control strategy for reactive power in which the DFIG is a reactive power source of the wind farm, and a genetic algorithm is developed to optimize the control strategy. An optimal tracking of the secondary voltage control for the DFIG is also addressed; this control is based on the regulation margin of the grid buses and on an intelligent selection driven by the voltage violation condition.

In this paper, we present a novel method and an online algorithm to design control strategies for the DFIG in eolic plants. The adaptive dynamic programming and discrete linear quadratic regulator concepts that support the development of the proposed online optimal control design method are presented in Section II. The online DFIG-HDP-DLQR design method of Section III is based on the adaptive critic approach and on HDP algorithms that are associated with the turbine-generator linearized
A. Adaptive Critic Approach

The adaptive critic approach is used to determine the parameters of the value function approximation. Eq. (5), which minimizes the expected squared error, is parametrized according to Eq. (7). This can be written in vectorized form as

$$\theta_{j+1} = \arg\min_{\theta} E_x\left\{\left|\bar{x}^T\theta - d(x, r, f, P_j)\right|^2\right\}, \tag{12}$$

where $\bar{x} \in \mathbb{R}^{n(n+1)/2}$ is defined according to the Kronecker product and is given by $\bar{x}^T = [x_1^2 \; \ldots \; x_1x_n \;\; x_2^2 \; \ldots \; x_{n-1}x_n \;\; x_n^2]$. The function $\theta = \mathrm{vec}(P) \in \mathbb{R}^{n(n+1)/2}$ of the square matrix $P$ is a vector containing the $n$ diagonal entries of $P$ and the $n(n+1)/2 - n$ distinct sums $p_{ik} + p_{ki}$. Assuming an ordering between the vector $\bar{x}$ and the vectorization $\mathrm{vec}(P)$ such that the quadratic form is represented as $x^T P x = \bar{x}^T \mathrm{vec}(P)$, the least-squares parametric estimate of Eq. (12) is given by

$$\theta_{j+1} = \left(E_x\{\bar{x}\bar{x}^T\}\right)^{-1} E_x\{\bar{x}\, d(x, r, f, P_j)\}. \tag{13}$$

The matrix vectorization and Kronecker product theories [11] contribute to an approximate solution of the DARE, which is obtained from an iterative scheme of systems of linear equations for the coefficients of the matrix $P$, given by

$$\bar{x}_i^T \mathrm{vec}(P_{j+1}) = d(x_i, r, f, P_j), \tag{14}$$

where $i_{j-1} \le i \le i_j$, with $i_j - i_{j-1} = n(n+1)/2$ linearly independent samples. The regression vector $\bar{x}_i$ and target value $d(\cdot)$ are given by

on an iterative process whose instructions are established in steps 14-42. The control strategies are established according to this process.

Algorithm 1 - HDP Algorithm - Recursive Least-Squares and Recurrence Step.
PDINAMIC-DLQR-HDP($N$)
1   ----------------------------------------
2   Setup - Initial Conditions
3   Weighting Matrices and Dynamic System
4   $[Q, R, A_d, B_d, N] \leftarrow [\,]$
5   Select discount factor $0 < \gamma \le 1$
6   RLS parameters: $\theta_0, \Gamma_0$
7   Select forgetting factor $0 < \tau \le 1$
8   Select initial state $x_0$
9   Iterative Process Parameters
10  $[N, n_{revit}] \leftarrow [\,]$
11  P and K Initial Values
12  $[P_0, K_0] \leftarrow [\,]$
13  ----------------------------------------
14  Iterative Process
15  for $i \leftarrow 0 : N$
16  do
17      Optimal Policy
18      $u_i \leftarrow K_i x_i$
19      States
20      $x_{i+1} \leftarrow A_d x_i + B_d u_i$
21      Basis Set - Kronecker Product
22      $\bar{x}_i \leftarrow [x_{1i}^2;\ x_{1i}x_{2i};\ \ldots;\ x_{6i}^2]$
23      Target Vector Assembling
24      $d(x, r, f, P) \leftarrow x_i^T Q x_i + u_i^T R u_i + x_{i+1}^T P x_{i+1}$
25      Recursive Least-Squares
26      $L_{i+1} = \dfrac{\Gamma_i \bar{x}_i}{\tau + \bar{x}_i^T \Gamma_i \bar{x}_i}$
27      $\theta_{i+1} = \theta_i + L_{i+1}\left(d(\cdot) - \bar{x}_i^T \theta_i\right)$
28      $\Gamma_{i+1} = \tau^{-1}\left(\Gamma_i - \dfrac{\Gamma_i \bar{x}_i \bar{x}_i^T \Gamma_i}{\tau + \bar{x}_i^T \Gamma_i \bar{x}_i}\right)$
29      P Matrix Recovery from Vector $\theta$
30      $P \leftarrow [\theta_1,\ \theta_2/2,\ \theta_3/2,\ \theta_4/2,\ \theta_5/2,\ \theta_6/2;$
31      $\quad \theta_2/2,\ \theta_7,\ \theta_8/2,\ \theta_9/2,\ \theta_{10}/2,\ \theta_{11}/2;$
32      $\quad \theta_3/2,\ \theta_8/2,\ \theta_{12},\ \theta_{13}/2,\ \theta_{14}/2,\ \theta_{15}/2;$
33      $\quad \theta_4/2,\ \theta_9/2,\ \theta_{13}/2,\ \theta_{16},\ \theta_{17}/2,\ \theta_{18}/2;$
34      $\quad \theta_5/2,\ \theta_{10}/2,\ \theta_{14}/2,\ \theta_{17}/2,\ \theta_{19},\ \theta_{20}/2;$
35      $\quad \theta_6/2,\ \theta_{11}/2,\ \theta_{15}/2,\ \theta_{18}/2,\ \theta_{20}/2,\ \theta_{21}]$
36      Feedback Optimal Gain K
37      $K \leftarrow -(R + B_d^T P B_d)^{-1} B_d^T P A_d$
38      if $i \bmod n_{revit} = 0$
39      then
40          $x_{i+1} \leftarrow x_{revit}$
41  ----------------------------------------
42  End - Iterative Process
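To make the parametrization of Eqs. (12)-(14) concrete, the Python sketch below builds the regression vector $\bar{x}$ in the ordering of step 22, forms $\theta = \mathrm{vec}(P)$ with diagonal entries and off-diagonal sums, and rebuilds $P$ as in steps 29-35. It checks the identity $x^T P x = \bar{x}^T \mathrm{vec}(P)$ for $n = 6$, using the diagonal Q matrix of the experiments and a hypothetical symmetric test matrix. The function names, the test matrix and the test state are illustrative assumptions, not part of the original algorithm.

```python
def quad_basis(x):
    """Regression vector x_bar = [x1^2, x1*x2, ..., x1*xn, x2^2, ..., x_{n-1}*xn, xn^2] (step 22)."""
    n = len(x)
    return [x[i] * x[k] for i in range(n) for k in range(i, n)]


def vec_sym(P):
    """theta = vec(P): p_ii on the diagonal, p_ik + p_ki for i < k, same order as quad_basis."""
    n = len(P)
    return [P[i][k] if i == k else P[i][k] + P[k][i]
            for i in range(n) for k in range(i, n)]


def recover_P(theta, n):
    """Rebuild a symmetric P from theta (steps 29-35): off-diagonal entries are theta/2."""
    if len(theta) != n * (n + 1) // 2:
        raise ValueError("theta must have n(n+1)/2 entries")
    P = [[0.0] * n for _ in range(n)]
    idx = 0
    for i in range(n):
        for k in range(i, n):
            P[i][k] = P[k][i] = theta[idx] if i == k else theta[idx] / 2.0
            idx += 1
    return P


def diagonal_positions(n):
    """1-based positions of p_11, ..., p_nn inside theta."""
    pairs = [(i, k) for i in range(n) for k in range(i, n)]
    return [pos + 1 for pos, (i, k) in enumerate(pairs) if i == k]


def quad_form(x, P):
    """Direct evaluation of x^T P x, used as the reference value."""
    n = len(x)
    return sum(x[i] * P[i][k] * x[k] for i in range(n) for k in range(n))


n = 6
# Diagonal Q used in the experiments: diag(0.1, 100, 100, 1000, 1000, 1).
q = [0.1, 100.0, 100.0, 1000.0, 1000.0, 1.0]
Q = [[q[i] if i == k else 0.0 for k in range(n)] for i in range(n)]
# Hypothetical symmetric, non-diagonal test matrix and test state.
S = [[1.0 / (1 + i + k) for k in range(n)] for i in range(n)]
x = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]

print("diagonal positions in theta:", diagonal_positions(n))  # -> [1, 7, 12, 16, 19, 21]
for name, M in (("Q", Q), ("S", S)):
    theta = vec_sym(M)
    lhs = quad_form(x, M)
    rhs = sum(b * t for b, t in zip(quad_basis(x), theta))
    print(name, len(theta), lhs, rhs, recover_P(theta, n) == M)
```

The diagonal positions 1, 7, 12 and 16 match the labels $\theta_1$, $\theta_7$, $\theta_{12}$ and $\theta_{16}$ that Table I associates with $p_{11}$ to $p_{44}$, and positions 2 to 6 hold the sums for $p_{12}$ to $p_{16}$, as in Table II.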
approximation of the HJB-DARE solution and reinforcement learning theory.

Figure 1. Control system of DFIG.

A. Setup of the Iterative Process

The setup of the iterative process is classified in three

$$A_c = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0.00 & 0.00 & 0.00 & -250.00 & 377.00 & 0.00 \\ 0.00 & 0.00 & 0.00 & -377.00 & -250.00 & 0.00 \end{bmatrix}$$

approximate solution of the HJB-DARE equation is performed by the recursive least-squares (RLS) method. The evolution of the iterative process for the diagonal cost matrices Q and R is shown in Figures 2 and 3. The diagonal elements of Q are $q_{1,1} = 0.1$, $q_{2,2} = 100$, $q_{3,3} = 100$, $q_{4,4} = 1000$, $q_{5,5} = 1000$ and $q_{6,6} = 1$, and R is the identity matrix. The behaviour of the iterative process is presented over a horizon of 420 seconds.

[Plot panels a)-f), caption not recovered: curves θestim and θ0 for p11, p22, p33, p44, ..., p66 plotted against tamost (s), 0 to 40 s.]
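The RLS-based recursion of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the plant is a hypothetical two-state, single-input system (not the DFIG model), the discount factor $\gamma$ is taken as 1 because the listing does not show where $\gamma$ enters the target of step 24, and the forgetting factor, reset period, reset state and initial covariance are illustrative choices. For comparison, the fixed point of the same recursion is computed by plain Riccati value iteration.

```python
import cmath

# Hypothetical 2-state, single-input plant (NOT the DFIG model of the paper).
A = [[0.8, 0.5], [-0.5, 0.8]]
B = [[0.0], [1.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]
R = 1.0


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def quad_form(x, P):
    n = len(x)
    return sum(x[i] * P[i][k] * x[k] for i in range(n) for k in range(n))


def gain(P):
    """Step 37 for a single input: K = -(R + B^T P B)^{-1} B^T P A, returned as a 1 x n row."""
    BtP = matmul(transpose(B), P)
    s = R + matmul(BtP, B)[0][0]
    return [-v / s for v in matmul(BtP, A)[0]]


def theta_to_P(theta, n):
    """Steps 29-35: diagonal entries from theta, off-diagonal entries are theta/2."""
    P, idx = [[0.0] * n for _ in range(n)], 0
    for i in range(n):
        for k in range(i, n):
            P[i][k] = P[k][i] = theta[idx] if i == k else theta[idx] / 2.0
            idx += 1
    return P


def hdp_rls(x_revit, N=3000, n_revit=5, tau=0.95, gamma0=1000.0):
    """Online HDP with forgetting-factor RLS, following steps 14-42 of Algorithm 1."""
    n = len(A)
    m = n * (n + 1) // 2
    theta = [0.0] * m                                                   # theta_0
    Gamma = [[gamma0 * (r == c) for c in range(m)] for r in range(m)]  # Gamma_0
    P = theta_to_P(theta, n)                                            # P_0 = 0
    K = gain(P)                                                         # K_0 (zero here)
    x = list(x_revit)                                                   # x_0
    for i in range(N + 1):
        u = sum(K[j] * x[j] for j in range(n))                          # step 18
        x_next = [sum(A[r][c] * x[c] for c in range(n)) + B[r][0] * u
                  for r in range(n)]                                    # step 20
        xb = [x[a] * x[b] for a in range(n) for b in range(a, n)]       # step 22
        d = quad_form(x, Q) + R * u * u + quad_form(x_next, P)          # step 24
        Gx = [sum(Gamma[r][c] * xb[c] for c in range(m)) for r in range(m)]
        den = tau + sum(xb[r] * Gx[r] for r in range(m))
        err = d - sum(xb[r] * theta[r] for r in range(m))
        theta = [theta[r] + Gx[r] / den * err for r in range(m)]        # steps 26-27
        Gamma = [[(Gamma[r][c] - Gx[r] * Gx[c] / den) / tau for c in range(m)]
                 for r in range(m)]                                     # step 28
        P = theta_to_P(theta, n)                                        # steps 29-35
        K = gain(P)                                                     # step 37
        x = list(x_revit) if i % n_revit == 0 else x_next               # steps 38-40
    return P, K


def dare_by_iteration(iters=1000):
    """Reference: Riccati value iteration P <- Q + K^T R K + (A+BK)^T P (A+BK), from P = 0."""
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        K = gain(P)
        M = [[A[r][c] + B[r][0] * K[c] for c in range(n)] for r in range(n)]
        MtPM = matmul(matmul(transpose(M), P), M)
        P = [[Q[r][c] + R * K[r] * K[c] + MtPM[r][c] for c in range(n)] for r in range(n)]
    return P


def spectral_radius_2x2(M):
    tr, det = M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))


P_hat, K_hat = hdp_rls(x_revit=[1.0, 0.0])
P_ref = dare_by_iteration()
M_hat = [[A[r][c] + B[r][0] * K_hat[c] for c in range(2)] for r in range(2)]
print("P (RLS-HDP):", P_hat)
print("P (Riccati):", P_ref)
print("closed-loop spectral radius:", spectral_radius_2x2(M_hat))
```

Because the target of step 24 uses the greedy gain of the current $P$, the recursion shares its fixed point with the Riccati iteration, so the two printed matrices should agree once the RLS has converged, and $A_d + B_d K$ should then have its eigenvalues inside the unit circle. The periodic reset to $x_{revit}$ is what keeps the regressors $\bar{x}_i$ exciting enough for the $n(n+1)/2$ parameters.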
Table I
STATISTICS OF DIAGONAL PARAMETERS - SOLUTION OF HJB-DARE EQUATION.

Parameter            θ1        θ7        θ12         θ16
                     p11       p22       p33         p44
True                 103.04    103.04    100.99      100.24
Estimated            103.04    103.04    100.99      100.24
Expectation          102.70    103.10    -3.70       111.64
Standard Deviation   3.43      10.41     1603.51     218.21
Median               103.04    103.04    100.99      100.24
Minimum              0.00      0.00      -23304.65   -1863.10
Maximum              107.97    223.99    14353.35    3276.06

Table II
STATISTICS OF OFF-DIAGONAL PARAMETERS - SOLUTION OF HJB-DARE EQUATION.

Parameter            θ2        θ3        θ4        θ5        θ6
                     p12       p13       p14       p15       p16
True                 0.00      0.00      0.00      0.00      0.00
Estimated            -0.00     -0.00     -0.00     0.00      -0.00
Expectation          -0.89     -5.97     1.47      2.90      3.43
Standard Deviation   17.76     95.03     14.94     40.82     38.05
Median               0.00      -0.00     0.00      0.00      0.00
Minimum              -255.83   -1382.45  -125.48   -351.08   -306.07
Maximum              158.51    836.30    214.10    590.98    550.43

Table III
UNIFORM DEVIATIONS - Q MATRIX TRACES AND DYNAMIC SYSTEM EIGENVALUES.

No   Trace   Eigenvalues (10^-2)
1    6       44.536 + j0.000    -12.705 + j33.142   -12.705 - j33.142
             2.596 + j0.000     -6.260 + j4.664     -6.260 - j4.664
2    600     -2.914 + j7.590    -2.914 - j7.590     -1.467 + j1.208
             -1.467 - j1.208    0.981 + j0.000      0.037 + j0.000
3    6000    -0.375 + j0.975    -0.375 - j0.975     -0.182 + j0.149
             -0.182 - j0.149    0.100 + j0.000      0.004 + j0.000

Table IV
NON-UNIFORM DEVIATIONS - Q MATRIX TRACES AND DYNAMIC SYSTEM EIGENVALUES.

convergence, in the sense of algorithm iterations, for situations 1, 2 and 3 are 600, 4000 and 1000 iterations, respectively.

For situations 1 and 2 presented in Table III, the forgetting factor value associated with the uniform deviations of the Q matrix is 0.89. For situation 3, the forgetting factor value is 0.85. The traces for non-uniform variations of the Q matrix and the eigenvalues of the dynamic system are presented in Table IV. The forgetting factor values for situations 1, 2 and 3 of Table IV are 0.89, 0.85 and 0.92, respectively. RLS convergence is reached at around 600, 900 and 1000 iterations for situations 1, 2 and 3, respectively.
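The statement that the designed eigenvalues lie inside the unit circle can be checked directly from Table III. The sketch below reads the tabulated values as scaled by $10^{-2}$ (the only reading consistent with a stable Z-plane assignment) and computes the spectral radius for each situation. Dividing the trace by $n = 6$ rests on a hypothesis not stated in the text, namely that a uniform deviation means $Q = qI$.

```python
# Closed-loop eigenvalues of Table III, as printed (to be multiplied by 1e-2).
table_iii = {
    1: (6, [44.536 + 0.000j, -12.705 + 33.142j, -12.705 - 33.142j,
            2.596 + 0.000j, -6.260 + 4.664j, -6.260 - 4.664j]),
    2: (600, [-2.914 + 7.590j, -2.914 - 7.590j, -1.467 + 1.208j,
              -1.467 - 1.208j, 0.981 + 0.000j, 0.037 + 0.000j]),
    3: (6000, [-0.375 + 0.975j, -0.375 - 0.975j, -0.182 + 0.149j,
               -0.182 - 0.149j, 0.100 + 0.000j, 0.004 + 0.000j]),
}

N_STATES = 6  # six states: Q is 6 x 6 and theta has 21 entries

radius = {}
for situation, (trace, eigs) in table_iii.items():
    # Spectral radius of the discrete closed loop: largest |z|, rescaled by 1e-2.
    radius[situation] = max(abs(z) for z in eigs) * 1e-2
    # Hypothesis: uniform deviation means Q = q*I, hence q = trace / n.
    q = trace / N_STATES
    inside = radius[situation] < 1.0
    print(f"situation {situation}: trace={trace}, q={q}, "
          f"spectral radius={radius[situation]:.4f}, inside unit circle: {inside}")
```

With these numbers all three closed loops are well inside the unit circle, and the spectral radius shrinks as the trace of Q grows.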
Z-plane involves the development of algebraic relationships that can support the application of polarized heuristics for the selection of the matrices Q and R.

D. Tests with Plant Parameter Variations

In order to show the robustness of the system, tests with plant parameter variations have been carried out on the rotor speed, slip frequency, DC-link capacitance, DC-link voltage and other parameters related to the operation of the DFIG. Due to the unpredictable behavior of the wind, changes in its operating parameters cannot be ruled out. To evaluate the impact of these variations on the system, a disturbance variable Var is defined to represent the plant changes, as can be seen in Figure 4.

The plant response to a pulse variation is shown in Figure 4; the parameter changes are held in the interval from 1500 to 2500 iterations (15-25 seconds). The disturbance drives the plant to another operating point, and when the disturbance ceases the system returns to its scheduled operation. This test shows the ability of the system to recover from parametric changes.

Figure 4. System response to parametric variations.

VI. CONCLUSION

In this article, we highlighted some insights into online control system design. The problem was characterized by the Hamilton-Jacobi-Bellman equation and parameterized for the DLQR in the so-called discrete algebraic Riccati equation (DARE). The optimal control law gains were given by approximations of the DARE solution via HDP. The theory and development of a dedicated policy iteration algorithm were presented to evaluate the feasibility of the proposed method in eolic energy plants. The proposed method has been shown to be an alternative for assigning the eigenvalues of the dynamic system inside the unit circle by estimating the DARE solution via recursive least squares.

ACKNOWLEDGMENT

The authors are indebted to the Federal University of Maranhão (UFMA), the Graduate Program in Electrical Engineering (PPGEE) and the State University of Maranhão (UEMA) for the development infrastructure and financial support, as well as to FAPEMA, CNPq and CAPES.

REFERENCES

[1] G. Lendaris, “A retrospective on adaptive dynamic programming for control,” in Proc. International Joint Conference on Neural Networks (IJCNN 2009), 2009, pp. 1750–1757.
[2] P. J. Werbos, “Foreword - ADP: The key direction for future research in intelligent control and understanding brain intelligence,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 4, pp. 898–900, Aug. 2008.
[3] A. J. Laub, “A Schur method for solving algebraic Riccati equations,” IEEE Transactions on Automatic Control, vol. 24, no. 6, pp. 913–921, 1979.
[4] T. Landelius, “Reinforcement Learning and Distributed Local Model Synthesis,” Ph.D. dissertation No. 469, Linköping University, SE-581 83 Linköping, Sweden, 1997, ISBN 91-7871-892-9.
[5] J. Murray, C. Cox, G. Lendaris, and R. Saeks, “Adaptive dynamic programming,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 32, no. 2, pp. 140–153, May 2002.
[6] X. Xu, H. He, and D. Hu, “Efficient reinforcement learning using recursive least-squares methods,” Department of Automatic Control, National University of Defense Technology, ChangSha, Hunan, 410073, P.R. China, 2002.
[7] J. Boyan, “Technical update: Least-squares temporal difference learning,” Machine Learning, Special Issue on Reinforcement Learning, to appear, 2002.
[8] F. Wu, X.-P. Zhang, P. Ju, and M. Sterling, “Decentralized nonlinear control of wind turbine with doubly fed induction generator,” IEEE Transactions on Power Systems, vol. 23, no. 2, pp. 613–621, May 2008.
[9] L. Barros, W. Mota, J. da Silva, and C. Barros, “An optimal control strategy for DFIG,” in Proc. 2010 IEEE International Conference on Industrial Technology (ICIT), Mar. 2010, pp. 1727–1732.
[10] J. Fang, G. Li, X. Liang, and M. Zhou, “An optimal control strategy for reactive power in wind farms consisting of VSCF DFIG wind turbine generator systems,” in Proc. 2011 4th International Conference on Electric Utility Deregulation and Restructuring and Power Technologies (DRPT), Jul. 2011, pp. 1709–1715.
[11] J. Brewer, “Kronecker products and matrix calculus in system theory,” IEEE Transactions on Circuits and Systems, vol. 25, no. 9, pp. 772–781, Sep. 1978.
[12] K. J. Åström and B. Wittenmark, Adaptive Control, 2nd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1994.
[13] V. P. Pinto, J. C. T. Campos, L. L. N. Dos Reis, C. B. Jacobina, and N. Rocha, “Robustness and performance analysis for the linear quadratic Gaussian/loop transfer recovery with integral action controller applied to doubly fed induction generators in wind energy conversion systems,” Electric Power Components and Systems, vol. 40, no. 2, pp. 131–146, 2011. [Online]. Available: http://www.tandfonline.com/doi/abs/10.1080/15325008.2011.629331
[14] P. A. C. Rosas and A. Estanqueiro, “Guia de projeto elétrico de centrais eólicas, volume 1: Projeto elétrico e impacto de centrais eólicas na rede elétrica” [Electrical design guide for wind power plants, volume 1: Electrical design and impact of wind power plants on the electrical grid].
[15] J. V. da Fonseca Neto and L. R. Lopes, “On the convergence of DLQR control system design and recurrences of Riccati and Lyapunov in dynamic programming strategies,” in Proc. 13th UKSim-AMSS International Conference on Computer Modelling and Simulation, Cambridge University, Emmanuel College, Cambridge, UK, 30 March - 1 April 2011, D. Al-Dabass, A. Orsoni, R. Cant, and A. Abraham, Eds. IEEE, 2011, pp. 26–31.
[16] J. Fonseca Neto and L. R. Lopes, “On the convergence of DLQR control and recurrences of Riccati and Lyapunov in dynamic programming,” in UKSim 13th International Conference on Computer Modelling and Simulation (UKSim2011), Cambridge, United Kingdom.