Deep Reinforcement Learning Energy Management
Abstract—In recent years, energy management systems have become an emerging research topic. This concept allows the distribution of energy-intensive loads among various energy sources. An appropriate resource allocation scheme is necessary for the controller to efficiently allocate its energy resources in different operating conditions. Recent advances in artificial intelligence are instrumental in solving complex energy management problems by learning large repertoires of behavioral skills, combining hand-engineered policies with human-like expertise representations. In this paper, a deep reinforcement learning based resource allocation scheme is proposed for electric vehicles, avoiding the need to work at the level of complex vehicle dynamics. Using multiple energy storage devices, such as batteries, in parallel increases their maintenance requirements due to their different behavior in various operating conditions. Thus, the proposed strategy aims to learn optimal policies that equilibrate the state of charge (SOC) of all batteries, extending their lifespan and reducing the frequency of their maintenance.

I. INTRODUCTION

With the aim to reduce carbon emissions, global demand for clean technologies and more sustainable modes of transportation is rising. Eliminating carbon emissions remains one of the world's major challenges, and social pressure for a more sustainable future has never been higher. In an effort to reduce human-induced climate change, transitioning to sustainable technologies is considered a cure for our addiction to fossil fuels. The transportation sector has been dominated for almost a century by combustion engines. In the last decade, electric transportation has been experiencing rapid growth across the globe, and its broad-scale adoption is bringing significant societal changes. For many years, effective transportation solutions have been provided for a wide range of applications, from golf carts and forklifts to utility vehicles, and interest is now spreading further.

In vehicular technology, electric/electronic systems are taking over pneumatic, hydraulic, and mechanical systems. This is because clean energies have received increasing interest, having been considered over the last decade as a way of reversing climate change. Consequently, the development of electric vehicles is booming, and researchers have an opportunity to contribute solutions that improve their energy efficiency. An energy management system (EMS) is considered the backbone of electric transportation systems that use multiple energy sources, such as batteries, supercapacitors, and fuel cells, which yield great flexibility in achieving higher performance. Hence, energy management of such systems is a serious issue, as it significantly influences the performance of electric vehicles. However, the maintenance and energy management of these devices are becoming more burdensome and costly. Additionally, the amount of energy they can store is limited and must therefore be used efficiently. Energy management systems have only recently started to receive thorough attention from the research community. Hence, optimal energy usage of energy storage devices is among the numerous challenges to be addressed, which raises the urgency of finding alternative, efficient energy deployment techniques to keep up with the growing energy demand.

Various energy management methods have been proposed throughout the years, such as dynamic programming (DP) [1], adaptive DP [2], optimal control [3], and soft-computing methods [4]–[7]. In [1], a hybrid trip model is presented to obtain the vehicle-speed trajectory for the trip path without GPS data, and a DP-based EMS with a prediction horizon is then proposed. Since DP is known for its heavy computational requirements, a search range optimization algorithm is used. To alleviate the computational burden of DP, a stochastic DP is proposed in [8]. In [9], better performance is achieved with a multiagent fuzzy logic strategy, as no a priori knowledge of the load profile is required. Multiagent systems and particle swarm optimization [10], [11] have also been used for optimal energy management and hybrid distributed energy management systems. In [12], [13], a hybrid approach combines the fast dynamics of supercapacitors with the high energy density of batteries to integrate multiple energy sources. Analytical optimization methods, such as Pontryagin's minimum principle, find an analytical solution using a mathematical problem formulation [14], which makes the obtained solution faster than purely numerical methods. However, optimal solutions are generated offline and require the future driving conditions to be known a priori. In [15], an energy management technique using fuzzy logic is proposed for an embedded fuel-cell system. On the other hand, neural networks are suggested in [16] as an efficient energy management system for hybrid electric vehicles. Using multiple energy sources, the energy requirement of hybrid vehicles can be easily managed. Recent advances in soft-computing methodologies have led to the widespread use of intelligent systems [17]–[24]. However, neural networks remain incapable of incorporating any human-like expertise, and fuzzy logic is unable to learn.
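As a rough illustration of the kind of deep reinforcement learning based resource allocation scheme described above, the following Python sketch trains a small Q-network to choose a discrete power split among three parallel battery units so that their states of charge remain balanced. The toy environment, discrete action set, network size, reward, and hyperparameters are illustrative assumptions only and are not taken from this paper.

# Minimal sketch (not the authors' implementation): a small DQN-style agent
# picks which discrete split of the demanded power to assign to three parallel
# battery units and is rewarded for keeping their SOCs balanced.
import random
import numpy as np
import torch
import torch.nn as nn

N_UNITS = 3
# Candidate allocation vectors k = (k1, k2, k3), each summing to 1 (assumed set).
ACTIONS = np.array([
    [1/3, 1/3, 1/3],
    [1/2, 1/4, 1/4],
    [1/4, 1/2, 1/4],
    [1/4, 1/4, 1/2],
    [1/2, 1/2, 0.0],
    [1/2, 0.0, 1/2],
    [0.0, 1/2, 1/2],
])

class ParallelBatteryEnv:
    """Toy plant: each unit's SOC drops in proportion to its power share."""
    def __init__(self, capacity=np.array([1.0, 0.9, 1.1])):
        self.capacity = capacity  # units may have different capacities
        self.reset()

    def reset(self):
        self.soc = np.random.uniform(0.6, 0.9, size=N_UNITS)
        return self._state()

    def demand(self):
        return 0.02  # constant normalized power demand for this sketch

    def _state(self):
        return np.concatenate([self.soc, [self.demand()]]).astype(np.float32)

    def step(self, action_idx):
        k = ACTIONS[action_idx]
        self.soc -= k * self.demand() / self.capacity  # simple discharge model
        self.soc = np.clip(self.soc, 0.0, 1.0)
        reward = -float(self.soc.max() - self.soc.min())  # penalize SOC spread
        done = bool(self.soc.min() <= 0.05)
        return self._state(), reward, done

q_net = nn.Sequential(nn.Linear(N_UNITS + 1, 32), nn.ReLU(),
                      nn.Linear(32, len(ACTIONS)))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma, epsilon = 0.95, 0.2
env = ParallelBatteryEnv()

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy selection over the discrete allocation set.
        if random.random() < epsilon:
            action = random.randrange(len(ACTIONS))
        else:
            with torch.no_grad():
                action = int(q_net(torch.from_numpy(state)).argmax())
        next_state, reward, done = env.step(action)
        # One-step temporal-difference update; replay buffer and target
        # network are omitted to keep the sketch short.
        with torch.no_grad():
            target = reward + (0.0 if done else
                               gamma * q_net(torch.from_numpy(next_state)).max().item())
        q_pred = q_net(torch.from_numpy(state))[action]
        loss = (q_pred - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state

A full implementation would add an experience replay buffer and a target network; they are left out here for brevity.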
battery units. The difference between all battery units' SOC gradually decreases over time and finally settles at zero, as can be observed in Fig. 5(d). For that, the deep reinforcement learning based resource allocation scheme assigns an operation rate parameter of 1/3, making all battery units operate at the same rate. When a battery unit reaches its end of life (usually due to accelerated aging), all batteries within that system are replaced to preserve its overall integrity. The proposed strategy protects the battery units from premature aging and allows the use of units with different capacities and from different manufacturers.
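The equal operation rate reported above can be mimicked by a simple hand-coded proportional sharing rule. The snippet below is not the learned policy; it is only a toy rule with assumed numbers showing how biasing the power share toward higher-SOC units drives the SOC spread to zero, after which each of the three units settles at a share of roughly 1/3.

# Illustrative only: a hand-coded proportional sharing rule (NOT the learned
# DRL policy) reproducing the qualitative behavior described above.
import numpy as np

soc = np.array([0.90, 0.80, 0.70])  # assumed initial states of charge
demand = 0.01                       # assumed normalized power drawn per step
for _ in range(60):
    raw = 1 / 3 + 10.0 * (soc - soc.mean())  # bias shares toward high-SOC units
    k = np.clip(raw, 0.0, None)
    k /= k.sum()                             # operation-rate parameters, sum to 1
    soc -= k * demand                        # discharge in proportion to share
print("k   =", np.round(k, 3))   # tends toward equal sharing, k_i ≈ 1/3
print("SOC =", np.round(soc, 3)) # spread between units is now near zero

In contrast, the scheme proposed in the paper learns this equilibrating behavior rather than relying on a hand-tuned sharing gain.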
Fig. 6. System response under 2 min time interval power demand: (a) DC bus voltage VDC; (b) battery currents isi; (c) duty cycle ρi; (d) state of charge SOCi; and (e) parameters ki and k̄i.

REFERENCES
[1] J. Liu, Y. Chen, W. Li, F. Shang, and J. Zhan, "Hybrid-Trip-Model-Based Energy Management of a PHEV With Computation-Optimized Dynamic Programming," IEEE Transactions on Vehicular Technology, vol. 67, no. 1, pp. 338–353, Jan. 2018.
[2] Q. Wei, F. L. Lewis, G. Shi, and R. Song, "Error-Tolerant Iterative Adaptive Dynamic Programming for Optimal Renewable Home Energy Scheduling and Battery Management," IEEE Transactions on Industrial Electronics, vol. 64, no. 12, pp. 9527–9537, Dec. 2017.
[3] S. Delprat, T. Hofman, and S. Paganelli, "Hybrid Vehicle Energy Management: Singular Optimal Control," IEEE Transactions on Vehicular Technology, vol. 66, no. 11, pp. 9654–9666, Nov. 2017.
[4] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy Management for a Power-Split Plug-in Hybrid Electric Vehicle Based on Dynamic Programming and Neural Networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567–1580, Apr. 2014.
[5] A. Arabali, M. Ghofrani, M. Etezadi-Amoli, M. S. Fadali, and Y. Baghzouz, "Genetic-Algorithm-Based Optimization Approach for Energy Management," IEEE Transactions on Power Delivery, vol. 28, no. 1, pp. 162–170, Jan. 2013.
[6] E. Kamal and L. Adouane, "Intelligent Energy Management Strategy Based on Artificial Neural Fuzzy for Hybrid Vehicle," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 112–125, Mar. 2018.
[7] R.-J. Wai, S.-J. Jhung, J.-J. Liaw, and Y.-R. Chang, "Intelligent Optimal Energy Management System for Hybrid Power Sources Including Fuel Cell and Battery," IEEE Transactions on Power Electronics, vol. 28, no. 7, pp. 3231–3244, July 2013.
[8] S. J. Moura, H. K. Fathy, D. S. Callaway, and J. L. Stein, "A stochastic optimal control approach for power management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 19, no. 3, pp. 545–555, May 2011.
[9] K. Manickavasagam, "Intelligent Energy Control Center for Distributed Generators Using Multi-Agent System," IEEE Transactions on Power Systems, vol. 30, no. 5, pp. 2442–2449, Sep. 2015.
[10] M. Mao, P. Jin, N. D. Hatziargyriou, and L. Chang, "Multiagent-Based Hybrid Energy Management System for Microgrids," IEEE Transactions on Sustainable Energy, vol. 5, no. 3, pp. 938–946, July 2014.
[11] V.-H. Bui, A. Hussain, and H.-M. Kim, "A Multiagent-Based Hierarchical Energy Management Strategy for Multi-Microgrids Considering Adjustable Power and Demand Response," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1323–1333, Mar. 2018.
[12] N. Mendis, K. M. Muttaqi, and S. Perera, "Management of Battery-Supercapacitor Hybrid Energy Storage and Synchronous Condenser for Isolated Operation of PMSG Based Variable-Speed Wind Turbine Generating Systems," IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 944–953, Mar. 2014.
[13] U. Akram, M. Khalid, and S. Shafiq, "An Innovative Hybrid Wind-Solar and Battery-Supercapacitor Microgrid System–Development and Optimization," IEEE Access, vol. 5, pp. 25897–25912, 2017.
[14] S. Teleke, M. Baran, S. Bhattacharya, and A. Huang, "Optimal Control of Battery Energy Storage for Wind Farm Dispatching," IEEE Transactions on Energy Conversion, vol. 25, no. 3, pp. 787–794, Sep. 2010.
[15] M. Tekin, D. Hissel, M.-C. Pera, and J. Kauffmann, "Energy-Management Strategy for Embedded Fuel-Cell Systems Using Fuzzy Logic," IEEE Transactions on Industrial Electronics, vol. 54, no. 1, pp. 595–603, Feb. 2007.
[16] J. Moreno, M. Ortuzar, and J. Dixon, "Energy-management system for a hybrid electric vehicle, using ultracapacitors and neural networks," IEEE Transactions on Industrial Electronics, vol. 53, no. 2, pp. 614–623, Mar. 2006.
[17] H. Chaoui, M. Khayamy, and O. Okoye, "Adaptive RBF Network Based Speed Control for Interior PMSM Drives without Current Sensing," IEEE Transactions on Vehicular Technology, in press, 2018.
[18] H. Chaoui and C. Ibe-Ekeocha, "State of Charge and State of Health Estimation for Lithium Batteries using Recurrent Neural Networks," IEEE Transactions on Vehicular Technology, vol. 66, no. 10, pp. 8773–8783, Oct. 2017.
[19] H. Chaoui, M. Khayamy, and A. A. Aljarboua, "Adaptive Interval Type-2 Fuzzy Logic Control for PMSM Drives with a Modified Reference Frame," IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 3786–3797, May 2017.
[20] H. Chaoui, B. Hamane, and M. L. Doumbia, "Adaptive Control of Venturini Modulation Based Matrix Converters Using Interval Type-2 Fuzzy Sets," Journal of Control, Automation and Electrical Systems, Springer, vol. 7, no. 2, pp. 132–143, Apr. 2016.
[21] H. Chaoui and P. Sicard, "Adaptive Fuzzy Logic Control of Permanent Magnet Synchronous Machines with Nonlinear Friction," IEEE Transactions on Industrial Electronics, vol. 59, no. 2, pp. 1123–1133, Feb. 2012.
[22] F. Belletti, D. Haziza, G. Gomes, and A. M. Bayen, "Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1198–1207, Apr. 2018.
[23] H. Chaoui and P. Sicard, "Fuzzy Logic Based Supervisory Energy Management for Multisource Electric Vehicles," in IEEE Vehicle Power and Propulsion Conference, 2011.
[24] H. Chaoui, S. Miah, and P. Sicard, "Adaptive Fuzzy Logic Control of a DC-DC Boost Converter with Large Parametric and Load Uncertainties," in IEEE/ASME Advanced Intelligent Mechatronics International Conference, 2010.
[25] N. D. Nguyen, T. Nguyen, and S. Nahavandi, "System Design Perspective for Human-Level Agents Using Deep Reinforcement Learning: A Survey," IEEE Access, vol. 5, pp. 27091–27102, 2017.
[26] L. Li, Y. Lv, and F.-Y. Wang, "Traffic signal timing via deep reinforcement learning," IEEE/CAA Journal of Automatica Sinica, vol. 3, no. 3, pp. 247–254, July 2016.
[27] S. S. Mousavi, M. Schukat, and E. Howley, "Traffic light control using deep policy-gradient and value-function-based reinforcement learning," IET Intelligent Transport Systems, vol. 11, no. 7, pp. 417–423, 2017.
[28] T. de Bruin, J. Kober, K. Tuyls, and R. Babuška, "Integrating State Representation Learning Into Deep Reinforcement Learning," IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1394–1401, July 2018.
[29] D. Zhao, Y. Chen, and L. Lv, "Deep Reinforcement Learning With Visual Attention for Vehicle Classification," IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 356–367, Dec. 2017.
[30] H. Chaoui, C. Ibe-Ekeocha, and H. Gualous, "Aging Prediction and State of Charge Estimation of a LiFePO4 Battery using Input Time-Delayed Neural Networks," Electric Power Systems Research, Elsevier, vol. 146, pp. 189–197, May 2017.
[31] H. Chaoui, N. Golbon, I. Hmouz, R. Souissi, and S. Tahar, "Lyapunov-Based Adaptive State of Charge and State of Health Estimation for Lithium-Ion Batteries," IEEE Transactions on Industrial Electronics, vol. 62, no. 3, pp. 1610–1618, Mar. 2015.
[32] H. Chaoui and S. Mandalapu, "Comparative Study of Online Open Circuit Voltage Estimation Techniques for State of Charge Estimation of Lithium-Ion Batteries," Batteries, vol. 3, pp. 1–13, Apr. 2017.