Keywords: Reinforcement learning, Demand response management, Blockchain, Decentralized energy management, DQN

Abstract: In the smart grid, energy consumption has grown exponentially in residential houses, which necessitates the adoption of demand response management. To alleviate and handle the energy management in residential houses, an efficient residential energy management (REM) system can be employed to regulate the energy consumption of appliances for several energy loads, such as non-shiftable, shiftable, and controllable loads. Many researchers have focused on REM using machine learning and deep learning techniques, which are not able to provide a secure and optimal energy management procedure. Thus, in this paper, a multi-agent-based decentralized REM approach, i.e., MD-REM, is proposed using Deep Reinforcement Learning (DRL) with the utilization of blockchain. Furthermore, the combinatorial DQN model, i.e., Q-learning with a deep neural network (DNN), is employed to obtain the optimal price based on the reduced energy consumption of appliances associated with different energy loads, utilizing the Markov Decision Process (MDP). Here, multiple agents are designed to handle the different energy loads, consumption is controlled by the DQN agent, and the reduced consumption data is then securely shared among all stakeholders using a blockchain-based smart contract. The performance evaluation shows that the proposed MD-REM approach is efficient in terms of reduced energy consumption, optimal energy price, reward, and total profit. Moreover, blockchain-based results are evaluated for the proposed MD-REM approach considering performance metrics such as transaction efficiency, Interplanetary File System (IPFS) bandwidth utilization, and data storage cost.
1. Introduction
The growing proliferation of energy demand entails the need for an advanced energy management system (EMS) in modern
smart grid infrastructure associated with intelligent information and communication technologies [1,2]. The EMS has been widely
adopted in residential houses to monitor and handle the energy demand response [3]. Moreover, the employed residential energy
management (REM) system utilizes the smart grid along with its operations to schedule and optimize the energy consumption of
residential consumers [4,5]. The main criteria for incorporating REM systems are maintaining the balance between demand and
supply, high reliability, efficiency, and reduced energy load during peak hours. Many researchers conducted a study on single
energy load-based REM systems to manage the demand and supply for consumers [6–8]. However, integrating REM systems for
multiple energy loads is challenging due to the dynamic consumption environment. As a result, regulating the energy consumption and predicting the operational time of home appliances need to be considered for an efficient and reliable demand response in the REM system.
Thus far, substantial efforts have been made to address the aforementioned issues of the REM system, although most of the research works focused on reduced energy cost, real-time energy transactions, and minimized peak load. Some of the significant research works are: Barua et al. [9] employed an optimization-based EMS to optimize and minimize the energy cost for residential consumers/customers. Then, the authors of [10] mainly focused on an EMS to minimize the cost of energy usage in residential buildings with the help of EVs and distributed energy sources based on a centralized EMS. However, a centralized EMS can allow malicious attackers to manipulate the energy data, which can demotivate participants from using the system.
Rajarajeswari et al. [11] adopted a novel demand-side smart management scheme to obtain reduced energy cost and energy
consumption for residential buildings. However, they deployed a smart energy socket as a centralized load management system to handle the energy consumption of appliances, which is vulnerable to single-point-of-failure, data manipulation, and data spoofing attacks. Hence, many researchers adopted machine learning and deep learning techniques to secure the energy data in EMS. The
employed techniques utilized various approaches, such as Recurrent Neural Networks (RNN), Artificial Neural Networks (ANN),
and Convolutional Neural Networks (CNN), to establish transparency in the system. For example, Ahrarinouri et al. [12] proposed a
distributed reinforcement learning energy management strategy considering the multiple residential energy hubs to improve energy
costs. Then, the authors of [13] optimized the EMS for residential buildings consisting of controllable and shiftable energy loads
utilizing the swarm intelligence algorithms. However, the proposed research does not provide efficient energy management and
monitoring when the energy load exceeds the threshold value. Thus, to address the aforementioned issues, the non-intrusive load
monitoring (NILM) technique is incorporated to monitor and track appliances’ energy consumption by observing the current and
voltage fluctuations.
Several researchers deployed the NILM technique in their energy management system for efficient peak load monitoring. For
example, Ramadan et al. [14] provided a promising intelligent home energy management based on the Internet of Things (IoT)
platform through the usage of the NILM technique to efficiently monitor the fluctuations in energy consumption of appliances.
Next, the authors of [15] presented a NILM-based EMS for residential microgrids to evaluate consumer energy consumption with
improved accuracy. Although the aforementioned approaches address energy management issues, these approaches still do not work
well for unseen energy consumption patterns. So, Ojand et al. [16] proposed a Q-learning-based energy management predictive
control model for residential buildings to optimize the demand response. Then, the authors in [17] discussed a Q-learning-based
demand response approach for a smart home EMS adopting the static ToU price to minimize the energy cost.
The implemented machine learning, deep learning, and load monitoring techniques for energy management cannot provide secure and preserved energy data storage to a great extent due to various security attacks such as malware injection, man-in-the-middle (MITM), impersonation, adversary attacks, etc. [18–20]. Moreover, the Q-learning approach optimizes consumers’ prices by generating a Q-table. However, it is ineffective for large sets of states and actions that need to be handled by the agents for optimizing the energy cost [21]. Next, the EMS discussed by the researchers did not consider multiple agents to monitor the energy consumption of home appliances based on the varying energy loads and data security. Also, optimal energy price, profit, reward, transaction efficiency, and data storage cost are not considered by the researchers while discussing REM. Therefore, to mitigate the security challenges of deep learning and machine learning techniques and to overcome Q-learning's limitation to small state spaces, we propose a multi-agent-based decentralized residential energy management (MD-REM) approach using
Deep Reinforcement Learning (DRL). Here, in the proposed MD-REM approach, the Deep Q-learning Network (DQN) mechanism (a combination of a deep neural network (DNN) with the Q-learning approach) and blockchain are employed to approximate the Q-value and attain the optimal price using the Markov Decision Process (MDP) based on the reduced energy consumption of various home appliances associated with the various energy loads. Moreover, blockchain is incorporated with the Interplanetary File System (IPFS) and a 5G network to mitigate the security issues of centralized data storage, i.e., at a cloud server, providing secure and decentralized energy data storage for an efficient and reliable REM [22].
The key contributions of this paper are as follows:
• We propose a multi-agent-based decentralized residential energy management, i.e., MD-REM approach, considering the varying
energy loads such as non-shiftable, shiftable, and controllable, that need to be monitored and managed by multiple agents
considered for each of the energy loads. The multiple agents are given the responsibility to keep track of the energy
consumption of the home appliances associated with the several energy loads in a dynamic consumption environment.
Furthermore, energy consumption has to be reduced based on the energy usage requirement of home appliances so that the
price involved due to the energy utilization can be optimized.
• Therefore, the energy consumption state of the home appliances in a dynamic environment is forwarded to each of the agents
to monitor the load. Moreover, the hourly energy prices as input provided by the utility provider are redirected towards the
multiple agents, which can consider energy prices and energy consumption state (determined from the dynamic environment)
Table 1
Abbreviations
Abbreviation Definition
CNN Convolutional neural network
DQN Deep Q-learning Network
DNN Deep neural network
DRL Deep reinforcement learning
EMS Energy management system
IPFS Interplanetary File System
NILM Non-intrusive load monitoring
P2P Peer-to-peer
REM Residential energy management
RNN Recurrent neural network
ToU Time-of-Use
MITM Man-in-the-middle
Table 2
Symbols.
Symbol | Definition
$N_s^E$ | Non-shiftable energy load
$S^E$ | Shiftable energy load
$C^E$ | Controllable energy load
$S_d$ | Smart grid
$\alpha_d$ | Dynamic environment
$Up_u$ | Utility provider
$E_d$ | Energy data
$\zeta_z^{N_s^E}$ | Home appliances for non-shiftable energy load
$\lambda_a^{S^E}$ | Home appliances for shiftable energy load
$\kappa_p^{C^E}$ | Home appliances for critical controlled energy load
$\Gamma$ | Energy
$Ml_a$ | Agents
$\gamma$ | Energy load
$\Theta$ | Energy consumption
$\Delta$ | Energy demand
$\tau$ | Time slot
$P^{Ml_a}$ | Price function
$R_{S_d}$ | Rate of energy price
$\Psi$ | Action value function
$\Pi$ | State set
$\delta^*$ | Optimal policy
$\chi$ | Discount factor
$\Lambda$ | Energy price
$\psi$ | Q-value
$\iota$ | Learning rate
$\varrho_v$ | Network parameters
$\epsilon$ | Public key
$\varepsilon$ | Private key
$\phi$ | Decryption
$Sh_d$ | Hash digest
to optimize the price based on the reduced energy consumption considering the MDP modeling. For that, the DQN approach
is introduced, approximating the Q-value by utilizing the DNN, which overcomes the issues of low efficiency of Q-learning in
the case of large state spaces.
• Moreover, the energy data, i.e., energy consumption and hourly energy prices, should be facilitated with a secure and preserved
data storage platform. For that, the utility provider is registered with the smart contract to enable authenticated energy data
storage in the introduced blockchain network by utilizing IPFS content-addressing data storage protocol. As a result, the
energy data can be securely stored in the blockchain and the 5G networks, ensuring massive network capacity, high reliability,
ultra-low latency, and high availability.
• Furthermore, we have introduced the concept of an intelligent grid operator to handle the massive energy demand of the
appliances with the optimal cost. The optimal energy price is obtained using the DQN approach, stored in the blockchain
network (through an intermediary IPFS) using a smart contract, and securely accessible by all stakeholders.
• Finally, the performance evaluation of the proposed MD-REM approach is analyzed using the DQN approach to deduce
the reduced energy consumption and optimal energy price along with the reward and total profit analysis. Additionally,
blockchain-based results are evaluated for the proposed MD-REM approach in terms of transaction efficiency, IPFS bandwidth
utilization, and data storage cost.
The rest of the paper is organized as follows. Section 2 highlights the related work. Section 3 describes the system model of
MD-REM and problem formulation of the proposed MD-REM approach. Then, Section 4 discusses the proposed MD-REM approach.
Section 5 presents the performance evaluation and experimental results of the proposed MD-REM approach. Section 6 highlights
the discussion on the proposed MD-REM approach. Finally, the paper is concluded in Section 7 along with the future work. Table 1
shows the abbreviations along with their definitions considered for the proposed MD-REM approach. Table 2 shows the symbols
used in the proposed MD-REM approach.
2. Related work
Many researchers have given persuasive solutions to enable optimal and efficient residential multi-agent energy management
utilizing the deep learning models [34–37]. Most researchers have adopted a reinforcement learning model to schedule and control
Table 3
Comparative analysis of various state-of-the-art energy management schemes with the proposed approach.
Author | Year | Objective | Pros | Cons
Martinez et al. [23] | 2019 | Heuristic-based home energy management using renewable energy | Reduced computational time and optimal solution | Did not consider privacy and reward aspects
Jiang et al. [24] | 2020 | Proposed a multi-agent-based cooperative reinforcement learning approach for energy management | Minimized cost and optimized energy consumption | Need to consider dynamic loads
Nizami et al. [25] | 2020 | Investigated a multi-agent-based energy management system for residential buildings | Optimized profit and real-time energy transactions | Should focus on enhancing scalability and reliability
Ye et al. [2] | 2021 | Discussed a secure multi-agent-based P2P community energy trading using deep reinforcement learning | Reduced peak demand and community cost | Voltage and thermal limits need to be considered
Ahrarinouri et al. [26] | 2021 | Applied reinforcement learning for multi-agent energy management in residential buildings | Low cost, highly interoperable | Should work on dynamic resource allocation and privacy issues
Kumari et al. [3] | 2021 | Presented a multi-agent-based residential energy management scheme with reinforcement learning | Reduced energy cost and consumption | Data security issues against DoS, man-in-the-middle, and cyber attacks
Wang et al. [27] | 2021 | Multi-objective-based home energy management based on IoT system | Reduced energy cost and user satisfaction | Lack of discussion on profit and privacy issues
Lai et al. [28] | 2022 | Discussed an economical multi-agent community energy management using reinforcement learning | Optimized power demand and low community cost | Security and privacy issues
Chen et al. [29] | 2022 | Formulated a P2P energy trading and conversion scheme using multi-agent deep reinforcement learning | Low operating cost | No consideration of data security issues, high computational cost
Tilburg et al. [30] | 2023 | Presented a reinforcement learning-based residential demand response management for incentivizing energy consumption of consumers | Reduced peak-to-average ratio, improved reliability and efficiency | Should focus on profit and security analysis
Hossain et al. [31] | 2023 | Energy management strategy for multi-microgrids utilizing reinforcement learning | Reduced operational cost and real-time scheduling | Vulnerable to malicious attacks
Guo [32] | 2023 | Utilized deep reinforcement learning for multi-microgrid energy management | Optimal decision, profit, and reward analysis | No consideration of security aspects
Xiong et al. [33] | 2023 | Discussed home energy management using reinforcement learning utilizing decoupling value | Optimized electricity cost and energy | Susceptible to security and privacy attacks
The proposed system | 2024 | Proposed a multi-agent-based decentralized residential energy management using deep reinforcement learning | Highly secure, minimum peak energy, and efficient | –
energy consumption at an optimal price. The reinforcement learning model helps tackle the energy consumption for efficient energy
management in which agents try to maximize their reward by autonomously learning from the environment. Therefore, to achieve optimal energy consumption, Jiang et al. [24] investigated a multi-agent cooperative-based energy management system with applied
reinforcement learning. They have formulated a Markov game approach to balance household energy consumption with an optimal
cost. However, the previous research work ignored the management of energy consumption for dynamic loads in real time. Thus, to
provide real-time energy transactions between users, the authors in [25] proposed a multi-agent-based energy management approach
for residential buildings. They have formulated a two-stage energy management system to optimize the profit for consumers with
real-time transactions. But, the researchers in [25] ignored the scalability and reliability of the energy management system, further
deteriorating its system performance.
To overcome the scalability and reliability issues, Ye et al. [2] discussed a scalable peer-to-peer (P2P) multi-agent-based deep
reinforcement learning approach for energy trading on a large scale. Later, the authors in [26] applied a cooperative multi-
agent approach to enable robust and optimal energy management for residential buildings. Although they have attained a highly
interoperable and low-cost energy consumption for residential buildings, an important aspect of dynamic resource allocation is
completely ignored in their research work. Therefore, Kumari et al. [3] proposed a dynamic pricing mechanism for multi-agent-
based residential energy management and applied reinforcement learning. They have designed a dynamic environment for energy
consumption by introducing multiple energy carriers. However, a dynamic pricing mechanism for energy management does not
involve optimizing the operating cost required for energy consumption by energy carriers. As a result, to reduce the operating cost
in energy management, Chen et al. [29] considered the multi-energy microgrids for P2P energy trading and conversion with the applied multi-agent deep reinforcement learning.
Furthermore, Wang et al. [27] implemented the multi-objective optimization method by improvising the butterfly algorithm
considering the low consumption IoT system for ensuring the optimal home energy management in terms of reduced energy cost
and user satisfaction. Then, the authors of [23] adopted the genetic-based heuristic algorithm to determine an optimal solution for
efficient home energy management system with low computation time. However, the abovementioned optimization approaches are not able to provide a fully optimal solution for home energy management. Based on that, Lai et al. [28] applied a reinforcement learning model to facilitate economical multi-agent-based community energy management for consumers.
Next, the authors of [30] proposed a residential demand response management for incentivizing energy consumption of consumers
utilizing the reinforcement learning. They have focused on improving the reliability, efficiency, and peak-to-average ratio of the
energy management system. Then, Fu et al. [38] considered the scenario of multi-zone residential buildings to implement an
event-driven DRL method to regulate the energy consumption by constructing the MDP model. Furthermore, the authors in [39]
utilized the Artificial Intelligence (AI)-based energy management system by optimizing the microgrid management to improve the
performance in terms of power usage reduction. However, the abovementioned research work did not consider the discussion on
operational cost and real-time scheduling for efficient energy management which is the main focus of [31]. Their proposed energy
management strategy also implemented the reinforcement learning for multi-microgrids using the proximal policy optimization
algorithm. Furthermore, Guo [32] focused on the various optimal energy management aspects such as optimal decision, profit, and
reward analysis which is not discussed in the aforementioned research works. The proposed multi-microgrid energy management
strategy considered the deep reinforcement learning for yielding the optimal decision, but there is no consideration of the security
attacks that can disrupt the entire energy management. Next, the authors of [33] utilized the concept of decoupling value for
discussing the home energy management implementing the reinforcement learning by optimizing the electricity cost of the energy.
Nevertheless, the aforementioned research works have focused on optimizing energy consumption in residential energy management utilizing the reinforcement learning model. However, these reinforcement learning models work on the principle of a centralized system, which can compromise the privacy of an energy management system and make it vulnerable to several security attacks such as single-point-of-failure, data injection, cyber-attacks, etc. Moreover, some of the research works [2,28,29] have not considered the optimal energy price, which can worsen the performance of multi-agent energy management. Therefore, to mitigate energy management’s security and privacy issues, we have proposed a secure MD-REM approach using the DQN mechanism and blockchain. Blockchain, as a decentralized and secure framework, ensures the security and privacy of energy consumption data for
residential houses. Furthermore, Table 3 shows the comparative analysis of various state-of-the-art energy management schemes
with the proposed MD-REM approach.
3. System model and problem formulation

This section discusses the system model of the proposed MD-REM approach using the DQN mechanism and highlights the problem formulation of the MD-REM approach.

3.1. System model
Fig. 1 shows the basic workflow of the proposed multi-agent-based decentralized REM system, MD-REM using DQN, i.e., Q-
learning and DNN model to optimize the energy consumption of various home appliances for utilizing the energy of the smart
grid. We have considered a dynamic environment that consists of multiple energy carriers, i.e., non-shiftable (𝑁𝑠𝐸 ), shiftable (𝑆 𝐸 ),
and controllable energy load (𝐶 𝐸 ) in which energy consumption of various home appliances should be reduced to lessen the energy
burden on the smart grid (𝑆𝑑 ). Energy consumption by several home appliances, i.e., fridge and alarm systems of non-shiftable energy
load, washing machines and dishwashers of shiftable energy load, and air conditioners, lights, and electric vehicles of controllable
energy load are managed and handled by multiple agents based on the various energy load for optimizing the incentive using DQN
approach. We have applied the DQN for an efficient REM system considering the dynamic environment (𝛼𝑑 ), which is categorized
into energy load 𝑁𝑠𝐸 , 𝑆 𝐸 , 𝐶 𝐸 associated with the appliances in which multiple agents get the reward for obtaining the optimal
energy price. We have introduced the utility provider 𝑈 𝑝𝑢 ∈ {𝑈 𝑝1 , 𝑈 𝑝2 , … , 𝑈 𝑝𝑝 } after registering it with 𝑆𝑑 to keep track of the
hourly energy prices that helps us to find the optimal price based on the energy consumption state.
Therefore, the energy consumption state from multiple energy carriers is monitored by the multiple agents, and hourly energy
prices managed by the utility provider are forwarded as input to the multi-agent REM system to obtain the optimal price using
the DQN approach using MDP modeling. DQN approach works on the principle of reinforcement learning in which a deep learning
model can be applied to obtain the best action, i.e., optimal price, by learning the Q-value function and the dynamic consumption
environment. Then, multi-agents participate in the DQN off-policy reinforcement learning algorithm to receive the reward for
yielding an optimized price for energy consumption reduction. After that, the residential energy data 𝐸𝑑 , i.e., energy consumption
and hourly energy price, maintained by the 𝑈 𝑝𝑢 , need to be published securely on the blockchain network to make it accessible to
all stakeholders. To accomplish a cost-efficient energy consumption data storage in blockchain based on the hourly energy prices, we
have introduced an off-chain IPFS protocol that stores energy data based on the validation done by a smart contract. After attaining
low-cost data storage in IPFS, blockchain can maintain all the stakeholders, i.e., 𝑆𝑑 , 𝑈 𝑝𝑢 , 𝛼𝑑 , to provide a secure platform for the
efficient and optimal energy consumption information in the proposed MD-REM approach.
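To make the interaction between the dynamic consumption environment and the multiple agents concrete, the following is a minimal, illustrative sketch of such an environment in the classic gym 0.19 interface (the library named in Section 5). The class name, the two-dimensional state (consumption, hourly price), and the three discrete consumption-adjustment actions are simplifying assumptions for illustration, not the exact environment used in the paper.

```python
import gym
import numpy as np
from gym import spaces

class ResidentialEnergyEnv(gym.Env):
    """Toy dynamic-consumption environment: state = (energy consumption, hourly price)."""

    def __init__(self, hourly_prices, base_demand):
        super().__init__()
        self.hourly_prices = np.asarray(hourly_prices, dtype=np.float32)  # from the utility provider
        self.base_demand = np.asarray(base_demand, dtype=np.float32)      # appliance demand per hour
        self.action_space = spaces.Discrete(3)          # 0: reduce, 1: keep, 2: shift consumption
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(2,), dtype=np.float32)
        self.tau = 0

    def reset(self):
        self.tau = 0
        return np.array([self.base_demand[0], self.hourly_prices[0]], dtype=np.float32)

    def step(self, action):
        demand = self.base_demand[self.tau]
        consumption = demand * {0: 0.8, 1: 1.0, 2: 0.0}[action]   # crude effect of each action
        reward = -consumption * self.hourly_prices[self.tau]      # lower bill => higher reward
        self.tau += 1
        done = self.tau >= len(self.hourly_prices)
        obs = np.zeros(2, dtype=np.float32) if done else np.array(
            [self.base_demand[self.tau], self.hourly_prices[self.tau]], dtype=np.float32)
        return obs, float(reward), done, {}
```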
3.2. Problem formulation

In the proposed MD-REM approach, we have considered multiple energy carriers, i.e., $\{N_s^E, S^E, C^E\}$, comprising various numbers of home appliances: $\{\zeta_1^{N_s^E}, \zeta_2^{N_s^E}, \ldots, \zeta_z^{N_s^E}\} \in \zeta_e^{N_s^E}$ for the non-shiftable energy load, $\{\lambda_1^{S^E}, \lambda_2^{S^E}, \ldots, \lambda_l^{S^E}\} \in \lambda_a^{S^E}$ for the shiftable energy load, and $\{\kappa_1^{C^E}, \kappa_2^{C^E}, \ldots, \kappa_k^{C^E}\} \in \kappa_p^{C^E}$ for the critical controlled energy load, in which the energy provided by the $S_d$ needs to be controlled by the multiple agents. In this study, we have considered multiple agents that communicate with the dynamic environment of multiple energy loads through a bi-directional communication channel to exchange information about the energy consumption state and hourly energy prices, denoted by $\{\Theta_{\zeta_1}^{N_s^E}, \Theta_{\zeta_2}^{N_s^E}, \ldots, \Theta_{\zeta_z}^{N_s^E}\} \in \Theta_{\zeta_e}^{N_s^E}$ associated with the non-shiftable energy load, $\{\Theta_{\lambda_1}^{S^E}, \Theta_{\lambda_2}^{S^E}, \ldots, \Theta_{\lambda_l}^{S^E}\} \in \Theta_{\lambda_a}^{S^E}$ associated with the shiftable energy load, and $\{\Theta_{\kappa_1}^{C^E}, \Theta_{\kappa_2}^{C^E}, \ldots, \Theta_{\kappa_k}^{C^E}\} \in \Theta_{\kappa_p}^{C^E}$ for the controllable energy load.
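As a small illustration of this grouping, the following sketch maps the appliances named in the system model to their load categories and assigns one agent per category; the dictionary layout and agent names are illustrative assumptions.

```python
# Illustrative grouping of the appliances from the system model by energy load;
# each load category is monitored by its own agent in the MD-REM approach.
APPLIANCES_BY_LOAD = {
    "non_shiftable": ["fridge", "alarm_system"],            # zeta: must always be served
    "shiftable":     ["washing_machine", "dishwasher"],     # lambda: can be rescheduled
    "controllable":  ["air_conditioner", "lights", "ev"],   # kappa: demand bounded [min, max]
}

AGENTS = {load: f"agent_{load}" for load in APPLIANCES_BY_LOAD}  # one agent per energy load
```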
However, before discussing the energy consumption, we need to consider the energy $\{\Gamma_{N_s^E}, \Gamma_{S^E}, \Gamma_{C^E}\}$ provided by the $S_d$ to the various home appliances using several energy loads. Therefore, we define the communication between $S_d$, $\alpha_d$, and the multi-agent REM system $Ml_a(N_s^E, S^E, C^E)$ to understand the flow of energy consumption by home appliances, which can be expressed as follows:

$$S_d \xrightarrow{\Gamma} \sum_{e=1}^{z} \sum_{a=1}^{l} \sum_{p=1}^{k} \{\zeta_e^{N_s^E}, \lambda_a^{S^E}, \kappa_p^{C^E}\} \tag{1}$$

$$N_s^E \rightarrow \{\zeta_1^{N_s^E}, \zeta_2^{N_s^E}, \ldots, \zeta_z^{N_s^E}\} \tag{2}$$

$$S^E \rightarrow \{\lambda_1^{S^E}, \lambda_2^{S^E}, \ldots, \lambda_l^{S^E}\} \tag{3}$$

$$C^E \rightarrow \{\kappa_1^{C^E}, \kappa_2^{C^E}, \ldots, \kappa_k^{C^E}\} \tag{4}$$

$$Ml_a \xrightarrow{\gamma} \{\zeta_e^{N_s^E}, \lambda_a^{S^E}, \kappa_p^{C^E}\} \tag{5}$$

where $\Gamma$ signifies the energy for the $z$, $l$, and $k$ home appliances associated with $\{\zeta_e^{N_s^E}, \lambda_a^{S^E}, \kappa_p^{C^E}\}$ in the smart grid environment for
energy usage. Further, 𝛾 denotes various energy loads managed and tackled by the multiple associated agents 𝑀𝑙𝑎 . Furthermore, the
proposed MD-REM approach categorizes home appliances into critical and non-critical components to control energy consumption
based on the priority and energy distribution of the particular appliance. For example, non-shiftable energy load appliances cannot
be rescheduled. It means their energy requirement should be fulfilled during energy distribution.
Controllable energy load appliances demand their energy requirement during the energy distribution. However, in this case, a
feasible energy demand range from minimum to maximum exists to fulfill the controllable energy load requirement. But, shiftable
energy loads can be rescheduled later based on the lower energy prices. Therefore, we can determine the energy consumption of
various energy loads based on their critical and priority characteristics. Firstly, we can discuss the energy consumption ($\Theta$) of the $z$ non-shiftable energy loads, which is equivalent to the energy demand ($\Delta$) and can vary along with the time slot in a day, as follows:

$$\sum_{e=1}^{z} \Theta_{N_s^E,\tau}^{\zeta_e} = \sum_{e=1}^{z} \Delta_{N_s^E,\tau}^{\zeta_e} \tag{6}$$
where $z$ is the number of critical non-shiftable energy load appliances, and $\tau$ represents a particular time slot of a day (in hours), denoted by $\tau_1, \tau_2, \ldots, \tau_u \in \{1, 24\}$. Next, we can contemplate the energy consumption of the $k$ critical controllable energy load appliances that operate based on the minimum and maximum energy demand at the time of energy distribution:

$$\sum_{p=1}^{k} \Theta_{C^E,\tau}^{\kappa_p} = \sum_{p=1}^{k} \Delta_{C^E,\tau}^{\kappa_p} \tag{7}$$

$$\min(\Delta_{C^E,\tau}^{\kappa_p}) < \Delta_{C^E,\tau}^{\kappa_p} < \max(\Delta_{C^E,\tau}^{\kappa_p}) \tag{8}$$

where $k$ is the number of critical controllable energy load appliances whose energy demand lies between the minimum and maximum with the variation in hourly energy prices.
On the other hand, shiftable energy load appliances are considered non-critical appliances whose energy demand can be rescheduled based on the fluctuating hourly energy prices. For example, if low hourly energy prices are available, the energy demand requirement of shiftable load appliances can be fulfilled; otherwise, the particular appliance can be shut down due to the high energy prices. Therefore, the energy consumption for non-critical shiftable energy load appliances can be determined as follows:

$$\xi_{\lambda_a^{S^E},\tau} = \begin{cases} 1, & \text{if prices are low} \\ 0, & \text{if prices are high} \end{cases} \tag{9}$$

$$\sum_{a=1}^{l} \Theta_{S^E,\tau}^{\lambda_a} = \sum_{a=1}^{l} \Delta_{S^E,\tau}^{\lambda_a} * \xi_{\lambda_a^{S^E},\tau} \tag{10}$$
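As a concrete illustration of Eqs. (6)–(10), the following minimal sketch computes the per-slot consumption of the three load categories from hourly demand arrays; the array names, the price threshold, and the clipping of the controllable demand are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def consumption_profiles(demand_ns, demand_c, demand_s, prices, price_threshold,
                         c_min, c_max):
    """Per-time-slot consumption for non-shiftable, controllable, and shiftable loads.

    demand_* : (appliances, 24) hourly demand arrays; prices : (24,) hourly energy prices.
    """
    # Eq. (6): non-shiftable consumption equals demand in every slot.
    theta_ns = demand_ns.copy()

    # Eqs. (7)-(8): controllable consumption follows demand but stays in [c_min, c_max].
    theta_c = np.clip(demand_c, c_min, c_max)

    # Eqs. (9)-(10): shiftable demand is served only when the hourly price is low.
    xi = (prices < price_threshold).astype(float)   # binary price indicator per slot
    theta_s = demand_s * xi                         # broadcast over appliances

    return theta_ns, theta_c, theta_s
```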
Finally, the optimized energy consumption for appliances associated with non-shiftable energy loads can be evaluated using the actual and objective energy consumption $\{\Theta_\tau^{\zeta_e^{ac}}, \Theta_\tau^{\zeta_e^{ob}}\}$ at a time slot $\tau$, as follows:

$$\Theta_\tau^{\zeta_e^{Min}} = \left|\Theta_\tau^{\zeta_e^{ac}} - \Theta_\tau^{\zeta_e^{ob}}\right|^2 \tag{11}$$

Further, the actual energy consumption $\Theta_\tau^{\zeta_e^{ac}}$ for appliances associated with the non-shiftable energy load is defined based on their energy utilized and type of energy load:

$$\Theta_\tau^{\zeta_e^{ac}} = \zeta_e^{N_s^E} \Theta_{N_s^E,\tau}^{\zeta_e} \tag{12}$$
Similarly, the optimized energy consumption for appliances associated with the critical controlled and shiftable energy loads can be determined using the actual and objective energy consumption $\{\Theta_\tau^{\kappa_p^{ac}}, \Theta_\tau^{\kappa_p^{ob}}\}$ and $\{\Theta_\tau^{\lambda_a^{ac}}, \Theta_\tau^{\lambda_a^{ob}}\}$ for the specific load considering the time interval $\tau$, which is expressed as follows:

$$\Theta_\tau^{\kappa_p^{Min}} = \left|\Theta_\tau^{\kappa_p^{ac}} - \Theta_\tau^{\kappa_p^{ob}}\right|^2 \tag{13}$$

$$\Theta_\tau^{\lambda_a^{Min}} = \left|\Theta_\tau^{\lambda_a^{ac}} - \Theta_\tau^{\lambda_a^{ob}}\right|^2 \tag{14}$$

Thus, the actual energy consumption for appliances associated with the critical controlled and shiftable energy loads can be defined as follows:

$$\Theta_\tau^{\kappa_p^{ac}} = \kappa_p^{C^E} \Theta_{C^E,\tau}^{\kappa_p} \tag{15}$$

$$\Theta_\tau^{\lambda_a^{ac}} = \lambda_a^{S^E} \Theta_{S^E,\tau}^{\lambda_a} \tag{16}$$
Finally, the optimized energy consumption by the various appliances of the non-shiftable, critical controlled, and shiftable energy loads can be utilized to obtain the price objective function for the multiple agents $Ml_a \in \{Ml_1, Ml_2, \ldots, Ml_m\}$ that monitor the various home appliances over the time slots of a day, i.e., $\tau_u$. Thus, the smart grid as an energy provider facilitates an optimal price or incentive to the multiple agents based on the optimized energy consumption, for which the objective price function $P^{Ml_a}$ can be defined as follows:

$$\text{Maximize} \; (P^{Ml_a}(\zeta, \kappa, \lambda)) = \sum_{\tau=1}^{24} \sum_{(e,p,a)=1}^{(z,k,l)} \{\Theta_\tau^{\zeta_e^{Min}}, \Theta_\tau^{\kappa_p^{Min}}, \Theta_\tau^{\lambda_a^{Min}}\} * R_{S_d} \tag{17}$$
where $R_{S_d}$ denotes the rate of energy price declared by the grid, and it is used to optimize the price for the $e$, $p$, and $a$ appliances of the several energy loads based on the reduced energy consumption. Thus, the data, i.e., the reduced energy consumption and
optimal price, should be provided with a secure storage platform. For that, we have incorporated blockchain technology, which
stores energy data transactions with tamper-proof, personal, and transparent characteristics. However, despite the highly secure
and transparent data storage platform, it incurs a high amount of data storage cost for storing the data in an immutable way. Thus,
IPFS enabled with the blockchain is a low-cost data storage combinatorial framework to store energy consumption and price data
securely and reliably. Additionally, 5G wireless networks enable a high data rate, low latency, and reliable communication between
smart grids, dynamic consumption environments, and multiple agents.
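A small numerical sketch of the price objective in Eq. (17) is given below; the demand data are made up for illustration, and the deviation terms reuse the squared differences of Eqs. (11), (13), and (14).

```python
import numpy as np

def price_objective(theta_min_ns, theta_min_c, theta_min_s, rate):
    """Eq. (17): sum the optimized-consumption terms of all appliances over 24 slots
    and scale by the grid's energy price rate R_Sd."""
    total = theta_min_ns.sum() + theta_min_c.sum() + theta_min_s.sum()
    return total * rate

# Example with toy (appliances x 24) squared deviations from Eqs. (11), (13), (14):
rng = np.random.default_rng(0)
theta_min_ns = rng.random((2, 24)) ** 2   # non-shiftable appliances
theta_min_c = rng.random((3, 24)) ** 2    # controllable appliances
theta_min_s = rng.random((2, 24)) ** 2    # shiftable appliances
print(price_objective(theta_min_ns, theta_min_c, theta_min_s, rate=0.12))
```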
4. The proposed MD-REM approach

Fig. 2 shows the proposed MD-REM approach, which consists of multiple layers, i.e., the Dynamic Consumption Layer, the DQN Layer, and the Blockchain and IPFS-enabled Layer, to realize the entire procedure of energy consumption reduction in the dynamic environment of a residential house. The hourly energy prices maintained by the utility provider and the energy consumption state of the several energy loads are forwarded as input to the multiple agents, which apply the DQN approach to obtain the optimal price as the best action. Therefore, the multiple layers of the proposed MD-REM approach are discussed in detail below: the energy loads acquired from the Dynamic Consumption Layer are handled and monitored by the multiple agents of the DQN Layer, which apply the DNN and Q-learning approach to yield the best action in the form of the optimal price based on the reduced energy consumption of the energy loads. Furthermore, energy data security is ensured with the help of an executed smart contract that stores the data in the blockchain network through an intermediary IPFS protocol. Thus, the multiple layers associated with the proposed MD-REM approach are represented as follows:
4.1. Dynamic consumption layer

The dynamic consumption layer consists of various home appliances (e.g., fridge, alarm system, air conditioner, dishwasher, etc.) associated with their dynamic energy loads, in which we have highlighted the non-shiftable $N_s^E$, shiftable $S^E$, and controllable energy load $C^E$. Based on these energy loads, the home appliances $\{\zeta_e^{N_s^E}, \lambda_a^{S^E}, \kappa_p^{C^E}\}$ acquire dynamic energy consumption according to their requirement. Moreover, a smart grid as an administrator or energy operator is introduced to provide the required energy to the appliances. The smart grid should facilitate sufficient energy to fulfill the energy requirement of the appliances, but the energy should be provided in an optimized manner for an efficient demand response between the smart grid and the appliances. The main criterion of the proposed approach is that home appliances should only consume sufficient energy based on their requirement or the energy load associated with them. Moreover, this study introduced multiple agents to tackle the energy consumption incurred
by the home appliances based on their energy load. The agents have been assigned to groups of appliances with specific energy loads to manage their energy consumption, which is explained in depth in the DQN layer. Furthermore, the energy consumption state from the dynamic consumption layer is transferred to the DQN layer to optimize the energy consumption price considering the hourly energy prices made available by the utility provider.
4.2. DQN layer

The DQN layer works as an intermediary between the dynamic consumption layer and the blockchain and IPFS-enabled layer. The energy consumption state from the several energy loads in the dynamic consumption layer is considered in the DQN layer, which details how the energy consumption of the home appliances can be optimized according to their energy usage for the associated energy load. Therefore, the multiple agents of the non-shiftable, shiftable, and controlled energy load appliances, i.e., $\{N_s^E, S^E, C^E\}$, obtain the energy consumption state $\Theta$ from the dynamic environment as input. Moreover, we need to attain an optimal price for the various home appliances as the best action with the help of the DQN approach. Once the optimal price for the various energy
loads has been obtained, multiple agents get the reward in the form of a reduced energy bill for monitoring and managing the
energy consumption of home appliances.
Furthermore, multiple agents in the dynamic environment layer communicate with the utility provider to enable the optimal
pricing action with the help of the hourly energy prices that the utility provider is monitoring. The utility provider $Up_u$ is registered with the smart grid $S_d$ to ensure the secure and preserved transit of the hourly energy prices to the multi-agent REM system. The hourly energy prices help us find the optimal price using a dynamic pricing policy. So, we require the energy consumption state from the dynamic environment and the hourly energy prices from the utility provider to apply the DQN approach for optimizing the price for efficient energy management. Thus, the association between the above-mentioned entities, explaining the flow of the energy consumption state and hourly energy prices in the dynamic environment of the proposed MD-REM approach, is represented as follows:
$$\sum_{u=1}^{p'} Up_u \xrightarrow{a} S_d, \quad p' < p \tag{18}$$

$$\{\zeta_e^{N_s^E}, \lambda_v^{S^E}, \kappa_p^{C^E}\} \xrightarrow{b} \sum_{a=1}^{m} Ml_a \tag{19}$$

$$\sum_{u=1}^{p} Up_u \xrightarrow{f} \sum_{a=1}^{m} Ml_a \tag{20}$$
where $v$ represents the association of the smart grid with the $p$ utility providers, and $b$ and $f$ signify the relation between the dynamic environment, the utility provider, and the multiple agents. Therefore, the multi-agent REM system accepts the energy consumption state and hourly energy prices as input to obtain the optimal price as output with the help of the DQN approach. Initially, we have to determine the action-value function ($\Psi$) for the multiple agents $Ml_a$ associated with the specific energy loads to observe how they respond to the dynamic environment for obtaining the optimal price. For that, a state set $\Pi$ is considered to define the particular state of the multiple agents, i.e., $\{\pi(Ml_a(N_s^E, \tau)), \pi(Ml_a(S^E, \tau)), \pi(Ml_a(C^E, \tau))\} \in \pi^{Ml_a}$, at a time slot $\tau$ based on the specific energy loads. Moreover, an action set $P$ is considered to define the action $\rho^{Ml_a} \in \{\rho(Ml_a(N_s^E, \tau)), \rho(Ml_a(S^E, \tau)), \rho(Ml_a(C^E, \tau))\}$ taken by the multiple agents to optimize the energy consumption of the appliances of varying energy loads. Thus, the multiple agents focus on finding the optimal policy $\delta^*$ considering the particular state and action. For that, the action-value function for policy $\delta$ can be defined as follows:
$$\Psi_\delta(\pi^{Ml_a,\tau}, \rho^{Ml_a,\tau}) = \sum_{i=\tau+1}^{T} \chi^{i-\tau-1} \left(P^{Ml_a}(\zeta, \kappa, \lambda)^{i-1} \mid \pi^{Ml_a,\tau}, \rho^{Ml_a,\tau}\right) \tag{21}$$
where $\chi$ signifies the discount factor of the multiple agents of the various energy loads; thus, after the agents select an action to get the optimized price, the dynamic environment can transition from the present state to the next state, i.e., $(\pi^{Ml_a,\tau}, \pi^{Ml_a,\tau+1})$. Further, based on the following state taken by the multiple agents, the Q-value $\psi(\pi^{Ml_a,\tau}, \rho^{Ml_a,\tau})$ at a particular state and action in the DQN approach works on the principle of a Q-table to optimize the price $\Lambda_{\tau+1}(\pi^{Ml_a,\tau}, \rho^{Ml_a,\tau})$, which is obtained as feedback from the dynamic environment for the multiple agents handling the several appliances of varying energy loads. Thus, the calculation of the Q-value $\psi(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau})$ can be performed for the agent handling the non-shiftable energy load, which is defined as follows:
$$\psi(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau}) \leftarrow \psi(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau}) + \iota \big[\Lambda_{\tau+1}(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau}) + \chi \max \psi(\pi^{Ml_a(N_s^E),\tau+1}, \rho^{Ml_a(N_s^E),\tau+1}) - \psi(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau})\big] \tag{23}$$
where $\iota$ signifies the learning rate, which varies in the range $[0, 1]$, and $\chi$ is the discount factor defined while calculating the action-value function to determine the optimized price for the multiple agents that monitor and handle the various home appliances associated with the non-shiftable energy load. Similar to the calculation of the Q-value for the non-shiftable energy load, we can determine the Q-value for the agents monitoring the controlled energy load, $\psi'(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau})$, and the shiftable energy load, $\psi''(\pi^{Ml_a(S^E),\tau}, \rho^{Ml_a(S^E),\tau})$, which is expressed as follows:
$$\psi'(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau}) \leftarrow \psi'(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau}) + \iota' \big[\Lambda_{\tau+1}(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau}) + \chi' \max \psi'(\pi^{Ml_a(C^E),\tau+1}, \rho^{Ml_a(C^E),\tau+1}) - \psi'(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau})\big] \tag{24}$$
where $\iota'$ and $\chi'$ represent the learning rate and discount factor considered for evaluating the Q-value for the critical controlled energy load, and $\iota''$ and $\chi''$ signify the learning rate and discount factor for the calculation of the Q-value for the shiftable energy load. The DQN agent takes load shifting into account, relying on the Q-value computed for the shiftable appliances in the proposed approach, as outlined in Algorithm 1. This load shifting involves strategically adjusting the timing of energy consumption, often to off-peak hours, to optimize energy usage and reduce costs. The optimal actions for the controlled and shiftable energy loads are then selected as follows:
$$\rho^{Ml_a}(C^E, \tau)^* = \arg\max(\psi'(\pi^{Ml_a(C^E),\tau})) \tag{27}$$

$$\rho^{Ml_a}(S^E, \tau)^* = \arg\max(\psi''(\pi^{Ml_a(S^E),\tau})) \tag{28}$$
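The per-load Q-updates of Eqs. (23)–(24) and the greedy action selection of Eqs. (27)–(28) can be sketched in a few lines of Python; the state/action discretization and the table shape are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One tabular update per Eqs. (23)-(24): Q <- Q + lr*(r + gamma*max Q' - Q)."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += lr * (td_target - q_table[state, action])

def greedy_action(q_table, state):
    """Eqs. (27)-(28): pick the action maximizing the Q-value in the current state."""
    return int(np.argmax(q_table[state]))

# Toy usage: one agent, 10 discretized states, 3 price/consumption actions.
q = np.zeros((10, 3))
q_update(q, state=2, action=1, reward=-0.5, next_state=3)
print(greedy_action(q, state=2))
```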
The applied DQN attains the optimal pricing by combining reinforcement learning and a DNN through the action-value function, utilizing the advantages of the DQN approach. Thus, DQN can be used to maximize the action-value function $\Psi_\delta(\pi^{Ml_a,\tau}, \rho^{Ml_a,\tau})$ considering the policy $\delta$ by randomly selecting the action with probability $\vartheta$ for the several energy loads, which is represented as follows:

$$\rho(\pi, \rho, N_s^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau}) \tag{29}$$

$$\rho(\pi, \rho, C^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a,\tau}) \tag{30}$$

$$\rho(\pi, \rho, S^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(S^E),\tau}, \rho^{Ml_a(S^E),\tau}) \tag{31}$$
Finally, after optimizing the action-value function utilizing the DQN approach considering the MDP modeling, the loss function can be evaluated to update the episodes in the DQN approach based on the multiple agents associated with the specific energy load, which is defined as follows:

$$\Upsilon(\varrho_v) = \big[\big(\Lambda + \chi^* \max(\Psi_\delta(\pi', \rho', \varrho_v')) - \Psi_\delta(\pi^{Ml_a}, \rho^{Ml_a}, \varrho_v)\big)^2\big] \tag{32}$$

where $\Lambda$ denotes the reward, $\chi^*$ represents the discount factor, and $\varrho_v$ signifies the network parameters utilized in the DQN network to update the episode. Therefore, the loss function is determined by adding the reward to the discounted target value and subtracting the value obtained from the prediction using the DNN.
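As an illustration of this loss, the following sketch computes the squared temporal-difference error of Eq. (32) with a separate target network, as is standard for DQN; the network sizes and the PyTorch usage are assumptions for illustration rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small DNN approximating the action-value function: state -> Q-value per action."""
    def __init__(self, state_dim=2, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, state):
        return self.net(state)

def dqn_loss(online, target, batch, gamma=0.9):
    """Squared TD error per Eq. (32): (reward + gamma*max Q_target(s') - Q_online(s,a))^2."""
    states, actions, rewards, next_states = batch   # actions must be a LongTensor
    q_sa = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                           # target-network parameters stay frozen
        q_next = target(next_states).max(dim=1).values
    return ((rewards + gamma * q_next - q_sa) ** 2).mean()
```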
Therefore, the loss function helps to obtain the optimal pricing in the form of a reward by improving the action-value function of the reinforcement learning model with the help of the DNN prediction. Algorithm 1 highlights the complete procedure to yield the optimal energy price using DQN, which incurs a time complexity of O(E), in which E represents the number of episode updates in the DQN approach. However, the data, i.e., the hourly energy prices and energy consumption used to calculate the optimal price, must be stored on a highly secure storage platform. The decentralized and transparent blockchain platform facilitates protected and safe data storage for the participants involved in the network. However, data storage in blockchain is quite costly and can demotivate participants from storing the data. As a result, the blockchain and IPFS-enabled layer is considered to store the hourly energy prices and energy consumption with high privacy and low data storage cost due to the content-addressing in IPFS, which makes the energy data accessible with the help of its cryptographic hash.
4.3. Blockchain and IPFS-enabled layer

The residential energy data, after determining the optimal price in the DQN layer, is transferred to the blockchain and IPFS-enabled layer, as residential energy data can be easily manipulated by malicious attackers, which can impact the security and privacy of the REM. Thus, blockchain, with its decentralized and immutable properties, is introduced to secure and protect the residential energy data transactions in the proposed MD-REM approach. However, the data storage cost associated with energy consumption and hourly energy prices in the blockchain is high, which conflicts with optimizing the price based on the reduced energy consumption of appliances. Therefore, IPFS is embodied with the blockchain network to enable data storage with better and more reliable accessibility in a cost-efficient way. However, this depends on the validation of the smart contract and its permission for data storage in an off-chain mechanism, i.e., IPFS, so that no malicious attacker can illegitimately access the data. After the authentication of the data by the smart contract (a self-executing piece of code), the energy consumption and hourly energy prices $H_{E_p}$ data can be stored in the immutable IPFS data storage protocol.
With the ease of data storage and accessibility in IPFS, the participants, i.e., each of the agents and the utility provider, also get their respective hash keys $\{h_k(\zeta_e^{N_s^E}), h_k(\lambda_a^{S^E}), h_k(\kappa_p^{C^E})\}$ and $h_k(Up_u)$ for monitoring the energy consumption of appliances (associated
with non-shiftable, shiftable, and critical controlled energy load) and the hourly energy prices. As a result, the multiple agents, smart grid operators, and utility providers containing the energy data can execute transactions over the blockchain network based on the authorization of the hash keys assigned to them by IPFS. Here, asymmetric key cryptography is employed on the energy consumption data and hourly energy prices with the help of the public and private keys $(\epsilon_{Up_u}, \varepsilon_{Up_u})$ and $(\epsilon_{Ml_a}, \varepsilon_{Ml_a})$, generated for the utility provider and the multiple agents, respectively. Therefore, security and transparency can be provided to add the energy consumption and hourly energy prices data to the blockchain, which is represented as follows [41]:

$$Sh_d(E_d(Ml_a, Up_u)) = (h_k(\zeta_e^{N_s^E}), h_k(\lambda_a^{S^E}, \kappa_p^{C^E}), h_k(Up_u)) \tag{33}$$

$$\phi_{\epsilon_k}(Ds_{\varepsilon_k}(Sh_d(E_d(Ml_a, Up_u)))) = Sh_d(E_d(Ml_a, Up_u)) \tag{34}$$
Algorithm 1 Optimal energy price computation using the DQN approach.
1: procedure Optimal_pricing($\tau$, $\pi(Ml_a(N_s^E, \tau))$, $\rho^{Ml_a(N_s^E),\tau}$)
2:   The parameters $\varrho_v$ and $\varrho_v'$ of the DQN approach are initialized randomly
3:   Calculate the action-value function for the optimal policy
4:   $\Psi_\delta(\pi^{Ml_a,\tau}, \rho^{Ml_a,\tau}) = \sum_{i=\tau+1}^{T} \chi^{i-\tau-1} (P^{Ml_a}(\zeta, \kappa, \lambda)^{i-1} \mid \pi^{Ml_a,\tau}, \rho^{Ml_a,\tau})$
5:   if $E_d \in \zeta_e^{N_s^E}$ then
6:     Obtain state $\pi(Ml_a(N_s^E, \tau))$
7:     for each time interval $\tau$ do
8:       Assign Q-value $\leftarrow 0$
9:       for each episode $e$ do
10:        Calculate the Q-value for non-shiftable appliances
11:        $\psi(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau})$
12:        Maximize the obtained Q-value
13:        $\rho^{Ml_a}(N_s^E, \tau)^* = \arg\max(\psi(\pi^{Ml_a(N_s^E),\tau}))$
14:        Apply the DQN approach by selecting the action probability
15:        $\rho(\pi, \rho, N_s^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(N_s^E),\tau}, \rho^{Ml_a(N_s^E),\tau})$
16:      end for
17:    end for
18:  else if $E_d \in \kappa_p^{C^E}$ then
19:    Obtain state $\pi(Ml_a(C^E, \tau))$
20:    for each time interval $\tau$ do
21:      Assign Q-value $\leftarrow 0$
22:      for each episode $e$ do
23:        Calculate the Q-value for controllable appliances
24:        $\psi'(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a(C^E),\tau})$
25:        Maximize the obtained Q-value
26:        $\rho^{Ml_a}(C^E, \tau)^* = \arg\max(\psi'(\pi^{Ml_a(C^E),\tau}))$
27:        Apply the DQN approach by selecting the action probability
28:        $\rho(\pi, \rho, C^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(C^E),\tau}, \rho^{Ml_a,\tau})$
29:      end for
30:    end for
31:  else
32:    Obtain state $\pi(Ml_a(S^E, \tau))$
33:    for each time interval $\tau$ do
34:      Assign Q-value $\leftarrow 0$
35:      for each episode $e$ do
36:        Calculate the Q-value for shiftable appliances
37:        $\psi''(\pi^{Ml_a(S^E),\tau}, \rho^{Ml_a(S^E),\tau})$
38:        Maximize the obtained Q-value
39:        $\rho^{Ml_a}(S^E, \tau)^* = \arg\max(\psi''(\pi^{Ml_a(S^E),\tau}))$
40:        Apply the DQN approach by selecting the action probability
41:        $\rho(\pi, \rho, S^E) = \max_\delta \sum \Psi_\delta(\pi^{Ml_a(S^E),\tau}, \rho^{Ml_a(S^E),\tau})$
42:      end for
43:    end for
44:  end if
45:  Calculate the loss function in the DQN approach
46:  $\Upsilon(\varrho_v) = [(\Lambda + \chi^* \max(\Psi_\delta(\pi', \rho', \varrho_v')) - \Psi_\delta(\pi^{Ml_a}, \rho^{Ml_a}, \varrho_v))^2]$
47: end procedure
where $Sh_d$ signifies the hash digest of the energy data $E_d$ transactions between the utility provider and the multiple agents, $\phi_{\epsilon_k}$ denotes the decryption of the energy data associated with the multiple agents and the utility provider with the public key $\epsilon_k$, and $Ds$ denotes the digital signature involved to secure the energy data in the network with the private key $\varepsilon_k$, so that energy data transactions can be performed securely in the blockchain network. Further, Algorithm 2 explains the detailed procedure to perform the secure data storage in blockchain with the time complexities of O(p) and O(q). Additionally, Fig. 3 represents the workflow of the proposed MD-REM approach, which takes the residential energy data, i.e., the energy consumption state and hourly energy prices, from the energy loads and the utility provider and applies the DQN approach to find the optimal price by optimizing the Q-value using the optimal policy. Furthermore, the residential energy data is made secure and cost-efficient through the combination of blockchain and IPFS.
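The hash digest and sign-then-verify flow of Eqs. (33)–(34) can be illustrated as follows; the payload layout and the choice of SHA-256 with ECDSA (via the Python cryptography package) are assumptions for illustration, since the paper does not fix a particular curve or hash function.

```python
import hashlib
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Eq. (33): hash digest Sh_d over the agents' and utility provider's energy data.
energy_data = b"consumption_state|hourly_prices|agent_ids"   # hypothetical payload
sh_d = hashlib.sha256(energy_data).digest()

# Eq. (34): the utility provider signs the digest with its private key (Ds),
# and any stakeholder verifies it with the matching public key (phi).
private_key = ec.generate_private_key(ec.SECP256K1())
signature = private_key.sign(sh_d, ec.ECDSA(hashes.SHA256()))
private_key.public_key().verify(signature, sh_d, ec.ECDSA(hashes.SHA256()))  # raises if forged
print("digest:", sh_d.hex()[:16], "... signature verified")
```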
5. Performance evaluation
This section highlights the performance evaluation along with the experimental results of the proposed MD-REM approach, considering various performance evaluation metrics such as the consumption profile for consumers, optimal energy price, reduced energy consumption, reward, and total profit analysis. A detailed explanation of the aforementioned parameters is included in the following subsections: Dataset Description, Simulation Settings, Experimental Results, and Blockchain-based Results.
Fig. 4. Consumption Profile: (a) Energy consumption profile for consumers and (b) Optimal energy price for proposed MD-REM approach.
5.1. Dataset description

The proposed MD-REM approach is implemented over a benchmarked and standard dataset, i.e., the Open Energy Information (Open EI) dataset [42], to extract the information on energy consumption by the several home appliances for the controllable and shiftable energy loads of residential consumers. Further, pre-processing is performed on the considered energy data to eliminate and handle imbalanced, missing, or zero values. This dataset provides 15-minute resolution load profiles for all major residential buildings and end energy uses across all climate regions in the United States and is supported by the U.S. Department of Energy (DOE). The data can be used to inform energy efficiency programs, demand response programs, and distributed energy resource planning. Moreover, the hourly energy price is extracted from the PJM Data Miner as of 29th September 2022 [43], along with the energy consumption data from Open EI. Here, energy consumption is directly proportional to the energy demand.
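A minimal sketch of the pre-processing step described above is given below; the file name, column names, and the aggregation of the 15-minute profiles to the hourly resolution used by the pricing model are illustrative assumptions.

```python
import pandas as pd

# Hypothetical export of an Open EI 15-minute residential load profile.
df = pd.read_csv("openei_load_profile.csv", parse_dates=["timestamp"])

# Drop rows with missing or zero consumption, as described in the pre-processing step.
df = df.dropna(subset=["kwh"])
df = df[df["kwh"] > 0]

# Aggregate the 15-minute readings to hourly energy consumption.
hourly = df.set_index("timestamp")["kwh"].resample("1H").sum()
print(hourly.head())
```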
5.2. Simulation settings

The proposed MD-REM approach is implemented in a high-level functional programming language, i.e., Python, on the Windows operating system (OS), configured with an Intel(R) Core(TM) CPU @ 2.60 GHz and 16 GB RAM. Further, several open-source libraries like NumPy v1.18.4, gym 0.19.0 [44], and Pandas v1.0.4 have been used to perform the complex tasks in the proposed approach. Further, Table 4 represents the various simulation parameters considered for evaluating and predicting the results using the DQN-based proposed MD-REM approach. Further, to make the energy data accessible by all stakeholders, a smart contract is designed using the Ethereum blockchain in a contract-oriented language, i.e., Solidity 0.8.12, and deployed in the Truffle suite. Finally, the vulnerability assessment of the smart contract is done with the open-source tool Mythril.
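For reference, the following sketch wires the DQN hyperparameters of Table 4 (discount factor 0.9, exploration decayed from 1 to 0.02 over the first 10% of the 5000 learning iterations, batch size 32) into a simple epsilon-greedy schedule; the linear decay shape is an assumption, as the paper only states the endpoints and the exploration fraction.

```python
import random

CONFIG = {
    "learning_iterations": 5_000,   # Table 4
    "start_epsilon": 1.0,           # start exploration rate
    "end_epsilon": 0.02,            # end exploration rate
    "exploration_fraction": 0.1,    # fraction of iterations spent decaying epsilon
    "discount_factor": 0.9,
    "batch_size": 32,
}

def epsilon_at(step):
    """Linearly decay epsilon over the exploration fraction, then hold it constant."""
    decay_steps = CONFIG["exploration_fraction"] * CONFIG["learning_iterations"]
    frac = min(step / decay_steps, 1.0)
    return CONFIG["start_epsilon"] + frac * (CONFIG["end_epsilon"] - CONFIG["start_epsilon"])

def epsilon_greedy(q_values, step):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon_at(step):
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```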
Table 4
Simulation parameters.
Parameter | Value | Existing approaches
Arrival time | η(17, 1²) | Xu et al. [34]
Departure time | η(8, 1²) | Xu et al. [34]
Energy consumed | η(0.5, 0.1²) | Kumari et al. [3]
Learning iterations | 5 × 10³ | Xu et al. [34]
Start exploration rate | 1 | Kumari et al. [3]
Discount factor | 0.9 | Kumari et al. [3]
Fraction for exploration | 0.1 | Lai et al. [28]
End exploration rate | 0.02 | Ahrarinouri et al. [26]
Batch size | 32 | Lai et al. [28]
Number of transactions | 1000 | Abishu et al. [41]
Fig. 7. Comparative analysis of proposed MD-REM approach: (a) Reward analysis and (b) Total profit analysis.
5.3. Experimental results

The proposed MD-REM approach is implemented in the high-level Python programming language using the above-mentioned simulation settings to evaluate the experimental results for the considered parameters. The experimental results are explained in detail as follows.
Fig. 8. Comparative analysis of the proposed MD-REM approach: (a) Transaction efficiency and (b) Data storage cost.
Fig. 10. Energy demand analysis with the proposed MD-REM approach.
by the consumer for seven different days. The x-axis represents the number of days, and the y-axis shows the profit calculated using the objective function defined in Eq. (17). Here, the proposed approach has shown a promising outcome compared to baseline approach-2. Next, baseline approach-1 obtained a high profit, though its implementation in a real-life scenario is impossible as it knows the future prices. Thus, the total profit for the proposed approach, which is calculated as an optimization using the DQN approach, lies between the profits of the aforementioned heuristic and optimal approaches. Moreover, the MD-REM approach achieved an 86% energy saving during peak hours, surpassing the 79% energy saving obtained from the baseline approach-2 heuristic [23] and falling below the 92% energy saving (which appears to be unrealistic in a real-life scenario) of the baseline approach-1 optimal method [27]. Furthermore, Fig. 10 shows the visualization of the energy demand analysis over a period of time for the proposed MD-REM approach based on the peak and non-peak hours. It is further observed from the energy demand analysis graph that, after applying the proposed optimization approach, the energy demand is lower during peak hours and higher during non-peak hours.
Moreover, the energy data, i.e., energy consumption and profit evaluated for the residential consumer, need to be stored in a
secure blockchain-based platform. So, a smart contract is designed to make the energy data accessible to all stakeholders. Different
parameters such as transaction efficiency, IPFS bandwidth utilization, and transaction cost have been considered to evaluate the
efficiency of blockchain usage in the proposed MD-REM approach. The blockchain-based results for the proposed MD-REM approach
are explained as follows.
5.4. Blockchain-based results

In this subsection, the Ethereum blockchain-based results are evaluated for the proposed MD-REM approach using various parameters like transaction efficiency, IPFS bandwidth utilization, and transaction cost.
6. Discussion
Residential energy management is one of the critical aspects that needs to be focused on for an efficient demand response to regulate the energy consumption of appliances for various energy loads. Thus, we consider a dynamic environment encompassing non-shiftable, shiftable, and controllable energy loads in which the energy consumption of the appliances is to be reduced. For that, multiple agents handle and regulate the several energy loads to reduce the power burden on the smart grid. Further, we have adopted the DQN approach (with MDP modeling) to determine the optimal price based on the reduced energy consumption state. So, the multiple agents are involved in the residential energy management and receive the reward for obtaining the optimal price for the appliances acquiring the various energy loads in the proposed MD-REM approach. Furthermore, we have considered the utility provider that provides hourly energy prices to the multiple agents along with the energy consumption state to finally optimize the price using the DQN approach.

The residential energy data, i.e., the hourly energy prices and energy consumption state, can be manipulated or forged by malicious attackers; it is therefore secured utilizing the decentralized blockchain platform and an intermediary IPFS storage protocol. The multiple agents and the utility provider access the cost-efficient IPFS protocol for residential energy data storage, and IPFS allocates them the hash keys based on the verification performed by the smart contract. Then, the blockchain utilizes asymmetric cryptography using public and private keys to secure the energy consumption and hourly energy prices associated with the multiple agents and the utility provider. Moreover, the MD-REM approach achieved an 86% reduction in energy consumption during peak hours, surpassing the 79% reduction obtained from baseline approach-2. Furthermore, the performance evaluation of the proposed MD-REM approach is analyzed and simulated considering various parameters such as the consumption profile, optimal energy price, reduced energy consumption, reward, and profit analysis. Moreover, we have also shown the blockchain-based results for the proposed MD-REM approach with various metrics such as transaction efficiency, IPFS bandwidth utilization, and data storage cost.
7. Conclusion
In this paper, we propose a multi-agent-based decentralized residential energy management approach, MD-REM, using a combinatorial DQN and blockchain. The decentralized and immutable characteristics of the blockchain platform secure and protect the residential energy data, accomplishing decentralized energy management. Moreover, the incorporation of blockchain with IPFS provides cost-efficient and reliable data storage for the stakeholders involved in the energy management procedure. Furthermore, the proposed MD-REM approach focuses on calculating the reduced energy consumption of consumers, classified according to the various energy loads, i.e., non-shiftable, shiftable, and controllable. The DQN agent optimizes the energy price for consumers by approximating the Q-values that the Q-learning approach would otherwise store in a Q-table, based on the reduced energy consumption. Furthermore, the loss function is evaluated with the help of the DQN approach, in which multiple agents, one per energy load, optimize the action-value function for the considered policy; evaluating this loss function yields the reward for the multiple agents handling the appliances of the several energy loads. Finally, the performance of the proposed MD-REM approach is evaluated using metrics such as optimal energy price, reduced energy consumption, reward, and total profit analysis. Moreover, the blockchain-based results for the proposed MD-REM approach are evaluated in terms of transaction efficiency, IPFS bandwidth utilization, and data storage cost.
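For reference, the loss evaluation summarized above typically takes the standard DQN temporal-difference form shown below; the notation is the conventional one for DQN and is not necessarily identical to the symbols used elsewhere in this paper.

L(\theta) = \mathbb{E}_{(s,a,r,s')}\Big[\big(r + \gamma \max_{a'} Q(s',a';\theta^{-}) - Q(s,a;\theta)\big)^{2}\Big],

where \theta denotes the online network parameters, \theta^{-} the periodically synchronized target-network parameters, and \gamma the discount factor.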
In the future, the security and privacy of consumer energy data will be preserved by incorporating distributed federated learning with different weight-encryption algorithms in a real-time scenario. Moreover, the DQN algorithm applied in the proposed approach is not fully decentralized for the consumers and agents involved in the dynamic environment; federated learning will make the learning environment distributed and reduce the system's computational complexity.
CRediT authorship contribution statement
Aparna Kumari: Writing – review & editing, Writing – original draft, Software, Resources, Conceptualization. Riya Kakkar:
Writing – original draft, Software, Resources, Methodology, Investigation. Sudeep Tanwar: Writing – review & editing, Writing –
original draft, Supervision, Formal analysis, Data curation, Conceptualization. Deepak Garg: Validation, Supervision, Methodology,
Funding acquisition, Formal analysis. Zdzislaw Polkowski: Writing – review & editing, Visualization, Validation, Supervision,
Software. Fayez Alqahtani: Writing – review & editing, Validation, Funding acquisition, Data curation, Conceptualization. Amr
Tolba: Writing – review & editing, Supervision, Funding acquisition, Formal analysis, Data curation, Conceptualization.
Data availability
Acknowledgment
This work was funded by the Researchers Supporting Project Number (RSPD2024R681), King Saud University, Riyadh, Saudi
Arabia.