Li2020 Power System Clearance Time Prediction
Song KE
School of Electrical Engineering and Automation
Wuhan University
Wuhan, China
729396285@qq.com

Hui DU
School of Electrical Engineering and Automation
Wuhan University
Wuhan, China
duhui1994@hotmail.com
2020 IEEE Sustainable Power and Energy Conference (iSPEC) | 978-1-7281-9164-5/20/$31.00 ©2020 IEEE | DOI: 10.1109/ISPEC50848.2020.9351275
Abstract—As power grids expand in scale, the uncertainty and complexity of power system operation are increasing. Critical clearing time (CCT) is the key to transient stability assessment (TSA) in power system operation, security defense, maintenance and other scenarios, and the application of machine learning to TSA has become a research hotspot. In this paper, a method of obtaining the CCT by machine learning is proposed. First, time domain simulation under electrical faults is used to obtain samples for machine learning. Second, pattern features of transient stability are extracted from the samples; the maximal information coefficient method is used to reduce the feature dimension, and the Box-Cox transformation is used to improve the distribution of the data. Further, with base learners including XGBoost and Random Forest, a stacking model with support vector regression (SVR) as the meta learner is established to improve the performance of CCT prediction. Finally, the proposed method is verified on the CEPRI 8-machine 36-bus system. The results show that the method provides accurate CCT predictions and is also robust to load changes.

Index Terms—critical clearing time (CCT), transient stability assessment (TSA), kernel call, feature engineering, stacking

I. INTRODUCTION

LARGE-scale power grids carry the transmission and exchange of electric power within and between regions. Their operation modes and security control measures have become more and more complex, and the risk to safe operation is increasingly prominent. Applying artificial intelligence to analyze and prevent operational risks in the power grid has become a research hotspot. In recent years, power grids have kept expanding, renewable energy has developed rapidly, and a large number of wind turbine units have been connected to the grid [1]. In addition, new-energy loads such as electric vehicle charging equipment and large-scale energy storage are growing in scale as they connect to the grid, which brings great challenges to keeping the grid stable. If a failure occurs and measures cannot be taken to remove it in time, it may lead to a chain reaction, an expansion of the failure scale and, eventually, a blackout [2-9].

At present, time domain simulation, the direct method and machine learning are the main methods of power system transient stability assessment. The time domain simulation method is also called the step-by-step integration method. According to the topological relationships between the components of the power grid, a set of differential equations and a set of algebraic equations are constructed to describe the state variables and the network structure of the system, respectively. Taking the power flow solution as the initial value, the equations are solved, the curves of the system state variables over time are drawn, and the transient stability of the system is judged from the rotor angle differences between generators. Because the computation is large and time-consuming, it is difficult to meet the calculation-speed requirements of large-scale systems. The direct method analyzes transient stability from the energy point of view. It does not need to calculate the trajectory of the whole system, so it can quickly determine the transient stability of the system and give the stability margin [10-14]. However, its results tend to be conservative, and it is not suitable for large systems or for sequences of disturbances.

Machine learning can use a large number of samples for off-line learning and then directly call the trained model in the online prediction stage. It does not need complex computations and has the advantages of high speed and high accuracy. At present, successful applications of artificial intelligence algorithms in transient stability assessment mainly include pattern recognition [15], artificial neural networks [16] and expert systems. Some scholars have applied deep belief networks (DBN) to power system transient stability assessment. This model takes full advantage of its strong feature extraction ability. At the same time, the generalization ability of the system can be improved due to the use of a large number of
dimensionless data [17]. However, because such a model only judges whether the power grid can maintain transient stability, the cost of security control in risk assessment cannot be obtained and the stability margin cannot be analyzed, so its analytical value is limited. In reference [18], an adaptive neuro-fuzzy inference system (ANFIS) is used for online estimation of the critical clearing time. ANFIS combines neural networks and fuzzy logic in a single framework and takes advantage of both: it needs less input data and trains efficiently. Compared with a traditional artificial neural network, its prediction capability is better and its efficiency is higher. ANFIS can estimate the CCT quickly and accurately under different load levels and fault locations, so the estimated CCT can be used for real-time setting of relay delay times to avoid false trips. In reference [19], a CCT acquisition method based on numerical calculation and artificial intelligence is proposed. First, the critical trajectory method based on critical generation is used to calculate the CCT, and then an extreme learning machine is used to predict it. This method obtains the CCT accurately and quickly under load changes and different fault occurrences while taking the controller into account, and its predictions are better than those of a traditional neural network.

This paper proposes a stacking model to obtain CCTs for transient stability assessment. It predicts CCTs more accurately than single models such as XGBoost and can be used for online TSA. In the stacking ensemble, different types of learners observe the data from different angles, learn from a more comprehensive perspective, and give more accurate results. In this paper, time domain simulation is used to obtain samples, and the kernel call method is used to control the simulation software PSASP, which greatly improves the efficiency of sample acquisition. Feature engineering and stacking are implemented in Python.

This work was supported by the National Key R&D Program of China 2017YFB0902600 and State Grid Corporation of China Project SGJS0000DKJS1700840.

Authorized licensed use limited to: University of Prince Edward Island. Downloaded on June 06,2021 at 22:15:32 UTC from IEEE Xplore. Restrictions apply.

II. PRETREATMENT OF SAMPLES

After the initial samples are obtained, feature engineering can effectively reduce the number of features, avoid data redundancy and change the distribution of the data, which prepares the data for model building and improves prediction. Therefore, in this paper, the maximal information coefficient method is used to reduce the feature dimension, and the Box-Cox transformation is used to improve the distribution of the data.

A. Maximal Information Coefficient

The maximal information coefficient (MIC) is a relatively new method for detecting nonlinear correlation between variables. It overcomes the limitation of the Pearson coefficient, which can only reflect linear correlation: MIC reflects both linear and nonlinear correlations between variables, and it has strong generality and equitability. Its premise is that, if two variables are related, a grid can be drawn on their scatter plot that partitions the data so as to encapsulate the relationship between them.

Specifically, the MIC between attribute $A_i$ and prediction result $C$ is defined as follows. The values in the training data set $D$ of the variable pair $(a_i,c)$ are divided into $x$ rows and $y$ columns; this partition is called an $x$-by-$y$ grid $G$, and $D|_G$ is the probability distribution of $D$ on $G$. Then

$$\mathrm{MIC}(a_i,c)=\max_{xy<B}\frac{I^{*}(D,x,y)}{\log_2\min(x,y)},\qquad I^{*}(D,x,y)=\max_{G}\sum_{x,y}p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)} \tag{1}$$

where the sum runs over the cells of $G$, $M$ is the number of samples in the training data set, and $B$ is a function of the sample size, usually set to $M^{0.6}$.

B. Box-Cox Transformation

The Box-Cox transformation is a generalized power transformation. It can reduce the correlation between the unobservable error and the target variable, and the transformed target can acquire useful properties such as an approximately normal distribution. It is defined as:

$$y^{(\lambda)}=\begin{cases}\dfrac{y^{\lambda}-1}{\lambda}, & \lambda\neq 0\\[4pt] \ln y, & \lambda=0\end{cases} \tag{2}$$

where $y^{(\lambda)}$ is the result of the Box-Cox transformation, $y$ is the original (positive) variable, and $\lambda$ is the transformation parameter, which can be adjusted step by step according to the distribution of $y^{(\lambda)}$.

In regression problems, support vector regression and other regression algorithms often require the target variable to follow a normal distribution. Taking linear regression as an example, the model is:

$$Y = X\beta + \varepsilon \tag{3}$$

where $\varepsilon$ is the column vector of unobserved random components and $\beta$ is the parameter vector. $\varepsilon$ should follow a normal distribution, but actual data often do not meet this requirement. To solve this problem, the CCT column vector is transformed with the Box-Cox transformation to bring it closer to a normal distribution. The skewness (computed with a skew function) is used to measure the asymmetry of the CCT column vector: the larger the absolute skewness, the more the distribution deviates from normal.

The prediction algorithm first applies the Box-Cox transformation to the CCT column vector and then uses the transformed vector $CCT'$ to train the prediction model. Finally, the predictions of the model are mapped back to critical clearing times with the inverse transformation:

$$y'=\begin{cases}(y^{*}\lambda+1)^{1/\lambda}, & \lambda\neq 0\\[4pt] e^{y^{*}}, & \lambda=0\end{cases} \tag{4}$$

where $y^{*}$ is a model prediction in the transformed space and $y'$ is the corresponding critical clearing time.
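The MIC screening of Section II-A can be illustrated with a short Python sketch (the paper implements its feature engineering in Python). It is a simplified approximation of Eq. (1), not the MINE algorithm of the original MIC work and not necessarily the authors' implementation: for every grid size with $xy<B=M^{0.6}$ it uses equal-frequency bins on both axes instead of searching for the best grid $G$, so it can underestimate MIC. The data, helper names and the 0.3 screening threshold are hypothetical.

```python
import numpy as np

def equal_freq_bins(v, k):
    """Assign each value to one of k equal-frequency bins (0..k-1) by rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)

def mutual_info_bits(xb, yb, nx, ny):
    """Mutual information (bits) of two binned variables: the sum inside Eq. (1)."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (xb, yb), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def approx_mic(a, c, alpha=0.6):
    """Approximate MIC: max normalized MI over equal-frequency x-by-y grids with x*y < B."""
    B = len(a) ** alpha                     # B = M^0.6, as in the text
    best = 0.0
    for nx in range(2, int(B) + 1):
        for ny in range(2, int(B) + 1):
            if nx * ny >= B:
                break
            mi = mutual_info_bits(equal_freq_bins(a, nx), equal_freq_bins(c, ny), nx, ny)
            best = max(best, mi / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([rng.uniform(-1, 1, n),     # feature 0: drives the target nonlinearly
                     rng.uniform(-1, 1, n)])    # feature 1: irrelevant noise
cct = X[:, 0] ** 2 + 0.01 * rng.normal(size=n)  # synthetic stand-in for the CCT target

scores = [approx_mic(X[:, j], cct) for j in range(X.shape[1])]
pearson = [abs(np.corrcoef(X[:, j], cct)[0, 1]) for j in range(X.shape[1])]
selected = [j for j, s in enumerate(scores) if s > 0.3]  # hypothetical threshold
print("MIC:", np.round(scores, 3), "|Pearson|:", np.round(pearson, 3), "kept:", selected)
```

In this synthetic example feature 0 has a near-zero Pearson coefficient with the target, because the relation is a parabola, yet a high MIC; this is the kind of nonlinear dependence Section II-A says MIC detects. The noise feature scores near zero and is dropped.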
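The transform, train and invert steps of Section II-B can be sketched with NumPy and SciPy. `scipy.stats.boxcox` chooses $\lambda$ by maximum likelihood when no value is given; Eqs. (2) and (4) are also written out explicitly so they can be checked against it. The log-normal CCT sample is synthetic, and picking $\lambda$ by maximum likelihood is an assumption, since the text only says $\lambda$ is adjusted step by step.

```python
import numpy as np
from scipy import stats

def boxcox_forward(y, lam):
    """Eq. (2): Box-Cox transform of a strictly positive array."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def boxcox_inverse(z, lam):
    """Eq. (4): map predictions in the transformed space back to CCT values."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (z * lam + 1.0) ** (1.0 / lam)

rng = np.random.default_rng(0)
cct = rng.lognormal(mean=np.log(0.2), sigma=0.4, size=500)  # synthetic right-skewed CCTs (s)

cct_t, lam = stats.boxcox(cct)   # transformed vector CCT' and the maximum-likelihood lambda
print(f"lambda = {lam:.3f}")
print(f"skew before = {stats.skew(cct):.3f}, after = {stats.skew(cct_t):.3f}")

# A model would be trained on cct_t; its outputs are mapped back with Eq. (4).
recovered = boxcox_inverse(cct_t, lam)
print("max round-trip error:", float(np.max(np.abs(recovered - cct))))
```

For right-skewed data like this, the skewness after the transform is much closer to zero, and Eq. (4) recovers the original values, which is what allows the model to be trained on $CCT'$ while still reporting CCTs in seconds.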
III. CONSTRUCTION OF PREDICTION MODEL

The stacking algorithm is based on the idea of cross validation and uses the outputs of multiple models as the input for training a meta learner. It integrates the advantages of each model, reduces the risk of overfitting, makes the expected prediction error of the ensemble smaller than that of a single model, and improves accuracy. In this paper, XGBoost, LightGBM, AdaBoost and Random Forest are selected as base learners, and support vector regression is used as the meta learner.

A. XGBoost

XGBoost is a boosting method based on GBDT (gradient boosting decision trees). Following the idea of ensemble models, it combines classification and regression trees (CART). Each CART is trained on the basis of the previously trained learners, so that the model is optimized in the gradient direction that reduces the residual, and the optimal solution can be found quickly.

CART is similar to a decision tree. The difference is that each node of a decision tree can only take discrete values, while the leaf nodes of CART can take continuous values. Therefore, CART can be used for both regression and classification problems. CART sets a threshold at each node: if the feature value is greater than the threshold, the sample goes to the left subtree; otherwise, it goes to the right subtree. For each feature, the samples are divided into two parts according to the feature value, the splitting error is calculated, and the split with the smallest error is selected. The CART architecture is shown in Fig. 1.

Fig. 1. Architecture of CART

In this paper, PSASP is used to obtain feature-CCT samples through time domain simulation. After feature extraction, we obtain the data set $D=\{(x_i,y_i): i=1,2,\dots,n,\ x_i\in\mathbb{R}^{m},\ y_i\in\mathbb{R}\}$ with $n$ samples in total. Each sample has $m$ features and corresponds to a target value $y_i$. Assuming there are $K$ regression trees, the model can be written as:

$$\hat{y}_i=\sum_{k=1}^{K} f_k(x_i),\qquad f_k\in\mathcal{F} \tag{5}$$

where $f_k$ is a regression tree, $f_k(x_i)$ is the score the $k$-th tree gives the $i$-th sample in the data set, and $\mathcal{F}$ is the space of CART trees.

The objective function is:

$$\mathcal{L}(\phi)=\sum_{i=1}^{n} l(y_i,\hat{y}_i)+\sum_{k=1}^{K}\Omega(f_k) \tag{6}$$

where $l$ is the error function, reflecting how well the model fits the training data, and $\Omega(f_k)$ is the regularization term; the goal is to obtain the model $f(x)$ that minimizes $\mathcal{L}(\phi)$. The regularization term prevents overfitting and is defined as:

$$\Omega(f_k)=\gamma T+\frac{1}{2}\lambda\lVert\omega\rVert^{2}=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}\omega_j^{2} \tag{7}$$

where $\gamma$ and $\lambda$ are the penalty coefficients of the model, and $T$ and $\omega$ denote the number of leaves and the leaf weights of the $k$-th tree, respectively.

In each iteration, a new function $f_t(x_i)$ is added to reduce the objective function as much as possible. At the $t$-th iteration, to simplify the calculation, the loss is expanded in a second-order Taylor series and the terms that do not depend on $f_t$ are dropped, so the objective becomes:

$$\mathcal{L}^{(t)}=\sum_{i=1}^{n}\left[g_i f_t(x_i)+\frac{1}{2}h_i f_t^{2}(x_i)\right]+\Omega(f_t) \tag{8}$$

where $g_i=\dfrac{\partial\, l(y_i,\hat{y}^{(t-1)})}{\partial \hat{y}^{(t-1)}}$ and $h_i=\dfrac{\partial^{2} l(y_i,\hat{y}^{(t-1)})}{\partial (\hat{y}^{(t-1)})^{2}}$.

Define $I_j=\{i\mid q(x_i)=j\}$ as the set of samples that the tree structure $q$ assigns to leaf $j$, $G_j=\sum_{i\in I_j}g_i$, $H_j=\sum_{i\in I_j}h_i$, and the optimal leaf weight $\omega_j=-\dfrac{G_j}{H_j+\lambda}$. The objective function finally becomes:

$$\tilde{\mathcal{L}}^{(t)}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}+\gamma T \tag{9}$$

The training process of the XGBoost model is as follows. CART functions are added iteratively. When adding trees improves the accuracy of the model by less than $s$, the iteration stops and no more trees are added, giving the final XGBoost model. In each iteration, to obtain the new function $f_t$, the tree starts from a single leaf node and a branch is added to one leaf at a time; among all possible ways of growing the tree, the one that minimizes the loss $\tilde{\mathcal{L}}^{(t)}$ is selected, and this cycle is repeated.

B. Stacking

Stacking is an optimization scheme that combines multiple strong learners to further strengthen learning capability. Bagging and boosting frameworks are built from base learners of the same type, whereas stacking combines base learners of different types. Because different types of learners differ greatly in how they learn the data space and structure, they observe the data from different angles, learn from a more comprehensive perspective, and give more accurate results.

Fig. 2. Architecture of Stacking

Stacking uses the idea of k-fold cross validation, usually 5-fold cross validation, and 3-fold or 4-fold cross
validation, depending on the situation, when there is a lot of data. Taking the 5-fold stacking model as an example, as shown in Fig. 2, the training set is first randomly divided into five parts; one part is selected as validation data and the other four parts are combined into a training set, giving five combinations. In each combination, model 1 is trained, and the validation data and the test set are predicted. The five combinations are trained and predicted in turn. The predictions on all validation parts are combined into a vector with the same length as the training set, which forms the training-set part of new feature 1. The test-set predictions under the five combinations are averaged to obtain the test-set part of new feature 1. Similarly, repeating these steps with model 2, model 3, …, model n yields new feature 2, new feature 3, …, new feature n. All new features are taken as the input of the meta learner, which is trained to predict the original target. The flow chart of the implementation steps in this paper is shown in Fig. 3.

… predicted result is considered accurate. Dividing the number of accurately predicted samples by the total number of predicted samples gives the prediction accuracy index:

$$\mathrm{accuracy}=\frac{n}{N}\times 100\% \tag{10}$$

where $N$ is the total number of samples and $n$ is the number of correctly predicted samples.

In addition, the root mean square error is used as an auxiliary evaluation index:

$$\mathrm{RMSE}(X,h)=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)})-y^{(i)}\right)^{2}} \tag{11}$$

where $h(x^{(i)})$ is the predicted critical clearing time of the $i$-th sample, $y^{(i)}$ is the simulated value of the $i$-th sample, and $m$ is the number of samples evaluated.

IV. RESULTS

A. Sample Acquisition and Feature Engineering

In this paper, the CEPRI 8-machine 36-bus system, shown in Fig. 4, is selected as the test case. The system consists of 9 load nodes and 8 generators, and the base power of the system is 100 MVA. The DC transmission line between buses 33 and 34 is changed to an AC line, so the whole grid operates in pure AC mode.
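The CCT labels here come from PSASP time-domain simulation driven by kernel calls, which cannot be reproduced in a few lines. As a toy stand-in, the sketch below finds the CCT of a classical single-machine infinite-bus model by repeated time-domain simulation: the swing equation is integrated with RK4, a run is judged unstable once the rotor angle passes π, and the clearing time is found by bisection. The machine data (H = 5 s, 50 Hz, P_m = 0.8 pu, P_max = 2 pu), the zero electrical power during the fault and the bisection search are all assumptions, not details from the paper. For this particular fault the equal-area criterion, a direct method, gives a closed-form CCT to check the simulation against.

```python
import math

H, F0, PM, PMAX = 5.0, 50.0, 0.8, 2.0   # assumed machine data (pu on system base)
M = 2.0 * H / (2.0 * math.pi * F0)      # swing-equation inertia, pu*s^2/rad

def rk4_step(delta, omega, h, pmax):
    """One RK4 step of M * d2(delta)/dt2 = Pm - pmax * sin(delta)."""
    def f(d, w):
        return w, (PM - pmax * math.sin(d)) / M
    k1 = f(delta, omega)
    k2 = f(delta + 0.5 * h * k1[0], omega + 0.5 * h * k1[1])
    k3 = f(delta + 0.5 * h * k2[0], omega + 0.5 * h * k2[1])
    k4 = f(delta + h * k3[0], omega + h * k3[1])
    return (delta + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            omega + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)

def is_stable(t_clear, pmax_fault=0.0, t_end=3.0, dt=5e-4):
    """Simulate a fault (electrical power pmax_fault*sin) cleared at t_clear, network restored."""
    delta, omega = math.asin(PM / PMAX), 0.0          # pre-fault equilibrium
    n_fault = max(1, math.ceil(t_clear / dt))
    for _ in range(n_fault):                          # fault-on period ends exactly at t_clear
        delta, omega = rk4_step(delta, omega, t_clear / n_fault, pmax_fault)
    for _ in range(int(t_end / dt)):                  # post-fault period
        delta, omega = rk4_step(delta, omega, dt, PMAX)
        if delta > math.pi:                           # past the unstable equilibrium: runaway
            return False
    return True

def cct_by_bisection(lo=0.0, hi=1.0, tol=1e-4):
    """Largest stable clearing time within tol, assuming lo is stable and hi is not."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_stable(mid):
            lo = mid
        else:
            hi = mid
    return lo

# Equal-area criterion (valid only for zero power during the fault and identical pre/post network)
d0 = math.asin(PM / PMAX)
d_cr = math.acos((math.pi - 2 * d0) * math.sin(d0) - math.cos(d0))
t_cr_eac = math.sqrt(2 * M * (d_cr - d0) / PM)

cct = cct_by_bisection()
print(f"simulated CCT = {cct:.4f} s, equal-area CCT = {t_cr_eac:.4f} s")
```

The two values agree closely. With a different fault, for example a nonzero `pmax_fault`, the closed form no longer applies and only the simulated value remains valid, which reflects the trade-off between time domain simulation and the direct method described in the introduction.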
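The split rule of Section III-A can be made concrete for a single feature. The sketch below, an illustration rather than the XGBoost library's implementation, scores every cut with Eq. (9) using the leaf weight $\omega_j=-G_j/(H_j+\lambda)$, and compares the result with a plain CART squared-error split. It assumes the squared loss $l=\tfrac{1}{2}(y-\hat{y})^2$, so that $g_i=\hat{y}_i-y_i$ and $h_i=1$; the data and function names are hypothetical.

```python
import numpy as np

def leaf_weight(G, H, lam):
    """Optimal leaf weight w_j = -G_j / (H_j + lambda)."""
    return -G / (H + lam)

def structure_score(G, H, lam, gamma):
    """Eq. (9): -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    G, H = np.asarray(G, dtype=float), np.asarray(H, dtype=float)
    return -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * G.size

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search on one feature: the cut that lowers Eq. (9) the most."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()
    GL, HL = np.cumsum(gs)[:-1], np.cumsum(hs)[:-1]      # left-leaf sums for each cut
    parent = structure_score([G], [H], lam, gamma)       # tree with one leaf
    children = (-0.5 * (GL ** 2 / (HL + lam) + (G - GL) ** 2 / (H - HL + lam))
                + 2 * gamma)                             # tree with two leaves
    gain = np.where(xs[1:] > xs[:-1], parent - children, -np.inf)  # skip tied values
    k = int(np.argmax(gain))
    return 0.5 * (xs[k] + xs[k + 1]), float(gain[k])

def cart_best_split(x, y):
    """Plain CART regression split: largest reduction of the squared error."""
    sse = lambda v: float(np.sum((v - v.mean()) ** 2))
    xs = np.unique(x)
    best_t, best_red = None, -np.inf
    for t in 0.5 * (xs[:-1] + xs[1:]):
        red = sse(y) - sse(y[x <= t]) - sse(y[x > t])
        if red > best_red:
            best_t, best_red = t, red
    return best_t, best_red

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)                                   # one synthetic feature
y = np.where(x < 0.3, 0.15, 0.25) + 0.01 * rng.normal(size=200)  # synthetic CCTs (s)

y_hat = np.full_like(y, y.mean())       # prediction before this tree (assumed constant)
g, h = y_hat - y, np.ones_like(y)       # gradients and hessians of the squared loss

t_xgb, gain_xgb = best_split(x, g, h, lam=0.0, gamma=0.0)
t_cart, red_cart = cart_best_split(x, y)
print(t_xgb, t_cart, gain_xgb, 0.5 * red_cart)

left = x <= t_xgb
print("leaf weights with lambda = 1:",
      leaf_weight(g[left].sum(), h[left].sum(), 1.0),
      leaf_weight(g[~left].sum(), h[~left].sum(), 1.0))
```

With $\lambda=\gamma=0$ the gain from Eq. (9) equals half the CART squared-error reduction, so both searches pick the same threshold. A positive $\lambda$ shrinks the leaf weights, and $\gamma$ charges each extra leaf, as in Eq. (7).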
TABLE I. XGBOOST RESULTS WITH AND WITHOUT FEATURE ENGINEERING
                              ACCURACY                RMSE/s
XGBoost                       TRAIN SET   TEST SET    TRAIN SET   TEST SET
Before feature engineering    99.69%      92.41%      0.0017      0.0310
After feature engineering     99.78%      93.29%      0.0014      0.0293
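Eqs. (10) and (11) translate directly into code. The criterion for an "accurate" prediction is in a part of the text that did not survive, so the absolute tolerance `tol` below is a hypothetical placeholder, not the authors' threshold; the arrays are made-up examples.

```python
import numpy as np

def accuracy(pred, sim, tol):
    """Eq. (10): percentage of samples counted as accurate (|error| <= tol, an assumption)."""
    pred, sim = np.asarray(pred, dtype=float), np.asarray(sim, dtype=float)
    n_correct = int(np.sum(np.abs(pred - sim) <= tol))
    return 100.0 * n_correct / len(sim)

def rmse(pred, sim):
    """Eq. (11): root mean square error, in seconds when CCTs are in seconds."""
    pred, sim = np.asarray(pred, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((pred - sim) ** 2)))

sim = np.array([0.20, 0.25, 0.31, 0.18])    # simulated CCTs (s), illustrative
pred = np.array([0.21, 0.24, 0.35, 0.18])   # model predictions (s), illustrative
print(accuracy(pred, sim, tol=0.02), rmse(pred, sim))
```

Three of the four illustrative errors lie within the assumed 0.02 s tolerance, so the accuracy is 75%; the RMSE is reported in seconds, matching the RMSE/s columns of Table I.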
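The 5-fold procedure of Section III-B (Fig. 2) can be sketched with scikit-learn. This is an illustration under assumptions: `GradientBoostingRegressor` stands in for XGBoost and LightGBM so that no extra packages are needed, the linear-kernel SVR meta learner on standardized inputs is our choice (the paper does not give the SVR settings), and the Friedman #1 synthetic data replace the PSASP samples.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_friedman1
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def stacking_features(models, X_train, y_train, X_test, n_splits=5, seed=0):
    """Build the new features of Fig. 2: one column per base model."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_meta = np.zeros((len(X_train), len(models)))
    test_meta = np.zeros((len(X_test), len(models)))
    for j, model in enumerate(models):
        for fit_idx, val_idx in kf.split(X_train):
            fold_model = clone(model).fit(X_train[fit_idx], y_train[fit_idx])
            # Out-of-fold predictions fill the training-set part of new feature j.
            train_meta[val_idx, j] = fold_model.predict(X_train[val_idx])
            # Test-set predictions of the fold models are averaged.
            test_meta[:, j] += fold_model.predict(X_test) / n_splits
    return train_meta, test_meta

# Synthetic stand-in for the feature-CCT samples.
X, y = make_friedman1(n_samples=600, noise=0.5, random_state=0)
X_train, X_test, y_train, y_test = X[:480], X[480:], y[:480], y[480:]

base_models = [
    GradientBoostingRegressor(random_state=0),   # stand-in for XGBoost / LightGBM
    AdaBoostRegressor(random_state=0),
    RandomForestRegressor(n_estimators=100, random_state=0),
]
train_meta, test_meta = stacking_features(base_models, X_train, y_train, X_test)

# Meta learner: SVR trained on the new features to predict the original target.
meta = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
meta.fit(train_meta, y_train)
test_rmse = float(np.sqrt(np.mean((meta.predict(test_meta) - y_test) ** 2)))
print(f"stacking test RMSE: {test_rmse:.3f}")
```

Each base model contributes one column, built from out-of-fold predictions on the training set and the average of the five fold models on the test set, as the text describes. scikit-learn also provides `StackingRegressor`, which automates the out-of-fold step but refits each base model on the full training set for test-time prediction instead of averaging the fold models.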
The load levels of the training set and test set lie between 80% and 120%. Samples at a load level of 75% are collected and used as a test set to examine the generalization ability of the models. This test set is predicted by the fully trained single models and the stacking model. The results are shown in TABLE IV.

TABLE IV. RESULT OF GENERALIZATION ABILITY EXAMINATION
                   ACCURACY
Model (K-fold)     TRAIN SET   TEST SET
XGBoost            99.78%      61.86%
LightGBM           99.29%      65.25%
AdaBoost           99.75%      62.28%
Random Forest      95.75%      56.78%
Stacking           99.93%      67.37%

According to TABLE IV, the prediction ability and generalization ability of the stacking ensemble model are still better than those of the other algorithms. For load levels covered by the training set, the test-set prediction accuracy of each model is above 90%. For a load level not covered by the training set (75% of the reference load level), the test-set accuracy decreases to about 60%. Therefore, to improve the generalization ability of the models, samples at more load levels need to be generated for training.

V. CONCLUSION

In this paper, machine learning is used to predict the critical clearing time of power grid faults, and the effectiveness of the method is tested and analyzed on the CEPRI 8-machine 36-bus system. The power grid operates in pure AC mode, and samples are obtained through time-domain simulation in the software PSASP. Control by kernel calls is used to automate the sample acquisition procedure. The results of the stacking model are compared with those of single models such as XGBoost, and the stacking model outperforms all of them.

The results show that the kernel-calling program in this paper can call PSASP to collect fault data automatically and construct the original samples for critical clearing time prediction, which greatly improves the efficiency of sample acquisition. The maximal information coefficient method is used to quantitatively evaluate and rank the correlation between features and critical clearing time and to eliminate irrelevant and repetitive features. In addition, the CCT vector is transformed by the Box-Cox transformation to bring it close to a normal distribution, which enhances the ability of the prediction model. The stacking model achieves higher accuracy than the single models, overfitting is suppressed, and the generalization ability is enhanced.

ACKNOWLEDGMENT

The authors gratefully acknowledge the funding and contributions of the National Key R&D Program of China 2017YFB0902600 and support from State Grid Corporation of China Project SGJS0000DKJS1700840.

REFERENCES

[1] Dong Qing, Li Lu, Han Feng. Influence of interference in PMU data on calculation of line frequency parameters[J]. Electric Power Science and Engineering, 2017, 33(09): 56-63.
[2] Li Mingyang, Fang Lianhang, Zhao Qianchuan, Zhu Wangcheng, Wan Xinshu. Overview of transmission line switching strategy for power grid security correction[J]. Power Grid Technology, 2017, 41(08): 2506-2513.
[3] Jing Xu, Meng Ye, Xianyi Peng, Zhi Li. Influential factor analysis of China's unsustainable electric power system: A case study of Chengdu Electric Bureau[J]. Energy Policy, 2019, 129.
[4] Farnaz Mahdavian, Stephen Platt, Marcus Wiens, Miriam Klein, Frank Schultmann. Communication blackouts in power outages: Findings from scenario exercises in Germany and France[J]. International Journal of Disaster Risk Reduction, 2020, 46.
[5] Baik Sunhee, Davis Alexander L, Morgan M Granger. Assessing the cost of large-scale power outages to residential customers[J]. Risk Analysis: An Official Publication of the Society for Risk Analysis, 2018, 38(2).
[6] Nansheng Pang, Hao Liu, Shuyi Huang, Junjiao Meng. Emergency rush repair task scheduling of distribution networks in large-scale blackouts[J]. International Journal of Electrical Power and Energy Systems, 2016, 82.
[7] Sridhar Adepu, Nandha Kumar Kandasamy, Jianying Zhou, Aditya Mathur. Attacks on smart grid: power supply interruption and malicious power generation[J]. International Journal of Information Security, 2020, 19(1): 189-211.
[8] Chen Xiaoping, Jiao Min, Huo Xiaoming. Enlightenment of India's "7.30" blackout on power grid development and dispatching operation management in power receiving areas[J]. Electrical Application, 2013, 32(S2): 563-567.
[9] Energy - Electric Power; Findings from University of New South Wales provide new insights into electric power (Decision tree analysis to identify harmful contingencies and estimate blackout indices for predicting system vulnerability)[J]. Energy Weekly News, 2020.
[10] El-Abiad A H, Nagappan K. Transient stability regions of multimachine power systems[J]. IEEE Transactions on Power Apparatus and Systems, 1966, PAS-85(2): 169-179.
[11] Gless G E. Direct method of Liapunov applied to transient power system stability[J]. IEEE Transactions on Power Apparatus and Systems, 1966, PAS-85(2): 159-168.
[12] Xue Yingzhao, Van Cutsem Thierry, Pavella Mania. A simple direct method for fast transient stability assessment of large power systems[J]. IEEE Transactions on Power Systems, 1988, 3: 400-412. DOI: 10.1109/59.192890.
[13] Xue Yingzhao, Van Cutsem Thierry, Pavella Mania. Extended equal area criterion justifications, generalizations, applications[J]. IEEE Transactions on Power Systems, 1989, 4: 44-52. DOI: 10.1109/59.32456.
[14] Ni Yixin, Yao Liangzhong, Cai Zexiang. Comprehensive method of direct transient stability analysis[J]. Chinese Journal of Electrical Engineering, 1992(06): 65-70.
[15] Jain Anil K, Duin Robert P W, Mao Jianchang. Statistical pattern recognition: a review[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(1): 4-37.
[16] Liu Rong. Overview of basic principles of artificial neural network[J]. Computer Products and Circulation, 2020(06): 35+81.
[17] Zhu Qiaomu, Dang Jie, Chen Jinfu, Xu Youping, Li Yinhong, Duan Xianzhong. Power system transient stability assessment method based on deep confidence network[J]. Chinese Journal of Electrical Engineering, 2018, 38(03): 735-743.
[18] Witsawa Phootrakornchai, Somchat Jiriwibhakorn. Online critical clearing time estimation using an adaptive neuro-fuzzy inference system (ANFIS)[J]. International Journal of Electrical Power and Energy Systems, 2015, 73.
[19] Irrine Budi Sulistiawati, Ardyono Priyadi, Ony Asrarul Qudsi, Adi Soeprijanto, Naoto Yorino. Critical clearing time prediction within various loads for transient stability assessment by means of the Extreme Learning Machine method[J]. International Journal of Electrical Power and Energy Systems, 2016, 77.