Graph Neural Network-Based Fault Diagnosis: A Review
Abstract—Graph neural network (GNN)-based fault diagnosis (FD) has received increasing attention in recent years, due to the fact that data coming from several application domains can
V. Discussions on constructing association graph
V-A. Using K-nearest neighbor method to construct the association graph
arXiv:2111.08185v1 [eess.SY] 16 Nov 2021
2. Review-Oriented. Several architectures of GNNs are reviewed, and the feasible applications of these architectures to fault diagnosis are explained. It is pointed out that one of the difficulties of GNN-based fault diagnosis lies in the construction of the association graph, so several feasible solutions are provided.
3. Benchmark Study. The diagnosis performance of several GNN-based methods and baseline methods is compared on three benchmark data sets. Based on the results obtained, a discussion follows.
4. Future Research. With the expectation of identifying possible research directions, several challenges related to GNN-based fault diagnosis are presented and discussed.
5. Open Source. The source code for this review will be made available after the peer-review stage. The code provides the implementation details of the GNN-based fault diagnosis methods discussed in this paper.

II. DATA-DRIVEN FAULT DIAGNOSIS
With the rapid development of advanced sensing technologies and computational power, neural-network-based fault diagnosis (NNFD) has received considerable attention in industry. According to the various types of data representations, conventional NNFD methods can be divided into two categories, namely, image-based NNFD methods and time-series-based NNFD methods.
As shown in Fig. 3, taking a modern wind farm for example, sensor data contains rich heterogeneous information, such as images of possible fan blade cracks and time-series data of vibration, temperature, and speed. After obtaining the sensor data, the NNFD methods can be constructed according to the structures described in Fig. 4 and Fig. 5, and the typical fault diagnosis is realized.
However, little attention has been paid to the ubiquitous dependencies among data. For example, considering the interaction of neighboring wind turbines, a graph representation models the relationship of all wind turbines in the wind farm well, and permits the assessment of the operation status of the whole farm. In fact, even for a single wind turbine, a graph representation models the coupling and dependency among components, and provides additional value for the diagnosis tasks that are hardly achievable when ignoring the functional dependency.
It has been found in [47]–[49] that the category of a given node can be inferred from its neighbors, so, besides using time-series and image data for fault diagnosis, introducing a graph structure to the data is an alternative and promising approach. Furthermore, the involvement of graphs makes it possible for the classical fault diagnosis based on time and space sensing to take into account dependencies as a type of inductive bias. On top of this, we argue that the connections existing among nodes not only model explicit functional dependencies, but also dependencies as well as topological connections.
Generally speaking, the common methods for processing graph structure among data include the random-walk [50], the graph-clustering [51], and GNN-based methods [39]. In this paper, only GNN-based methods, including GCN, graph attention network (GAT), and graph sample and aggregate (GraphSage), are considered.

Fig. 4: Fault diagnosis framework based on time-series processing. A time-series-based network model is built to compute the latent representations of data. To compute the probability for fault categories, a linear layer and a softmax layer are used.
Fig. 5: Fault diagnosis framework based on image processing. Images are obtained through optical cameras or derived from time-series data [52]. Images go through an image processing network, several layers, and a softmax layer, yielding the probability of fault categories.

III. GRAPH NEURAL NETWORK

A. Mathematical notations
Mathematical notations used in this paper follow [53] and are given in Table I. A simple graph can be represented as

$$G = G(V, E) \tag{1}$$

where $V$ and $E$ are the sets of nodes and edges, respectively. Let $v_i \in V$ be a node and $e_{ij} = (v_i, v_j) \in E$ denote an edge between $v_i$ and $v_j$. Then, the neighborhood of a node $v$ can be defined as $\mathcal{N}(v) = \{u \in V \mid (v, u) \in E\}$. Usually, a graph can be described by an adjacency matrix $A \in \mathbb{R}^{N \times N}$, where $N$ is the number of nodes, that is, $N = |V|$. In particular, $A_{ij} = 1$ if $\{v_i, v_j\} \in E$ and $i \neq j$; $A_{ij} = 0$ otherwise. In an undirected graph, $A_{ij}$ denotes an edge connection between nodes $v_i$ and $v_j$, while in a directed graph, $A_{ij}$ represents an edge pointing from $v_i$ to $v_j$. In practical applications, a graph may have node features (also called attributes) $X \in \mathbb{R}^{N \times c}$, where $c$ is the dimension of a node feature vector. A degree matrix $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix, which can be obtained as $D_{ii} = \sum_{j=1}^{N} A_{ij}$. A graph can also be represented by the Laplacian matrix $L$, defined as:

$$L = D - A \tag{2}$$

An illustration of the relationship among the above three matrices is shown in Fig. 6.

Table I: Notations
Notation                           Description
$\|$                               Concatenation operation
$|\cdot|$                          The length of a set
$G$                                A graph representation of a system in the fault-free case
$G_f$                              A graph representation of a system in the faulty case
$V$                                The set of nodes in a graph
$v$                                A node $v \in V$
$E$                                The set of edges in a graph
$\mathcal{N}(v)$                   The neighbors of a node $v$
$A$                                The graph adjacency matrix
$A^T$                              The transpose of the matrix $A$
$N$                                The number of nodes, that is, $N = |V|$
$c$                                The dimension of a node feature vector
$X \in \mathbb{R}^{N \times c}$    The feature matrix of a graph
$D$                                The degree matrix of $A$, that is, $D_{ii} = \sum_{j=1}^{N} A_{ij}$
$L$                                The Laplacian matrix of a graph, that is, $L = D - A$
$l$                                The layer index
$t$                                The time step/iteration index
$\delta$                           Nonlinear activation function
$\Theta$                           Learnable model parameters

B. GNN Architectures
The input of a GNN consists of two parts [39], i.e., the feature matrix $X$ and the adjacency matrix $A$. The output of a GNN can be obtained by the general forward propagation equation

$$Z_{gnn} = \mathrm{softmax}(gnn_l(\ldots, gnn_2(A, gnn_1(A, X)))) \tag{3}$$

where $gnn$ denotes an operation of GNN and $l$ is the number of layers in the architecture.
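The matrices of Section III-A can be made concrete with a small numerical sketch. The following is a minimal illustration, assuming an undirected simple graph given as a list of node-index pairs; the function name `graph_matrices` and the 4-node example graph are hypothetical and not taken from the paper.

```python
import numpy as np

def graph_matrices(num_nodes, edges):
    """Return adjacency A, degree D and Laplacian L = D - A (Eq. (2))
    for an undirected simple graph given as a list of (i, j) pairs."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        if i != j:              # a simple graph has no self-loops
            A[i, j] = 1.0
            A[j, i] = 1.0       # undirected: A is symmetric
    D = np.diag(A.sum(axis=1))  # D_ii = sum_j A_ij
    L = D - A
    return A, D, L

# Hypothetical 4-node graph: node 1 is connected to nodes 0, 2 and 3.
A, D, L = graph_matrices(4, [(0, 1), (1, 2), (1, 3)])
print(np.diag(D))
```

Each row of the Laplacian sums to zero by construction, which gives a quick sanity check on any adjacency matrix built this way.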
1) Graph convolutional networks (GCNs)
A GCN can be regarded as an extension of a CNN architecture, and provides an effective method to extract relational/spatial features from graph-structured inputs. The GCN model has undergone many improvements, leading to [28], which proposed the computation

$$Z = \sigma\big((\tilde{D}^{-0.5} \tilde{A} \tilde{D}^{-0.5}) X \Theta\big) \tag{4}$$

where $\tilde{A} = A + I_N$, $I_N$ is the identity matrix of order $N$, and $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_{j=1}^{N} \tilde{A}_{ij}$. $\sigma$ is an activation function, e.g., the ReLU function. $\Theta \in \mathbb{R}^{c \times c'}$ is the parameter matrix of the network to be learned, $c'$ represents the dimension of the output, and $Z \in \mathbb{R}^{N \times c'}$ is the output matrix.
The cross-entropy loss function

$$Loss = -\sum_{i=1}^{N_{tr}} \sum_{j=1}^{M} Y_{ij} \log Z_{ij} \tag{5}$$

is generally used to train the parameters. Here, $N_{tr}$ represents the number of training points, $M$ is the number of categories, $Z$ is obtained from Eq. (4), and $Y$ is the corresponding label of the training set. It should be noted that in a classification problem, the dimension of the output equals the number of categories, i.e., $c' = M$.
Even though GCNs have shown major improvements w.r.t. preceding non-relational architectures, there still exist some aspects to be improved:
(1) A shallow GCN network cannot spread label information on a large scale, which limits its receptive field [30].
(2) A deep GCN network leads to over-smoothed solutions. A multi-layer GCN makes data become highly similar in the forward transmission process, so that they cannot be correctly distinguished, which greatly reduces the model performance.
(3) GCN belongs to the transductive learning model. When dealing with new data, new nodes must be introduced to modify the original association graph, and the whole GCN model must be trained again to adapt to the new graph.
(4) For a given node, GCN considers all its neighbors to be equally important, instead of paying selective attention to special neighbors: this limits the performance.
The above deficiencies not only restrict performance, but also constrain applications. To address these limits, several new GNN models have been proposed over time.

2) Graph attention networks
A GAT architecture uses the attention mechanism to assign different weights to a node's neighbors. In addition, it is an inductive model, implying that GAT can perform online (fault diagnosis) tasks after training [54] [55].
The GAT architecture is more complex than a GCN one. In detail, the feature matrix is $X = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{N \times c}$, $x_i \in \mathbb{R}^{c}$, $i = 1, 2, \ldots, N$. Then, a linear transformation is performed on the vector $x_i$ of each node to obtain its feature vector $x'_i \in \mathbb{R}^{c'}$, where $W$ is a linear mapping matrix:

$$x'_i = W x_i, \quad W \in \mathbb{R}^{c' \times c}, \quad i = 1, 2, \ldots, N \tag{6}$$

$$X' = [x'_1, x'_2, \ldots, x'_N] \in \mathbb{R}^{N \times c'} \tag{7}$$

In GAT, for a given $x_i$, the attention weight value of its neighbor $x_j$ is reflected by $a_{ij}$, obtained as follows: $x'_i$ and $x'_j$ of nodes $i$ and $j$ after linear mapping are concatenated together, and then the inner product is calculated with a vector $a$ of length $2c'$. The activation function uses Leaky_ReLU, and a final softmax layer normalizes the weight values.

$$a_{ij} = \frac{\exp(\mathrm{Leaky\_ReLU}([x'_i \| x'_j] a))}{\sum_{k \in \mathcal{N}(i)} \exp(\mathrm{Leaky\_ReLU}([x'_i \| x'_k] a))} \tag{8}$$

where $\|$ represents the concatenation operator. Then, without loss of generality, the output of the $(l+1)$th hidden layer becomes

$$h_i^{l+1} = \sigma\Big(\sum_{j \in \mathcal{N}(i)} W h_j^{l} a_{ij}\Big) \tag{9}$$

where $\sigma$ denotes the activation function, and $h^l$ is the output of the $l$th hidden layer, with $h^0 = X$. The learnable parameters in GAT include the vector $a$ and the parameter matrix $W$.

3) Graph sample and aggregate
Similar to the GAT model, GraphSage belongs to the inductive learning model. In principle, GraphSage learns an aggregation function, which can aggregate the features of a specific node and its neighbors to obtain its high-order features [56].
Denote the input feature matrix as $X$. The forward propagation formula of the $l$th hidden layer in a GraphSage model is:

$$h_v^{l} = \sigma\big(W^{l} \, \mathrm{Concat}(h_v^{l-1}, \mathrm{Aggre}(h_u^{l-1}, \forall u \in \mathcal{N}(v)))\big) \tag{10}$$

where Concat represents the concatenation operation, Aggre performs an aggregation operation on features, and $h^l$ is the output of the $l$th hidden layer, with $h^0 = X$. GraphSage uses four types of feature aggregation operations, including a mean aggregator, a GCN aggregator, a Long Short-Term Memory (LSTM) aggregator, and a pooling aggregator. GraphSage does not involve the attention mechanism, so it treats all neighbor nodes equally. Due to the inductive learning characteristic, GraphSage can be used for an online fault diagnosis task with a lower computational complexity compared to GCN.

4) Graph auto-encoder (GAE)
A GAE is an unsupervised learning model, similar to the auto-encoder [57] [58] [59]. GAE uses a GCN layer as the encoder; its input includes the association graph and the feature matrix, and the output is the coding value $\hat{A}$ of the node in the graph.

$$\hat{A} = \sigma(Z Z^{T}) \tag{11}$$

where $Z = gcn(X, A)$; the mean-squared error function is used as the loss function,

$$L_{GAE} = \frac{1}{N} \sum_{i,j} \| A_{ij} - \hat{A}_{ij} \|_2^2 \tag{12}$$

where $\hat{A}$ derives from Eq. (11).

5) Spatial temporal graph convolutional network (STGCN)
STGCNs have been widely used to solve traffic forecasting problems [60]. Generally, an STGCN is composed of both GCN and CNN architectures, where the GCN is responsible for aggregating node features in the spatial dimension, while the CNN performs convolution in the temporal dimension.

According to the different graph analytic tasks, the networks mentioned above fall into three categories, namely, node-level GNNs, edge-level GNNs, and graph-level GNNs [53]. GCN, GAT, and GraphSage belong to the node-level GNNs, which classify the nodes in the association graph. Edge-level GNNs, like GAE, can be used for matrix completion, which predicts the correlation edges that do not exist in the input adjacency matrix. In a graph-level GNN, such as STGCN, each graph corresponds to a single feature. The characteristics and taxonomy of the aforementioned GNN models in subsection III.B are summarized in Fig. 7.

C. Visualization of GNN’s data classification capability
Fig. 8 illustrates the ability of a graph neural network to exploit information coming from a node's neighbors. Fig. 8(a) shows the original data of a motor running state after normalization and dimensionality reduction. Principal component analysis (PCA) was used to reduce the original data from 48 dimensions to 2 dimensions for visualization. Nodes with different colors show different categories of the motor running states. There are eight square nodes (the size of the training set) in the figure; the test set is composed of 290 nodes. Fig. 8(b) shows the motor running state data after the graph convolutional layer. These data have also been normalized and reduced in dimension. The meaning of nodes in both subfigures is the same; data processed by GCN show operational clusters. As shown in Fig. 8(a), it is unlikely that high model performance can be obtained with a training set of only eight nodes. On the other hand, for the data shown in Fig. 8(b), since nodes belonging to the same categories are more concentrated and nodes belonging to different categories are farther apart, the training set after the GCN layer is more representative. It can be expected that a GNN-based method can achieve better classification performance. Considering that the data set for fault diagnosis is always composed of a large proportion of unlabeled data (fault-free data) and a small part of labeled data (faulty data), the ability of GNN to obtain information of neighbors becomes essential.

Besides, since industrial systems are characterized by relational dependencies and we wish to exploit the relational inductive bias, using a GNN-based method for the fault diagnosis task is an alternative and promising approach.
Typically, the framework of a GNN-based fault diagnosis method is shown in Fig. 9, which mainly consists of two parts, that is, building the graph from data and building the GNN model.
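As an illustration of the second part (building the GNN model), the propagation rules of Eq. (4) and Eqs. (6)–(9) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the implementation used in this review: ReLU is taken as the activation $\sigma$, the Leaky_ReLU slope of 0.2 is an assumed value, every node is assumed to have at least one neighbor, and the function names, graph, and random parameters are hypothetical.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One GCN propagation step, Eq. (4), with ReLU as sigma (assumption)."""
    A_tilde = A + np.eye(A.shape[0])            # A~ = A + I_N (self-loops)
    d = A_tilde.sum(axis=1)                     # diagonal entries of D~
    D_inv_sqrt = np.diag(d ** -0.5)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # D~^-0.5 A~ D~^-0.5
    return np.maximum(A_hat @ X @ Theta, 0.0)

def leaky_relu(z, slope=0.2):                   # slope 0.2 is an assumed value
    return np.where(z > 0, z, slope * z)

def gat_attention(A, X, W, a):
    """Attention weights a_ij of Eq. (8), normalized over the neighbors N(i)."""
    Xp = X @ W.T                                # rows are x'_i = W x_i, Eqs. (6)-(7)
    c_out = Xp.shape[1]
    # [x'_i || x'_j] a splits into x'_i . a[:c'] + x'_j . a[c':]
    scores = (Xp @ a[:c_out])[:, None] + (Xp @ a[c_out:])[None, :]
    scores = leaky_relu(scores)
    scores = np.where(A > 0, scores, -np.inf)   # keep only j in N(i)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def gat_layer(A, X, W, a):
    """Hidden-layer output of Eq. (9), with ReLU as sigma (assumption)."""
    att = gat_attention(A, X, W, a)
    return np.maximum(att @ (X @ W.T), 0.0)

# Hypothetical example: N = 4 nodes, c = 3 input features, c' = 2 outputs.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X = rng.normal(size=(4, 3))
Theta = rng.normal(size=(3, 2))   # c x c' parameter matrix of the GCN layer
W = rng.normal(size=(2, 3))       # c' x c mapping matrix of the GAT layer
a = rng.normal(size=4)            # attention vector of length 2c'

Z = gcn_layer(A, X, Theta)
att = gat_attention(A, X, W, a)
H1 = gat_layer(A, X, W, a)
```

The contrast between the two layers is visible in the weights: the GCN layer applies fixed weights determined by node degrees, whereas each row of `att` is a learned distribution over a node's neighbors.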
Fig. 10: KNN-based graph construction method

1) The selection of time and frequency domain features [63]. In fact, the selection of feature types is completely subjective. It is difficult to explain why some feature types are used instead of others, and many trial-and-error experiments are necessary.
2) The determination of the distance between nodes. A straightforward method is to use the Euclidean distance to measure the distance between features. From another point of view, the occurrence of a fault will lead to the deviation of some features, but may not affect others. Therefore, different features should lead to different distance values. However, due to the principle of KNN

C. Using matrix completion to adjust the association graph
The construction process of the graph is to analyze the potential existence of the dependency between nodes. Therefore, the construction of the association graph can also be transformed into a link prediction issue. Inspired by [65], we propose a new method for constructing the association graph. Firstly, an incomplete association graph of a data set is constructed by the KNN method. Then the adjacency matrix of the graph is reconstructed by the GAE model. Finally, the downstream task is completed according to the reconstructed association graph.
The method of using matrix completion to adjust the association graph is an extension of the KNN method. Since GAE is a type of unsupervised learning, its purpose is to adjust the original association graph, so the graph reconstructed by GAE is at least no worse than the original one. Subsequent experiments show that the method of reconstructing the association graph using the GAE model can improve diagnostic performance to a certain extent. The illustration of the GAE-based graph adjustment method is shown in Fig. 12.

D. Assess the quality of the association graph
According to the constructed association graphs, GNNs aggregate the isolated data into a whole. Through various aggregation methods, GNNs enable each node to aggregate the information of its neighbors. Due to this, the performance of the GNN model is better than that of ordinary neural networks in many fields.
In [66], it was pointed out that the GNN model has been widely used in graph representation learning, but it remains an open issue to measure the quality of the graph. To deal with this, two quality indicators, namely feature smoothness and label smoothness, were proposed. The feature smoothness index $\lambda_f$ is:

$$\lambda_f = \frac{\left\| \sum_{e_{i,j} \in E} (x_{v_i} - x_{v_j})^2 \right\|}{|E| \cdot d} \tag{15}$$

where $x_{v_i}$ and $x_{v_j}$ represent the features of the nodes $v_i$ and $v_j$, respectively, $d$ is the dimension of $x_{v_i}$ and $x_{v_j}$, and $|E|$ represents the number of edges in an association graph. It is assumed that nodes with dissimilar features compared with their neighbors tend to obtain more information from the association graph. Therefore, $\lambda_f$ is positively correlated with the quality of the association graph.
Another index, the label smoothness $\lambda_l$, is defined as:

$$\lambda_l = \sum_{e_{i,j} \in E} (1 - I(v_i, v_j)) / |E| \tag{16}$$

where $I(v_i, v_j) = 0$ if $Y_{v_i} \neq Y_{v_j}$, $I(v_i, v_j) = 1$ if $Y_{v_i} = Y_{v_j}$, and $Y_{v_i}$ and $Y_{v_j}$ represent the categories of the nodes $v_i$ and $v_j$, respectively.
The calculation of $\lambda_l$ is based on the inductive bias that, in a graph, if most nodes and their neighbors are in the same categories, the graph is beneficial for the training of the model. Therefore, $\lambda_l$ is negatively correlated with the quality of the association graph.
In conclusion, $\lambda_f$ and $\lambda_l$ measure the quality of the association graph from two aspects. They point out that a high-quality graph should have the following characteristics: the feature distributions of two neighbors in the graph are quite different (corresponding to a big $\lambda_f$ indicator), but they have the same category (corresponding to a small $\lambda_l$ indicator).
Interestingly, [66] implicitly explains the limitation of the KNN-based construction method: it tends to connect two nodes with similar feature distributions, which makes nodes obtain less information from their neighbors. From this point of view, both the $\lambda_f$ index and the $\lambda_l$ index calculated from the graph built by the KNN-based method are small, which indicates a “correct but useless” association graph.

VI. BENCHMARK STUDY AND COMPARISON
To illustrate the performance of GNN-based FD methods, six baseline models as well as several GNN-based FD methods are tested on three industrial data sets and compared.

A. The designed models
The detailed architectures of various GNN methods have been discussed in Section III. The corresponding hyperparameters of each model are shown in Table V. The GCN model is composed of a GC layer, three 1dCNN layers, and two fully connected layers; the GAT model is composed of two graph attention layers with a multihead mechanism, where the number of heads is eight; the GraphSage model is composed of two layers, and the GCN aggregator is used; the STGCN model contains two temporal blocks and one spatial block. The spatial block uses traditional GCN for graph convolution, while the temporal blocks perform convolution-max-pooling-convolution for each node in the temporal dimension.

Table V: Hyperparameters used in the numerical simulation of the GNN models
Models       Number of epochs   Learning rate   Optimizer
GCN          300                0.00018         RMSProp
GAT          200                0.00005         Adam
GraphSage    300                0.005           RMSProp
STGCN        200                0.001           Adam

Furthermore, six kinds of widely used baseline models are used, namely the CNN, LSTM, RF, GBDT, LGBM, and SVC models.
1. CNN. CNN is a rather common baseline model, which performs well in dealing with issues in various research fields.
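Returning to the association graph of Section V, the KNN construction and the two quality indicators of Eqs. (15) and (16) can be sketched as follows. This is a minimal sketch under stated assumptions: the Euclidean distance is used for KNN, the unspecified norm in Eq. (15) is taken to be the L1 norm, each undirected edge is counted once, and all function names and the toy data are hypothetical.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric KNN adjacency matrix using Euclidean distance (assumed metric)."""
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # a node is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((N, N))
    for i in range(N):
        A[i, nearest[i]] = 1.0
    return np.maximum(A, A.T)                # symmetrize: undirected graph

def edge_list(A):
    """List each undirected edge once, as (i, j) with i < j."""
    i, j = np.nonzero(np.triu(A, k=1))
    return list(zip(i.tolist(), j.tolist()))

def feature_smoothness(X, edges):
    """Eq. (15): norm of the summed squared feature differences over |E| * d.
    The L1 norm is an assumption; the text does not specify the norm."""
    d = X.shape[1]
    total = np.zeros(d)
    for i, j in edges:
        total += (X[i] - X[j]) ** 2
    return np.abs(total).sum() / (len(edges) * d)

def label_smoothness(y, edges):
    """Eq. (16): fraction of edges whose two end nodes have different labels."""
    return sum(1 for i, j in edges if y[i] != y[j]) / len(edges)

# Hypothetical toy data: two well-separated classes with two nodes each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = [0, 0, 1, 1]
edges = edge_list(knn_adjacency(X, k=1))
lam_f = feature_smoothness(X, edges)
lam_l = label_smoothness(y, edges)
```

With `k=1` on this toy data, every edge joins two nearby nodes of the same class, so both indicators are small, which matches the “correct but useless” behavior attributed to KNN graphs in Section V-D.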
Fig. 14: Schematic diagrams of graphs constructed by three kinds of methods: (a) graph constructed by SA method; (b) graph constructed by KNN method; (c) graph constructed by KNN+GAE method.

Table VIII: Related parameters of the association graph
Construction method   Data set              No. of edges   K value
SA                    rectifier data set    104521         N/A
KNN                   rectifier data set    32556          45
KNN                   motor data set        54096          50
KNN                   TEP data set          53900          30
KNN+GAE               rectifier data set    86269          45
KNN+GAE               motor data set        119387         50
KNN+GAE               TEP data set          103667         30

by SA method, all nodes in a graph are clustered respectively. Nodes belonging to the same cluster are connected to each other, and nodes of different clusters are isolated from each other (as is shown in Fig. 14(a)). For the graph constructed by the KNN method, all nodes have the potential to be connected to each other. Usually, all nodes in the graph form a connected domain (as is shown in Fig. 14(b)). The graph constructed by the KNN + GAE method is slightly different from the KNN-based graph; its schematic diagram is shown in Fig. 14(c).
Related parameters about each graph are shown in Table VIII. Note that the K value is only applicable to the KNN method.

E. Experimental results and analysis
1) Comparison between graph construction methods: The size of the training set is 50 in the rectifier data set (4.69% of the data set), 80 in the motor data set (7.25% of the data set), and 150 in the TEP data set (13.64% of the data set). To make a comparison between different construction methods, we used the SA-based, KNN-based, and KNN + GAE-based methods on the rectifier data set, and the KNN-based and KNN + GAE-based methods on the motor and TEP data sets. The diagnostic accuracies, which reflect the performance of the various diagnostic methods, are shown in Table IX. For example, when the size of the test set is 867 and the number of correct predictions is 617, the diagnostic accuracy is calculated as 617/867 ≈ 0.712.
From Table IX, the selection of the construction method has a great impact on model performance. First consider the rectifier data set: the accuracies of the three GNN models with the SA-based construction method reach approximately 90% on average, which is much higher than with the other methods. This is because the SA-based construction method uses a large amount of prior knowledge to construct a high-quality association graph. Besides, the performance of the three FD models using the KNN + GAE method for graph construction is better than that using the KNN method. On the motor data set, the performance of GCN using the KNN method is slightly better than that of GCN with the KNN + GAE method. As for the TEP data set, both construction methods can achieve favorable fault diagnosis performance. The accuracies of the GCN and the GAT models using the KNN + GAE method are better than those using KNN.
Generally speaking, the SA-based construction method is better than the other methods. However, in most cases, it is difficult to obtain prior knowledge of the system, so the KNN-based and KNN + GAE-based construction methods can be used as its replacement for cases where the structure information is unknown. Furthermore, the overall performance of GNNs using the KNN + GAE-based method is higher than that of GNNs using the KNN-based method. This may be attributed to the fact that GAE predicts association relationships that do not exist in the original graph, which actually improves the quality of the association graph.
2) Comparison between GNN models and baselines: In the benchmark study, GNN-based methods are also compared with various baseline methods. Since the SA-based graph construction method introduces prior knowledge, which cannot be used in baseline methods, GNN models trained with SA-based graphs are not compared with baseline methods. It is found that GCN and GraphSage are not as good as CNN when dealing with the rectifier data set, but achieve better performance than the other baseline methods. STGCN with the KNN-based method and GAT with the KNN + GAE-based method outperform the baseline methods in terms of diagnostic accuracy.
It should be noted that on the motor data set, the LGBM and GBDT methods outperform GNN-based methods as well as other baseline models. This may be because the motor data set itself is relatively simple and the association between data is not close.
In general, considering the performance of GNN methods on the three data sets, GNN methods achieve the same or better diagnosis performance compared with the six baseline methods. Besides, when dealing with a data set whose association graph can be constructed based on prior knowledge, GNN with the SA-based graph construction method is the preferred one.
3) The influence of small sample case: In practice, a system usually runs in a “long-term fault-free and short-term fault” scenario due to the highly reliable design of the system. This leads to a small number of fault samples, which will be called the “small sample case” in this paper.
To verify the effectiveness of the GNN-based FD methods for the small sample case, the size of the training set is set to 10-100 on the rectifier data set, and a total of ten groups of experiments were conducted for the comparison with baseline methods. The experimental results are shown in Table X, Fig. 15, and Fig. 18. In order to better present the convergence process inside the STGCN model during its iteration process, the variation tendency of the output of the hidden layer in the STGCN method is tracked as shown in Fig. 17. In this figure, the output of the hidden layer is processed with the PCA method for dimensionality reduction, and constitutes a node in the graph. As shown in Fig. 17, with the increase in the number of iterations, the outputs with the same label in the hidden layer become gradually similar. In addition, the confusion matrices corresponding to the diagnosis results obtained by these methods are shown in Fig. 16.

Fig. 15: Detailed display of experimental results on the rectifier data set

According to the analysis of Table X, when the size of the training set is 10 (0.94% of the data set), the overall diagnostic accuracies of the GNN-based methods, except for STGCN, are better than those of the six baseline methods. The exception of STGCN may be attributed to the fact that its adjacency matrix cannot fully reflect the effective association between features in the case of small samples.
As the size of the training set increases, the diagnostic accuracies of the GNN-based and baseline FD methods on the test set increase correspondingly, but the diagnostic performance of the GNNs is still better than that of the baseline methods on the whole. In general, GNN-based methods can achieve relatively better results in the small sample case, and the advantages of the GNNs are more obvious with the introduction of prior knowledge.
In conclusion, the four GNN-based FD methods have distinct advantages on the three benchmark data sets. Most of the methods can achieve good fault diagnosis. Considering that GraphSage belongs to the inductive learning model, it is a good choice to use the GraphSage model for fault diagnosis. GAT uses an attention mechanism and a multihead mechanism to improve the generalizability, and has obvious advantages in processing data sets with complex internal structures. Compared with the other three GNN-based FD methods, the performance of STGCN is relatively poor when the training set is small, but as the size of the training set increases, the accuracy of STGCN on the test set is approximately equal to that of GraphSage.

VII. PERSPECTIVES OF FUTURE RESEARCH
GNN is a promising way for fault diagnosis, but several challenges still need to be further investigated, which are
Fig. 17: Variation tendency of the hidden layer output of the STGCN model (the percentage represents the proportion of the
total iterative training process, and different colors of data denote different categories)
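The 2-D points of a plot such as Fig. 17 are obtained by projecting hidden-layer outputs with PCA. The following is a minimal sketch of such a projection, assuming PCA is computed from the singular value decomposition of the centered data; the function name `pca_project` and the random 48-dimensional input are hypothetical stand-ins for the actual hidden-layer outputs.

```python
import numpy as np

def pca_project(H, n_components=2):
    """Project the rows of H onto their first principal components."""
    H_centered = H - H.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(H_centered, full_matrices=False)
    return H_centered @ Vt[:n_components].T

# Hypothetical hidden-layer outputs: 20 samples with 48 dimensions each.
rng = np.random.default_rng(0)
H = rng.normal(size=(20, 48))
Y2 = pca_project(H)     # shape (20, 2), one 2-D point per sample
```

Applying the same projection to the hidden-layer output at several training iterations would show whether samples with the same label move closer together, which is the behavior described for Fig. 17.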
VIII. CONCLUSION
In this paper, the emerging GNN-based fault diagnosis methods are briefly reviewed. The NN-based fault diagnosis methods are divided based on the data representations in the real world, namely, the time-series-based NNFD method, the image-based NNFD method, and the graph-based NNFD method. Then, basic principles and principal architectures of GNN are introduced, with attention to GCN, GAT, GraphSage, GAE, and STGCN according to the different graph analytic tasks. Furthermore, the GNN-based fault diagnosis framework is detailed with a focus on building the association graph and designing the GNN models. Experiments on three benchmark data sets were carried out to verify the effectiveness and feasibility of GNN-based FD methods by comparing them with several baseline FD methods. Finally, perspectives on the challenges of GNN-based fault diagnosis are discussed. This review is prepared in response to the urgent need for a seamless and rigorous transition from classical neural network-based fault diagnosis to the alternative fault diagnosis approaches which operate directly on graph-structured data. It is also expected to provide guidelines for future research in this field.

ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China (#62173349, #U20A20186), and in part by the Swiss National Science Foundation project (#200021_172671): “ALPSFORT: A Learning graph-based framework for cyber-physical systems”.

REFERENCES
[1] S. Rizzo, G. Susinni, and F. Iannuzzo, “Intrusiveness of power device condition monitoring methods: Introducing figures of merit for condition monitoring,” IEEE Industrial Electronics Magazine, 2021, doi:10.1109/MIE.2021.3066959.
[2] Y. G. Lei, J. Lin, M. J. Zuo, and Z. J. He, “Condition monitoring and fault diagnosis of planetary gearboxes: A review,” Measurement, vol. 48, pp. 292–305, 2014.
[3] S. Yin, S. X. Ding, X. Xie, and H. Luo, “A review on basic data-driven approaches for industrial process monitoring,” IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.
[4] Z. Ge, “Review on data-driven modeling and monitoring for plant-wide industrial processes,” Chemometrics and Intelligent Laboratory Systems, vol. 171, pp. 16–25, 2017.
[5] Y. Zhao, X. He, J. Zhang, H. Ji, D. Zhou, and M. G. Pecht, “Detection of intermittent faults based on an optimally weighted moving average T2 control chart with stationary observations,” Automatica, vol. 123, p. 109298, 2021.
[6] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prognostics and health management design for rotary machinery systems-
[11] S. Ding, Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems. London, UK: Springer, 2014.
[12] S. Ding, Model-based Fault Diagnosis Techniques: Design Schemes, Algorithms, and Tools. Berlin, Germany: Springer, 2008.
[13] C. Alippi, S. Ntalampiras, and M. Roveri, “Model-free fault detection and isolation in large-scale cyber-physical systems,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 1, no. 1, pp. 61–71, 2016.
[14] H. Luo, H. Zhao, and S. Yin, “Data-driven design of fog-computing-aided process monitoring system for large-scale industrial processes,” IEEE Transactions on Industrial Informatics, vol. 14, pp. 4631–4641, 2018.
[15] P. M. Papadopoulos, V. Reppa, M. M. Polycarpou, and C. G. Panayiotou, “Scalable distributed sensor fault diagnosis for smart buildings,” IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 3, pp. 638–655, 2020.
[16] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[18] D. J. C. MacKay, “A practical Bayesian framework for backpropagation networks,” Neural Computation, vol. 4, pp. 448–472, 1992.
[19] C. Cheng, J. Wang, H. Chen, Z. Chen, H. Luo, and P. Xie, “A review of intelligent fault diagnosis for high-speed trains: Qualitative approaches,” Entropy, 2020, doi:10.3390/e23010001.
[20] K. Zhang, H. Hao, Z. Chen, S. Ding, and K. Peng, “A comparison and evaluation of key performance indicator-based multivariate statistics process monitoring approaches,” Journal of Process Control, vol. 33, pp. 112–126, 2015.
[21] K. Zhong, M. Han, and B. Han, “Data-driven based fault prognosis for industrial systems: A concise overview,” IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 2, pp. 330–345, 2020.
[22] D. Zheng, L. Zhou, and Z. Song, “Kernel generalization of multi-rate probabilistic principal component analysis for fault detection in nonlinear process,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 8, pp. 1465–1476, 2021.
[23] K. Zhang, Y. Shardt, Z. Chen, X. Yang, S. Ding, and K. Peng, “A KPI-based process monitoring and fault detection framework for large-scale processes,” ISA Transactions, vol. 68, pp. 276–286, 2017.
[24] H. Chen, B. Jiang, S. X. Ding, and B. Huang, “Data-driven fault diagnosis for traction systems in high-speed trains: A survey, challenges, and perspectives,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–17, 2020, doi:10.1109/TITS.2020.3029946.
[25] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. Inman, “1D convolutional neural networks and applications: A survey,” Mechanical Systems and Signal Processing, vol. 151, p. 107398, 2021, doi:10.1016/j.ymssp.2020.107398.
[26] M. Kuppusamy, A. Hussain, P. Sanjeevikumar, J. Holm-Nielsen, and V. Kaliappan, “Deep learning for fault diagnostics in bearings, insulators, PV panels, power lines, and electric vehicle applications-the state-of-the-art approaches,” IEEE Access, vol. 9, pp. 41246–41260, 2021.
[27] D. Hoang and H. Kang, “A survey on deep learning based bearing fault diagnosis,” Neurocomputing, vol. 335, pp. 327–335, 2019.
[28] T. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” International Conference on Learning Representations, vol. abs/1609.02907, 2016.
reviews, methodology and applications,” Mechanical Systems and Signal [29] H. Zhang and Y. Shen, “Template-based prediction of protein structure
Processing, vol. 42, no. 1, pp. 314–334, 2014. with deep learning,” BMC Genomics, vol. 21, 2020.
[7] J. Wang, F. Yang, T. Chen, and S. L. Shah, “An overview of industrial [30] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural
alarm systems: Main causes for alarm overloading, research status, networks on graphs with fast localized spectral filtering,” eural infor-
and open problems,” IEEE Transactions on Automation Science and mation processing systems, vol. 30, pp. 3844–3852, 2016.
Engineering, vol. 13, no. 2, pp. 1045–1061, 2016. [31] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural
[8] Y. Dong, Z. Wu, X. Du, J. Zha, and Q. Yuan, “Resource abnormal networks for graphs,” International Conference on Machine Learning,
management method of unsteady processes of cloud manufacturing pp. 2014–2023, 2016.
services,” China Mechanical Engineering, vol. 29, pp. 1193–1200, 2018. [32] R. Girshick, J. Donahue, and T. Darrell, “Rich feature hierarchies for
[9] H. Henao, G. Capolino, and M. Cabanas, “Trends in fault diagnosis for accurate object detection and semantic segmentation,” Computer Vision
electrical machines: A review of diagnostic techniques,” IEEE Industrial and Pattern Recognition, pp. 580–587, 2014.
Electronics Magazine, vol. 8, no. 2, pp. 31–42, 2014. [33] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3D graph neural
[10] J. Qin, Y. Dong, Q. Zhu, J. Wang, and Q. Liu, “Bridging systems theory networks for RGBD semantic segmentation,” pp. 5209–5218, 2017.
and data science: A unifying review of dynamic latent variable analytics [34] L. Yao, C. Mao, and Y. Luo, “Graph convolutional networks for
and process monitoring,” Annual Reviews in Control, vol. 50, pp. 29–48, text classification,” Proceedings of the AAAI Conference on Artificial
2020. Intelligence, vol. 33, pp. 7370–7377, 2019.
[35] N. Park, A. Kan, X. L. Dong, T. Zhao, and C. Faloutsos, “Estimating node importance in knowledge graphs using graph neural networks,” in International Conference on Knowledge Discovery and Data Mining, pp. 596–606, 2019.
[36] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2020, doi:10.1109/TKDE.2020.2981333.
[37] I. Chami, S. Abu, and B. Perozzi, “Machine learning on graphs: A model and comprehensive taxonomy,” CoRR, 2020.
[38] D. Bacciu, F. Errica, A. Micheli, and M. Podda, “A gentle introduction to deep learning for graphs,” Neural Networks, vol. 129, pp. 203–221, 2020.
[39] J. Zhou, G. Cui, and Z. Zhang, “Graph neural networks: A review of methods and applications,” 2018.
[40] J. B. Lee, R. A. Rossi, S. Kim, N. K. Ahmed, and E. Koh, “Attention models in graphs: A survey,” ACM Transactions on Knowledge Discovery from Data, vol. 13, pp. 1–25, 2019.
[41] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. Yu, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2021.
[42] K. Chen, J. Hu, Y. Zhang, Z. Yu, and J. He, “Fault location in power distribution systems via deep graph convolutional networks,” IEEE Journal on Selected Areas in Communications, vol. 38, pp. 119–131, 2019.
[43] J. Jiang, J. Chen, and T. Gu, “Anomaly detection with graph convolutional networks for insider threat and fraud detection,” in IEEE Military Communications Conference, pp. 109–114, 2019.
[44] X. Yu, B. Tang, and K. Zhang, “Fault diagnosis of wind turbine gearbox using a novel method of fast deep graph convolutional networks,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–14, 2021.
[45] T. Li, Z. Zhao, C. Sun, R. Yan, and X. Chen, “Multi-receptive field graph convolutional networks for machine fault diagnosis,” IEEE Transactions on Industrial Electronics, 2020, doi:10.1109/TIE.2020.3040669.
[46] C. Li, L. Mo, and R. Yan, “Rolling bearing fault diagnosis based on horizontal visibility graph and graph neural networks,” in International Conference on Sensing, Measurement, Data Analytics in the Era of Artificial Intelligence, pp. 275–279, 2020.
[47] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, “Spectral networks and locally connected networks on graphs,” Computing Research Repository, pp. 1–14, 2013.
[48] Q. Li, Z. Han, and X. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, S. A. McIlraith and K. Q. Weinberger, Eds., pp. 3538–3545. AAAI Press, 2018.
[49] Z. Zhang, J. Huang, and Q. Tan, “SR-HGAT: Symmetric relations based heterogeneous graph attention network,” IEEE Access, vol. 8, pp. 631–645, 2020.
[50] H. Li, H. Chen, and W. Wang, “A structural deep network embedding model for predicting associations between miRNA and disease based on molecular association network,” Scientific Reports, vol. 11, p. 12640, 2021.
[51] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Muller, “Graph clustering with graph neural networks,” ArXiv, vol. abs/2006.16904, 2020.
[52] Z. Wang and T. Oates, “Imaging time-series to improve classification and imputation,” Conference on Artificial Intelligence, pp. 3939–3945, 2015.
[53] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 1–21, 2020.
[54] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, and Y. Bengio, “Graph attention networks,” International Conference on Learning Representations, 2017.
[55] J. He and H. Zhao, “Fault diagnosis and location based on graph neural network in telecom networks,” in International Conference on Networking and Network Applications, pp. 304–309, 2020.
[56] W. Hamilton, R. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Neural Information Processing Systems, vol. 30, pp. 1025–1035, 2017.
[57] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” 2016.
[58] S. Pan, R. Hu, G. Long, J. Jiang, and L. Yao, “Adversarially regularized graph autoencoder for graph embedding,” in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 2609–2615, 2018.
[59] Y. Liao, Y. Wang, and Y. Liu, “Graph regularized auto-encoders for image representation,” IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2839–2852, 2016.
[60] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 3634–3640, 2018.
[61] E. Mansimov, O. Mahmood, S. Kang, and K. Cho, “Molecular geometry prediction using a deep generative graph neural network,” Scientific Reports, vol. 9, no. 1, pp. 1–13, 2019.
[62] T. Li, Z. Zhao, C. Sun, R. Yan, and X. Chen, “Multi-receptive field graph convolutional networks for machine fault diagnosis,” IEEE Transactions on Industrial Electronics, 2020, doi:10.1109/TIE.2020.3040669.
[63] L. Franceschi, M. Niepert, M. Pontil, and X. He, “Learning discrete structures for graph neural networks,” International Conference on Machine Learning, 2019.
[64] Z. Chen, J. Xu, T. Peng, and C. Yang, “Graph convolutional network-based method for fault diagnosis using a hybrid of measurement and prior knowledge,” IEEE Transactions on Cybernetics, pp. 1–13, 2021, doi:10.1109/TCYB.2021.3059002.
[65] R. Berg, T. Kipf, and M. Welling, “Graph convolutional matrix completion,” ArXiv, vol. abs/1706.02263, 2017.
[66] Y. Hou, J. Zhang, J. Cheng, and K. Ma, “Measuring and improving the use of graph information in graph neural networks,” in International Conference on Learning Representations, 2019.
[67] C. Yang, C. Yang, T. Peng, X. Yang, and W. Gui, “A fault-injection strategy for traction drive control systems,” IEEE Transactions on Industrial Electronics, vol. 64, no. 7, pp. 5719–5727, 2017.
[68] S. Arora, “A survey on graph neural networks for knowledge graph completion,” CoRR, vol. abs/2007.12374, 2020.
[69] Y. Lei, B. Yang, X. Jiang, F. Jia, N. Li, and A. K. Nandi, “Applications of machine learning to machine fault diagnosis: A review and roadmap,” Mechanical Systems and Signal Processing, vol. 138, p. 106587, 2020.