Graph Neural Network-Based Fault Diagnosis: A Review

This document is a review of graph neural network (GNN)-based fault diagnosis methods, highlighting their advantages over traditional approaches due to the ability to represent data as graphs. It discusses various GNN architectures, their applications in fault diagnosis, and presents a benchmark study comparing GNN methods with baseline techniques. The paper also outlines future research directions and challenges in the field, emphasizing the importance of constructing association graphs for effective fault diagnosis.

Graph neural network-based fault diagnosis: a review
Zhiwen Chen, Jiamin Xu, Cesare Alippi, Steven X. Ding, Yuri Shardt, Tao Peng, Chunhua Yang

Abstract—Graph neural network (GNN)-based fault diagnosis (FD) has received increasing attention in recent years, due to the fact that data coming from several application domains can be advantageously represented as graphs. Indeed, this particular representation form has led to superior performance compared to traditional FD approaches. In this review, an easy introduction to GNN, potential applications to the field of fault diagnosis, and future perspectives are given. First, the paper reviews neural network-based FD methods by focusing on their data representations, namely, time-series, images, and graphs. Second, basic principles and principal architectures of GNN are introduced, with attention to graph convolutional networks, graph attention networks, graph sample and aggregate, graph auto-encoder, and spatial-temporal graph convolutional networks. Third, the most relevant fault diagnosis methods based on GNN are validated through detailed experiments, and it is concluded that the GNN-based methods can achieve good fault diagnosis performance. Finally, discussions and future challenges are provided.

Index Terms—Data driven, Neural network, Deep neural network, Graph neural network, Fault diagnosis, Condition monitoring

arXiv:2111.08185v1 [eess.SY] 16 Nov 2021

Zhiwen Chen, Jiamin Xu, Tao Peng and Chunhua Yang are with the School of Automation, Central South University, Changsha, 410083, China. Zhiwen Chen and Tao Peng are also with the Peng Cheng Laboratory, Shenzhen 518066, China.
Cesare Alippi is with Università della Svizzera italiana, Lugano, Switzerland, and Politecnico di Milano, Milano, Italy.
Steven X. Ding is with the Institute for Automatic Control and Complex Systems (AKS), University of Duisburg-Essen, Duisburg, 47057, Germany.
Yuri Shardt is with the Institute of Automation and Systems Engineering, Technical University of Ilmenau, Ilmenau, Germany.
Email addresses of corresponding authors: zhiwen.chen@csu.edu.cn; cesare.alippi@polimi.it.

CONTENTS

I Introduction
II Data-driven fault diagnosis
III Graph neural network
    III-A Mathematical notations
    III-B GNN Architectures
    III-C Visualization of GNN's data classification capability
IV Fault diagnosis methods based on GNN
    IV-A Building association graph from data
    IV-B Building GNN models for fault diagnosis
V Discussions on constructing association graph
    V-A Using K-nearest neighbor method to construct the association graph
    V-B Using prior knowledge to construct the association graph
    V-C Using matrix completion to adjust the association graph
    V-D Assess the quality of the association graph
VI Benchmark study and comparison
    VI-A The designed models
    VI-B Description of data sets
    VI-C Description of fault types
    VI-D Constructing association graphs from the data sets
    VI-E Experimental results and analysis
        VI-E1 Comparison between graph construction methods
        VI-E2 Comparison between GNN models and baselines
        VI-E3 The influence of small sample case
VII Perspectives of future research
VIII Conclusion
References

I. INTRODUCTION

In industry, fault diagnosis (FD) is required to guarantee safe and specification-compliant operation of plants. This allows the process to attain industrial intelligence. Research on FD has a long history, and is also an important part of some related techniques, like condition monitoring [1], [2], process monitoring [3]–[5], prognostic and health management (PHM) [6], and abnormality management [7], [8]. Fault diagnosis methods can be divided into two categories, namely, model-based and data-based methods [9], [10]. Early fault diagnosis methods mainly belong to the model-based class, where the physical model or a state observer for the system of interest is constructed, and fault diagnosis is achieved by inspecting the changes in the residuals [11]–[15].

In recent years, with the progress of sensor technology and the improvement of data storage capacity, data-driven fault diagnosis methods have become a research focus, with neural network (NN)-based methods being an important part due to their powerful data processing capabilities. Researchers have
developed a variety of NN architectures, such as convolutional neural network (CNN) architectures [16], residual network (ResNet) architectures [17], and Bayesian neural network (BNN) architectures [18]. Inspired by these, researchers began to apply NN and deep learning models to the fault diagnosis field [19]–[24]. For example, H. Chen et al. [24] reviewed the state-of-the-art data-driven methods with applications to high-speed trains. Besides, in [25], a comprehensive review of the general architecture and principles of one-dimensional convolutional neural networks (1dCNNs) along with their main engineering applications was presented. M. Kuppusamy et al. [26] reviewed the application of deep learning techniques in five critical electrical applications. In [27], three popular deep learning algorithms were briefly introduced, and their applications were reviewed through publications and research works in the area of bearing fault diagnosis.

The research and practice of NNs, including CNNs, have shown outstanding results in numerous applications. However, in some research fields involving non-Euclidean structured data, the widely used CNN models are not able to achieve optimal performance because they are limited by their structural characteristics. For example, in [28], it is pointed out that papers refer to each other and thereby form a citation network with a non-Euclidean graph structure. In bio-engineering, predicting the functional types of a protein based on its structure is a fascinating research field, while the internal structure of protein data can be abstracted as a complex graph structure [29]. Due to the limitation of their network architecture and computation standard, CNNs cannot exploit the existence of a non-Euclidean graph structure, which limits the application of CNNs.

Graph neural networks (GNNs) were proposed to take advantage of the inductive bias associated with functional dependencies and, as such, exploit non-Euclidean representations [30], [31]. GNNs provide architectures inspired by the deep ones and offer suitable operators to process such information-rich structures [32]. In fact, compared with deep neural networks, GNNs can process data characterized by complex spatiotemporal relationships. As a result, GNNs are widely used in computer vision [33], text processing [34], knowledge mapping [35], and recommendation systems. Fig. 1 records the number of papers related to GNN obtained from the Web of Knowledge using the keyword “Graph neural network”. It shows how the number of publications related to GNN is growing exponentially. Furthermore, Fig. 2 lists some hot words related to GNN and fault diagnosis, which are obtained by searching Web of Knowledge with the keywords “Graph neural network” and “Fault diagnosis”. The figure shows a large intersection between both research fields.

Fig. 1: Publication trend of GNN

Fig. 2: Heat map of GNN related keywords

At present, there are numerous review papers related to GNN [36]–[38]. In particular, existing GNN models were reviewed in [39], and several issues for future work were raised. Later, [40] investigated GNNs and the attention mechanism. In [41], a new taxonomy of GNN architectures was proposed, and the application of GNN to various fields was discussed.

In the real world, faults may change the running state of the process, and lead to a change of the dependency between measurements. Based on this understanding, researchers began to explore GNN for fault diagnosis tasks. For instance, in [42], a new graph convolutional network (GCN) framework for distribution network fault location was proposed, which not only obtains high model performance, but also shows good robustness properties. To deal with the problem of limited labeled data collected by an electromechanical system, [43] constructed a semisupervised graph convolutional depth confidence network and an intelligent fault diagnosis method. In the field of fault diagnosis of wind turbine gearboxes, based on the fact that the deep learning method cannot make full use of the relevant information of data, [44] proposed a new fast deep GCN, and achieved improved performance. In [45], the authors discussed some structural defects of GCN models, and proposed a multireceiving field graph convolutional network to realize effective intelligent fault diagnosis. [46] proposed a new bearing fault diagnosis model based on horizontal visibility graph and GNN. However, most of these publications only focus on a small scope of research and seek to leverage GNN to a certain level only.

With that in mind, this review is prepared in response to the urgent need of a seamless and rigorous transition from classical neural network-based fault diagnosis to the alternative fault diagnosis methods which operate directly on graph-structured data. This paper focuses on investigating and reviewing GNN for fault diagnosis purposes. GNN-based fault diagnosis methods are studied and analyzed from multiple perspectives. The contributions of this paper are:

1. New Taxonomy. Neural network-based fault diagnosis methods are divided into three categories according to the representations of input data.
2. Review-Oriented. Several architectures of GNNs are reviewed, and the feasible applications of these architectures to fault diagnosis are explained. It is pointed out that one of the difficulties of GNN-based fault diagnosis methods lies in the construction of the association graph, so several feasible solutions are provided.
3. Benchmark Study. The diagnosis performance of several GNN-based methods and baseline methods is compared on three benchmark data sets. Based on the results obtained, a discussion follows.
4. Future Research. With the expectation of identifying possible research directions, several challenges related to GNN-based fault diagnosis are presented and discussed.
5. Open Source. The source code for this review will be made available after the peer-review stage. The code provides the implementation details of the GNN-based fault diagnosis methods discussed in this paper.

II. DATA-DRIVEN FAULT DIAGNOSIS

With the rapid development of advanced sensing technologies and computational power, neural-network-based fault diagnosis (NNFD) has received considerable attention in industry. According to the various types of data representations, conventional NNFD methods can be divided into two categories, namely, image-based NNFD methods and time-series-based NNFD methods.

As shown in Fig. 3, taking a modern wind farm for example, sensor data contains rich heterogeneous information, such as images of possible fan blade cracks and time-series data of vibration, temperature, and speed. After obtaining the sensor data, the NNFD methods can be constructed according to the structures described in Fig. 4 and Fig. 5, and the typical fault diagnosis is realized.

Fig. 3: Schematic diagram of data representation (sample example)

However, little attention has been paid to the ubiquitous dependencies among data. For example, considering the interaction of neighboring wind turbines, a graph representation well models the relationship of all wind turbines in the wind farm, and permits the assessment of the operation status of the whole farm. In fact, even for a single wind turbine, a graph representation models the coupling and dependency among components, and provides additional value for the diagnosis tasks that are hardly achievable when ignoring the functional dependency.

It has been found in [47]–[49] that the category of a given node can be inferred from its neighbors, so besides using time-series and image data for fault diagnosis, the introduction of the graph structure to the data is an alternative and promising way. Furthermore, the involvement of graphs makes it possible for the classical fault diagnosis based on time and space sensing to take dependencies into account as a type of inductive bias. On top of this, we argue that the connections existing among nodes not only model explicit functional dependencies, but also dependencies as well as topological connections.

Generally speaking, the common methods for processing graph structure among data include the random-walk [50], the graph-clustering [51], and GNN-based methods [39]. In this paper, only GNN-based methods, including GCN, graph attention network (GAT), and graph sample and aggregate (GraphSage), are considered.

III. GRAPH NEURAL NETWORK

A. Mathematical notations

Mathematical notations used in this paper follow [53] and are given in Table I. A simple graph can be represented as

$$G = G(V, E) \tag{1}$$

where $V$ and $E$ are the sets of nodes and edges, respectively. Let $v_i \in V$ be a node and $e_{ij} = (v_i, v_j) \in E$ denote an edge between $v_i$ and $v_j$. Then, the neighborhood of a node $v$ can be defined as $N(v) = \{u \in V \mid (v, u) \in E\}$. Usually, a graph can be described by an adjacency matrix $A \in \mathbb{R}^{N \times N}$, where $N$ is the number of nodes, that is, $N = |V|$. In particular, $A_{ij} = 1$ if $\{v_i, v_j\} \in E$ and $i \neq j$; $A_{ij} = 0$ otherwise. In an undirected graph, $A_{ij}$ denotes an edge connection between nodes $v_i$ and $v_j$, while in a directed graph, $A_{ij}$ represents an edge pointing from $v_i$ to $v_j$. In practical applications, a graph may have node features (also called attributes) $X \in \mathbb{R}^{N \times c}$, where $c$ is the dimension of a node feature vector. A degree matrix $D \in \mathbb{R}^{N \times N}$ is a diagonal matrix, which can be
obtained as $D_{ii} = \sum_{j=1}^{N} A_{ij}$. A graph can also be represented by the Laplacian matrix $L$, defined as:

$$L = D - A \tag{2}$$

An illustration of the relationship among the above three matrices is shown in Fig. 6.

Fig. 4: Fault diagnosis framework based on time-series processing. A time-series-based network model is built to compute the latent representations of data. To compute the probability for fault categories, a linear layer and a softmax layer are used.

Fig. 5: Fault diagnosis framework based on image processing. Images are obtained through optical cameras or derived from time-series data [52]. Images go through an image processing network, several layers, and a softmax layer, yielding the probability of fault categories.

Fig. 6: Illustration of the Laplacian matrix

B. GNN Architectures

The input of GNNs receives two contributions [39], i.e., the feature matrix $X$ and the adjacency matrix $A$. The output of GNNs can be obtained by the general forward propagation equation (3)

$$Z_{gnn} = \mathrm{softmax}(gnn_l(\ldots, gnn_2(A, gnn_1(A, X)))) \tag{3}$$

where $gnn$ denotes an operation of GNN and $l$ is the number of layers in the architecture.

Table I: Notations

Notation                        Description
$\|$                            Concatenation operation
$|\cdot|$                       The length of a set
$G$                             A graph representation of a system in the fault-free case
$G_f$                           A graph representation of a system in the faulty case
$V$                             The set of nodes in a graph
$v$                             A node $v \in V$
$E$                             The set of edges in a graph
$N(v)$                          The neighbors of a node $v$
$A$                             The graph adjacency matrix
$A^T$                           The transpose of the matrix $A$
$N$                             The number of nodes, that is, $N = |V|$
$c$                             The dimension of a node feature vector
$X \in \mathbb{R}^{N \times c}$ The feature matrix of a graph
$D$                             The degree matrix of $A$, that is, $D_{ii} = \sum_{j=1}^{N} A_{ij}$
$L$                             The Laplacian matrix of a graph, that is, $L = D - A$
$l$                             The layer index
$t$                             The time step/iteration index
$\delta$                        Nonlinear activation function
$\Theta$                        Learnable model parameters

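
As a concrete reading of the notation in Table I, the following plain-Python sketch builds the adjacency matrix $A$, the degree matrix $D$, and the Laplacian $L = D - A$ of Eq. (2) for a small undirected graph. The four-node edge list and the helper names `graph_matrices` and `neighbors` are illustrative assumptions and do not come from the paper.

```python
def graph_matrices(num_nodes, edges):
    """Return (A, D, L) for an undirected simple graph given as an edge list."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        if i != j:           # a simple graph has no self-loops
            A[i][j] = 1
            A[j][i] = 1      # undirected graph: A is symmetric
    # Degree matrix: D_ii = sum_j A_ij, zero off the diagonal
    D = [[sum(A[i]) if i == j else 0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    # Laplacian matrix, Eq. (2): L = D - A
    L = [[D[i][j] - A[i][j] for j in range(num_nodes)]
         for i in range(num_nodes)]
    return A, D, L


def neighbors(A, v):
    """Neighborhood N(v): all nodes u with an edge (v, u)."""
    return {u for u, a in enumerate(A[v]) if a == 1}


# Hypothetical example graph with N = 4 nodes (assumed for illustration)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A, D, L = graph_matrices(4, edges)
```

Every row of $L$ sums to zero by construction, which gives a quick sanity check on a hand-built association graph.
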
1) Graph convolutional networks (GCNs)

A GCN can be regarded as an extension of a CNN architecture, and provides an effective method to extract relational/spatial features from graph-structured inputs. The GCN model has undergone many improvements, leading to [28], which proposed the computation

$$Z = \sigma\big((\tilde{D}^{-0.5} \tilde{A} \tilde{D}^{-0.5}) X \Theta\big) \tag{4}$$
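
A single propagation step of Eq. (4) can be sketched in plain Python as below, reading $\tilde{A}$ as the adjacency matrix with self-loops added and $\tilde{D}$ as its degree matrix, and taking ReLU as the activation. The three-node path graph, the feature matrix, the weights, and the helper names `matmul`, `normalized_adjacency`, and `gcn_layer` are assumptions made only for illustration; this is a forward-pass sketch, not the authors' implementation.

```python
import math


def matmul(P, Q):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def normalized_adjacency(A):
    """Compute D~^{-0.5} A~ D~^{-0.5} with A~ = A + I_N."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # D~_ii is the row sum of A~; self-loops keep it at least 1
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A_tilde]
    return [[d_inv_sqrt[i] * A_tilde[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(A, X, Theta):
    """One layer in the form of Eq. (4), with ReLU as the activation."""
    H = matmul(matmul(normalized_adjacency(A), X), Theta)
    return [[max(0.0, h) for h in row] for row in H]


# Hypothetical inputs: path graph 0-1-2, c = 2 input features, c' = 2 outputs
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Theta = [[1.0, -1.0], [0.5, 2.0]]
Z = gcn_layer(A, X, Theta)   # N x c' output matrix
```

In a trained model `Theta` would be learned by minimizing a loss; here it is fixed by hand so that the aggregation over neighbors can be followed step by step.
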
where $\tilde{A} = A + I_N$, $I_N$ is the identity matrix of order $N$, and $\tilde{D}$ is a diagonal matrix whose diagonal elements are $\tilde{D}_{ii} = \sum_{j=1}^{N} \tilde{A}_{ij}$. $\sigma$ is an activation function, e.g., the ReLU function. $\Theta \in \mathbb{R}^{c \times c'}$ is the parameter matrix of the network to be learned, $c'$ represents the dimension of the output, and $Z \in \mathbb{R}^{N \times c'}$ is the output matrix.

The cross-entropy loss function

$$Loss = -\sum_{i=1}^{N_{tr}} \sum_{j=1}^{M} Y_{ij} \log Z_{ij} \tag{5}$$

is generally used to train the parameters. Here, $N_{tr}$ represents the number of training points, $M$ is the number of categories, $Z$ is obtained from Eq. (4), and $Y$ is the corresponding label of the training set. It should be noted that in a classification problem, the dimension of the output equals the number of categories, i.e., $c' = M$.

Even though GCNs have shown major improvements w.r.t. preceding non-relational architectures, there still exist some aspects to be improved:
(1) A shallow GCN network cannot spread label information on a large scale, which limits its receptive field [30].
(2) A deep GCN network leads to over-smoothed solutions. A multi-layer GCN makes the data become highly similar during forward propagation, so that they cannot be correctly distinguished, which greatly reduces the model performance.
(3) GCN belongs to the transductive learning models. When dealing with new data, new nodes must be introduced to modify the original association graph, and the whole GCN model must be trained again to adapt to the new graph.
(4) For a given node, GCN considers all its neighbors to be equally important, instead of paying selective attention to special neighbors: this limits the performance.

The above deficiencies not only restrict performance, but also constrain applications. To address these limits, several new GNN models have been proposed over time.

2) Graph attention networks

A GAT architecture uses the attention mechanism to assign different weights to a node's neighbors. In addition, it is an inductive model, implying that GAT can perform online (fault diagnosis) tasks after training [54] [55].

The GAT architecture is more complex than a GCN one. In detail, let the feature matrix be $X = \{x_1, x_2, \ldots, x_N\} \in \mathbb{R}^{N \times c}$, $x_i \in \mathbb{R}^{c}$, $i = 1, 2, \ldots, N$. A linear transformation is first applied to the vector $x_i$ of each node to obtain its feature vector $x'_i \in \mathbb{R}^{c'}$, where $W$ is a linear mapping matrix:

$$x'_i = W x_i, \quad W \in \mathbb{R}^{c' \times c}, \quad i = 1, 2, \ldots, N \tag{6}$$

$$X' = [x'_1, x'_2, \ldots, x'_N] \in \mathbb{R}^{N \times c'} \tag{7}$$

In GAT, for a given $x_i$, the attention weight of its neighbor $x_j$ is given by $a_{ij}$, obtained as follows: the linearly mapped features $x'_i$ and $x'_j$ of nodes $i$ and $j$ are concatenated, and the inner product with a vector $\mathbf{a}$ of length $2c'$ is computed. LeakyReLU is used as the activation function, and a final softmax layer normalizes the weights:

$$a_{ij} = \frac{\exp(\mathrm{LeakyReLU}([x'_i \,\|\, x'_j]\,\mathbf{a}))}{\sum_{k \in N(i)} \exp(\mathrm{LeakyReLU}([x'_i \,\|\, x'_k]\,\mathbf{a}))} \tag{8}$$

where $\|$ represents the concatenation operator. Then, without loss of generality, the output of the $(l+1)$th hidden layer becomes

$$h_i^{l+1} = \sigma\Big(\sum_{j \in N(i)} W h_j^{l} a_{ij}\Big) \tag{9}$$

where $\sigma$ denotes the activation function, and $h^{l}$ is the output of the $l$th hidden layer, with $h^{0} = X$. The learnable parameters in GAT include the vector $\mathbf{a}$ and the parameter matrix $W$.

3) Graph sample and aggregate

Similar to the GAT model, GraphSage is an inductive learning model. In principle, GraphSage learns an aggregation function, which can aggregate the features of a specific node and its neighbors to obtain its high-order features [56].

Denote the input feature matrix as $X$. The forward propagation formula of the $l$th hidden layer in a GraphSage model is:

$$h_v^{l} = \sigma\big(W^{l}\, \mathrm{Concat}(h_v^{l-1}, \mathrm{Aggre}(h_u^{l-1}, \forall u \in N(v)))\big) \tag{10}$$

where Concat represents the concatenation operation, Aggre performs an aggregation operation on features, and $h^{l}$ is the output of the $l$th hidden layer, with $h^{0} = X$. GraphSage uses four types of feature aggregation operations, including a mean aggregator, a GCN aggregator, a Long Short-Term Memory (LSTM) aggregator, and a pooling aggregator. GraphSage does not involve the attention mechanism, so it treats all neighbor nodes equally. Due to its inductive learning characteristic, GraphSage can be used for an online fault diagnosis task with a lower computational complexity compared to GCN.

4) Graph auto-encoder (GAE)

A GAE is an unsupervised learning model, similar to the auto-encoder [57] [58] [59]. GAE uses a GCN layer as the encoder; its input includes the association graph and the feature matrix, and the output is the coding value $\hat{A}$ of the nodes in the graph,

$$\hat{A} = \sigma(Z Z^{T}) \tag{11}$$

where $Z = gcn(X, A)$; the mean-squared error function is used as the loss function,

$$L_{GAE} = \frac{1}{N} \sum_{ij} \|A_{ij} - \hat{A}_{ij}\|_2^2 \tag{12}$$

where $\hat{A}$ derives from Eq. (11).

5) Spatial temporal graph convolutional network (STGCN)

STGCNs have been widely used to solve traffic forecasting problems [60]. Generally, an STGCN is composed of both GCN and CNN architectures, where the GCN is responsible for aggregating node features in the spatial dimension, while the CNN performs convolution in the temporal dimension.

According to the different graph analytic tasks, the networks mentioned above fall into three categories, namely, node-level GNNs, edge-level GNNs, and graph-level GNNs [53]. GCN, GAT, and GraphSage belong to the node-level GNNs, which classify the nodes in the association graph. Edge-level GNNs, like GAE, can be used for matrix completion, which predicts the correlation edges that do not exist in the input adjacency matrix. In a graph-level GNN, such as STGCN, each graph
corresponds to a single feature. The characteristics and taxonomy of the aforementioned GNN models in subsection III.B are summarized in Fig. 7.

Fig. 7: A summary of the different GNN architectures

C. Visualization of GNN's data classification capability

Fig. 8 illustrates the efficiency of a graph neural network to exploit information coming from its neighbors. Fig. 8(a) shows the original data of a motor running state after normalization and dimensionality reduction. Principal component analysis (PCA) was used to reduce the original data from 48 dimensions to 2 dimensions for visualization. Nodes with different colors show different categories of the motor running states. There are eight square nodes (the number of training sets) in the figure; the test set is composed of 290 nodes. Fig. 8(b) shows the motor running state data after the graph convolutional layer. Data have been normalized and their dimension reduced too. The meaning of nodes in both subfigures is the same; data processed by GCN show operational clusters. As shown in Fig. 8(a), it is likely that high model performance cannot be obtained using the insufficient training set with eight nodes. On the other hand, for the data shown in Fig. 8(b), since nodes belonging to the same categories are more concentrated and nodes belonging to different categories are farther apart,
the training set after the GCN layer is more representative. It can be expected that GNN-based methods can achieve better classification performance. Considering that the data set for fault diagnosis is always composed of a large proportion of unlabeled data (fault-free data) and a small part of labeled data (faulty data), the ability of GNN to obtain information from neighbors becomes essential.

Fig. 8: Demonstration of GNN's data processing capability: (a) data set without GNN processing; (b) data set after GNN processing (different colors represent different categories of motor running states, square nodes represent the training set, and the circular nodes represent the test set)

IV. FAULT DIAGNOSIS METHODS BASED ON GNN

Suppose that an industrial system can be represented by a graph, which consists of nodes and edges. For instance, in a modern chemical process, a great number of sensors are usually installed for indicating the status of the process. Due to the physical coupling, the measurements are entangled or correlated with each other. Hence, if we regard each sensor as a node, then the interactions could be viewed as edges. It should be noted that there are various ways to determine nodes and edges. Denote a system operating in a fault-free condition as $G$, and a system in a faulty condition as $G_f$. The rationale of a GNN-based fault diagnosis is that

$$\{G = (X, A)\} \neq \{G_f = (X_f, A_f)\} \tag{13}$$

where $X_f$ and $A_f$ denote the feature matrix and the adjacency matrix in the faulty case, respectively. Specifically, if a fault occurs in a system, it could affect the features in nodes ($X \neq X_f$, $A = A_f$), or the adjacency topology ($X = X_f$, $A \neq A_f$), or both ($X \neq X_f$, $A \neq A_f$).

In this sense, if the fault information is integrated in $X$ or $A$, then the GNN-based fault diagnosis method is feasible. Besides, now that industrial systems are characterized by relational dependencies and that we wish to exploit the relational inductive bias, it is an alternative and promising way to use a GNN-based method for fault diagnosis tasks.

Typically, the framework of a GNN-based fault diagnosis method is as shown in Fig. 9, and mainly consists of two parts, that is, building the graph from data and building the GNN model.

A. Building association graph from data

For a given node, a GNN synthetically analyzes the characteristics of a node and its neighbors for fault diagnosis [61].

According to the difficulty of obtaining the association graph, GNNs can be divided into two categories. In the early GNN-based applications, the association graph of the obtained data set has a certain physical meaning, which provides great convenience for its construction and verification. For example, in the Karate Club data set mentioned in [28], each individual data point represents a club member's information, and the task of the neural network is to predict which club a member belongs to. In this example, the association graph is constructed according to club members' interpersonal relationships. The explicit dependency between members simplifies the design of the association graph.

However, in the field of GNN-based fault diagnosis, there is generally no explicit dependency information, so the association graph can only be built from data. How to construct the graph and evaluate the quality of the graph are challenges faced by the GNN-based fault diagnosis method. In this paper, two construction methods for the association graph, as well as their advantages and disadvantages, will be discussed in Section V.

B. Building GNN models for fault diagnosis

In this subsection, fault diagnosis is transformed into three tasks on graphs, that is, node classification, edge classification, and graph classification. In addition, the aforementioned GNN models are integrated into the fault diagnosis algorithms.

1) Exploration of node-level GNNs for FD

The node-level GNNs considered in this paper include GCN, GraphSage, and GAT. These architectures treat each measurement as a node in the association graph. As mentioned in subsection III.A.1, two nodes that have a relationship in the association graph are more likely to be divided into the same category by GNN. The implementation of the node-level GNN-based FD algorithm is shown in Table II.

2) Exploration of edge-level GNNs in fault diagnosis

In this paper, a GAE is chosen as the edge-level GNN. As its outputs are the reconstruction of the corresponding adjacency matrices of the graph, it cannot be directly used for fault diagnosis. However, it is pointed out in subsection V.A that a ready-made association graph in fault diagnosis is often not accessible, and the accuracy of the graph constructed by various methods cannot be guaranteed. Therefore, the edge-level GNN is used to reconstruct the original association graph, so that nodes with an uncertain relationship (the corresponding value in the adjacency matrix is 0) are identified as having
association. The above steps construct the association graph that can better reflect dependencies between nodes.

Edge-level GNN can be combined with node-level GNN. First, the original graph is input into the GAE model for reconstruction, then the reconstructed graph is fed into the node-level GNN together with the feature matrix $X$ to realize the final fault diagnosis. The implementation of the edge-level GNN-based FD algorithm is given in Table III (only the GAE model is considered in this paper).

3) Exploration of graph-level GNNs in fault diagnosis

STGCN is the commonly used graph-level GNN. Unlike node-level GNN, STGCN does not focus on the dependencies within the data set, but on the complex dependency between components in a single data point [60]. In the task of fault diagnosis, time-series are usually composed of data collected by multiple sensors. Therefore, a sample can be understood as a node, and a series of measurements sampled by multiple sensors together form a graph. The implementation of the graph-level GNN-based FD algorithm is shown in Table IV.

V. DISCUSSIONS ON CONSTRUCTING ASSOCIATION GRAPH

As mentioned previously, the association graph is an essential part of GNN, which reflects the implicit dependency among nodes. At present, the widely used construction methods for the association graph are:

A. Using K-nearest neighbor method to construct the association graph

In practice, a possible method to construct the association graph is as follows. First, a large number of time-frequency features of a given data set are extracted. Then, a small number of time-frequency features that can reflect the characteristics of the system are obtained through feature selection. Next, the K-nearest neighbor (KNN) method is used to find the K nearest neighbors of the given node according to these features. Finally, the construction of the association graph is realized [62].

Suppose the data set $X = \{x_1, x_2, \ldots, x_N\}$ is a feature matrix. Applying the KNN method yields the nearest neighbor matrix $P$, where $P_i$ stores the K nearest neighbor nodes of the $i$th data point in $X$, $i \in [1, N]$. The graph can then be composed from the nearest neighbor matrix $P$, giving the corresponding adjacency matrix $A \in \mathbb{R}^{N \times N}$, that is,

$$A_{ij} = \begin{cases} 1, & x_j \in P_i \\ 0, & x_j \notin P_i \end{cases} \tag{14}$$

The illustration of the KNN-based graph construction method is shown in Fig. 10, and there are two factors that affect the quality of the KNN-based graph construction method:

Fig. 9: Fault diagnosis framework based on GNN

Table II: Algorithm (1)

Algorithm: Fault diagnosis method based on node-level GNN
Input: Data set $X$ with length $N$, adjacency matrix $A$, label set $Y$ with length $N_{tr}$, number of fault types $M$, number of iterations num.
Output: Diagnosis results of the test data set $X_{test}$.
1. Divide data set $X$ into training set $X_{train}$, test set $X_{test}$, and validation set $X_{val}$ according to the label set $Y$.
2. Construct the GNN model:
3.     The forward propagation formula of GNN layers:
4.     For GCN model, $Z = \sigma(\tilde{D}^{-0.5} \tilde{A} \tilde{D}^{-0.5} X \Theta)$.
5.     For GAT model, $Z = \sigma(\sum_{j \in N(i)} \alpha_{ij} W X_j)$.
6.     For GraphSage model, $Z = \sigma(W \cdot \mathrm{Concat}(X_i, \mathrm{Aggregate}(X_j, \forall j \in N(i))))$.
7.     Loss function: $Loss = -\sum_{i=1}^{N_{tr}} \sum_{j=1}^{M} Y_{ij} \log Z_{ij}$.
8. Train the GNN model:
9.     for i = 1, 2, ..., num:
10.        Input the total data set $X$ to the GNN model.
11.        Calculate the loss function.
12.        Update the model with back propagation.
13. Complete GNN model training.
14. Validate the trained GNN model using the data set $X_{val}$.
15. Obtain the diagnosis results, $Z = model(X_{test})$.

Table III: Algorithm (2)

Algorithm: Fault diagnosis method based on edge-level GNN
Input: Data set $X$ with length $N$, adjacency matrix $A$, number of iterations num.
Output: The reconstruction matrix $\hat{A}$ of the adjacency matrix $A$.
1. Construct the GAE model:
2.     The forward propagation formula of GAE layers:
3.     $Z = gcn(X, A)$.
4.     $\hat{A} = \sigma(Z Z^{T})$.
5.     Loss function: $Loss = -\frac{1}{N} \sum \big[A_{ij} \log \hat{A}_{ij} + (1 - A_{ij}) \log(1 - \hat{A}_{ij})\big]$.
6. Train the GAE model:
7.     for i = 1, 2, ..., num:
8.         Input the data set $X$ and the matrix $A$ to the GAE model.
9.         Calculate the loss function.
10.        Update the model with back propagation.
11. Obtain the reconstruction matrix $\hat{A}$.


1) The selection of time and frequency domain features [63]. In fact, the selection of feature types is completely subjective. It is difficult to explain why some feature types are used instead of others, and many trial-and-error experiments are necessary.
2) The determination of the distance between nodes. A straightforward method is to use the Euclidean distance to measure the distance between features. From another point of view, the occurrence of a fault will lead to the deviation of some features, but may not affect others. Therefore, different features should contribute differently to the distance value. However, due to the principle of the KNN method, all features are treated equally. In addition, its time complexity is O(N × N), so the time needed for graph construction grows rapidly as the amount of data N increases.
In general, the KNN-based construction method is favorable due to its simple and clear logic.

Fig. 10: KNN-based graph construction method

Table IV: Algorithm (3)
Algorithm: Fault diagnosis method based on graph-level GNN
Input: Data set $X$ with length $N$, adjacency matrix $A$, label set $Y_L$ with length $n_1$, number of fault types $M$, number of iterations num.
Output: Diagnosis results of the test data set $X_{test}$.
1. Divide data set $X$ into training set $X_{train}$, test set $X_{test}$, and validation set $X_{val}$ according to the label set $Y_L$.
2. Use $X_{test}$ to construct the adjacency matrix X.
3. Construct the STGCN model:
4. In STGCN Block:
5. In Temporal Block1:
6. $X_{out1} = CNNs(X)$.
7. In Spatial Block:
8. $X_{out2} = \sigma(\tilde{D}^{-0.5}\tilde{A}\tilde{D}^{-0.5}X_{out1}\Theta)$.
9. In Temporal Block2:
10. $Z = CNNs(X_{out2})$.
11. Loss function: $Loss = -\sum_{l=1}^{n_1} \sum_{f=1}^{M} Z_{lf} \log L_{lf}$.
12. Train the STGCN model:
13. for i = 1, 2, ..., num:
14. Input the training set $X_{train}$ to the STGCN model.
15. Calculate the loss function.
16. Update the model with back propagation.
17. Evaluate the model with validation set $X_{val}$.
18. Complete GNN model training.
19. Validate the trained STGCN model using the data set $X_{val}$.
20. Obtain the diagnosis results, $Z = model(X_{test})$.

B. Using prior knowledge to construct the association graph
In [64], the idea of using prior knowledge to construct the association graph is considered. It uses the structural analysis (SA) method to prediagnose the fault, and transforms the results of the prediagnosis into a graph to construct the GCN model for the final fault diagnosis. In this method, the graph construction is a bridge that connects the model-based SA and the data-based GCN. The illustration of the SA-based graph construction method is shown in Fig. 11.
The introduction of prior knowledge significantly increases the training time, and requires not only knowledge about the mechanisms of the system, but also data information. On the other hand, it ensures the accuracy of the graph, greatly improves the overall performance of the model, and is one of the best methods to build the association graph when prior knowledge is available. In addition, it should be mentioned that, as a bridge, the association graph can combine the knowledge with the measurements to exploit the strengths of both methods.

Fig. 11: SA-based graph construction method

C. Using matrix completion to adjust the association graph
The construction process of the graph is to analyze the potential existence of the dependency between nodes. Therefore, the construction of the association graph can also be transformed into a link prediction problem. Inspired by [65], we propose a new method for constructing the association graph. Firstly, an incomplete association graph of a data set is constructed by the KNN method. Then the adjacency matrix of the graph is reconstructed by the GAE model. Finally, the downstream task is completed according to the reconstructed association graph.
The method of using matrix completion to adjust the association graph is an extension of the KNN method. Since GAE is a type of unsupervised learning whose purpose is to adjust the original association graph, the graph reconstructed by GAE is expected to be no worse than the original one. Subsequent experiments show that the method of reconstructing the association graph using the GAE model can improve diagnostic performance to a certain extent. The illustration of the GAE-based graph adjustment method is shown in Fig. 12.

Fig. 12: GAE-based graph construction method

Table V: Hyperparameter settings of the GNN models
Models      Number of epochs   Learning rate   Optimizer
GCN         300                0.00018         RMSProp
GAT         200                0.00005         Adam
GraphSage   300                0.005           RMSProp
STGCN       200                0.001           Adam
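
As a rough illustration of Sections V-A and V-C, the following NumPy sketch builds the KNN adjacency matrix of Eq. (14) and then runs a single, untrained GAE forward pass ($Z = gcn(X, A)$, $\hat{A} = \sigma(ZZ^T)$) with the cross-entropy reconstruction loss of Table III. The function names, the random weight matrix `Theta`, the averaging of the loss over all $N^2$ entries, and the 0.5 threshold are assumptions made for illustration only; they are not taken from the paper, and no training loop is shown.

```python
import numpy as np


def knn_adjacency(X, k):
    """Eq. (14): A[i, j] = 1 if x_j is among the k nearest neighbours of x_i."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances: the O(N x N) step noted in Section V-A.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)           # a node is not its own neighbour
    P = np.argsort(dist, axis=1)[:, :k]      # nearest neighbour matrix P (indices)
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), P.ravel()] = 1.0
    return A


def gcn_layer(X, A, Theta):
    """Linear GCN propagation D~^-0.5 A~ D~^-0.5 X Theta, with A~ = A + I."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                  # row degrees, always >= 1
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ Theta


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gae_forward(X, A, Theta):
    """Encode with one GCN layer, decode with an inner product, return (A_hat, loss)."""
    Z = gcn_layer(X, A, Theta)
    A_hat = sigmoid(Z @ Z.T)
    eps = 1e-12                              # avoids log(0)
    # Binary cross-entropy, averaged over all entries (assumed normalisation).
    loss = -np.mean(A * np.log(A_hat + eps) + (1.0 - A) * np.log(1.0 - A_hat + eps))
    return A_hat, loss


# Hypothetical toy data: 20 samples with 4 selected features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
A = knn_adjacency(X, k=3)
Theta = rng.normal(size=(4, 2))              # untrained weights, illustration only
A_hat, loss = gae_forward(X, A, Theta)
A_adjusted = (A_hat > 0.5).astype(float)     # assumed threshold for the adjusted graph
```

In an actual KNN + GAE pipeline, `Theta` would be updated by back propagation on the reconstruction loss before $\hat{A}$ is used; how $\hat{A}$ is converted into the adjusted graph is a design choice that the sketch fixes arbitrarily with a threshold.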
D. Assess the quality of the association graph
According to the constructed association graphs, GNNs aggregate the isolated data into a whole. Through various aggregation methods, GNNs enable each node to aggregate the information of its neighbors. Due to this, the performance of the GNN model is better than that of an ordinary neural network in many fields.
In [66], it was pointed out that the GNN model has been widely used in graph representation learning, but it remains an issue to measure the quality of the graph. To deal with this, two quality indicators, namely feature smoothness and label smoothness, were proposed. The feature smoothness index $\lambda_f$ is:
$$\lambda_f = \frac{\left\| \sum_{e_{i,j} \in E} (x_{v_i} - x_{v_j})^2 \right\|}{|E| \cdot d} \tag{15}$$
where $x_{v_i}$ and $x_{v_j}$ represent the features of the nodes $v_i$ and $v_j$, respectively, $d$ is the dimension of $x_{v_i}$ and $x_{v_j}$, and $|E|$ represents the number of edges $E$ in an association graph. It is assumed that nodes with dissimilar features compared with their neighbors tend to obtain more information from the association graph. Therefore, $\lambda_f$ is positively correlated with the quality of the association graph.
The label smoothness index $\lambda_l$ is defined as:
$$\lambda_l = \sum_{e_{i,j} \in E} \left(1 - I(v_i, v_j)\right) / |E| \tag{16}$$
where $I(v_i, v_j) = 0$ if $Y_{v_i} \neq Y_{v_j}$, $I(v_i, v_j) = 1$ if $Y_{v_i} = Y_{v_j}$, and $Y_{v_i}$ and $Y_{v_j}$ represent the categories of the nodes $v_i$ and $v_j$, respectively.
The use of $\lambda_l$ is based on the inductive bias that, in a graph, if most nodes and their neighbors are in the same categories, the graph is beneficial for the training of the model. Therefore, $\lambda_l$ is negatively correlated with the quality of the association graph.
In conclusion, $\lambda_f$ and $\lambda_l$ measure the quality of the association graph from two aspects. They point out that a high-quality graph should have the following characteristics: the feature distributions of two neighbors in the graph are quite different (corresponding to a big $\lambda_f$ indicator), but they have the same category (corresponding to a small $\lambda_l$ indicator).
Interestingly, [66] implicitly explains the limitation of the KNN-based construction method: it tends to connect two nodes with similar feature distributions, which makes nodes obtain less information from their neighbors. From this point of view, both the $\lambda_f$ index and the $\lambda_l$ index calculated from the KNN-based graph are small, which indicates a “correct but useless” association graph.

VI. BENCHMARK STUDY AND COMPARISON
To illustrate the performance of GNN-based FD methods, six baseline models as well as several GNN-based FD methods are tested on three industrial data sets and compared.

A. The designed models
The detailed architectures of various GNN methods have been discussed in Section III. The corresponding hyperparameters of each model are shown in Table V. The GCN model is composed of a GC layer, three 1dCNN layers, and two fully connected layers; the GAT model is composed of two graph attention layers with a multihead mechanism, where the number of heads is eight; the GraphSage model is composed of two layers, and the GCN aggregator is used; the STGCN model contains two temporal blocks and one spatial block. The spatial block uses traditional GCN for graph convolution, while the temporal blocks perform convolution-max-pooling-convolution for each node in the temporal dimension.
Furthermore, six kinds of widely-used baseline models are used, namely the CNN, LSTM, RF, GBDT, LGBM, and SVC models.
1. CNN. CNN is a rather common baseline model, which performs well in dealing with issues in various research fields.

2. LSTM. LSTM does well in processing time-series data. It is an advanced kind of recurrent neural network (RNN).
3. RF. The random forest model (RF) constructs multiple decision trees. To make a prediction, it counts the prediction results of each decision tree, and obtains the final results through voting.
4. GBDT. Gradient boosting decision tree, also known as MART (multiple additive regression tree), is an iterative decision tree method. The algorithm is composed of multiple decision trees, and the conclusions of all trees are accumulated to make the final answer.
5. LGBM. Light gradient boosting machine is a framework for implementing the GBDT method. It supports efficient parallel training, and has the advantages of faster training speed, lower memory consumption, better accuracy, distributed support and rapid processing of massive data.
6. SVC. Support vector classification avoids the traditional process from induction to deduction, efficiently realizes the transformation from training process to prediction process, and greatly simplifies the usual problems such as classification and regression.

B. Description of data sets
The experiments in this paper are implemented on three data sets. The first data set is obtained from a hardware-in-the-loop (HIL) simulation platform based on the structure of the pulse rectifier in a traction system (as shown in Fig. 13), named the “rectifier data set”. The second data set is collected from a motor benchmark, named the “motor data set”. The last data set is obtained from the Tennessee Eastman chemical process benchmark, named the “TEP data set”¹.

Fig. 13: Hardware-in-the-loop simulation platform for the pulse rectifier of a traction system

The dimension of the rectifier data set is 1067 × 256 × 6. It is composed of 1067 labeled measurements. Each measurement is composed of six sensor samples, and the dimension of a single measurement is 256 × 6. The 1067 measurements are divided into one normal condition and five fault types.
The motor data set consists of 1104 labeled measurements, each of which records 48 features of its corresponding motor, and the dimension of the total data set is 1104 × 48. It contains one normal condition and three fault types. Since the motor data set has no evident sequential characteristic, STGCN is not evaluated on it.
The dimension of the TEP data set is 1100 × 500 × 52, which is composed of 1100 labeled measurements, each of which is composed of 41 observation variables and 11 control variables. It contains one normal condition and seven fault types. The fault types in the three data sets are briefly described in Table VI.

Table VI: Description of faults
Data set    Description of fault                                  Fault type
Rectifier   Current sensor fault                                  Random variation
            IGBT1 fault                                           Step
            IGBT2 fault                                           Step
            IGBT3 fault                                           Step
            IGBT4 fault                                           Step
Motor       Demagnetization fault                                 Slow drifting
            Inter-turn short fault                                Step
            Bearing fault                                         Random variation
TEP         A/C feed ratio, B composition constant                Step
            B composition, A/C ratio constant                     Step
            D feed temperature                                    Step
            Reactor cooling water inlet temp.                     Step
            Condenser cooling water inlet temp.                   Step
            A feed loss                                           Step
            C header pressure loss - reduced availability         Step

The three data sets are collected from systems either with known structural information (like the pulse rectifier) or without known structural information. Thereby, they are representative in the field of fault diagnosis. Related parameters of the data sets are summarized in Table VII.

¹ http://brahms.scs.uiuc.edu

C. Description of fault types
As is shown in Table VI, more than ten types of faults are considered in the experiments, which can be roughly divided into three categories: random variation fault, step fault, and slow drifting fault. Their characteristics are as follows [67].
1. Characteristics of a random variation fault. Sampled values with a random variation fault will fluctuate randomly. Therefore, there exist irregularly changing deviations between the samples and the actual value. The generation of random variation faults is affected by many factors including noise and uncertainties, which are ubiquitous in an industrial process.
2. Characteristics of a step fault. The deviation between sampled values and actual values changes significantly in a short period of time. A step fault can do great harm to an industrial system. Taking the pulse rectifier as an example, a step fault may be caused by an open circuit or a short circuit.
3. Characteristics of a slow drifting fault. An extra signal that is proportional to time is added to the samples. Slow drifting faults may result from the aging of components.

D. Constructing association graphs from the data sets
The construction of the association graph plays an important role in GNN-based fault diagnosis. Based on the SA, KNN, and KNN + GAE methods, three kinds of graphs are constructed. Each kind of graph has its own characteristics. For the graph constructed
by the SA method, the nodes in the graph are grouped into clusters. Nodes belonging to the same cluster are connected to each other, and nodes of different clusters are isolated from each other (as is shown in Fig. 14(a)). For the graph constructed by the KNN method, all nodes have the potential to be connected to each other. Usually, all nodes in the graph form a connected domain (as is shown in Fig. 14(b)). The graph constructed by the KNN + GAE method is slightly different from the KNN-based graph, and its schematic diagram is shown in Fig. 14(c).
Related parameters of each graph are shown in Table VIII. Note that the K value is only applicable to the KNN method.

Table VII: Details of three data sets
Data set             Quantity   Dimensions   Structural information   Data complexity
rectifier data set   1067       256×6        Available                High
motor data set       1104       48           Unavailable              Low
TEP data set         1100       5000×52      Unavailable              High

Fig. 14: Schematic diagrams of graphs constructed by three kinds of methods. (a) Graph constructed by SA method. (b) Graph constructed by KNN method. (c) Graph constructed by KNN+GAE method.

Table VIII: Related parameters of the association graph
Construction method   Data set             No. of edges   K value
SA                    rectifier data set   104521         \
KNN                   rectifier data set   32556          45
                      motor data set       54096          50
                      TEP data set         53900          30
KNN+GAE               rectifier data set   86269          45
                      motor data set       119387         50
                      TEP data set         103667         30

E. Experimental results and analysis
1) Comparison between graph construction methods: The training set contains 50 samples in the rectifier data set (4.69% of the data set), 80 in the motor data set (7.25% of the data set), and 150 in the TEP data set (13.64% of the data set). To compare the different construction methods, we used the SA-based, KNN-based and KNN + GAE-based methods on the rectifier data set, and the KNN-based and KNN + GAE-based methods on the motor and TEP data sets. The diagnostic accuracies, which reflect the performance of the various diagnostic methods, are shown in Table IX. For example, when the size of the test set is 867 and the number of accurate predictions is 617, the diagnostic accuracy is calculated as 0.712.
Table IX shows that the choice of the construction method has a great impact on model performance. First consider the rectifier data set: the accuracies of the three GNN models with the SA-based construction method reach approximately 90% on average, which is much higher than with the other methods. This is because the SA-based construction method uses a large amount of prior knowledge to construct a high-quality association graph. Besides, the performance of the three FD models using the KNN + GAE method for graph construction is better than that using the KNN method. On the motor data set, the performance of GCN using the KNN method is slightly better than that of GCN with the KNN + GAE method. As for the TEP data set, both construction methods can achieve favorable fault diagnosis performance. The accuracies of the GCN and the GAT models using the KNN + GAE method are better than those using KNN.
Generally speaking, the SA-based construction method is better than the other methods. However, in most cases, it is difficult to obtain prior knowledge of the system, so the KNN-based and KNN + GAE-based construction methods can be used as its replacement for cases where the structure information is unknown. Furthermore, the overall performance of GNNs using the KNN + GAE-based method is higher than that of GNNs using the KNN-based method. This may be attributed to the fact that the GAE predicts association relationships that do not exist in the original graph, which improves the quality of the association graph.
2) Comparison between GNN models and baselines: In the benchmark study, GNN-based methods are also compared with various baseline methods. Since the SA-based graph construction method introduces prior knowledge, which is not available to the baseline methods, GNN models trained with SA are not compared with baseline methods. It is found that GCN and GraphSage are not as good as CNN when dealing with the rectifier data set, but achieve better performance than the other baseline methods. STGCN with the KNN-based method and GAT with the KNN + GAE-based method outperform the baseline methods in terms of diagnostic accuracy.
It should be noted that on the motor data set, the LGBM and GBDT methods outperform GNN-based methods as well as the other baseline models. This may be because the motor data set itself is relatively simple and the association
between data is not close.
In general, considering the performance of GNN methods on the three data sets, GNN methods achieve the same or better diagnosis performance compared with the six baseline methods. Besides, when dealing with a data set whose association graph can be constructed based on prior knowledge, GNN with the SA-based graph construction method is the preferred one.

Table IX: Experimental results (accuracy±standard deviation)
Group             Model       Method     rectifier data set   motor data set    TEP data set
                                         (Train_set = 50)     (Train_set = 80)  (Train_set = 150)
GNN-Models        GCN         SA         0.862±0.042          \                 \
                              KNN        0.712±0.028          0.747±0.018       0.993±0.002
                              KNN+GAE    0.750±0.011          0.736±0.030       0.970±0.051
                  GAT         SA         0.901±0.056          \                 \
                              KNN        0.801±0.024          0.661±0.036       0.969±0.033
                              KNN+GAE    0.877±0.017          0.697±0.041       0.988±0.010
                  GraphSage   SA         0.943±0.001          \                 \
                              KNN        0.786±0.011          0.809±0.014       0.993±0.002
                              KNN+GAE    0.818±0.006          0.823±0.014       0.996±0.001
                  STGCN                  0.843±0.141          \                 0.976±0.007
Baseline-Models   CNN                    0.842±0.047          0.472±0.079       0.988±0.013
                  LSTM                   0.675±0.046          0.442±0.075       0.912±0.016
                  RF                     0.666±0.015          0.547±0.067       0.855±0.005
                  LGBM                   0.457±0.016          0.958±0.020       0.938±0.010
                  GBDT                   0.415±0.026          0.930±0.018       0.950±0.009
                  SVC                    0.536±0.075          0.881±0.021       0.895±0.008

Fig. 15: Detailed display of experimental results on the rectifier data set. (a) Accuracy of GCN model. (b) Accuracy of GAT model. (c) Accuracy of GraphSage model. (d) Accuracy of STGCN and ML models.

3) The influence of small sample case: In practice, a system usually runs in a “long-term fault-free and short-term fault” scenario due to the highly reliable design of the system. This leads to a small number of fault samples, which will be called the “small sample case” in this paper.
To verify the effectiveness of the GNN-based FD methods for the small sample case, the size of the training set is set to 10-100 on the rectifier data set, and a total of ten groups of experiments were conducted for the comparison with baseline methods. The experimental results are shown in Table X, Fig. 15, and Fig. 18. In order to better present the convergence process inside the STGCN model during its iteration process, the variation tendency of the output of the hidden layer in the STGCN method is tracked as shown in Fig. 17. In this figure, the output of the hidden layer is processed with the PCA method for dimensional reduction, and constitutes a node in the graph. As shown in Fig. 17, with the increase in the number of iterations, the outputs with the same label in the hidden layer become gradually similar. In addition, the confusion matrices corresponding to the diagnosis results obtained by these methods are shown in Fig. 16. According to the analysis of Table X, when the size of the training set is 10 (0.94% of the data set), except for STGCN, the overall
diagnostic accuracies of the GNN-based methods are better than those of the six baseline methods. The exception of STGCN may be attributed to the fact that its adjacency matrix cannot fully reflect the effective association between features in the case of small samples.
As the size of the training set increases, the diagnostic accuracies of the GNN-based and baseline FD methods on the test set increase correspondingly, but the diagnostic performance of the GNNs is still better than that of the baseline methods on the whole. In general, GNN-based methods can achieve relatively better results in the small sample case, and the advantages of the GNNs are more obvious with the introduction of prior knowledge.

Table X: Comparison results on the rectifier data set
Training set           10      20      30      40      50      60      70      80      90      100
GraphSage   KNN        0.586   0.710   0.736   0.774   0.786   0.900   0.894   0.928   0.938   0.945
            KNN+GAE    0.644   0.761   0.753   0.803   0.818   0.898   0.897   0.947   0.950   0.951
            SA         0.935   0.927   0.934   0.944   0.943   0.946   0.950   0.953   0.945   0.945
GCN         KNN        0.532   0.617   0.660   0.701   0.712   0.751   0.819   0.830   0.829   0.863
            KNN+GAE    0.534   0.669   0.723   0.738   0.750   0.826   0.809   0.879   0.867   0.907
            SA         0.769   0.804   0.864   0.906   0.862   0.896   0.926   0.928   0.903   0.929
GAT         KNN        0.686   0.739   0.782   0.802   0.801   0.876   0.904   0.903   0.930   0.935
            KNN+GAE    0.752   0.789   0.805   0.834   0.877   0.891   0.885   0.933   0.935   0.934
            SA         0.808   0.812   0.842   0.877   0.901   0.897   0.921   0.937   0.932   0.932
STGCN                  0.312   0.443   0.652   0.782   0.843   0.868   0.888   0.941   0.928   0.944
CNN                    0.484   0.618   0.724   0.796   0.842   0.874   0.894   0.906   0.893   0.910
LSTM                   0.506   0.529   0.636   0.666   0.676   0.658   0.735   0.782   0.759   0.776
Random Forest          0.433   0.528   0.608   0.643   0.666   0.684   0.717   0.748   0.732   0.757
LGBM                   0.347   0.346   0.346   0.424   0.457   0.524   0.556   0.557   0.599   0.641
GBDT                   0.301   0.350   0.307   0.407   0.415   0.478   0.522   0.581   0.577   0.595
SVC                    0.360   0.403   0.467   0.509   0.536   0.604   0.654   0.635   0.627   0.635

Fig. 16: Confusion matrices of various FD methods

In conclusion, the four GNN-based FD methods have distinct advantages on the three benchmark data sets. Most of the methods can achieve good fault diagnosis. Considering that GraphSage belongs to the inductive learning mode, it is a good choice to use the GraphSage model for fault diagnosis. GAT uses an attention mechanism and a multihead mechanism to improve the generalizability, and has obvious advantages in processing data sets with complex internal structures. Compared with the other three GNN-based FD methods, the performance of STGCN is relatively poor when the training set is small, but as the size of the training set increases, the accuracy of STGCN on the test set becomes approximately equal to that of GraphSage.

VII. PERSPECTIVES OF FUTURE RESEARCH
GNN is a promising way for fault diagnosis, but several challenges still need to be further investigated, which are
expected to inspire possible research directions in this field over the next years.
(1) How to construct high quality association graph from data?
Up to now, the existing graph construction methods are insufficient, and the data may have poor quality with outliers, uncertain connections, and missing values. Considering that the quality of the graph has a large impact on the performance of the GNN model, it is quite essential to consider how to construct high-quality association graphs from a given data set.
(2) How to use prior knowledge in GNN for fault diagnosis?
Most of the data-driven fault diagnosis methods do not pay attention to the prior knowledge of the system of interest, but in many cases, engineers have at least some understanding of the process information. How to combine prior knowledge with measurements is worth considering. A promising way is to combine knowledge graphs with GNN [68].
(3) How to study fault diagnosability?
In the field of traditional model-based fault diagnosis, fault diagnosability is divided into two parts: detectability and isolability. A fault that does not meet the requirements of detectability or isolability cannot be diagnosed by fault diagnosis methods. However, in GNN-based fault diagnosis, it is usually assumed that all faults are fully diagnosable, but this assumption does not always hold. In order to get a better diagnosis accuracy, researchers often devote effort to improving the neural network model, but ignore the analysis of the essential problem of fault diagnosability. From this point of view, it is of great significance to analyze fault diagnosability in GNN-based fault diagnosis.
(4) How to update the graph?
It is often assumed that the graph used by GNN is unchanged. Can the graph be updated in the process of fault diagnosis so that it can dynamically show the relationship of each node? In addition, the normal status of a system may drift over a period of time. As a result, the association relationship between measurements may also change, which requires timely changes in the association graph.
(5) How to detect unknown fault types?
The current fault diagnosis methods are based on known types of faults. When a GNN model receives measurements of an unknown fault type, it makes a fault decision based on the known fault types. The popular transfer learning approach may be a promising technique to provide a feasible solution [69].
(6) How to improve GNN security against cyber attacks?
In the information age, everything is interconnected. A mature neural network must not only collect data through the internet, but also guard against possible cyber attacks. In addition, the GNN model emphasizes the relationship between measurements, so once a small part of the measurements in the data set is maliciously tampered with, it will have a negative impact on the performance. How to minimize the above adverse effects is a problem that GNN must face in industrial applications.

Fig. 17: Variation tendency of the hidden layer output of the STGCN model (the percentage represents the proportion of the total iterative training process, and different colors of data denote different categories)

Fig. 18: Summary of experimental results on the rectifier data set

VIII. C ONCLUSION [11] S. Ding, Data-driven Design of Fault Diagnosis and Fault-tolerant
Control Systems. London, UK: Springer, 2014.
In this paper, the emerging GNN-based fault diagnosis [12] S. Ding, Model-based Fault Diagnosis Techniques: Design Schemes,
methods are briefly reviewed. The NN-based fault diagno- Algorithms, and Tools. Berlin, Germany: Springer, 2008.
sis methods are divided based on the data representations [13] C. Alippi, N. Stavros, and R. Manuel, “Model-free fault detection and
isolation in large-scale cyber-physical systems,” IEEE Transactions on
in real world, namely, the time-series-based NNFD method, Emerging Topics in Computational Intelligence, vol. 1, no. 1, pp. 61–71,
the image-based NNFD method, and the graph-based NNFD 2016.
method. Then, basic principles and principal architectures of [14] H. Luo, H. Zhao, and S. Yin, “Data-driven design of fog-computing-
aided process monitoring system for large-scale industrial processes,”
GNN are introduced, with attention to GCN, GAT, GraphSage, IEEE Transactions on Industrial Informatics, vol. 14, pp. 4631–4641,
GAE, and STGCN according to the different graph analytic 2018.
tasks. Furthermore, the GNN-based fault diagnosis framework [15] P. M. Papadopoulos, V. Reppa, M. M. Polycarpou, and C. G. Panayiotou,
is detailed with focus on building the association graph and “Scalable distributed sensor fault diagnosis for smart buildings,”
IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 3, pp. 638–655,
designing the GNN models. Experiments on three benchmark 2020.
data sets were carried out to verify the effectiveness and [16] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning
feasibility of GNN-based FD methods by comparing with applied to document recognition,” Proceedings of the IEEE, vol. 86,
no. 11, pp. 2278–2324, 1998.
several baseline FD methods. Finally, perspectives on the [17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
challenges of GNN-based fault diagnosis are discussed. This image recognition,” in IEEE Conference on Computer Vision and Pattern
review is prepared in response to the urgent need of a seamless Recognition, pp. 770–778, 2016.
[18] J. C. David and J. MacKay, “A practical bayesian framework for
and rigorous transition from classical neural network-based backpropagation networks,” Neural Computation, vol. 4, pp. 448–472,
fault diagnosis to the alternative fault diagnosis approaches 1992.
which operate directly on graph-structured data. It is also [19] C. Cheng, J. Wang, H. Chen, Z. Chen, H. Luo, and P. Xie, “A review of
expected to provide guidelines for future research in this field. intelligent fault diagnosis for high-speed trains: Qualitative approaches,”
Entropy, 2020, doi:10.3390/e23010001.
[20] K. Zhang, H. Hao, Z. Chen, S. Ding, and K. Peng, “A comparison
ACKNOWLEDGMENT and evaluation of key performance indicator-based multivariate statistics
process monitoring approaches,” Journal of Process Control, vol. 33, pp.
This work was supported in part by the National Natural 112–126, 2015.
Science Foundation of China (#62173349,#U20A20186), [21] K. Zhong, M. Han, and B. Han, “Data-driven based fault prognosis
for industrial systems: A concise overview,” IEEE/CAA Journal of
in part by the Swiss National Science Foundation project Automatica Sinica, vol. 7, no. 2, pp. 330–345, 2020.
(#200021_172671): “ALPSFORT: A Learning graph-based [22] D. Zheng, L. Zhou, and Z. Song, “Kernel generalization of multi-
framework for cyber-physical systems”. rate probabilistic principal component analysis for fault detection in
nonlinear process,” IEEE/CAA Journal of Automatica Sinica, vol. 8,
no. 8, pp. 1465–1476, 2021.
R EFERENCES [23] K. Zhang, Y. Shardt, Z. Chen, X. Yang, S. Ding, and K. Peng, “A kpi-
based process monitoring and fault detection framework for large-scale
[1] S. Rizzo, G. Susinni, and F. Iannuzzo, “Intrusiveness of power de- processes,” ISA Transactions, vol. 68, pp. 276–286, 2017.
vice condition monitoring methods: Introducing figures of merit for [24] H. Chen, B. Jiang, S. X. Ding, and B. Huang, “Data-driven fault diagno-
condition monitoring,” IEEE Industrial Electronics Magazine, 2021, sis for traction systems in high-speed trains: A survey, challenges, and
doi:10.1109/MIE.2021.3066959. perspectives,” IEEE Transactions on Intelligent Transportation Systems,
[2] Y. G. Lei, J. Lin, M. J. Zuo, and Z. J. He, “Condition monitoring and pp. 1–17, 2020, doi:10.1109/TITS.2020.3029946.
fault diagnosis of planetary gearboxes: A review,” Measurement, vol. 48, [25] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. In-
pp. 292–305, 2014. man, “1d convolutional neural networks and applications: A survey,”
[3] S. Yin, S. X. Ding, X. Xie, and H. Luo, “A review on basic data-driven Mechanical Systems and Signal Processing, vol. 151, p. 107398, 2021,
approaches for industrial process monitoring,” IEEE Transactions on doi:10.1016/j.ymssp.2020.107398.
Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014. [26] M. Kuppusamy, A. Hussain, P. Sanjeevikumar, J. Holm-Nielsen, and
[4] Z. Ge, “Review on data-driven modeling and monitoring for plant-wide V. Kaliappan, “Deep learning for fault diagnostics in bearings, insulators,
industrial processes,” Chemometrics and Intelligent Laboratory Systems, pv panels, power lines, and electric vehicle applications-the state-of-the-
vol. 171, pp. 16–25, 2017. art approaches,” IEEE Access, vol. 9, pp. 41 246–41 260, 2021.
[5] Y. Zhao, X. He, J. Zhang, H. Ji, D. Zhou, and M. G. Pecht, “Detection [27] D. Hoang and H. Kang, “A survey on deep learning based bearing fault
of intermittent faults based on an optimally weighted moving average diagnosis,” Neurocomputing, vol. 335, pp. 327–335, 2019.
t2 control chart with stationary observations,” Automatica, vol. 123, p.
[28] T. Kipf and M. Welling, “Semi-supervised classification with graph
109298, 2021.
convolutional networks,” International Conference on Learning Repre-
[6] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, and D. Siegel, “Prog-
sentations, vol. abs/1609.02907, 2016.
nostics and health management design for rotary machinery systems-
reviews, methodology and applications,” Mechanical Systems and Signal [29] H. Zhang and Y. Shen, “Template-based prediction of protein structure
Processing, vol. 42, no. 1, pp. 314–334, 2014. with deep learning,” BMC Genomics, vol. 21, 2020.
[7] J. Wang, F. Yang, T. Chen, and S. L. Shah, “An overview of industrial [30] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural
alarm systems: Main causes for alarm overloading, research status, networks on graphs with fast localized spectral filtering,” eural infor-
and open problems,” IEEE Transactions on Automation Science and mation processing systems, vol. 30, pp. 3844–3852, 2016.
Engineering, vol. 13, no. 2, pp. 1045–1061, 2016. [31] M. Niepert, M. Ahmed, and K. Kutzkov, “Learning convolutional neural
[8] Y. Dong, Z. Wu, X. Du, J. Zha, and Q. Yuan, “Resource abnormal networks for graphs,” International Conference on Machine Learning,
management method of unsteady processes of cloud manufacturing pp. 2014–2023, 2016.
services,” China Mechanical Engineering, vol. 29, pp. 1193–1200, 2018. [32] R. Girshick, J. Donahue, and T. Darrell, “Rich feature hierarchies for
[9] H. Henao, G. Capolino, and M. Cabanas, “Trends in fault diagnosis for accurate object detection and semantic segmentation,” Computer Vision
electrical machines: A review of diagnostic techniques,” IEEE Industrial and Pattern Recognition, pp. 580–587, 2014.
Electronics Magazine, vol. 8, no. 2, pp. 31–42, 2014. [33] X. Qi, R. Liao, J. Jia, S. Fidler, and R. Urtasun, “3D graph neural
[10] J. Qin, Y. Dong, Q. Zhu, J. Wang, and Q. Liu, “Bridging systems theory networks for RGBD semantic segmentation,” pp. 5209–5218, 2017.
and data science: A unifying review of dynamic latent variable analytics [34] L. Yao, C. Mao, and Y. Luo, “Graph convolutional networks for
and process monitoring,” Annual Reviews in Control, vol. 50, pp. 29–48, text classification,” Proceedings of the AAAI Conference on Artificial
2020. Intelligence, vol. 33, pp. 7370–7377, 2019.
17

[35] N. Park, A. Kan, X. L. Dong, T. Zhao, and C. Faloutsos, “Estimating [60] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional
node importance in knowledge graphs using graph neural networks,” in networks: A deep learning framework for traffic forecasting,” in Pro-
International Conference on Knowledge Discovery and Data Mining, ceedingsof the International Joint Conference on Artificial Intelligence,
pp. 596–606, 2019. pp. 3634–3640, 2018.
[36] Z. Zhang, P. Cui, and W. Zhu, “Deep learning on graphs: A survey,” [61] E. Mansimov, O. Mahmood, S. Kang, and K. Cho, “Molecular geometry
IEEE Transactions on Knowledge and Data Engineering, 2020, doi: prediction using a deep generative graph neural network,” Scientific
10.1109/TKDE.2020.2981333. Reports, vol. 9, no. 1, pp. 1–13, 2019.
[37] I. Chami, S. Abu, and B. Perozzi, “Machine learning on graphs: A model [62] T. Li, Z. Zhao, C. Sun, R. Yan, and X. Chen, “Multi-receptive field graph
and comprehensive taxonomy,” CoRR, 2020. convolutional networks for machine fault diagnosis,” IEEE Transactions
[38] D. Bacciu, F. Errica, A. Micheli, and M. Podda, “A gentle introduction on Industrial Electronics, 2020, doi: 10.1109/TIE.2020.3040669.
to deep learning for graphs,” Neural Networks, vol. 129, pp. 203–221, [63] L. Franceschi, M. Niepert, M. Pontil, and X. He, “Learning discrete
2020. structures for graph neural networks,” International Conference on
[39] J. Zhou, G. Cui, and Z. Zhang, “Graph neural networks: A review of Machine Learning, 2019.
methods and applications,” 2018. [64] Z. Chen, J. Xu, T. Peng, and C. Yang, “Graph convolutional network-
[40] J. B. Lee, R. A. Rossi, S. Kim, N. K. Ahmed, and E. Koh, “Attention based method for fault diagnosis using a hybrid of measurement and
models in graphs: A survey,” ACM Transactions on Knowledge Discov- prior knowledge,” IEEE Transactions on Cybernetics, pp. 1–13, 2021,
ery from Data, vol. 13, pp. 1–25, 2019. doi: 10.1109/TCYB.2021.3059002.
[41] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. Yu, “A comprehen- [65] R. Berg, T. Kipf, and M. Welling, “Graph convolutional matrix comple-
sive survey on graph neural networks,” IEEE Transactions on Neural tion,” ArXiv, vol. abs/1706.02263, 2017.
Networks and Learning Systems, vol. 32, no. 1, pp. 4–24, 2021. [66] Y. Hou, J. Zhang, J. Cheng, and K. Ma, “Measuring and improving the
[42] K. Chen, J. Hu, Y. Zhang, Z. Yu, and J. He, “Fault location in power use of graph information in graph neural networks,” in International
distribution systems via deep graph convolutional networks,” IEEE Conference on Learning Representations, 2019.
Journal on Selected Areas in Communications, vol. 38, pp. 119–131, [67] C. Yang, C. Yang, T. Peng, X. Yang, and W. Gui, “A fault-injection
2019. strategy for traction drive control systems,” IEEE Transactions on
[43] J. Jiang, J. Chen, and T. Gu, “Anomaly detection with graph convolu- Industrial Electronics, vol. 64, no. 7, pp. 5719–5727, 2017.
tional networks for insider threat and fraud detection,” in IEEE Military [68] S. Arora, “A survey on graph neural networks for knowledge graph
Communications Conference, pp. 109–114, 2019. completion,” CoRR, vol. abs/2007.12374, 2020.
[44] X. Yu, B. Tang, and K. Zhang, “Fault diagnosis of wind turbine gearbox [69] Y. Lei, B. Yang, X. Jiang, F. Jia, N. Li, and A. K. Nandi, “Applications
using a novel method of fast deep graph convolutional networks,” IEEE of machine learning to machine fault diagnosis: A review and roadmap,”
Transactions on Instrumentation and Measurement, vol. 70, pp. 1–14, Mechanical Systems and Signal Processing, vol. 138, p. 106587, 2020.
2021.
[45] T. Li, Z. Zhao, C. Sun, R. Yan, and X. Chen, “Multi-receptive field graph
convolutional networks for machine fault diagnosis,” IEEE Transactions
on Industrial Electronics, 2020, doi: 10.1109/TIE.2020.3040669.
[46] C. Li, L. Mo, and R. Yan, “Rolling bearing fault diagnosis based on
horizontal visibility graph and graph neural networks,” in International
conference on Sensing, measurement, data analytics in the era of
artificial intelligence, pp. 275–279, 2020.
[47] W. Z. A. J. Bruna and Y. L, “Spectral networks and locally connected
networks on graphs,” Computing Research Repository, pp. 1–14, 2013.
[48] Q. Li, Z. Han, and X. Wu, “Deeper insights into graph convolutional
networks for semi-supervised learning,” in Proceedings of the Thirty-
Second AAAI Conference on Artificial Intelligence, (AAAI-18), New
Orleans, Louisiana, USA, February 2-7, 2018, S. A. McIlraith and K. Q.
Weinberger, Eds., pp. 3538–3545. AAAI Press, 2018.
[49] Z. Zhang, J. Huang, and Q. Tan, “SR-HGAT: Symmetric relations based
heterogeneous graph attention network,” IEEE Access, vol. 8, pp. 631–
645, 2020.
[50] H. Li, H. Chen, and W. Wang, “A structural deep network embedding
model for predicting associations between mirna and disease based on
molecular association network,” Scientific Reports, vol. 11, p. 12640,
2021.
[51] A. Tsitsulin, J. Palowitch, B. Perozzi, and E. Muller, “Graph clustering
with graph neural networks,” ArXiv, vol. abs/2006.16904, 2020.
[52] Z. Wang and T. Oates, “Imaging time-series to improve classification
and imputation,” Conference on artificial intelligence, pp. 3939–3945,
2015.
[53] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Philip, “A
comprehensive survey on graph neural networks,” IEEE Transactions
on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 1–21,
2020.
[54] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, and Y. Bengio,
“Graph attention networks,” International Conference on Learning Rep-
resentations, 2017.
[55] J. He and H. Zhao, “Fault diagnosis and location based on graph
neural network in telecom networks,” in International Conference on
Networking and Network Applications, pp. 304–309, 2020.
[56] W. Hamilton, R. Ying, and J. Leskovec, “Inductive representation
learning on large graphs,” in Neural Information Processing Systems,
vol. 30, pp. 1025–1035, 2017.
[57] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” 2016.
[58] S. Pan, R. Hu, G. Long, J. Jiang, and L. Yao, “Adversarially regularized
graph autoencoder for graph embedding,” in Proceedings of the Interna-
tional Joint Conference on Artificial Intelligence, pp. 2609–2615, 2018.
[59] Y. Liao, Y. Wang, and Y. Liu, “Graph regularized auto-encoders for
image representation,” IEEE Transactions on Image Processing, vol. 26,
no. 6, pp. 2839–2852, 2016.
